[jira] [Comment Edited] (FLINK-10455) Potential Kafka producer leak in case of failures

2019-06-04 Thread sunjincheng (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855384#comment-16855384 ]

sunjincheng edited comment on FLINK-10455 at 6/4/19 6:31 AM:
--------------------------------------------------------------

Currently we have added exception handling around the
`FlinkKafkaProducer011#commit` method to ensure that
`recycleTransactionalProducer` is executed even when the commit throws. Do we
also need the same handling around `FlinkKafkaProducer011#abort`? (See the
sketch after the screenshot below.)

 

!image-2019-06-04-14-30-55-985.png!
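
For illustration, a minimal sketch of the guard in question, assuming
simplified stand-ins for `KafkaTransactionState` and
`recycleTransactionalProducer` (this is not the actual
`FlinkKafkaProducer011` source):

{code:java}
// Hedged sketch of the try/finally guard discussed above; the real
// FlinkKafkaProducer011 members are simplified stand-ins here.
import org.apache.kafka.clients.producer.KafkaProducer;

class ProducerRecycleSketch {

    // Simplified stand-in for FlinkKafkaProducer011's transaction state.
    static class KafkaTransactionState {
        final KafkaProducer<byte[], byte[]> producer;
        KafkaTransactionState(KafkaProducer<byte[], byte[]> producer) {
            this.producer = producer;
        }
    }

    void commit(KafkaTransactionState transaction) {
        try {
            transaction.producer.commitTransaction();
        } finally {
            // Recycle even when commitTransaction() throws, so a failed
            // commit cannot leak the producer instance.
            recycleTransactionalProducer(transaction.producer);
        }
    }

    void abort(KafkaTransactionState transaction) {
        try {
            transaction.producer.abortTransaction();
        } finally {
            // The open question above: abort likely needs the same guard.
            recycleTransactionalProducer(transaction.producer);
        }
    }

    void recycleTransactionalProducer(KafkaProducer<byte[], byte[]> producer) {
        // Simplified: the real method returns the producer to a pool;
        // closing is the minimal safe behaviour for this sketch.
        producer.close();
    }
}
{code}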


was (Author: sunjincheng121):
Currently we have added exception handling around the
`FlinkKafkaProducer011#commit` method to ensure that
`recycleTransactionalProducer` is executed even when the commit throws. Do we
also need the same handling around `FlinkKafkaProducer011#abort`?

 

!image-2019-06-04-14-25-16-916.png!

> Potential Kafka producer leak in case of failures
> -------------------------------------------------
>
> Key: FLINK-10455
> URL: https://issues.apache.org/jira/browse/FLINK-10455
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.5.2
>Reporter: Nico Kruber
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1
>
> Attachments: image-2019-06-04-14-25-16-916.png, 
> image-2019-06-04-14-30-55-985.png
>
>
> If the Kafka brokers' timeout is too low for our checkpoint interval [1], we
> may get a {{ProducerFencedException}}. The documentation around
> {{ProducerFencedException}} explicitly states that we should close the
> producer after encountering it.
> By looking at the code, it doesn't seem like this is actually done in
> {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in
> {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an
> exception, we don't clean up (nor try to commit) any other transaction.
> -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}}
> simply iterates over the {{pendingCommitTransactions}}, which is not touched
> during {{close()}}.
> Now if we restart the failing job on the same Flink cluster, any resources 
> from the previous attempt will still linger around.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011
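
For reference, a minimal sketch of the handling that the
{{ProducerFencedException}} javadoc asks for, assuming a bare
{{KafkaProducer}} and a hypothetical {{commitOrClose}} helper (this is not
what {{FlinkKafkaProducer011}} currently does):

{code:java}
// Hedged sketch (assumed names, not current FlinkKafkaProducer011
// behaviour): the KafkaProducer javadoc says a fenced producer cannot be
// reused and must be closed, otherwise it lingers after a restart.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.errors.ProducerFencedException;

class FencedCommitSketch {
    static void commitOrClose(KafkaProducer<byte[], byte[]> producer) {
        try {
            producer.commitTransaction();
        } catch (ProducerFencedException e) {
            // The producer is permanently unusable after fencing; close it
            // so its threads and connections do not leak.
            producer.close();
            throw e;
        }
    }
}
{code}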





[jira] [Comment Edited] (FLINK-10455) Potential Kafka producer leak in case of failures

2019-05-29 Thread Chris Slotterback (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851427#comment-16851427 ]

Chris Slotterback edited comment on FLINK-10455 at 5/30/19 12:30 AM:
--------------------------------------------------------------

[~sunjincheng121] I think it should be solved, as it blocks exactly-once
production in any recovery scenario. I don't have any Flink commit experience
or time right now, but I can work with whoever does, or I can revisit in a
couple of weeks and take a stab at fixing it myself.


was (Author: cslotterback):
[~sunjincheng121] I think it should be solved, as it blocks exactly-once
production. I don't have any Flink commit experience or time right now, but I
can work with whoever does, or I can revisit in a couple of weeks and take a
stab at fixing it myself.

> Potential Kafka producer leak in case of failures
> -------------------------------------------------
>
> Key: FLINK-10455
> URL: https://issues.apache.org/jira/browse/FLINK-10455
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.5.2
>Reporter: Nico Kruber
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1
>
>
> If the Kafka brokers' timeout is too low for our checkpoint interval [1], we
> may get a {{ProducerFencedException}}. The documentation around
> {{ProducerFencedException}} explicitly states that we should close the
> producer after encountering it.
> By looking at the code, it doesn't seem like this is actually done in
> {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in
> {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an
> exception, we don't clean up (nor try to commit) any other transaction.
> -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}}
> simply iterates over the {{pendingCommitTransactions}}, which is not touched
> during {{close()}}.
> Now if we restart the failing job on the same Flink cluster, any resources 
> from the previous attempt will still linger around.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011





[jira] [Comment Edited] (FLINK-10455) Potential Kafka producer leak in case of failures

2019-05-29 Thread sunjincheng (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850556#comment-16850556 ]

sunjincheng edited comment on FLINK-10455 at 5/29/19 9:40 AM:
--------------------------------------------------------------

What is the current status of this JIRA? Is anyone else following up?
[~till.rohrmann] [~cslotterback]

[~azagrebin] Do you want to continue working on this problem?

Since this is a blocker issue for release 1.8.1, we're hoping for a quick
response here.

Thanks.


was (Author: sunjincheng121):
What is the current status of this JIRA? Is anyone else following up?
[~till.rohrmann] [~cslotterback]

[~azagrebin] Do you want to continue working on this problem?

> Potential Kafka producer leak in case of failures
> -------------------------------------------------
>
> Key: FLINK-10455
> URL: https://issues.apache.org/jira/browse/FLINK-10455
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.5.2
>Reporter: Nico Kruber
>Assignee: Andrey Zagrebin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1
>
>
> If the Kafka brokers' timeout is too low for our checkpoint interval [1], we
> may get a {{ProducerFencedException}}. The documentation around
> {{ProducerFencedException}} explicitly states that we should close the
> producer after encountering it.
> By looking at the code, it doesn't seem like this is actually done in
> {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in
> {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an
> exception, we don't clean up (nor try to commit) any other transaction.
> -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}}
> simply iterates over the {{pendingCommitTransactions}}, which is not touched
> during {{close()}}.
> Now if we restart the failing job on the same Flink cluster, any resources 
> from the previous attempt will still linger around.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011


