[jira] [Commented] (KAFKA-9982) [kafka-connect] Source connector does not guarantee at least once delivery

2020-05-30 Thread Qinghui Xu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120173#comment-17120173
 ] 

Qinghui Xu commented on KAFKA-9982:
---

Thanks [~ChrisEgerton], those are very helpful pointers; it's starting to make 
sense to me now. I still have doubts about some scenarios, though, so I'll run 
some tests to check and will keep you posted.

> [kafka-connect] Source connector does not guarantee at least once delivery
> --
>
> Key: KAFKA-9982
> URL: https://issues.apache.org/jira/browse/KAFKA-9982
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.5.0
>Reporter: Qinghui Xu
>Priority: Major
>
> In the kafka-connect runtime, the WorkerSourceTask is responsible for sending 
> records to the destination topics and for managing the source offset commit. 
> Committed offsets are later used to recover tasks during a rebalance or 
> restart.
> But there are two concerns when looking into the WorkerSourceTask 
> implementation:
>  * When the producer fails to send records, there is no retry; the offset 
> commit is simply skipped and the next loop (polling for new records) is 
> executed.
>  * The offset commit and the actual sending of records over the network are 
> asynchronous, which means the offset commit could happen before the records 
> are received by the brokers, and a rebalance/restart in this gap could lead 
> to message loss.
> The conclusion is thus that the source connector does not support 
> at-least-once semantics by default (unless the plugin implementation makes 
> extra effort itself). I consider this a bug.
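The claimed failure mode can be illustrated with a simplified Python sketch 
(all names here are hypothetical; this is not the actual WorkerSourceTask 
code), showing what would happen if offsets were committed without waiting for 
producer acks:

```python
import threading
import time

class NaiveSourceTask:
    """Hypothetical task that commits offsets without waiting for acks."""

    def __init__(self):
        self.broker = []           # records that actually reached the broker
        self.committed_offset = -1

    def send_async(self, offset, record, delay):
        # Simulates an asynchronous producer send that completes after `delay`.
        def deliver():
            time.sleep(delay)
            self.broker.append((offset, record))
        t = threading.Thread(target=deliver)
        t.start()
        return t

    def run_one_loop(self):
        # Poll two records and send them asynchronously...
        sends = [self.send_async(0, "a", 0.0), self.send_async(1, "b", 0.6)]
        # ...then commit the offsets WITHOUT waiting for acks (the alleged bug).
        self.committed_offset = 1
        return sends

task = NaiveSourceTask()
sends = task.run_one_loop()
time.sleep(0.2)  # the task is "restarted" here, while one send is still in flight
delivered = {off for off, _ in task.broker}
lost = set(range(task.committed_offset + 1)) - delivered
print(lost)  # offset 1 was committed, but its record never reached the broker
for t in sends:
    t.join()
```

Under this (hypothetical) ordering, a restarted task would resume after 
committed offset 1 and never re-send record "b", which is exactly the loss 
scenario described above.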



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9982) [kafka-connect] Source connector does not guarantee at least once delivery

2020-05-28 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118842#comment-17118842
 ] 

Chris Egerton commented on KAFKA-9982:
--

Hi [~q.xu], although offset commit is asynchronous, the framework doesn't 
actually commit offsets for records that haven't been ack'd by the broker yet.

During offset commit, the worker will first [create a 
snapshot|https://github.com/apache/kafka/blob/1c4eb1a5757df611735cfac9b709e0d80d0da4b3/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L486]
 of the offsets for to-be-committed records, and then [wait for all of those 
records to be 
ack'd|https://github.com/apache/kafka/blob/1c4eb1a5757df611735cfac9b709e0d80d0da4b3/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L490-L492]
 by the broker before committing the offsets in that snapshot.
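That snapshot-then-drain pattern can be sketched in Python roughly as follows 
(hypothetical names and a deliberately simplified single-snapshot model; the 
real logic lives in the WorkerSourceTask.java links above):

```python
import threading

class OffsetCommitter:
    """Sketch of the snapshot-then-drain offset commit pattern."""

    def __init__(self):
        self.lock = threading.Condition()
        self.outstanding = 0      # records sent but not yet ack'd
        self.offsets = {}         # partition -> latest polled offset
        self.committed = {}       # partition -> last committed offset

    def record_sent(self, partition, offset):
        with self.lock:
            self.outstanding += 1
            self.offsets[partition] = offset

    def record_acked(self):
        # Called from the producer callback when the broker acks a record.
        with self.lock:
            self.outstanding -= 1
            if self.outstanding == 0:
                self.lock.notify_all()

    def commit(self, timeout=5.0):
        with self.lock:
            snapshot = dict(self.offsets)   # freeze the offsets to commit
            # Block until every outstanding record has been ack'd.
            if not self.lock.wait_for(lambda: self.outstanding == 0, timeout):
                return False                # timed out: commit nothing
            self.committed.update(snapshot)
            return True

c = OffsetCommitter()
c.record_sent("p0", 41)
c.record_sent("p0", 42)
c.record_acked()
c.record_acked()
ok = c.commit()
print(ok, c.committed)
```

The key point is that commit() blocks until every outstanding send has been 
ack'd, so a snapshot is only persisted once all of its records are safely on 
the broker.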

There is a small issue here: if the producer fails to send a record with a 
non-retriable exception, none of the offsets in that snapshot will be 
committed before the task is failed, instead of just the offsets for the 
failed record and any records from that point on. But the only downside of 
this is duplicate records being sent on task restart, which doesn't 
compromise the at-least-once delivery guarantees already provided by the 
framework.
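The resulting worst case (duplication rather than loss) can be sketched as 
follows, assuming a hypothetical offset-addressable source that a restarted 
task replays from its last committed offset:

```python
def replay_after_restart(source, last_committed_offset):
    """Resume a hypothetical offset-addressable source after a restart.

    Everything after the last committed offset is re-read, so records that
    were already ack'd but not yet committed get produced twice (duplicates),
    while nothing at or before the committed offset is skipped (no loss).
    """
    return source[last_committed_offset + 1:]

source = ["r0", "r1", "r2", "r3"]
# Records r0..r2 were ack'd, but only offsets up to 1 were committed
# before the restart.
resent = replay_after_restart(source, 1)
print(resent)  # ['r2', 'r3'] -> r2 is a duplicate, but no record is lost
```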

Does that clear things up? I'm happy to continue the discussion if you have 
other questions or remarks :)






[jira] [Commented] (KAFKA-9982) [kafka-connect] Source connector does not guarantee at least once delivery

2020-05-28 Thread Qinghui Xu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118591#comment-17118591
 ] 

Qinghui Xu commented on KAFKA-9982:
---

For now, as there are no metrics provided by the framework, it's difficult to 
track whether messages are lost.






[jira] [Commented] (KAFKA-9982) [kafka-connect] Source connector does not guarantee at least once delivery

2020-05-28 Thread Qinghui Xu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118586#comment-17118586
 ] 

Qinghui Xu commented on KAFKA-9982:
---

Hi, [~ChrisEgerton] Sorry for the delay.

> As of the fix for https://issues.apache.org/jira/browse/KAFKA-8586, the 
>framework will cease processing records from a source task if it fails to send 
>a record to Kafka.
This indeed addresses my concern about failure retries.

> The framework does use an entirely different producer to write source offsets 
>to Kafka, but no offsets are written to Kafka unless the record they 
>correspond to has been ack'd by the broker and safely made it to Kafka.

I don't see how this is guaranteed, though. Since the offset commit and the 
producer's sending of records are asynchronous, the two can happen in any 
order, and if the task is lost/restarted in the middle, there's a chance that 
the offsets are committed while the records have not yet been sent.






[jira] [Commented] (KAFKA-9982) [kafka-connect] Source connector does not guarantee at least once delivery

2020-05-19 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111563#comment-17111563
 ] 

Chris Egerton commented on KAFKA-9982:
--

[~q.xu] I'd like to mark this closed, but if you've encountered problems with 
at-least-once delivery in the framework, please feel free to reopen it with 
steps to reproduce. However, from what I can tell, the framework does provide 
this guarantee correctly, or at least it doesn't have the issues described in 
this ticket.






[jira] [Commented] (KAFKA-9982) [kafka-connect] Source connector does not guarantee at least once delivery

2020-05-12 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105942#comment-17105942
 ] 

Chris Egerton commented on KAFKA-9982:
--

The producers the framework uses to write data from source tasks to Kafka are 
[configured 
conservatively|https://github.com/apache/kafka/blob/9bc96d54f8953d190a1fb6478a0656f049ee3b32/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L557-L563]
 to prevent multiple concurrent in-flight requests, which could otherwise 
compromise the ordering of the records.
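For reference, the conservative settings there are roughly along the following 
lines (the exact values are in the linked Worker.java; treat this as an 
approximation, not a verbatim copy):

```properties
# Only one in-flight request per connection, so retries cannot reorder records.
max.in.flight.requests.per.connection=1
# A send only succeeds once all in-sync replicas have acknowledged it.
acks=all
# Retry transient failures (effectively) forever rather than dropping records.
retries=2147483647
```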

As of the fix for https://issues.apache.org/jira/browse/KAFKA-8586, the 
framework will cease processing records from a source task if it fails to send 
a record to Kafka.

The framework does use an entirely different producer to write source offsets 
to Kafka, but no offsets are written to Kafka unless the records they 
correspond to have been ack'd by the broker and have safely made it to Kafka.

[~q.xu] based on the source code for the worker, I don't think this analysis is 
correct. Have you observed this behavior yourself?



