[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638662#comment-17638662 ]

Thomas Weise commented on FLINK-12675:

Partition/split level alignment is supported with the new KafkaSource: https://issues.apache.org/jira/browse/FLINK-28853

> Event time synchronization in Kafka consumer
>
> Key: FLINK-12675
> URL: https://issues.apache.org/jira/browse/FLINK-12675
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / Kafka
> Reporter: Thomas Weise
> Priority: Major
> Labels: auto-unassigned, pull-request-available
> Attachments: 0001-Kafka-event-time-alignment.patch
>
> Integrate the source watermark tracking into the Kafka consumer and implement
> the sync mechanism (different consumer model, compared to Kinesis).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350597#comment-17350597 ]

Alexey Trenikhin commented on FLINK-12675:

Does the new Source interface support suspend/resume?
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333924#comment-17333924 ]

Flink Jira Bot commented on FLINK-12675:

This issue was marked "stale-assigned" and has not received an update in 7 days. It is now automatically unassigned. If you are still working on it, you can assign it to yourself again. Please also give an update about the status of the work.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323266#comment-17323266 ]

Flink Jira Bot commented on FLINK-12675:

This issue is assigned but has not received an update in 7 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116153#comment-17116153 ]

Stephan Ewen commented on FLINK-12675:

The existing Kafka Connector is already super complex (partially due to some bad architecture) and very hard to keep stable. We have been burned a bit in the past by adding more non-trivial features. Given that this is probably the most important connector, I would prefer not to add more features to it at this point.

Can you fork the Kafka Connector and add the event-time alignment to that fork? That way we don't affect the connector in the core. Putting the connector onto https://flink-packages.org would be a way to share it with more users.

For the new Source Interface, I would be up for starting a discussion about what a generic event-time alignment mechanism would look like, implemented in the {{SourceReaderBase}} and {{SplitReader}} classes. Something like an extended {{SplitReader}} interface that can handle callbacks in the form of {{suspendSplitReading(splitId)}} and {{resumeSplitReading(splitId)}} or so.
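The callback idea in the comment above can be sketched in plain Java. This is only an illustration under stated assumptions: `SuspendableSplitReader` and `InMemorySplitReader` are hypothetical names invented here, not Flink classes, and the method names simply follow the ones suggested in the comment. The stand-in reader serves records from the first split, in insertion order, that is neither suspended nor empty.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical interface for illustration; not an actual Flink API. */
interface SuspendableSplitReader<T> {
    /** Returns the next record from an active split, or null if none is available. */
    T fetchNext();

    /** Stops serving records from the given split until it is resumed. */
    void suspendSplitReading(String splitId);

    /** Makes the given split eligible for fetchNext() again. */
    void resumeSplitReading(String splitId);
}

/** In-memory stand-in for a reader that multiplexes several splits. */
final class InMemorySplitReader implements SuspendableSplitReader<String> {
    private final Map<String, Queue<String>> splits = new LinkedHashMap<>();
    private final Set<String> suspended = new HashSet<>();

    void addSplit(String splitId, String... records) {
        Queue<String> queue = new ArrayDeque<>();
        for (String record : records) {
            queue.add(record);
        }
        splits.put(splitId, queue);
    }

    @Override
    public String fetchNext() {
        // Serve the first split (in insertion order) that is active and non-empty.
        for (Map.Entry<String, Queue<String>> entry : splits.entrySet()) {
            if (!suspended.contains(entry.getKey()) && !entry.getValue().isEmpty()) {
                return entry.getValue().poll();
            }
        }
        return null;
    }

    @Override
    public void suspendSplitReading(String splitId) {
        suspended.add(splitId);
    }

    @Override
    public void resumeSplitReading(String splitId) {
        suspended.remove(splitId);
    }
}
```

In this sketch a suspended split keeps its buffered records; they become visible again once `resumeSplitReading` is called, which is the behavior an alignment mechanism would rely on.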
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115924#comment-17115924 ]

Aljoscha Krettek commented on FLINK-12675:

Sorry for chiming in this late, but with recent developments I don't think we should put more work into the existing Kafka connector and add more complexity there. FLINK-10740 has mostly landed in master and Becket is working on a Kafka Connector using the new interfaces. With the new interfaces, we should also focus on solving event-time alignment at the framework level and not for individual connectors.

[~becket_qin], could you please chime in with your progress here? cc [~sewen], who's also working on FLIP-27 and might have a comment.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097822#comment-17097822 ]

Akshay Aggarwal commented on FLINK-12675:

Thanks [~thw]. I've raised a PR, can someone help review?
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096587#comment-17096587 ]

Thomas Weise commented on FLINK-12675:

[~akshay-aggarwal] thanks for working on this. I assigned the ticket to you. Can you please open a pull request? This will make it easier to review.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096462#comment-17096462 ]

Akshay Aggarwal commented on FLINK-12675:

[~wind_ljy] I've implemented the design you attached earlier. Since the feature was becoming fairly important for us, I quickly wrote the code without worrying about the WatermarkTracker-based approach. It seems to work fine in my local tests. The attached patch is on top of the 1.10.0 release.

It would be great if you could review it; I'd also be happy to implement the changes with WatermarkTracker if you can share some guidelines on dev and testing.

[^0001-Kafka-event-time-alignment.patch]
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967243#comment-16967243 ]

Jiayi Liao commented on FLINK-12675:

[~thw] Since the FLIP-27 design will be updated soon according to the mail thread, I'm going to help review the updated design and continue the work after that.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966726#comment-16966726 ]

Jiayi Liao commented on FLINK-12675:

[~thw] Thanks for attaching this. I've already been paying attention to this :). I'll elaborate on the design details in the next few days.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966690#comment-16966690 ]

Thomas Weise commented on FLINK-12675:

[https://lists.apache.org/thread.html/a168a3634a329d1a2c278f660c995f98d821585e75eabe7c38629def@%3Cdev.flink.apache.org%3E]
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965704#comment-16965704 ]

Thomas Weise commented on FLINK-12675:

[~wind_ljy] perhaps it would be best if you outlined the next level of detail in the design doc? We should avoid refactoring if possible, to minimize the surface of these changes. I just checked and see that the universal consumer (FlinkKafkaConsumer) does not share all the internals with the older, Kafka-version-specific implementations, so we can hopefully avoid touching those altogether.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965575#comment-16965575 ]

Jiayi Liao commented on FLINK-12675:

[~thw] Okay, I'll continue my work. If you don't mind, I'll create a few subtasks and start with work such as integrating #WatermarkTracker and removing duplicate code across the multiple Kafka modules, which will help to implement this feature.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965425#comment-16965425 ]

Thomas Weise commented on FLINK-12675:

[~wind_ljy] FLIP-27 still has a way to go and is going to result in brand new implementations for some of the existing connectors, while the old implementations that are used in production will need to be retained for a while until the new ones reach parity and maturity. The purpose of pointing you to FLIP-27 is not to arrive at compatible solutions (which is impossible) but rather to provide input for specific aspects such as the record and watermark emitter.

It would be great if you continued your work on the existing consumer; otherwise I can maybe pick it up, as we will need this problem solved fairly soon.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965417#comment-16965417 ]

Jiayi Liao commented on FLINK-12675:

[~thw] Thanks for your review. I've addressed some downsides of the design according to your comments. It also took me some time to look into FLIP-27 and the idle-partition discussions.

I'm concerned that this design may conflict with FLIP-27, because the FLIP-27 refactoring may introduce a completely new abstraction for the communication between data production (#KafkaConsumerThread) and data reads (#KafkaFetcher) in all connectors, which covers pretty much my whole design.

IMO we should at least wait for a FLIP-27 MVP, or maybe I can help work the feedback mechanism into the new #Reader implementation. For now there is still some work worth doing, such as moving #WatermarkTracker into core as you mentioned (we'll do that either way). What do you think?
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963185#comment-16963185 ]

Thomas Weise commented on FLINK-12675:

[~wind_ljy] thanks for the proposal! I added comments to the document.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960301#comment-16960301 ]

Jiayi Liao commented on FLINK-12675:

[~thw] I've drafted a design for this. It follows your design for the Kinesis source, but there are still a lot of changes due to the differences between the two connectors' processing models. Could you take a look if you can spare the time?

https://docs.google.com/document/d/1d4XhVK4BD9GGNqhfSk_U30nEYOXx8THCHhR77RNbIbA/edit#
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1692#comment-1692 ]

Thomas Weise commented on FLINK-12675:

[~wind_ljy] thanks and feel free to take over this ticket once you are ready to work on it. Happy to help with the review.
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954635#comment-16954635 ]

Jiayi Liao commented on FLINK-12675:

[~thw] This is a good idea and should be done. If no one is working on this, I can take it. I will draft a detailed design in a few days.

cc [~becket_qin]
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942196#comment-16942196 ]

Thomas Weise commented on FLINK-12675:

[~gerardg] yes, it is still the plan to implement source synchronization for the Kafka consumer. I just have not gotten around to working on this. If someone else is interested in taking this over, please let me know.

The implementation will be quite different from the Kinesis consumer. Partitions are merged in the Kafka client and we will have to pause/resume individual partitions, while in the Kinesis case we can just control the consumer thread in the record emitter.
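The per-partition pause/resume decision described above could look roughly like the following plain-Java sketch. Everything here is an assumption for illustration: `PartitionAlignment`, `Decision`, the `maxDriftMillis` threshold, and the idea of passing in a reference watermark (for example a global minimum reported by a watermark tracker) are hypothetical and are not taken from the attached patch or from Flink. In a real consumer the resulting sets would presumably be handed to the Kafka client's `KafkaConsumer.pause(...)` and `KafkaConsumer.resume(...)` calls.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Flink or the Kafka client. */
final class PartitionAlignment {

    /** Partitions whose state changes after one alignment check. */
    static final class Decision {
        final Set<String> toPause = new HashSet<>();
        final Set<String> toResume = new HashSet<>();
    }

    private final long maxDriftMillis;
    private final Set<String> paused = new HashSet<>();

    PartitionAlignment(long maxDriftMillis) {
        this.maxDriftMillis = maxDriftMillis;
    }

    /**
     * Pauses partitions whose watermark is more than maxDriftMillis ahead of
     * the reference watermark and resumes paused partitions that are back
     * within the limit. Only state changes are reported.
     */
    Decision check(Map<String, Long> partitionWatermarks, long referenceWatermark) {
        Decision decision = new Decision();
        long limit = referenceWatermark + maxDriftMillis;
        for (Map.Entry<String, Long> entry : partitionWatermarks.entrySet()) {
            String partition = entry.getKey();
            boolean ahead = entry.getValue() > limit;
            if (ahead && paused.add(partition)) {
                decision.toPause.add(partition);   // newly too far ahead
            } else if (!ahead && paused.remove(partition)) {
                decision.toResume.add(partition);  // caught up again
            }
        }
        return decision;
    }
}
```

Reporting only state changes keeps the sketch close to how pause and resume calls are typically issued: a partition that is already paused is not paused again on every check.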
[jira] [Commented] (FLINK-12675) Event time synchronization in Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932209#comment-16932209 ]

Gerard Garcia commented on FLINK-12675:

I understand that the Kinesis source already has watermark synchronization between shards and that this would be the next step. Is that still the plan?