[GitHub] [flink] tweise commented on pull request #20289: [FLINK-23633][FLIP-150][connector/common] HybridSource: Support dynamic stop position in FileSource
tweise commented on PR #20289: URL: https://github.com/apache/flink/pull/20289#issuecomment-1196887340 @xinbinhuang thanks for describing the thought process. As you already mentioned, the goal of the JIRA was to add the passing of end position to file source and when we implemented FLIP-150 we presumably already added everything that is required to achieve that goal to the HybridSource. I think we need to zoom in why or why not the enumerator knows the actual stop position without involvement of the reader. It is correct that we do not know the stop position at graph construction time or otherwise we would not need any runtime behavior. However, the enumerator already knows what splits have been processed because it has passed those to the readers and the readers have finished the splits that they got assigned. Remember that we are dealing with **bounded** sources. So there really should be no need to pass splits back to the enumerator. Now it will depend on the specific type of source how the enumerator communicates the end position to the next enumerator. In most typical cases that will simply come from the partition metadata (iceberg, files). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] tweise commented on pull request #20289: [FLINK-23633][FLIP-150][connector/common] HybridSource: Support dynamic stop position in FileSource
tweise commented on PR #20289: URL: https://github.com/apache/flink/pull/20289#issuecomment-1196163399 @xinbinhuang looking at all the modifications to HybridSource itself I think it is necessary to take a step back here and discuss the design aspects first. The underlying sources are a sequence of bounded sources, optionally followed by an unbounded source. Therefore, there should be no need to have a "dynamic reader" that does special things. The enumerator knows upfront which splits need to be processed and when it is finished. The HybridSource already has the support to transfer the end position to the next enumerator. That was part of the FLIP and the details can be found https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source and you can find an example in the tests: HybridSourceITCase.sourceWithDynamicSwitchPosition -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org