[GitHub] [flink] tweise commented on pull request #20289: [FLINK-23633][FLIP-150][connector/common] HybridSource: Support dynamic stop position in FileSource

2022-07-27 Thread GitBox


tweise commented on PR #20289:
URL: https://github.com/apache/flink/pull/20289#issuecomment-1196887340

   @xinbinhuang thanks for describing the thought process.
   
   As you already mentioned, the goal of the JIRA was to add the passing of end 
position to file source and when we implemented FLIP-150 we presumably already 
added everything that is required to achieve that goal to the HybridSource. 
   
   I think we need to zoom in why or why not the enumerator knows the actual 
stop position without involvement of the reader. 
   
   It is correct that we do not know the stop position at graph construction 
time or otherwise we would not need any runtime behavior. However, the 
enumerator already knows what splits have been processed because it has passed 
those to the readers and the readers have finished the splits that they got 
assigned. Remember that we are dealing with **bounded** sources. So there 
really should be no need to pass splits back to the enumerator. Now it will 
depend on the specific type of source how the enumerator communicates the end 
position to the next enumerator. In most typical cases that will simply come 
from the partition metadata (iceberg, files).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] tweise commented on pull request #20289: [FLINK-23633][FLIP-150][connector/common] HybridSource: Support dynamic stop position in FileSource

2022-07-26 Thread GitBox


tweise commented on PR #20289:
URL: https://github.com/apache/flink/pull/20289#issuecomment-1196163399

   @xinbinhuang looking at all the modifications to HybridSource itself I think 
it is necessary to take a step back here and discuss the design aspects first.
   
   The underlying sources are a sequence of bounded sources, optionally 
followed by an unbounded source. Therefore, there should be no need to have a 
"dynamic reader" that does special things. The enumerator knows upfront which 
splits need to be processed and when it is finished.
   
   The HybridSource already has the support to transfer the end position to the 
next enumerator. That was part of the FLIP and the details can be found 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source
 and you can find an example in the tests: 
HybridSourceITCase.sourceWithDynamicSwitchPosition
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org