[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501525#comment-17501525
 ] 

Steven Zhen Wu commented on FLINK-21364:
----------------------------------------

[~pnowojski] sorry for missing your message earlier.

Yeah, the motivation is watermark alignment for the Iceberg source, where the 
watermark alignment happens in the enumerator. Hence the enumerator needs 
to know which files/splits are completed in order to decide whether to advance 
the watermark or not.
https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/source/enumerator/AbstractIcebergEnumerator.java#L89
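
A minimal sketch of the enumerator-side bookkeeping described above. This is 
not the code in the linked enumerator; the class name, method names, and the 
per-split minimum timestamp are assumptions made for illustration. The idea 
is that the watermark can only advance up to the smallest minimum timestamp 
among splits that have not finished yet.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Flink or Iceberg. */
class SplitWatermarkTracker {
    // splitId -> smallest event timestamp (ms) in that split (assumed to be known up front)
    private final Map<String, Long> pendingSplitMinTs = new HashMap<>();
    private long watermark = Long.MIN_VALUE;

    /** Registers a split that has been handed out to a reader. */
    long onSplitAssigned(String splitId, long minTimestampMs) {
        pendingSplitMinTs.put(splitId, minTimestampMs);
        return recompute();
    }

    /** Removes the splits a reader reported as finished and returns the new watermark. */
    long onSplitsFinished(Collection<String> finishedSplitIds) {
        for (String id : finishedSplitIds) {
            pendingSplitMinTs.remove(id);
        }
        return recompute();
    }

    long currentWatermark() {
        return watermark;
    }

    // The watermark is the minimum over all unfinished splits and never moves backwards.
    // With no pending splits, the last watermark is kept.
    private long recompute() {
        long candidate = Long.MAX_VALUE;
        for (long ts : pendingSplitMinTs.values()) {
            candidate = Math.min(candidate, ts);
        }
        if (!pendingSplitMinTs.isEmpty()) {
            watermark = Math.max(watermark, candidate);
        }
        return watermark;
    }
}
```

Without the finished split IDs, the enumerator has no way of knowing when a 
split can be dropped from the pending set, which is why the reader needs to 
report them.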

For sources with unbounded splits (like Kafka), I believe the watermark 
alignment is done on the reader side.

> piggyback finishedSplitIds in RequestSplitEvent
> -----------------------------------------------
>
>                 Key: FLINK-21364
>                 URL: https://issues.apache.org/jira/browse/FLINK-21364
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>    Affects Versions: 1.12.1
>            Reporter: Steven Zhen Wu
>            Priority: Minor
>              Labels: auto-deprioritized-major, pull-request-available
>
> For some split assignment strategies, the enumerator/assigner needs to track 
> the completed splits to advance the watermark for event time alignment or rough 
> ordering. Right now, `RequestSplitEvent` for FLIP-27 sources doesn't support 
> passing along the `finishedSplitIds` info, and hence we have to create our own 
> custom source event type for the Iceberg source. 
> The proposal is to add this optional info to `RequestSplitEvent`:
> {code}
> public RequestSplitEvent(
>     @Nullable String hostName, 
>     @Nullable Collection<String> finishedSplitIds)
> {code}
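
For illustration, the proposed constructor could look roughly like the 
standalone sketch below. It is not Flink's actual `RequestSplitEvent`: the 
class name `RequestSplitEventSketch` is hypothetical, it deliberately has no 
Flink dependencies, and normalizing a null `finishedSplitIds` to an empty 
collection is an assumption rather than part of the proposal.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

/** Hypothetical stand-in for an event carrying optional finished split IDs. */
final class RequestSplitEventSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String hostName;                      // may be null
    private final Collection<String> finishedSplitIds;  // never null, possibly empty

    /** Existing-style constructor: no finished splits reported. */
    RequestSplitEventSketch(String hostName) {
        this(hostName, null);
    }

    /** Proposed shape: both arguments are optional (nullable). */
    RequestSplitEventSketch(String hostName, Collection<String> finishedSplitIds) {
        this.hostName = hostName;
        // Defensive copy so later changes on the reader side do not affect the event.
        this.finishedSplitIds = finishedSplitIds == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(new ArrayList<>(finishedSplitIds));
    }

    String hostName() {
        return hostName;
    }

    Collection<String> finishedSplitIds() {
        return finishedSplitIds;
    }
}
```

Keeping the single-argument constructor means existing callers that only 
pass a host name would not need to change.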



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
