[ 
https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284024#comment-17284024
 ] 

Steven Zhen Wu edited comment on FLINK-21364 at 2/12/21, 10:57 PM:
-------------------------------------------------------------------

For pull-based sources (like file/Iceberg), it is probably more 
natural/efficient to piggyback the `finishedSplitIds` on the 
`RequestSplitEvent`, since a reader requests a new split when the current split 
is done.
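
To make the idea concrete, here is a rough, self-contained sketch of a split request that also carries finished split IDs. The class name `SplitRequestWithFinishedSplits` is made up for illustration and is not the actual `RequestSplitEvent`. In Flink the class would implement `SourceEvent`; plain `Serializable` is used here so the sketch compiles on its own.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical event: a split request that piggybacks the IDs of finished splits. */
final class SplitRequestWithFinishedSplits implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String hostName;               // optional, may be null
    private final List<String> finishedSplitIds; // never null, possibly empty

    SplitRequestWithFinishedSplits(String hostName, Collection<String> finishedSplitIds) {
        this.hostName = hostName;
        // Treat a missing collection as "nothing finished" and copy defensively.
        this.finishedSplitIds = finishedSplitIds == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(new ArrayList<>(finishedSplitIds));
    }

    String hostName() {
        return hostName;
    }

    List<String> finishedSplitIds() {
        return finishedSplitIds;
    }
}
```

A reader would build one of these when it runs out of work, e.g. `new SplitRequestWithFinishedSplits(host, justFinishedIds)`, so the request and the completion report travel in one message.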

This doesn't mean that a reader has to request a new split whenever it finishes 
some splits, as in the bounded Kafka source case.

You have a good point that some sources (like Kafka/Kinesis) may still need to 
communicate the watermark info to the coordinator/enumerator. In that case, it 
would definitely be a separate type of event (like `WatermarkUpdateEvent`). 
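
As a rough illustration only, such a separate event could be as small as the sketch below. `WatermarkUpdateEvent` is just the example name used above, not an existing Flink class, and the single `watermarkMillis` field is an assumption about what a reader would report. In Flink it would implement `SourceEvent`; `Serializable` keeps the sketch standalone.

```java
import java.io.Serializable;

/** Hypothetical event a reader could send to report its current watermark. */
final class WatermarkUpdateEvent implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumed payload: the reader's watermark as epoch milliseconds.
    private final long watermarkMillis;

    WatermarkUpdateEvent(long watermarkMillis) {
        this.watermarkMillis = watermarkMillis;
    }

    long watermarkMillis() {
        return watermarkMillis;
    }
}
```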

In our Iceberg source use case, readers don't actually report watermarks. They 
just need to report which splits are finished. All the ordering/watermark 
tracking is centralized in the Iceberg source coordinator. But I can see that 
this may not be a generic enough scenario to justify changing the 
`RequestSplitEvent` in flink-runtime.
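
A minimal sketch of that centralized tracking, under simplifying assumptions: the coordinator knows a minimum event timestamp for every outstanding split (discovered but not yet finished), and readers only report finished split IDs. `FinishedSplitTracker` and its methods are hypothetical names, not the actual Iceberg source code.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical coordinator-side tracker driven only by finished split IDs. */
final class FinishedSplitTracker {
    // Outstanding splits keyed by split ID, with an assumed per-split minimum timestamp.
    private final Map<String, Long> outstandingMinTimestamps = new HashMap<>();

    /** Register a discovered split together with its minimum event timestamp. */
    void addSplit(String splitId, long minTimestampMillis) {
        outstandingMinTimestamps.put(splitId, minTimestampMillis);
    }

    /** Called when a reader reports finished splits, e.g. alongside a split request. */
    void onSplitsFinished(Collection<String> finishedSplitIds) {
        for (String id : finishedSplitIds) {
            outstandingMinTimestamps.remove(id);
        }
    }

    /** Smallest minimum timestamp among outstanding splits; Long.MAX_VALUE if none remain. */
    long currentWatermark() {
        long watermark = Long.MAX_VALUE;
        for (long ts : outstandingMinTimestamps.values()) {
            watermark = Math.min(watermark, ts);
        }
        return watermark;
    }
}
```

With this shape the watermark only advances once the split holding the oldest data is reported finished, which is why the coordinator needs the finished IDs but no watermark from the readers.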

cc [~sundaram]


> piggyback finishedSplitIds in RequestSplitEvent
> -----------------------------------------------
>
>                 Key: FLINK-21364
>                 URL: https://issues.apache.org/jira/browse/FLINK-21364
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>    Affects Versions: 1.12.1
>            Reporter: Steven Zhen Wu
>            Priority: Major
>              Labels: pull-request-available
>
> For some split assignment strategies, the enumerator/assigner needs to track 
> the completed splits to advance the watermark for event time alignment or 
> rough ordering. Right now, the `RequestSplitEvent` for FLIP-27 sources doesn't 
> support passing along the `finishedSplitIds` info, and hence we have to create 
> our own custom source event type for the Iceberg source. 
> The proposal is to add this optional info to `RequestSplitEvent`:
> {code}
> public RequestSplitEvent(
>     @Nullable String hostName, 
>     @Nullable Collection<String> finishedSplitIds)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
