Personally, I prefer option 3): keep split/select and correct the behavior.
Side output feels like overkill for such a primitive function, and I
prefer simple APIs like split/select.

Hao Sun


On Thu, Jul 4, 2019 at 11:20 AM Xingcan Cui <xingc...@gmail.com> wrote:

> Hi folks,
>
> Two weeks ago, I started a thread [1] discussing whether we should discard
> the split/select methods (which have been marked as deprecated since v1.7)
> in the DataStream API.
>
> These methods produce "unexpected" results when used consecutively
> (e.g., ds.split(a).select(b).split(c).select(d)) or multiple times on the
> same target (e.g., ds.split(a).select(b), ds.split(c).select(d)). The
> reason is that, following the initial design, new split/select logic
> always overrides the existing logic on the same target operator rather
> than appending to it. Some users may not be aware of this; for those who
> are, the current workaround is to use the more powerful side output
> feature [2].
>
> FLINK-11084 <https://issues.apache.org/jira/browse/FLINK-11084> added
> some restrictions to the existing split/select logic and suggested
> replacing it with side output in the future. However, considering that
> side output is currently only available in the process function layer and
> that split/select may already be widely used in many real-world
> applications, we'd like to start a vote and listen to the community on how
> to deal with them.
>
> In the discussion thread [1], we proposed three solutions as follows. All
> of them are feasible but have different impacts on the public API.
>
> 1) Port the side output feature to DataStream API's flatMap and replace
> split/select with it.
>
> 2) Introduce a dedicated function in DataStream API (with the "correct"
> behavior but a different name) that can be used to replace the existing
> split/select.
>
> 3) Keep split/select but change the behavior/semantics to be "correct".
>
> Note that this is just a vote for gathering information, so feel free to
> participate and share your opinions.
>
> The voting time will end on *July 7th 17:00 EDT*.
>
> Thanks,
> Xingcan
>
> [1]
> https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html
>
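
For readers skimming the thread, the "override rather than append" behavior described in the quoted message can be illustrated with a toy model. This is a minimal sketch in plain Java, not Flink code: ToyStream, its single selector slot, and the lazy select() are hypothetical stand-ins that mirror only the description above, not Flink's actual internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model only: NOT Flink code. It mimics a target operator that keeps a
// single selector slot, so a later split() replaces an earlier one.
public class SplitOverrideSketch {

    /** Hypothetical stand-in for a stream whose operator holds one selector. */
    static class ToyStream {
        private final List<Integer> elements;
        // Single slot: the assumed source of the "override" behavior.
        private Function<Integer, String> selector;

        ToyStream(List<Integer> elements) {
            this.elements = elements;
        }

        /** Registers a selector, replacing (not appending to) the previous one. */
        ToyStream split(Function<Integer, String> newSelector) {
            this.selector = newSelector;
            return this;
        }

        /**
         * Returns a lazy view: elements are filtered only when get() is called,
         * using whichever selector is registered at that moment.
         */
        Supplier<List<Integer>> select(String name) {
            return () -> {
                List<Integer> out = new ArrayList<>();
                for (Integer e : elements) {
                    if (name.equals(selector.apply(e))) {
                        out.add(e);
                    }
                }
                return out;
            };
        }
    }

    public static void main(String[] args) {
        ToyStream ds = new ToyStream(Arrays.asList(1, 2, 3, 4, 5, 6));

        // Two split/select pairs on the same target.
        Supplier<List<Integer>> evens =
                ds.split(x -> x % 2 == 0 ? "even" : "odd").select("even");
        Supplier<List<Integer>> bigs =
                ds.split(x -> x > 3 ? "big" : "small").select("big");

        // The second split replaced the first, so nothing is tagged "even".
        System.out.println(evens.get()); // prints []
        System.out.println(bigs.get());  // prints [4, 5, 6]
    }
}
```

In this model, option 3) would amount to giving each split() its own selector instead of sharing one slot, so that both selections return what their callers expect.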
