Hi Weijie,

Thank you for driving the design of the new DataStream API. I have a
few questions regarding the FLIP:

1. In the partitioning section, it says that "broadcast can only be
used as a side-input of other Inputs." Could you clarify what is meant
by "side-input"? If I understand correctly, it refer to one of the
inputs of the `TwoInputStreamProcessFunction`. If that is the case,
the term "side-input" may not be accurate.

2. Is there a particular reason we do not support a
`TwoInputProcessFunction` to combine a KeyedStream with a
BroadcastStream to result in a KeyedStream? There seems to be a valid
use case where a KeyedStream is enriched with a BroadcastStream and
returns a Stream that is partitioned in the same way.

3. There appears to be a typo in the example code. The
`SingleStreamProcessFunction` should probably be
`OneInputStreamProcessFunction`.

4. How do we set the global configuration for the
ExecutionEnvironment? Currently, we have the
StreamExecutionEnvironment.getExecutionEnvironment(Configuration)
method to provide the global configuration in the API.

5. I noticed that there are two `collect` methods in the Collector,
one with a timestamp and one without. Could you elaborate on the
differences between them? Additionally, in what use case would one use
the method that includes the timestamp?

Best regards,
Xuannan



On Fri, Jan 26, 2024 at 2:21 PM Yunfeng Zhou
<flink.zhouyunf...@gmail.com> wrote:
>
> Hi Weijie,
>
> Thanks for raising discussions about the new DataStream API. I have a
> few questions about the content of the FLIP.
>
> 1. Will we provide any API to support choosing which input to consume
> between the two inputs of TwoInputStreamProcessFunction? It would be
> helpful in online machine learning cases, where a process function
> needs to receive the first machine learning model before it can start
> predictions on input data. Similar requirements might also exist in
> Flink CEP, where a rule set needs to be consumed by the process
> function first before it can start matching the event stream against
> CEP patterns.
>
> 2. A typo might exist in the current FLIP describing the API to
> generate a global stream, as I can see either global() or coalesce()
> in different places of the FLIP. These two methods might need to be
> unified into one method.
>
> 3. The order of parameters in the current ProcessFunction is (record,
> context, output), while this FLIP proposes to change the order into
> (record, output, context). Is there any reason to make this change?
>
> 4. Why does this FLIP propose to use connectAndProcess() instead of
> connect() (+ keyBy()) + process()? The latter looks simpler to me.
>
> Looking forward to discussing these questions with you.
>
> Best regards,
> Yunfeng Zhou
>
> On Tue, Dec 26, 2023 at 2:44 PM weijie guo <guoweijieres...@gmail.com> wrote:
> >
> > Hi devs,
> >
> >
> > I'd like to start a discussion about FLIP-409: DataStream V2 Building
> > Blocks: DataStream, Partitioning and ProcessFunction [1].
> >
> >
> > As the first sub-FLIP for DataStream API V2, we'd like to discuss and
> > try to answer some of the most fundamental questions in stream
> > processing:
> >
> >    1. What kinds of data streams do we have?
> >    2. How to partition data over the streams?
> >    3. How to define a processing on the data stream?
> >
> > The answer to these questions involve three core concepts: DataStream,
> > Partitioning and ProcessFunction. In this FLIP, we will discuss the
> > definitions and related API primitives of these concepts in detail.
> >
> >
> > You can find more details in FLIP-409 [1]. This sub-FLIP is at the
> > heart of the entire DataStream API V2, and its relationship with other
> > sub-FLIPs can be found in the umbrella FLIP [2].
> >
> >
> > Looking forward to hearing from you, thanks!
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
> >
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2

Reply via email to