Hi, I think the proposal looks good.  The only thing I wasn't clear on was
which API is actually being proposed.  The one where .withSideInput() comes
before the user function or after.  I would definitely prefer it come after
since that's the normal pattern in the Flink API.  I understood that makes
the implementation different (maybe harder) but I think it helps keep the
API uniform which is really good.

Overall I think the API looks good and yes there are some tricky semantics
here but in general if, when processing keyed main streams, we always wait
until there is a side-input available for that key we're off to a great
start and I think that was what you're suggesting in the design doc.

-Jamie


On Thu, Mar 9, 2017 at 7:27 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
> these are all valuable suggestions and I think that we should implement
> them when the time is right. However, I would like to first get a
> minimal viable version of this feature into Flink and then expand on it.
> I think the last few tries of tackling this problem fizzled out because
> we got to deep into discussing special semantics and features. I think
> the most important thing to agree on right now is the basic API and the
> implementation plan. What do you think about that?
>
> Regarding your suggestions, I have in fact a branch [1] from May 2016
> where I implemented a prototype implementation. This has an n-ary
> operator and inputs can be either bounded or unbounded and the
> implementation actually waits for all bounded inputs to finish before
> starting to process the unbounded inputs.
>
> In general, I think blocking on an input is only possible while you're
> waiting for a bounded input to finish. If all inputs are unbounded you
> cannot block because you might run into deadlocks (in the processing
> graph, due to back pressure) and also because blocking will also block
> elements that might have a lower timestamp and might fall into a
> different window which is already ready for processing.
>
> Best,
> Aljoscha
>
> [1]
> https://github.com/aljoscha/flink/commits/operator-ng-side-input-wrapper
>
> On Tue, Mar 7, 2017, at 14:39, wenlong.lwl wrote:
> > Hi Aljoscha, thank you for the proposal, it is great to hear about the
> > progress in side input.
> >
> > Following is my point of view:
> > 1. I think there may be an option to block the processing of the main
> > input
> > instead of buffer the data in state because in production, the through
> > put
> > of the main input is usually much larger, and buffering the data before
> > the
> > side input may slow down the preparing of side input since the i-o and
> > computing resources are always limited.
> > 2. another issue may need to be disscussed: how can we do checkpointing
> > with side input, because static side input may finish soon once started
> > which will stop the checkpointing.
> > 3. I agree with Gyula that user should be able to determines when a side
> > input is ready? Maybe we can do it one step further: whether users can
> > determine a operator with multiple inputs to process which input each
> > time
> > or not?  It would be more flexible.
> >
> >
> > Best Regards!
> > Wenlong
> >
> > On 7 March 2017 at 18:39, Ventura Del Monte <venturadelmo...@gmail.com>
> > wrote:
> >
> > > Hi Aljoscha,
> > >
> > > Thank you for the proposal and for bringing up again this discussion.
> > >
> > > Regarding the implementation aspect,I would say the first way could
> > > be easier/faster to implement but it could add some overhead when
> > > dealing with multiple side inputs through the current 2-streams union
> > > transform. I tried the second option myself as it has less overhead
> > > but then the outcome was something close to a N-ary operator consuming
> > > first each side input while buffering the main one.
> > > Therefore, I would choose the third option as it is more generic
> > > and might help also in other scenarios, although its implementation
> > > requires more effort.
> > > I also agree with Gyula, I think the user should be allowed to define
> the
> > > condition that determines when a side input is ready, e.g., load the
> side
> > > input first, incrementally update the side input.
> > >
> > > Best,
> > > Ventura
> > >
> > >
> > >
> > >
> > >
> > >
> > > This message, for the D. Lgs n. 196/2003 (Privacy Code), may contain
> > > confidential and/or privileged information. If you are not the
> addressee or
> > > authorized to receive this for the addressee, you must not use, copy,
> > > disclose or take any action based on this message or any information
> > > herein. If you have received this message in error, please advise the
> > > sender immediately by reply e-mail and delete this message. Thank you
> for
> > > your cooperation.
> > >
> > > On Mon, Mar 6, 2017 at 3:50 PM, Gyula Fóra <gyula.f...@gmail.com>
> wrote:
> > >
> > > > Hi Aljoscha,
> > > >
> > > > Thank you for the nice proposal!
> > > >
> > > > I think it would make sense to allow user's to affect the readiness
> of
> > > the
> > > > side input. I think making it ready when the first element arrives is
> > > only
> > > > slightly better then making it always ready from usability
> perspective.
> > > For
> > > > instance if I am joining against a static data set I want to wait
> for the
> > > > whole set before making it ready. This could be exposed as a user
> defined
> > > > condition that could also recognize bounded inputs maybe.
> > > >
> > > > Maybe we could also add an aggregating (merging) side input type,
> that
> > > > could work as a broadcast state.
> > > >
> > > > What do you think?
> > > >
> > > > Gyula
> > > >
> > > > Aljoscha Krettek <aljos...@apache.org> ezt írta (időpont: 2017.
> márc.
> > > 6.,
> > > > H, 15:18):
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I would like to finally agree on a plan for implementing side
> inputs in
> > > > > Flink. There has already been an attempt to come to consensus [1],
> > > which
> > > > > resulted in two design documents. I tried to consolidate those two
> and
> > > > > also added a section about implementation plans. This is the
> resulting
> > > > > FLIP:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> > > > 17+Side+Inputs+for+DataStream+API
> > > > >
> > > > >
> > > > > In terms of semantics I tried to go with the minimal viable
> solution.
> > > > > The part that needs discussing is how we want to implement this. I
> > > > > outlined three possible implementation plans in the FLIP but what
> it
> > > > > boils down to is that we need to introduce some way of getting
> several
> > > > > inputs into an operator/task.
> > > > >
> > > > >
> > > > > Please have a look at the doc and let us know what you think.
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Aljoscha
> > > > >
> > > > >
> > > > >
> > > > > [1]
> > > > > https://lists.apache.org/thread.html/
> 797df0ba066151b77c7951fd7d603a
> > > > 8afd7023920d0607a0c6337db3@1462181294@%3Cdev.flink.apache.org%3E
> > > > >
> > > >
> > >
>



-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com

Reply via email to