You lost me at lattice, Aljoscha ;)

I do think something like the more powerful N-way FlatMap w/ Timers
Aljoscha is describing here would probably solve most of the problem.
Often Flink's higher-level primitives work well for people and that's
great.  It's just that I also spend a fair amount of time discussing with
people how to map what they know they want to do onto operations that
aren't a perfect fit and it sometimes liberates them when they realize they
can just implement it the way they want by dropping down a level.  They
usually don't go there themselves, though.
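
Just to make that concrete, here's a rough sketch of what such a user-facing
function might look like.  To be clear, this is purely hypothetical; none of
these names exist in the API today.  It's just the shape of the thing:
multiple keyed inputs, access to partitioned state, and timers that call you
back:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.util.Collector;

// Hypothetical "enhanced FlatMap": two (or N) inputs, state access, timers.
public interface StatefulTwoInputFlatMap<IN1, IN2, OUT> {

  void flatMap1(IN1 value, Context ctx, Collector<OUT> out) throws Exception;

  void flatMap2(IN2 value, Context ctx, Collector<OUT> out) throws Exception;

  // called when a timer registered through the Context fires
  void onTimer(long timestamp, Context ctx, Collector<OUT> out) throws Exception;

  // per-key handle passed to every callback
  interface Context {
    <T> ValueState<T> state(ValueStateDescriptor<T> descriptor);
    void registerEventTimeTimer(long timestamp);
    long currentWatermark();
  }
}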

I mention teaching this "first" and then the higher layers I guess because
that's just a matter of teaching philosophy.  I think it's good to see
the basic operations that are available first and then understand that the
other abstractions are built on top of them.  That way you're not afraid to
drop down to basics when you know what you want to get done.

-Jamie


On Mon, Aug 15, 2016 at 2:11 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi All,
> I also thought about this recently. A good thing would be to add a
> user-facing operator that behaves more or less like an enhanced FlatMap
> with multiple inputs, multiple outputs, state access, and keyed timers. I'm
> a bit hesitant, though, since users rarely think about the implications
> that come with state updates and out-of-order events. If you don't
> implement a stateful operator correctly you get pretty much arbitrary
> results.
>
> The problem with out-of-order event arrival and state updates is that the
> state basically has to monotonically transition "upwards" through a lattice
> for the computation to make sense. I know this sounds rather theoretical so
> I'll try to explain with an example. Say you have an operator that waits
> for timestamped elements A, B, C to arrive in timestamp order and then does
> some processing. The naive approach would be to have a small state machine
> that tracks which elements you have seen so far. The state machine has four
> states: NOTHING, SEEN_A, SEEN_B, and SEEN_C, and it is supposed to traverse
> these states linearly as the elements arrive. This doesn't work, however,
> when elements arrive in an order that does not match their timestamp order.
> What the user should do instead is keep a "Set" state that
> keeps track of the elements that it has seen. Once it has seen {A, B, C}
> the operator must check the timestamps and then do the processing, if
> required. The set of possible combinations of A, B, and C forms a lattice
> under the "subset" ordering, and traversal through these sets is
> monotonically "upwards", so it works regardless of the order in which the
> elements arrive. (I recently pointed this out on the Beam mailing list
> and Kenneth Knowles rightly pointed out that what I was describing was in
> fact a lattice.)
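>
> To make that concrete, here is a minimal sketch of the "Set" approach as a
> RichFlatMapFunction on a keyed stream. The input type (element name plus
> timestamp) and the output message are made up for the example, and the
> actual timestamp check is left out; the point is only that the "seen" state
> can only grow, so the outcome is the same no matter in which order A, B,
> and C arrive:
>
> import java.util.Arrays;
> import java.util.HashSet;
>
> import org.apache.flink.api.common.functions.RichFlatMapFunction;
> import org.apache.flink.api.common.state.ValueState;
> import org.apache.flink.api.common.state.ValueStateDescriptor;
> import org.apache.flink.api.common.typeinfo.TypeHint;
> import org.apache.flink.api.common.typeinfo.TypeInformation;
> import org.apache.flink.api.java.tuple.Tuple2;
> import org.apache.flink.configuration.Configuration;
> import org.apache.flink.util.Collector;
>
> // Input: (elementName, timestamp) pairs, with elementName one of "A",
> // "B", "C". Emits a message once all three have been seen for the key.
> public class SeenAllTracker
>     extends RichFlatMapFunction<Tuple2<String, Long>, String> {
>
>   private transient ValueState<HashSet<String>> seen;
>
>   @Override
>   public void open(Configuration parameters) {
>     seen = getRuntimeContext().getState(new ValueStateDescriptor<>(
>         "seen", TypeInformation.of(new TypeHint<HashSet<String>>() {})));
>   }
>
>   @Override
>   public void flatMap(Tuple2<String, Long> element, Collector<String> out)
>       throws Exception {
>     HashSet<String> set = seen.value();
>     if (set == null) {
>       set = new HashSet<>();
>     }
>     // The state only ever grows: adding an element moves "upwards" in the
>     // lattice of subsets, so the result does not depend on arrival order.
>     set.add(element.f0);
>     seen.update(set);
>
>     if (set.containsAll(Arrays.asList("A", "B", "C"))) {
>       // All three elements have been observed. A real implementation would
>       // also keep the timestamps in state and check them here before doing
>       // the processing, if required.
>       out.collect("saw A, B and C");
>     }
>   }
> }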
>
> I know this is a bit off-topic, but I think it's very easy for users to
> write incorrect operators when they are dealing with state. We should still
> have a good API for it, though. Just wanted to make people aware of this.
>
> Cheers,
> Aljoscha
>
> On Mon, 15 Aug 2016 at 08:18 Matthias J. Sax <mj...@apache.org> wrote:
>
> > It really depends on the skill level of the developer. Using the
> > low-level API requires thinking about many details (e.g. state handling)
> > that could be done wrong.
> >
> > As Flink gets a broader community, more people will use it who might not
> > have the required skill level to deal with a low-level API. For more
> > experienced users, it is of course a powerful tool!
> >
> > I guess it boils down to the question of what type of developer Flink
> > targets, and whether the low-level API should be actively advertised or
> > not. Also keep in mind that many people criticized Storm's low-level API
> > as hard to program.
> >
> >
> > -Matthias
> >
> > On 08/15/2016 07:46 AM, Gyula Fóra wrote:
> > > Hi Jamie,
> > >
> > > I agree that it is often much easier to work with the lower-level APIs
> > > if you know what you are doing.
> > >
> > > I think it would be nice to have very clean abstractions at that level
> > > so we could teach this to users first, but currently I think it's not
> > > easy enough to be a good starting point.
> > >
> > > Users need to understand a lot about the system if they don't want to
> > > hurt other parts of the pipeline: for instance, working with the
> > > StreamRecords, propagating watermarks, and working with state internals.
> > >
> > > This all might be overwhelming at first glance. But maybe we can slim
> > > some abstractions down to the point where this becomes a kind of
> > > extension of the RichFunctions.
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Sat, Aug 13, 2016, 17:48 Jamie Grier <ja...@data-artisans.com> wrote:
> > >
> > >> Hey all,
> > >>
> > >> I've noticed a few times now when trying to help users implement
> > >> particular things in the Flink API that it can be complicated to map
> > >> what they know they are trying to do onto higher-level Flink concepts
> > >> such as windowing or Connect/CoFlatMap/ValueState, etc.
> > >>
> > >> At some point it just becomes easier to think about writing a Flink
> > >> operator yourself that is integrated into the pipeline with a
> > >> transform() call.
> > >>
> > >> It can just be easier to think at a more basic level.  For example I
> > >> can write an operator that can consume one or two input streams (should
> > >> probably be N), update state which is managed for me fault-tolerantly,
> > >> and output elements or set up timers/triggers that give me callbacks
> > >> from which I can also update state or emit elements.
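> > >>
> > >> As a rough sketch (and only a sketch: the count below lives in a plain
> > >> field and isn't checkpointed, which a real operator would of course
> > >> have to take care of), such an operator can look something like this:
> > >>
> > >> import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
> > >> import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
> > >> import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
> > >>
> > >> // Counts the elements it sees and emits the running count downstream.
> > >> public class CountingOperator extends AbstractStreamOperator<Long>
> > >>     implements OneInputStreamOperator<String, Long> {
> > >>
> > >>   private long count; // not fault-tolerant in this sketch
> > >>
> > >>   @Override
> > >>   public void processElement(StreamRecord<String> element) throws Exception {
> > >>     count++;
> > >>     output.collect(new StreamRecord<>(count));
> > >>   }
> > >> }
> > >>
> > >> // and wired into the pipeline with something like:
> > >> //   DataStream<Long> counts = input.transform(
> > >> //       "counting-operator", BasicTypeInfo.LONG_TYPE_INFO,
> > >> //       new CountingOperator());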
> > >>
> > >> When you think at this level you realize you can program just about
> > >> anything you want.  You can create whatever fault-tolerant data
> > >> structures you want, and easily execute robust stateful computation
> > >> over data streams at scale.  This is the real technology and power of
> > >> Flink IMO.
> > >>
> > >> Also, at this level I don't have to think about the complexities of
> > >> windowing semantics, learn as much API, etc.  I can easily have some
> > >> inputs that are broadcast, others that are keyed, manage my own state
> > >> in whatever data structure makes sense, etc.  If I know exactly what I
> > >> actually want to do I can just do it with the full power of my chosen
> > >> language, data structures, etc.  I'm not "restricted" to trying to map
> > >> everything onto higher-level Flink constructs, which is sometimes
> > >> actually more complicated.
> > >>
> > >> Programming at this level is actually fairly easy to do, but people
> > >> seem a bit afraid of this level of the API.  They think of it as
> > >> low-level or custom hacking.
> > >>
> > >> Anyway, I guess my thought is this: Should we explain Flink to people
> > >> at this level *first*?  Show that you have nearly unlimited power and
> > >> flexibility to build what you want, *and only then* explain the
> > >> higher-level APIs they can use *if* those match their use cases well.
> > >>
> > >> Would this better demonstrate to people the power of Flink and maybe
> > >> *liberate* them a bit from feeling they have to map their problem onto
> > >> a more complex set of higher-level primitives?  I see people trying to
> > >> shoe-horn what they are really trying to do, which is simple to explain
> > >> in English, onto windows, triggers, CoFlatMaps, etc., and this gets
> > >> complicated sometimes.  It's like an impedance mismatch.  You could
> > >> just solve the problem very easily in straight Java/Scala.
> > >>
> > >> Anyway, it's very easy to drop down a level in the API and program
> > >> whatever you want, but users don't seem to *perceive* it that way.
> > >>
> > >> Just some thoughts...  Any feedback?  Have any of you had similar
> > >> experiences when working with newer Flink users or as a newer Flink
> > >> user yourself?  Can/should we do anything to make the *lower* level
> > >> API more accessible/visible to users?
> > >>
> > >> -Jamie
> > >>
> > >
> >
> >
>



-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com
