Thanks Gyula.

I agree too that simple, working implementations are preferable to hacky,
complex solutions. I'll start sketching out an initial straightforward API
with only basic pattern matching features and base it on the existing
windowing API. I'll post a draft of the proposal tomorrow, keeping the
points you've raised in mind, so you can look it over and see if it's all
right.
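
To make it a bit more concrete, the rough shape I have in mind is something
like the sketch below. All of the pattern-specific names (matchPattern,
Pattern, PatternStream, followedBy, within, select, and the Event/Alert and
source/sink/filter classes) are placeholders I'm inventing for illustration,
not existing Flink classes; only the DataStream parts are the current API:

// Hypothetical sketch: "a LOGIN event followed by a FAILURE event within 10s".
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Event> events = env.addSource(new EventSource());

PatternStream<Event> matches = events.matchPattern(
    Pattern.begin("login", new EventTypeFilter("LOGIN"))
           .followedBy("failure", new EventTypeFilter("FAILURE"))
           .within(10, TimeUnit.SECONDS));

// Turn each detected match into an alert and forward it downstream.
DataStream<Alert> alerts = matches.select(new MatchToAlertMapper());
alerts.addSink(new AlertSink());
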
Regards,
Akshay Dixit

On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gyf...@apache.org> wrote:

> Hey Dixit,
>
> Sorry for the delay, I had to discuss this in more detail with some of our
> other core developers.
>
> The consensus seems to be that we would like to push this project in a
> direction where the changes can be quickly included in the next releases.
> For this it is essential that we implement features that are complete (and
> clean) from the user's perspective. This does not necessarily mean that we
> would like to have everything at once, but rather that it is preferable to
> start with something clean and simple (for instance the naive chained
> filter approach) and progressively build more complex logic.
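>
> Just to illustrate what I mean by the naive version, here is a rough sketch
> using only the current DataStream operators (MyEvent, MyEventSource and the
> predicates are made-up placeholders; FilterFunction is
> org.apache.flink.api.common.functions.FilterFunction). Each condition of
> the pattern becomes one filter in a chain, which covers conjunctions of
> conditions on a single event; sequencing across events would still need
> windows or state on top of this:
>
> DataStream<MyEvent> events = env.addSource(new MyEventSource());
>
> DataStream<MyEvent> candidates = events
>     .filter(new FilterFunction<MyEvent>() {
>         @Override
>         public boolean filter(MyEvent e) {
>             return "LOGIN".equals(e.getType()); // first condition
>         }
>     })
>     .filter(new FilterFunction<MyEvent>() {
>         @Override
>         public boolean filter(MyEvent e) {
>             return e.getSeverity() > 3; // second condition
>         }
>     });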
>
> This also means that we would like to avoid "researchy" code in the
> codebase as much as possible. Of course, once we have a stable API for this
> functionality we can work towards the optimizations that you have
> mentioned, like operator sharing and so on.
>
> The ideal proposal would give a clear sketch of the pattern matching API
> that you would like to implement, which might be some operators added to
> the current API at first, and possibly a DSL later with more advanced
> functionality (this would probably go into a separate library until it is
> very stable).
>
> So please include in the proposal a preview of what the pattern matching
> syntax would look like integrated with the current operators, how it would
> interact with other parts of the system, etc.
>
> These are the things we need to figure out before we consider the
> optimizations, I think, because it usually turns out that the API semantics
> you would like to provide can hugely affect (and probably limit) the
> optimization possibilities you have afterwards.
>
> Let me know if you have further questions regarding this :)
>
> Gyula
>
> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gyf...@apache.org> wrote:
>
> > Hey,
> >
> > Give me an hour or so as I am currently in a meeting; I will get back to
> > you afterwards.
> >
> > Regards,
> > Gyula
> >
> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <akshayd...@gmail.com>
> > wrote:
> >
> >> Hi,
> >> It would really help to get a reply soon, as it will feed into the
> >> proposal and the deadline is on the 27th. Thanks.
> >> Regards,
> >> Akshay Dixit
> >>
> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshayd...@gmail.com>
> >> wrote:
> >>
> >> > Thanks for the explanation, Marton. I've decided to try out for
> >> > FLINK-1534.
> >> >
> >> > After reading through the thesis[4] and a few other papers[1][2][3], I
> >> > believe I've gathered a little context to ask more questions. But I'm
> >> > still not sure how Flink's internals work, so please bear with me. That
> >> > said, the ongoing effort to document the architecture and internals is
> >> > really helpful for newbies like me and should greatly reduce the
> >> > ramp-up time.
> >> >
> >> > Detecting a pattern of events would comprise a pipeline that accepts the
> >> > pattern query and the source DataStreams, and outputs detected matches
> >> > of that pattern to a sink or forwards them along to another stream for
> >> > further computation.
> >> >
> >> > As you said, a simple filter-join-aggregate query system could be
> >> > developed using the existing streaming windowing API. But matching over
> >> > complex events and decoding their pattern queries would require
> >> > implementing a DSL that transforms queries into an evaluation model.
> >> > For example, in [1] the authors implement an NFA with a shared versioned
> >> > buffer that models the queries. In [4], the authors propose a new
> >> > language that is much more expressive and compiles into a topology
> >> > graph for Storm.
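> >> >
> >> > As a toy illustration of what I mean by an evaluation model (my own
> >> > simplification, not the shared-buffer NFA design from [1]; MyEvent and
> >> > its isTypeA/isTypeB methods are made up), a pattern like "A followed by
> >> > B" could be evaluated by feeding each incoming event through a small
> >> > state machine:
> >> >
> >> > enum MatchState { START, SAW_A, MATCHED }
> >> >
> >> > MatchState advance(MatchState current, MyEvent e) {
> >> >     switch (current) {
> >> >         case START:  return e.isTypeA() ? MatchState.SAW_A : MatchState.START;
> >> >         case SAW_A:  return e.isTypeB() ? MatchState.MATCHED : MatchState.SAW_A;
> >> >         default:     return MatchState.MATCHED; // stay matched until reset
> >> >     }
> >> > }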
> >> >
> >> > So in Flink's case, I believe the proposed DSL would generate operator
> >> > graphs for the Flink compiler to schedule as JobGraphs over TaskManagers.
> >> > If we don't depend on the windowing API, would we need to create new
> >> > operators such as the Projection, Conjunction and Union operators
> >> > defined in [4]?
> >> > I would also like to hear your thoughts on how to approach scaling a
> >> > pattern matching query. I've read about various ways, such as the
> >> > following (note that all of these techniques address scaling a single
> >> > query):
> >> >
> >> > 1. Merging equivalent runs[1]: This seems like a good way to squash
> >> > multiple instances of pattern matching forks into a single one if they
> >> > have the same state. But I'm not sure how we would implement this in
> >> > Flink, since it is a runtime optimization.
> >> >
> >> > 2. Implementing a shared versioned match buffer[1]: This would involve
> >> > sharing the state of a buffer data structure across multiple candidate
> >> > match instances for the pattern.
> >> >
> >> > 3. Splitting complex composite patterns into simpler sub-patterns[4] and
> >> > executing separate queries to detect those sub-patterns. This might
> >> > translate into different tasks, with the source DataStreams duplicated
> >> > to all of the newly generated tasks (roughly as sketched below).
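> >> >
> >> > A rough sketch of what I mean for (3), using only existing operators
> >> > (MyEvent, MyEventSource and the predicates are placeholders): the same
> >> > source stream is consumed by one filter per sub-pattern, so each
> >> > sub-pattern detector can run as its own task, and the partial matches
> >> > would then have to be correlated again downstream (e.g. by
> >> > windowing/joining):
> >> >
> >> > DataStream<MyEvent> source = env.addSource(new MyEventSource());
> >> >
> >> > // Sub-pattern 1: events of type A.
> >> > DataStream<MyEvent> subPatternA = source.filter(new FilterFunction<MyEvent>() {
> >> >     public boolean filter(MyEvent e) { return e.isTypeA(); }
> >> > });
> >> >
> >> > // Sub-pattern 2: events of type B.
> >> > DataStream<MyEvent> subPatternB = source.filter(new FilterFunction<MyEvent>() {
> >> >     public boolean filter(MyEvent e) { return e.isTypeB(); }
> >> > });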
> >> >
> >> > Also, since I don't know how the Flink compiler behaves, would some of
> >> > these optimizations involve making changes to it too?
> >> >
> >> > Regards,
> >> > Akshay Dixit
> >> >
> >> > [1] : Efficient Pattern Matching over Event Streams
> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf>
> >> > [2] : On Supporting Kleene Closure over Event Streams
> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> >> > [3] : Processing Flows of Information: From Data Stream to Complex Event Processing
> >> > <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf>
> >> > [4] : Distributing Complex Event Detection
> >> > <http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf>
> >> >
> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <balassi.mar...@gmail.com>
> >> > wrote:
> >> >
> >> >> Dear Akshay,
> >> >>
> >> >> Thanks again for your interest and for the recent contribution to
> >> >> streaming.
> >> >>
> >> >> Both of the projects mentioned would be greatly appreciated by the
> >> >> community, and you can also propose other project suggestions here for
> >> >> discussion.
> >> >>
> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting point,
> >> >> and indeed the basic solution can be implemented with filtering and
> >> >> windowing/mapping, with some state storing whether the cause of an event
> >> >> has already been seen. Relying solely on the existing windowing API,
> >> >> however, might cause performance issues if the events also have an
> >> >> expiration timeout, so some optimization there would be included. The
> >> >> further challenge is to try to exploit the parallel job execution of
> >> >> Flink to possibly scale out a pattern matching query.
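> >> >>
> >> >> As a rough illustration of that naive version (not fault tolerant, the
> >> >> event type and its methods are invented for the example; Collector is
> >> >> org.apache.flink.util.Collector), a RichFlatMapFunction could remember
> >> >> the timestamp of the last cause event in a field and emit a match when
> >> >> the effect arrives within the timeout:
> >> >>
> >> >> public class CauseEffectMatcher extends RichFlatMapFunction<MyEvent, String> {
> >> >>     private static final long TIMEOUT_MS = 60000; // expiration timeout
> >> >>     private long lastCauseTimestamp = -1;         // in-memory state only
> >> >>
> >> >>     @Override
> >> >>     public void flatMap(MyEvent e, Collector<String> out) {
> >> >>         if (e.isCause()) {
> >> >>             lastCauseTimestamp = e.getTimestamp();
> >> >>         } else if (e.isEffect()
> >> >>                 && lastCauseTimestamp >= 0
> >> >>                 && e.getTimestamp() - lastCauseTimestamp <= TIMEOUT_MS) {
> >> >>             out.collect("matched cause -> effect");
> >> >>             lastCauseTimestamp = -1;
> >> >>         }
> >> >>     }
> >> >> }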
> >> >>
> >> >> Best,
> >> >>
> >> >> Marton
> >> >>
> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshayd...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Hi,
> >> >> > I'm Akshay Dixit[1], a fourth-year undergrad at VIT Vellore, India. I'm
> >> >> > currently interested in distributed systems and stream processing, am
> >> >> > looking to delve deeper into the subject, and hope to gain some insight
> >> >> > by contributing to Apache Flink. I've gained some familiarity with the
> >> >> > flink-streaming codebase by recently working on a PR for FLINK-1450[2].
> >> >> >
> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I
> >> >> > would love to work on over the summer. I was wondering which amongst
> >> >> > these would be more appreciated by the community, so I can start
> >> >> > working towards a proposal for either one.
> >> >> >
> >> >> > Regarding FLINK-1534, I was wondering why simply merging and filtering
> >> >> > the existing streams for the events we want to detect would not work.
> >> >> > Also, going through the document mentioned by @mbalassi in the JIRA
> >> >> > comment[5], the authors describe some Runtime Event Detection concepts
> >> >> > in Section 5.2. I'm assuming the project entails building something
> >> >> > similar using Flink, and that the deliverables would include working
> >> >> > pattern matching operators over Flink DataStreams as described in the
> >> >> > report. If so, shouldn't it be trivial to implement the described
> >> >> > Binary operator using a WindowedStream and a Filter?
> >> >> > I hope my questions don't seem misplaced here, and I would appreciate
> >> >> > links to literature where I can learn more on the topic.
> >> >> >
> >> >> > Regards,
> >> >> > Akshay Dixit
> >> >> >
> >> >> > [1] : http://akshaydixi.me
> >> >> > [2] : https://github.com/apache/flink/pull/481
> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> >> >> > [5] : http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>
