Hey,

Give me an hour or so as I am in a meeting currently, but I will get back
to you afterwards.

Regards,
Gyula

On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <akshayd...@gmail.com> wrote:

> Hi,
> It'd really help if I got a reply soon. It'll be helpful in writing the
> proposal since the deadline is on 27th. Thanks
> Regards,
> Akshay Dixit
>
> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshayd...@gmail.com>
> wrote:
>
> > Thanks for the explanation Marton. I've decided to try out for
> FLINK-1534.
> >
> > After reading through the thesis[4] and a few other papers[1][2][3], I
> > believe I've gathered a little context to ask more questions. But I'm
> still
> > not sure how Flink's internals work
> > so please bear with me. Although the ongoing effort to document the
> > architecture and internal is really helpful for newbies like me and would
> > greatly decrease the ramping up time.
> >
> > Detecting a pattern of events would comprise of a pipeline that accepts
> > the pattern query and
> > sources of DataStreams, and outputs detected matches of that pattern to a
> > sink or forwards it
> > along to another stream for further computation.
> >
> > As you said, a simple filter-join-aggregate query system could be
> > developed implementing using the existing Streaming windowing API.
> > But matching over complex events and decoding their pattern queries would
> > require implementing a DSL that transforms queries into an evaluation
> > model. For e.g,
> > in [1], the authors have implemented an NFA automaton with a shared
> > versioned buffer that models the queries. In [4], the authors
> > propose a new language that is much more expressive and compiles into a
> > topology graph for Storm.
> >
> > So in Flink's case, I believe the proposed DSL would generate operator
> > graphs for the Flink compiler to schedule Jobgraphs over TaskManagers.
> > If we don't depend on the Windowing API, would we need to create new
> > operators such as the Projection, Conjunction and Union operators defined
> > in [4] ?
> > Also I would like to hear your thoughts on how to approach scaling the
> > pattern matching query. Note all these techniques talk about scaling a
> > single query.
> > I've read various ways such as
> >
> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
> multiple
> > instances of pattern matching forks into a single one if they have the
> same
> > state.
> > But I'm not sure how we would implement this in Flink since this is a
> > runtime optimization.
> >
> > 2.  Implementing a matched version buffer[1] -: This would involve
> sharing
> > state of a buffer datastructure across multiple candidate match instances
> > for the pattern.
> >
> > 3.  Splitting complex composite patterns into simpler sub-patterns[4] and
> > executing separate queries to detect those sub-patterns. This might
> > translate into different
> > tasks and duplicating the source datastreams to all the new generated
> > tasks.
> >
> > Also since I don't know how the Flink compiler behaves, would some of the
> > optimizations involve making changes to it too?
> >
> > Regards,
> > Akshay Dixit
> >
> > [1] : Efficient Pattern Matching over Event Streams
> > <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf>
> > [2] : On Supporting Kleene Closure over Event Streams
> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> > [3] : Processing Flows of Information: From Data Stream to Complex Event
> > Processing
> > <
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
> >
> > [4] : Distributing Complex Event Detection
> > <http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >
> >
> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
> balassi.mar...@gmail.com>
> > wrote:
> >
> >> Dear Akshay,
> >>
> >> Thanks again for your interest and for the recent contribution to
> >> streaming.
> >>
> >> Both of the projects mentioned wold be largely appreciated by the
> >> community, and you can also propose other project suggestions here for
> >> discussion.
> >>
> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting point
> >> and
> >> indeed the basic solution can be implemented with filtering and
> >> windowing/mapping with some state storing whether the cause of an event
> >> has
> >> been already seen. Solely relying on the now existing windowing API this
> >> however might cause performance issues if the events also have an
> >> expiration timeout - some optimization there would be included. The
> >> further
> >> challenge is to try to further exploit the parallel job execution of
> Flink
> >> to possibly scale a pattern matching query.
> >>
> >> Best,
> >>
> >> Marton
> >>
> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshayd...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
> >> > currently interested in distributed systems and stream processing and
> am
> >> > looking to delve deeper into the subject, and hope to get some insight
> >> by
> >> > contributing to Apache Flink. I've gathered some idea of the
> >> > flink-streaming codebase by recently working on a PR for
> FLINK-1450[2].
> >> >
> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I
> >> would
> >> > love to work on over the summer. I was wondering which amongst these
> >> would
> >> > be more appreciated by the community, so I can start working towards a
> >> > proposal for either one.
> >> >
> >> > Regarding FLINK-1534, I was wondering why would simply merging and
> >> > filtering the existing streams for events we want to detect not work?
> >> Also
> >> > on going through the document mentioned by @mbalassi in the JIRA
> >> > comment[5], the authors specify some Runtime Event Detection concepts
> in
> >> > Section 5.2. I'm assuming the project entails on building a similar
> >> analogy
> >> > using Flink and the deliverables would include working pattern
> matching
> >> > operators over Flink DataStreams as described in the report. If so,
> then
> >> > shouldn't it be trivial to implement the described the Binary operator
> >> > using a WindowedStream and a Filter?
> >> > I hope my questions don't seem misplaced here and I would appreciate
> >> links
> >> > to literature where I can learn more on the topic.
> >> >
> >> > Regards,
> >> > Akshay Dixit
> >> >
> >> > [1] : http://akshaydixi.me
> >> > [2] : https://github.com/apache/flink/pull/481
> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> >> > [5] :
> >> >
> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >> >
> >>
> >
> >
>

Reply via email to