Hey, Give me an hour or so as I am in a meeting currently, but I will get back to you afterwards.
Regards, Gyula On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <akshayd...@gmail.com> wrote: > Hi, > It'd really help if I got a reply soon. It'll be helpful in writing the > proposal since the deadline is on 27th. Thanks > Regards, > Akshay Dixit > > On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshayd...@gmail.com> > wrote: > > > Thanks for the explanation Marton. I've decided to try out for > FLINK-1534. > > > > After reading through the thesis[4] and a few other papers[1][2][3], I > > believe I've gathered a little context to ask more questions. But I'm > still > > not sure how Flink's internals work > > so please bear with me. Although the ongoing effort to document the > > architecture and internal is really helpful for newbies like me and would > > greatly decrease the ramping up time. > > > > Detecting a pattern of events would comprise of a pipeline that accepts > > the pattern query and > > sources of DataStreams, and outputs detected matches of that pattern to a > > sink or forwards it > > along to another stream for further computation. > > > > As you said, a simple filter-join-aggregate query system could be > > developed implementing using the existing Streaming windowing API. > > But matching over complex events and decoding their pattern queries would > > require implementing a DSL that transforms queries into an evaluation > > model. For e.g, > > in [1], the authors have implemented an NFA automaton with a shared > > versioned buffer that models the queries. In [4], the authors > > propose a new language that is much more expressive and compiles into a > > topology graph for Storm. > > > > So in Flink's case, I believe the proposed DSL would generate operator > > graphs for the Flink compiler to schedule Jobgraphs over TaskManagers. > > If we don't depend on the Windowing API, would we need to create new > > operators such as the Projection, Conjunction and Union operators defined > > in [4] ? > > Also I would like to hear your thoughts on how to approach scaling the > > pattern matching query. Note all these techniques talk about scaling a > > single query. > > I've read various ways such as > > > > 1. Merging equivalent runs[1] -: This seems a good way to squash > multiple > > instances of pattern matching forks into a single one if they have the > same > > state. > > But I'm not sure how we would implement this in Flink since this is a > > runtime optimization. > > > > 2. Implementing a matched version buffer[1] -: This would involve > sharing > > state of a buffer datastructure across multiple candidate match instances > > for the pattern. > > > > 3. Splitting complex composite patterns into simpler sub-patterns[4] and > > executing separate queries to detect those sub-patterns. This might > > translate into different > > tasks and duplicating the source datastreams to all the new generated > > tasks. > > > > Also since I don't know how the Flink compiler behaves, would some of the > > optimizations involve making changes to it too? > > > > Regards, > > Akshay Dixit > > > > [1] : Efficient Pattern Matching over Event Streams > > <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf> > > [2] : On Supporting Kleene Closure over Event Streams > > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf> > > [3] : Processing Flows of Information: From Data Stream to Complex Event > > Processing > > < > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf > > > > [4] : Distributing Complex Event Detection > > <http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf > > > > > > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi < > balassi.mar...@gmail.com> > > wrote: > > > >> Dear Akshay, > >> > >> Thanks again for your interest and for the recent contribution to > >> streaming. > >> > >> Both of the projects mentioned wold be largely appreciated by the > >> community, and you can also propose other project suggestions here for > >> discussion. > >> > >> Regarding FLINK-1534, the thesis I mentioned serves as a starting point > >> and > >> indeed the basic solution can be implemented with filtering and > >> windowing/mapping with some state storing whether the cause of an event > >> has > >> been already seen. Solely relying on the now existing windowing API this > >> however might cause performance issues if the events also have an > >> expiration timeout - some optimization there would be included. The > >> further > >> challenge is to try to further exploit the parallel job execution of > Flink > >> to possibly scale a pattern matching query. > >> > >> Best, > >> > >> Marton > >> > >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshayd...@gmail.com> > >> wrote: > >> > >> > Hi, > >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm > >> > currently interested in distributed systems and stream processing and > am > >> > looking to delve deeper into the subject, and hope to get some insight > >> by > >> > contributing to Apache Flink. I've gathered some idea of the > >> > flink-streaming codebase by recently working on a PR for > FLINK-1450[2]. > >> > > >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I > >> would > >> > love to work on over the summer. I was wondering which amongst these > >> would > >> > be more appreciated by the community, so I can start working towards a > >> > proposal for either one. > >> > > >> > Regarding FLINK-1534, I was wondering why would simply merging and > >> > filtering the existing streams for events we want to detect not work? > >> Also > >> > on going through the document mentioned by @mbalassi in the JIRA > >> > comment[5], the authors specify some Runtime Event Detection concepts > in > >> > Section 5.2. I'm assuming the project entails on building a similar > >> analogy > >> > using Flink and the deliverables would include working pattern > matching > >> > operators over Flink DataStreams as described in the report. If so, > then > >> > shouldn't it be trivial to implement the described the Binary operator > >> > using a WindowedStream and a Filter? > >> > I hope my questions don't seem misplaced here and I would appreciate > >> links > >> > to literature where I can learn more on the topic. > >> > > >> > Regards, > >> > Akshay Dixit > >> > > >> > [1] : http://akshaydixi.me > >> > [2] : https://github.com/apache/flink/pull/481 > >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617 > >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534 > >> > [5] : > >> > > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf > >> > > >> > > > > >