fwiw Apache Flink just added CEP. Queries are constructed programmatically rather than in SQL, but the underlying functionality is similar.
https://issues.apache.org/jira/browse/FLINK-3215

On 1 March 2016 at 08:19, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Herman,
>
> Thank you for your reply!
> This functionality usually finds its place in financial services, which use
> CEP (complex event processing) for correlation and pattern matching. Many
> commercial products have this, including Oracle and Teradata Aster Data MR
> Analytics. I do agree the syntax is a bit awkward, but once you understand it,
> it is actually very compact for expressing something that is very complex.
> Esper has this feature partially implemented (
> http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
> ).
>
> I found that the Teradata Analytics documentation describes its usage best.
> For example (note that nPath is similar to match_recognize):
>
> SELECT last_pageid, MAX( count_page80 )
> FROM nPath(
>   ON ( SELECT * FROM clicks WHERE category >= 0 )
>   PARTITION BY sessionid
>   ORDER BY ts
>   PATTERN ( 'A.(B|C)*' )
>   MODE ( OVERLAPPING )
>   SYMBOLS ( pageid = 50 AS A,
>             pageid = 80 AS B,
>             pageid <> 80 AND category IN (9,10) AS C )
>   RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
>            COUNT ( * OF B ) AS count_page80,
>            COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
> )
> WHERE count_any >= 5
> GROUP BY last_pageid
> ORDER BY MAX( count_page80 )
>
> The above means:
> Find user click-paths starting at pageid 50 and passing exclusively
> through either pageid 80 or pages in category 9 or category 10. Find the
> pageid of the last page in the path and count the number of times page 80
> was visited. Report the maximum count for each last page, and sort the
> output by the latter. Restrict to paths containing at least 5 pages. Ignore
> pages in the sequence with category < 0.
>
> If this query is written in pure SQL (if that is possible at all), it requires
> several self-joins. The interesting thing about this feature is that it
> integrates SQL+Streaming+ML in one (and potentially graph too).
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <
> hvanhov...@questtec.nl> wrote:
>
>> Hi Jerry,
>>
>> This is not on any roadmap. I (briefly) browsed through this, and it
>> looks like some sort of window function with very awkward syntax. I think
>> Spark provides better constructs for this using dataframes/datasets/nested
>> data...
>>
>> Feel free to submit a PR.
>>
>> Kind regards,
>>
>> Herman van Hövell
>>
>> 2016-03-01 15:16 GMT+01:00 Jerry Lam <chiling...@gmail.com>:
>>
>>> Hi Spark developers,
>>>
>>> Would you consider adding support for "Pattern matching in
>>> sequences of rows"? More specifically, I'm referring to this:
>>> http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf
>>>
>>> This is a very cool/useful feature for pattern matching over live
>>> stream/archived data. It is sort of related to machine learning because
>>> it is usually used in clickstream analysis or path analysis. It is also
>>> related to streaming because of the nature of the processing (time series
>>> data mostly). It is SQL because there is a good way to express and optimize
>>> the query.
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>
>>
>
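
For anyone trying to follow the nPath query quoted above without a Teradata instance handy, here is a rough plain-Python sketch of what it computes. This is only my reading of the prose description in the thread, not Teradata's actual matching semantics: I am assuming that OVERLAPPING means every row matching A starts its own path, and that (B|C)* is matched greedily. The function names (find_paths, summarize) and the row layout (dicts with sessionid, ts, pageid, category) are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


# Symbol predicates, mirroring the SYMBOLS clause of the quoted query.
def is_a(row):
    return row["pageid"] == 50


def is_b(row):
    return row["pageid"] == 80


def is_c(row):
    return row["pageid"] != 80 and row["category"] in (9, 10)


def find_paths(clicks):
    """Yield one greedy match of A.(B|C)* per row that satisfies A.

    Assumption: this approximates MODE(OVERLAPPING) by letting every
    A row start a path, even inside an earlier match.
    """
    # ON ( SELECT * FROM clicks WHERE category >= 0 )
    rows = [r for r in clicks if r["category"] >= 0]
    # PARTITION BY sessionid ORDER BY ts
    rows.sort(key=itemgetter("sessionid", "ts"))
    for _, group in groupby(rows, key=itemgetter("sessionid")):
        session = list(group)
        for start, row in enumerate(session):
            if not is_a(row):
                continue
            end = start + 1
            # Extend the path while rows keep matching B or C.
            while end < len(session) and (is_b(session[end]) or is_c(session[end])):
                end += 1
            yield session[start:end]


def summarize(clicks, min_length=5):
    """Return (last_pageid, max count_page80) pairs, sorted by the count."""
    best = {}
    for path in find_paths(clicks):
        if len(path) < min_length:  # WHERE count_any >= 5
            continue
        last_pageid = path[-1]["pageid"]
        count_page80 = sum(1 for r in path if is_b(r))
        # GROUP BY last_pageid, keeping MAX( count_page80 )
        best[last_pageid] = max(best.get(last_pageid, 0), count_page80)
    # ORDER BY MAX( count_page80 ), ascending
    return sorted(best.items(), key=itemgetter(1))
```

Calling summarize(clicks) on a list of such row dicts returns the grouped result as a list of tuples. Even this simplified version needs an explicit scan per session, which gives some idea of why the declarative PATTERN/SYMBOLS form ends up more compact than hand-written joins or loops.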