[ https://issues.apache.org/jira/browse/CALCITE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123478#comment-16123478 ]
Julian Hyde commented on CALCITE-1935:
--------------------------------------

Remember, it doesn't have to be efficient. Could you use Java's built-in regular expression support? Compile a regular expression, and as each row comes in, add a string token to a StringBuilder, then see whether the regular expression has matched.

> Reference implementation for MATCH_RECOGNIZE
> --------------------------------------------
>
>                 Key: CALCITE-1935
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1935
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> We now have comprehensive support for parsing and validating MATCH_RECOGNIZE
> queries (see CALCITE-1570 and sub-tasks) but we cannot execute them. I know
> the purpose of this work is to do CEP within Flink, but a reference
> implementation that works on non-streaming data would be valuable.
> I propose that we add a class EnumerableMatch that can generate Java code to
> evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be
> efficient. I don't mind if it (say) buffers all the data in memory and makes
> O(n ^ 3) passes over it. People can make it more efficient over time.
> When we have a reference implementation, people can start playing with this
> feature. And we can start building a corpus of data sets, queries, and their
> expected result. The Flink implementation will be able to test against those
> same queries, and should give the same results, even though Flink will be
> reading streaming data.
> Let's create {{match.iq}} with the following query based on
> https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
> {code}
> !set outputformat mysql
> !use match
> SELECT *
> FROM sales_history MATCH_RECOGNIZE (
>   PARTITION BY product
>   ORDER BY tstamp
>   MEASURES STRT.tstamp AS start_tstamp,
>     LAST(UP.tstamp) AS peak_tstamp,
>     LAST(DOWN.tstamp) AS end_tstamp,
>     MATCH_NUMBER() AS mno
>   ONE ROW PER MATCH
>   AFTER MATCH SKIP TO LAST DOWN
>   PATTERN (STRT UP+ FLAT* DOWN+)
>   DEFINE
>     UP AS UP.units_sold > PREV(UP.units_sold),
>     FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
>     DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
> ) MR
> ORDER BY MR.product, MR.start_tstamp;
> PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
> ---------- ----------- ----------- ----------- ----------
> TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
> TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
> TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
> TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4
> 4 rows selected.
> !ok
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
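
For illustration, the regular-expression idea in the comment above (one string token per row, appended to a StringBuilder, then checked against a compiled pattern) might look roughly like the sketch below. It is not Calcite code: the class {{RegexMatchSketch}}, its methods, and the sample data are all hypothetical. It assumes a single partition that is already ordered, assumes each row satisfies exactly one of UP, FLAT, or DOWN so that one character per row is enough, and hard-codes the pattern and skip rule from the quoted query rather than generating them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based matching; not part of Calcite. */
final class RegexMatchSketch {
  // PATTERN (STRT UP+ FLAT* DOWN+). STRT has no DEFINE clause,
  // so any row token is allowed in the first position.
  private static final Pattern PATTERN = Pattern.compile("[XUFD]U+F*D+");

  /** Encodes each row as one character by comparing it with the previous row. */
  static String tokenize(int[] unitsSold) {
    StringBuilder tokens = new StringBuilder();
    for (int i = 0; i < unitsSold.length; i++) {
      if (i == 0) {
        tokens.append('X'); // first row: no previous row to compare with
      } else if (unitsSold[i] > unitsSold[i - 1]) {
        tokens.append('U'); // satisfies UP
      } else if (unitsSold[i] == unitsSold[i - 1]) {
        tokens.append('F'); // satisfies FLAT
      } else {
        tokens.append('D'); // satisfies DOWN
      }
    }
    return tokens.toString();
  }

  /** Returns {firstRow, lastRow} (0-based, inclusive) for each match. */
  static List<int[]> findMatches(int[] unitsSold) {
    String tokens = tokenize(unitsSold);
    Matcher matcher = PATTERN.matcher(tokens);
    List<int[]> matches = new ArrayList<>();
    int from = 0;
    while (from < tokens.length() && matcher.find(from)) {
      matches.add(new int[] {matcher.start(), matcher.end() - 1});
      // AFTER MATCH SKIP TO LAST DOWN: resume at the last row of this match.
      from = matcher.end() - 1;
    }
    return matches;
  }

  public static void main(String[] args) {
    int[] unitsSold = {10, 12, 15, 15, 11, 9, 13, 14, 8}; // made-up data
    System.out.println(tokenize(unitsSold));
    for (int[] m : findMatches(unitsSold)) {
      System.out.println("rows " + m[0] + ".." + m[1]);
    }
  }
}
```

With the made-up data in {{main}}, this prints the token string XUUFDDUUD followed by "rows 0..5" and "rows 5..8"; the second match starts on the last DOWN row of the first, which is the overlap the skip rule asks for. The sketch buffers every token before matching, in line with the "does not need to be efficient" goal. A row that could satisfy more than one DEFINE condition would need a richer encoding than one character per row.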