[ https://issues.apache.org/jira/browse/CALCITE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905538#comment-16905538 ]
Julian Feinauer commented on CALCITE-1935:
------------------------------------------

Thank you so much [~julianhyde] for your support and the final polishing.
I am very happy that I was finally able to contribute something 'major' to Calcite!
I hope that I'm able to improve MATCH_RECOGNIZE in the future.

> Reference implementation for MATCH_RECOGNIZE
> --------------------------------------------
>
>                 Key: CALCITE-1935
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1935
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Feinauer
>            Priority: Major
>              Labels: match
>             Fix For: 1.21.0
>
>
> We now have comprehensive support for parsing and validating MATCH_RECOGNIZE
> queries (see CALCITE-1570 and sub-tasks) but we cannot execute them. I know
> the purpose of this work is to do CEP within Flink, but a reference
> implementation that works on non-streaming data would be valuable.
> I propose that we add a class EnumerableMatch that can generate Java code to
> evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be
> efficient. I don't mind if it (say) buffers all the data in memory and makes
> O(n ^ 3) passes over it. People can make it more efficient over time.
> When we have a reference implementation, people can start playing with this
> feature. And we can start building a corpus of data sets, queries, and their
> expected result. The Flink implementation will be able to test against those
> same queries, and should give the same results, even though Flink will be
> reading streaming data.
> Let's create {{match.iq}} with the following query based on
> https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
> {code}
> !set outputformat mysql
> !use match
> SELECT *
> FROM sales_history MATCH_RECOGNIZE (
>          PARTITION BY product
>          ORDER BY tstamp
>          MEASURES STRT.tstamp AS start_tstamp,
>                   LAST(UP.tstamp) AS peak_tstamp,
>                   LAST(DOWN.tstamp) AS end_tstamp,
>                   MATCH_NUMBER() AS mno
>          ONE ROW PER MATCH
>          AFTER MATCH SKIP TO LAST DOWN
>          PATTERN (STRT UP+ FLAT* DOWN+)
>          DEFINE
>            UP AS UP.units_sold > PREV(UP.units_sold),
>            FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
>            DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
>        ) MR
> ORDER BY MR.product, MR.start_tstamp;
> PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
> ---------- ----------- ----------- ----------- ----------
> TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
> TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
> TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
> TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4
> 4 rows selected.
> !ok
> {code}
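
To make the semantics of the query above concrete, here is a minimal Python sketch of how a buffer-everything-in-memory evaluator might handle the single pattern `STRT UP+ FLAT* DOWN+` with `ONE ROW PER MATCH` and `AFTER MATCH SKIP TO LAST DOWN`. It is not the EnumerableMatch code and not Calcite's API: the function name `match_rise_and_fall`, the `Match` tuple, and the sample `units` list are all hypothetical, and the input is assumed to be one partition already sorted by `tstamp`. Because the three DEFINE conditions (`>`, `=`, `<`) are mutually exclusive, a plain greedy scan is enough for this pattern; a general implementation would need backtracking or an automaton.

```python
from typing import List, NamedTuple, Sequence


class Match(NamedTuple):
    start: int  # index of the STRT row
    peak: int   # index of the last UP row, i.e. LAST(UP....)
    end: int    # index of the last DOWN row, i.e. LAST(DOWN....)
    mno: int    # MATCH_NUMBER(), counted from 1


def match_rise_and_fall(units: Sequence[int]) -> List[Match]:
    """Naive matcher for PATTERN (STRT UP+ FLAT* DOWN+) over one partition.

    `units` holds units_sold for rows already ordered by tstamp
    (hypothetical input; the real table data is not reproduced here).
    """
    matches: List[Match] = []
    n = len(units)
    i = 0
    while i < n:
        # STRT has no DEFINE entry, so it matches any row: row i.
        j = i + 1
        # UP+ : consume rows strictly greater than the previous row.
        while j < n and units[j] > units[j - 1]:
            j += 1
        peak = j - 1
        if peak == i:  # no UP row at all, try the next start row
            i += 1
            continue
        # FLAT* : consume rows equal to the previous row (may be none).
        while j < n and units[j] == units[j - 1]:
            j += 1
        # DOWN+ : consume rows strictly less than the previous row.
        first_down = j
        while j < n and units[j] < units[j - 1]:
            j += 1
        if j == first_down:  # no DOWN row, try the next start row
            i += 1
            continue
        end = j - 1
        matches.append(Match(i, peak, end, len(matches) + 1))
        # AFTER MATCH SKIP TO LAST DOWN: resume at the last DOWN row,
        # which may then serve as STRT of the next match.
        i = end
    return matches


if __name__ == "__main__":
    # Made-up sample values, chosen only to exercise every pattern variable.
    for m in match_rise_and_fall([10, 12, 15, 15, 9, 7, 8, 11, 6]):
        print(m)
```

Note how the second match in the sample starts on the row where the first one ended; the same effect is visible in the expected output above, where each END_TSTAMP reappears as the next START_TSTAM.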