[ https://issues.apache.org/jira/browse/CALCITE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699512#comment-16699512 ]
Julian Hyde commented on CALCITE-1935:
--------------------------------------

Over Thanksgiving, I started working on {{MATCH_RECOGNIZE}} again. I wrote a standalone class called {{Automaton}} that lets you build patterns (essentially regular expressions, expressive enough for the {{PATTERN}} sub-clause of {{MATCH_RECOGNIZE}}) and execute them in a unit test.

Would someone like to help me develop this? We support {{*}} (zero or more repeats), {{+}} (one or more repeats) and \{m,n\} (bounded repeats), but we still need {{|}} (alternation) and several other operators. It should be fairly straightforward test-driven development: add tests to [AutomatonTest|https://github.com/julianhyde/calcite/blob/1935-match-recognize/core/src/test/java/org/apache/calcite/runtime/AutomatonTest.java], then change {{Automaton}}, {{AutomatonBuilder}}, {{Pattern}} or {{Matcher}} until they pass.

We also need lots of SQL tests. Could someone write queries against Oracle’s “ticker” table and paste the queries and results into {{match.iq}}?

Some trickier integration work is needed to make {{JdbcTest.testMatch}} work end-to-end; I am working on that. See [my dev branch|https://github.com/julianhyde/calcite/tree/1935-match-recognize/]. I have cherry-picked commits from [Zhiqiang He’s branch|https://github.com/Zhiqiang-He/calcite/tree/calcite-1935-MR-Implementation3] into my branch, so this will be a joint effort when it is finished.

> Reference implementation for MATCH_RECOGNIZE
> --------------------------------------------
>
>                 Key: CALCITE-1935
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1935
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: match
>
> We now have comprehensive support for parsing and validating MATCH_RECOGNIZE
> queries (see CALCITE-1570 and sub-tasks), but we cannot execute them. I know
> the purpose of this work is to do CEP within Flink, but a reference
> implementation that works on non-streaming data would be valuable.
>
> I propose that we add a class EnumerableMatch that can generate Java code to
> evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be
> efficient. I don't mind if it (say) buffers all the data in memory and makes
> O(n ^ 3) passes over it. People can make it more efficient over time.
>
> When we have a reference implementation, people can start playing with this
> feature, and we can start building a corpus of data sets, queries, and their
> expected results. The Flink implementation will be able to test against those
> same queries, and should give the same results, even though Flink will be
> reading streaming data.
>
> Let's create {{match.iq}} with the following query, based on
> https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
> {code}
> !set outputformat mysql
> !use match
> SELECT *
> FROM sales_history MATCH_RECOGNIZE (
>          PARTITION BY product
>          ORDER BY tstamp
>          MEASURES  STRT.tstamp AS start_tstamp,
>                    LAST(UP.tstamp) AS peak_tstamp,
>                    LAST(DOWN.tstamp) AS end_tstamp,
>                    MATCH_NUMBER() AS mno
>          ONE ROW PER MATCH
>          AFTER MATCH SKIP TO LAST DOWN
>          PATTERN (STRT UP+ FLAT* DOWN+)
>          DEFINE
>            UP AS UP.units_sold > PREV(UP.units_sold),
>            FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
>            DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
>        ) MR
> ORDER BY MR.product, MR.start_tstamp;
>
> PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
> ---------- ----------- ----------- ----------- ----------
> TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
> TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
> TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
> TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4
>
> 4 rows selected.
>
> !ok
> {code}
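To make the semantics of {{PATTERN (STRT UP+ FLAT* DOWN+)}} with {{AFTER MATCH SKIP TO LAST DOWN}} concrete, here is a minimal sketch of the kind of matching a reference implementation has to do. It is *not* Calcite's {{Automaton}}, {{AutomatonBuilder}} or {{Matcher}} API: the class {{MatchSketch}}, the records {{Element}} and {{Match}}, and the methods {{findAll}} and {{strtUpFlatDown}} are hypothetical names invented for illustration. It also swaps in greedy backtracking instead of compiling the pattern to an automaton. Each pattern variable carries repeat bounds, so {{*}} is (0, ∞), {{+}} is (1, ∞) and \{m,n\} is (m, n); alternation ({{|}}) is not covered. It assumes, as in the query above, that each DEFINE predicate looks only at the current row and its physical predecessor ({{PREV}}), and that the rows of one partition are already sorted. The {{unitsSold}} values are made-up illustrative data, not the {{sales_history}} table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/**
 * Hypothetical sketch (not Calcite's Automaton/Matcher API): a greedy,
 * backtracking matcher for a MATCH_RECOGNIZE-style PATTERN such as
 * (STRT UP+ FLAT* DOWN+), with AFTER MATCH SKIP TO LAST DOWN.
 */
public class MatchSketch {
  /** A pattern variable, its DEFINE predicate over (rows, index), and repeat bounds. */
  record Element(String name, BiPredicate<int[], Integer> define, int min, int max) {}

  /** A match: first and last row (inclusive) plus the variable assigned to each row. */
  record Match(int first, int last, List<String> labels) {}

  static final int INF = Integer.MAX_VALUE;

  /** Matches pattern[e..] from row i; appends labels, returns end row (exclusive) or -1. */
  static int match(List<Element> pattern, int e, int[] rows, int i, List<String> labels) {
    if (e == pattern.size()) {
      return i;
    }
    Element el = pattern.get(e);
    // Predicates only look at the row and its physical predecessor, so the
    // longest run of rows satisfying this element can be counted up front.
    int n = 0;
    while (n < el.max() && i + n < rows.length && el.define().test(rows, i + n)) {
      n++;
    }
    // Greedy quantifier: try the longest run first, then back off towards min.
    for (int k = n; k >= el.min(); k--) {
      int mark = labels.size();
      for (int j = 0; j < k; j++) {
        labels.add(el.name());
      }
      int end = match(pattern, e + 1, rows, i + k, labels);
      if (end >= 0) {
        return end;
      }
      labels.subList(mark, labels.size()).clear(); // undo this attempt
    }
    return -1;
  }

  /** Finds all matches; after each one, resumes at its last row (SKIP TO LAST DOWN). */
  static List<Match> findAll(List<Element> pattern, int[] rows) {
    List<Match> matches = new ArrayList<>();
    int start = 0;
    while (start < rows.length) {
      List<String> labels = new ArrayList<>();
      int end = match(pattern, 0, rows, start, labels);
      if (end < 0) {
        start++; // no match starting here; try the next row
        continue;
      }
      matches.add(new Match(start, end - 1, labels));
      // DOWN is the final element, so the last DOWN row is the match's last row.
      // Always advance by at least one row so the loop terminates.
      start = Math.max(end - 1, start + 1);
    }
    return matches;
  }

  /** PATTERN (STRT UP+ FLAT* DOWN+) with the DEFINE clause from the query above. */
  static List<Element> strtUpFlatDown() {
    return List.of(
        new Element("STRT", (r, i) -> true, 1, 1), // no DEFINE: always true
        new Element("UP", (r, i) -> i > 0 && r[i] > r[i - 1], 1, INF),
        new Element("FLAT", (r, i) -> i > 0 && r[i] == r[i - 1], 0, INF),
        new Element("DOWN", (r, i) -> i > 0 && r[i] < r[i - 1], 1, INF));
  }

  public static void main(String[] args) {
    int[] unitsSold = {10, 12, 15, 15, 11, 9, 13, 8}; // illustrative data only
    List<Match> matches = findAll(strtUpFlatDown(), unitsSold);
    for (int m = 0; m < matches.size(); m++) {
      Match match = matches.get(m);
      // MATCH_NUMBER() is 1-based
      System.out.println((m + 1) + ": rows " + match.first() + ".." + match.last()
          + " " + match.labels());
    }
  }
}
```

Running {{main}} prints {{1: rows 0..5 [STRT, UP, UP, FLAT, DOWN, DOWN]}} and {{2: rows 5..7 [STRT, UP, DOWN]}}. The second match starts on the row where the first one ended, which is the same overlap visible in the Oracle output above (each START_TSTAMP equals the previous END_TSTAMP). Counting the run up front only works because these predicates never refer to earlier pattern variables. Predicates that use functions such as {{FIRST}} or running aggregates would have to be re-evaluated as labels are assigned, and that is part of why a real implementation compiles the pattern into an automaton.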