Julian Hyde created CALCITE-1935:
------------------------------------

             Summary: Reference implementation for MATCH_RECOGNIZE
                 Key: CALCITE-1935
                 URL: https://issues.apache.org/jira/browse/CALCITE-1935
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde
            Assignee: Julian Hyde


We now have comprehensive support for parsing and validating MATCH_RECOGNIZE 
queries (see CALCITE-1570 and its sub-tasks), but we cannot execute them. I know 
the main purpose of this work is to do complex event processing (CEP) within 
Flink, but a reference implementation that works on non-streaming data would 
also be valuable.

I propose that we add a class EnumerableMatch that can generate Java code to 
evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be 
efficient. I don't mind if it, say, buffers all of the data in memory and makes 
O(n^3) passes over it. People can make it more efficient over time.
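To make "it does not need to be efficient" concrete, here is a minimal sketch of 
the kind of naive evaluator a reference implementation could start from. It is 
not {{EnumerableMatch}} and uses no Calcite APIs; the class and method names, 
and the restriction to a single concatenation of variables quantified as one, 
{{+}} or {{*}}, whose DEFINE conditions look only at the current and previous 
rows, are assumptions made for illustration. It buffers one partition (already 
sorted by the ORDER BY key), tries a greedy, backtracking match at each row, 
emits one row per match, and applies AFTER MATCH SKIP TO LAST <var>, using the 
pattern (STRT UP+ FLAT* DOWN+) from the {{match.iq}} example.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, deliberately naive MATCH_RECOGNIZE evaluator over one fully
 * buffered partition. Illustration only; not Calcite's EnumerableMatch.
 */
public class NaiveMatchRecognize {
  /** Quantifier on a pattern variable: exactly one, "+", or "*". */
  enum Quantifier { ONE, PLUS, STAR }

  /** A DEFINE condition; it may look at earlier rows, as PREV() does. */
  @FunctionalInterface
  interface RowPredicate {
    boolean test(int[] rows, int i);
  }

  /** One element of a concatenation pattern such as (STRT UP+ FLAT* DOWN+). */
  record Element(String name, Quantifier quantifier, RowPredicate define) {}

  /** A match: its first row index plus the variable assigned to each row. */
  record Match(int start, List<String> labels) {
    /** Index of the last row mapped to {@code var}, as LAST(var.col) uses. */
    int last(String var) {
      int k = labels.lastIndexOf(var);
      return k < 0 ? -1 : start + k;
    }
  }

  /** Matches pattern[p..] from row pos; greedy, backtracking on failure. */
  static boolean tryMatch(int[] rows, List<Element> pattern, int p, int pos,
      List<String> labels) {
    if (p == pattern.size()) {
      return true; // every pattern element has been matched
    }
    Element e = pattern.get(p);
    // Longest run of consecutive rows from pos that satisfy e's DEFINE.
    int max = 0;
    while ((e.quantifier() != Quantifier.ONE || max < 1)
        && pos + max < rows.length
        && e.define().test(rows, pos + max)) {
      max++;
    }
    int min = e.quantifier() == Quantifier.STAR ? 0 : 1;
    // Greedy: try the longest run first, then give back one row at a time.
    for (int n = max; n >= min; n--) {
      int mark = labels.size();
      for (int k = 0; k < n; k++) {
        labels.add(e.name());
      }
      if (tryMatch(rows, pattern, p + 1, pos + n, labels)) {
        return true;
      }
      labels.subList(mark, labels.size()).clear(); // undo, retry shorter
    }
    return false;
  }

  /** ONE ROW PER MATCH with AFTER MATCH SKIP TO LAST skipVar. */
  static List<Match> findMatches(int[] rows, List<Element> pattern,
      String skipVar) {
    List<Match> matches = new ArrayList<>();
    int start = 0;
    while (start < rows.length) {
      List<String> labels = new ArrayList<>();
      if (!tryMatch(rows, pattern, 0, start, labels)) {
        start++; // no match begins at this row; try the next one
        continue;
      }
      Match m = new Match(start, List.copyOf(labels));
      matches.add(m);
      // Resume at the last row mapped to skipVar, but always move forward.
      start = Math.max(m.last(skipVar), start + 1);
    }
    return matches;
  }

  /** The example's PATTERN and DEFINE clauses over units_sold values. */
  static List<Match> salesMatches(int[] unitsSold) {
    // PREV() is null on the first row, so the comparison is not true there.
    List<Element> pattern = List.of(
        new Element("STRT", Quantifier.ONE, (r, i) -> true),
        new Element("UP", Quantifier.PLUS, (r, i) -> i > 0 && r[i] > r[i - 1]),
        new Element("FLAT", Quantifier.STAR, (r, i) -> i > 0 && r[i] == r[i - 1]),
        new Element("DOWN", Quantifier.PLUS, (r, i) -> i > 0 && r[i] < r[i - 1]));
    return findMatches(unitsSold, pattern, "DOWN");
  }

  public static void main(String[] args) {
    // Hypothetical units_sold for one product, already sorted by tstamp.
    int[] unitsSold = {10, 12, 15, 15, 11, 13, 9, 8};
    int mno = 0; // plays the role of MATCH_NUMBER()
    for (Match m : salesMatches(unitsSold)) {
      System.out.printf("start=%d peak=%d end=%d mno=%d%n",
          m.start(), m.last("UP"), m.last("DOWN"), ++mno);
    }
  }
}
{code}

Run as is, {{main}} prints {{start=0 peak=2 end=4 mno=1}} and 
{{start=4 peak=5 end=7 mno=2}} (indexes into the hypothetical array). The two 
matches share row 4, much as consecutive TWINKIES matches in the expected output 
share a date, because SKIP TO LAST DOWN restarts matching at the last DOWN row. 
DEFINE conditions that depend on other rows of the same match are not attempted.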

Once we have a reference implementation, people can start playing with this 
feature, and we can start building a corpus of data sets, queries, and their 
expected results. The Flink implementation will be able to test against those 
same queries and should give the same results, even though Flink will be 
reading streaming data.

Let's create {{match.iq}} with the following query based on 
https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
{code}
!set outputformat mysql
!use match

SELECT *
FROM sales_history MATCH_RECOGNIZE (
         PARTITION BY product
         ORDER BY tstamp
         MEASURES  STRT.tstamp AS start_tstamp,
                   LAST(UP.tstamp) AS peak_tstamp,
                   LAST(DOWN.tstamp) AS end_tstamp,
                   MATCH_NUMBER() AS mno
         ONE ROW PER MATCH
         AFTER MATCH SKIP TO LAST DOWN
         PATTERN (STRT UP+ FLAT* DOWN+)
         DEFINE
           UP AS UP.units_sold > PREV(UP.units_sold),
           FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
           DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
       ) MR
ORDER BY MR.product, MR.start_tstamp;

PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
---------- ----------- ----------- ----------- ----------
TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4

4 rows selected.

!ok
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
