[jira] [Commented] (CALCITE-1935) Reference implementation for MATCH_RECOGNIZE

Julian Hyde (JIRA) Thu, 08 Aug 2019 17:26:38 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903435#comment-16903435
 ]

Julian Hyde commented on CALCITE-1935:
--------------------------------------

In [an email|https://lists.apache.org/thread.html/475eec9bc95fc05ddabe3721aa0fffd545f77bf4fc5f7a9ea79c690e@%3Cdev.calcite.apache.org%3E]
 [~julian.feinauer] wrote:
{quote}I have finally brought the joint work on MATCH_RECOGNIZE to a state 
where at least two non-trivial tests work; see [3].

 The work builds on a lot of preliminary work by Hongze and Julian (Hyde), 
done over a period of more than a year, so the code is rather large.

 I also decided not to squash this PR (yet), as most of the code is not mine 
but Hongze's and Julian's, and that attribution would be lost in a squash.

 As I had some issues during the implementation and found, I think, some bugs 
in (as yet) unused parts of the code, I would be very grateful for support with 
reviewing this PR and bringing the code base to a state where it is mergeable 
into master.

Most of the discussions can be found in [1] and [2].

The tests that work can be found in JdbcTest:
 * testSimpleMatch
 * testMatch

The query that works now is:

  select *
  from "hr"."emps" match_recognize (
    order by "empid" desc
    measures "commission" as c,
      "empid" as empid
    pattern (s up)
    define up as up."commission" < prev(up."commission"))

 which covers all basic ingredients of the MATCH_RECOGNIZE clause.

The PR can be found in [4].

This PR does NOT yet completely resolve CALCITE-1935 as the given match.iq file 
does not yet work but I think it is a good idea to resync it with mainline and 
fix some flaws in my code before moving on.

As you may know, I have made only a few contributions to the Calcite codebase 
so far, so please forgive me if some of my approaches in the code are rather 
unusual or badly designed.

If there are any questions regarding my implementation please feel free to 
discuss.
 I think this PR provides the first MWE for MATCH_RECOGNIZE and can be the 
basis for all the other (missing) features.

As this branch is pretty old and was worked on by several people, I do not know 
if all changes are reasonable, so it would be great if the original authors 
(Julian, Hongze) could look into these diffs.
 Short list of things reviewers should look into:
 * RelBuilder – I am not sure about those changes.
 * CircularArrayList – I think it is unused, and at least the tests had huge 
performance issues; it could probably be removed.
 * blank.iq – CoreQuidemTest fails and I have no idea why, as I see no changes 
nearby. It probably has to do with ExtensionSqlParser.
 * I'm unsure about my changes in RexImpTable and would like to get comments 
on them.
 * Match.java:197 – I had to introduce this (ugly?) hack to make the tests in 
JdbcTest work. Perhaps someone could help me with that and explain why the 
former line fails?
 * RexAction / RexPattern – I do not know what those files are for or whether 
they can be removed.

If there are any further questions please feel free to ask.
{quote}
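
As a reading aid for the query quoted above, here is a minimal Java sketch of what PATTERN (S UP) with that DEFINE clause asks for. It is not the code in the PR. The names SimpleMatchSketch and Emp are hypothetical, the sample data is made up, and the sketch assumes non-null commissions, ONE ROW PER MATCH with measures taken from the last row of each match, and the default AFTER MATCH SKIP PAST LAST ROW.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; not part of Calcite or of the PR. */
class SimpleMatchSketch {
  /** Hypothetical stand-in for a row of "hr"."emps" (assumed non-null values). */
  static final class Emp {
    final int empid;
    final int commission;

    Emp(int empid, int commission) {
      this.empid = empid;
      this.commission = commission;
    }
  }

  /**
   * Finds matches of PATTERN (S UP), where S is any row and UP is a row whose
   * commission is lower than that of the previous row. Returns the last row of
   * each match, from which the measures C and EMPID would be read.
   */
  static List<Emp> match(List<Emp> input) {
    // ORDER BY "empid" DESC
    List<Emp> rows = new ArrayList<>(input);
    rows.sort(Comparator.comparingInt((Emp e) -> e.empid).reversed());

    List<Emp> out = new ArrayList<>();
    int i = 0;
    while (i + 1 < rows.size()) {
      Emp s = rows.get(i);
      Emp up = rows.get(i + 1);
      if (up.commission < s.commission) {
        out.add(up); // one row per match (assumption: measures from last row)
        i += 2;      // assumption: skip past the last row of the match
      } else {
        i++;         // no match starting here; try the next row
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up data, deliberately not sorted.
    List<Emp> emps = Arrays.asList(
        new Emp(1, 50), new Emp(2, 90), new Emp(3, 80),
        new Emp(4, 100), new Emp(5, 100));
    for (Emp e : match(emps)) {
      System.out.println("C=" + e.commission + "; EMPID=" + e.empid);
    }
  }
}
```

With the made-up rows in main, the sketch reports two matches, ending at empid 3 and empid 1. The results of the actual tests in JdbcTest may differ, since they depend on the real "hr"."emps" data and on how the PR treats nulls and measures.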

> Reference implementation for MATCH_RECOGNIZE
> --------------------------------------------
>
>                 Key: CALCITE-1935
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1935
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Feinauer
>            Priority: Major
>              Labels: match
>
> We now have comprehensive support for parsing and validating MATCH_RECOGNIZE 
> queries (see CALCITE-1570 and sub-tasks) but we cannot execute them. I know 
> the purpose of this work is to do CEP within Flink, but a reference 
> implementation that works on non-streaming data would be valuable.
> I propose that we add a class EnumerableMatch that can generate Java code to 
> evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be 
> efficient. I don't mind if it (say) buffers all the data in memory and makes 
> O(n ^ 3) passes over it. People can make it more efficient over time.
> When we have a reference implementation, people can start playing with this 
> feature. And we can start building a corpus of data sets, queries, and their 
> expected result. The Flink implementation will be able to test against those 
> same queries, and should give the same results, even though Flink will be 
> reading streaming data.
> Let's create {{match.iq}} with the following query based on 
> https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
> {code}
> !set outputformat mysql
> !use match
> SELECT *
> FROM sales_history MATCH_RECOGNIZE (
>          PARTITION BY product
>          ORDER BY tstamp
>          MEASURES  STRT.tstamp AS start_tstamp,
>                    LAST(UP.tstamp) AS peak_tstamp,
>                    LAST(DOWN.tstamp) AS end_tstamp,
>                    MATCH_NUMBER() AS mno
>          ONE ROW PER MATCH
>          AFTER MATCH SKIP TO LAST DOWN
>          PATTERN (STRT UP+ FLAT* DOWN+)
>          DEFINE
>            UP AS UP.units_sold > PREV(UP.units_sold),
>            FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
>            DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
>        ) MR
> ORDER BY MR.product, MR.start_tstamp;
> PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
> ---------- ----------- ----------- ----------- ----------
> TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
> TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
> TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
> TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4
> 4 rows selected.
> !ok
> {code}
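
The "buffer all the data in memory" approach proposed in the description can be sketched for the match.iq pattern as follows. This is an illustration under stated assumptions, not the proposed EnumerableMatch: the class NaiveMatchSketch and its method are hypothetical, the input is assumed to be a single partition already ordered by tstamp, and only units_sold is modelled. Because the UP, FLAT and DOWN conditions are mutually exclusive for any given row, a greedy scan needs no backtracking for this particular pattern; a general implementation would.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the proposed EnumerableMatch. */
class NaiveMatchSketch {
  /**
   * Matches PATTERN (STRT UP+ FLAT* DOWN+) over one buffered, ordered
   * partition, with AFTER MATCH SKIP TO LAST DOWN.
   *
   * @param units units_sold per row, in tstamp order
   * @return one int[] {startIndex, lastUpIndex, lastDownIndex, matchNumber}
   *     per match, mirroring the four MEASURES of the query
   */
  static List<int[]> findMatches(int[] units) {
    List<int[]> matches = new ArrayList<>();
    int start = 0; // candidate STRT row; STRT has no condition
    while (start < units.length) {
      int j = start + 1;

      // UP+ : at least one row greater than its predecessor
      int upBegin = j;
      while (j < units.length && units[j] > units[j - 1]) {
        j++;
      }
      if (j == upBegin) {
        start++;
        continue;
      }
      int lastUp = j - 1;

      // FLAT* : zero or more rows equal to their predecessor
      while (j < units.length && units[j] == units[j - 1]) {
        j++;
      }

      // DOWN+ : at least one row less than its predecessor
      int downBegin = j;
      while (j < units.length && units[j] < units[j - 1]) {
        j++;
      }
      if (j == downBegin) {
        start++;
        continue;
      }
      int lastDown = j - 1;

      matches.add(new int[] {start, lastUp, lastDown, matches.size() + 1});
      // SKIP TO LAST DOWN: the next match may start at the last DOWN row.
      start = lastDown;
    }
    return matches;
  }

  public static void main(String[] args) {
    int[] units = {10, 12, 15, 15, 11, 9, 13, 13, 8, 8}; // made-up data
    for (int[] m : findMatches(units)) {
      System.out.println("start=" + m[0] + " lastUp=" + m[1]
          + " lastDown=" + m[2] + " mno=" + m[3]);
    }
  }
}
```

For the made-up series in main, the sketch finds two matches, rows 0 to 5 and rows 5 to 8; the second starts on the row where the first ended, which is the same overlap visible in the expected output above, where each END_TSTAMP equals the next START_TSTAMP.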

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
