Re: Row pattern recognition

Vik Fearing Mon, 26 Jun 2023 15:38:45 -0700

On 6/26/23 03:05, Tatsuo Ishii wrote:

I don't understand this.  RPR in a window specification limits the
window to the matched rows, so this looks like your rpr() function is
just the regular first_value() window function that we already have?


No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.

company  |   tdate    | price | rpr  | first_value
----------+------------+-------+------+-------------
  company1 | 2023-07-01 |   100 |      |         100
  company1 | 2023-07-02 |   200 |  200 |         200
  company1 | 2023-07-03 |   150 |  150 |         150
  company1 | 2023-07-04 |   140 |      |         140
  company1 | 2023-07-05 |   150 |  150 |         150
  company1 | 2023-07-06 |    90 |      |          90
  company1 | 2023-07-07 |   110 |      |         110
  company1 | 2023-07-08 |   130 |      |         130
  company1 | 2023-07-09 |   120 |      |         120
  company1 | 2023-07-10 |   130 |      |         130

For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.



Okay, I see the problem now, and why you need the rpr() function.

You are doing this as something that happens over a window frame, but itis actually something that *reduces* the window frame. The patternmatching needs to be done when the frame is calculated and not when anyparticular function is applied over it.


This query (with all the defaults made explicit):

SELECT s.company, s.tdate, s.price,
       FIRST_VALUE(s.tdate) OVER w,
       LAST_VALUE(s.tdate) OVER w,
       lowest OVER w
FROM stock AS s
WINDOW w AS (
  PARTITION BY s.company
  ORDER BY s.tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  EXCLUDE NO OTHERS
  MEASURES
    LAST(DOWN) AS lowest
  AFTER MATCH SKIP PAST LAST ROW
  INITIAL PATTERN (START DOWN+ UP+)
  DEFINE
    START AS TRUE,
    UP AS price > PREV(price),
    DOWN AS price < PREV(price)
);

Should produce this result:

 company  |   tdate    | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
 company1 | 07-01-2023 |   100 |             |            |
 company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
 company1 | 07-03-2023 |   150 |             |            |
 company1 | 07-04-2023 |   140 |             |            |
 company1 | 07-05-2023 |   150 |             |            |
 company1 | 07-06-2023 |    90 |             |            |
 company1 | 07-07-2023 |   110 |             |            |
 company1 | 07-08-2023 |   130 | 07-05-2023  | 07-05-2023 |    120
 company1 | 07-09-2023 |   120 |             |            |
 company1 | 07-10-2023 |   130 |             |            |
(10 rows)

Or if we switch to AFTER MATCH SKIP TO NEXT ROW, then we get:

 company  |   tdate    | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
 company1 | 07-01-2023 |   100 |             |            |
 company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
 company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |    140
 company1 | 07-04-2023 |   140 |             |            |
 company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 |     90
 company1 | 07-06-2023 |    90 |             |            |
 company1 | 07-07-2023 |   110 |             |            |
 company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |    120
 company1 | 07-09-2023 |   120 |             |            |
 company1 | 07-10-2023 |   130 |             |            |
(10 rows)

And then if we change INITIAL to SEEK:

 company  |   tdate    | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
 company1 | 07-01-2023 |   100 | 07-02-2023  | 07-05-2023 |    140
 company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
 company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |    140
 company1 | 07-04-2023 |   140 | 07-05-2023  | 07-08-2023 |     90
 company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 |     90
 company1 | 07-06-2023 |    90 | 07-08-2023  | 07-10-2023 |    120
 company1 | 07-07-2023 |   110 | 07-08-2023  | 07-10-2023 |    120
 company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |    120
 company1 | 07-09-2023 |   120 |             |            |
 company1 | 07-10-2023 |   130 |             |            |
(10 rows)

Since the pattern recognition is part of the frame, the windowaggregates should Just Work.

o SUBSET is not supported


Is this because you haven't done it yet, or because you ran into
problems trying to do it?


Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?

SUBSET elements can be used in DEFINE clauses, but I do not think thisis important compared to other features.

Comments and suggestions are welcome.


I have not looked at the patch yet, but is the reason for doing R020
before R010 because you haven't done the MEASURES clause yet?


One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
harder for me because modifying main SELECT clause could be a hard
work. Another reason is, I had no idea how to implement PREV/NEXT in
other than in WINDOW clause. Other people might feel differently
though.

I think we could do this with a single tuplesort if we use backtracking(which might be really slow for some patterns). I have not looked intoit in any detail.

We would need to be able to remove tuples from the end (even if onlylogically), and be able to update tuples inside the store. Both ofthose needs come from backtracking and possibly changing the classifier.

Without backtracking, I don't see how we could do it without have aseparate tuplestore for every current possible match.

In any case, I will be watching this with a close eye, and I am eager
to help in any way I can.


Thank you! I am looking forward to comments on my patch.  Also any
idea how to implement MEASURES clause is welcome.

I looked at your v2 patches a little bit and the only comment that Icurrently have on the code is you spelled PERMUTE as PREMUTE.Everything else is hopefully explained above.

--
Vik Fearing

Re: Row pattern recognition

Reply via email to