Re: Row pattern recognition

2024-06-12 Thread Tatsuo Ishii
I gave a talk on RPR in PGConf.dev 2024.
https://www.pgevents.ca/events/pgconfdev2024/schedule/session/114-implementing-row-pattern-recognition/
(Slides are available from the link).

Vik Faring and Jacob Champion were one of the audiences and we had a
small discussion after the talk. We continued the discussion off list
on how to move forward the RPR implementation project. One of the
ideas is, to summarize what are in the patch and what are not from the
SQL standard specification's point of view. This should help us to
reach the consensus regarding "minimum viable" feature set if we want
to bring the patch in upcoming PostgreSQL v18.

Here is the first cut of the document. Comments/feedback are welcome.

-
This memo describes the current status of implementation of SQL/RPR
(Row Pattern Recognition), as of June 13, 2024 (the latest patch is v20).

- RPR in FROM clause and WINDOW clause

The SQL standard defines two features regarding SQL/RPR - R010 (RPR in
FROM clause) and R020 (RPR in WINDOW clause). Only R020 is
implemented. From now on, we discuss on R020.

- Overview of R020 syntax

WINDOW window_name AS (
[ PARTITION BY ... ]
[ ORDER BY... ]
[ MEASURES ... ]
ROWS BETWEEN CURRENT ROW AND ...
[ AFTER MATCH SKIP ... ]
[ INITIAL|SEEK ]
PATTERN (...)
[ SUBSET ... ]
DEFINE ...
)

-- PARTITION BY and ORDER BY are not specific to RPR and has been
  already there in current PostgreSQL.

-- What are (partially) implemented:

AFTER MATCH SKIP
INITIAL|SEEK
PATTERN
DEFINE

-- What are not implemented at all:
MEASURES
SUBSET

Followings are detailed status of the each clause.

- AFTER MATCH SKIP

-- Implemented:
AFTER MATCH SKIP TO NEXT ROW
AFTER MATCH SKIP PAST LAST ROW

-- Not implemented:
AFTER MATCH SKIP TO FIRST|LAST pattern_variable

- INITIAL|SEEK

--Implemented:
INITIAL

-- Not implemented:
SEEK

- DEFINE

-- Partially implemented row pattern navigation operations are PREV and
   NEXT. FIRST and LAST are not implemented.

-- The standard says PREV and NEXT accepts optional argument "offset"
   but it's not implemented.

-- The standard says the row pattern navigation operations can be
   nested but it's not implemented.

-- CLLASSIFIER, use of aggregate functions and subqueries in DEFINE
   clause are not implemented.

- PATTERN

-- Followings are implemented:
+: 1 or more rows
*: 0 or more rows

-- Followings are not implemented:
?: 0 or 1 row
A | B: OR condition
(A B): grouping
{n}: n rows
{n,}: n or more rows
{n,m}: greater or equal to n rows and less than or equal to m rows
{,m}: more than 0 and less than or equal to m rows
-

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2024-04-26 Thread Tatsuo Ishii
> On Tue, Apr 23, 2024 at 8:13 PM Tatsuo Ishii  wrote:
>> SELECT v.a, count(*) OVER w
>> FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
>> WINDOW w AS (
>>   ORDER BY v.a
>>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>   PATTERN (B+)
>>   DEFINE B AS a = 'B'
>> )
>>  a | count
>> ---+---
>>  A | 0
>>  B | 2
>>  B |
>>  C | 0
>> (4 rows)
>>
>> Here row 3 is skipped because the pattern B matches row 2 and 3. In
>> this case I think cont(*) should return 0 rathern than NULL for row 3.
> 
> I think returning zero would match Vik's explanation upthread [1],
> yes. Unfortunately I don't have a spec handy to double-check for
> myself right now.

Ok. I believe you and Vik are correct.
I am modifying the patch in this direction.
Attached is the regression diff after modifying the patch.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff -U3 
/usr/local/src/pgsql/current/postgresql/src/test/regress/expected/rpr.out 
/usr/local/src/pgsql/current/postgresql/src/test/regress/results/rpr.out
--- /usr/local/src/pgsql/current/postgresql/src/test/regress/expected/rpr.out   
2024-04-24 11:30:27.710523139 +0900
+++ /usr/local/src/pgsql/current/postgresql/src/test/regress/results/rpr.out
2024-04-26 14:39:03.543759205 +0900
@@ -181,8 +181,8 @@
  company1 | 07-01-2023 |   100 | 0
  company1 | 07-02-2023 |   200 | 0
  company1 | 07-03-2023 |   150 | 3
- company1 | 07-04-2023 |   140 |  
- company1 | 07-05-2023 |   150 |  
+ company1 | 07-04-2023 |   140 | 0
+ company1 | 07-05-2023 |   150 | 0
  company1 | 07-06-2023 |90 | 0
  company1 | 07-07-2023 |   110 | 0
  company1 | 07-08-2023 |   130 | 0
@@ -556,24 +556,24 @@
  company  |   tdate| price | first_value | last_value | count 
 --++---+-++---
  company1 | 07-01-2023 |   100 | 07-01-2023  | 07-03-2023 | 3
- company1 | 07-02-2023 |   200 | ||  
- company1 | 07-03-2023 |   150 | ||  
+ company1 | 07-02-2023 |   200 | || 0
+ company1 | 07-03-2023 |   150 | || 0
  company1 | 07-04-2023 |   140 | 07-04-2023  | 07-06-2023 | 3
- company1 | 07-05-2023 |   150 | ||  
- company1 | 07-06-2023 |90 | ||  
+ company1 | 07-05-2023 |   150 | || 0
+ company1 | 07-06-2023 |90 | || 0
  company1 | 07-07-2023 |   110 | 07-07-2023  | 07-09-2023 | 3
- company1 | 07-08-2023 |   130 | ||  
- company1 | 07-09-2023 |   120 | ||  
+ company1 | 07-08-2023 |   130 | || 0
+ company1 | 07-09-2023 |   120 | || 0
  company1 | 07-10-2023 |   130 | || 0
  company2 | 07-01-2023 |50 | 07-01-2023  | 07-03-2023 | 3
- company2 | 07-02-2023 |  2000 | ||  
- company2 | 07-03-2023 |  1500 | ||  
+ company2 | 07-02-2023 |  2000 | || 0
+ company2 | 07-03-2023 |  1500 | || 0
  company2 | 07-04-2023 |  1400 | 07-04-2023  | 07-06-2023 | 3
- company2 | 07-05-2023 |  1500 | ||  
- company2 | 07-06-2023 |60 | ||  
+ company2 | 07-05-2023 |  1500 | || 0
+ company2 | 07-06-2023 |60 | || 0
  company2 | 07-07-2023 |  1100 | 07-07-2023  | 07-09-2023 | 3
- company2 | 07-08-2023 |  1300 | ||  
- company2 | 07-09-2023 |  1200 | ||  
+ company2 | 07-08-2023 |  1300 | || 0
+ company2 | 07-09-2023 |  1200 | || 0
  company2 | 07-10-2023 |  1300 | || 0
 (20 rows)
 
@@ -604,24 +604,24 @@
  company  |   tdate| price | first_value | last_value | max  | min | sum  
|  avg  | count 
 
--++---+-++--+-+--+---+---
  company1 | 07-01-2023 |   100 | 100 |140 |  200 | 100 |  590 
|  147.5000 | 4
- company1 | 07-02-2023 |   200 | ||  | |  
|   |  
- company1 | 07-03-2023 |   150 | ||  | |  
|   |  
- company1 | 07-04-2023 |   140 | ||  | |  
|   |  
+ company1 | 07-02-2023 |   200 | ||  | |  
|   | 0
+ company1 | 07-03-2023 |   150 | ||  | |  
|   | 0
+ company1 | 

Re: Row pattern recognition

2024-04-24 Thread Jacob Champion
On Tue, Apr 23, 2024 at 8:13 PM Tatsuo Ishii  wrote:
> SELECT v.a, count(*) OVER w
> FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
> WINDOW w AS (
>   ORDER BY v.a
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (B+)
>   DEFINE B AS a = 'B'
> )
>  a | count
> ---+---
>  A | 0
>  B | 2
>  B |
>  C | 0
> (4 rows)
>
> Here row 3 is skipped because the pattern B matches row 2 and 3. In
> this case I think cont(*) should return 0 rathern than NULL for row 3.

I think returning zero would match Vik's explanation upthread [1],
yes. Unfortunately I don't have a spec handy to double-check for
myself right now.

--Jacob

[1] 
https://www.postgresql.org/message-id/c9ebc3d0-c3d1-e8eb-4a57-0ec099cbda17%40postgresfriends.org




Re: Row pattern recognition

2024-04-23 Thread Tatsuo Ishii
Hi Vik and Champion,

I think the current RPR patch is not quite correct in handling
count(*).

(using slightly modified version of Vik's example query)

SELECT v.a, count(*) OVER w
FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
WINDOW w AS (
  ORDER BY v.a
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  PATTERN (B+)
  DEFINE B AS a = 'B'
)
 a | count 
---+---
 A | 0
 B | 2
 B |  
 C | 0
(4 rows)

Here row 3 is skipped because the pattern B matches row 2 and 3. In
this case I think cont(*) should return 0 rathern than NULL for row 3.

What do you think?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2024-01-21 Thread Tatsuo Ishii
> Thank you very much for providing the patch for the RPR implementation.
> 
> After applying the v12-patches, I noticed an issue that
> the rpr related parts in window clauses were not displayed in the
> view definitions (the definition column of pg_views).
> 
> To address this, I have taken the liberty of adding an additional patch
> that modifies the relevant rewriter source code.
> I have attached the rewriter patch for your review and would greatly 
> appreciate your feedback.
> 
> Thank you for your time and consideration.

Thank you so much for spotting the issue and creating the patch. I
confirmed that your patch applies cleanly and solve the issue. I will
include the patches into upcoming v13 patches.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2024-01-21 Thread NINGWEI CHEN
On Sat, 09 Dec 2023 07:22:58 +0900 (JST)
Tatsuo Ishii  wrote:

> > On 04.12.23 12:40, Tatsuo Ishii wrote:
> >> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> >> index d631ac89a9..5a77fca17f 100644
> >> --- a/src/backend/parser/gram.y
> >> +++ b/src/backend/parser/gram.y
> >> @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>DefElem*defelt;
> >>SortBy *sortby;
> >>WindowDef  *windef;
> >> +  RPCommonSyntax  *rpcom;
> >> +  RPSubsetItem*rpsubset;
> >>JoinExpr   *jexpr;
> >>IndexElem  *ielem;
> >>StatsElem  *selem;
> >> @@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>MergeWhenClause *mergewhen;
> >>struct KeyActions *keyactions;
> >>struct KeyAction *keyaction;
> >> +  RPSkipToskipto;
> >>   }
> >> %typestmt toplevel_stmt schema_stmt routine_body_stmt
> > 
> > It is usually not the style to add an entry for every node type to the
> > %union.  Otherwise, we'd have hundreds of entries in there.
> 
> Ok, I have removed the node types and used existing node types.  Also
> I have moved RPR related %types to same place to make it easier to know
> what are added by RPR.
> 
> >> @@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>   %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
> >>   %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE
> >>   %ROLLUP
> >>SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
> >> +%nonassoc MEASURES AFTER INITIAL SEEK PATTERN_P
> >>   %left Op OPERATOR /* multi-character ops and user-defined operators */
> >>   %left'+' '-'
> >>   %left'*' '/' '%'
> > 
> > It was recently discussed that these %nonassoc should ideally all have
> > the same precedence.  Did you consider that here?
> 
> No, I didn't realize it. Thanks for pointing it out. I have removed
> %nonassoc so that MEASURES etc. have the same precedence as IDENT etc.
> 
> Attached is the new diff of gram.y against master branch.

Thank you very much for providing the patch for the RPR implementation.

After applying the v12-patches, I noticed an issue that
the rpr related parts in window clauses were not displayed in the
view definitions (the definition column of pg_views).

To address this, I have taken the liberty of adding an additional patch
that modifies the relevant rewriter source code.
I have attached the rewriter patch for your review and would greatly appreciate 
your feedback.

Thank you for your time and consideration.

-- 
SRA OSS LLC
Ningwei Chen 
TEL: 03-5979-2701 FAX: 03-5979-2702
diff --git a/src/backend/utils/adt/ruleutils.c 
b/src/backend/utils/adt/ruleutils.c
index 0b2a164057..baded88201 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -428,6 +428,10 @@ static void get_rule_groupingset(GroupingSet *gset, List 
*targetlist,
 bool 
omit_parens, deparse_context *context);
 static void get_rule_orderby(List *orderList, List *targetList,
 bool force_colno, 
deparse_context *context);
+static void get_rule_pattern(List *patternVariable, List *patternRegexp,
+bool force_colno, 
deparse_context *context);
+static void get_rule_define(List *defineClause, List *patternVariables,
+   bool force_colno, 
deparse_context *context);
 static void get_rule_windowclause(Query *query, deparse_context *context);
 static void get_rule_windowspec(WindowClause *wc, List *targetList,
deparse_context 
*context);
@@ -6459,6 +6463,64 @@ get_rule_orderby(List *orderList, List *targetList,
}
 }
 
+/*
+ * Display a PATTERN clause.
+ */
+static void
+get_rule_pattern(List *patternVariable, List *patternRegexp,
+bool force_colno, deparse_context *context)
+{
+   StringInfo  buf = context->buf;
+   const char *sep;
+   ListCell   *lc_var, *lc_reg = list_head(patternRegexp);
+
+   sep = "";
+   appendStringInfoChar(buf, '(');
+   foreach(lc_var, patternVariable)
+   {
+   char *variable = strVal((String *) lfirst(lc_var));
+   char *regexp = NULL;
+
+   if (lc_reg != NULL)
+   {
+   regexp = strVal((String *) lfirst(lc_reg));
+   lc_reg = lnext(patternRegexp, lc_reg);
+   }
+
+   appendStringInfo(buf, "%s%s", sep, variable);
+   if (regexp != NULL)
+   appendStringInfoString(buf, regexp);
+
+   sep = " ";
+   }
+   appendStringInfoChar(buf, ')');
+}
+

Re: Row pattern recognition

2023-12-08 Thread Tatsuo Ishii
> On 04.12.23 12:40, Tatsuo Ishii wrote:
>> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
>> index d631ac89a9..5a77fca17f 100644
>> --- a/src/backend/parser/gram.y
>> +++ b/src/backend/parser/gram.y
>> @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char
>> *relname, List *aliases, Node *query);
>>  DefElem*defelt;
>>  SortBy *sortby;
>>  WindowDef  *windef;
>> +RPCommonSyntax  *rpcom;
>> +RPSubsetItem*rpsubset;
>>  JoinExpr   *jexpr;
>>  IndexElem  *ielem;
>>  StatsElem  *selem;
>> @@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char
>> *relname, List *aliases, Node *query);
>>  MergeWhenClause *mergewhen;
>>  struct KeyActions *keyactions;
>>  struct KeyAction *keyaction;
>> +RPSkipToskipto;
>>   }
>> %type  stmt toplevel_stmt schema_stmt routine_body_stmt
> 
> It is usually not the style to add an entry for every node type to the
> %union.  Otherwise, we'd have hundreds of entries in there.

Ok, I have removed the node types and used existing node types.  Also
I have moved RPR related %types to same place to make it easier to know
what are added by RPR.

>> @@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char
>> *relname, List *aliases, Node *query);
>>   %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
>>   %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE
>>   %ROLLUP
>>  SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
>> +%nonassoc   MEASURES AFTER INITIAL SEEK PATTERN_P
>>   %left Op OPERATOR /* multi-character ops and user-defined operators */
>>   %left  '+' '-'
>>   %left  '*' '/' '%'
> 
> It was recently discussed that these %nonassoc should ideally all have
> the same precedence.  Did you consider that here?

No, I didn't realize it. Thanks for pointing it out. I have removed
%nonassoc so that MEASURES etc. have the same precedence as IDENT etc.

Attached is the new diff of gram.y against master branch.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..6c41aa2e9f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -659,6 +659,21 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
json_object_constructor_null_clause_opt
json_array_constructor_null_clause_opt
 
+%type  row_pattern_measure_item
+   row_pattern_definition
+%typeopt_row_pattern_common_syntax
+   opt_row_pattern_skip_to
+   row_pattern_subset_item
+   row_pattern_term
+%typeopt_row_pattern_measures
+   row_pattern_measure_list
+   row_pattern_definition_list
+   opt_row_pattern_subset_clause
+   row_pattern_subset_list
+   row_pattern_subset_rhs
+   row_pattern
+%type opt_row_pattern_initial_or_seek
+   first_or_last
 
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
@@ -702,7 +717,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-   DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH 
DESC
+   DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS 
DEPENDS DEPTH DESC
DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
DOUBLE_P DROP
 
@@ -718,7 +733,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
 
IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P 
INCLUDE
-   INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY 
INLINE_P
+   INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL 
INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -731,7 +746,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-   MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD
+   MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE METHOD
MINUTE_P MINVALUE MODE MONTH_P MOVE
 
NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NFC NFD NFKC NFKD NO NONE
@@ 

Re: Row pattern recognition

2023-12-06 Thread Peter Eisentraut

On 04.12.23 12:40, Tatsuo Ishii wrote:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..5a77fca17f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
DefElem*defelt;
SortBy *sortby;
WindowDef  *windef;
+   RPCommonSyntax  *rpcom;
+   RPSubsetItem*rpsubset;
JoinExpr   *jexpr;
IndexElem  *ielem;
StatsElem  *selem;
@@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
MergeWhenClause *mergewhen;
struct KeyActions *keyactions;
struct KeyAction *keyaction;
+   RPSkipToskipto;
  }
  
  %type 	stmt toplevel_stmt schema_stmt routine_body_stmt


It is usually not the style to add an entry for every node type to the 
%union.  Otherwise, we'd have hundreds of entries in there.



@@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
  %nonassoc UNBOUNDED   /* ideally would have same precedence 
as IDENT */
  %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE 
ROLLUP
SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
+%nonassoc  MEASURES AFTER INITIAL SEEK PATTERN_P
  %left Op OPERATOR /* multi-character ops and user-defined 
operators */
  %left '+' '-'
  %left '*' '/' '%'


It was recently discussed that these %nonassoc should ideally all have 
the same precedence.  Did you consider that here?






Re: Row pattern recognition

2023-10-30 Thread Jacob Champion
On Tue, Oct 24, 2023 at 7:49 PM Tatsuo Ishii  wrote:
> I am impressed the simple NFA implementation.

Thanks!

> It would be nicer if it
> could be implemented without using recursion.

Yeah. If for some reason we end up going with a bespoke
implementation, I assume we'd just convert the algorithm to an
iterative one and optimize it heavily. But I didn't want to do that
too early, since it'd probably make it harder to add new features...
and anyway my goal is still to try to reuse src/backend/regex
eventually.

> > SELECT aid, first_value(aid) OVER w,
> > count(*) OVER w
> > FROM pgbench_accounts
> > WINDOW w AS (
> > PARTITION BY bid
> > ORDER BY aid
> > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> > AFTER MATCH SKIP PAST LAST ROW
> > INITIAL
> > PATTERN (START UP+)
> > DEFINE
> > START AS TRUE,
> > UP AS aid > PREV(aid)
> > );
>
> I ran this against your patch. It failed around > 60k rows.

Nice, that's actually more frames than I expected. Looks like I have
similar results here with my second test query (segfault at ~58k
rows).

Thanks,
--Jacob




Re: Row pattern recognition

2023-10-24 Thread Tatsuo Ishii
> Great. I will look into this.

I am impressed the simple NFA implementation.  It would be nicer if it
could be implemented without using recursion.

> By the way, I tested my patch (v10) to handle more large data set and
> tried to following query with pgbench database. On my laptop it works
> with 100k rows pgbench_accounts table but with beyond the number I got
   ~~~ I meant 10k.

> OOM killer. I would like to enhance this in the next patch.
> 
> SELECT aid, first_value(aid) OVER w,
> count(*) OVER w
> FROM pgbench_accounts
> WINDOW w AS (
> PARTITION BY bid
> ORDER BY aid
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (START UP+)
> DEFINE
> START AS TRUE,
> UP AS aid > PREV(aid)
> );

I ran this against your patch. It failed around > 60k rows.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-10-24 Thread Tatsuo Ishii
> On Sat, Oct 21, 2023 at 7:39 PM Tatsuo Ishii  wrote:
>> Attached is the v10 patch. This version enhances the performance of
>> pattern matching.
> 
> Nice! I've attached a couple of more stressful tests (window
> partitions of 1000 rows each). Beware that the second one runs my
> desktop out of memory fairly quickly with the v10 implementation.
> 
> I was able to carve out some time this week to implement a very basic
> recursive NFA, which handles both the + and * qualifiers (attached).

Great. I will look into this.

> It's not production quality -- a frame on the call stack for every row
> isn't going to work

Yeah.

> -- but with only those two features, it's pretty
> tiny, and it's able to run the new stress tests with no issue. If I
> continue to have time, I hope to keep updating this parallel
> implementation as you add features to the StringSet implementation,
> and we can see how it evolves. I expect that alternation and grouping
> will ratchet up the complexity.

Sounds like a plan.

By the way, I tested my patch (v10) to handle more large data set and
tried to following query with pgbench database. On my laptop it works
with 100k rows pgbench_accounts table but with beyond the number I got
OOM killer. I would like to enhance this in the next patch.

SELECT aid, first_value(aid) OVER w,
count(*) OVER w
FROM pgbench_accounts
WINDOW w AS (
PARTITION BY bid
ORDER BY aid
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+)
DEFINE
START AS TRUE,
UP AS aid > PREV(aid)
);

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


Re: Row pattern recognition

2023-10-24 Thread Jacob Champion
On Sat, Oct 21, 2023 at 7:39 PM Tatsuo Ishii  wrote:
> Attached is the v10 patch. This version enhances the performance of
> pattern matching.

Nice! I've attached a couple of more stressful tests (window
partitions of 1000 rows each). Beware that the second one runs my
desktop out of memory fairly quickly with the v10 implementation.

I was able to carve out some time this week to implement a very basic
recursive NFA, which handles both the + and * qualifiers (attached).
It's not production quality -- a frame on the call stack for every row
isn't going to work -- but with only those two features, it's pretty
tiny, and it's able to run the new stress tests with no issue. If I
continue to have time, I hope to keep updating this parallel
implementation as you add features to the StringSet implementation,
and we can see how it evolves. I expect that alternation and grouping
will ratchet up the complexity.

Thanks!
--Jacob
From fb3cbb6f99f0fe7b05027759454d7e0013225929 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Fri, 20 Oct 2023 16:11:14 -0700
Subject: [PATCH 2/2] squash! Row pattern recognition patch (executor).

- Extract pattern matching logic into match_pattern().
- Replace the str_set/initials implementation with a recursive
  backtracker.
- Rework match_pattern in preparation for * quantifier: separate success
  from number of matched rows.
- Handle `*` quantifier
- Simplify the evaluate_pattern() signature.
- Remove defineInitial and related code.
---
 src/backend/executor/nodeWindowAgg.c| 871 +++-
 src/backend/optimizer/plan/createplan.c |   4 -
 src/backend/parser/parse_clause.c   |  31 -
 src/include/nodes/execnodes.h   |   1 -
 src/include/nodes/parsenodes.h  |   2 -
 src/include/nodes/plannodes.h   |   3 -
 6 files changed, 91 insertions(+), 821 deletions(-)

diff --git a/src/backend/executor/nodeWindowAgg.c 
b/src/backend/executor/nodeWindowAgg.c
index 2e1baef7ea..30abe60159 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -161,40 +161,6 @@ typedef struct WindowStatePerAggData
boolrestart;/* need to restart this agg in 
this cycle? */
 } WindowStatePerAggData;
 
-/*
- * Set of StringInfo. Used in RPR.
- */
-typedef struct StringSet {
-   StringInfo  *str_set;
-   Sizeset_size;   /* current array allocation size in 
number of items */
-   int set_index;  /* current used size */
-} StringSet;
-
-/*
- * Allowed subsequent PATTERN variables positions.
- * Used in RPR.
- *
- * pos represents the pattern variable defined order in DEFINE caluase.  For
- * example. "DEFINE START..., UP..., DOWN ..." and "PATTERN START UP DOWN UP"
- * will create:
- * VariablePos[0].pos[0] = 0;  START
- * VariablePos[1].pos[0] = 1;  UP
- * VariablePos[1].pos[1] = 3;  UP
- * VariablePos[2].pos[0] = 2;  DOWN
- *
- * Note that UP has two pos because UP appears in PATTERN twice.
- *
- * By using this strucrture, we can know which pattern variable can be followed
- * by which pattern variable(s). For example, START can be followed by UP and
- * DOWN since START's pos is 0, and UP's pos is 1 or 3, DOWN's pos is 2.
- * DOWN can be followed by UP since UP's pos is either 1 or 3. 
- * 
- */
-#define NUM_ALPHABETS  26  /* we allow [a-z] variable initials */
-typedef struct VariablePos {
-   int pos[NUM_ALPHABETS]; /* postion(s) in 
PATTERN */
-} VariablePos;
-
 static void initialize_windowaggregate(WindowAggState *winstate,
   
WindowStatePerFunc perfuncstate,
   
WindowStatePerAgg peraggstate);
@@ -249,27 +215,11 @@ static void register_reduced_frame_map(WindowAggState 
*winstate, int64 pos, int
 static void clear_reduced_frame_map(WindowAggState *winstate);
 static void update_reduced_frame(WindowObject winobj, int64 pos);
 
-static int64 evaluate_pattern(WindowObject winobj, int64 current_pos,
- char *vname, 
StringInfo encoded_str, bool *result);
+static bool evaluate_pattern(WindowObject winobj, int64 current_pos,
+char *vname);
 
 static bool get_slots(WindowObject winobj, int64 current_pos);
 
-//static int   search_str_set(char *pattern, StringSet *str_set, int 
*initial_orders);
-static int search_str_set(char *pattern, StringSet *str_set, VariablePos 
*variable_pos);
-static charpattern_initial(WindowAggState *winstate, char *vname);
-static int do_pattern_match(char *pattern, char *encoded_str);
-
-static StringSet *string_set_init(void);
-static void string_set_add(StringSet *string_set, StringInfo str);
-static StringInfo string_set_get(StringSet *string_set, int index);
-stat

Re: Row pattern recognition

2023-10-21 Thread Tatsuo Ishii
Attached is the v10 patch. This version enhances the performance of
pattern matching.  Previously it generated all possible pattern string
candidates. This resulted in unnecessarily large number of
candidates. For example if you have 2 pattern variables and the target
frame includes 100 rows, the number of candidates can reach to 2^100
in the worst case. To avoid this, I do a pruning in the v10
patch. Suppose you have:

PATTERN (A B+ C+)

Candidates like "BAC" "CAB" cannot survive because they never satisfy
the search pattern. To judge this, I assign sequence numbers (0, 1, 2)
to (A B C).  If the pattern generator tries to generate BA, this is
not allowed because the sequence number for B is 1 and for A is 0, and
0 < 1: B cannot be followed by A. Note that this technique can be
applied when the quantifiers are "+" or "*". Maybe other quantifiers
such as '?'  or '{n, m}' can be applied too but I don't confirm yet
because I have not implemented them yet.

Besides this improvement, I fixed a bug in the previous and older
patches: when an expression in DEFINE uses text operators, it errors
out:

ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

This was fixed by adding assign_expr_collations() in
transformDefineClause().

Also I have updated documentation "3.5. Window Functions"

- It still mentioned about rpr(). It's not applied anymore.
- Enhance the description about DEFINE and PATTERN.
- Mention that quantifier '*' is supported.

Finally I have added more test cases to the regression test.
- same pattern variable appears twice
- case for quantifier '*'

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From a8b41dd155a2d9ce969083fcfc53bbae3c28b263 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Sun, 22 Oct 2023 11:22:10 +0900
Subject: [PATCH v10 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 222 ++--
 src/include/nodes/parsenodes.h  |  56 
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 273 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c224df4ecc..e09eb061f8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	MergeWhenClause *mergewhen;
 	struct KeyActions *keyactions;
 	struct KeyAction *keyaction;
+	RPSkipTo	skipto;
 }
 
 %type 	stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -453,8 +456,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +558,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
+%type 	first_or_last
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +642,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +671,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +713,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP 

Re: Row pattern recognition

2023-10-04 Thread Tatsuo Ishii
> By the way, I was thinking about eliminating recusrive calls in
> pattern matching. Attached is the first cut of the implementation. In
> the attached v8 patch:
> 
> - No recursive calls anymore. search_str_set_recurse was removed.
> 
> - Instead it generates all possible pattern variable name initial
>   strings before pattern matching. Suppose we have "ab" (row 0) and
>   "ac" (row 1). "a" and "b" represents the pattern variable names
>   which are evaluated to true.  In this case it will generate "aa",
>   "ac", "ba" and "bc" and they are examined by do_pattern_match one by
>   one, which performs pattern matching.
> 
> - To implement this, an infrastructure string_set* are created. They
>   take care of set of StringInfo.
> 
> I found that the previous implementation did not search all possible
> cases. I believe the bug is fixed in v8.

The v8 patch does not apply anymore due to commit d060e921ea "Remove obsolete 
executor cleanup code".
So I rebased and created v9 patch. The differences from the v8 include:

- Fix bug with get_slots. It did not correctly detect the end of full frame.
- Add test case using "ROWS BETWEEN CURRENT ROW AND offset FOLLOWING".

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From d8fd143ab717fd62814e73fd24bf1a1694143dfc Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Wed, 4 Oct 2023 14:51:48 +0900
Subject: [PATCH v9 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 222 ++--
 src/include/nodes/parsenodes.h  |  56 
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 273 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e56cbe77cb..730c51bc87 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	MergeWhenClause *mergewhen;
 	struct KeyActions *keyactions;
 	struct KeyAction *keyaction;
+	RPSkipTo	skipto;
 }
 
 %type 	stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -453,8 +456,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +558,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
+%type 	first_or_last
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +642,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +671,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +713,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
@@ -718,7 +729,7 @@ static Node *make

Re: Row pattern recognition

2023-09-24 Thread Tatsuo Ishii
> On Fri, Sep 22, 2023, 3:13 AM Tatsuo Ishii  wrote:
> 
>> > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>> >>> Attached is the fix against v6 patch. I will include this in upcoming
>> >>> v7 patch.
>> >> Attached is the v7 patch. It includes the fix mentioned above.  Also
>> > (Champion's address bounced; removed)
>>
>> On my side his adress bounced too:-<
>>
> 
> Yep. I'm still here, just lurking for now. It'll take a little time for me
> to get back to this thread, as my schedule has changed significantly. :D

Hope you get back soon...

By the way, I was thinking about eliminating recusrive calls in
pattern matching. Attached is the first cut of the implementation. In
the attached v8 patch:

- No recursive calls anymore. search_str_set_recurse was removed.

- Instead it generates all possible pattern variable name initial
  strings before pattern matching. Suppose we have "ab" (row 0) and
  "ac" (row 1). "a" and "b" represents the pattern variable names
  which are evaluated to true.  In this case it will generate "aa",
  "ac", "ba" and "bc" and they are examined by do_pattern_match one by
  one, which performs pattern matching.

- To implement this, an infrastructure string_set* are created. They
  take care of set of StringInfo.

I found that the previous implementation did not search all possible
cases. I believe the bug is fixed in v8.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 281ee5c3eae85f96783422cb2f9987c3af8c8a68 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Mon, 25 Sep 2023 14:01:14 +0900
Subject: [PATCH v8 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 222 ++--
 src/include/nodes/parsenodes.h  |  56 
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 273 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d2032885e..74c2069050 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	MergeWhenClause *mergewhen;
 	struct KeyActions *keyactions;
 	struct KeyAction *keyaction;
+	RPSkipTo	skipto;
 }
 
 %type 	stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -453,8 +456,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +558,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
+%type 	first_or_last
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +642,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +671,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +713,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+

Re: Row pattern recognition

2023-09-22 Thread Jacob Champion
On Fri, Sep 22, 2023, 3:13 AM Tatsuo Ishii  wrote:

> > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
> >>> Attached is the fix against v6 patch. I will include this in upcoming
> >>> v7 patch.
> >> Attached is the v7 patch. It includes the fix mentioned above.  Also
> > (Champion's address bounced; removed)
>
> On my side his adress bounced too:-<
>

Yep. I'm still here, just lurking for now. It'll take a little time for me
to get back to this thread, as my schedule has changed significantly. :D

Thanks,
--Jacob

>


Re: Row pattern recognition

2023-09-22 Thread Erik Rijkers

Op 9/22/23 om 12:12 schreef Tatsuo Ishii:

Op 9/22/23 om 07:16 schreef Tatsuo Ishii:

Attached is the fix against v6 patch. I will include this in upcoming
v7 patch.

Attached is the v7 patch. It includes the fix mentioned above.  Also

(Champion's address bounced; removed)


On my side his adress bounced too:-<


Hi,

In my hands, make check fails on the rpr test; see attached .diff
file.
In these two statements:
-- using NEXT
-- using AFTER MATCH SKIP TO NEXT ROW
   result of first_value(price) and next_value(price) are empty.


Strange. I have checked out fresh master branch and applied the v7
patches, then ran make check. All tests including the rpr test
passed. This is Ubuntu 20.04.


The curious thing is that the server otherwise builds ok, and if I 
explicitly run on that server 'CREATE TEMP TABLE stock' + the 20 INSERTS 
 (just to make sure to have known data), those two statements now both 
return the correct result.


So maybe the testing/timing is wonky (not necessarily the server).

Erik



Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp





Re: Row pattern recognition

2023-09-22 Thread Tatsuo Ishii
> Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>>> Attached is the fix against v6 patch. I will include this in upcoming
>>> v7 patch.
>> Attached is the v7 patch. It includes the fix mentioned above.  Also
> (Champion's address bounced; removed)

On my side his adress bounced too:-<

> Hi,
> 
> In my hands, make check fails on the rpr test; see attached .diff
> file.
> In these two statements:
> -- using NEXT
> -- using AFTER MATCH SKIP TO NEXT ROW
>   result of first_value(price) and next_value(price) are empty.

Strange. I have checked out fresh master branch and applied the v7
patches, then ran make check. All tests including the rpr test
passed. This is Ubuntu 20.04.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-09-22 Thread Erik Rijkers

Op 9/22/23 om 07:16 schreef Tatsuo Ishii:

Attached is the fix against v6 patch. I will include this in upcoming v7 patch.


Attached is the v7 patch. It includes the fix mentioned above.  Also


Hi,

In my hands, make check fails on the rpr test; see attached .diff file.
In these two statements:
-- using NEXT
-- using AFTER MATCH SKIP TO NEXT ROW
  result of first_value(price) and next_value(price) are empty.


Erik Rijkers



this time the pattern matching engine is enhanced: previously it
recursed to row direction, which means if the number of rows in a
frame is large, it could exceed the limit of stack depth.  The new
version recurses over matched pattern variables in a row, which is at
most 26 which should be small enough.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jpdiff -U3 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/expected/rpr.out 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/results/rpr.out
--- 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/expected/rpr.out  
2023-09-22 09:04:17.770392635 +0200
+++ 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/results/rpr.out   
2023-09-22 09:38:23.826458109 +0200
@@ -253,23 +253,23 @@
 );
  company  |   tdate| price | first_value | last_value 
 --++---+-+
- company1 | 07-01-2023 |   100 | 100 |200
+ company1 | 07-01-2023 |   100 | |   
  company1 | 07-02-2023 |   200 | |   
  company1 | 07-03-2023 |   150 | |   
- company1 | 07-04-2023 |   140 | 140 |150
+ company1 | 07-04-2023 |   140 | |   
  company1 | 07-05-2023 |   150 | |   
  company1 | 07-06-2023 |90 | |   
- company1 | 07-07-2023 |   110 | 110 |130
+ company1 | 07-07-2023 |   110 | |   
  company1 | 07-08-2023 |   130 | |   
  company1 | 07-09-2023 |   120 | |   
  company1 | 07-10-2023 |   130 | |   
- company2 | 07-01-2023 |50 |  50 |   2000
+ company2 | 07-01-2023 |50 | |   
  company2 | 07-02-2023 |  2000 | |   
  company2 | 07-03-2023 |  1500 | |   
- company2 | 07-04-2023 |  1400 |1400 |   1500
+ company2 | 07-04-2023 |  1400 | |   
  company2 | 07-05-2023 |  1500 | |   
  company2 | 07-06-2023 |60 | |   
- company2 | 07-07-2023 |  1100 |1100 |   1300
+ company2 | 07-07-2023 |  1100 | |   
  company2 | 07-08-2023 |  1300 | |   
  company2 | 07-09-2023 |  1200 | |   
  company2 | 07-10-2023 |  1300 | |   
@@ -290,23 +290,23 @@
 );
  company  |   tdate| price | first_value | last_value 
 --++---+-+
- company1 | 07-01-2023 |   100 | 100 |200
+ company1 | 07-01-2023 |   100 | |   
  company1 | 07-02-2023 |   200 | |   
  company1 | 07-03-2023 |   150 | |   
- company1 | 07-04-2023 |   140 | 140 |150
+ company1 | 07-04-2023 |   140 | |   
  company1 | 07-05-2023 |   150 | |   
  company1 | 07-06-2023 |90 | |   
- company1 | 07-07-2023 |   110 | 110 |130
+ company1 | 07-07-2023 |   110 | |   
  company1 | 07-08-2023 |   130 | |   
  company1 | 07-09-2023 |   120 | |   
  company1 | 07-10-2023 |   130 | |   
- company2 | 07-01-2023 |50 |  50 |   2000
+ company2 | 07-01-2023 |50 | |   
  company2 | 07-02-2023 |  2000 | |   
  company2 | 07-03-2023 |  1500 | |   
- company2 | 07-04-2023 |  1400 |1400 |   1500
+ company2 | 07-04-2023 |  1400 | |   
  company2 | 07-05-2023 |  1500 | |   
  company2 | 07-06-2023 |60 | |   
- company2 | 07-07-2023 |  1100 |1100 |   1300
+ company2 | 07-07-2023 |  1100 | |   
  company2 | 07-08-2023 |  1300 | |   
  company2 | 07-09-2023 |  1200 | |   
  company2 | 07-10-2023 |  1300 | |   


Re: Row pattern recognition

2023-09-22 Thread Erik Rijkers

Op 9/22/23 om 10:23 schreef Erik Rijkers:

Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
Attached is the fix against v6 patch. I will include this in upcoming 
v7 patch.


Attached is the v7 patch. It includes the fix mentioned above.  Also

(Champion's address bounced; removed)



Sorry, I forgot to re-attach the regression.diffs with resend...

Erik


Hi,

In my hands, make check fails on the rpr test; see attached .diff file.
In these two statements:
-- using NEXT
-- using AFTER MATCH SKIP TO NEXT ROW
   result of first_value(price) and next_value(price) are empty.

Erik Rijkers



this time the pattern matching engine is enhanced: previously it
recursed to row direction, which means if the number of rows in a
frame is large, it could exceed the limit of stack depth.  The new
version recurses over matched pattern variables in a row, which is at
most 26 which should be small enough.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


diff -U3 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/expected/rpr.out 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/results/rpr.out
--- 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/expected/rpr.out  
2023-09-22 09:04:17.770392635 +0200
+++ 
/home/aardvark/pg_stuff/pg_sandbox/pgsql.rpr/src/test/regress/results/rpr.out   
2023-09-22 09:38:23.826458109 +0200
@@ -253,23 +253,23 @@
 );
  company  |   tdate| price | first_value | last_value 
 --++---+-+
- company1 | 07-01-2023 |   100 | 100 |200
+ company1 | 07-01-2023 |   100 | |   
  company1 | 07-02-2023 |   200 | |   
  company1 | 07-03-2023 |   150 | |   
- company1 | 07-04-2023 |   140 | 140 |150
+ company1 | 07-04-2023 |   140 | |   
  company1 | 07-05-2023 |   150 | |   
  company1 | 07-06-2023 |90 | |   
- company1 | 07-07-2023 |   110 | 110 |130
+ company1 | 07-07-2023 |   110 | |   
  company1 | 07-08-2023 |   130 | |   
  company1 | 07-09-2023 |   120 | |   
  company1 | 07-10-2023 |   130 | |   
- company2 | 07-01-2023 |50 |  50 |   2000
+ company2 | 07-01-2023 |50 | |   
  company2 | 07-02-2023 |  2000 | |   
  company2 | 07-03-2023 |  1500 | |   
- company2 | 07-04-2023 |  1400 |1400 |   1500
+ company2 | 07-04-2023 |  1400 | |   
  company2 | 07-05-2023 |  1500 | |   
  company2 | 07-06-2023 |60 | |   
- company2 | 07-07-2023 |  1100 |1100 |   1300
+ company2 | 07-07-2023 |  1100 | |   
  company2 | 07-08-2023 |  1300 | |   
  company2 | 07-09-2023 |  1200 | |   
  company2 | 07-10-2023 |  1300 | |   
@@ -290,23 +290,23 @@
 );
  company  |   tdate| price | first_value | last_value 
 --++---+-+
- company1 | 07-01-2023 |   100 | 100 |200
+ company1 | 07-01-2023 |   100 | |   
  company1 | 07-02-2023 |   200 | |   
  company1 | 07-03-2023 |   150 | |   
- company1 | 07-04-2023 |   140 | 140 |150
+ company1 | 07-04-2023 |   140 | |   
  company1 | 07-05-2023 |   150 | |   
  company1 | 07-06-2023 |90 | |   
- company1 | 07-07-2023 |   110 | 110 |130
+ company1 | 07-07-2023 |   110 | |   
  company1 | 07-08-2023 |   130 | |   
  company1 | 07-09-2023 |   120 | |   
  company1 | 07-10-2023 |   130 | |   
- company2 | 07-01-2023 |50 |  50 |   2000
+ company2 | 07-01-2023 |50 | |   
  company2 | 07-02-2023 |  2000 | |   
  company2 | 07-03-2023 |  1500 | |   
- company2 | 07-04-2023 |  1400 |1400 |   1500
+ company2 | 07-04-2023 |  1400 | |   
  company2 | 07-05-2023 |  1500 | |   
  company2 | 07-06-2023 |60 | |   
- company2 | 07-07-2023 |  1100 |1100 |   1300
+ company2 | 07-07-2023 |  1100 | |   
  company2 | 07-08-2023 |  1300 | |   
  company2 | 07-09-2023 |  1200 | |   
  company2 | 07-10-2023 |  1300 | |   


Re: Row pattern recognition

2023-09-22 Thread Erik Rijkers

Op 9/22/23 om 07:16 schreef Tatsuo Ishii:

Attached is the fix against v6 patch. I will include this in upcoming v7 patch.


Attached is the v7 patch. It includes the fix mentioned above.  Also

(Champion's address bounced; removed)

Hi,

In my hands, make check fails on the rpr test; see attached .diff file.
In these two statements:
-- using NEXT
-- using AFTER MATCH SKIP TO NEXT ROW
  result of first_value(price) and next_value(price) are empty.

Erik Rijkers



this time the pattern matching engine is enhanced: previously it
recursed to row direction, which means if the number of rows in a
frame is large, it could exceed the limit of stack depth.  The new
version recurses over matched pattern variables in a row, which is at
most 26 which should be small enough.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp





Re: Row pattern recognition

2023-09-21 Thread Tatsuo Ishii
> Attached is the fix against v6 patch. I will include this in upcoming v7 
> patch.

Attached is the v7 patch. It includes the fix mentioned above.  Also
this time the pattern matching engine is enhanced: previously it
recursed to row direction, which means if the number of rows in a
frame is large, it could exceed the limit of stack depth.  The new
version recurses over matched pattern variables in a row, which is at
most 26 which should be small enough.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 4d5be4c27ae5e4a50924520574e2c1ca3e398551 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Fri, 22 Sep 2023 13:53:35 +0900
Subject: [PATCH v7 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 222 ++--
 src/include/nodes/parsenodes.h  |  56 
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 273 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d2032885e..74c2069050 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	MergeWhenClause *mergewhen;
 	struct KeyActions *keyactions;
 	struct KeyAction *keyaction;
+	RPSkipTo	skipto;
 }
 
 %type 	stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -453,8 +456,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +558,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
+%type 	first_or_last
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +642,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +671,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +713,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
@@ -718,7 +729,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
 	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
-	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -731,7 +742,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
 	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD
+	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE METHOD
 	MINUTE_P MINVALUE MODE MONTH_P MOVE
 
 	NA

Re: Row pattern recognition

2023-09-13 Thread Tatsuo Ishii
> On 9/13/23 07:14, Tatsuo Ishii wrote:
>>> 
>> I was looking for this but I only found ISO/IEC 19075-5:2021.
>> https://www.iso.org/standard/78936.html
>> Maybe 19075-5:2021 is the latest one?
> 
> Yes, probably.  Sorry.

No problem. Thanks for confirmation.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-09-13 Thread Vik Fearing

On 9/13/23 07:14, Tatsuo Ishii wrote:




I was looking for this but I only found ISO/IEC 19075-5:2021.
https://www.iso.org/standard/78936.html

Maybe 19075-5:2021 is the latest one?


Yes, probably.  Sorry.
--
Vik Fearing





Re: Row pattern recognition

2023-09-12 Thread Tatsuo Ishii
> 

I was looking for this but I only found ISO/IEC 19075-5:2021.
https://www.iso.org/standard/78936.html

Maybe 19075-5:2021 is the latest one?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-09-12 Thread Jacob Champion
On Mon, Sep 11, 2023 at 11:18 PM Tatsuo Ishii  wrote:
> What I am not sure about is, you and Vik mentioned that the
> traditional NFA is superior that POSIX NFA in terms of performance.
> But how "lexicographic ordering" is related to performance?

I think they're only tangentially related. POSIX NFAs have to fully
backtrack even after the first match is found, so that's where the
performance difference comes in. (We would be introducing new ways to
catastrophically backtrack if we used that approach.) But since you
don't visit every possible path through the graph with a traditional
NFA, it makes sense to define an order in which you visit the nodes,
so that you can reason about which string is actually going to be
matched in the end.

> BTW, attched is the v6 patch. The differences from v5 include:
>
> - Now aggregates can be used with RPR. Below is an example from the
>   regression test cases, which is added by v6 patch.

Great, thank you!

--Jacob




Re: Row pattern recognition

2023-09-12 Thread Tatsuo Ishii
Regarding v6 patch:

> SELECT company, tdate, price,
>  first_value(price) OVER w,
>  last_value(price) OVER w,
>  max(price) OVER w,
>  min(price) OVER w,
>  sum(price) OVER w,
>  avg(price) OVER w,
>  count(price) OVER w
> FROM stock
> WINDOW w AS (
> PARTITION BY company
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (START UP+ DOWN+)
> DEFINE
> START AS TRUE,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
>  company  |   tdate| price | first_value | last_value | max  | min | sum  
> |  avg  | count 
> --++---+-++--+-+--+---+---
>  company1 | 07-01-2023 |   100 | 100 |140 |  200 | 100 |  590 
> |  147.5000 | 4
>  company1 | 07-02-2023 |   200 | ||  | |  
> |   |  
>  company1 | 07-03-2023 |   150 | ||  | |  
> |   |  
>  company1 | 07-04-2023 |   140 | ||  | |  
> |   |  
>  company1 | 07-05-2023 |   150 | ||  | |  
> |   |  
>  company1 | 07-06-2023 |90 |  90 |120 |  130 |  90 |  450 
> |  112.5000 | 4
>  company1 | 07-07-2023 |   110 | ||  | |  
> |   |  
>  company1 | 07-08-2023 |   130 | ||  | |  
> |   |  
>  company1 | 07-09-2023 |   120 | ||  | |  
> |   |  
>  company1 | 07-10-2023 |   130 | ||  | |  
> |   |  
>  company2 | 07-01-2023 |50 |  50 |   1400 | 2000 |  50 | 4950 
> | 1237.5000 | 4
>  company2 | 07-02-2023 |  2000 | ||  | |  
> |   |  
>  company2 | 07-03-2023 |  1500 | ||  | |  
> |   |  
>  company2 | 07-04-2023 |  1400 | ||  | |  
> |   |  
>  company2 | 07-05-2023 |  1500 | ||  | |  
> |   |  
>  company2 | 07-06-2023 |60 |  60 |   1200 | 1300 |  60 | 3660 
> |  915. | 4
>  company2 | 07-07-2023 |  1100 | ||  | |  
> |   |  
>  company2 | 07-08-2023 |  1300 | ||  | |  
> |   |  
>  company2 | 07-09-2023 |  1200 | ||  | |  
> |   |  
>  company2 | 07-10-2023 |  1300 | ||  | |  
> |   |  
> (20 rows)

count column for unmatched rows should have been 0, rather than
NULL. i.e.

 company  |   tdate| price | first_value | last_value | max  | min | sum  | 
 avg  | count 
--++---+-++--+-+--+---+---
 company1 | 07-01-2023 |   100 | 100 |140 |  200 | 100 |  590 | 
 147.5000 | 4
 company1 | 07-02-2023 |   200 | ||  | |  | 
  |  
 company1 | 07-03-2023 |   150 | ||  | |  | 
  |  
 company1 | 07-04-2023 |   140 | ||  | |  | 
  |  
 company1 | 07-05-2023 |   150 | ||  | |  | 
  | 0
 company1 | 07-06-2023 |90 |  90 |120 |  130 |  90 |  450 | 
 112.5000 | 4
 company1 | 07-07-2023 |   110 | ||  | |  | 
  |  
 company1 | 07-08-2023 |   130 | ||  | |  | 
  |  
 company1 | 07-09-2023 |   120 | ||  | |  | 
  |  
 company1 | 07-10-2023 |   130 | ||  | |  | 
  | 0
 company2 | 07-01-2023 |50 |  50 |   1400 | 2000 |  50 | 4950 | 
1237.5000 | 4
 company2 | 07-02-2023 |  2000 | ||  | |  | 
  |  
 company2 | 07-03-2023 |  1500 | ||  | |  | 
  |  
 company2 | 07-04-2023 |  1400 | ||  | |  | 
  |  
 company2 | 07-05-2023 |  1500 | ||  | |  | 
  | 0
 

Re: Row pattern recognition

2023-09-12 Thread Tatsuo Ishii
  |  
 company2 | 07-09-2023 |  1200 | ||  | |  | 
  |  
 company2 | 07-10-2023 |  1300 | ||  | |  | 
  |  
(20 rows)

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From c5699cdf64df9b102c5d9e3b6f6002e504c93fc6 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Tue, 12 Sep 2023 14:22:22 +0900
Subject: [PATCH v6 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 216 +---
 src/include/nodes/parsenodes.h  |  56 +
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 267 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d2032885e..70409cdc9a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +669,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +711,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
@@ -718,7 +727,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
 	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
-	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -731,7 +740,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
 	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD
+	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE METHOD
 	MINUTE_P MINVALUE MODE MONTH_P MOVE
 
 	NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NFC NFD NFKC NFKD NO NONE
@@ -743,8 +752,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	ORDER ORDINALITY OTHERS OUT_P OUTER_P
 	OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER
 
-	PARALLEL PARAMETER PARSER PARTIAL PARTITION PASSING PASSWORD
-	PLACING PLANS POLICY
+	PARALLEL PARAMETER PARSER PARTIAL PARTITION PASSING PASSWORD PAST
+	PATTERN_P PERMUTE PLACING PLANS POLICY
 	POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY
 	PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROCEDURES PR

Re: Row pattern recognition

2023-09-11 Thread Jacob Champion
On Sat, Sep 9, 2023 at 4:21 AM Tatsuo Ishii  wrote:
> Then we will get for str_set:
> r0: B
> r1: AB
>
> Because r0 only has classifier B, r1 can have A and B.  Problem is,
> r2. If we choose A at r1, then r2 = B. But if we choose B at t1, then
> r2 = AB. I guess this is the issue you pointed out.

Right.

> Yeah, probably we have delay evaluation of such pattern variables like
> A, then reevaluate A after the first scan.
>
> What about leaving this (reevaluation) for now? Because:
>
> 1) we don't have CLASSIFIER
> 2) we don't allow to give CLASSIFIER to PREV as its arggument
>
> so I think we don't need to worry about this for now.

Sure. I'm all for deferring features to make it easier to iterate; I
just want to make sure the architecture doesn't hit a dead end. Or at
least, not without being aware of it.

Also: is CLASSIFIER the only way to run into this issue?

> What if we don't follow the standard, instead we follow POSIX EREs?  I
> think this is better for users unless RPR's REs has significant merit
> for users.

Piggybacking off of what Vik wrote upthread, I think we would not be
doing ourselves any favors by introducing a non-compliant
implementation that performs worse than a traditional NFA. Those would
be some awful bug reports.

> > - I think we have to implement a parallel parser regardless (RPR PATTERN
> > syntax looks incompatible with POSIX)
>
> I am not sure if we need to worry about this because of the reason I
> mentioned above.

Even if we adopted POSIX NFA semantics, we'd still have to implement
our own parser for the PATTERN part of the query. I don't think
there's a good way for us to reuse the parser in src/backend/regex.

> > Does that seem like a workable approach? (Worst-case, my code is just
> > horrible, and we throw it in the trash.)
>
> Yes, it seems workable. I think for the first cut of RPR needs at
> least the +quantifier with reasonable performance. The current naive
> implementation seems to have issue because of exhaustive search.

+1

Thanks!
--Jacob




Re: Row pattern recognition

2023-09-09 Thread Vik Fearing

On 9/9/23 13:21, Tatsuo Ishii wrote:

Thanks for the explanation.  Surprising concet of the standard:-)




This leaves the choice between traditional NFA and Posix NFA. The 
difference between these is that a traditional NFA exits (declares a 
match) as soon as it finds the first possible match, whereas a Posix NFA 
is obliged to find all possible matches and then choose the “leftmost 
longest”. There are examples that show that, even for conventional 
regular expression matching on text strings and without back references, 
there are patterns for which a Posix NFA is orders of magnitude slower 
than a traditional NFA. In addition, reluctant quantifiers cannot be 
defined in a Posix NFA, because of the leftmost longest rule.


Therefore it was decided not to use the Posix NFA model, which leaves 
the traditional NFA as the model for row pattern matching. Among 
available tools that use traditional NFA engines, Perl is the most 
influential; therefore Perl was adopted as the design target for pattern 
matching rules.





Is it different from SIMILAR TO REs too?


Of course it is. :-)  SIMILAR TO uses its own language and rules.


What if we don't follow the standard, instead we follow POSIX EREs?  I
think this is better for users unless RPR's REs has significant merit
for users.


This would get big pushback from me.
--
Vik Fearing





Re: Row pattern recognition

2023-09-09 Thread Tatsuo Ishii
Hi,

>> But:
>> 
>> UP AS price > PREV(price)
>> 
>> also depends on previous row, no?
> 
> PREV(CLASSIFIER()) depends not on the value of the previous row but the
> state of the match so far. To take an example from the patch:
> 
>> * Example:
>> * str_set[0] = "AB";
>> * str_set[1] = "AC";
>> * In this case at row 0 A and B are true, and A and C are true in row 1.
> 
> With these str_sets and my example DEFINE, row[1] is only classifiable
> as 'A' if row[0] is *not* classified as 'A' at this point in the match.
> "AA" is not a valid candidate string, even if it matches the PATTERN.

Ok, Let me clarify my understanding. Suppose we have:

PATTER (A B)
DEFINE A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
B AS price > 100

and the target table has price column values:

row[0]: 110
row[1]: 110
row[2]: 110
row[3]: 110

Then we will get for str_set:
r0: B
r1: AB

Because r0 only has classifier B, r1 can have A and B.  Problem is,
r2. If we choose A at r1, then r2 = B. But if we choose B at t1, then
r2 = AB. I guess this is the issue you pointed out.

> So if we don't reevaluate the pattern variable condition for the row, we
> at least have to prune the combinations that search_str_set() visits, so
> that we don't generate a logically impossible combination. That seems
> like it could be pretty fragile, and it may be difficult for us to prove
> compliance.

Yeah, probably we have delay evaluation of such pattern variables like
A, then reevaluate A after the first scan.

What about leaving this (reevaluation) for now? Because:

1) we don't have CLASSIFIER
2) we don't allow to give CLASSIFIER to PREV as its arggument

so I think we don't need to worry about this for now.

>> Can you explain a little bit more? I think 'aaa' matches a regular
>> expression 'a+(b|a)+' and should be no problem before "aab" is
>> considered.
> 
> Assuming I've understood the rules correctly, we're not allowed to
> classify the last row as 'A' if it also matches 'B'. Lexicographic
> ordering takes precedence, so we have to try "aab" first. Otherwise our
> query could return different results compared to another implementation.
> 
>>> Similarly, the assumption that we want to match the longest string only
>>> works because we don't allow alternation yet.
>> 
>> Can you please clarify more on this?
> 
> Sure: for the pattern
> 
> PATTERN ( (A|B)+ )
> 
> we have to consider the candidate "a" over the candidate "ba", even
> though the latter is longer. Like the prior example, lexicographic
> ordering is considered more important than the greedy quantifier.
> Quoting ISO/IEC 9075-2:2016:
> 
> More precisely, with both reluctant and greedy quantifiers, the set
> of matches is ordered lexicographically, but when one match is an
> initial substring of another match, reluctant quantifiers prefer the
> shorter match (the substring), whereas greedy quantifiers prefer the
> longer match (the “superstring”).
> 
> Here, "ba" doesn't have "a" as a prefix, so "ba" doesn't get priority.
> ISO/IEC 19075-5:2021 has a big section on this (7.2) with worked examples.
> 
> (The "lexicographic order matters more than greediness" concept was the
> most mind-bending part for me so far, probably because I haven't figured
> out how to translate the concept into POSIX EREs. It wouldn't make sense
> to say "the letter 't' can match 'a', 'B', or '3' in this regex", but
> that's what RPR is doing.)

Thanks for the explanation.  Surprising concet of the standard:-) Is
it different from SIMILAR TO REs too?

What if we don't follow the standard, instead we follow POSIX EREs?  I
think this is better for users unless RPR's REs has significant merit
for users.

>> Do you mean you want to provide a better patch for the pattern
>> matching part? That will be helpfull.
> 
> No guarantees that I'll find a better patch :D But yes, I will give it a
> try.

Ok.

>> Because I am currently working
>> on the aggregation part and have no time to do it. However, the
>> aggregation work affects the v5 patch: it needs a refactoring. So can
>> you wait until I release v6 patch? I hope it will be released in two
>> weeks or so.
> 
> Absolutely!

Thanks.

> Heh, I think it would be pretty foolish for me to code an NFA, from
> scratch, and then try to convince the community to maintain it.
> 
> But:
> - I think we have to implement a parallel parser regardless (RPR PATTERN
> syntax looks incompatible with POSIX)

I am not sure if we need to worry about this because of the reason I
mentioned above.

> - I suspect we need more control over the backtracking than the current
> pg_reg* API is going to give us, or else I'm worried performance is
> going to fall off a cliff with usefully-large partitions

Agreed.

> - there's a lot of stuff in POSIX EREs that we don't need, and of the
> features we do need, the + quantifier is probably one of the easiest
> - it seems easier to prove the correctness of a slow, naive,
> row-at-a-time engine, because we can compare it directly 

Re: Row pattern recognition

2023-09-08 Thread Vik Fearing

On 9/8/23 21:27, Jacob Champion wrote:

On 9/7/23 20:54, Tatsuo Ishii wrote:



But it's easy
to come up with a pattern where that's the wrong order, like

 PATTERN ( A+ (B|A)+ )

Now "aaa" will be considered before "aab", which isn't correct.


Can you explain a little bit more? I think 'aaa' matches a regular
expression 'a+(b|a)+' and should be no problem before "aab" is
considered.


Assuming I've understood the rules correctly, we're not allowed to
classify the last row as 'A' if it also matches 'B'. Lexicographic
ordering takes precedence, so we have to try "aab" first. Otherwise our
query could return different results compared to another implementation.



Your understanding is correct.
--
Vik Fearing





Re: Row pattern recognition

2023-09-08 Thread Jacob Champion
On 9/7/23 20:54, Tatsuo Ishii wrote:
>> DEFINE
>>   A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
>>   ...
> 
> But:
> 
> UP AS price > PREV(price)
> 
> also depends on previous row, no?

PREV(CLASSIFIER()) depends not on the value of the previous row but the
state of the match so far. To take an example from the patch:

> * Example:
> * str_set[0] = "AB";
> * str_set[1] = "AC";
> * In this case at row 0 A and B are true, and A and C are true in row 1.

With these str_sets and my example DEFINE, row[1] is only classifiable
as 'A' if row[0] is *not* classified as 'A' at this point in the match.
"AA" is not a valid candidate string, even if it matches the PATTERN.

So if we don't reevaluate the pattern variable condition for the row, we
at least have to prune the combinations that search_str_set() visits, so
that we don't generate a logically impossible combination. That seems
like it could be pretty fragile, and it may be difficult for us to prove
compliance.

>> But it's easy
>> to come up with a pattern where that's the wrong order, like
>>
>> PATTERN ( A+ (B|A)+ )
>>
>> Now "aaa" will be considered before "aab", which isn't correct.
> 
> Can you explain a little bit more? I think 'aaa' matches a regular
> expression 'a+(b|a)+' and should be no problem before "aab" is
> considered.

Assuming I've understood the rules correctly, we're not allowed to
classify the last row as 'A' if it also matches 'B'. Lexicographic
ordering takes precedence, so we have to try "aab" first. Otherwise our
query could return different results compared to another implementation.

>> Similarly, the assumption that we want to match the longest string only
>> works because we don't allow alternation yet.
> 
> Can you please clarify more on this?

Sure: for the pattern

PATTERN ( (A|B)+ )

we have to consider the candidate "a" over the candidate "ba", even
though the latter is longer. Like the prior example, lexicographic
ordering is considered more important than the greedy quantifier.
Quoting ISO/IEC 9075-2:2016:

More precisely, with both reluctant and greedy quantifiers, the set
of matches is ordered lexicographically, but when one match is an
initial substring of another match, reluctant quantifiers prefer the
shorter match (the substring), whereas greedy quantifiers prefer the
longer match (the “superstring”).

Here, "ba" doesn't have "a" as a prefix, so "ba" doesn't get priority.
ISO/IEC 19075-5:2021 has a big section on this (7.2) with worked examples.

(The "lexicographic order matters more than greediness" concept was the
most mind-bending part for me so far, probably because I haven't figured
out how to translate the concept into POSIX EREs. It wouldn't make sense
to say "the letter 't' can match 'a', 'B', or '3' in this regex", but
that's what RPR is doing.)

>> Cool, I will give this piece some more thought. Do you mind if I try to
>> add some more complicated pattern quantifiers to stress the
>> architecture, or would you prefer to tackle that later? Just alternation
>> by itself will open up a world of corner cases.
> 
> Do you mean you want to provide a better patch for the pattern
> matching part? That will be helpfull.

No guarantees that I'll find a better patch :D But yes, I will give it a
try.

> Because I am currently working
> on the aggregation part and have no time to do it. However, the
> aggregation work affects the v5 patch: it needs a refactoring. So can
> you wait until I release v6 patch? I hope it will be released in two
> weeks or so.

Absolutely!

>> But I'm wondering if we might want to just implement the NFA directly?
>> The current implementation's Cartesian explosion can probably be pruned
>> aggressively, but replaying the entire regex match once for every
>> backtracked step will still duplicate a lot of work.
> 
> Not sure if you mean implementing new regular expression engine
> besides src/backend/regexp. I am afraid it's not a trivial work. The
> current regexp code consists of over 10k lines. What do you think?

Heh, I think it would be pretty foolish for me to code an NFA, from
scratch, and then try to convince the community to maintain it.

But:
- I think we have to implement a parallel parser regardless (RPR PATTERN
syntax looks incompatible with POSIX)
- I suspect we need more control over the backtracking than the current
pg_reg* API is going to give us, or else I'm worried performance is
going to fall off a cliff with usefully-large partitions
- there's a lot of stuff in POSIX EREs that we don't need, and of the
features we do need, the + quantifier is probably one of the easiest
- it seems easier to prove the correctness of a slow, naive,
row-at-a-time engine, because we can compare it directly to the spec

So what I'm thinking is: if I start by open-coding the + quantifier, and
slowly add more pieces in, then it might be easier to see the parts of
src/backend/regex that I've duplicated. We can try to expose those parts
directly from the 

Re: Row pattern recognition

2023-09-07 Thread Tatsuo Ishii
Hi,

> Hello!
> 
>> (1) I completely changed the pattern matching engine so that it
>> performs backtracking. Now the engine evaluates all pattern elements
>> defined in PATTER against each row, saving matched pattern variables
>> in a string per row. For example if the pattern element A and B
>> evaluated to true, a string "AB" is created for current row.
> 
> If I understand correctly, this strategy assumes that one row's
> membership in a pattern variable is independent of the other rows'
> membership. But I don't think that's necessarily true:
> 
> DEFINE
>   A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
>   ...

But:

UP AS price > PREV(price)

also depends on previous row, no? Can you please elaborate how your
example could break current implementation? I cannot test it because
CLASSIFIER is not implemented yet.

>> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
>> in nodeWindowAggs.c for more details. For now I use a naive depth
>> first search and probably there is a lot of rooms for optimization
>> (for example rewriting it without using
>> recursion).
> 
> The depth-first match is doing a lot of subtle work here. For example, with
> 
> PATTERN ( A+ B+ )
> DEFINE A AS TRUE,
>B AS TRUE
> 
> (i.e. all rows match both variables), and three rows in the partition,
> our candidates will be tried in the order
> 
> aaa
> aab
> aba
> abb
> ...
> bbb
> 
> The only possible matches against our regex `^a+b+` are "aab" and "abb",
> and that order matches the preferment order, so it's fine. But it's easy
> to come up with a pattern where that's the wrong order, like
> 
> PATTERN ( A+ (B|A)+ )
> 
> Now "aaa" will be considered before "aab", which isn't correct.

Can you explain a little bit more? I think 'aaa' matches a regular
expression 'a+(b|a)+' and should be no problem before "aab" is
considered.

> Similarly, the assumption that we want to match the longest string only
> works because we don't allow alternation yet.

Can you please clarify more on this?

> Cool, I will give this piece some more thought. Do you mind if I try to
> add some more complicated pattern quantifiers to stress the
> architecture, or would you prefer to tackle that later? Just alternation
> by itself will open up a world of corner cases.

Do you mean you want to provide a better patch for the pattern
matching part? That will be helpfull. Because I am currently working
on the aggregation part and have no time to do it. However, the
aggregation work affects the v5 patch: it needs a refactoring. So can
you wait until I release v6 patch? I hope it will be released in two
weeks or so.

> On 8/9/23 01:41, Tatsuo Ishii wrote:
>> - I found a bug with pattern matching code. It creates a string for
>>   subsequent regular expression matching. It uses the initial letter
>>   of each define variable name. For example, if the varname is "foo",
>>   then "f" is used. Obviously this makes trouble if we have two or
>>   more variables starting with same "f" (e.g. "food"). To fix this, I
>>   assign [a-z] to each variable instead of its initial letter. However
>>   this way limits us not to have more than 26 variables. I hope 26 is
>>   enough for most use cases.
> 
> There are still plenty of alphanumerics left that could be assigned...
> 
> But I'm wondering if we might want to just implement the NFA directly?
> The current implementation's Cartesian explosion can probably be pruned
> aggressively, but replaying the entire regex match once for every
> backtracked step will still duplicate a lot of work.

Not sure if you mean implementing new regular expression engine
besides src/backend/regexp. I am afraid it's not a trivial work. The
current regexp code consists of over 10k lines. What do you think?

> I've attached another test case; it looks like last_value() is depending
> on some sort of side effect from either first_value() or nth_value(). I
> know the window frame itself is still under construction, so apologies
> if this is an expected failure.

Thanks. Fortunately current code which I am working passes the new
test. I will include it in the next (v6) patch.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-09-07 Thread Jacob Champion
Hello!

> (1) I completely changed the pattern matching engine so that it
> performs backtracking. Now the engine evaluates all pattern elements
> defined in PATTER against each row, saving matched pattern variables
> in a string per row. For example if the pattern element A and B
> evaluated to true, a string "AB" is created for current row.

If I understand correctly, this strategy assumes that one row's
membership in a pattern variable is independent of the other rows'
membership. But I don't think that's necessarily true:

DEFINE
  A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
  ...

> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
> in nodeWindowAggs.c for more details. For now I use a naive depth
> first search and probably there is a lot of rooms for optimization
> (for example rewriting it without using
> recursion).

The depth-first match is doing a lot of subtle work here. For example, with

PATTERN ( A+ B+ )
DEFINE A AS TRUE,
   B AS TRUE

(i.e. all rows match both variables), and three rows in the partition,
our candidates will be tried in the order

aaa
aab
aba
abb
...
bbb

The only possible matches against our regex `^a+b+` are "aab" and "abb",
and that order matches the preferment order, so it's fine. But it's easy
to come up with a pattern where that's the wrong order, like

PATTERN ( A+ (B|A)+ )

Now "aaa" will be considered before "aab", which isn't correct.

Similarly, the assumption that we want to match the longest string only
works because we don't allow alternation yet.

> Suggestions/patches are welcome.

Cool, I will give this piece some more thought. Do you mind if I try to
add some more complicated pattern quantifiers to stress the
architecture, or would you prefer to tackle that later? Just alternation
by itself will open up a world of corner cases.

> With the new engine, cases above do not fail anymore. See new
> regression test cases. Thanks for providing valuable test cases!

You're very welcome!

On 8/9/23 01:41, Tatsuo Ishii wrote:
> - I found a bug with pattern matching code. It creates a string for
>   subsequent regular expression matching. It uses the initial letter
>   of each define variable name. For example, if the varname is "foo",
>   then "f" is used. Obviously this makes trouble if we have two or
>   more variables starting with same "f" (e.g. "food"). To fix this, I
>   assign [a-z] to each variable instead of its initial letter. However
>   this way limits us not to have more than 26 variables. I hope 26 is
>   enough for most use cases.

There are still plenty of alphanumerics left that could be assigned...

But I'm wondering if we might want to just implement the NFA directly?
The current implementation's Cartesian explosion can probably be pruned
aggressively, but replaying the entire regex match once for every
backtracked step will still duplicate a lot of work.

--

I've attached another test case; it looks like last_value() is depending
on some sort of side effect from either first_value() or nth_value(). I
know the window frame itself is still under construction, so apologies
if this is an expected failure.

Thanks!
--Jacobdiff --git a/src/test/regress/expected/rpr.out 
b/src/test/regress/expected/rpr.out
index 340ccf242c..bd35e8389e 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -89,6 +89,44 @@ SELECT company, tdate, price, first_value(price) OVER w, 
last_value(price) OVER
  company2 | 07-10-2023 |  1300 | || 
 (20 rows)
 
+-- last_value() should remain consistent
+SELECT company, tdate, price, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(price),
+  DOWN AS price < PREV(price)
+);
+ company  |   tdate| price | last_value 
+--++---+
+ company1 | 07-01-2023 |   100 |140
+ company1 | 07-02-2023 |   200 |   
+ company1 | 07-03-2023 |   150 |   
+ company1 | 07-04-2023 |   140 |   
+ company1 | 07-05-2023 |   150 |   
+ company1 | 07-06-2023 |90 |120
+ company1 | 07-07-2023 |   110 |   
+ company1 | 07-08-2023 |   130 |   
+ company1 | 07-09-2023 |   120 |   
+ company1 | 07-10-2023 |   130 |   
+ company2 | 07-01-2023 |50 |   1400
+ company2 | 07-02-2023 |  2000 |   
+ company2 | 07-03-2023 |  1500 |   
+ company2 | 07-04-2023 |  1400 |   
+ company2 | 07-05-2023 |  1500 |   
+ company2 | 07-06-2023 |60 |   1200
+ company2 | 07-07-2023 |  1100 |   
+ company2 | 07-08-2023 |  1300 |   
+ company2 | 07-09-2023 |  1200 |   
+ company2 | 07-10-2023 |  1300 |   
+(20 rows)
+
 -- omit "START" in DEFINE but it is ok because "START 

Re: Row pattern recognition

2023-09-02 Thread Tatsuo Ishii
> Hi,
> 
> The patches compile & tests run fine but this statement from the
> documentation crashes an assert-enabled server:
> 
> SELECT company, tdate, price, max(price) OVER w FROM stock
> WINDOW w AS (
> PARTITION BY company
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (LOWPRICE UP+ DOWN+)
> DEFINE
> LOWPRICE AS price <= 100,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
> server closed the connection unexpectedly
>   This probably means the server terminated abnormally
>   before or while processing the request.
> connection to server was lost

Thank you for the report. Currently the patch has an issue with
aggregate functions including max. I have been working on aggregations
in row pattern recognition but will take more time to complete the
part.

In the mean time if you want to play with RPR, you can try window
functions. Examples can be found in src/test/regress/sql/rpr.sql.
Here is one of this:

-- the first row start with less than or equal to 100
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER 
w
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price <= 100,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

-- second row raises 120%
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER 
w
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price <= 100,
  UP AS price > PREV(price) * 1.2,
  DOWN AS price < PREV(price)
);

Sorry for inconvenience.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-09-02 Thread Erik Rijkers

Op 9/2/23 om 08:52 schreef Tatsuo Ishii:


Attached is the v5 patch. Differences from previous patch include:



Hi,

The patches compile & tests run fine but this statement from the 
documentation crashes an assert-enabled server:


SELECT company, tdate, price, max(price) OVER w FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (LOWPRICE UP+ DOWN+)
DEFINE
LOWPRICE AS price <= 100,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost


Log file:

TRAP: failed Assert("aggregatedupto_nonrestarted <= 
winstate->aggregatedupto"), File: "nodeWindowAgg.c", Line: 1054, PID: 68975
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(ExceptionalCondition+0x54)[0x9b0824]

postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x71ae8d]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(standard_ExecutorRun+0x13a)[0x6def9a]

/home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/pg_stat_statements.so(+0x55e5)[0x7ff3798b95e5]
/home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/auto_explain.so(+0x2680)[0x7ff3798ab680]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x88a4ff]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PortalRun+0x240)[0x88bb50]

postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x887cca]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PostgresMain+0x14dc)[0x88958c]

postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x7fb0da]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PostmasterMain+0xd2d)[0x7fc01d]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(main+0x1e0)[0x5286d0]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff378e9dd0a]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(_start+0x2a)[0x5289aa]
2023-09-02 19:59:05.329 CEST 46723 LOG:  server process (PID 68975) was 
terminated by signal 6: Aborted
2023-09-02 19:59:05.329 CEST 46723 DETAIL:  Failed process was running: 
SELECT company, tdate, price, max(price) OVER w FROM stock

WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (LOWPRICE UP+ DOWN+)
DEFINE
LOWPRICE AS price <= 100,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
2023-09-02 19:59:05.329 CEST 46723 LOG:  terminating any other active 
server processes




Erik Rijkers




Re: Row pattern recognition

2023-09-02 Thread Tatsuo Ishii
Attached is the v5 patch. Differences from previous patch include:

* Resolve complaint from "PostgreSQL Patch Tester"
  https://commitfest.postgresql.org/44/4460/

- Change gram.y to use PATTERN_P instead of PATTERN. Using PATTERN seems
  to make trouble with Visual Studio build.

:
:
[10:07:57.853] FAILED: 
src/backend/parser/parser.a.p/meson-generated_.._gram.c.obj 
[10:07:57.853] "cl" "-Isrc\backend\parser\parser.a.p" "-Isrc\backend\parser" 
"-I..\src\backend\parser" "-Isrc\include" "-I..\src\include" 
"-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" 
"-I..\src\include\port\win32_msvc" "/MDd" "/nologo" "/showIncludes" "/utf-8" 
"/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" 
"/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" 
"/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "-DBUILDING_DLL" 
"/Fdsrc\backend\parser\parser.a.p\meson-generated_.._gram.c.pdb" 
/Fosrc/backend/parser/parser.a.p/meson-generated_.._gram.c.obj "/c" 
src/backend/parser/gram.c
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2365: 
'PATTERN': redefinition; previous definition was 'typedef'
[10:07:57.860] C:\Program Files (x86)\Windows 
Kits\10\include\10.0.20348.0\um\wingdi.h(1375): note: see declaration of 
'PATTERN'
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2086: 
'yytokentype PATTERN': redefinition
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): note: see 
declaration of 'PATTERN'
[10:07:57.860] ninja: build stopped: subcommand failed.

* Resolve complaint from "make headerscheck"

- Change Windowapi.h and nodeWindowAgg.c to remove unecessary extern
  and public functions.
  
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 3e02bccbd3dc02304d6dc34f5ab6cc6dd2ee26d1 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Sat, 2 Sep 2023 15:32:49 +0900
Subject: [PATCH v5 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 216 +---
 src/include/nodes/parsenodes.h  |  56 +
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 267 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d2032885e..70409cdc9a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +669,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +711,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRE

Re: Row pattern recognition

2023-08-09 Thread Tatsuo Ishii
Attached is the v4 patch. Differences from previous patch include:

> - PERMUTE is still misspelled as PREMUTE

Fixed.

> - PATTERN variables do not have to exist in the DEFINE clause.  They are
> - considered TRUE if not present.

Fixed. Moreover new regression test case is added.

- It was possible that tle nodes in DEFINE clause do not appear in the
  plan's target list. This makes impossible to evaluate expressions in
  the DEFINE because it does not appear in the outer plan's target
  list. To fix this, call findTargetlistEntrySQL99 (with resjunk is
  true) so that the missing TargetEntry is added to the outer plan
  later on.

- I eliminated some hacks in handling the Var node in DEFINE
  clause. Previously I replaced varattno of Var node in a plan tree by
  hand so that it refers to correct varattno in the outer plan
  node. In this patch I modified set_upper_references so that it calls
  fix_upper_expr for those Var nodes in the DEFINE clause. See v4-0003
  patch for more details.

- I found a bug with pattern matching code. It creates a string for
  subsequent regular expression matching. It uses the initial letter
  of each define variable name. For example, if the varname is "foo",
  then "f" is used. Obviously this makes trouble if we have two or
  more variables starting with same "f" (e.g. "food"). To fix this, I
  assign [a-z] to each variable instead of its initial letter. However
  this way limits us not to have more than 26 variables. I hope 26 is
  enough for most use cases.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 5c6430e171ab9f9a75df92fac25949cb96fe1da2 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Wed, 9 Aug 2023 16:56:17 +0900
Subject: [PATCH v4 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 216 +---
 src/include/nodes/parsenodes.h  |  56 +
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 267 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 15ece871a0..62c1919538 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +669,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +711,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
@@ -718,7 +727,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 

Re: Row pattern recognition

2023-07-28 Thread Tatsuo Ishii
>>> - PATTERN variables do not have to exist in the DEFINE clause.  They are
>>> - considered TRUE if not present.
>> Do you think we really need this? I found a criticism regarding this.
>> https://link.springer.com/article/10.1007/s13222-022-00404-3
>> "3.2 Explicit Definition of All Row Pattern Variables"
>> What do you think?
> 
> I think that a large part of obeying the standard is to allow queries
> from other engines to run the same on ours.  The standard does not
> require the pattern variables to be defined and so there are certainly
> queries out there without them, and that hurts migrating to
> PostgreSQL.

Yeah, migration is good point. I agree we should have the feature.

>>> When we get to adding count in the MEASURES clause, there will be a
>>> difference between no match and empty match, but that does not apply
>>> here.
>> Can you elaborate more? I understand that "no match" and "empty match"
>> are different things. But I do not understand how the difference
>> affects the result of count.
> 
> This query:
> 
> SELECT v.a, wcnt OVER w, count(*) OVER w
> FROM (VALUES ('A')) AS v (a)
> WINDOW w AS (
>   ORDER BY v.a
>   MEASURES count(*) AS wcnt
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (B)
>   DEFINE B AS B.a = 'B'
> )
> 
> produces this result:
> 
>  a | wcnt | count
> ---+--+---
>  A |  | 0
> (1 row)
> 
> Inside the window specification, *no match* was found and so all of
> the MEASURES are null.  The count(*) in the target list however, still
> exists and operates over zero rows.
> 
> This very similar query:
> 
> SELECT v.a, wcnt OVER w, count(*) OVER w
> FROM (VALUES ('A')) AS v (a)
> WINDOW w AS (
>   ORDER BY v.a
>   MEASURES count(*) AS wcnt
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (B?)
>   DEFINE B AS B.a = 'B'
> )
> 
> produces this result:
> 
>  a | wcnt | count
> ---+--+---
>  A |0 | 0
> (1 row)
> 
> In this case, the pattern is B? instead of just B, which produces an
> *empty match* for the MEASURES to be applied over.

Thank you for the detailed explanation. I think I understand now.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-07-28 Thread Vik Fearing

On 7/28/23 13:02, Tatsuo Ishii wrote:

Attached is the v3 patch. In this patch following changes are made.


- PATTERN variables do not have to exist in the DEFINE clause.  They are
- considered TRUE if not present.


Do you think we really need this? I found a criticism regarding this.

https://link.springer.com/article/10.1007/s13222-022-00404-3
"3.2 Explicit Definition of All Row Pattern Variables"

What do you think?


I think that a large part of obeying the standard is to allow queries 
from other engines to run the same on ours.  The standard does not 
require the pattern variables to be defined and so there are certainly 
queries out there without them, and that hurts migrating to PostgreSQL.



- I am working on making window aggregates RPR aware now. The
implementation is in progress and far from completeness. An example
is below. I think row 2, 3, 4 of "count" column should be NULL
instead of 3, 2, 0, 0. Same thing can be said to other
rows. Probably this is an effect of moving aggregate but I still
studying the window aggregation code.


This tells me again that RPR is not being run in the right place.  All
windowed aggregates and frame-level window functions should Just Work
with no modification.


I am not touching each aggregate function. I am modifying
eval_windowaggregates() in nodeWindowAgg.c, which calls each aggregate
function. Do you think it's not the right place to make window
aggregates RPR aware?


Oh, okay.


SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM
stock
   WINDOW w AS (
   PARTITION BY company
   ORDER BY tdate
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   AFTER MATCH SKIP PAST LAST ROW
   INITIAL
   PATTERN (START UP+ DOWN+)
   DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
   company  |   tdate| first_value | count
--++-+---
   company1 | 2023-07-01 | 100 | 4
   company1 | 2023-07-02 | | 3
   company1 | 2023-07-03 | | 2
   company1 | 2023-07-04 | | 0
   company1 | 2023-07-05 | | 0
   company1 | 2023-07-06 |  90 | 4
   company1 | 2023-07-07 | | 3
   company1 | 2023-07-08 | | 2
   company1 | 2023-07-09 | | 0
   company1 | 2023-07-10 | | 0
   company2 | 2023-07-01 |  50 | 4
   company2 | 2023-07-02 | | 3
   company2 | 2023-07-03 | | 2
   company2 | 2023-07-04 | | 0
   company2 | 2023-07-05 | | 0
   company2 | 2023-07-06 |  60 | 4
   company2 | 2023-07-07 | | 3
   company2 | 2023-07-08 | | 2
   company2 | 2023-07-09 | | 0
   company2 | 2023-07-10 | | 0


In this scenario, row 1's frame is the first 5 rows and specified SKIP
PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are
skipped) and the result of the outer count should be 0 for all of them
because there are no rows in the frame.


Ok. Just I want to make sure. If it's other aggregates like sum or
avg, the result of the outer aggregates should be NULL.


They all behave the same way as in a normal query when they receive no 
rows as input.



When we get to adding count in the MEASURES clause, there will be a
difference between no match and empty match, but that does not apply
here.


Can you elaborate more? I understand that "no match" and "empty match"
are different things. But I do not understand how the difference
affects the result of count.


This query:

SELECT v.a, wcnt OVER w, count(*) OVER w
FROM (VALUES ('A')) AS v (a)
WINDOW w AS (
  ORDER BY v.a
  MEASURES count(*) AS wcnt
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  PATTERN (B)
  DEFINE B AS B.a = 'B'
)

produces this result:

 a | wcnt | count
---+--+---
 A |  | 0
(1 row)

Inside the window specification, *no match* was found and so all of the 
MEASURES are null.  The count(*) in the target list however, still 
exists and operates over zero rows.


This very similar query:

SELECT v.a, wcnt OVER w, count(*) OVER w
FROM (VALUES ('A')) AS v (a)
WINDOW w AS (
  ORDER BY v.a
  MEASURES count(*) AS wcnt
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  PATTERN (B?)
  DEFINE B AS B.a = 'B'
)

produces this result:

 a | wcnt | count
---+--+---
 A |0 | 0
(1 row)

In this case, the pattern is B? instead of just B, which produces an 
*empty match* for the MEASURES to be applied over.

--
Vik Fearing





Re: Row pattern recognition

2023-07-28 Thread Tatsuo Ishii
>> Attached is the v3 patch. In this patch following changes are made.
> 
> Excellent.  Thanks!

You are welcome!

> A few quick comments:
> 
> - PERMUTE is still misspelled as PREMUTE

Oops. Will fix.

> - PATTERN variables do not have to exist in the DEFINE clause.  They are
> - considered TRUE if not present.

Do you think we really need this? I found a criticism regarding this.

https://link.springer.com/article/10.1007/s13222-022-00404-3
"3.2 Explicit Definition of All Row Pattern Variables"

What do you think?

>> - I am working on making window aggregates RPR aware now. The
>>implementation is in progress and far from completeness. An example
>>is below. I think row 2, 3, 4 of "count" column should be NULL
>>instead of 3, 2, 0, 0. Same thing can be said to other
>>rows. Probably this is an effect of moving aggregate but I still
>>studying the window aggregation code.
> 
> This tells me again that RPR is not being run in the right place.  All
> windowed aggregates and frame-level window functions should Just Work
> with no modification.

I am not touching each aggregate function. I am modifying
eval_windowaggregates() in nodeWindowAgg.c, which calls each aggregate
function. Do you think it's not the right place to make window
aggregates RPR aware?

>> SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM
>> stock
>>   WINDOW w AS (
>>   PARTITION BY company
>>   ORDER BY tdate
>>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>   AFTER MATCH SKIP PAST LAST ROW
>>   INITIAL
>>   PATTERN (START UP+ DOWN+)
>>   DEFINE
>>START AS TRUE,
>>UP AS price > PREV(price),
>>DOWN AS price < PREV(price)
>> );
>>   company  |   tdate| first_value | count
>> --++-+---
>>   company1 | 2023-07-01 | 100 | 4
>>   company1 | 2023-07-02 | | 3
>>   company1 | 2023-07-03 | | 2
>>   company1 | 2023-07-04 | | 0
>>   company1 | 2023-07-05 | | 0
>>   company1 | 2023-07-06 |  90 | 4
>>   company1 | 2023-07-07 | | 3
>>   company1 | 2023-07-08 | | 2
>>   company1 | 2023-07-09 | | 0
>>   company1 | 2023-07-10 | | 0
>>   company2 | 2023-07-01 |  50 | 4
>>   company2 | 2023-07-02 | | 3
>>   company2 | 2023-07-03 | | 2
>>   company2 | 2023-07-04 | | 0
>>   company2 | 2023-07-05 | | 0
>>   company2 | 2023-07-06 |  60 | 4
>>   company2 | 2023-07-07 | | 3
>>   company2 | 2023-07-08 | | 2
>>   company2 | 2023-07-09 | | 0
>>   company2 | 2023-07-10 | | 0
> 
> In this scenario, row 1's frame is the first 5 rows and specified SKIP
> PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are
> skipped) and the result of the outer count should be 0 for all of them
> because there are no rows in the frame.

Ok. Just I want to make sure. If it's other aggregates like sum or
avg, the result of the outer aggregates should be NULL.

> When we get to adding count in the MEASURES clause, there will be a
> difference between no match and empty match, but that does not apply
> here.

Can you elaborate more? I understand that "no match" and "empty match"
are different things. But I do not understand how the difference
affects the result of count.

>> I am going to add this thread to CommitFest and plan to add both of
>> you as reviewers. Thanks in advance.
> 
> My pleasure.  Thank you for working on this difficult feature.

Thank you for accepting being registered as a reviewer. Your comments
are really helpful.
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-07-28 Thread Vik Fearing

On 7/26/23 14:21, Tatsuo Ishii wrote:

Attached is the v3 patch. In this patch following changes are made.


Excellent.  Thanks!

A few quick comments:

- PERMUTE is still misspelled as PREMUTE

- PATTERN variables do not have to exist in the DEFINE clause.  They are 
considered TRUE if not present.



(1) I completely changed the pattern matching engine so that it
performs backtracking. Now the engine evaluates all pattern elements
defined in PATTER against each row, saving matched pattern variables
in a string per row. For example if the pattern element A and B
evaluated to true, a string "AB" is created for current row.

This continues until all pattern matching fails or encounters the end
of full window frame/partition. After that, the pattern matching
engine creates all possible "pattern strings" and apply the regular
expression matching to each. For example if we have row 0 = "AB" row 1
= "C", possible pattern strings are: "AC" and "BC".

If it matches, the length of matching substring is saved. After all
possible trials are done, the longest matching substring is chosen and
it becomes the width (number of rows) in the reduced window frame.

See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
in nodeWindowAggs.c for more details. For now I use a naive depth
first search and probably there is a lot of rooms for optimization
(for example rewriting it without using
recursion). Suggestions/patches are welcome.


My own reviews will only focus on correctness for now.  Once we get a 
good set of regression tests all passing, I will focus more on 
optimization.  Of course, others might want to review the performance now.



Vik Fearing wrote:

I strongly disagree with this.  Window function do not need to know
how the frame is defined, and indeed they should not.
WinGetFuncArgInFrame should answer yes or no and the window function
just works on that. Otherwise we will get extension (and possibly even
core) functions that don't handle the frame properly.


So I moved row_is_in_reduce_frame into WinGetFuncArgInFrame so that
those window functions are not needed to be changed.

(3) Window function rpr was removed. We can use first_value instead.


Excellent.


(4) Remaining tasks/issues.

- For now I disable WinSetMarkPosition because RPR often needs to
   access a row before the mark is set. We need to fix this in the
   future.


Noted, and agreed.


- I am working on making window aggregates RPR aware now. The
   implementation is in progress and far from completeness. An example
   is below. I think row 2, 3, 4 of "count" column should be NULL
   instead of 3, 2, 0, 0. Same thing can be said to other
   rows. Probably this is an effect of moving aggregate but I still
   studying the window aggregation code.


This tells me again that RPR is not being run in the right place.  All 
windowed aggregates and frame-level window functions should Just Work 
with no modification.



SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM stock
  WINDOW w AS (
  PARTITION BY company
  ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  AFTER MATCH SKIP PAST LAST ROW
  INITIAL
  PATTERN (START UP+ DOWN+)
  DEFINE
   START AS TRUE,
   UP AS price > PREV(price),
   DOWN AS price < PREV(price)
);
  company  |   tdate| first_value | count
--++-+---
  company1 | 2023-07-01 | 100 | 4
  company1 | 2023-07-02 | | 3
  company1 | 2023-07-03 | | 2
  company1 | 2023-07-04 | | 0
  company1 | 2023-07-05 | | 0
  company1 | 2023-07-06 |  90 | 4
  company1 | 2023-07-07 | | 3
  company1 | 2023-07-08 | | 2
  company1 | 2023-07-09 | | 0
  company1 | 2023-07-10 | | 0
  company2 | 2023-07-01 |  50 | 4
  company2 | 2023-07-02 | | 3
  company2 | 2023-07-03 | | 2
  company2 | 2023-07-04 | | 0
  company2 | 2023-07-05 | | 0
  company2 | 2023-07-06 |  60 | 4
  company2 | 2023-07-07 | | 3
  company2 | 2023-07-08 | | 2
  company2 | 2023-07-09 | | 0
  company2 | 2023-07-10 | | 0


In this scenario, row 1's frame is the first 5 rows and specified SKIP 
PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are 
skipped) and the result of the outer count should be 0 for all of them 
because there are no rows in the frame.


When we get to adding count in the MEASURES clause, there will be a 
difference between no match and empty match, but that does not apply here.



I am going to add this thread to CommitFest and plan to add both of
you as reviewers. Thanks in advance.


My pleasure.  Thank you for working on this difficult feature.
--
Vik Fearing





Re: Row pattern recognition

2023-07-28 Thread Vik Fearing

On 7/28/23 09:09, Tatsuo Ishii wrote:

We already recalculate a frame each time a row is processed even
without RPR. See ExecWindowAgg.


Yes, after each row.  Not for each function.


Ok, I understand now. Closer look at the code, I realized that each
window function calls update_frameheadpos, which computes the frame
head position. But actually it checks winstate->framehead_valid and if
it's already true (probably by other window function), then it does
nothing.


Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
which means the frame head is changed each time current row position
changes.


Off topic for now: I wonder why this restriction is in place and
whether we should respect or ignore it.  That is a discussion for
another time, though.


My guess is, it is because other than ROWS BETWEEN CURRENT ROW has
little or no meaning. Consider following example:


Yes, that makes sense.


I strongly disagree with this.  Window function do not need to know
how the frame is defined, and indeed they should not.

We already break the rule by defining *support functions. See
windowfuncs.c.

The support functions don't know anything about the frame, they just
know when a window function is monotonically increasing and execution
can either stop or be "passed through".


I see following code in window_row_number_support:

/*
 * The frame options can always become "ROWS BETWEEN UNBOUNDED
 * PRECEDING AND CURRENT ROW".  row_number() always just 
increments by
 * 1 with each row in the partition.  Using ROWS instead of 
RANGE
 * saves effort checking peer rows during execution.
 */
req->frameOptions = (FRAMEOPTION_NONDEFAULT |
 FRAMEOPTION_ROWS |
 
FRAMEOPTION_START_UNBOUNDED_PRECEDING |
 
FRAMEOPTION_END_CURRENT_ROW);

I think it not only knows about frame but it even changes the frame
options. This seems far from "don't know anything about the frame", no?


That's the planner support function.  The row_number() function itself 
is not even allowed to *have* a frame, per spec.  We allow it, but as 
you can see from that support function, we completely replace it.


So all of the partition-level window functions are not affected by RPR 
anyway.



I have two comments about this:

It isn't just for convenience, it is for correctness.  The window
functions do not need to know which rows they are *not* operating on.

There is no such thing as a "full" or "reduced" frame.  The standard
uses those terms to explain the difference between before and after
RPR is applied, but window functions do not get to choose which frame
they apply over.  They only ever apply over the reduced window frame.


I agree that "full window frame" and "reduced window frame" do not
exist at the same time, and in the end (after computation of reduced
frame), only "reduced" frame is visible to window
functions/aggregates. But I still do think that "full window frame"
and "reduced window frame" are important concept to explain/understand
how PRP works.


If we are just using those terms for documentation, then okay.
--
Vik Fearing





Re: Row pattern recognition

2023-07-26 Thread Tatsuo Ishii
> I am going to add this thread to CommitFest and plan to add both of
> you as reviewers. Thanks in advance.

Done.
https://commitfest.postgresql.org/44/4460/

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-07-26 Thread Tatsuo Ishii
ce)
);
ERROR:  attribute number 3 exceeds number of columns 2

This is because attributes appearing in DEFINE are not added to the
target list. I am looking for way to teach planner to add attributes
appearing in DEFINE.

I am going to add this thread to CommitFest and plan to add both of
you as reviewers. Thanks in advance.
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 16cab4e23f38b447a814773fea83a74c337a08d5 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Wed, 26 Jul 2023 19:49:09 +0900
Subject: [PATCH v3 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 218 +---
 src/include/nodes/parsenodes.h  |  54 
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 266 insertions(+), 15 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 856d5dee0e..a4c97c28ff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +669,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +711,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
@@ -718,7 +727,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
 	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
-	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
 
@@ -731,7 +740,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
 	LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD
+	MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE METHOD
 	MINUTE_P MINVALUE MODE MONTH_P MOVE
 
 	NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NFC NFD NFKC NFKD NO NONE
@@ -743,9 +752,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	ORDER ORDINALITY OTHERS OUT_P OUTER_P
 	OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER
 
-	PARALLEL PARAMETER PARSER PARTIAL PARTITION PASSING PASSWORD
-	PLACING PLANS POLICY
-	POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY
+	PARALLEL PARAMETER PARSER PARTIAL PARTITI

Re: Row pattern recognition

2023-07-25 Thread Tatsuo Ishii
Hi,

> diff --git a/src/test/regress/expected/rpr.out 
> b/src/test/regress/expected/rpr.out
> index 6bf8818911..f3fd22de2a 100644
> --- a/src/test/regress/expected/rpr.out
> +++ b/src/test/regress/expected/rpr.out
> @@ -230,6 +230,79 @@ SELECT company, tdate, price, rpr(price) OVER w FROM 
> stock
>   company2 | 07-10-2023 |  1300 | 
>  (20 rows)
>  
> +-- match everything
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW

It seems it's a result with AFTER MATCH SKIP PAST LAST ROW.

> + INITIAL
> + PATTERN (A+)
> + DEFINE
> +  A AS TRUE
> +);
> + company  |   tdate| price | rpr 
> +--++---+-
> + company1 | 07-01-2023 |   100 | 100
> + company1 | 07-02-2023 |   200 |
> + company1 | 07-03-2023 |   150 |
> + company1 | 07-04-2023 |   140 |
> + company1 | 07-05-2023 |   150 |
> + company1 | 07-06-2023 |90 |
> + company1 | 07-07-2023 |   110 |
> + company1 | 07-08-2023 |   130 |
> + company1 | 07-09-2023 |   120 |
> + company1 | 07-10-2023 |   130 |
> + company2 | 07-01-2023 |50 |  50
> + company2 | 07-02-2023 |  2000 |
> + company2 | 07-03-2023 |  1500 |
> + company2 | 07-04-2023 |  1400 |
> + company2 | 07-05-2023 |  1500 |
> + company2 | 07-06-2023 |60 |
> + company2 | 07-07-2023 |  1100 |
> + company2 | 07-08-2023 |  1300 |
> + company2 | 07-09-2023 |  1200 |
> + company2 | 07-10-2023 |  1300 |
> +(20 rows)
> +
> +-- backtracking with reclassification of rows
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW
> + INITIAL
> + PATTERN (A+ B+)
> + DEFINE
> +  A AS price > 100,
> +  B AS price > 100
> +);
> + company  |   tdate| price | rpr  
> +--++---+--
> + company1 | 07-01-2023 |   100 | 
> + company1 | 07-02-2023 |   200 |  200
> + company1 | 07-03-2023 |   150 | 
> + company1 | 07-04-2023 |   140 | 
> + company1 | 07-05-2023 |   150 | 
> + company1 | 07-06-2023 |90 | 
> + company1 | 07-07-2023 |   110 |  110
> + company1 | 07-08-2023 |   130 | 
> + company1 | 07-09-2023 |   120 | 
> + company1 | 07-10-2023 |   130 | 
> + company2 | 07-01-2023 |50 | 
> + company2 | 07-02-2023 |  2000 | 2000
> + company2 | 07-03-2023 |  1500 | 
> + company2 | 07-04-2023 |  1400 | 
> + company2 | 07-05-2023 |  1500 | 
> + company2 | 07-06-2023 |60 | 
> + company2 | 07-07-2023 |  1100 | 1100
> + company2 | 07-08-2023 |  1300 | 
> + company2 | 07-09-2023 |  1200 | 
> + company2 | 07-10-2023 |  1300 | 
> +(20 rows)
> +
>  --
>  -- Error cases
>  --
> diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
> index 951c9abfe9..f1cd0369f4 100644
> --- a/src/test/regress/sql/rpr.sql
> +++ b/src/test/regress/sql/rpr.sql
> @@ -94,6 +94,33 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
>UPDOWN AS price > PREV(price) AND price > NEXT(price)
>  );
>  
> +-- match everything
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW
> + INITIAL
> + PATTERN (A+)
> + DEFINE
> +  A AS TRUE
> +);
> +
> +-- backtracking with reclassification of rows
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW
> + INITIAL
> + PATTERN (A+ B+)
> + DEFINE
> +  A AS price > 100,
> +  B AS price > 100
> +);
> +
>  --
>  -- Error cases
>  --




Re: Row pattern recognition

2023-07-24 Thread Vik Fearing

On 7/24/23 02:22, Tatsuo Ishii wrote:

What we are talking about here is *defining* a window
frame.

Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time when a window
function is called.



Why?  It should only be recalculated when the current row changes and
we need a new frame.  The reduced window frame *is* the window frame
for all functions over that window.


We already recalculate a frame each time a row is processed even
without RPR. See ExecWindowAgg.


Yes, after each row.  Not for each function.


Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
which means the frame head is changed each time current row position
changes.


Off topic for now: I wonder why this restriction is in place and whether 
we should respect or ignore it.  That is a discussion for another time, 
though.



I strongly disagree with this.  Window function do not need to know
how the frame is defined, and indeed they should not.


We already break the rule by defining *support functions. See
windowfuncs.c.


The support functions don't know anything about the frame, they just 
know when a window function is monotonically increasing and execution 
can either stop or be "passed through".



WinGetFuncArgInFrame should answer yes or no and the window function
just works on that. Otherwise we will get extension (and possibly even
core) functions that don't handle the frame properly.


Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
just for convenience.


I have two comments about this:

It isn't just for convenience, it is for correctness.  The window 
functions do not need to know which rows they are *not* operating on.


There is no such thing as a "full" or "reduced" frame.  The standard 
uses those terms to explain the difference between before and after RPR 
is applied, but window functions do not get to choose which frame they 
apply over.  They only ever apply over the reduced window frame.

--
Vik Fearing





Re: Row pattern recognition

2023-07-23 Thread Tatsuo Ishii
>>> What we are talking about here is *defining* a window
>>> frame.
>> Well, we are defining a "reduced" window frame within a (full) window
>> frame. A "reduced" window frame is calculated each time when a window
>> function is called.
> 
> 
> Why?  It should only be recalculated when the current row changes and
> we need a new frame.  The reduced window frame *is* the window frame
> for all functions over that window.

We already recalculate a frame each time a row is processed even
without RPR. See ExecWindowAgg.

Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
which means the frame head is changed each time current row position
changes.

> I strongly disagree with this.  Window function do not need to know
> how the frame is defined, and indeed they should not.

We already break the rule by defining *support functions. See
windowfuncs.c.

> WinGetFuncArgInFrame should answer yes or no and the window function
> just works on that. Otherwise we will get extension (and possibly even
> core) functions that don't handle the frame properly.

Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
just for convenience.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-07-23 Thread Vik Fearing

On 7/22/23 08:14, Tatsuo Ishii wrote:

On 7/22/23 03:11, Tatsuo Ishii wrote:

Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.


I don't understand this logic.  Window functions work over a window
frame.


Yes.


What we are talking about here is *defining* a window
frame.


Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time when a window
function is called.



Why?  It should only be recalculated when the current row changes and we 
need a new frame.  The reduced window frame *is* the window frame for 
all functions over that window.




How can a window function execute row pattern matching?


A window function is called for each row fed by an outer plan. It
fetches current, previous and next row to execute pattern matching. If
it matches, the window function moves to next row and repeat the
process, until pattern match fails.

Below is an example window function to execute pattern matching (I
will include this in the v3 patch). row_is_in_reduced_frame() is a
function to execute pattern matching. It returns the number of rows in
the reduced frame if pattern match succeeds. If succeeds, the function
returns the last row in the reduced frame instead of the last row in
the full window frame.



I strongly disagree with this.  Window function do not need to know how 
the frame is defined, and indeed they should not.  WinGetFuncArgInFrame 
should answer yes or no and the window function just works on that. 
Otherwise we will get extension (and possibly even core) functions that 
don't handle the frame properly.

--
Vik Fearing





Re: Row pattern recognition

2023-07-22 Thread Tatsuo Ishii
> On 7/22/23 03:11, Tatsuo Ishii wrote:
> Maybe
> that can help clarify the design? It feels like it'll need to
> eventually
> be a "real" function that operates on the window state, even if it
> doesn't support all of the possible complexities in v1.
 Unfortunately an window function can not call other window functions.
>>> Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
> 
>> I am not sure at this point. Current PostgreSQL executor creates
>> WindowStatePerFuncData for each window function and aggregate
>> appearing in OVER clause. This means PREV/NEXT and other row pattern
>> navigation operators cannot have their own WindowStatePerFuncData if
>> they do not appear in OVER clauses in a query even if PREV/NEXT
>> etc. are defined as window function.
>> 
>>> Or
>>> does it make sense to split the pattern navigation "functions" into
>>> their own new concept, and maybe borrow some of the window function
>>> infrastructure for it?
> 
>> Maybe. Suppose a window function executes row pattern matching using
>> price > PREV(price). The window function already receives
>> WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
>> PREV, we could let PREV do the real work (getting previous tuple).
>> I have not tried this yet, though.
> 
> 
> I don't understand this logic.  Window functions work over a window
> frame.

Yes.

> What we are talking about here is *defining* a window
> frame.

Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time when a window
function is called.

> How can a window function execute row pattern matching?

A window function is called for each row fed by an outer plan. It
fetches current, previous and next row to execute pattern matching. If
it matches, the window function moves to next row and repeat the
process, until pattern match fails.

Below is an example window function to execute pattern matching (I
will include this in the v3 patch). row_is_in_reduced_frame() is a
function to execute pattern matching. It returns the number of rows in
the reduced frame if pattern match succeeds. If succeeds, the function
returns the last row in the reduced frame instead of the last row in
the full window frame.

/*
 * last_value
 * return the value of VE evaluated on the last row of the
 * window frame, per spec.
 */
Datum
window_last_value(PG_FUNCTION_ARGS)
{
WindowObject winobj = PG_WINDOW_OBJECT();
Datum   result;
boolisnull;
int64   abspos;
int num_reduced_frame;

abspos = WinGetCurrentPosition(winobj);
num_reduced_frame = row_is_in_reduced_frame(winobj, abspos);

if (num_reduced_frame == 0)
/* no RPR is involved */
result = WinGetFuncArgInFrame(winobj, 0,
  0, 
WINDOW_SEEK_TAIL, true,
  
, NULL);
else if (num_reduced_frame > 0)
/* get last row value in the reduced frame */
result = WinGetFuncArgInFrame(winobj, 0,
  
num_reduced_frame - 1, WINDOW_SEEK_HEAD, true,
  
, NULL);
else
/* RPR is involved and current row is unmatched or skipped */
isnull = true;

if (isnull)
PG_RETURN_NULL();

PG_RETURN_DATUM(result);
}




Re: Row pattern recognition

2023-07-21 Thread Vik Fearing

On 7/22/23 03:11, Tatsuo Ishii wrote:

Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.

Unfortunately an window function can not call other window functions.

Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?



I am not sure at this point. Current PostgreSQL executor creates
WindowStatePerFuncData for each window function and aggregate
appearing in OVER clause. This means PREV/NEXT and other row pattern
navigation operators cannot have their own WindowStatePerFuncData if
they do not appear in OVER clauses in a query even if PREV/NEXT
etc. are defined as window function.


Or
does it make sense to split the pattern navigation "functions" into
their own new concept, and maybe borrow some of the window function
infrastructure for it?



Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.



I don't understand this logic.  Window functions work over a window 
frame.  What we are talking about here is *defining* a window frame. 
How can a window function execute row pattern matching?

--
Vik Fearing





Re: Row pattern recognition

2023-07-21 Thread Tatsuo Ishii
>> One small concern is how to convert pattern variables to regex
>> expression which our regex enginer understands. Suppose,
>> 
>> PATTERN UP+
>> 
>> Currently I convert "UP+" to "U+" so that it can be fed to the regexp
>> engine. In order to do that, we need to know which part of the pattern
>> (UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
>> for more complex regular expressions it would be not, unless PATTERN
>> grammer can be analyzed by our parser to know which part is the
>> pattern variable.
> 
> Is the eventual plan to generate multiple alternatives, and run the
> regex against them one at a time?

Yes, that's my plan.

>>> I'm not familiar enough with this code yet to offer very concrete
>>> suggestions, sorry... But at some point in the future, we need to be
>>> able to skip forward and backward from arbitrary points, like
>>>
>>> DEFINE B AS B.price > PREV(FIRST(A.price), 3)
>>>
>>> so there won't be just one pair of "previous and next" tuples.
>> 
>> Yes, I know.
> 
> I apologize. I got overexplain-y.

No problem. Thank you for reminding me it.

>>> Maybe
>>> that can help clarify the design? It feels like it'll need to eventually
>>> be a "real" function that operates on the window state, even if it
>>> doesn't support all of the possible complexities in v1.
>> 
>> Unfortunately an window function can not call other window functions.
> 
> Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?

I am not sure at this point. Current PostgreSQL executor creates
WindowStatePerFuncData for each window function and aggregate
appearing in OVER clause. This means PREV/NEXT and other row pattern
navigation operators cannot have their own WindowStatePerFuncData if
they do not appear in OVER clauses in a query even if PREV/NEXT
etc. are defined as window function.

> Or
> does it make sense to split the pattern navigation "functions" into
> their own new concept, and maybe borrow some of the window function
> infrastructure for it?

Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-07-21 Thread Vik Fearing

On 7/22/23 01:14, Jacob Champion wrote:

On 7/20/23 17:07, Vik Fearing wrote:

On 7/21/23 01:36, Jacob Champion wrote:

(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I_think_  should be the correct results.)


Almost.  You are matching 07-01-2023 but the condition is "price > 100".


D'oh. Correction attached. I think :)


This looks correct to my human brain.  Thanks!
--
Vik Fearing





Re: Row pattern recognition

2023-07-21 Thread Jacob Champion
On 7/20/23 23:16, Tatsuo Ishii wrote:
> I don't know at this point. I think context-free is not enough to be
> repsented in Bison. The grammer also needs to be LALR(1).  Moreover,
> adding the grammer to existing parser may generate shift/reduce
> errors.

Ah. It's been too long since my compilers classes; I will pipe down.

> One small concern is how to convert pattern variables to regex
> expression which our regex enginer understands. Suppose,
> 
> PATTERN UP+
> 
> Currently I convert "UP+" to "U+" so that it can be fed to the regexp
> engine. In order to do that, we need to know which part of the pattern
> (UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
> for more complex regular expressions it would be not, unless PATTERN
> grammer can be analyzed by our parser to know which part is the
> pattern variable.

Is the eventual plan to generate multiple alternatives, and run the
regex against them one at a time?

>> I'm not familiar enough with this code yet to offer very concrete
>> suggestions, sorry... But at some point in the future, we need to be
>> able to skip forward and backward from arbitrary points, like
>>
>> DEFINE B AS B.price > PREV(FIRST(A.price), 3)
>>
>> so there won't be just one pair of "previous and next" tuples.
> 
> Yes, I know.

I apologize. I got overexplain-y.

>> Maybe
>> that can help clarify the design? It feels like it'll need to eventually
>> be a "real" function that operates on the window state, even if it
>> doesn't support all of the possible complexities in v1.
> 
> Unfortunately an window function can not call other window functions.

Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case? Or
does it make sense to split the pattern navigation "functions" into
their own new concept, and maybe borrow some of the window function
infrastructure for it?

Thanks!
--Jacob




Re: Row pattern recognition

2023-07-21 Thread Jacob Champion
On 7/20/23 17:07, Vik Fearing wrote:
> On 7/21/23 01:36, Jacob Champion wrote:
>> (I've attached two failing tests against v2, to hopefully better
>> illustrate, along with what I_think_  should be the correct results.)
> 
> Almost.  You are matching 07-01-2023 but the condition is "price > 100".

D'oh. Correction attached. I think :)

Thanks,
--Jacobdiff --git a/src/test/regress/expected/rpr.out 
b/src/test/regress/expected/rpr.out
index 6bf8818911..f3fd22de2a 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -230,6 +230,79 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
  company2 | 07-10-2023 |  1300 | 
 (20 rows)
 
+-- match everything
+SELECT company, tdate, price, rpr(price) OVER w FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+)
+ DEFINE
+  A AS TRUE
+);
+ company  |   tdate| price | rpr 
+--++---+-
+ company1 | 07-01-2023 |   100 | 100
+ company1 | 07-02-2023 |   200 |
+ company1 | 07-03-2023 |   150 |
+ company1 | 07-04-2023 |   140 |
+ company1 | 07-05-2023 |   150 |
+ company1 | 07-06-2023 |90 |
+ company1 | 07-07-2023 |   110 |
+ company1 | 07-08-2023 |   130 |
+ company1 | 07-09-2023 |   120 |
+ company1 | 07-10-2023 |   130 |
+ company2 | 07-01-2023 |50 |  50
+ company2 | 07-02-2023 |  2000 |
+ company2 | 07-03-2023 |  1500 |
+ company2 | 07-04-2023 |  1400 |
+ company2 | 07-05-2023 |  1500 |
+ company2 | 07-06-2023 |60 |
+ company2 | 07-07-2023 |  1100 |
+ company2 | 07-08-2023 |  1300 |
+ company2 | 07-09-2023 |  1200 |
+ company2 | 07-10-2023 |  1300 |
+(20 rows)
+
+-- backtracking with reclassification of rows
+SELECT company, tdate, price, rpr(price) OVER w FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+ B+)
+ DEFINE
+  A AS price > 100,
+  B AS price > 100
+);
+ company  |   tdate| price | rpr  
+--++---+--
+ company1 | 07-01-2023 |   100 | 
+ company1 | 07-02-2023 |   200 |  200
+ company1 | 07-03-2023 |   150 | 
+ company1 | 07-04-2023 |   140 | 
+ company1 | 07-05-2023 |   150 | 
+ company1 | 07-06-2023 |90 | 
+ company1 | 07-07-2023 |   110 |  110
+ company1 | 07-08-2023 |   130 | 
+ company1 | 07-09-2023 |   120 | 
+ company1 | 07-10-2023 |   130 | 
+ company2 | 07-01-2023 |50 | 
+ company2 | 07-02-2023 |  2000 | 2000
+ company2 | 07-03-2023 |  1500 | 
+ company2 | 07-04-2023 |  1400 | 
+ company2 | 07-05-2023 |  1500 | 
+ company2 | 07-06-2023 |60 | 
+ company2 | 07-07-2023 |  1100 | 1100
+ company2 | 07-08-2023 |  1300 | 
+ company2 | 07-09-2023 |  1200 | 
+ company2 | 07-10-2023 |  1300 | 
+(20 rows)
+
 --
 -- Error cases
 --
diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
index 951c9abfe9..f1cd0369f4 100644
--- a/src/test/regress/sql/rpr.sql
+++ b/src/test/regress/sql/rpr.sql
@@ -94,6 +94,33 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
   UPDOWN AS price > PREV(price) AND price > NEXT(price)
 );
 
+-- match everything
+SELECT company, tdate, price, rpr(price) OVER w FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+)
+ DEFINE
+  A AS TRUE
+);
+
+-- backtracking with reclassification of rows
+SELECT company, tdate, price, rpr(price) OVER w FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+ B+)
+ DEFINE
+  A AS price > 100,
+  B AS price > 100
+);
+
 --
 -- Error cases
 --


Re: Row pattern recognition

2023-07-21 Thread Tatsuo Ishii
Hi,

> Hi Ishii-san,
> 
> On 7/19/23 22:15, Tatsuo Ishii wrote:
>> Currently my patch has a limitation for the sake of simple
>> implementation: a pattern like "A+" is parsed and analyzed in the raw
>> parser. This makes subsequent process much easier because the pattern
>> element variable (in this case "A") and the quantifier (in this case
>> "+") is already identified by the raw parser. However there are much
>> more cases are allowed in the standard as you already pointed out. For
>> those cases probably we should give up to parse PATTERN items in the
>> raw parser, and instead the raw parser just accepts the elements as
>> Sconst?
> 
> Is there a concern that the PATTERN grammar can't be represented in
> Bison? I thought it was all context-free...

I don't know at this point. I think context-free is not enough to be
repsented in Bison. The grammer also needs to be LALR(1).  Moreover,
adding the grammer to existing parser may generate shift/reduce
errors.

> Or is the concern that the
> parse tree of the pattern is hard to feed into a regex engine?

One small concern is how to convert pattern variables to regex
expression which our regex enginer understands. Suppose,

PATTERN UP+

Currently I convert "UP+" to "U+" so that it can be fed to the regexp
engine. In order to do that, we need to know which part of the pattern
(UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
for more complex regular expressions it would be not, unless PATTERN
grammer can be analyzed by our parser to know which part is the
pattern variable.

>> Any comments, especially on the PREV/NEXT implementation part is
>> welcome. Currently the DEFINE expression like "price > PREV(price)" is
>> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
>> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
>> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
>> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
>> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
>> hack and should be gotten ride of before v1 is committed. Better idea?
> 
> I'm not familiar enough with this code yet to offer very concrete
> suggestions, sorry... But at some point in the future, we need to be
> able to skip forward and backward from arbitrary points, like
> 
> DEFINE B AS B.price > PREV(FIRST(A.price), 3)
> 
> so there won't be just one pair of "previous and next" tuples.

Yes, I know.

> Maybe
> that can help clarify the design? It feels like it'll need to eventually
> be a "real" function that operates on the window state, even if it
> doesn't support all of the possible complexities in v1.

Unfortunately an window function can not call other window functions.

> Taking a closer look at the regex engine:
> 
> It looks like the + qualifier has trouble when it meets the end of the
> frame. For instance, try removing the last row of the 'stock' table in
> rpr.sql; some of the final matches will disappear unexpectedly. Or try a
> pattern like
> 
> PATTERN ( a+ )
>  DEFINE a AS TRUE
> 
> which doesn't seem to match anything in my testing.
> 
> There's also the issue of backtracking in the face of reclassification,
> as I think Vik was alluding to upthread. The pattern
> 
> PATTERN ( a+ b+ )
>  DEFINE a AS col = 2,
> b AS col = 2
> 
> doesn't match a sequence of values (2 2 ...) with the current
> implementation, even with a dummy row at the end to avoid the
> end-of-frame bug.
> 
> (I've attached two failing tests against v2, to hopefully better
> illustrate, along with what I _think_ should be the correct results.)

Thanks. I will look into this.

> I'm not quite understanding the match loop in evaluate_pattern(). It
> looks like we're building up a string to pass to the regex engine, but
> by the we call regexp_instr, don't we already know whether or not the
> pattern will match based on the expression evaluation we've done?

For "+" yes. But for more complex regular expression like '{n}', we
need to call our regexp engine to check if the pattern matches.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-07-20 Thread Vik Fearing

On 7/21/23 01:36, Jacob Champion wrote:

There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread.


We definitely need some kind of backtracking or other reclassification 
method.



(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I_think_  should be the correct results.)


Almost.  You are matching 07-01-2023 but the condition is "price > 100".
--
Vik Fearing





Re: Row pattern recognition

2023-07-20 Thread Jacob Champion
Hi Ishii-san,

On 7/19/23 22:15, Tatsuo Ishii wrote:
> Currently my patch has a limitation for the sake of simple
> implementation: a pattern like "A+" is parsed and analyzed in the raw
> parser. This makes subsequent process much easier because the pattern
> element variable (in this case "A") and the quantifier (in this case
> "+") is already identified by the raw parser. However there are much
> more cases are allowed in the standard as you already pointed out. For
> those cases probably we should give up to parse PATTERN items in the
> raw parser, and instead the raw parser just accepts the elements as
> Sconst?

Is there a concern that the PATTERN grammar can't be represented in
Bison? I thought it was all context-free... Or is the concern that the
parse tree of the pattern is hard to feed into a regex engine?

> Any comments, especially on the PREV/NEXT implementation part is
> welcome. Currently the DEFINE expression like "price > PREV(price)" is
> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
> hack and should be gotten ride of before v1 is committed. Better idea?

I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, like

DEFINE B AS B.price > PREV(FIRST(A.price), 3)

so there won't be just one pair of "previous and next" tuples. Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.

--

Taking a closer look at the regex engine:

It looks like the + qualifier has trouble when it meets the end of the
frame. For instance, try removing the last row of the 'stock' table in
rpr.sql; some of the final matches will disappear unexpectedly. Or try a
pattern like

PATTERN ( a+ )
 DEFINE a AS TRUE

which doesn't seem to match anything in my testing.

There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread. The pattern

PATTERN ( a+ b+ )
 DEFINE a AS col = 2,
b AS col = 2

doesn't match a sequence of values (2 2 ...) with the current
implementation, even with a dummy row at the end to avoid the
end-of-frame bug.

(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I _think_ should be the correct results.)

I'm not quite understanding the match loop in evaluate_pattern(). It
looks like we're building up a string to pass to the regex engine, but
by the we call regexp_instr, don't we already know whether or not the
pattern will match based on the expression evaluation we've done?

Thanks,
--Jacobdiff --git a/src/test/regress/expected/rpr.out 
b/src/test/regress/expected/rpr.out
index 6bf8818911..68e9a98684 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -230,6 +230,79 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
  company2 | 07-10-2023 |  1300 | 
 (20 rows)
 
+-- match everything
+SELECT company, tdate, price, rpr(price) OVER w FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+)
+ DEFINE
+  A AS TRUE
+);
+ company  |   tdate| price | rpr 
+--++---+-
+ company1 | 07-01-2023 |   100 | 100
+ company1 | 07-02-2023 |   200 |
+ company1 | 07-03-2023 |   150 |
+ company1 | 07-04-2023 |   140 |
+ company1 | 07-05-2023 |   150 |
+ company1 | 07-06-2023 |90 |
+ company1 | 07-07-2023 |   110 |
+ company1 | 07-08-2023 |   130 |
+ company1 | 07-09-2023 |   120 |
+ company1 | 07-10-2023 |   130 |
+ company2 | 07-01-2023 |50 |  50
+ company2 | 07-02-2023 |  2000 |
+ company2 | 07-03-2023 |  1500 |
+ company2 | 07-04-2023 |  1400 |
+ company2 | 07-05-2023 |  1500 |
+ company2 | 07-06-2023 |60 |
+ company2 | 07-07-2023 |  1100 |
+ company2 | 07-08-2023 |  1300 |
+ company2 | 07-09-2023 |  1200 |
+ company2 | 07-10-2023 |  1300 |
+(20 rows)
+
+-- backtracking with reclassification of rows
+SELECT company, tdate, price, rpr(price) OVER w FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (A+ B+)
+ DEFINE
+  A AS price > 100,
+  B AS price > 100
+);
+ company  |   tdate| price | rpr  
+--++---+--
+ company1 | 07-01-2023 |   100 |  100
+ company1 | 07-02-2023 |   200 | 
+ 

Re: Row pattern recognition

2023-07-19 Thread Tatsuo Ishii
> Hello,
> 
> Thanks for working on this! We're interested in RPR as well, and I've
> been trying to get up to speed with the specs, to maybe make myself
> useful.

Thank you for being interested in this.

> 19075-5 discusses that, at least; not sure about other parts of the spec.

Thanks for the info. Unfortunately I don't have 19075-5 though.

>> Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
>> into Window clause with RPR?
> 
> Are we guaranteed to always have an equivalent window clause? There seem
> to be many differences between the two, especially when it comes to ONE
> ROW/ALL ROWS PER MATCH.

You are right. I am not 100% sure if the rewriting is possible at this
point.

> To add onto what Vik said above:
> 
>>> It seems RPR in the standard is quite complex. I think we can start
>>> with a small subset of RPR then we could gradually enhance the
>>> implementation.
>> 
>> I have no problem with that as long as we don't paint ourselves into a 
>> corner.
> 
> To me, PATTERN looks like an area where we may want to support a broader
> set of operations in the first version.

Me too but...

> The spec has a bunch of
> discussion around cases like empty matches, match order of alternation
> and permutation, etc., which are not possible to express or test with
> only the + quantifier. Those might be harder to get right in a v2, if we
> don't at least keep them in mind for v1?

Currently my patch has a limitation for the sake of simple
implementation: a pattern like "A+" is parsed and analyzed in the raw
parser. This makes subsequent process much easier because the pattern
element variable (in this case "A") and the quantifier (in this case
"+") is already identified by the raw parser. However there are much
more cases are allowed in the standard as you already pointed out. For
those cases probably we should give up to parse PATTERN items in the
raw parser, and instead the raw parser just accepts the elements as
Sconst?

>> +static List *
>> +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef 
>> *windef)
>> +{
>> +   List*patterns;
> 
> My compiler complains about the `patterns` variable here, which is
> returned without ever being initialized. (The caller doesn't seem to use
> it.)

Will fix.

>> +-- basic test using PREV
>> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
>> + WINDOW w AS (
>> + PARTITION BY company
>> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>> + INITIAL
>> + PATTERN (START UP+ DOWN+)
>> + DEFINE
>> +  START AS TRUE,
>> +  UP AS price > PREV(price),
>> +  DOWN AS price < PREV(price)
>> +);
> 
> nitpick: IMO the tests should be making use of ORDER BY in the window
> clauses.

Right. Will fix.

> This is a very big feature. I agree with you that MEASURES seems like a
> very important "next piece" to add. Are there other areas where you
> would like reviewers to focus on right now (or avoid)?

Any comments, especially on the PREV/NEXT implementation part is
welcome. Currently the DEFINE expression like "price > PREV(price)" is
prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
hack and should be gotten ride of before v1 is committed. Better idea?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-07-19 Thread Jacob Champion
Hello,

Thanks for working on this! We're interested in RPR as well, and I've
been trying to get up to speed with the specs, to maybe make myself
useful.

On 6/27/23 17:58, Tatsuo Ishii wrote:
> Yes. (I think the standard calls the window frame as "full window
> frame" in context of RPR to make a contrast with the subset of the
> frame rows restricted by RPR. The paper I refered to as [2] claims
> that the latter window frame is called "reduced window frame" in the
> standard but I wasn't able to find the term in the standard.)

19075-5 discusses that, at least; not sure about other parts of the spec.

> Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
> into Window clause with RPR?

Are we guaranteed to always have an equivalent window clause? There seem
to be many differences between the two, especially when it comes to ONE
ROW/ALL ROWS PER MATCH.

--

To add onto what Vik said above:

>> It seems RPR in the standard is quite complex. I think we can start
>> with a small subset of RPR then we could gradually enhance the
>> implementation.
> 
> I have no problem with that as long as we don't paint ourselves into a 
> corner.

To me, PATTERN looks like an area where we may want to support a broader
set of operations in the first version. The spec has a bunch of
discussion around cases like empty matches, match order of alternation
and permutation, etc., which are not possible to express or test with
only the + quantifier. Those might be harder to get right in a v2, if we
don't at least keep them in mind for v1?

> +static List *
> +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef 
> *windef)
> +{
> +   List*patterns;

My compiler complains about the `patterns` variable here, which is
returned without ever being initialized. (The caller doesn't seem to use
it.)

> +-- basic test using PREV
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + INITIAL
> + PATTERN (START UP+ DOWN+)
> + DEFINE
> +  START AS TRUE,
> +  UP AS price > PREV(price),
> +  DOWN AS price < PREV(price)
> +);

nitpick: IMO the tests should be making use of ORDER BY in the window
clauses.

This is a very big feature. I agree with you that MEASURES seems like a
very important "next piece" to add. Are there other areas where you
would like reviewers to focus on right now (or avoid)?

Thanks!
--Jacob




Re: Row pattern recognition

2023-06-28 Thread Vik Fearing

On 6/28/23 14:17, Tatsuo Ishii wrote:

Small question.


This query (with all the defaults made explicit):

SELECT s.company, s.tdate, s.price,
FIRST_VALUE(s.tdate) OVER w,
LAST_VALUE(s.tdate) OVER w,
lowest OVER w
FROM stock AS s
WINDOW w AS (
   PARTITION BY s.company
   ORDER BY s.tdate
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   EXCLUDE NO OTHERS
   MEASURES
 LAST(DOWN) AS lowest
   AFTER MATCH SKIP PAST LAST ROW
   INITIAL PATTERN (START DOWN+ UP+)
   DEFINE
 START AS TRUE,
 UP AS price > PREV(price),
 DOWN AS price < PREV(price)
);



 LAST(DOWN) AS lowest


should be "LAST(DOWN.price) AS lowest"?


Yes, it should be.  And the tdate='07-08-2023' row in the first 
resultset should have '07-08-2023' and '07-10-2023' as its 4th and 5th 
columns.


Since my brain is doing the processing instead of postgres, I made some 
human errors. :-)

--
Vik Fearing





Re: Row pattern recognition

2023-06-28 Thread Tatsuo Ishii
Small question.

> This query (with all the defaults made explicit):
> 
> SELECT s.company, s.tdate, s.price,
>FIRST_VALUE(s.tdate) OVER w,
>LAST_VALUE(s.tdate) OVER w,
>lowest OVER w
> FROM stock AS s
> WINDOW w AS (
>   PARTITION BY s.company
>   ORDER BY s.tdate
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   EXCLUDE NO OTHERS
>   MEASURES
> LAST(DOWN) AS lowest
>   AFTER MATCH SKIP PAST LAST ROW
>   INITIAL PATTERN (START DOWN+ UP+)
>   DEFINE
> START AS TRUE,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );

> LAST(DOWN) AS lowest

should be "LAST(DOWN.price) AS lowest"?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-06-27 Thread Tatsuo Ishii
> Okay, I see the problem now, and why you need the rpr() function.
> 
> You are doing this as something that happens over a window frame, but
> it is actually something that *reduces* the window frame.  The pattern
> matching needs to be done when the frame is calculated and not when
> any particular function is applied over it.

Yes. (I think the standard calls the window frame as "full window
frame" in context of RPR to make a contrast with the subset of the
frame rows restricted by RPR. The paper I refered to as [2] claims
that the latter window frame is called "reduced window frame" in the
standard but I wasn't able to find the term in the standard.)

I wanted to demonstate that pattern matching logic is basically
correct in the PoC patch. Now what I need to do is, move the row
pattern matching logic to somewhere inside nodeWindowAgg so that
"restricted window frame" can be applied to all window functions and
window aggregates. Currently I am looking into update_frameheadpos()
and update_frametailpos() which calculate the frame head and tail
against current row. What do you think?

> This query (with all the defaults made explicit):
> 
> SELECT s.company, s.tdate, s.price,
>FIRST_VALUE(s.tdate) OVER w,
>LAST_VALUE(s.tdate) OVER w,
>lowest OVER w
> FROM stock AS s
> WINDOW w AS (
>   PARTITION BY s.company
>   ORDER BY s.tdate
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   EXCLUDE NO OTHERS
>   MEASURES
> LAST(DOWN) AS lowest
>   AFTER MATCH SKIP PAST LAST ROW
>   INITIAL PATTERN (START DOWN+ UP+)
>   DEFINE
> START AS TRUE,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
> 
> Should produce this result:

[snip]

Thanks for the examples. I agree with the expected query results.

 o SUBSET is not supported
>>>
>>> Is this because you haven't done it yet, or because you ran into
>>> problems trying to do it?
>> Because it seems SUBSET is not useful without MEASURES support. Thus
>> my plan is, firstly implement MEASURES, then SUBSET. What do you
>> think?
> 
> 
> SUBSET elements can be used in DEFINE clauses, but I do not think this
> is important compared to other features.

Ok.

>>> I have not looked at the patch yet, but is the reason for doing R020
>>> before R010 because you haven't done the MEASURES clause yet?
>> One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
>> harder for me because modifying main SELECT clause could be a hard
>> work. Another reason is, I had no idea how to implement PREV/NEXT in
>> other than in WINDOW clause. Other people might feel differently
>> though.
> 
> 
> I think we could do this with a single tuplesort if we use
> backtracking (which might be really slow for some patterns).  I have
> not looked into it in any detail.
> 
> We would need to be able to remove tuples from the end (even if only
> logically), and be able to update tuples inside the store.  Both of
> those needs come from backtracking and possibly changing the
> classifier.
> 
> Without backtracking, I don't see how we could do it without have a
> separate tuplestore for every current possible match.

Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
into Window clause with RPR?

> I looked at your v2 patches a little bit and the only comment that I
> currently have on the code is you spelled PERMUTE as
> PREMUTE. Everything else is hopefully explained above.

Thanks. Will fix.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Row pattern recognition

2023-06-26 Thread Vik Fearing

On 6/26/23 03:05, Tatsuo Ishii wrote:

I don't understand this.  RPR in a window specification limits the
window to the matched rows, so this looks like your rpr() function is
just the regular first_value() window function that we already have?


No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.

company  |   tdate| price | rpr  | first_value
--++---+--+-
  company1 | 2023-07-01 |   100 |  | 100
  company1 | 2023-07-02 |   200 |  200 | 200
  company1 | 2023-07-03 |   150 |  150 | 150
  company1 | 2023-07-04 |   140 |  | 140
  company1 | 2023-07-05 |   150 |  150 | 150
  company1 | 2023-07-06 |90 |  |  90
  company1 | 2023-07-07 |   110 |  | 110
  company1 | 2023-07-08 |   130 |  | 130
  company1 | 2023-07-09 |   120 |  | 120
  company1 | 2023-07-10 |   130 |  | 130

For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.



Okay, I see the problem now, and why you need the rpr() function.

You are doing this as something that happens over a window frame, but it 
is actually something that *reduces* the window frame.  The pattern 
matching needs to be done when the frame is calculated and not when any 
particular function is applied over it.


This query (with all the defaults made explicit):

SELECT s.company, s.tdate, s.price,
   FIRST_VALUE(s.tdate) OVER w,
   LAST_VALUE(s.tdate) OVER w,
   lowest OVER w
FROM stock AS s
WINDOW w AS (
  PARTITION BY s.company
  ORDER BY s.tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  EXCLUDE NO OTHERS
  MEASURES
LAST(DOWN) AS lowest
  AFTER MATCH SKIP PAST LAST ROW
  INITIAL PATTERN (START DOWN+ UP+)
  DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);

Should produce this result:

 company  |   tdate| price | first_value | last_value | lowest
--++---+-++
 company1 | 07-01-2023 |   100 | ||
 company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |140
 company1 | 07-03-2023 |   150 | ||
 company1 | 07-04-2023 |   140 | ||
 company1 | 07-05-2023 |   150 | ||
 company1 | 07-06-2023 |90 | ||
 company1 | 07-07-2023 |   110 | ||
 company1 | 07-08-2023 |   130 | 07-05-2023  | 07-05-2023 |120
 company1 | 07-09-2023 |   120 | ||
 company1 | 07-10-2023 |   130 | ||
(10 rows)

Or if we switch to AFTER MATCH SKIP TO NEXT ROW, then we get:

 company  |   tdate| price | first_value | last_value | lowest
--++---+-++
 company1 | 07-01-2023 |   100 | ||
 company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |140
 company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |140
 company1 | 07-04-2023 |   140 | ||
 company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 | 90
 company1 | 07-06-2023 |90 | ||
 company1 | 07-07-2023 |   110 | ||
 company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |120
 company1 | 07-09-2023 |   120 | ||
 company1 | 07-10-2023 |   130 | ||
(10 rows)

And then if we change INITIAL to SEEK:

 company  |   tdate| price | first_value | last_value | lowest
--++---+-++
 company1 | 07-01-2023 |   100 | 07-02-2023  | 07-05-2023 |140
 company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |140
 company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |140
 company1 | 07-04-2023 |   140 | 07-05-2023  | 07-08-2023 | 90
 company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 | 90
 company1 | 07-06-2023 |90 | 07-08-2023  | 07-10-2023 |120
 company1 | 07-07-2023 |   110 | 07-08-2023  | 07-10-2023 |120
 company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |120
 company1 | 07-09-2023 |   120 | ||
 company1 | 07-10-2023 |   130 | ||
(10 rows)

Since the pattern recognition is part of the frame, the window 
aggregates should Just Work.




o SUBSET is not supported


Is this because 

Re: Row pattern recognition

2023-06-26 Thread Tatsuo Ishii
>> In this case, we should require the user to specify AFTER MATCH SKIP
>> TO NEXT ROW so that behavior doesn't change when we implement the
>> standard default.  (Your patch might do this already.)
> 
> Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
> patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.

Attached is the v2 patch to add support for AFTER MATCH SKIP PAST LAST
ROW and AFTER MATCH SKIP PAST LAST ROW. The default is AFTER MATCH
SKIP PAST LAST ROW as the standard default. Here are some examples to
demonstrate how those clauses affect the query result.

SELECT i, rpr(i) OVER w
  FROM (VALUES (1), (2), (3), (4)) AS v (i)
  WINDOW w AS (
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   AFTER MATCH SKIP PAST LAST ROW
   PATTERN (A B)
   DEFINE
A AS i <= 2,
B AS i <= 3
);
 i | rpr 
---+-
 1 |   1
 2 |
 3 |
 4 |
(4 rows)

In this example rpr starts from i = 1 and find that row i = 1
satisfies A, and row i = 2 satisfies B. Then rpr moves to row i = 3
and find that it does not satisfy A, thus the result is NULL. Same
thing can be said to row i = 4.

SELECT i, rpr(i) OVER w
  FROM (VALUES (1), (2), (3), (4)) AS v (i)
  WINDOW w AS (
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   AFTER MATCH SKIP TO NEXT ROW
   PATTERN (A B)
   DEFINE
A AS i <= 2,
B AS i <= 3
);
 i | rpr 
---+-
 1 |   1
 2 |   2
 3 |
 4 |
(4 rows)

In this example rpr starts from i = 1 and find that row i = 1
satisfies A, and row i = 2 satisfies B (same as above). Then rpr moves
to row i = 2, rather than 3 because AFTER MATCH SKIP TO NEXT ROW is
specified.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 92aebdb8a8c6d3739ee49b755f281e77c34f5d36 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Mon, 26 Jun 2023 17:05:35 +0900
Subject: [PATCH v2 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 218 +---
 src/include/nodes/parsenodes.h  |  54 
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 266 insertions(+), 15 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 39ab7eac0d..5cd3ebaa98 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -645,7 +655,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 		hash_partbound
 %type 		hash_partbound_elem
 
-
 %type 		json_format_clause_opt
 	json_value_expr
 	json_output_clause_opt
@@ -705,7 +714,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P
 	DOUBLE_P DROP
 
@@ -721,7 +730,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *que

Re: Row pattern recognition

2023-06-25 Thread Tatsuo Ishii
> I have been dreaming of and lobbying for someone to pick up this
> feature.  I will be sure to review it from a standards perspective and
> will try my best to help with the technical aspect, but I am not sure
> to have the qualifications for that.
> 
> THANK YOU!

Thank you for looking into my proposal.

>> (I know SQL:2023 is already out, but I don't have access to it)
> 
> If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead
> of just its technical specification.
> 
> https://www.iso.org/standard/78936.html

Thanks for the info.

> I don't understand this.  RPR in a window specification limits the
> window to the matched rows, so this looks like your rpr() function is
> just the regular first_value() window function that we already have?

No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.

company  |   tdate| price | rpr  | first_value 
--++---+--+-
 company1 | 2023-07-01 |   100 |  | 100
 company1 | 2023-07-02 |   200 |  200 | 200
 company1 | 2023-07-03 |   150 |  150 | 150
 company1 | 2023-07-04 |   140 |  | 140
 company1 | 2023-07-05 |   150 |  150 | 150
 company1 | 2023-07-06 |90 |  |  90
 company1 | 2023-07-07 |   110 |  | 110
 company1 | 2023-07-08 |   130 |  | 130
 company1 | 2023-07-09 |   120 |  | 120
 company1 | 2023-07-10 |   130 |  | 130

For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.

> As in your example, you cannot have START.price outside of the window
> specification; it can only go in the MEASURES clause.  Only startprice
> is allowed outside and it gets its qualification from the OVER.  Using
> w.startprice might have been better but that would require window
> names to be in the same namespace as range tables.
> 
> This currently works in Postgres:
> 
>   SELECT RANK() OVER w
>   FROM (VALUES (1)) AS w (x)
>   WINDOW w AS (ORDER BY w.x);

Interesting.

>> o SUBSET is not supported
> 
> Is this because you haven't done it yet, or because you ran into
> problems trying to do it?

Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?

>> o Regular expressions other than "+" are not supported
> 
> This is what I had a hard time imagining how to do while thinking
> about it.  The grammar is so different here and we allow many more
> operators (like "?" which is also the standard parameter symbol).
> People more expert than me will have to help here.

Yes, that is a problem.

> In this case, we should require the user to specify AFTER MATCH SKIP
> TO NEXT ROW so that behavior doesn't change when we implement the
> standard default.  (Your patch might do this already.)

Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.

>> o Aggregate functions associated with window clause using RPR do not
>> respect RPR
> 
> I do not understand what this means.

Ok, let me explain. See example below. In my understanding "count"
should retun the number of rows in a frame restriced by the match
condition. For example at the first line (2023-07-01 | 100) count
returns 10. I think this should be 0 because the "restriced" frame
starting at the line contains no matched row. On the other hand the
(restricted) frame starting at second line (2023-07-02 | 200) contains
4 rows, thus count should return 4, instead of 9.

SELECT company, tdate, price, rpr(price) OVER w, count(*) OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (START DOWN+ UP+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

company  |   tdate| price | rpr  | count 
--++---+--+---
 company1 | 2023-07-01 |   100 |  |10
 company1 | 2023-07-02 |   200 |  200 | 9
 company1 | 2023-07-03 |   150 |  150 | 8
 company1 | 2023-07-04 |   140 |  | 7
 company1 | 2023-07-05 |   150 |  150 | 6
 company1 | 2023-07-06 |90 |  | 5
 company1 | 2023-07-07 |   110 |  | 4
 company1 | 2023-07-08 |   130 |  | 3
 company1 | 2023-07-09 |   120 |  | 2
 company1 | 2023-07-10 |   130 |  | 1

>> It seems RPR in the standard is quite complex. I think we can start
>> with a small subset 

Re: Row pattern recognition

2023-06-25 Thread Vik Fearing

On 6/25/23 14:05, Tatsuo Ishii wrote:

Attached is a PoC patch to implement "Row pattern recognition" (RPR)
in SQL:2016 (I know SQL:2023 is already out, but I don't have access
to it). Actually SQL:2016 defines RPR in two places[1]:

 Feature R010, “Row pattern recognition: FROM clause”
 Feature R020, “Row pattern recognition: WINDOW clause”

The patch includes partial support for R020 part.


I have been dreaming of and lobbying for someone to pick up this 
feature.  I will be sure to review it from a standards perspective and 
will try my best to help with the technical aspect, but I am not sure to 
have the qualifications for that.


THANK YOU!

> (I know SQL:2023 is already out, but I don't have access to it)

If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead 
of just its technical specification.


https://www.iso.org/standard/78936.html


- What is RPR?

RPR provides a way to search series of data using regular expression
patterns. Suppose you have a stock database.

CREATE TABLE stock (
company TEXT,
tdate DATE,
price BIGINT);

You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
more days, then goes up for 1 or more days. If we try to implement the
query in PostgreSQL, it could be quite complex and inefficient.

RPR provides convenient way to implement the query.

SELECT company, tdate, price, rpr(price) OVER w FROM stock
  WINDOW w AS (
  PARTITION BY company
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  PATTERN (START DOWN+ UP+)
  DEFINE
   START AS TRUE,
   UP AS price > PREV(price),
   DOWN AS price < PREV(price)
);

"PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
"row pattern variables" namely START, UP and DOWN. They are associated
with logical expressions namely "TRUE", "price > PREV(price)", and
"price < PREV(price)". Note that "PREV" function returns price column
in the previous row. So, UP is true if price is higher than previous
day. On the other hand, DOWN is true if price is lower than previous
day.  PATTERN uses the row pattern variables to create a necessary
pattern.  In this case, the first row is always match because START is
always true, and second or more rows match with "UP" ('+' is a regular
expression representing one or more), and subsequent rows match with
"DOWN".

Here is the sample output.

  company  |   tdate| price | rpr
--++---+--
  company1 | 2023-07-01 |   100 |
  company1 | 2023-07-02 |   200 |  200 -- 200->150->140->150
  company1 | 2023-07-03 |   150 |  150 -- 150->140->150
  company1 | 2023-07-04 |   140 |
  company1 | 2023-07-05 |   150 |  150 -- 150->90->110->130
  company1 | 2023-07-06 |90 |
  company1 | 2023-07-07 |   110 |
  company1 | 2023-07-08 |   130 |
  company1 | 2023-07-09 |   120 |
  company1 | 2023-07-10 |   130 |

rpr shows the first row if all the patterns are satisfied. In the
example above 200, 150, 150 are the cases.  Other rows are shown as
NULL. For example, on 2023-07-02 price starts with 200, then goes down
to 150 then 140 but goes up 150 next day.


I don't understand this.  RPR in a window specification limits the 
window to the matched rows, so this looks like your rpr() function is 
just the regular first_value() window function that we already have?



As far as I know, only Oracle implements RPR (only R010. R020 is not
implemented) among OSS/commercial RDBMSs. There are a few DWH software
having RPR. According to [2] they are Snowflake and MS Stream
Analytics. It seems Trino is another one[3].


I thought DuckDB had it already, but it looks like I was wrong.


- Note about the patch

The current implementation is not only a subset of the standard, but
is different from it in some places. The example above is written as
follows according to the standard:

SELECT company, tdate, startprice OVER w FROM stock
  WINDOW w AS (
  PARTITION BY company
  MEASURES
   START.price AS startprice
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  PATTERN (START DOWN+ UP+)
  DEFINE
   START AS TRUE,
   UP AS UP.price > PREV(UP.price),
   DOWN AS DOWN.price < PREV(DOWN.price)
);

Notice that rpr(price) is written as START.price and startprice in the
standard. MEASURES defines variable names used in the target list used
with "OVER window". As OVER only allows functions in PostgreSQL, I had
to make up a window function "rpr" which performs the row pattern
recognition task.  I was not able to find a way to implement
expressions like START.price (START is not a table alias). Any
suggestion is greatly appreciated.


As in your example, you cannot have START.price outside of the window 
specification; it can only go in the MEASURES clause.  Only startprice 
is allowed outside and it gets its qualification from the OVER.  Using 
w.startprice might h

Row pattern recognition

2023-06-25 Thread Tatsuo Ishii
Attached is a PoC patch to implement "Row pattern recognition" (RPR)
in SQL:2016 (I know SQL:2023 is already out, but I don't have access
to it). Actually SQL:2016 defines RPR in two places[1]:

Feature R010, “Row pattern recognition: FROM clause”
Feature R020, “Row pattern recognition: WINDOW clause”

The patch includes partial support for R020 part.

- What is RPR?

RPR provides a way to search series of data using regular expression
patterns. Suppose you have a stock database.

CREATE TABLE stock (
   company TEXT,
   tdate DATE,
   price BIGINT);

You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
more days, then goes up for 1 or more days. If we try to implement the
query in PostgreSQL, it could be quite complex and inefficient.

RPR provides convenient way to implement the query.

SELECT company, tdate, price, rpr(price) OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (START DOWN+ UP+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

"PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
"row pattern variables" namely START, UP and DOWN. They are associated
with logical expressions namely "TRUE", "price > PREV(price)", and
"price < PREV(price)". Note that "PREV" function returns price column
in the previous row. So, UP is true if price is higher than previous
day. On the other hand, DOWN is true if price is lower than previous
day.  PATTERN uses the row pattern variables to create a necessary
pattern.  In this case, the first row is always match because START is
always true, and second or more rows match with "UP" ('+' is a regular
expression representing one or more), and subsequent rows match with
"DOWN".

Here is the sample output.

 company  |   tdate| price | rpr  
--++---+--
 company1 | 2023-07-01 |   100 | 
 company1 | 2023-07-02 |   200 |  200 -- 200->150->140->150
 company1 | 2023-07-03 |   150 |  150 -- 150->140->150
 company1 | 2023-07-04 |   140 | 
 company1 | 2023-07-05 |   150 |  150 -- 150->90->110->130
 company1 | 2023-07-06 |90 | 
 company1 | 2023-07-07 |   110 | 
 company1 | 2023-07-08 |   130 | 
 company1 | 2023-07-09 |   120 | 
 company1 | 2023-07-10 |   130 | 

rpr shows the first row if all the patterns are satisfied. In the
example above 200, 150, 150 are the cases.  Other rows are shown as
NULL. For example, on 2023-07-02 price starts with 200, then goes down
to 150 then 140 but goes up 150 next day.

As far as I know, only Oracle implements RPR (only R010. R020 is not
implemented) among OSS/commercial RDBMSs. There are a few DWH software
having RPR. According to [2] they are Snowflake and MS Stream
Analytics. It seems Trino is another one[3].

- Note about the patch

The current implementation is not only a subset of the standard, but
is different from it in some places. The example above is written as
follows according to the standard:

SELECT company, tdate, startprice OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 MEASURES
  START.price AS startprice
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (START DOWN+ UP+)
 DEFINE
  START AS TRUE,
  UP AS UP.price > PREV(UP.price),
  DOWN AS DOWN.price < PREV(DOWN.price)
);

Notice that rpr(price) is written as START.price and startprice in the
standard. MEASURES defines variable names used in the target list used
with "OVER window". As OVER only allows functions in PostgreSQL, I had
to make up a window function "rpr" which performs the row pattern
recognition task.  I was not able to find a way to implement
expressions like START.price (START is not a table alias). Any
suggestion is greatly appreciated.

The differences from the standard include:

o MEASURES is not supported
o SUBSET is not supported
o FIRST, LAST, CLASSIFIER are not supported
o PREV/NEXT in the standard accept more complex arguments
o Regular expressions other than "+" are not supported
o Only AFTER MATCH SKIP TO NEXT ROW is supported (if AFTER MATCH is
  not specified, AFTER MATCH SKIP TO NEXT ROW is assumed. In the
  standard AFTER MATCH SKIP PAST LAST ROW is assumed in this case). I
  have a plan to implement AFTER MATCH SKIP PAST LAST ROW though.
o INITIAL or SEEK are not supported ((behaves as if INITIAL is specified)
o Aggregate functions associated with window clause using RPR do not respect RPR

It seems RPR in the standard is quite complex. I think we can start
with a small subset of RPR then we could gradually enhance the
implementation.

Comments and suggestions are welcome.

[1] 
https://sqlperformance.com/2019/04/t-sql-queries/row-pattern-recognition-in-sql
[2] https://link.springer.com/article/10.1007/s13222-022-00404-3