Hi hackers,

Thank you for the excellent report! I am glad to see that PostgreSQL
> RPR is much faster than Trino, especially in the match failure cases.
>
> Is it possible to share the data generation script and the query for
> PostgreSQL so that I could locally perform the tests?


Here is a self-contained guide to reproduce the RPR ABCD pattern test
locally
using PostgreSQL, Trino, and Oracle.


Requirements:

PostgreSQL 19devel build with the RPR patch applied (local)

Trino and Oracle can be set up via Docker (see README for details):

  https://github.com/assam258-5892/docker-databases

Start all services:

  cd ~/docker-databases && docker compose up -d trino-service oracle-service

Note: docker compose up does not pull images automatically if they are
already
cached locally. To fetch the latest images, run docker compose pull first.

Note: The Oracle image requires an Oracle account. Register at
https://container-registry.oracle.com, accept the license for the database
image, then log in before pulling:

  docker login container-registry.oracle.com

Connect to each database shell:

  Trino:  docker compose exec -it trino-service trino
  Oracle: docker compose exec -it oracle-service sqlplus / as sysdba


Step 1: Create the test table and data

PostgreSQL (1x scale, 20,000 rows):

DROP TABLE IF EXISTS abcd_test;
CREATE TABLE abcd_test AS
SELECT v,
       CASE
           WHEN v % 10000 < 3333 THEN 'A'
           WHEN v % 10000 >= 3333 AND v % 10000 < 6666 THEN 'B'
           WHEN v % 10000 >= 6666 AND v % 10000 < 9999 THEN 'C'
           WHEN v % 10000 = 9999 THEN 'D'
       END AS cat
FROM generate_series(0, 19999) AS v;

ANALYZE abcd_test;


Trino (1x scale, 20,000 rows):

CREATE SCHEMA IF NOT EXISTS memory.test;

DROP TABLE IF EXISTS memory.test.abcd_test;
CREATE TABLE memory.test.abcd_test AS
WITH nums AS (
    SELECT a.v * 10000 + b.v AS v
    FROM UNNEST(sequence(0, 1)) AS a(v)
    CROSS JOIN UNNEST(sequence(0, 9999)) AS b(v)
)
SELECT CAST(v AS INTEGER) AS v,
       CASE
           WHEN v % 10000 < 3333 THEN 'A'
           WHEN v % 10000 >= 3333 AND v % 10000 < 6666 THEN 'B'
           WHEN v % 10000 >= 6666 AND v % 10000 < 9999 THEN 'C'
           WHEN v % 10000 = 9999 THEN 'D'
       END AS cat
FROM nums;

Note: Trino sequence() is limited to 10,000 elements per call, so a CROSS
JOIN
is used. For scale Sx, change sequence(0, 1) to sequence(0, S*2-1).


Oracle (1x scale, 20,000 rows):

DROP TABLE abcd_test PURGE;
CREATE TABLE abcd_test AS
SELECT v,
       CASE
           WHEN MOD(v, 10000) < 3333 THEN 'A'
           WHEN MOD(v, 10000) >= 3333 AND MOD(v, 10000) < 6666 THEN 'B'
           WHEN MOD(v, 10000) >= 6666 AND MOD(v, 10000) < 9999 THEN 'C'
           WHEN MOD(v, 10000) = 9999 THEN 'D'
       END AS cat
FROM (SELECT LEVEL - 1 AS v FROM dual CONNECT BY LEVEL <= 20000);

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABCD_TEST');


Verify data distribution (all engines):

SELECT cat, COUNT(*) AS cnt FROM abcd_test GROUP BY cat ORDER BY cat;

Expected (1x): A=6666, B=6666, C=6666, D=2


Step 2: Run Test 1 — A+ B+ C+ D (match expected)

Expected: 2 rows returned (one match per segment)

PostgreSQL:

SELECT match_first, match_last, match_len
FROM (
    SELECT v,
           first_value(v) OVER w AS match_first,
           last_value(v) OVER w AS match_last,
           count(*) OVER w AS match_len
    FROM abcd_test
    WINDOW w AS (
        ORDER BY v
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+ B+ C+ D)
        DEFINE
            A AS cat = 'A',
            B AS cat = 'B',
            C AS cat = 'C',
            D AS cat = 'D'
    )
) result
WHERE match_len > 0;


Trino / Oracle:

SELECT match_first, match_last, match_len
FROM abcd_test
MATCH_RECOGNIZE (
    ORDER BY v
    MEASURES
        FIRST(v) AS match_first,
        LAST(v) AS match_last,
        COUNT(*) AS match_len
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A+ B+ C+ D)
    DEFINE
        A AS cat = 'A',
        B AS cat = 'B',
        C AS cat = 'C',
        D AS cat = 'D'
) mr;

Note: Trino uses memory.test.abcd_test as the table name.

Expected result (1x):

match_first | match_last | match_len
------------|------------|----------
          0 |       9999 |     10000
      10000 |      19999 |     10000


Step 3: Run Test 2 — A+ B+ C+ E (match failure)

Expected: 0 rows (E does not exist)

Use the same queries as Test 1 with two changes:

PATTERN: (A+ B+ C+ D) → (A+ B+ C+ E)

DEFINE: cat = 'D' → cat = 'E'

Warning: Trino Test 2 at 1x scale takes approximately 5-6 minutes.


Step 4: Scale up (optional)

Re-create the test table at 2x scale (40,000 rows) and then repeat Step 2
and
Step 3.

PostgreSQL (2x scale, 40,000 rows):

DROP TABLE IF EXISTS abcd_test;
CREATE TABLE abcd_test AS
SELECT v,
       CASE
           WHEN v % 20000 < 6666 THEN 'A'
           WHEN v % 20000 >= 6666 AND v % 20000 < 13332 THEN 'B'
           WHEN v % 20000 >= 13332 AND v % 20000 < 19999 THEN 'C'
           WHEN v % 20000 = 19999 THEN 'D'
       END AS cat
FROM generate_series(0, 39999) AS v;

ANALYZE abcd_test;


Trino (2x scale, 40,000 rows):

CREATE SCHEMA IF NOT EXISTS memory.test;

DROP TABLE IF EXISTS memory.test.abcd_test;
CREATE TABLE memory.test.abcd_test AS
WITH nums AS (
    SELECT a.v * 10000 + b.v AS v
    FROM UNNEST(sequence(0, 3)) AS a(v)
    CROSS JOIN UNNEST(sequence(0, 9999)) AS b(v)
)
SELECT CAST(v AS INTEGER) AS v,
       CASE
           WHEN v % 20000 < 6666 THEN 'A'
           WHEN v % 20000 >= 6666 AND v % 20000 < 13332 THEN 'B'
           WHEN v % 20000 >= 13332 AND v % 20000 < 19999 THEN 'C'
           WHEN v % 20000 = 19999 THEN 'D'
       END AS cat
FROM nums;


Oracle (2x scale, 40,000 rows):

DROP TABLE abcd_test PURGE;
CREATE TABLE abcd_test AS
SELECT v,
       CASE
           WHEN MOD(v, 20000) < 6666 THEN 'A'
           WHEN MOD(v, 20000) >= 6666 AND MOD(v, 20000) < 13332 THEN 'B'
           WHEN MOD(v, 20000) >= 13332 AND MOD(v, 20000) < 19999 THEN 'C'
           WHEN MOD(v, 20000) = 19999 THEN 'D'
       END AS cat
FROM (SELECT LEVEL - 1 AS v FROM dual CONNECT BY LEVEL <= 40000);

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABCD_TEST');

Expected (2x): A=13332, B=13332, C=13332, D=4

Then run Step 2 and Step 3 as-is.

Warning: Trino Test 2 at 2x scale takes approximately 20-25 minutes.


Please let me know if you encounter any issues reproducing this.

Best regards
SungJun

Reply via email to