Re: Row pattern recognition

Jacob Champion Tue, 01 Oct 2024 09:25:21 -0700

On Sun, Sep 29, 2024 at 5:08 PM Tatsuo Ishii <is...@postgresql.org> wrote:
> > I think implementing MEASURES is challenging. Especially we need to
> > find how our parser accepts "colname OVER
> > window_definition". Currently PostgreSQL's parser only accepts "func()
> > OVER window_definition" Even it is technically possible, I think the
> > v1 patch size will become much larger than now due to this.


[resending, to the whole list this time]

Yeah. In any case, I'm not the right person to bolt MEASURES onto the
existing grammar... my misadventures in the PATTERN parser have
highlighted how little I know about Bison. :D

> Please disregard my proposal. Even if we make such a function, it
> would always return NULL for unmatched rows or skipped rows, and I
> think the function does not solve your problem.
>
> However, I wonder if supporting MEASURES solves the problem either
> because any columns defined by MEASURES will return NULL except the
> first row in a reduced frame. Can you please show an example how to
> identify runs of matched rows using MEASURES?

I think you're probably right; my suggestion can't distinguish between
skipped (but previously matched) rows and entirely-unmatched rows. The
test case I'd been working with returned an empty match as a fallback,
so it wouldn't have had that problem in practice. I was hoping that
one of the existing whole-partition window functions would allow me to
cobble something together based on the COUNT(*) measure, but after
searching for a while I haven't been able to come up with a solution.

Maybe it's just too niche for the window-function version of this --
after all, it only makes sense when using both INITIAL and AFTER MATCH
SKIP PAST LAST ROW. A more general solution could identify the
row_number of the first and last rows of the window frame, perhaps?
But a frame isn't guaranteed to be contiguous, so maybe that doesn't
make sense either. Ugh.

> I wonder how Oracle solves the problem (an infinite set of possible
> matches) without using "--max-rows" or something like that because in
> my understanding Oracle supports the regular expressions and PERMUTE.

I chose a confusing way to describe it, sorry. The parenthesized
language for a pattern can be an infinite set, because A+ could match
"( A )" or "( A A )" or "( A A A )" and so on forever. But that
doesn't apply to our regex engine in practice; our tables have a
finite number of rows, and I *think* the PL for a finite number of
rows is also finite, due to the complicated rules on where empty
matches are allowed to appear in the language. (In any case, my tool
doesn't guard against infinite recursion...)

> >> My implementation is really messy -- it leaks memory like a sieve, and
> >> I cannibalized the parser from ECPG, which just ended up as an
> >> exercise in teaching myself flex/bison. But if there's interest in
> >> having this kind of tool in the tree, I can work on making it
> >> reviewable. Either way, I should be able to use it to double-check
> >> more complicated test cases.
>
> I definitely am interested in the tool!

Okay, good to know! I will need to clean it up considerably, and
figure out whether I've duplicated more code than I should have.

> >> A while back [2], you were wondering whether our Bison implementation
> >> would be able to parse the PATTERN grammar directly. I think this tool
> >> proves that the answer is "yes", but PERMUTE in particular causes a
> >> shift/reduce conflict. To fix it, I applied the same precedence
> >> workaround that we use for CUBE and ROLLUP.
>
> That's a good news!

+1

Thanks,
--Jacob

Re: Row pattern recognition

Reply via email to