Re: Rethinking the implementation of ts_headline()

2023-04-06 Thread Tom Lane
Alexander Lakhin writes:
> I've found that starting from commit 5a617d75 this query:
> SELECT ts_headline('english', 'To be, or not to be', to_tsquery('english',
> 'or'));
> invokes a valgrind-detected error:
> ==00:00:00:03.950 3241424== Invalid read of size 1

On my machine, I also see

Re: Rethinking the implementation of ts_headline()

2023-04-06 Thread Alexander Lakhin
Hi,

19.01.2023 19:13, Tom Lane wrote: Alvaro Herrera writes: Anyway, I don't think this needs to stop your current patch. Many thanks for looking at it!

I've found that starting from commit 5a617d75 this query:
SELECT ts_headline('english', 'To be, or not to be', to_tsquery('english',

Re: Rethinking the implementation of ts_headline()

2023-01-19 Thread Tom Lane
Alvaro Herrera writes:
> On 2023-Jan-18, Tom Lane wrote:
>> It's including hits for "day" into the cover despite the lack of any
>> nearby match to "drink".

> I suppose it would be possible to put 'day' and 'drink' in two different
> fragments: since the query has a & operator for them, the

Re: Rethinking the implementation of ts_headline()

2023-01-19 Thread Alvaro Herrera
On 2023-Jan-18, Tom Lane wrote:

> Alvaro Herrera writes:
> > and was surprised that the match for the 'day & drink' arm of the OR
> > disappears from the reported headline.
>
> I'd argue that that's exactly what should happen. It's supposed to
> find as-short-as-possible cover strings that

Re: Rethinking the implementation of ts_headline()

2023-01-18 Thread Tom Lane
Alvaro Herrera writes:
> I tried this other test, based on looking at the new regression tests
> you added,
> SELECT ts_headline('english', '
> Day after day, day after day,
> We stuck, nor breath nor motion,
> As idle as a painted Ship
> Upon a painted Ocean.
> Water, water, every where
>

Re: Rethinking the implementation of ts_headline()

2023-01-18 Thread Alvaro Herrera
I tried this other test, based on looking at the new regression tests you added, SELECT ts_headline('english', ' Day after day, day after day, We stuck, nor breath nor motion, As idle as a painted Ship Upon a painted Ocean. Water, water, every where And all the boards did shrink; Water,

Re: Rethinking the implementation of ts_headline()

2023-01-17 Thread Alvaro Herrera
On 2023-Jan-16, Tom Lane wrote:

> I get this with the patch:
>
>      ts_headline     |     ts_headline
> ---------------------+---------------------
>  {ipsum} ... {labor} | {ipsum} ... {labor}
> (1 row)
>
> which is what I'd expect, because it removes the artificial limit on
> cover

Re: Rethinking the implementation of ts_headline()

2023-01-16 Thread Tom Lane
Alvaro Herrera writes:
> I came across #17556 which contains a different test for this, and I'm
> not sure that this patch changes things completely for the better.

Thanks for looking at my patch. However ...

> That is, once past the 5000 words of distance, it fails to find a good
> cover, but

Re: Rethinking the implementation of ts_headline()

2023-01-16 Thread Alvaro Herrera
On 2022-Nov-25, Tom Lane wrote:

> After further contemplation of bug #17691 [1], I've concluded that
> what I did in commit c9b0c678d was largely misguided. For one
> thing, the new hlCover() algorithm no longer finds shortest-possible
> cover strings: if your query is "x & y" and the text is

Rethinking the implementation of ts_headline()

2022-11-25 Thread Tom Lane
After further contemplation of bug #17691 [1], I've concluded that what I did in commit c9b0c678d was largely misguided. For one thing, the new hlCover() algorithm no longer finds shortest-possible cover strings: if your query is "x & y" and the text is like "... x ... x ... y ...", then the
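
The "shortest-possible cover" notion in this message can be illustrated with a small sketch. This is not PostgreSQL's hlCover(); it is a generic sliding-window search written in Python, and it assumes the simplest case, a pure AND query over already-tokenized words. The function name `shortest_cover` and the sample text are hypothetical, chosen only to mirror the "... x ... x ... y ..." example.

```python
from collections import Counter


def shortest_cover(words, required):
    """Return (start, end), inclusive word indexes of the shortest window
    of `words` that contains every term in `required`, or None if no
    window does. Hypothetical helper; handles AND-only queries."""
    required = set(required)
    if not required:
        return None

    counts = Counter()        # occurrences of each required term in the window
    missing = len(required)   # required terms not yet inside the window
    best = None
    left = 0

    for right, word in enumerate(words):
        if word in required:
            if counts[word] == 0:
                missing -= 1
            counts[word] += 1

        # While the window covers the query, record it and shrink from the left.
        while missing == 0:
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = words[left]
            if dropped in required:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    missing += 1
            left += 1

    return best


# Text shaped like "... x ... x ... y ...": x at indexes 1 and 4, y at index 6.
text = "a x b c x d y e".split()
print(shortest_cover(text, {"x", "y"}))  # (4, 6)
```

For this input a cover that starts at the first "x" would span indexes 1 to 6, while the shortest cover starts at the second "x" and spans 4 to 6. Queries with OR, NOT, or phrase operators need a real query evaluator rather than a set-membership check, which this sketch does not attempt.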