On 2022-Nov-25, Tom Lane wrote:

> After further contemplation of bug #17691 [1], I've concluded that
> what I did in commit c9b0c678d was largely misguided.  For one
> thing, the new hlCover() algorithm no longer finds shortest-possible
> cover strings: if your query is "x & y" and the text is like
> "... x ... x ... y ...", then the selected cover string will run
> from the first occurrence of x to the y, whereas the old algorithm
> would have correctly selected "x ... y".  For another thing, the
> maximum-cover-length hack that I added in 78e73e875 to band-aid
> over the performance issues of the original c9b0c678d patch means
> that various scenarios no longer work as well as they used to,
> which is the proximate cause of the complaints in bug #17691.
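
To make the "shortest-possible cover" property concrete, here is a minimal Python sketch for a pure AND query such as "x & y". It is only an illustration of the property being discussed, not the hlCover() code from either commit: the function name shortest_cover and the word-list representation are made up for this example, and real tsquery evaluation (OR, NOT, phrase operators, lexeme normalization) is ignored.

```python
def shortest_cover(words, terms):
    """Return (start, end) word indexes of the shortest span of `words`
    containing every term in `terms`, or None if there is no such span.

    Hypothetical helper for illustration only; it handles plain AND
    queries and compares words literally.
    """
    needed = set(terms)
    last_seen = {}   # term -> index of its most recent occurrence
    best = None
    for pos, word in enumerate(words):
        if word not in needed:
            continue
        last_seen[word] = pos
        if len(last_seen) == len(needed):
            # The shortest span ending at `pos` starts at the oldest
            # of the most-recent occurrences of the terms.
            start = min(last_seen.values())
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
    return best


words = "a x b x c y d".split()
print(shortest_cover(words, ["x", "y"]))   # (3, 5): second x through y
```

With the text "a x b x c y d", the span starts at the second x rather than the first, which is the "x ... y" selection the quoted text describes.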

I came across #17556, which contains a different test for this, and I'm
not sure that this patch changes things completely for the better.  In
that bug report, Alex Malek presents this example:

    select ts_headline('baz baz baz ipsum ' || repeat(' foo ',4998) || 'labor',
                       $$'ipsum' & 'labor'$$::tsquery,
                       'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3'),
           ts_headline('baz baz baz ipsum ' || repeat(' foo ',4999) || 'labor',
                       $$'ipsum' & 'labor'$$::tsquery,
                       'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3');

which returns, in the current HEAD, the following:

     ts_headline     │ ts_headline 
─────────────────────┼─────────────
 {ipsum} ... {labor} │ baz baz baz
(1 row)

That is, once past the 5000 words of distance, it fails to find a good
cover, but before that it returns an acceptable headline.  However,
after your proposed patch, we get this:

 ts_headline │ ts_headline 
─────────────┼─────────────
 {ipsum}     │ {ipsum}
(1 row)

which is an improvement in the second case, though perhaps not as much
as we would like, and definitely not an improvement in the first case.

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."   (New York Times, about Microsoft PowerPoint)