What about a word with several infinitives?

select to_tsvector('en', 'leavings');
      to_tsvector
------------------------
 'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
 ?column?
----------
 t
(1 row)
The second example is not correct: select phraseto_tsquery('en', 'leavings') produces 'leave | leavings', and select phraseto_tsquery('en', 'leavings cats') produces 'leave <-> cat | leavings <-> cat'. That seems correct, so we don't need special treatment of <0>.
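To make the point about several lexemes sharing one position concrete, here is a toy Python model of phrase matching. It is a sketch, not PostgreSQL's implementation: a tsvector is modeled as a map from lexeme to positions, and <N> as "the right operand sits exactly N positions after the left one". The vectors are hand-built, assuming the 'en' configuration emits both 'leave' and 'leavings' at position 1 (as in the psql output) and 'cat' at position 2; the names Lex, Or, Phrase, positions and matches are hypothetical.

```python
# Toy model of tsvector phrase matching (a sketch, not PostgreSQL's code).
# A tsvector is modeled as a mapping lexeme -> set of positions.
from dataclasses import dataclass
from typing import Dict, Set, Union

TsVector = Dict[str, Set[int]]


@dataclass(frozen=True)
class Lex:
    word: str


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Phrase:
    left: "Query"
    right: "Query"
    distance: int  # the N in <N>; <-> is <1>


Query = Union[Lex, Or, Phrase]


def positions(q: Query, vec: TsVector) -> Set[int]:
    """Positions where q matches (for a phrase: the position of its right end)."""
    if isinstance(q, Lex):
        return set(vec.get(q.word, ()))
    if isinstance(q, Or):
        return positions(q.left, vec) | positions(q.right, vec)
    left = positions(q.left, vec)
    # Exact-distance rule: right operand exactly N positions after the left one.
    return {r for r in positions(q.right, vec) if r - q.distance in left}


def matches(vec: TsVector, q: Query) -> bool:
    return bool(positions(q, vec))


# Hand-built vectors (assumed 'en' configuration output, see lead-in).
leavings = {"leave": {1}, "leavings": {1}}
leavings_cats = {"leave": {1}, "leavings": {1}, "cat": {2}}

# 'leave <0> leavings': both lexemes at the same position.
print(matches(leavings, Phrase(Lex("leave"), Lex("leavings"), 0)))  # True

# 'leave <-> cat | leavings <-> cat'
q = Or(Phrase(Lex("leave"), Lex("cat"), 1), Phrase(Lex("leavings"), Lex("cat"), 1))
print(matches(leavings_cats, q))  # True
```

In this model <0> is simply "distance equals 0", and because the OR expansion keeps each lexeme's own position, the phrase query for 'leavings cats' matches without any extra rule.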
This brings up something else that I am not very sold on: to wit, do we really want the "less than or equal" distance behavior at all? The documentation gives the example that phraseto_tsquery('cat ate some rats') produces ( 'cat' <-> 'ate' ) <2> 'rat' because "some" is a stopword. However, that pattern will also match "cat ate rats", which seems surprising and unexpected to me; certainly it would surprise a user who did not realize that "some" is a stopword. So I think there's a reasonable case for decreeing that <N> should only match lexemes *exactly* N apart. If we did that, we would no longer have the misbehavior that Jean-Pierre is complaining about, and we'd not need to argue about whether <0> needs to be treated specially.
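The difference between the two readings of <N> can be checked on the ('cat' <-> 'ate') <2> 'rat' example with a small Python sketch. It is a toy model, not PostgreSQL's code: the "at most N" branch uses a simplified rule (0 <= distance <= N), which is an assumption about the pre-patch behavior rather than a transcription of it, and the vectors are hand-built so that the stopword "some" leaves a gap at position 3. The helper names phrase and cat_ate_rat are hypothetical.

```python
# Compare "at most N" vs "exactly N" readings of the <N> phrase operator.
# Toy model (not PostgreSQL's code); positions are hand-built.
from typing import Dict, Set

TsVector = Dict[str, Set[int]]


def phrase(left: Set[int], right: Set[int], n: int, exact: bool) -> Set[int]:
    """Right-operand positions that are exactly n (exact=True) or, under an
    assumed simplified pre-patch rule, at most n positions after a left one."""
    if exact:
        return {r for r in right if r - n in left}
    return {r for r in right if any(0 <= r - l <= n for l in left)}


def cat_ate_rat(vec: TsVector, exact: bool) -> bool:
    """Evaluate ('cat' <-> 'ate') <2> 'rat' against vec."""
    def pos(word: str) -> Set[int]:
        return vec.get(word, set())

    cat_ate = phrase(pos("cat"), pos("ate"), 1, exact)  # <-> is <1>
    return bool(phrase(cat_ate, pos("rat"), 2, exact))


with_stopword = {"cat": {1}, "ate": {2}, "rat": {4}}  # 'cat ate some rats'
without_gap = {"cat": {1}, "ate": {2}, "rat": {3}}    # 'cat ate rats'

for exact in (False, True):
    print(exact, cat_ate_rat(with_stopword, exact), cat_ate_rat(without_gap, exact))
# False True True
# True True False
```

Under the "at most N" reading both texts match; with exact distances only 'cat ate some rats' does, which is the behavior proposed above.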
Agreed, that seems easy to change. I thought I saw an issue with hyphenated words but, fortunately, I had forgotten that the parts of a hyphenated word don't share a position:
# select to_tsvector('foo-bar');
         to_tsvector
-----------------------------
 'bar':3 'foo':2 'foo-bar':1

# select phraseto_tsquery('foo-bar');
         phraseto_tsquery
-----------------------------------
 ( 'foo-bar' <-> 'foo' ) <-> 'bar'

and

# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
 ?column?
----------
 t

Patch is attached.

-- 
Teodor Sigaev                      E-mail: teo...@sigaev.ru
                                      WWW: http://www.sigaev.ru/
phrase_exact_distance.patch
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers