Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

Teodor Sigaev Wed, 15 Jun 2016 09:06:32 -0700

What about a word with several infinitives?

select to_tsvector('en', 'leavings');
       to_tsvector
------------------------
  'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
  ?column?
----------
  t
(1 row)


The second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'

and

select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct, so we don't need special treatment of <0>.

This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
        phraseto_tsquery('cat ate some rats')
produces
        ( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.

So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.

Agreed, and it seems easy to change. I thought I saw an issue with hyphenated words but, fortunately, I had forgotten that a hyphenated word and its parts don't share a position:

# select to_tsvector('foo-bar');
         to_tsvector
-----------------------------
 'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
         phraseto_tsquery
-----------------------------------
 ( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
 ?column?
----------
 t
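
To make the difference between the two distance rules concrete, here is a small toy model in Python. It is only a sketch and not PostgreSQL's implementation: the helper names positions() and phrase_match() are made up for illustration, there is no stemming (so the lexeme is 'rats', not 'rat'), and the "at most N" rule is assumed to accept a distance of 0, as the 'blue blue' case in the subject line suggests. The distance is measured from the right-most lexeme matched so far.

```python
from typing import Dict, FrozenSet, List, Tuple

Positions = Dict[str, List[int]]


def positions(text: str, stopwords: FrozenSet[str] = frozenset()) -> Positions:
    """Map each token to its 1-based positions.

    Stopwords are dropped from the result but still consume a position,
    which is why a gap appears where a stopword used to be.
    """
    pos: Positions = {}
    for i, token in enumerate(text.split(), start=1):
        if token not in stopwords:
            pos.setdefault(token, []).append(i)
    return pos


def phrase_match(doc: Positions, first: str,
                 rest: List[Tuple[int, str]], exact: bool) -> bool:
    """Match the chain  first <d1> lex1 <d2> lex2 ...  against doc.

    exact=True  : each lexeme must be exactly d positions after the previous one.
    exact=False : each lexeme may be 0..d positions after the previous one
                  (the assumed "less than or equal" rule).
    """
    current = set(doc.get(first, []))  # positions where the chain can end so far
    for distance, lexeme in rest:
        matched = set()
        for p in doc.get(lexeme, []):
            if exact:
                ok = (p - distance) in current
            else:
                ok = any(0 <= p - q <= distance for q in current)
            if ok:
                matched.add(p)
        current = matched
    return bool(current)


if __name__ == "__main__":
    # Toy version of 'cat' <-> 'ate' <2> 'rats'
    query = [(1, "ate"), (2, "rats")]
    with_stopword = positions("cat ate some rats", frozenset({"some"}))
    without_stopword = positions("cat ate rats")
    for exact in (False, True):
        print("exact" if exact else "at most",
              phrase_match(with_stopword, "cat", query, exact),
              phrase_match(without_stopword, "cat", query, exact))
```

In this model "cat ate some rats" matches under both rules, while "cat ate rats" matches only under the "at most" rule. A single 'blue' likewise satisfies blue <-> blue only under the "at most" rule, and the foo-bar positions shown above (1, 2, 3) still match under the exact rule.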


A patch is attached.

--
Teodor Sigaev                                   E-mail: teo...@sigaev.ru
                                                   WWW: http://www.sigaev.ru/

phrase_exact_distance.patch
Description: binary/octet-stream
