I wrote:
> It's worth noting that with these rules, phrase searches will act as
> though "!x" always matches somewhere; for instance "!a <-> !b" will match
> any tsvector. I argue that this is not wrong, not even if the tsvector is
> empty: there could have been adjacent stopwords matching !a and !b in the
> original text. Since we've adjusted the phrase matching rules to treat
> stopwords as unknown-but-present words in a phrase, I think this is
> consistent. It's also pretty hard to assert this is wrong and at the same
> time accept "!a <-> b" matching b at the start of the document.

To clarify this point, I'm imagining that the patch would include
documentation changes like the attached.

			regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 67d0c34..464ce83 100644
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT 'fat & rat & ! cat'::tsqu
*** 3959,3973 ****
          tsquery
  ------------------------
   'fat' & 'rat' & !'cat'
- 
- SELECT '(fat | rat) <-> cat'::tsquery;
-               tsquery
- -----------------------------------
-  'fat' <-> 'cat' | 'rat' <-> 'cat'
  </programlisting>
- 
-      The last example demonstrates that <type>tsquery</type> sometimes
-      rearranges nested operators into a logically equivalent formulation.
      </para>
  
      <para>
--- 3959,3965 ----
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 2da7595..bc33a70 100644
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
*************** text @@ text
*** 323,328 ****
--- 323,330 ----
      at least one of its arguments must appear, while the <literal>!</> (NOT)
      operator specifies that its argument must <emphasis>not</> appear in
      order to have a match.
+     For example, the query <literal>fat & ! rat</> matches documents that
+     contain <literal>fat</> but not <literal>rat</>.
     </para>
  
     <para>
*************** SELECT phraseto_tsquery('the cats ate th
*** 377,382 ****
--- 379,401 ----
      then <literal>&</literal>, then <literal><-></literal>,
      and <literal>!</literal> most tightly.
     </para>
+ 
+    <para>
+     It's worth noticing that the AND/OR/NOT operators mean something subtly
+     different when they are within the arguments of a FOLLOWED BY operator
+     than when they are not, because then the position of the match is
+     significant. Normally, <literal>!x</> matches only documents that do not
+     contain <literal>x</> anywhere. But <literal>x <-> !y</>
+     matches <literal>x</> if it is not immediately followed by <literal>y</>;
+     an occurrence of <literal>y</> elsewhere in the document does not prevent
+     a match. Another example is that <literal>x & y</> normally only
+     requires that <literal>x</> and <literal>y</> both appear somewhere in the
+     document, but <literal>(x & y) <-> z</> requires <literal>x</>
+     and <literal>y</> to match at the same place, immediately before
+     a <literal>z</>. Thus this query behaves differently from <literal>x
+     <-> z & y <-> z</>, which would match a document
+     containing two separate sequences <literal>x z</> and <literal>y z</>.
+    </para>
    </sect2>
  
    <sect2 id="textsearch-intro-configurations">

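
To make the position-sensitive semantics described in the proposed documentation concrete, here is a small toy model in Python. It is only a sketch under stated assumptions: a "tsvector" is represented as a dict from lexeme to a set of positions, and every function and variable name below is hypothetical. It is not PostgreSQL's implementation and does not model stopwords, weights, or distances other than 1.

```python
# Toy model only: a "tsvector" is a dict mapping lexeme -> set of positions.
# This is NOT PostgreSQL code; all names here are hypothetical.

def positions(tsv, lexeme):
    """Positions at which the lexeme occurs (empty set if absent)."""
    return tsv.get(lexeme, set())

def not_anywhere(tsv, x):
    """Plain !x : true only if x appears nowhere in the document."""
    return not positions(tsv, x)

def followed_by(tsv, x, z):
    """x <-> z : some occurrence of x is immediately followed by z."""
    return any(p + 1 in positions(tsv, z) for p in positions(tsv, x))

def followed_by_not(tsv, x, y):
    """x <-> !y : some occurrence of x is NOT immediately followed by y."""
    return any(p + 1 not in positions(tsv, y) for p in positions(tsv, x))

def both_then(tsv, x, y, z):
    """(x & y) <-> z : x and y share a position immediately before a z."""
    same_place = positions(tsv, x) & positions(tsv, y)
    return any(p + 1 in positions(tsv, z) for p in same_place)

def each_then(tsv, x, y, z):
    """x <-> z & y <-> z : the two phrases may match at different places."""
    return followed_by(tsv, x, z) and followed_by(tsv, y, z)

# "x y w x": y is present, but the second x is not followed by y.
doc1 = {"x": {1, 4}, "y": {2}, "w": {3}}

# Separate sequences "x z" (positions 1-2) and "y z" (positions 4-5).
doc2 = {"x": {1}, "z": {2, 5}, "y": {4}}

# x and y at the same position, immediately before z.
doc3 = {"x": {1}, "y": {1}, "z": {2}}

if __name__ == "__main__":
    print(not_anywhere(doc1, "y"), followed_by_not(doc1, "x", "y"))
    print(both_then(doc2, "x", "y", "z"), each_then(doc2, "x", "y", "z"))
    print(both_then(doc3, "x", "y", "z"), each_then(doc3, "x", "y", "z"))
```

In this model doc1 fails plain `!y` yet satisfies `x <-> !y`, and doc2 satisfies `x <-> z & y <-> z` but not `(x & y) <-> z`, which is the distinction the new paragraph draws.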
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers