Just so that it's not overlooked: I suggest a cleanup of the
(flexible?) query parser syntax in LUCENE-9528.

In short, the current JavaCC code is a tangled mess that is hard to
read, modify, and make sense of.

https://issues.apache.org/jira/browse/LUCENE-9528

For example, these are all currently valid queries in the flexible
query parser:

1. assertQueryEquals("term~0.7", null, "term~1");
2. assertQueryEquals("term^3~", null, "(term~2)^3.0");
3. assertEquals(re, qp.parse("/http/~0.5", df));

The thing is:

1) Fuzzy (and slop) values are integers. The parser shouldn't accept
floats; doing so is incorrect and misleading.
2) Operator order should matter here: fuzzy should apply first, and
boost should then apply to whatever expression is underneath (boost
has a wider application than just term queries). The current parser
hardcodes an arbitrary-order syntax, which is wrong. For example,
term~3^3~4 parses and results in this query:
<fuzzy field='field' similarity='4.0' term='term'/>
3) Operators that don't apply to certain types of clauses should cause
parser exceptions. Can you guess what the query "/http/~0.5" parses
to? It looks like a regexp with a fuzzy factor, right? No, it parses to:

<fuzzy field='field' similarity='0.5' term='/http/'/>

because regexps don't allow fuzziness.
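To make the three rules above concrete, here is a minimal, standalone
Java sketch of a strict grammar for a single clause. It is purely
hypothetical: the class StrictClauseParser, its regex-based grammar and
the default of 2 edits for a bare "~" are illustrative assumptions (the
default mirrors the "(term~2)^3.0" output above). It is not Lucene's API
and not the actual LUCENE-9528 JavaCC grammar, and it only handles one
clause, whereas a real boost would apply to arbitrary subexpressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Lucene's API, not the LUCENE-9528 grammar)
 * of a strict syntax for a single clause:
 *
 *   clause := (TERM | REGEXP) fuzzy? boost?
 *   fuzzy  := '~' INT?     only after TERM; INT is a non-negative integer
 *   boost  := '^' NUMBER   always last
 */
final class StrictClauseParser {

    // group(1) = regexp, group(2) = term, group(3) = '~',
    // group(4) = integer edits, group(5) = boost value
    private static final Pattern CLAUSE = Pattern.compile(
        "(?:(/[^/]*/)|(\\w+))(?:(~)(\\d+)?)?(?:\\^(\\d+(?:\\.\\d+)?))?");

    // Assumed default edit distance for a bare '~'.
    static final int DEFAULT_MAX_EDITS = 2;

    static String parse(String input) {
        Matcher m = CLAUSE.matcher(input);
        // matches() requires the whole input to fit the grammar, so a float
        // fuzzy ("term~0.7") or a fuzzy after a boost ("term~3^3~4") fails.
        if (!m.matches()) {
            throw new IllegalArgumentException("Syntax error in: " + input);
        }
        String regexp = m.group(1);
        boolean hasFuzzy = m.group(3) != null;
        if (regexp != null && hasFuzzy) {
            // Fuzziness does not apply to regexps: fail instead of
            // reinterpreting the input as a fuzzy term query.
            throw new IllegalArgumentException(
                "'~' cannot be applied to a regexp: " + input);
        }
        String result = regexp != null ? regexp : m.group(2);
        if (hasFuzzy) {
            int edits = m.group(4) != null
                ? Integer.parseInt(m.group(4)) : DEFAULT_MAX_EDITS;
            result = result + "~" + edits;
        }
        if (m.group(5) != null) {
            // Boost wraps whatever was built so far.
            result = "(" + result + ")^" + Float.parseFloat(m.group(5));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] queries = {"term~1", "term~^3", "/http/", "term~0.7",
                            "term^3~", "term~3^3~4", "/http/~0.5"};
        for (String q : queries) {
            try {
                System.out.println(q + " -> " + parse(q));
            } catch (IllegalArgumentException e) {
                System.out.println(q + " -> rejected: " + e.getMessage());
            }
        }
    }
}
```

Under this sketch, "term~^3" (fuzzy first, then boost) yields
"(term~2)^3.0", while "term~0.7", "term^3~", "term~3^3~4" and
"/http/~0.5" are all rejected rather than silently reinterpreted.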

LUCENE-9528 cleans up most of the above. The drawback is that it is
not a backwards-compatible change (although, arguably, it fixes parser
errors rather than changing intended behavior).

Speak up if you think the above should not be changed.

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to