Just so that it's not overlooked. I suggest a cleanup of the (flexible?) query parser syntax in LUCENE-9528.
In short, the current javacc code is a tangled mess that is hard to read, modify and make sense of.

https://issues.apache.org/jira/browse/LUCENE-9528

For example, these are all valid queries at the moment (flex qp):

1. assertQueryEquals("term~0.7", null, "term~1");
2. assertQueryEquals("term^3~", null, "(term~2)^3.0");
3. assertEquals(re, qp.parse("/http/~0.5", df));

The thing is:

1) Fuzzy (and slop) are integers. The parser shouldn't accept floats for them; it's incorrect and misleading.

2) Operator order in this case should matter: fuzzy should apply first, then boost to whatever expression is underneath (boost has a wider application than just term queries). This arbitrary-order syntax is hardcoded in the parser and is wrong. This parses, for example:

  term~3^3~4

and results in this query:

  <fuzzy field='field' similarity='4.0' term='term'/>

3) Operators that don't apply to certain types of clauses should cause parser exceptions. Can you guess what the query "/http/~0.5" parses to? Looks like a regexp with a fuzzy factor, right? No, it parses to:

  <fuzzy field='field' similarity='0.5' term='/http/'/>

because regexps don't allow fuzziness.

LUCENE-9528 cleans up most of the above. The drawback: it is not a backwards-compatible change (arguably this fixes parser errors, not behavior).

Speak up if you have an opinion about not changing the above.

Dawid
