[ https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe updated LUCENE-2605: ------------------------------- Attachment: LUCENE-2605.patch Okay, really final patch. On SOLR-9185 I was having trouble integrating the Solr standard QP's comment support with the whitespace tokenization I introduced here, so I tried switching the Solr parser back to ignoring both whitespace and comments, and it worked. The patch brings this grammar simplification back here too - in addition to many fewer whitespace mentions in the rules, fewer (and less complicated) lookaheads are required. I've included the generated files in the patch. No tests changed from the last patch. All Lucene tests pass, and precommit passes. > queryparser parses on whitespace > -------------------------------- > > Key: LUCENE-2605 > URL: https://issues.apache.org/jira/browse/LUCENE-2605 > Project: Lucene - Core > Issue Type: Bug > Components: core/queryparser > Reporter: Robert Muir > Assignee: Steve Rowe > Attachments: LUCENE-2605.patch, LUCENE-2605.patch, LUCENE-2605.patch, > LUCENE-2605.patch, LUCENE-2605.patch, LUCENE-2605.patch > > > The queryparser parses input on whitespace, and sends each whitespace > separated term to its own independent token stream. > This breaks the following at query-time, because they can't see across > whitespace boundaries: > * n-gram analysis > * shingles > * synonyms (especially multi-word for whitespace-separated languages) > * languages where a 'word' can contain whitespace (e.g. vietnamese) > Its also rather unexpected, as users think their > charfilters/tokenizers/tokenfilters will do the same thing at index and > querytime, but > in many cases they can't. Instead, preferably the queryparser would parse > around only real 'operators'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org