[ https://issues.apache.org/jira/browse/SOLR-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416051#comment-16416051 ]
Dean Gurvitz commented on SOLR-9185:
------------------------------------

I missed that comment. Anyway, I just think that we should be more careful with such changes in minor versions, and at least explicitly mention them in the changes.txt file for those who wish to upgrade their version.

> Solr's edismax and "Lucene"/standard query parsers should optionally not split on whitespace before sending terms to analysis
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9185
>                 URL: https://issues.apache.org/jira/browse/SOLR-9185
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Major
>             Fix For: 6.5, 7.0
>
>         Attachments: SOLR-9185.patch, SOLR-9185.patch, SOLR-9185.patch, SOLR-9185.patch
>
>
> Copied from LUCENE-2605:
> The queryparser parses input on whitespace, and sends each whitespace-separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across whitespace boundaries:
> n-gram analysis
> shingles
> synonyms (especially multi-word for whitespace-separated languages)
> languages where a 'word' can contain whitespace (e.g. Vietnamese)
> It's also rather unexpected, as users think their charfilters/tokenizers/tokenfilters will do the same thing at index and query time, but in many cases they can't. Instead, preferably the queryparser would parse around only real 'operators'.