[ https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892007#action_12892007 ]
Robert Muir commented on LUCENE-2458:
-------------------------------------

{quote}
Robert, it was your commit that changed the default behavior of Solr, and I disagree with that change.
Technically, I could VETO - but I don't believe I have ever done a code-change veto, and I don't want to start now
{quote}

Yonik, I would rather you just VETO than heavy-commit the wrong changes.

For example, if you said "Robert, it's annoying that for users with the LUCENE_31 version in their solrconfig, I feel they don't have enough flexibility yet without setting the version to LUCENE_30. I feel that the parameter setting in SOLR-2015 should be incorporated into this issue", I mean, that's completely constructive!

{quote}
Instead, I'll try and be constructive by going to work on SOLR-2015 so we can at least configure it per-field.
{quote}

Man, I am willing to help with that also (though I am not particularly a Solr queryparser expert, I think we should expose these options to users that want them, instead of requiring them to depend on version-specific defaults).

Just let me know how I can help, I want constructive progress.

> queryparser makes all CJK queries phrase queries regardless of analyzer
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-2458
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2458
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: QueryParser
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2458.patch, LUCENE-2458.patch, LUCENE-2458.patch,
> LUCENE-2458.patch
>
>
> The queryparser automatically makes *ALL* CJK, Thai, Lao, Myanmar, Tibetan,
> ... queries into phrase queries, even though you didn't ask for one, and
> there isn't a way to turn this off.
> This completely breaks Lucene for these languages, as it treats all queries
> like 'grep'.
> Example: if you query for f:abcd with StandardAnalyzer, where a,b,c,d are
> Chinese characters, you get a phrase query of "a b c d". If you use
> CJKAnalyzer, it's no better: it's a phrase query of "ab bc cd", and if you
> use SmartChineseAnalyzer, you get a phrase query like "ab cd". But the user
> didn't ask for one, and they cannot turn it off.
> The reason is that the code to form phrase queries is not internationally
> appropriate and assumes whitespace tokenization. If more than one token comes
> out of whitespace-delimited text, it's automatically a phrase query no matter
> what.
> The proposed patch fixes the core queryparser (with all backwards compat
> kept) to only form phrase queries when the double quote operator is used.
> Implementing subclasses can always extend the QP and auto-generate whatever
> kind of queries they want that might completely break search for languages
> they don't care about, but core general-purpose QPs should be language
> independent.
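
The phrase-forming rule described in the issue can be illustrated with a small, self-contained sketch. It does not use Lucene at all: the class PhraseDecisionSketch, its methods, and the bigram tokenizer are hypothetical stand-ins chosen for illustration, and the rendering of the non-phrase case as a plain list of term clauses is an assumption about what "not a phrase query" might look like, not the patch's actual output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names are Lucene APIs.
class PhraseDecisionSketch {

    // Stand-in for a CJK-style analyzer: emits overlapping character bigrams,
    // so "abcd" becomes [ab, bc, cd]. ASCII letters stand in for Chinese
    // characters, as in the issue description.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() < 2) {
            if (!text.isEmpty()) {
                tokens.add(text);
            }
            return tokens;
        }
        for (int i = 0; i + 2 <= text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    // Behavior the issue complains about: more than one token from a single
    // whitespace-delimited chunk always yields a phrase, quoted or not.
    static String oldBehavior(String field, List<String> tokens) {
        return tokens.size() > 1 ? asPhrase(field, tokens) : asTerms(field, tokens);
    }

    // Behavior the issue proposes: a phrase only when the user typed quotes.
    static String proposedBehavior(String field, List<String> tokens, boolean quoted) {
        return (quoted && tokens.size() > 1) ? asPhrase(field, tokens) : asTerms(field, tokens);
    }

    // Renders the tokens as one quoted phrase clause.
    static String asPhrase(String field, List<String> tokens) {
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    // Renders the tokens as independent term clauses (assumed representation).
    static String asTerms(String field, List<String> tokens) {
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + token);
        }
        return String.join(" ", clauses);
    }
}
```

With the tokens from bigrams("abcd"), oldBehavior("f", tokens) gives the phrase f:"ab bc cd" even though no quotes were typed, while proposedBehavior("f", tokens, false) gives the separate clauses f:ab f:bc f:cd and only produces the phrase when quoted is true.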