There is no bug here. The positions are correct. If you want to use phrase queries, I wouldn't try to be so tricky with n-grams.
This never works well, and there is nothing to fix...

On Sun, Apr 20, 2014 at 2:02 AM, Shawn Heisey <s...@elyograg.org> wrote:
> The analysis chain on some of my Solr fieldType entries includes
> CJKBigramFilterFactory on both the index and query. I had
> outputUnigrams enabled on the index side, but had it disabled on the
> query side. This resulted in a problem with phrase queries. This is a
> subset of the index analysis for the three terms you can see in the
> ICUNF step, separated by spaces. One word has been replaced with
> 'redacted' ... it's in Latin1 script and there's nothing unusual about it:
>
> https://www.dropbox.com/s/9q1x9pdbsjhzocg/bigram-position-problem.png
>
> Note that in the CJKBF step, the second unigram is output at position 2,
> pushing the english terms to 3 and 4.
>
> Imagine that the customer is doing a phrase search. What ends up
> getting sent to Solr is a filter query like this:
>
> field:"綾瀬 haruka"
>
> The query analysis on this, which doesn't output unigrams, has "haruka"
> at position 2. As already shown, the index analysis puts "haruka" at
> position 3. The query doesn't match, because it's a phrase query and
> has no phrase slop.
>
> I would have expected both unigrams to be at position 1. To me, it's a
> bug ... or at least something that I should be able to configure on the
> filter.
>
> If this gets sent via the main query (edismax), it all works, because I
> have phrase slop enabled by default.
>
> The customer does not like what happens when the index and query
> analyzers match, either with or without outputUnigrams. When
> outputUnigrams is completely disabled, searching for a single character
> doesn't match multi-character strings, and when it is enabled on both,
> they get matches they did not want.
>
> I've already been pointed at an awesome blog series, which will
> hopefully help me improve things, but I think that the customer will
> still want outputUnigrams disabled on the query side, so I still have
> this problem.
>
> http://discovery-grindstone.blogspot.com/2013/10/cjk-with-solr-for-libraries-part-1.html
>
> If I file an issue, should it be bug or improvement?
>
> Thanks,
> Shawn
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
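
To make the position mismatch in the quoted message concrete, here is a
small Python sketch. It is a toy model, not Lucene or Solr code: the
function phrase_matches and the token lists are hypothetical, and the
positions are copied by hand from the description above rather than
produced by a real analyzer. Placing the bigram and the first unigram at
position 1 is an assumption. The slop handling is a deliberate
simplification of how a sloppy phrase query scores term displacement.

```python
# Toy model only. Nothing here calls Lucene/Solr; positions are hand-copied
# from the description in the quoted message.

def phrase_matches(index_tokens, query_tokens, slop=0):
    """Return True if the query terms occur in the index at the same
    relative positions, allowing each term to be off by at most `slop`.
    (Simplified stand-in for phrase matching, not Lucene's algorithm.)"""
    positions = {}
    for term, pos in index_tokens:
        positions.setdefault(term, set()).add(pos)

    first_term, first_qpos = query_tokens[0]
    for start in positions.get(first_term, ()):
        ok = True
        for term, qpos in query_tokens[1:]:
            expected = start + (qpos - first_qpos)
            candidates = positions.get(term, ())
            if not any(abs(p - expected) <= slop for p in candidates):
                ok = False
                break
        if ok:
            return True
    return False


# Index side (outputUnigrams enabled): the second unigram sits at
# position 2, so the Latin-script terms land at 3 and 4.
# Assumption: the bigram and the first unigram share position 1.
INDEX_TOKENS = [("綾", 1), ("綾瀬", 1), ("瀬", 2), ("haruka", 3), ("redacted", 4)]

# Query side (outputUnigrams disabled): only the bigram, then "haruka".
QUERY_TOKENS = [("綾瀬", 1), ("haruka", 2)]

print(phrase_matches(INDEX_TOKENS, QUERY_TOKENS))          # False
print(phrase_matches(INDEX_TOKENS, QUERY_TOKENS, slop=1))  # True
```

The first call mirrors the filter query with no phrase slop, which fails
because "haruka" is one position further along in the index than in the
query. The second mirrors the edismax case, where a default phrase slop
absorbs the extra position.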