On 4/20/2014 12:00 PM, Robert Muir wrote:
> On Sun, Apr 20, 2014 at 1:53 PM, Shawn Heisey <s...@elyograg.org> wrote:
>> On 4/20/2014 11:10 AM, Robert Muir wrote:
>> What is "n" in what you wrote above?
> This is just the mathematics, its the "n" of the n-gram. You should
> only really ever have a fixed value of this for a field, otherwise the
> positions are confusing.
>
> There is nothing this filter can do to change this mathematical fact.

At first I was confused as to why this would be a problem, but then I realized that I had only considered the two-character case.

If the input is three or more characters, then you have to decide whether unigrams in the middle of the string get assigned to the same position as the first bigram or the second. In that situation, the only reasonable thing to do is keep the second unigram with the second bigram, which is exactly what the filter does.

Does the two-character case need to be treated differently here? If so, it is probably something that should be configurable.

Thanks,
Shawn
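
To make the position question concrete, here is a small Python sketch. It is not Lucene code: the function name unigrams_and_bigrams is hypothetical, positions are counted from 0 for convenience, and the model simply assumes that each bigram shares a position with the unigram that starts it, which is the behavior described above.

```python
def unigrams_and_bigrams(text):
    """Toy model (not Lucene): return (term, position) pairs.

    Assumption: each bigram shares the position of the unigram that
    starts it, so the last unigram sits alone at the final position.
    """
    tokens = []
    for i, ch in enumerate(text):
        # The unigram always opens position i.
        tokens.append((ch, i))
        # A bigram exists only if there is a following character.
        if i + 1 < len(text):
            tokens.append((text[i:i + 2], i))
    return tokens


for term, pos in unigrams_and_bigrams("abc"):
    print(pos, term)
```

Under this model, the input "ab" yields "a" and "ab" at position 0 and "b" alone at position 1. That is the two-character case the question is about: the second unigram has no bigram of its own to share a position with.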