[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597069#action_12597069 ]
Hiroaki Kawai commented on LUCENE-1224:
---------------------------------------

Q: Why is it necessary to index?
A: Because it was necessary to show how the query is performed. That is the point I wanted to address.

Q: testNGrams doesn't test the issue.
A: Exactly, it doesn't test the issue. I modified the test because it failed with my patch: with the patch, Token.toString() prints additional positionIncrement information. I read the existing test program and found that it depends on the Token.toString() method. I thought we had better test without Token.toString(). The current test program tests that the Tokens have NO positionIncrement.

testIndexAndQuery is the very test that addresses the issue. Please don't drop it.

Consider the case where we want to search for a word that contains "abcd" using a 2-gram index. The test searches for "abcd" with 2- and 3-grams. The 2-grams of "abcde" are 'ab', 'bc', 'cd', 'de'. Going by the current Lucene implementation, the position gap between 'ab' and 'bc' must be 1.

> NGramTokenFilter creates bad TokenStream
> ----------------------------------------
>
>                 Key: LUCENE-1224
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1224
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>            Reporter: Hiroaki Kawai
>            Assignee: Grant Ingersoll
>            Priority: Critical
>         Attachments: LUCENE-1224.patch, NGramTokenFilter.patch, NGramTokenFilter.patch
>
>
> With current trunk NGramTokenFilter(min=2,max=4), I index the string "abcdef" into an index, but I can't query it with "abc". If I query with "ab", I get a hit.
> The reason is that the NGramTokenFilter generates a badly ordered TokenStream.
> A query is based on the Token order in the TokenStream; how stemming or a phrase should be analyzed depends on that order (Token.positionIncrement).
> With the current filter, the query string "abc" is tokenized to : ab bc abc
> meaning "query a string that has ab bc abc in this order".
> Expected filter will generate : ab abc(positionIncrement=0) bc
> meaning "query a string that has (ab|abc) bc in this order"
> I'd like to submit a patch for this issue. :-)
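
For illustration only, the expected ordering described in the issue can be sketched in plain Java. This is not the attached patch and does not use Lucene's API: the class NGramPositionSketch, the Gram type, and the ngrams method are hypothetical names invented for this sketch. It assumes that grams are grouped by start offset, that the first gram at each offset advances the position by 1, and that longer grams at the same offset get positionIncrement=0.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch; it does not use or reproduce Lucene's NGramTokenFilter. */
class NGramPositionSketch {

    /** Hypothetical stand-in for a Lucene Token: term text plus position increment. */
    static final class Gram {
        final String term;
        final int positionIncrement;

        Gram(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits grams grouped by start offset. The first gram at each offset
     * advances the position by 1; longer grams starting at the same offset
     * share that position (increment 0).
     */
    static List<Gram> ngrams(String text, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int start = 0; start + minGram <= text.length(); start++) {
            boolean first = true;
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(new Gram(text.substring(start, start + len), first ? 1 : 0));
                first = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Query string from the issue description, min=2, max=4.
        System.out.println(ngrams("abc", 2, 4));   // [ab(+1), abc(+0), bc(+1)]
        // 2-grams of "abcde" from the comment: each gram is one position apart.
        System.out.println(ngrams("abcde", 2, 2)); // [ab(+1), bc(+1), cd(+1), de(+1)]
    }
}
```

Under these assumptions, "ab" and "abc" occupy the same position and "bc" follows one position later, which matches the "(ab|abc) bc" reading given in the issue description.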