[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiroaki Kawai updated LUCENE-1224:
----------------------------------

    Attachment: LUCENE-1224.patch

Patch updated with a unit test. LUCENE-1225 is an easier way to understand this problem. This patch also covers the token filter issues, which are more complicated.

NGramTokenFilter creates bad TokenStream
----------------------------------------

            Key: LUCENE-1224
            URL: https://issues.apache.org/jira/browse/LUCENE-1224
        Project: Lucene - Java
     Issue Type: Bug
     Components: contrib/*
       Reporter: Hiroaki Kawai
       Assignee: Grant Ingersoll
       Priority: Critical
    Attachments: LUCENE-1224.patch, NGramTokenFilter.patch, NGramTokenFilter.patch

With the current trunk NGramTokenFilter(min=2,max=4), I index the string abcdef, but I can't query it with abc. If I query with ab, I get a hit.

The reason is that NGramTokenFilter generates a badly ordered TokenStream. A query is built from the Token order in the TokenStream; how stemming or phrases are analyzed depends on that order (Token.positionIncrement).

With the current filter, the query string abc is tokenized to:
    ab bc abc
meaning "query a string that has ab bc abc in this order".

The expected filter will generate:
    ab abc(positionIncrement=0) bc
meaning "query a string that has (ab|abc) bc in this order".

I'd like to submit a patch for this issue. :-)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
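To make the expected ordering in the issue description concrete, here is a minimal sketch in plain Java. It is not the code from LUCENE-1224.patch and it does not use the Lucene API; the class NGramOrderSketch, its Gram type, and the grams method are hypothetical names introduced only for illustration. It assumes that grams are grouped by start position, that the first gram at each start position gets a position increment of 1, and that longer grams at the same start get an increment of 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the attached patch, not Lucene's API.
final class NGramOrderSketch {

    // One emitted gram, with its increment relative to the previously emitted gram.
    static final class Gram {
        final String text;
        final int positionIncrement;
        final int startOffset; // character offset into the input, inclusive
        final int endOffset;   // character offset into the input, exclusive

        Gram(String text, int positionIncrement, int startOffset, int endOffset) {
            this.text = text;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + "(positionIncrement=" + positionIncrement + ")";
        }
    }

    // Emits every gram of length minGram..maxGram, grouped by start offset.
    static List<Gram> grams(String input, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int start = 0; start + minGram <= input.length(); start++) {
            boolean firstAtThisStart = true;
            for (int n = minGram; n <= maxGram && start + n <= input.length(); n++) {
                // Only the first gram at a start offset advances the position;
                // longer grams at the same start are stacked on it (increment 0).
                int increment = firstAtThisStart ? 1 : 0;
                out.add(new Gram(input.substring(start, start + n), increment, start, start + n));
                firstAtThisStart = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("abc", 2, 4));
        System.out.println(grams("abcdef", 2, 4));
    }
}
```

Under these assumptions, the input abc with min=2 and max=4 yields ab, then abc with increment 0, then bc, which is the "(ab|abc) bc" reading described in the issue.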
[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiroaki Kawai updated LUCENE-1224:
----------------------------------

    Attachment: NGramTokenFilter.patch

Modified to set the correct start/end offset values in the Token properties.

NGramTokenFilter creates bad TokenStream
----------------------------------------

            Key: LUCENE-1224
            URL: https://issues.apache.org/jira/browse/LUCENE-1224
        Project: Lucene - Java
     Issue Type: Bug
     Components: contrib/*
       Reporter: Hiroaki Kawai
       Priority: Critical
    Attachments: NGramTokenFilter.patch, NGramTokenFilter.patch
[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiroaki Kawai updated LUCENE-1224:
----------------------------------

    Attachment: NGramTokenFilter.patch

NGramTokenFilter creates bad TokenStream
----------------------------------------

            Key: LUCENE-1224
            URL: https://issues.apache.org/jira/browse/LUCENE-1224
        Project: Lucene - Java
     Issue Type: Bug
     Components: contrib/*
       Reporter: Hiroaki Kawai
       Priority: Critical
    Attachments: NGramTokenFilter.patch