[ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic reopened LUCENE-759:
-------------------------------------

    Lucene Fields: [New, Patch Available]  (was: [New])

Reopening, because I'm bringing in Adam Hiatt's modifications that he uploaded in a patch for SOLR-81. Adam's changes allow this tokenizer to create n-grams whose sizes are specified as a min-max range.

This patch fixes a bug in Adam's code, but has another bug that I don't know how to fix yet.

Adam's bug:
input: abcde
minGram: 1
maxGram: 3
output: a ab abc -- and this is where tokenizing stopped, which was wrong; it should have continued: b bc bcd c cd cde d de e

Otis' bug:
input: abcde
minGram: 1
maxGram: 3
output: e de cde d cd bcd c bc abc b ab -- and this is where tokenizing stops, which is wrong; it should generate one more n-gram: a

This bug won't hurt SOLR-81, but it should be fixed. A sketch of the expected output, for reference, follows the quoted issue details below.

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.
> Patch coming shortly.
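For anyone reading along: the following is only a minimal standalone Java sketch of the intended behavior, not the code from LUCENE-759.patch or SOLR-81. It just enumerates every n-gram of each length in the [minGram, maxGram] range at every start position, which is the full output that both of the bugs above fail to produce.

    // Illustration only -- not the patch's NGramTokenizer.
    public class NGramSketch {
      public static void main(String[] args) {
        String input = "abcde";
        int minGram = 1;
        int maxGram = 3;
        for (int start = 0; start < input.length(); start++) {   // each start position
          for (int len = minGram; len <= maxGram; len++) {        // each gram size
            if (start + len > input.length()) {
              break;                                              // gram would run past the end of the input
            }
            System.out.print(input.substring(start, start + len) + " ");
          }
        }
        // prints: a ab abc b bc bcd c cd cde d de e
      }
    }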