[ https://issues.apache.org/jira/browse/LUCENE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-5180:
---------------------------------------

    Attachment: LUCENE-5180.patch

Patch; it turned out to be easier than I expected: I just tapped into the existing logic that ShingleFilter has for handling holes between tokens.

> ShingleFilter should make shingles from trailing holes
> ------------------------------------------------------
>
>                 Key: LUCENE-5180
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5180
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.6
>
>         Attachments: LUCENE-5180.patch
>
>
> When ShingleFilter hits a hole, it uses _ as the token. For example, if a
> StopFilter removes "the", the bigrams for "the dog barked" would be:
> "_ dog", "dog barked".
> But if the input ends with a stopword, e.g. "wizard of", ShingleFilter fails
> to produce "wizard _" because of LUCENE-3849. Once that is fixed, I think we
> should also fix ShingleFilter to make shingles for trailing holes.
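To make the hole behavior in the issue concrete, here is a minimal standalone Java sketch. It does not use Lucene's real ShingleFilter or StopFilter API: the class ShingleHoleSketch, its methods, and the rule of skipping shingles made only of filler are hypothetical. It assumes the full list of positions, including a trailing removed stopword, is known up front, which is exactly the information the issue says the real filter cannot see until LUCENE-3849 is fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical standalone simulation; NOT Lucene's ShingleFilter API.
public class ShingleHoleSketch {
    // Filler used for holes, matching the "_" described in the issue.
    static final String FILLER = "_";

    /** Replace each stopword with the filler so its position survives as a hole. */
    static List<String> markHoles(String[] words, Set<String> stopwords) {
        List<String> slots = new ArrayList<>();
        for (String w : words) {
            slots.add(stopwords.contains(w) ? FILLER : w);
        }
        return slots;
    }

    /** Build bigram shingles over all positions, including leading and trailing holes. */
    static List<String> bigrams(List<String> slots) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String a = slots.get(i);
            String b = slots.get(i + 1);
            // Hypothetical rule: skip shingles that contain only holes.
            if (a.equals(FILLER) && b.equals(FILLER)) {
                continue;
            }
            out.add(a + " " + b);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the", "of");
        System.out.println(bigrams(markHoles("the dog barked".split(" "), stop)));
        System.out.println(bigrams(markHoles("wizard of".split(" "), stop)));
    }
}
```

Running main prints `[_ dog, dog barked]` for the leading hole and `[wizard _]` for the trailing hole, the shingle the issue asks ShingleFilter to produce. Requires Java 9+ for `Set.of`.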