[ https://issues.apache.org/jira/browse/LUCENE-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-400:
-------------------------------

    Attachment: LUCENE-400.patch

Repackaged these four files as a patch, with the following modifications to the code:

* Renamed files and variables to refer to "n-grams" as "shingles", to avoid confusion with the character-level n-gram code already in Lucene's sandbox
* Placed the code in the o.a.l.analysis.shingle package
* Converted commons-collections FIFO usages to LinkedLists
* Removed @author tags from javadocs
* Changed deprecated Lucene API usages to alternate forms; addressed all compilation warnings
* Changed code style to conform to Lucene conventions
* Changed field setters to return null instead of a reference to the class instance, then changed instantiations to use individual setter calls instead of the chained calling style
* Added the ASF license to each file

All tests pass.

Although I left ShingleAnalyzerWrapper and its test in the patch, no other Lucene filter (AFAICT) has such an analyzer wrapping facility. My vote is to remove these two files.

> NGramFilter -- construct n-grams from a TokenStream
> ---------------------------------------------------
>
>                 Key: LUCENE-400
>                 URL: https://issues.apache.org/jira/browse/LUCENE-400
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: unspecified
>         Environment: Operating System: All
>            Platform: All
>            Reporter: Sebastian Kirsch
>            Priority: Minor
>         Attachments: LUCENE-400.patch, NGramAnalyzerWrapper.java,
> NGramAnalyzerWrapperTest.java, NGramFilter.java, NGramFilterTest.java
>
>
> This filter constructs n-grams (token combinations up to a fixed size,
> sometimes called "shingles") from a token stream.
> The filter sets start offsets, end offsets and position increments, so
> highlighting and phrase queries should work.
> Position increments > 1 in the input stream are replaced by filler tokens
> (tokens with termText "_" and endOffset - startOffset = 0) in the output
> n-grams.
> (Position increments > 1 in the input stream are usually caused by
> removing some tokens, e.g. stopwords, from a stream.)
> The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache
> Commons-Collections.
> Filter, test case and an analyzer are attached.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
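The shingle and filler-token behavior described in the issue can be sketched in plain Java, outside Lucene. This is a hypothetical illustration under stated assumptions, not the ShingleFilter from the patch: the class `ShingleSketch` and its method names are invented here, tokens are modeled as plain strings with per-token position increments, and the sliding window uses a LinkedList as a fixed-size FIFO (mirroring the patch's replacement of CircularFifoBuffer).

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/**
 * Minimal sketch of shingle construction with filler tokens.
 * A position increment > 1 on a token means positions were skipped
 * before it (e.g. removed stopwords); each skipped position is
 * represented by the filler term "_".
 * NOTE: hypothetical illustration, not the patch's ShingleFilter.
 */
public class ShingleSketch {
    static final String FILLER = "_";

    /** Expand position increments into a flat token sequence with fillers. */
    static List<String> withFillers(String[] terms, int[] increments) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int skip = 1; skip < increments[i]; skip++) {
                out.add(FILLER);  // one filler per skipped position
            }
            out.add(terms[i]);
        }
        return out;
    }

    /** Build all shingles of size 2..maxSize over a sliding FIFO window. */
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        LinkedList<String> window = new LinkedList<>();  // fixed-size FIFO
        for (String token : tokens) {
            window.addLast(token);
            if (window.size() > maxSize) {
                window.removeFirst();
            }
            // emit every shingle that ends at the current token
            for (int size = 2; size <= window.size(); size++) {
                out.add(String.join(" ",
                        window.subList(window.size() - size, window.size())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "please divide this sentence" with "this" removed as a stopword:
        String[] terms = {"please", "divide", "sentence"};
        int[] incr = {1, 1, 2};  // increment 2 marks the removed token
        List<String> tokens = withFillers(terms, incr);
        System.out.println(tokens);               // [please, divide, _, sentence]
        System.out.println(shingles(tokens, 2));  // [please divide, divide _, _ sentence]
    }
}
```

The filler keeps shingle positions aligned with the original text, so a bigram spanning a removed stopword comes out as `divide _` rather than falsely joining `divide sentence` as adjacent words.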