[jira] Commented: (LUCENE-400) NGramFilter -- construct n-grams from a TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579933#action_12579933 ]

Steven Rowe commented on LUCENE-400:

Re-ping: Otis, do you still plan to commit?

> NGramFilter -- construct n-grams from a TokenStream
> ---------------------------------------------------
>
>                 Key: LUCENE-400
>                 URL: https://issues.apache.org/jira/browse/LUCENE-400
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: unspecified
>         Environment: Operating System: All; Platform: All
>            Reporter: Sebastian Kirsch
>            Assignee: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-400.patch, NGramAnalyzerWrapper.java, NGramAnalyzerWrapperTest.java, NGramFilter.java, NGramFilterTest.java
>
> This filter constructs n-grams (token combinations up to a fixed size, sometimes called "shingles") from a token stream.
> The filter sets start offsets, end offsets and position increments, so highlighting and phrase queries should work.
> Position increments > 1 in the input stream are replaced by filler tokens (tokens with termText "_" and endOffset - startOffset = 0) in the output n-grams. (Position increments > 1 in the input stream are usually caused by removing some tokens, e.g. stopwords, from a stream.)
> The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache Commons Collections.
> Filter, test case and an analyzer are attached.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
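The filler-token behavior described in the quoted issue can be sketched outside of Lucene. The sketch below is illustrative only (it is not the attached NGramFilter, and it ignores offsets and output-side position increments): a position increment of k > 1 expands into k - 1 "_" filler tokens, and shingles of size 2 up to maxSize are then emitted over the expanded stream.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative word n-gram ("shingle") construction, per the issue
 * description. Hypothetical names; not the attached NGramFilter.
 */
public class ShingleSketch {
    static List<String> shingles(List<String> tokens, List<Integer> increments, int maxSize) {
        // A position increment of k > 1 means k - 1 removed tokens
        // (e.g. stopwords); each hole becomes a "_" filler token.
        List<String> expanded = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (int gap = 1; gap < increments.get(i); gap++) expanded.add("_");
            expanded.add(tokens.get(i));
        }
        // Emit every n-gram of size 2..maxSize over the expanded stream.
        // (The real filter also passes the unigrams through; omitted here.)
        List<String> out = new ArrayList<>();
        for (int start = 0; start < expanded.size(); start++) {
            for (int size = 2; size <= maxSize && start + size <= expanded.size(); size++) {
                out.add(String.join(" ", expanded.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "please divide sense" with one stopword removed before "sense",
        // hence a position increment of 2 on that token.
        System.out.println(shingles(List.of("please", "divide", "sense"),
                                    List.of(1, 1, 2), 2));
        // -> [please divide, divide _, _ sense]
    }
}
```

The "_" bigrams show why the filler matters: without it, "divide sense" would be emitted as a shingle even though the two words were never adjacent in the original text, which would make phrase-like matching over shingles unreliable.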
[jira] Commented: (LUCENE-400) NGramFilter -- construct n-grams from a TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574493#action_12574493 ]

Grant Ingersoll commented on LUCENE-400:

Ping: Otis, do you still plan to commit?
[jira] Commented: (LUCENE-400) NGramFilter -- construct n-grams from a TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558717#action_12558717 ]

Steven Rowe commented on LUCENE-400:

Removed the duplicate link (to LUCENE-759), since that issue is about character-level n-grams and this issue is about word-level n-grams.
[jira] Commented: (LUCENE-400) NGramFilter -- construct n-grams from a TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558350#action_12558350 ]

Steven Rowe commented on LUCENE-400:

Lucene has *character* n-gram support, but not *word* n-gram support, which this filter supplies:

bq. This filter constructs n-grams (token combinations up to a fixed size, sometimes called "shingles") from a token stream.
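The character-level vs. word-level distinction drawn above can be shown with a small sketch (illustrative only; `charNGrams` and `wordNGrams` are hypothetical names: the first corresponds roughly to what Lucene's existing character n-gram support produces, the second to what this filter adds, minus offsets and filler handling):

```java
import java.util.ArrayList;
import java.util.List;

/** Character n-grams slide over letters; word n-grams slide over tokens. */
public class NGramContrast {
    static List<String> charNGrams(String s, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) out.add(s.substring(i, i + n));
        return out;
    }

    static List<String> wordNGrams(List<String> words, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++)
            out.add(String.join(" ", words.subList(i, i + n)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("lucene", 2));
        // -> [lu, uc, ce, en, ne]
        System.out.println(wordNGrams(List.of("fast", "token", "stream"), 2));
        // -> [fast token, token stream]
    }
}
```

Character n-grams are useful for fuzzy/substring matching within a single term, while word n-grams (shingles) capture multi-word units, which is why the two features are tracked as separate issues.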
[jira] Commented: (LUCENE-400) NGramFilter -- construct n-grams from a TokenStream
[ http://issues.apache.org/jira/browse/LUCENE-400?page=comments#action_12426327 ]

Sebastian Kirsch commented on LUCENE-400:

Hi Otis, I did not figure out the problem. Getting rid of Commons Collections should be no problem; I am just using it for FIFOs. However, I do not have the time at the moment to implement this. Kind regards, Sebastian
[jira] Commented: (LUCENE-400) NGramFilter -- construct n-grams from a TokenStream
[ http://issues.apache.org/jira/browse/LUCENE-400?page=comments#action_12419913 ]

Otis Gospodnetic commented on LUCENE-400:

Sebastian, did you ever figure out the problem? Also, is there a way to get rid of the Commons Collections dependency? Lucene has no run-time dependencies on other libraries.