[ https://issues.apache.org/jira/browse/LUCENE-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411106#comment-16411106 ]
Alan Woodward commented on LUCENE-8202:
---------------------------------------

{quote}So I would expect it to be slow, but not to use lots of memory?
{quote}
BaseTokenStreamTestCase.checkAnalysisConsistency() stores all incoming tokens in a list so that it can then compare them against a second run, so it's a test issue rather than anything else.

{quote}I would add both
{quote}
+1 - I'll add a max shingle size of four and a max number of stacks of 1000.

> Add a FixedShingleFilter
> ------------------------
>
>                 Key: LUCENE-8202
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8202
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: 7.4
>
>         Attachments: LUCENE-8202.patch, LUCENE-8202.patch, LUCENE-8202.patch
>
>
> In LUCENE-3475 I tried to make a ShingleGraphFilter that could accept and
> emit arbitrary graphs, while duplicating all the functionality of the
> existing ShingleFilter. This ends up being extremely hairy, and doesn't play
> well with query parsers.
> I'd like to step back and try and create a simpler shingle filter that can be
> used for index-time phrase tokenization only. It will have a single fixed
> shingle size, can deal with single-token synonyms, and won't emit unigrams.
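
For readers skimming the thread, here is a rough sketch in plain Java of the behaviour described in the issue: one fixed shingle size, single-token synonyms stacked at the same position, and no unigrams. It is not the code in the attached patch and does not use the Lucene TokenStream API. The class name FixedShingleSketch, the method shingles() and the list-of-lists input are hypothetical, and reading "max number of stacks" as a cap on the combinations built per window is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the FixedShingleFilter from the patch.
final class FixedShingleSketch {

  /**
   * positions.get(i) holds every token at position i (a term plus any
   * single-token synonyms). Returns all shingles of exactly `size` tokens.
   */
  static List<String> shingles(List<List<String>> positions, int size, int maxStacks) {
    // The comment proposes a max shingle size of four; a size of one would
    // be a unigram, which the filter is not meant to emit.
    if (size < 2 || size > 4) {
      throw new IllegalArgumentException("shingle size must be between 2 and 4");
    }
    List<String> out = new ArrayList<>();
    // Slide a window of `size` consecutive positions over the input.
    for (int start = 0; start + size <= positions.size(); start++) {
      List<String> partial = new ArrayList<>();
      partial.add("");
      // Extend every partial shingle with every token at the next position.
      for (int offset = 0; offset < size; offset++) {
        List<String> next = new ArrayList<>();
        for (String prefix : partial) {
          for (String token : positions.get(start + offset)) {
            next.add(offset == 0 ? token : prefix + " " + token);
          }
        }
        // Assumed meaning of the cap: bound the combinations per window.
        if (next.size() > maxStacks) {
          throw new IllegalStateException("too many stacked tokens in one window");
        }
        partial = next;
      }
      out.addAll(partial);
    }
    return out;
  }
}
```

With positions [quick, fast], [brown], [fox] and a shingle size of 2, the sketch returns "quick brown", "fast brown" and "brown fox". The number of combinations per window grows multiplicatively with the number of stacked synonyms, which is why a cap on both shingle size and stacks is useful.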