[jira] Created: (LUCENE-1826) All Tokenizer implementations should have constructor that takes an AttributeSource

Tim Smith (JIRA) Thu, 20 Aug 2009 10:38:41 -0700

All Tokenizer implementations should have constructor that takes an 
AttributeSource
-----------------------------------------------------------------------------------


                 Key: LUCENE-1826
                 URL: https://issues.apache.org/jira/browse/LUCENE-1826
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Analysis
    Affects Versions: 2.9
            Reporter: Tim Smith


I have a TokenStream implementation that joins together multiple sub 
TokenStreams. (I then do additional filtering on top of this, so I can't just 
have the indexer do the merging.)

In 2.4, this worked fine: once one sub stream was exhausted, I just started 
using the next stream.

In 2.9, however, this is very difficult, and it requires copying term buffers 
for every token being aggregated.

If all the sub TokenStreams and my "concat" TokenStream share the same 
AttributeSource, though, this goes back to being very simple (and very 
efficient).
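
To illustrate why sharing helps, here is a minimal, self-contained sketch. It 
does not use the Lucene API: SharedTerm, WhitespaceSubStream, ConcatStream and 
ConcatDemo are hypothetical stand-ins for an AttributeSource, a Tokenizer built 
on a caller-supplied source, and the "concat" stream. Because every sub stream 
writes into the same object, the concat stream only has to delegate 
incrementToken() and never copies a term buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an AttributeSource: one shared, reusable term buffer.
final class SharedTerm {
    private char[] buffer = new char[16];
    private int length;

    void set(String text) {
        if (text.length() > buffer.length) {
            buffer = new char[text.length()];
        }
        text.getChars(0, text.length(), buffer, 0);
        length = text.length();
    }

    String term() {
        return new String(buffer, 0, length);
    }
}

// Hypothetical stand-in for a Tokenizer whose constructor accepts the shared source.
final class WhitespaceSubStream {
    private final SharedTerm shared;
    private final Iterator<String> words;

    WhitespaceSubStream(SharedTerm shared, String input) {
        this.shared = shared;
        String trimmed = input.trim();
        this.words = trimmed.isEmpty()
                ? Collections.<String>emptyIterator()
                : Arrays.asList(trimmed.split("\\s+")).iterator();
    }

    // Writes the next token into the shared buffer; returns false when exhausted.
    boolean incrementToken() {
        if (!words.hasNext()) {
            return false;
        }
        shared.set(words.next());
        return true;
    }
}

// The "concat" stream: when one sub stream is exhausted, move on to the next.
final class ConcatStream {
    private final Iterator<WhitespaceSubStream> subs;
    private WhitespaceSubStream current;

    ConcatStream(List<WhitespaceSubStream> subStreams) {
        this.subs = subStreams.iterator();
        this.current = subs.hasNext() ? subs.next() : null;
    }

    boolean incrementToken() {
        while (current != null) {
            if (current.incrementToken()) {
                return true; // the token is already in the shared buffer
            }
            current = subs.hasNext() ? subs.next() : null;
        }
        return false;
    }
}

public class ConcatDemo {
    // Builds sub streams over one shared source and drains the concatenation.
    static List<String> collect(String... inputs) {
        SharedTerm shared = new SharedTerm();
        List<WhitespaceSubStream> subs = new ArrayList<>();
        for (String in : inputs) {
            subs.add(new WhitespaceSubStream(shared, in));
        }
        ConcatStream concat = new ConcatStream(subs);
        List<String> out = new ArrayList<>();
        while (concat.incrementToken()) {
            out.add(shared.term());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("quick brown", "", "fox")); // prints [quick, brown, fox]
    }
}
```

Any filter stacked on top of the concat stream would read from the same shared 
object, which is the arrangement the requested constructors are meant to allow.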


So, for example, I would like to see the following constructor added to 
StandardTokenizer:
{code}
  public StandardTokenizer(AttributeSource source, Reader input, boolean replaceInvalidAcronym) {
    super(source);
    ...
  }
{code}

I would likewise want similar constructors added to all Tokenizer subclasses 
provided by Lucene.


