[ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854902#action_12854902
 ] 

Robert Muir commented on LUCENE-2384:
-------------------------------------

If tokenizers like StandardTokenizer just end up reading everything into RAM 
anyway, we should remove Reader from the Tokenizer interface.

Supporting Reader, instead of simply tokenizing the entire document, makes our 
tokenizers very complex (see CharTokenizer).
It would be nice to remove this complexity if the objective (not holding the 
whole document in memory) doesn't really work anyway.

> Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
> -------------------------------------------------------------
>
>                 Key: LUCENE-2384
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2384
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: Analysis
>    Affects Versions: 3.0.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>
> When indexing large documents, the lexer buffer may stay large forever. This 
> sub-issue resets the lexer buffer back to the default on reset(Reader).
> This is done on the enclosing issue.
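
For illustration, the buffer reset described above might look roughly like the
following. This is a minimal sketch, not the actual StandardTokenizerImpl code:
the class GrowableLexerBuffer, its fill() and capacity() methods, and the
DEFAULT_BUFFER_SIZE constant are hypothetical stand-ins for the generated lexer
and its zzBuffer, and the default size is kept tiny so the effect is visible.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical stand-in for a generated lexer; not the real StandardTokenizerImpl. */
final class GrowableLexerBuffer {
    /** Assumed default size; deliberately small for the demonstration. */
    static final int DEFAULT_BUFFER_SIZE = 16;

    private char[] buffer = new char[DEFAULT_BUFFER_SIZE];
    private int length;
    private Reader reader;

    GrowableLexerBuffer(Reader reader) {
        this.reader = reader;
    }

    /** Reads all remaining input, doubling the buffer whenever it is full. */
    void fill() {
        try {
            while (true) {
                if (length == buffer.length) {
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
                int n = reader.read(buffer, length, buffer.length - length);
                if (n < 0) {
                    break; // end of input
                }
                length += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Switches to a new input and shrinks the buffer back to the default if it grew. */
    void reset(Reader newReader) {
        reader = newReader;
        length = 0;
        if (buffer.length > DEFAULT_BUFFER_SIZE) {
            // Drop the oversized array so one large document does not pin memory forever.
            buffer = new char[DEFAULT_BUFFER_SIZE];
        }
    }

    int capacity() {
        return buffer.length;
    }
}
```

Without the size check in reset(), a single large document would leave every
reused lexer instance holding its largest buffer for the rest of its lifetime.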

