[
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854902#action_12854902
]
Robert Muir commented on LUCENE-2384:
-------------------------------------
If tokenizers like StandardTokenizer just end up reading everything into RAM
anyway, we should remove Reader from the Tokenizer interface.
Supporting a Reader, instead of simply tokenizing the entire document, makes
our tokenizers very complex (see CharTokenizer).
It would be nice to remove this complexity if the objective doesn't really
work anyway.
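To illustrate the trade-off the comment describes, here is a minimal sketch (not Lucene's actual API; `WholeDocTokenizer` and its `tokenize` method are hypothetical names) of a tokenizer that drains the Reader into a String up front and then tokenizes the complete text, avoiding the buffer-refill and cross-boundary offset bookkeeping that a streaming implementation like CharTokenizer must carry:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene's Tokenizer API: read the whole
// document into memory first, then split on non-letter characters.
// A streaming tokenizer (cf. CharTokenizer) must instead refill a char[]
// buffer, track offsets across refills, and handle tokens that span
// buffer boundaries.
public class WholeDocTokenizer {
    public static List<String> tokenize(Reader reader) throws IOException {
        // Drain the entire document into memory up front.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        for (int n; (n = reader.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        String doc = sb.toString();

        // Tokenizing a complete String needs no refill bookkeeping.
        List<String> tokens = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= doc.length(); i++) {
            boolean letter = i < doc.length()
                    && Character.isLetter(doc.charAt(i));
            if (letter && start < 0) start = i;
            if (!letter && start >= 0) {
                tokens.add(doc.substring(start, i));
                start = -1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Prints: [Hello, Lucene]
        System.out.println(tokenize(new StringReader("Hello, Lucene 2384!")));
    }
}
```

The cost, of course, is exactly the one this issue tracks: the whole document is buffered in RAM, so a large document inflates the buffer for the lifetime of the tokenizer unless it is explicitly reset.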
> Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
> -------------------------------------------------------------
>
> Key: LUCENE-2384
> URL: https://issues.apache.org/jira/browse/LUCENE-2384
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: Analysis
> Affects Versions: 3.0.1
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 3.1
>
>
> When indexing large documents, the lexer buffer may stay large forever. This
> sub-issue resets the lexer buffer back to its default size on reset(Reader).
> This change is applied in the enclosing issue.
--
This message is automatically generated by JIRA.