[
https://issues.apache.org/jira/browse/LUCENE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-2404:
----------------------------------
Attachment: LUCENE-2404-2.patch
Another variant of the previous patch. As Robert said, it is slightly faster;
maybe we can take some inspiration from it. It uses cloneAttributes and does
not create new clones all the time.
> Improve speed of ThaiWordFilter by CharacterIterator, factor out LowerCasing
> and also fix some bugs (empty tokens stop iteration)
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-2404
> URL: https://issues.apache.org/jira/browse/LUCENE-2404
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/analyzers
> Reporter: Uwe Schindler
> Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2404-2.patch, LUCENE-2404.patch, LUCENE-2404.patch
>
>
> The ThaiWordFilter creates new Strings out of the term buffer before passing
> them to the BreakIterator. But BreakIterator can take a CharacterIterator
> and process it directly, without copying the buffer.
> As Java itself does not provide a char[]-based CharacterIterator
> implementation in java.text (StringCharacterIterator needs a String), we can
> use the javax.swing.text.Segment class, which operates on a char[] and is
> even reusable! This class is very strange, but it works, is available in
> JDK 1.4+ and is not deprecated.
> The filter also had a bug: it stopped iterating tokens when an empty token
> occurred. Also, the lowercasing of non-Thai words was removed from the filter
> and moved into the Analyzer by adding a LowerCaseFilter.
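
For readers following along, here is a minimal sketch of the idea described above: pointing a reusable javax.swing.text.Segment at a char[] and handing it to a BreakIterator, so no String is created just to find the boundaries. This is not the code from the attached patch. The class name SegmentBreakSketch, the method words() and the use of an English word instance are assumptions made only for illustration; the real filter works on the term attribute's buffer and uses a Thai locale.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.swing.text.Segment;

// Hypothetical illustration class, not part of Lucene or of the patch.
public class SegmentBreakSketch {
    // Both objects are created once and reused for every buffer.
    private final Segment segment = new Segment();
    private final BreakIterator breaker;

    public SegmentBreakSketch(Locale locale) {
        this.breaker = BreakIterator.getWordInstance(locale);
    }

    // Returns the words found in buffer[0..length), skipping
    // whitespace and punctuation segments.
    public List<String> words(char[] buffer, int length) {
        // Point the Segment at the caller's array; nothing is copied here.
        segment.array = buffer;
        segment.offset = 0;
        segment.count = length;
        // Segment implements CharacterIterator, so it can be passed directly.
        breaker.setText(segment);

        List<String> result = new ArrayList<>();
        int start = breaker.first();
        for (int end = breaker.next(); end != BreakIterator.DONE;
                start = end, end = breaker.next()) {
            if (Character.isLetterOrDigit(buffer[start])) {
                // A String is built only for the output of this sketch;
                // a filter would copy the range into its term buffer instead.
                result.add(new String(buffer, start, end - start));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SegmentBreakSketch sketch = new SegmentBreakSketch(Locale.ENGLISH);
        // The buffer may be longer than the valid text, like a term buffer.
        char[] buffer = "hello worldXXXX".toCharArray();
        System.out.println(sketch.words(buffer, 11));
        // An empty buffer yields no words and does not stop later calls.
        System.out.println(sketch.words(buffer, 0));
        System.out.println(sketch.words(buffer, 5));
    }
}
```

Because the offset is kept at 0 here, the boundaries returned by the BreakIterator are plain indexes into the array. With a non-zero offset they would start at that offset, since Segment reports its begin index as the offset.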