Re: Best way to create own version of StandardTokenizer ?

Paul Taylor Fri, 04 Sep 2009 09:55:09 -0700

Robert Muir wrote:

On Fri, Sep 4, 2009 at 11:18 AM, Paul Taylor<paul_t...@fastmail.fm> wrote:

I submitted this patch, https://issues.apache.org/jira/browse/LUCENE-1787, to
StandardTokenizerImpl. Understandably it hasn't been incorporated into
Lucene (yet), but I need it for the project I'm working on. So would you
recommend keeping the same class name and just putting it on the classpath
before the lucene.jar, or creating a new Tokenizer, Impl and JFlex file in my
own project's package space?


I would recommend creating one in your own package space.

Also, the StandardTokenizerImpl.jflex file states it should be compiled with
Java 1.4, not a later JDK. Is this just for backwards compatibility? Because
the indexes will be built afresh for this project, would I actually get
better results if I used a later JVM? The project has to deal with indexing
text which can be in any language, and I'm hoping that using the latest JVM may
solve some mapping problems with Japanese, Hebrew and Korean that I don't
really understand.


I do not think you will really get better results, but it depends on what
your issue is (can you elaborate?).
Upgrading from 1.4 -> 1.6 will bump your Unicode version from 3.0 to 4.0.
You can see a list of the changes here:
http://www.unicode.org/versions/Unicode4.0.0/
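
One rough way to see what a particular JVM knows about the scripts in question is to ask its Character class directly. The sketch below is only an illustration: the class name UnicodeProbe and the sample code points are made up for this example, and it uses the int overloads and String.format, which need Java 5 or later (on a 1.4 JVM you would have to fall back to the char overloads and plain string concatenation). Running it under two different JVMs and comparing the output should show whether the Unicode data differs for the characters you care about.

```java
public class UnicodeProbe {

    // Report how the running JVM classifies a single code point.
    public static String describe(int codePoint) {
        return String.format("U+%04X letter=%b block=%s",
                codePoint,
                Character.isLetter(codePoint),
                Character.UnicodeBlock.of(codePoint));
    }

    public static void main(String[] args) {
        // The JVM version determines which Unicode data tables are in use.
        System.out.println("java.version=" + System.getProperty("java.version"));

        // Hypothetical samples: Hiragana A, Katakana A, Hebrew Alef,
        // Hangul syllable GA, and a common CJK ideograph.
        int[] samples = {0x3042, 0x30A2, 0x05D0, 0xAC00, 0x4E2D};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

If the lines printed for your problem characters are identical on both JVMs, a newer JDK alone is unlikely to change how they are tokenized.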

Things like:

http://bugs.musicbrainz.org/ticket/1006
http://bugs.musicbrainz.org/ticket/5311
http://bugs.musicbrainz.org/ticket/4827

Paul


