Sure, there are two Chinese analyzers (including the CJKAnalyzer) bundled with Lucene, but both are character-based and far from acceptable.
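To make "character-based" concrete, here is a rough sketch in plain Python. It does not call Lucene at all. As far as I understand it, one bundled analyzer emits one token per character and the CJKAnalyzer emits overlapping two-character tokens; the functions below only imitate that idea. For contrast there is a toy greedy forward-maximum-matching segmenter. The three-word WORDS set and all function names are made up for illustration, and a real dictionary would be far larger.

```python
# Toy illustration only; none of this is Lucene or PyLucene API.

def unigrams(text):
    """One token per (non-whitespace) character."""
    return [ch for ch in text if not ch.isspace()]


def bigrams(text):
    """Overlapping two-character tokens."""
    chars = unigrams(text)
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching against a word list.

    At each position, take the longest dictionary word (up to max_len
    characters); fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini dictionary, just for this example.
WORDS = {"中文", "分词", "工具"}
sentence = "中文分词工具"

print(unigrams(sentence))          # ['中', '文', '分', '词', '工', '具']
print(bigrams(sentence))           # ['中文', '文分', '分词', '词工', '工具']
print(max_match(sentence, WORDS))  # ['中文', '分词', '工具']
```

Note how the bigram output contains "文分" and "词工", which straddle word boundaries and are not words, while the dictionary-driven version returns only real words.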
A practical Chinese tokenizer should know Chinese words (from one or several dictionaries) rather than just characters. It is reasonable not to bundle a dictionary-based analyzer with Lucene, since the dictionary alone would be several megabytes, yet not helpful to other parts of the world :)

On Mon, Jun 2, 2008 at 2:09 PM, Andi Vajda <[EMAIL PROTECTED]> wrote:

>
> On Jun 1, 2008, at 10:53 PM, "Cloud Zhang" <[EMAIL PROTECTED]> wrote:
>
> Thanks a lot for this very detailed guide. I'll forward this to the Chinese
> Python community, since the first thing a Chinese developer looks for
> in Lucene is a tokenizer for Chinese, and then gets stuck importing a
> jar...
>
>
> Isn't there a Chinese analyzer already shipped with Java Lucene in
> contrib/analyzers?
> That contrib is already built into PyLucene.
>
> Andi..
>
>
> On Mon, Jun 2, 2008 at 1:15 PM, Andi Vajda <[EMAIL PROTECTED]> wrote:
>
>>
>> On Mon, 2 Jun 2008, Cloud Zhang wrote:
>>
>>> Adding a new analyzer (in jar form) in Java is really straightforward,
>>> but when I was trying to add one for PyLucene, I found no way to refer
>>> to the jar package.
>>>
>>> I went through the build process of PyLucene and guess maybe I could:
>>> * put the analyzer source under
>>>   PyLucene-2.3.2-1/lucene-java-2.3.2/contrib/analyzers/src/java/, and
>>>   recompile Lucene then PyLucene
>>> or
>>> * put the analyzer jar somewhere in the build folder and add it to the
>>>   Makefile, then recompile PyLucene
>>>
>>> Could these work? Or is there another solution that is as
>>> straightforward as setting CLASSPATH in Java?
>>>
>>
>> To access your class(es) by name from Python, you must have JCC generate
>> wrappers for it (them). This is what is done on line 177 and on in PyLucene's
>> Makefile. The easiest way for you to add your own Java classes to PyLucene
>> is to create another jar file with your own analyzer classes and code and
>> add it to the JCC invocation there.
>>
>> For example, the Makefile snippet in question currently says:
>>
>> GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
>>     --package java.lang java.lang.System \
>>                         java.lang.Runtime \
>>     --package java.util \
>>               java.text.SimpleDateFormat \
>>     --package java.io java.io.StringReader \
>>                       java.io.InputStreamReader \
>>                       java.io.FileInputStream \
>>     --exclude org.apache.lucene.queryParser.Token \
>>     --exclude org.apache.lucene.queryParser.TokenMgrError \
>>     --exclude org.apache.lucene.queryParser.QueryParserTokenManager \
>>     --exclude org.apache.lucene.queryParser.ParseException \
>>     --python lucene \
>>     --mapping org.apache.lucene.document.Document 'get:(Ljava/lang/String;)Ljava/lang/String;' \
>>     --mapping java.util.Properties 'getProperty:(Ljava/lang/String;)Ljava/lang/String;' \
>>     --sequence org.apache.lucene.search.Hits 'length:()I' 'doc:(I)Lorg/apache/lucene/document/Document;' \
>>     --version $(LUCENE_VER) \
>>     --files $(NUM_FILES)
>>
>>
>> change the first line to say:
>>
>> GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) --jar myjar.jar \
>> ...
>>
>> and rebuild PyLucene. That should be all you need to do. Your jar file is
>> going to be installed along with lucene's in the lucene egg and it is going
>> to be put on lucene.CLASSPATH which you use with lucene.initVM().
>>
>> Your classes can be declared in any Java package you want. Just make sure
>> that their names don't clash with other Lucene class names that you also
>> need to use as the class namespace is flattened in PyLucene.
>>
>> For more information about JCC and its command line args see JCC's README
>> file at [1].
>>
>> Andi..
>>
>> [1] http://svn.osafoundation.org/pylucene/trunk/jcc/jcc/README
>> _______________________________________________
>> pylucene-dev mailing list
>> pylucene-dev@osafoundation.org
>> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>>
>
> --
> Cheers,
> Cloud
>

--
Cheers,
Cloud
_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev