Sure, there are two Chinese analyzers (including the CJKAnalyzer) bundled
with Lucene. But both are character-based and far from acceptable.

A practical Chinese tokenizer should know Chinese words (from one or several
dictionaries) rather than just characters. It is reasonable not to bundle a
dictionary-based analyzer with Lucene, since the dictionary alone would be
several megabytes, yet not helpful to other parts of the world :)
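
To show what I mean by word-based versus character-based, here is a minimal
sketch of forward maximum matching, one common dictionary-based approach. The
function name and the toy dictionary are made up for illustration; a real
tokenizer would load a word list of several megabytes and handle ambiguity
much more carefully.

```python
def segment(text, dictionary, max_word_len=4):
    """Forward maximum matching: at each position, take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as the fallback.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"中文", "分词", "搜索", "引擎"}

words = segment("中文搜索引擎", TOY_DICTIONARY)  # dictionary-based tokens
chars = list("中文搜索引擎")                     # what a character-based analyzer sees
```

With the toy dictionary, the word-based result has three tokens while the
character-based one has six single characters, which is why queries against a
character-based index match far too loosely.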


On Mon, Jun 2, 2008 at 2:09 PM, Andi Vajda <[EMAIL PROTECTED]> wrote:

>
> On Jun 1, 2008, at 10:53 PM, "Cloud Zhang" <[EMAIL PROTECTED]> wrote:
>
> Thanks a lot for this very detailed guide. I'll forward it to the Chinese
> Python community, since the first thing a Chinese developer looks for in
> Lucene is a tokenizer for Chinese, and they get stuck importing a
> jar...
>
>
> Isn't there a Chinese analyzer already shipped with Java Lucene in
> contrib/analyzers?
> That contrib is already built into PyLucene.
>
> Andi..
>
>
>
> On Mon, Jun 2, 2008 at 1:15 PM, Andi Vajda <[EMAIL PROTECTED]> wrote:
>
>>
>> On Mon, 2 Jun 2008, Cloud Zhang wrote:
>>
>>> Adding a new analyzer (in jar form) in Java is really straightforward, but
>>> when I was trying to add one for pyLucene, I found no way to refer to the
>>> jar package.
>>>
>>> I went through the building process of pyLucene and guess maybe I could:
>>> * put the analyzer source under
>>> PyLucene-2.3.2-1/lucene-java-2.3.2/contrib/analyzers/src/java/, and
>>> recompile Lucene then pyLucene
>>> or
>>> * put the analyzer jar somewhere in the building folder and add it to the
>>> Makefile, then recompile pyLucene
>>>
>>> Could these work? Or is there another solution that is as straightforward
>>> as setting CLASSPATH in Java?
>>>
>>
>> To access your class(es) by name from Python, you must have JCC generate
>> wrappers for it (them). This is what is done from line 177 onward in
>> PyLucene's Makefile. The easiest way for you to add your own Java classes
>> to PyLucene is to create another jar file with your own analyzer classes
>> and code and add it to the JCC invocation there.
>>
>> For example, the Makefile snippet in question currently says:
>>
>> GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
>>           --package java.lang java.lang.System \
>>                               java.lang.Runtime \
>>           --package java.util \
>>                     java.text.SimpleDateFormat \
>>           --package java.io java.io.StringReader \
>>                             java.io.InputStreamReader \
>>                             java.io.FileInputStream \
>>           --exclude org.apache.lucene.queryParser.Token \
>>           --exclude org.apache.lucene.queryParser.TokenMgrError \
>>           --exclude org.apache.lucene.queryParser.QueryParserTokenManager \
>>           --exclude org.apache.lucene.queryParser.ParseException \
>>           --python lucene \
>>           --mapping org.apache.lucene.document.Document 'get:(Ljava/lang/String;)Ljava/lang/String;' \
>>           --mapping java.util.Properties 'getProperty:(Ljava/lang/String;)Ljava/lang/String;' \
>>           --sequence org.apache.lucene.search.Hits 'length:()I' 'doc:(I)Lorg/apache/lucene/document/Document;' \
>>           --version $(LUCENE_VER) \
>>           --files $(NUM_FILES)
>>
>>
>> change the first line to say:
>>
>> GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) --jar myjar.jar \
>>   ...
>>
>> and rebuild PyLucene. That should be all you need to do. Your jar file is
>> going to be installed along with lucene's in the lucene egg and it is going
>> to be put on lucene.CLASSPATH which you use with lucene.initVM().
>>
>> Your classes can be declared in any Java package you want. Just make sure
>> that their names don't clash with other Lucene class names that you also
>> need to use as the class namespace is flattened in PyLucene.
>>
>> For more information about JCC and its command line args see JCC's README
>> file at [1].
>>
>> Andi..
>>
>> [1] http://svn.osafoundation.org/pylucene/trunk/jcc/jcc/README
>> _______________________________________________
>> pylucene-dev mailing list
>> pylucene-dev@osafoundation.org
>> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>>
>
>
>
> --
> Cheers,
> Cloud
>


-- 
Cheers,
Cloud