On Mon, 27 Sep 2010, Christian Heimes wrote:

I'd like to request a new feature for the next version of PyLucene. Lucene
already comes with a collation library, but PyLucene doesn't wrap it.
Collation is required for language-dependent sorting of search results. [1]
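For readers unfamiliar with the idea: per the Lucene collation contrib documentation, CollationKeyAnalyzer roughly turns a field value into a single collation key term, so that ordinary term ordering follows a locale's rules rather than raw code points. The pure-Python sketch below only illustrates that principle. It is a toy, not java.text.Collator or ICU. The folding rules (umlauts compare with their base letter, "ß" with "ss") and the name toy_collation_key are assumptions chosen for illustration. Its three levels are loosely modeled on the PRIMARY/SECONDARY/TERTIARY collator strengths.

```python
"""Toy collation keys: a simplified stand-in for a real collator, for illustration only."""

# Hypothetical folding table for a simplified German ordering:
# umlauts compare like their base letter at the primary level.
_UMLAUT_FOLD = str.maketrans("äöüÄÖÜ", "aouAOU")
_LEVEL_SEP = "\x00"  # sorts below every other character


def toy_collation_key(word: str) -> str:
    """Build one flat key whose plain string order approximates German order.

    Levels, most significant first:
      primary   - base letters, case-insensitive ("ß" becomes "ss" via casefold)
      secondary - umlauts distinguished, case still ignored
      tertiary  - the original word (case and "ß" vs "ss" distinguished)
    Assumes words never contain the NUL separator.
    """
    primary = word.translate(_UMLAUT_FOLD).casefold()
    secondary = word.casefold()
    tertiary = word
    return _LEVEL_SEP.join((primary, secondary, tertiary))


words = ["Zebra", "Äpfel", "apfel", "Apfel", "Ober", "Öl", "Straße", "Strasse", "Ast"]

# Raw code point order:
print(sorted(words))
# → ['Apfel', 'Ast', 'Ober', 'Strasse', 'Straße', 'Zebra', 'apfel', 'Äpfel', 'Öl']

# Toy locale-aware order:
print(sorted(words, key=toy_collation_key))
# → ['Apfel', 'apfel', 'Äpfel', 'Ast', 'Ober', 'Öl', 'Strasse', 'Straße', 'Zebra']
```

Sorting the raw strings puts "apfel" after "Zebra" and pushes "Äpfel" and "Öl" to the very end. The toy keys keep each umlaut next to its base letter. Joining the levels with NUL, which sorts below every other character, makes a plain comparison of the flat key behave like comparing the levels one at a time. That property is what lets such keys work as ordinary sortable terms.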

I've attached a working patch for the feature request.

>>> from lucene import *
>>> initVM()
<jcc.JCCEnv object at 0x7f326926e0d8>
>>> collator = Collator.getInstance(Locale("de"))
>>> keyanalyzer = CollationKeyAnalyzer(collator)
>>> keyanalyzer
<CollationKeyAnalyzer: org.apache.lucene.collation.collationkeyanaly...@510dc6b5>

Thanks
Christian

Hi Christian,

In 3.x and trunk, I've been porting ICU-dependent Lucene contrib features to use PyICU [1][2] (which depends on C++ ICU). I think, though, that having PyLucene depend on both C++ ICU and Java ICU is one ICU too many :-)

I'm not sure at this point which should remain. There are advantages to both... I'm open to arguments in favor of either. You can see examples in the 3.x tree [3].

(disclaimer: I'm the author of PyICU)

Andi..

[1] http://pypi.python.org/pypi/PyICU
[2] http://pyicu.osafoundation.org/
[3] http://svn.apache.org/repos/asf/lucene/pylucene/branches/branch_3x/python/


[1] http://lucene.apache.org/java/3_0_1/api/contrib-collation/org/apache/lucene/collation/package-summary.html


Index: Makefile
===================================================================
--- Makefile    (Revision 1001535)
+++ Makefile    (working copy)
@@ -136,7 +136,9 @@
REGEX_JAR=$(LUCENE)/build/contrib/regex/lucene-regex-$(LUCENE_VER).jar
QUERIES_JAR=$(LUCENE)/build/contrib/queries/lucene-queries-$(LUCENE_VER).jar
INSTANTIATED_JAR=$(LUCENE)/build/contrib/instantiated/lucene-instantiated-$(LUCENE_VER).jar
+COLLATION_JAR=$(LUCENE)/build/contrib/collation/lucene-collation-$(LUCENE_VER).jar
EXTENSIONS_JAR=build/jar/extensions.jar
+ICU4J_JAR=$(LUCENE)/contrib/collation/lib/icu4j-collation-4.0.jar


.PHONY: generate compile install default all clean realclean \
@@ -185,19 +187,24 @@
$(INSTANTIATED_JAR): $(LUCENE_JAR)
       cd $(LUCENE)/contrib/instantiated; $(ANT) -Dversion=$(LUCENE_VER)

+$(COLLATION_JAR): $(LUCENE_JAR)
+       cd $(LUCENE)/contrib/collation; $(ANT) -Dversion=$(LUCENE_VER)
+
$(EXTENSIONS_JAR): $(LUCENE_JAR)
       $(ANT) -f extensions.xml -Dlucene.dir=$(LUCENE)

JARS=$(LUCENE_JAR) $(SNOWBALL_JAR) $(ANALYZERS_JAR) \
     $(REGEX_JAR) $(MEMORY_JAR) $(HIGHLIGHTER_JAR) \
-     $(QUERIES_JAR) $(INSTANTIATED_JAR) $(EXTENSIONS_JAR)
+     $(QUERIES_JAR) $(INSTANTIATED_JAR) $(COLLATION_JAR) \
+     $(EXTENSIONS_JAR)

-JCCFLAGS?=--no-generics
+JCCFLAGS?=--no-generics --reserved IGNORE

jars: $(JARS)

GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
           $(JCCFLAGS) \
+           --include $(ICU4J_JAR) \
           --package java.lang java.lang.System \
                               java.lang.Runtime \
           --package java.util \
@@ -206,6 +213,8 @@
           --package java.io java.io.StringReader \
                             java.io.InputStreamReader \
                             java.io.FileInputStream \
+           --package java.text \
+                     java.text.Collator \
           --exclude org.apache.lucene.queryParser.Token \
           --exclude org.apache.lucene.queryParser.TokenMgrError \
           --exclude org.apache.lucene.queryParser.QueryParserTokenManager \
