I found this interesting post comparing the GCJ and JCC PyLucene flavors on an Ubuntu forum:
    http://ubuntuforums.org/showthread.php?t=593327

It is mostly correct. Here are the final points it makes, with my comments inline:

  1. GCJ version seems to be incompatible with python web frameworks, as well
     as mod-python

Yeah, the threading issue in PyLucene with GCJ was a long-standing pain that was resolved by moving PyLucene to JCC.

  2. GCJ has limits regarding file size for indexes, and sometimes cannot
     optimize your data

That is true with GCJ 3.x. GCJ 4.x has a fix for the 2 GB file size limit in the Java runtime classes. Of course, your mileage with GCJ 4.x may vary.

  3. GCJ is very, very fast making search

GCJ is faster than Sun's JRE at getting started. If your search is a short-lived program, GCJ is indeed faster. I did notice that this performance difference shrank the longer the program ran.

  4. JCC is more complicated to install and require java installed (at least
     jre)

Well, it depends. If you have to build your own GCJ, I'd argue that installing PyLucene with JCC is vastly simpler. Building OpenJDK on Linux is also probably (?) easier than building GCJ.

  5. Programs using JCC version always need LD_LIBRARY_PATH

Not anymore. Adding "-Wl,-rpath=libpath" to LFLAGS in setup.py resolves this problem (and, arguably, security issue), so there is no need to set LD_LIBRARY_PATH anymore. The version of JCC's setup.py in svn trunk has an example.
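To make the rpath idea concrete, here is a minimal sketch of how such linker flags could be assembled in Python. The helper name rpath_lflags and the JDK directories are hypothetical, chosen only for illustration, and the real LFLAGS in JCC's setup.py is likely structured differently; check svn trunk for the actual example.

```python
def rpath_lflags(lib_dirs):
    """Build linker flags that link against libjava and embed lib_dirs
    as the runtime library search path (GNU ld's -rpath option)."""
    # -L tells the linker where to find libjava at build time.
    flags = ['-L%s' % d for d in lib_dirs]
    flags.append('-ljava')
    # -Wl passes -rpath through to the linker; directories are
    # colon-separated, so LD_LIBRARY_PATH is not needed at run time.
    flags.append('-Wl,-rpath=' + ':'.join(lib_dirs))
    return flags


# Hypothetical JDK library directories; adjust to the local install.
JDK_LIB_DIRS = [
    '/usr/lib/jvm/java/jre/lib/i386',
    '/usr/lib/jvm/java/jre/lib/i386/client',
]

LFLAGS = rpath_lflags(JDK_LIB_DIRS)
```

The point of the design is that the library location is recorded in the built extension itself, so every program importing it finds libjava without any environment setup.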

  6. JCC needs to start java VM everytime you run the program, so in cases
     like mine (cgi application) it's a bit slower

Yes, that's true. I spent some time today trying to detect a missing call to initVM(), but doing so without adding the check everywhere is more complicated than I thought. I considered adding it only to findClass(), which is a relatively slow operation the first time, but that also turned out to be harder than expected. More on this later. In the meantime, I put BIG notices at the top of both PyLucene's and JCC's README files about the need to call initVM() before calling into the VM.
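To illustrate what "adding the check everywhere" amounts to, here is a pure-Python analogy. It is not JCC's code and does not touch a real VM: init_vm, requires_vm, find_class and the _vm_started flag are all hypothetical stand-ins. Every entry point that would call into the VM has to be wrapped with the guard, which is exactly the cost that makes this approach unattractive.

```python
import functools

# Hypothetical module-level flag; not part of JCC or PyLucene.
_vm_started = False


def init_vm():
    """Stand-in for initVM(): record that the (pretend) VM is running."""
    global _vm_started
    _vm_started = True


def requires_vm(func):
    """Guard an entry point so it fails loudly if init_vm() was skipped."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _vm_started:
            raise RuntimeError(
                "initVM() must be called before %s()" % func.__name__)
        return func(*args, **kwargs)
    return wrapper


@requires_vm
def find_class(name):
    """Stand-in for a call that would enter the VM."""
    return "class " + name
```

With this guard, calling find_class() before init_vm() raises a RuntimeError instead of failing obscurely, while calls made after init_vm() go through unchanged.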

To dispel another fallacy in the post, initVM() is indeed documented, along with all its arguments, in JCC's README file starting at line 189 of [1].

  7. JCC is about 3 times slower than GCJ when searching records, but seems to
     be fast importing data

See comment (3)

  8. JCC seems to be more stable and can optimize indexes bigger than 2.4GB

Yes, the Sun-originating VMs are much more mature than GCJ's in many ways. Now that Sun is sponsoring an open source JDK and JRE, OpenJDK [2], I expect most of the open source energy in Java land to focus on it (see the IcedTea [3] project) instead of GCJ. The amount of traffic on the GCJ mailing list is not what it used to be...

Andi..

[1] http://svn.osafoundation.org/pylucene/trunk/jcc/README
[2] http://openjdk.java.net/
[3] http://fitzsim.org/blog/?p=16 and http://fitzsim.org/blog/?p=17

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
