On Tue, 27 Mar 2007, Jack L wrote:

> This is about an earlier email of yours. Was it the indexing speed
> or searching speed?

I don't remember; it may have meant "both, on average".
I also said that CLucene, the native C++ port, seemed twice as fast again.

Benchmarking Java apps is typically fraught with controversy. The most common point of contention is the need to warm up the JVM before making any meaningful speed measurements.

I did nothing of the careful sort. If I remember correctly, I ran the IndexFiles.py PyLucene/Lucene sample on the same text files and measured the time spent. Similarly, I may have done the same with the SearchFiles.py PyLucene/Lucene sample.
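
A more careful measurement could look something like the sketch below. It is not what was run for the numbers in this thread. The names time_call and sample_workload are hypothetical, and the workload is only a stand-in for an actual indexing or searching run. The idea is to make a few untimed warm-up calls first, then keep the best of several timed calls.

```python
import time


def time_call(func, warmup=3, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` timed calls.

    The `warmup` calls run first and are not timed, so one-time costs
    (caches, lazy initialization, JIT compilation where there is one)
    are less likely to skew the result.
    """
    for _ in range(warmup):
        func()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def sample_workload():
    # Hypothetical stand-in for an indexing or searching run.
    return sum(len(word) for word in ("lucene " * 10000).split())


if __name__ == "__main__":
    print("best of 5: %.6f s" % time_call(sample_workload))
```

Taking the minimum rather than the mean is a judgment call: it reports the least disturbed run, which is handy for comparing implementations but says little about typical behavior.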

> I read online that gcj is typically slower than java. Any idea why
> PyLucene is actually faster?

It's all in the definition of "typically". Like I said, it depends on what you're doing and what you're measuring. There are people who claim that Java is even faster than C++. One can make benchmarks say whatever one wants them to say.

To put my benchmarking in context, I did this at a very early stage in PyLucene development (when SWIG was still used, which slowed things down too), when gcj was at version 3.3 or 3.2 and Lucene itself was at version 1.3 or 1.4 (I don't remember which). The alternatives to PyLucene at the time were Lupy, a pure Python port that was 10x slower than Java Lucene in the above benchmark, and CLucene, a pure C++ port that was 4x faster than Java Lucene.

Lucy has the potential to match CLucene's benchmark results. The key for me here is whether:
  - it can keep up with Lucene development
  - its own bugs are worth trading for gcj's
In the CLucene case, I felt that the gcj route was the better trade-off, especially since dropping SWIG.

Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
