On Tue, 27 Mar 2007, Jack L wrote:
> This is about an earlier email of yours. Was it the indexing speed
> or searching speed?
I don't remember; it may have meant "both, on average".
I also said that CLucene, the native C++ port, seemed twice as fast again.
Benchmarking Java apps is typically fraught with controversy. The most common
point of contention is the need to warm up the JVM before making any
meaningful speed measurements.
I did nothing of the careful sort. If I remember correctly, I ran the
IndexFiles.py PyLucene/Lucene sample on the same text files and measured the
time spent. Similarly, I may have done the same with the SearchFiles.py
PyLucene/Lucene sample.
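For what it's worth, a more careful measurement would discard a few warm-up
runs before timing anything. Below is a minimal sketch of that idea, not what
I actually ran. The `benchmark` and `summarize` helpers and the `workload`
callable are hypothetical names for illustration; `workload` stands in for
the indexing or searching step and is not part of the PyLucene samples.

```python
import time


def benchmark(workload, warmup=3, runs=5):
    """Time `workload` after discarding `warmup` untimed calls.

    `workload` is a hypothetical zero-argument callable standing in for
    the indexing or searching step; it is not part of PyLucene.
    Returns the list of per-run wall-clock times in seconds.
    """
    for _ in range(warmup):
        workload()  # untimed: lets caches and any JIT compilation settle
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (best, mean) of the measured times."""
    return min(timings), sum(timings) / len(timings)


if __name__ == "__main__":
    # Placeholder workload; substitute the real indexing or search call.
    timings = benchmark(lambda: sum(i * i for i in range(100000)))
    best, mean = summarize(timings)
    print("best %.4fs, mean %.4fs over %d runs" % (best, mean, len(timings)))
```

Reporting both the best and the mean time makes it easier to tell a noisy
machine from a genuinely slow run.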
> I read online that gcj is typically slower than java. Any idea why
> PyLucene is actually faster?
It's all in the definition of "typically". Like I said, it depends on what
you're doing and what you're measuring. There are people who claim that Java
is even faster than C++. One can make benchmarks say whatever one wants them
to say.
To put my benchmarking in context, I did this at a very early stage in
PyLucene development (when SWIG was still used - slowing things down too),
when gcj was at version 3.3 or 3.2, when Lucene itself was at version 1.3 or
1.4 (unclear). The alternatives to PyLucene at the time were Lupy, a pure
Python port that was 10x slower than Java Lucene in the above benchmark, and
CLucene, a pure C++ port that was 4x faster.
Lucy has the potential of attaining CLucene benchmarks. The key for me here
is whether:
- it can keep up with Lucene development
- its own bugs are worth trading for gcj's
In the CLucene case, I felt that the gcj route was a better trade-off,
especially since dumping SWIG.
Andi..