RESULTS A: 'body' neither stored nor vectorized
===========================================================================
configuration               avg secs       max memory consumed
---------------------------------------------------------------------------
Lucene / JVM 1.4               50.14               79 MB
Lucene / JVM 1.5               51.86               93 MB
KinoSearch / Perl 5.8.8        70.25               29 MB
KinoSearch / Perl 5.8.6        83.43               31 MB


RESULTS B: 'body' stored and vectorized
===========================================================================
configuration               avg secs       max memory consumed
---------------------------------------------------------------------------
KinoSearch / Perl 5.8.8        76.01               29 MB
Lucene / JVM 1.4               86.70              178 MB
KinoSearch / Perl 5.8.6        88.79               31 MB
Lucene / JVM 1.5               89.28              147 MB
Plucene / Perl 5.8.6         2014.00*            skipped


DISCUSSION
===========================================================================

1) Lucene indexes faster than KinoSearch when little data is stored, while KinoSearch is faster when a lot of data is stored. This may be because Lucene rewrites the stored field data and the term vector data whenever segments are merged, while KinoSearch writes that data only once (twice if you count the fact that KinoSearch only supports the compound file format, which we've disabled in Lucene for the sake of speed). It probably also helps that KinoSearch stores term vector data with the stored field data in the .fdx file.

2) The memory consumed by Lucene is due to the generous value (1000) assigned to maxBufferedDocs, which is critical for indexing performance. KinoSearch's memory consumption is primarily dependent on the mem_threshold argument to the KinoSearch::Util::SortExternal constructor, which isn't accessible from the public API at present. Increasing this from the default of 16 MB to 256 MB improves speed by another 15% or so.

3) The difference between Perl 5.8.8 and 5.8.6 probably has less to do with the version number and more to do with the fact that the 5.8.6 install has threads enabled, while the 5.8.8 install does not. The 5.8.6 install is the Perl that Apple ships with OS X 10.4. The 5.8.8 install is compiled from source using all the Configure script's suggestions/defaults except for the two pertaining to installation location.

4) While Plucene is written in pure Perl and KinoSearch is written in Perl and C/XS, there are also substantial algorithmic differences between them. These have been covered in depth elsewhere.
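
To put a number on point 1, the cost of storing and vectorizing 'body' can be expressed as a percentage increase over the RESULTS A time for the same configuration. The following is a rough sketch in Python; it is not part of the benchmark code, the function and variable names are made up for illustration, and the figures are simply copied from the two tables above.

```python
# Average indexing times in seconds, copied from the RESULTS A and
# RESULTS B tables. Plucene is left out because it has no RESULTS A entry.
results_a = {
    "Lucene / JVM 1.4": 50.14,
    "Lucene / JVM 1.5": 51.86,
    "KinoSearch / Perl 5.8.8": 70.25,
    "KinoSearch / Perl 5.8.6": 83.43,
}
results_b = {
    "Lucene / JVM 1.4": 86.70,
    "Lucene / JVM 1.5": 89.28,
    "KinoSearch / Perl 5.8.8": 76.01,
    "KinoSearch / Perl 5.8.6": 88.79,
}


def overhead_percent(without_storage, with_storage):
    """Percentage by which the run time grows when 'body' is stored and vectorized."""
    return (with_storage - without_storage) / without_storage * 100.0


# Hypothetical helper output: one overhead figure per configuration.
overheads = {
    name: overhead_percent(results_a[name], results_b[name])
    for name in results_a
}

# Print the configurations from smallest to largest overhead.
for name, pct in sorted(overheads.items(), key=lambda item: item[1]):
    print(f"{name:<26} +{pct:.1f}%")
```

On these figures the Lucene runs slow down by roughly 72-73%, while the KinoSearch runs slow down by well under 10%, which is the gap that point 1 tries to explain.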

METHODOLOGY
===========================================================================

Source code for the experiment can be found at <http://www.rectangular.com/svn/kinosearch/trunk/t/benchmarks/>. The tests were run using Subversion repository revision 762.

The test corpus was Reuters-21578, Distribution 1.0. Reuters-21578 is available from David D. Lewis' professional home page, currently:

    http://www.research.att.com/~lewis

The times for KinoSearch and Lucene are 5-run averages. OS X is a busy operating system, which injects some noise into the results. It's crucial that iterations run back to back: a run that immediately follows another is often faster, but even a few seconds of lag between them can slow the second run. (Presumably this is due to cache reassignment.) Therefore, the same command was issued on the command line 6 times, separated by semicolons. The first iteration was discarded, and the remaining five were averaged.
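
The averaging step can be reproduced from the logged output. Below is a minimal sketch in Python, not the script actually used for this report: the function name and the regular expression are assumptions, and the sample lines are the six Lucene / JVM 1.4 runs from RAW DATA with 'body' neither stored nor vectorized.

```python
import re

# Six consecutive runs, copied from the RAW DATA section.
LOG = """\
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.99
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.42
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.08
Java Lucene 1.9.1 DOCS: 19043 SECS: 49.54
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.48
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.18
"""

# Matches the elapsed time that each indexer prints at the end of a run.
SECS_PATTERN = re.compile(r"SECS:\s*([0-9.]+)")


def average_after_warmup(log_text, discard=1):
    """Average the logged times after dropping the first `discard` runs."""
    times = [float(match.group(1)) for match in SECS_PATTERN.finditer(log_text)]
    kept = times[discard:]
    if not kept:
        raise ValueError("no timings left after discarding warm-up runs")
    return sum(kept) / len(kept)


avg = average_after_warmup(LOG)
print(f"{avg:.2f}")
```

For this batch the result matches the 50.14 reported for Lucene / JVM 1.4 in RESULTS A. The same function should work on the KinoSearch lines, since it only looks at the SECS: field.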

The maximum memory consumption was measured during auxiliary passes (i.e. not averaged in), using the crude method of eyeballing RPRVT in the output of top.

* The sole Plucene stat isn't an average; it's a single run, as there wasn't time to perform multiple runs.

HARDWARE
===========================================================================

    PowerBook G4 17" 1.67 GHz
    Mac OS X 10.4.5
    1.5 GB RAM
    Seagate 5400 rpm, 100 MB ATA HD


SOFTWARE
===========================================================================

Lucene 1.9.1
KinoSearch 0.09_03
Plucene 1.24

JVM 1.4.2_09
JVM 1.5.0_02
Apple's Perl 5.8.6 (shipped with OS X 10.4)
Perl 5.8.8 from source


RAW DATA
===========================================================================

slothbear:~/Desktop/ks/t/benchmarks marvin$ javac -d . indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.99
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.42
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.08
Java Lucene 1.9.1 DOCS: 19043 SECS: 49.54
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.48
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.18
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac15 -d . indexers/LuceneIndexer.java
Note: indexers/LuceneIndexer.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
slothbear:~/Desktop/ks/t/benchmarks marvin$ java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer;
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.26
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.91
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.19
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.80
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.23
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.19
slothbear:~/Desktop/ks/t/benchmarks marvin$ vim indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac -d . indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.50
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.42
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.29
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.74
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.11
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.96
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac15 -d . indexers/LuceneIndexer.java
Note: indexers/LuceneIndexer.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
slothbear:~/Desktop/ks/t/benchmarks marvin$ java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer;
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.43
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.52
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.06
Java Lucene 1.9.1 DOCS: 19043 SECS: 89.69
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.87
Java Lucene 1.9.1 DOCS: 19043 SECS: 88.24
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx;
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.20
KinoSearch 0.09_03 DOCS: 19043  SECS: 82.55
KinoSearch 0.09_03 DOCS: 19043  SECS: 82.38
KinoSearch 0.09_03 DOCS: 19043  SECS: 81.86
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.79
KinoSearch 0.09_03 DOCS: 19043  SECS: 82.52
slothbear:~/Desktop/ks/t/benchmarks marvin$ vim indexers/kinosearch_indexer.plx
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx;
KinoSearch 0.09_03 DOCS: 19043  SECS: 88.16
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.70
KinoSearch 0.09_03 DOCS: 19043  SECS: 92.67
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.32
KinoSearch 0.09_03 DOCS: 19043  SECS: 88.35
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.92
slothbear:~/Desktop/ks/t/benchmarks marvin$ cd ~/Desktop/ks588/t/benchmarks/
slothbear:~/Desktop/ks588/t/benchmarks marvin$ /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx
KinoSearch 0.09_03 DOCS: 19043  SECS: 69.67
KinoSearch 0.09_03 DOCS: 19043  SECS: 70.44
KinoSearch 0.09_03 DOCS: 19043  SECS: 72.87
KinoSearch 0.09_03 DOCS: 19043  SECS: 69.94
KinoSearch 0.09_03 DOCS: 19043  SECS: 69.16
KinoSearch 0.09_03 DOCS: 19043  SECS: 68.82
slothbear:~/Desktop/ks588/t/benchmarks marvin$ vim indexers/kinosearch_indexer.plx
slothbear:~/Desktop/ks588/t/benchmarks marvin$ /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.58
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.17
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.86
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.05
KinoSearch 0.09_03 DOCS: 19043  SECS: 78.55
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.41
slothbear:~/Desktop/ks588/t/benchmarks marvin$ cd ~/Desktop/ks/t/benchmarks/
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx;
Plucene 1.24 DOCS: 19043  SECS: 2013.70
^C
Couldn't get lock at indexers/plucene_indexer.plx line 56
^C
^C
slothbear:~/Desktop/ks/t/benchmarks marvin$



