RESULTS A: 'body' neither stored nor vectorized
===========================================================================
configuration             avg secs     max memory consumed
---------------------------------------------------------------------------
Lucene / JVM 1.4             50.14      79 MB
Lucene / JVM 1.5             51.86      93 MB
KinoSearch / Perl 5.8.8      70.25      29 MB
KinoSearch / Perl 5.8.6      83.43      31 MB

RESULTS B: 'body' stored and vectorized
===========================================================================
configuration             avg secs     max memory consumed
---------------------------------------------------------------------------
KinoSearch / Perl 5.8.8      76.01      29 MB
Lucene / JVM 1.4             86.70     178 MB
KinoSearch / Perl 5.8.6      88.79      31 MB
Lucene / JVM 1.5             89.28     147 MB
Plucene / Perl 5.8.6       2014.00*    skipped

DISCUSSION
===========================================================================
1) Lucene performs better than KinoSearch when there is less data to
be stored, while KinoSearch does better when there is a lot of data
to be stored. This may be because Lucene rewrites the stored field
data and the term vector data whenever segments are merged, while
KinoSearch writes that data only once (twice if you count the fact
that KinoSearch only supports the compound file format, which we've
disabled in Lucene for the sake of speed). It probably also helps
that KinoSearch stores term vector data with the stored field data in
the .fdx file.
2) The memory consumed by Lucene is due to the generous value (1000)
assigned to maxBufferedDocs, which is critical for indexing
performance. KinoSearch's memory consumption is primarily dependent
on the mem_threshold argument to the KinoSearch::Util::SortExternal
constructor, which isn't accessible from the public API at present.
Increasing this from the default of 16 MB to 256 MB improves speed by
another 15% or so.
3) The difference between Perl 5.8.8 and 5.8.6 probably has less to
do with the version number and more to do with the fact that the
5.8.6 install has threads enabled, while the 5.8.8 install does not.
The 5.8.6 install is the Perl that Apple ships with OS X 10.4. The
5.8.8 install is compiled from source using all the Configure
script's suggestions/defaults except for the two pertaining to
installation location.
4) While Plucene is written in pure Perl and KinoSearch is written in
Perl and C/XS, there are also substantial algorithmic differences
between them. These have been covered in depth elsewhere.

METHODOLOGY
===========================================================================
Source code for the experiment can be found at
<http://www.rectangular.com/svn/kinosearch/trunk/t/benchmarks/>. The tests
were run using Subversion repository revision 762.
The test corpus was Reuters-21578, Distribution 1.0. Reuters-21578
is available from David D. Lewis' professional home page, currently:
http://www.research.att.com/~lewis
The times for KinoSearch and Lucene are 5-run averages. OS X is a
busy operating system, which injects some noise into the results.
It's crucial that runs occur one right after another: a run that
immediately follows another is often faster, but even a few seconds
of lag between them can slow the second run. (Presumably this is due
to cache reassignment.) Therefore, the same command was issued on the
command line 6 times, separated by semicolons. The first run was
discarded, and the remaining five were averaged.
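As a rough illustration of the averaging step, here is a minimal sketch in shell. The function name avg_after_warmup is hypothetical and is not part of the benchmark code in the repository; it only assumes that each run prints one line ending in "SECS: <number>", as the indexers do in the raw data below.

```shell
# Hypothetical helper (not part of the KinoSearch benchmark suite).
# Reads benchmark output on stdin, treats the first "SECS:" line as a
# warm-up run and discards it, then prints the mean of the remaining
# timings to two decimal places. Prints nothing if fewer than two runs
# are present.
avg_after_warmup() {
    awk '
        /SECS:/ {
            runs++
            if (runs > 1) {        # skip the first (warm-up) run
                sum += $NF         # the timing is the last field
                kept++
            }
        }
        END {
            if (kept > 0) printf "%.2f\n", sum / kept
        }
    '
}
```

One way to use it, assuming the six runs are issued back to back in a single subshell so that no lag creeps in between them:

    (for i in 1 2 3 4 5 6; do java -server -Xmx500M LuceneIndexer; done) | avg_after_warmup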
The maximum memory consumption was measured during auxiliary passes
(i.e. not averaged in), using the crude method of eyeballing RPRVT in
the output of top.
* The sole Plucene stat isn't an average; it's a single run, as there
wasn't time to perform multiple runs.

HARDWARE
===========================================================================
PowerBook G4 17" 1.67 GHz
Mac OS X 10.4.5
1.5 GB RAM
Seagate 5400 rpm, 100 GB ATA HD

SOFTWARE
===========================================================================
Lucene 1.9.1
KinoSearch 0.09_03
Plucene 1.24
JVM 1.4.2_09
JVM 1.5.0_02
Apple's Perl 5.8.6 (shipped with OS X 10.4)
Perl 5.8.8 from source

RAW DATA
===========================================================================
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac -d . indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.99
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.42
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.08
Java Lucene 1.9.1 DOCS: 19043 SECS: 49.54
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.48
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.18
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac15 -d . indexers/LuceneIndexer.java
Note: indexers/LuceneIndexer.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
slothbear:~/Desktop/ks/t/benchmarks marvin$ java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer;
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.26
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.91
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.19
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.80
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.23
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.19
slothbear:~/Desktop/ks/t/benchmarks marvin$ vim indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac -d . indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.50
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.42
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.29
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.74
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.11
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.96
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac15 -d . indexers/LuceneIndexer.java
Note: indexers/LuceneIndexer.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
slothbear:~/Desktop/ks/t/benchmarks marvin$ java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer;
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.43
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.52
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.06
Java Lucene 1.9.1 DOCS: 19043 SECS: 89.69
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.87
Java Lucene 1.9.1 DOCS: 19043 SECS: 88.24
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx;
KinoSearch 0.09_03 DOCS: 19043 SECS: 87.20
KinoSearch 0.09_03 DOCS: 19043 SECS: 82.55
KinoSearch 0.09_03 DOCS: 19043 SECS: 82.38
KinoSearch 0.09_03 DOCS: 19043 SECS: 81.86
KinoSearch 0.09_03 DOCS: 19043 SECS: 87.79
KinoSearch 0.09_03 DOCS: 19043 SECS: 82.52
slothbear:~/Desktop/ks/t/benchmarks marvin$ vim indexers/kinosearch_indexer.plx
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx;
KinoSearch 0.09_03 DOCS: 19043 SECS: 88.16
KinoSearch 0.09_03 DOCS: 19043 SECS: 87.70
KinoSearch 0.09_03 DOCS: 19043 SECS: 92.67
KinoSearch 0.09_03 DOCS: 19043 SECS: 87.32
KinoSearch 0.09_03 DOCS: 19043 SECS: 88.35
KinoSearch 0.09_03 DOCS: 19043 SECS: 87.92
slothbear:~/Desktop/ks/t/benchmarks marvin$ cd ~/Desktop/ks588/t/benchmarks/
slothbear:~/Desktop/ks588/t/benchmarks marvin$ /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx
KinoSearch 0.09_03 DOCS: 19043 SECS: 69.67
KinoSearch 0.09_03 DOCS: 19043 SECS: 70.44
KinoSearch 0.09_03 DOCS: 19043 SECS: 72.87
KinoSearch 0.09_03 DOCS: 19043 SECS: 69.94
KinoSearch 0.09_03 DOCS: 19043 SECS: 69.16
KinoSearch 0.09_03 DOCS: 19043 SECS: 68.82
slothbear:~/Desktop/ks588/t/benchmarks marvin$ vim indexers/kinosearch_indexer.plx
slothbear:~/Desktop/ks588/t/benchmarks marvin$ /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx
KinoSearch 0.09_03 DOCS: 19043 SECS: 87.58
KinoSearch 0.09_03 DOCS: 19043 SECS: 75.17
KinoSearch 0.09_03 DOCS: 19043 SECS: 75.86
KinoSearch 0.09_03 DOCS: 19043 SECS: 75.05
KinoSearch 0.09_03 DOCS: 19043 SECS: 78.55
KinoSearch 0.09_03 DOCS: 19043 SECS: 75.41
slothbear:~/Desktop/ks588/t/benchmarks marvin$ cd ~/Desktop/ks/t/benchmarks/
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx;
Plucene 1.24 DOCS: 19043 SECS: 2013.70
^C
Couldn't get lock at indexers/plucene_indexer.plx line 56
^C
^C
slothbear:~/Desktop/ks/t/benchmarks marvin$