There is a good chance they were using stock indexing defaults for
Lucene, based on this line from the paper:
" In the present work, the simple applications
bundled with the library were used to index the collection. "
On 7-Dec-07, at 10:27 AM, Grant Ingersoll wrote:
Yeah, I wasn't too excited over it and I certainly didn't lose any
sleep over it, but there are some interesting things of note in
there concerning Lucene, including the claim that it fell over on
indexing WT10g docs (page 40) and I am always looking for ways to
improve things. Overall, I think Lucene held up pretty well in the
evaluation, and I know how suspect _any_ evaluation is given the
myriad ways of doing search. Still, when a well-respected
researcher in the field says Lucene didn't do so hot in certain
areas, I don't think we can dismiss it out of hand. So regardless
of whether the tests are right or wrong, it is worth either
addressing the failures in Lucene or the failures in the test, so
that we make sure we are properly educating our users on how best
to use Lucene.
I emailed the authors asking for information on how the test was
run etc., so we'll see if anything comes of it.
On Dec 7, 2007, at 12:04 PM, robert engels wrote:
I wouldn't get too excited over this. Once again, it does not seem
the evaluator understands the nature of GC based systems, and the
memory statistics are quite out of whack. But it is hard to tell
because there is no data on how memory consumption was actually
measured.
A far better way of measuring memory consumption is to cap the
process at different levels (maximum RAM sizes) and compare the
performance at each level.
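The capped-heap methodology suggested above can be sketched as a small harness: run the same fixed workload under different JVM heap caps (-Xmx) and compare elapsed times, treating the cap as the controlled variable rather than sampling resident memory. The workload below is a placeholder loop, not a real indexing run.

```java
public class HeapCapBench {
    // Placeholder workload standing in for a real indexing run;
    // it returns a checksum so the JIT cannot eliminate the loop.
    static long workload() {
        long acc = 0;
        for (int i = 0; i < 10_000_000; i++) {
            acc += i;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Report the heap cap this JVM was started with, then time the work.
        long capMb = Runtime.getRuntime().maxMemory() >> 20;
        long start = System.nanoTime();
        long checksum = workload();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("heap cap (MB): " + capMb
                + "  elapsed (ms): " + elapsedMs
                + "  checksum: " + checksum);
    }
}
```

Running the same class as, e.g., "java -Xmx64m HeapCapBench", "java -Xmx256m HeapCapBench", and "java -Xmx1g HeapCapBench" then gives directly comparable numbers at each memory level.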
There is also the fact that a process takes memory from the disk
cache, and vice versa, which heavily affects search performance.
Since there is no detailed data (that I could find) about system
configuration, etc. the results are highly suspect.
There is also no mention of performance on multi-processor
systems. Some systems (like Lucene) pay a penalty to support
multiprocessing (both in Java and in Lucene itself), and only
realize the corresponding benefit when operating in a
multi-processor environment.
Based on the sheer speed of XMLSearch and Zettair, those seem like
likely candidates for inspecting their design.
On Dec 7, 2007, at 7:03 AM, Grant Ingersoll wrote:
Was wondering if people have seen
http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf
Has some interesting comparisons. Obviously, the comparison of
Lucene indexing is done w/ 1.9 so it probably needs to be done
again. Just wondering if people see any opportunities to improve
Lucene from it. I am going to try to contact the authors to see
if I can get their setup values (mergeFactor, Analyzer, etc.), as
I think it would be interesting to run the tests again on 2.3.
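For reference, the knobs Grant mentions are all set on the IndexWriter. This is a minimal configuration sketch against the Lucene 2.3 API; the index path is illustrative and the values shown are Lucene's stock defaults, not the paper's (unknown) settings.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class IndexSetup {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location; substitute the real path.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/tmp/index"),
                new StandardAnalyzer());
        writer.setMergeFactor(10);        // segments merged at once (default 10)
        writer.setMaxBufferedDocs(10);    // docs buffered before flush (default 10)
        writer.setRAMBufferSizeMB(16.0);  // RAM-based flushing, new in 2.3 (default 16 MB)
        writer.close();
    }
}
```

Since 2.3 switched the default flush trigger from document count to RAM usage, the RAM buffer size in particular could change the indexing numbers relative to the 1.9 results in the paper.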
-Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]