There is a good chance they were using stock indexing defaults for
Lucene, based on this line from the paper:
" In the present work, the simple applications
bundled with the library were used to index the collection. "
On 7-Dec-07, at 10:27 AM, Grant Ingersoll wrote:
Yeah, I wasn't too excited over it and I certainly didn't lose any
sleep over it, but there are some interesting things of note in
there concerning Lucene, including the claim that it fell over on
indexing WT10g docs (page 40) and I am always looking for ways to
improve things. Overall, I think Lucene held up pretty well in the
evaluation, and I know how suspect _any_ evaluation is given the
myriad ways of doing search. Still, when a well-respected
researcher in the field says Lucene didn't do so hot in certain
areas, I don't think we can dismiss it out of hand. So regardless
of whether the tests are right or wrong, it is worth either
addressing the failures in Lucene or the failures in the test, so
that we make sure we are properly educating our users on how best
to use Lucene.
I emailed the authors asking for information on how the test was
run etc., so we'll see if anything comes of it.
On Dec 7, 2007, at 12:04 PM, robert engels wrote:
I wouldn't get too excited over this. Once again, it does not seem
the evaluator understands the nature of GC based systems, and the
memory statistics are quite out of whack. But it is hard to tell
because there is no data on how memory consumption was actually
measured.
A far better way of measuring memory consumption is to cap the
process at different levels (maximum RAM sizes) and compare the
performance at each level.
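The capped-heap methodology suggested above can be sketched as a small harness: run the same fixed workload under different JVM heap caps (-Xmx) and compare elapsed times, treating the cap as the controlled variable rather than sampling resident memory. The workload below is a placeholder loop, not a real indexing run.

```java
public class HeapCapBench {
    // Placeholder workload standing in for a real indexing run;
    // it returns a checksum so the JIT cannot eliminate the loop.
    static long workload() {
        long acc = 0;
        for (int i = 0; i < 10_000_000; i++) {
            acc += i;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Report the heap cap this JVM was started with, then time the work.
        long capMb = Runtime.getRuntime().maxMemory() >> 20;
        long start = System.nanoTime();
        long checksum = workload();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("heap cap (MB): " + capMb
                + "  elapsed (ms): " + elapsedMs
                + "  checksum: " + checksum);
    }
}
```

Running the same class as, e.g., "java -Xmx64m HeapCapBench", "java -Xmx256m HeapCapBench", and "java -Xmx1g HeapCapBench" then gives directly comparable numbers at each memory level.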
There is also the fact that a process takes memory from the disk
cache, and vice versa, which heavily affects search performance.
Since there is no detailed data (that I could find) about system
configuration, etc. the results are highly suspect.
There is also no mention of performance on multi-processor
systems. Some systems (like Lucene) pay a penalty to support
multiprocessing (both in Java and in Lucene itself), and only
realize the corresponding benefit when operating in a
multi-processor environment.
Based on the sheer speed of XMLSearch and Zettair, those seem like
likely candidates for inspecting their design.
On Dec 7, 2007, at 7:03 AM, Grant Ingersoll wrote:
Was wondering if people have seen
http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf
Has some interesting comparisons. Obviously, the comparison of
Lucene indexing is done w/ 1.9 so it probably needs to be done
again. Just wondering if people see any opportunities to improve
Lucene from it. I am going to try to contact the authors to see
if I can get their setup values (mergeFactor, Analyzer, etc.), as
I think it would be interesting to run the tests again on 2.3.
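For reference, the knobs Grant mentions are all set on the IndexWriter. This is a minimal configuration sketch against the Lucene 2.3 API; the index path is illustrative and the values shown are Lucene's stock defaults, not the paper's (unknown) settings.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class IndexSetup {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location; substitute the real path.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/tmp/index"),
                new StandardAnalyzer());
        writer.setMergeFactor(10);        // segments merged at once (default 10)
        writer.setMaxBufferedDocs(10);    // docs buffered before flush (default 10)
        writer.setRAMBufferSizeMB(16.0);  // RAM-based flushing, new in 2.3 (default 16 MB)
        writer.close();
    }
}
```

Since 2.3 switched the default flush trigger from document count to RAM usage, the RAM buffer size in particular could change the indexing numbers relative to the 1.9 results in the paper.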
-Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]