Yes, and even if they did not use the stock defaults, I would bet there would be complaints about what was done wrong at every turn. This seems like a very difficult thing to do well. How long does it take to fully learn how to correctly use each search engine for the task at hand? Surely longer than these busy men could possibly spare. It seems such a comparison could only be done legitimately if experts for each search engine set up the indexing/searching processes. Even then the results could be difficult to measure... e.g., was each search engine configured to break only on spaces for indexing and do nothing else special at all? It takes so many small settings, and so much knowledge, to ensure each engine is on level ground...
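
Just to illustrate: on the Lucene side, a "break only on spaces, nothing else" baseline would look something like the sketch below (written against the 2.x API; the index path and field name are made up for the example):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class LevelGroundIndexer {
    public static void main(String[] args) throws Exception {
        // WhitespaceAnalyzer tokenizes on whitespace only: no stemming,
        // no stop words, no lowercasing -- the "do nothing special" baseline.
        IndexWriter writer = new IndexWriter("/tmp/baseline-index",
                new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("body", "some raw document text",
                Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.optimize(); // merge to one segment before timing searches
        writer.close();
    }
}

The point being: even that tiny choice of Analyzer changes the index, and every engine has a dozen such choices.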

I doubt it will ever happen, but some sort of open-source search-off would be pretty cool <g>. Then each camp could properly configure its search engine for each task.

- Mark

Mike Klaas wrote:
There is a good chance that they were using stock indexing defaults, based on this line about Lucene:

"In the present work, the simple applications bundled with the library were used to index the collection."
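
If so, a quick way to see what "stock defaults" actually means for indexing is to ask the writer directly. A sketch against the 1.9/2.x API (the index path is arbitrary):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class ShowDefaults {
    public static void main(String[] args) throws Exception {
        // The bundled demo uses StandardAnalyzer and leaves the writer untouched.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        System.out.println("mergeFactor     = " + writer.getMergeFactor());      // 10 by default
        System.out.println("maxBufferedDocs = " + writer.getMaxBufferedDocs());  // 10 by default
        System.out.println("maxMergeDocs    = " + writer.getMaxMergeDocs());
        writer.close();
    }
}

Those defaults are tuned for modest memory use, not for bulk-loading a TREC collection.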

On 7-Dec-07, at 10:27 AM, Grant Ingersoll wrote:

Yeah, I wasn't too excited over it and I certainly didn't lose any sleep over it, but there are some interesting things of note in there concerning Lucene, including the claim that it fell over when indexing the WT10g docs (page 40), and I am always looking for ways to improve things. Overall, I think Lucene held up pretty well in the evaluation, and I know how suspect _any_ evaluation is given the myriad ways of doing search. Still, when a well-respected researcher in the field says Lucene didn't do so hot in certain areas, I don't think we can dismiss the claims out of hand. So whether the tests are right or wrong, it is worth addressing either the failures in Lucene or the failures in the test, so that we make sure we are properly educating our users on how best to use Lucene.

I emailed the authors asking for information on how the test was run etc., so we'll see if anything comes of it.

On Dec 7, 2007, at 12:04 PM, robert engels wrote:

I wouldn't get too excited over this. Once again, it does not seem the evaluator understands the nature of GC-based systems, and the memory statistics are quite out of whack. But it is hard to tell, because there is no data on how memory consumption was actually measured.

A far better way of measuring memory consumption is to cap the process at different levels (max ram sizes), and compare the performance at each level.
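
Concretely, something like the harness below is what I have in mind: run the identical workload under several heap caps and compare throughput (the workload itself is a placeholder):

public class CapTest {
    public static void main(String[] args) {
        // Invoke once per cap, e.g.:
        //   java -Xmx64m  CapTest
        //   java -Xmx256m CapTest
        //   java -Xmx1g   CapTest
        long cap = Runtime.getRuntime().maxMemory();
        long start = System.currentTimeMillis();
        runWorkload();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("cap=" + (cap >> 20) + "MB elapsed=" + elapsed + "ms");
    }

    private static void runWorkload() {
        // Placeholder: index or query the same fixed collection in every run.
    }
}

That gives you a performance-vs-memory curve instead of a single, GC-confounded point estimate.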

There is also the fact that a process takes memory from the disk cache, and vice versa, which heavily affects search performance, etc.

Since there is no detailed data (that I could find) about the system configuration, etc., the results are highly suspect.

There is also no mention of performance on multi-processor systems. Some systems (like Lucene) pay a penalty to support multi-processing (both in Java and in Lucene itself), and only realize the corresponding benefit when operating in a multi-processor environment.

Based on the sheer speed of XMLSearch and Zettair, those seem like likely candidates whose designs are worth inspecting.

On Dec 7, 2007, at 7:03 AM, Grant Ingersoll wrote:

Was wondering if people have seen http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf

It has some interesting comparisons. Obviously, the comparison of Lucene indexing was done with 1.9, so it probably needs to be done again. Just wondering if people see any opportunities to improve Lucene from it. I am going to try to contact the authors to see if I can get what their setup values were (mergeFactor, Analyzer, etc.), as I think it would be interesting to run the tests again on 2.3.
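
For reference, these are the knobs I would want pinned down for a 2.3 rerun; the values below are examples only, not anything taken from the paper:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class Rerun23Setup {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        writer.setMergeFactor(10);        // example value; must be reported either way
        writer.setRAMBufferSizeMB(32.0);  // new in 2.3; flush by RAM used instead of doc count
        writer.setUseCompoundFile(true);  // affects both indexing speed and file handles
        writer.close();
    }
}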

-Grant


