[ 
http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436858 ] 
            
Michael McCandless commented on LUCENE-675:
-------------------------------------------

I think this is an incredibly important initiative: with every
non-trivial change to Lucene (e.g. lock-less commits) we must verify
that performance did not get worse.  But, as things stand now, it's an
ad-hoc thing that each developer has to do on their own.

So (as a consumer of this), I would love to have a ready-to-use
standard test that I could run to check if I've slowed things down
with lock-less commits.

In the meantime, I've been using the Europarl corpus for my testing.

It's also important to realize that there are many dimensions to test.
With lock-less commits I'm focusing entirely on "wall clock time to open
readers and writers" in different use cases: pure indexing, pure
searching, highly interactive mixed indexing/searching, etc.  This
is actually hard to test cleanly because in certain cases (the highly
interactive case, or the many-readers case) the current Lucene hits many
"commit lock" retries and/or timeouts, whereas lock-less doesn't.  So
what's a "fair" comparison in this case?

In addition to standardizing on the corpus, I think we ideally need a
standardized hardware / OS / software configuration as well, so the
numbers are easily comparable across time.  Even the test process
itself matters, e.g. details like "you should reboot the box before
each run" and "discard results from first run then take average of
next 3 runs as your result".  It would be wonderful if
we could get this into a nightly automated regression test so we could
track how performance has changed over time (and, for example,
quickly detect accidental regressions).  We should probably open that
as a separate issue that depends on this issue being completed first.
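
As a rough illustration of the "discard results from first run then
take average of next 3 runs" protocol, here is a minimal Java sketch.
The names (BenchmarkProtocol, timeOnce, meanAfterDiscard, measure) and
the placeholder workload are hypothetical and are not taken from the
attached LuceneBenchmark.java; a real run would time indexing or
searching against the agreed corpus instead.

```java
public class BenchmarkProtocol {

    // Wall-clock time of one execution of the task, in nanoseconds.
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Mean of the samples after dropping the first `discard` entries
    // (the warm-up runs whose results should not be counted).
    static double meanAfterDiscard(long[] samples, int discard) {
        if (discard < 0 || discard >= samples.length) {
            throw new IllegalArgumentException(
                "need at least one sample after discarding");
        }
        double sum = 0;
        for (int i = discard; i < samples.length; i++) {
            sum += samples[i];
        }
        return sum / (samples.length - discard);
    }

    // Run the task discard + keep times and average only the kept runs.
    static double measure(Runnable task, int discard, int keep) {
        long[] samples = new long[discard + keep];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = timeOnce(task);
        }
        return meanAfterDiscard(samples, discard);
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption for this sketch); replace
        // with the indexing or search scenario being benchmarked.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        double meanNanos = measure(placeholder, 1, 3);
        System.out.printf("mean over 3 measured runs: %.3f ms%n",
                          meanNanos / 1e6);
    }
}
```

Keeping the averaging step separate from the timing step means the same
rule can be applied to timings collected elsewhere, for example by a
nightly job. Rebooting the box between runs would have to be handled
outside the JVM.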


> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: LuceneBenchmark.java
>
>
> We need an objective way to measure the performance of Lucene, both indexing 
> and querying, on a known corpus. This issue is intended to collect comments 
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is 
> the original Reuters collection, available from 
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
> or 
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
>  I propose to use this corpus as a base for benchmarks. The benchmarking 
> suite could automatically retrieve it from known locations, and cache it 
> locally.
