[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12447781 ]

Grant Ingersoll commented on LUCENE-675:
----------------------------------------

The first run downloaded the documents from the Web before starting to index.
The second run started right away - the input docs were already in place - great.

It seems the only output is what is printed to stdout, right?


GSI: The Benchmarker interface does return the TimeData, so other
implementations could use the results programmatically.
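
For example, something along these lines - the method names below are my
guesses for illustration, not necessarily the actual signatures in the patch:

    // Hypothetical sketch -- Benchmarker and TimeData are from the patch,
    // but benchmark() and the options argument are assumed names.
    Benchmarker benchmarker = new StandardBenchmarker();
    TimeData[] results = benchmarker.benchmark(options); // assumed signature

    for (int i = 0; i < results.length; i++) {
      // e.g. feed the timings into a report instead of scraping stdout
      System.out.println(results[i]);
    }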



I very much like the logic of loading test data from the Web, and the scaleUp and
maximumDocumentsToIndex params are handy.

It seems that all the test logic and some of its data (queries) are Java-coded.
I initially had in mind a setup where we define parameterized tasks/jobs
(see the sketch after this list), like:

- createIndex(params)
- writeToIndex(params):
  - addDocs()
  - optimize()
- readFromIndex(params):
  - searchIndex()
  - fetchData()
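
In Java terms, that might look something like the following - the task
interface and its methods are made up for discussion, only TimeData is
from the patch:

    import java.util.Properties;

    // Illustrative sketch -- BenchmarkTask and its methods are invented here;
    // TimeData is the timing class from the patch.
    public interface BenchmarkTask {
      void setParams(Properties params); // e.g. docCount, queryFile
      TimeData run() throws Exception;   // each task times its own work
    }

    // A job would then be an ordered list of parameterized tasks, e.g.
    // createIndex -> addDocs -> optimize -> searchIndex -> fetchData.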


GSI: I definitely agree that we want a more flexible version to meet people's
benchmarking needs.  I wanted at least one test that is "standard" in that you
can't change the parameters and test cases, so that we can all be on the same
page about a run.  Then, when people are having discussions on performance they
can say "I ran the standard benchmark before and after and here are the
results" and we all know what they are talking about.  I think all the
components are there for a parameterized version; all it takes is someone to
extend the Standard one or implement their own that reads in a config file.  I
will try to put in a fully parameterized version soon.
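
That config-file variant could be as simple as the following hypothetical
sketch - only StandardBenchmarker and the scaleUp/maximumDocumentsToIndex
params come from the patch; the class and plumbing are assumed:

    import java.io.FileInputStream;
    import java.util.Properties;

    // Hypothetical sketch -- StandardBenchmarker is in the patch;
    // everything else here is assumed for illustration.
    public class ConfiguredBenchmarker extends StandardBenchmarker {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.load(new FileInputStream(args[0])); // e.g. benchmark.properties
        int scaleUp = Integer.parseInt(props.getProperty("scaleUp", "1"));
        int maxDocs = Integer.parseInt(
            props.getProperty("maximumDocumentsToIndex", "-1"));
        // ...hand scaleUp/maxDocs to the benchmark run instead of hard-coding them
      }
    }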


GSI: Thanks for the fixes; I will incorporate them into my version and post
another patch soon.

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: benchmark.patch, BenchmarkingIndexer.pm, 
> extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, timedata.zip
>
>
> We need an objective way to measure the performance of Lucene, both indexing 
> and querying, on a known corpus. This issue is intended to collect comments 
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is 
> the original Reuters collection, available from 
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
> or 
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
> I propose to use this corpus as a base for benchmarks. The benchmarking
> suite could automatically retrieve it from known locations, and cache it 
> locally.
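
A minimal sketch of that retrieve-and-cache step (illustrative only; the
variable names and file layout are not from any attached patch):

    // Download the corpus once; later runs reuse the local copy.
    // Sketch only -- assumes a surrounding method that throws IOException,
    // and that workDir (a local cache directory) and corpusUrl (one of the
    // known locations) are defined elsewhere.
    // (imports: java.io.*, java.net.URL)
    File cached = new File(workDir, "corpus.tar.gz");
    if (!cached.exists()) {
      InputStream in = new URL(corpusUrl).openStream();
      OutputStream out = new FileOutputStream(cached);
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) != -1; ) {
        out.write(buf, 0, n);
      }
      out.close();
      in.close();
    }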

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
