[ http://issues.apache.org/jira/browse/LUCENE-675?page=all ]
Doron Cohen updated LUCENE-675: ------------------------------- Attachment: benchmark.byTask.patch I am attaching benchmark.byTask.patch - to be applied in the contrib/benchmark directory. Root package of byTask classes was modified to org.apache.lucene.benchmark.byTask, in the lines of Grant's suggestion - seems better cause it keeps all benchmark classes under lucene.benchmark. I added one a sample .alg under conf and added some documentation. Entry point - documentation wise - is the package doc for org.apache.lucene.benchmark.byTask. Thanks for any comments on this! PS. Before submitting the patch file, I tried to apply it myself on a clean version of the code, just to make sure that it works. But I got errors like this -- Could not retrieve revision 0 of "...\byTask\.." -- for every file under a new folder. So I am not sure if it is just my (Windows) svn patch applying utility, or is it really impossible to apply a patch that creates files in (yet) nonexistent directories. I searched Lucene mailing lists and SVN mailing lists and went again through the SVN book again but nowhere could I find what is the expected behavior for applying a patch containing new directories. In fact, "svn diff" would not even show you files that are new (again, this is the Windows svn 1.4.2 version). (I used Tortoise SVN to create the patch). This is rather annoying and I might be misunderstanding something basic about SVN, but I thought it'd be better to share this experience here - might save some time for others trying to apply this patch or other patches ... > Lucene benchmark: objective performance test for Lucene > ------------------------------------------------------- > > Key: LUCENE-675 > URL: http://issues.apache.org/jira/browse/LUCENE-675 > Project: Lucene - Java > Issue Type: Improvement > Reporter: Andrzej Bialecki > Assigned To: Grant Ingersoll > Attachments: benchmark.byTask.patch, benchmark.patch, > BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, > LuceneIndexer.java, taskBenchmark.zip, timedata.zip, tiny.alg, tiny.properties > > > We need an objective way to measure the performance of Lucene, both indexing > and querying, on a known corpus. This issue is intended to collect comments > and patches implementing a suite of such benchmarking tests. > Regarding the corpus: one of the widely used and freely available corpora is > the original Reuters collection, available from > http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz > or > http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. > I propose to use this corpus as a base for benchmarks. The benchmarking > suite could automatically retrieve it from known locations, and cache it > locally. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]