[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12449117 ]
Doron Cohen commented on LUCENE-675:
------------------------------------
I looked at extending the benchmark with:
- different test "scenarios", i.e. other sequences of operations.
- multithreaded tests, e.g. several queries in parallel.
- rate of events, e.g. "2 queries arriving per second", or "one query per
second in parallel with 20 new documents per minute".
- different data sources (input documents, queries).
For this I made lots of changes to the benchmark code, using parts of it and
rewriting other parts.
I would like to submit this code in a few days - it is running already but some
functionality is missing.
I would like to describe how it works to hopefully get early feedback.
There are several "basic tasks" defined - all extending an (abstract) class
PerfTask:
- AddDocTask
- OptimizeTask
- CreateIndexTask
etc.
To further extend the benchmark 'framework', new tasks can be added. Each task
must implement the abstract method: doLogic(). For instance, in AddDocTask this
method (doLogic) would call indexWriter.addDocument().
There are also setup() and tearDown() methods for performing work that should
not be timed for that task.
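To make the task lifecycle above concrete, here is a minimal sketch. Only the names PerfTask, AddDocTask, doLogic(), setup() and tearDown() come from the description; runAndMeasure(), getElapsedNanos() and the List used as a stand-in for the index are assumptions made for illustration, not the code I intend to submit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the task lifecycle: only doLogic() is timed.
abstract class PerfTask {
    private long elapsedNanos;

    /** Untimed preparation; does nothing by default. */
    protected void setup() {}

    /** The timed work of this task; returns the number of items processed. */
    protected abstract int doLogic();

    /** Untimed cleanup; does nothing by default. */
    protected void tearDown() {}

    /** Assumed helper: runs setup(), times doLogic() only, then runs tearDown(). */
    public final int runAndMeasure() {
        setup();
        int count;
        long start = System.nanoTime();
        try {
            count = doLogic();
        } finally {
            elapsedNanos = System.nanoTime() - start;
            tearDown();
        }
        return count;
    }

    public long getElapsedNanos() { return elapsedNanos; }
}

// A real AddDocTask would call indexWriter.addDocument(); a plain List
// stands in for the index here so the sketch has no Lucene dependency.
class AddDocTask extends PerfTask {
    private final List<String> index;
    private String doc;

    AddDocTask(List<String> index) { this.index = index; }

    @Override protected void setup() { doc = "doc-" + index.size(); } // untimed
    @Override protected int doLogic() { index.add(doc); return 1; }   // timed
    @Override protected void tearDown() { doc = null; }               // untimed
}

class PerfTaskDemo {
    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        AddDocTask task = new AddDocTask(index);
        int added = task.runAndMeasure();
        System.out.println(added + " doc(s) added in " + task.getElapsedNanos() + " ns");
    }
}
```

Building the document in setup() keeps its cost out of the measured time, which is the point of separating the three methods.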
A special TaskSequence task contains other tasks. It is either parallel or
sequential, which determines whether it executes its child tasks in parallel
or serially.
TaskSequence also supports a "rate": the pace at which its child tasks are
"fired" can be controlled.
With these tasks, it is possible to describe a performance test 'algorithm' in
a simple syntax.
('algorithm' may be too big a word for this...?)
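As a rough illustration of the TaskSequence behavior described above (serial or parallel children, with an optional rate), here is a sketch. It assumes children are plain Runnables and that the rate is given per minute; the constructor, add() and the pacing logic are hypothetical and only one possible way to do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: children are "fired" at a fixed pace, either one
// after another (sequential) or each in its own thread (parallel).
class TaskSequence implements Runnable {
    private final List<Runnable> children = new ArrayList<>();
    private final boolean parallel;
    private final int ratePerMinute; // 0 means "no pacing, fire immediately"

    TaskSequence(boolean parallel, int ratePerMinute) {
        this.parallel = parallel;
        this.ratePerMinute = ratePerMinute;
    }

    void add(Runnable child) { children.add(child); }

    @Override
    public void run() {
        long intervalNanos = ratePerMinute > 0 ? 60_000_000_000L / ratePerMinute : 0L;
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < children.size(); i++) {
                // Wait until the i-th firing time, measured from the start.
                long waitNanos = start + i * intervalNanos - System.nanoTime();
                if (waitNanos > 0) TimeUnit.NANOSECONDS.sleep(waitNanos);
                Runnable child = children.get(i);
                if (parallel) {
                    Thread t = new Thread(child);
                    threads.add(t);
                    t.start();
                } else {
                    child.run();
                }
            }
            for (Thread t : threads) t.join(); // wait for parallel children
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        TaskSequence seq = new TaskSequence(true, 120); // 2 firings per second
        for (int i = 0; i < 4; i++) {
            int id = i;
            seq.add(() -> System.out.println("child " + id + " fired"));
        }
        seq.run();
    }
}
```

In the sequential case a slow child delays the ones after it, so the rate is only an upper bound there; in the parallel case each child is fired on schedule regardless of how long earlier children take.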
A test invocation takes two parameters:
- test.properties - file with various config properties.
- test.alg - file with the algorithm.
By convention, for each task class "OpNameTask", the command "OpName" is
valid in test.alg.
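One way such a naming convention could be resolved is by reflection: append "Task" to the command and look up the class. The sketch below assumes exactly that; TaskFactory, the Task interface and the nested stand-in classes are hypothetical names used only to keep the example self-contained.

```java
// Hypothetical sketch: map a command such as "AddDoc" to a class named
// "AddDocTask" and instantiate it reflectively.
final class TaskFactory {
    interface Task { String describe(); }

    // Stand-in task classes, nested here so the lookup works in any package.
    static class AddDocTask implements Task {
        public String describe() { return "adds one document"; }
    }
    static class SearchTask implements Task {
        public String describe() { return "runs one query"; }
    }

    static Task create(String command) {
        // Nested classes have binary names of the form Outer$Inner.
        String className = TaskFactory.class.getName() + "$" + command + "Task";
        try {
            return (Task) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Unknown command: " + command, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("AddDoc -> " + create("AddDoc").describe());
        System.out.println("Search -> " + create("Search").describe());
    }
}
```

With this approach, adding a new task class is enough to make its command available, with no registry to update.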
Adding a single document is done by:
AddDoc
Adding 3 documents:
AddDoc
AddDoc
AddDoc
Or, alternatively:
{ AddDoc } : 3
So, '{' and '}' indicate a serial sequence of (child) tasks.
To fire 100 queries in a row:
{ Search } : 100
To fire 100 queries in parallel:
[ Search ] : 100
So, '[' and ']' indicate a parallel group of tasks.
To fire 100 queries in a row at 2 queries per second (the rate is given per
minute, hence 120):
{ Search } : 100 : 120
Similar, but in parallel:
[ Search ] : 100 : 120
A sequence task can be named for identifying it in reports:
{ "QueriesA" Search } : 100 : 120
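To show how little is needed to read lines like the ones above, here is a sketch of a parser for a single bracketed line. It is deliberately simplified: it assumes one child command per sequence and no nesting, and the class AlgLine with its field names is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses lines such as
//   { "QueriesA" Search } : 100 : 120
// into bracket kind, optional name, command, repeat count and rate.
final class AlgLine {
    final boolean parallel;   // '[' ... ']' is parallel, '{' ... '}' is serial
    final String name;        // optional quoted name, or null
    final String command;     // e.g. "Search"
    final int repeat;         // defaults to 1 when omitted
    final int ratePerMinute;  // defaults to 0 (no pacing) when omitted

    private static final Pattern LINE = Pattern.compile(
        "\\s*([{\\[])\\s*(?:\"([^\"]*)\"\\s+)?(\\w+)\\s*([}\\]])\\s*"
        + "(?::\\s*(\\d+)\\s*)?(?::\\s*(\\d+)\\s*)?");

    private AlgLine(boolean parallel, String name, String command, int repeat, int rate) {
        this.parallel = parallel;
        this.name = name;
        this.command = command;
        this.repeat = repeat;
        this.ratePerMinute = rate;
    }

    static AlgLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse: " + line);
        }
        boolean opensParallel = m.group(1).equals("[");
        boolean closesParallel = m.group(4).equals("]");
        if (opensParallel != closesParallel) {
            throw new IllegalArgumentException("Mismatched brackets: " + line);
        }
        int repeat = m.group(5) == null ? 1 : Integer.parseInt(m.group(5));
        int rate = m.group(6) == null ? 0 : Integer.parseInt(m.group(6));
        return new AlgLine(opensParallel, m.group(2), m.group(3), repeat, rate);
    }

    public static void main(String[] args) {
        AlgLine l = parse("{ \"QueriesA\" Search } : 100 : 120");
        System.out.println(l.name + ": " + l.command + " x" + l.repeat
            + (l.parallel ? " in parallel" : " in a row")
            + " at " + l.ratePerMinute + "/min");
    }
}
```

A full parser would also need to handle bare commands, several children per sequence and nested sequences, which this sketch leaves out.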
And there are tasks that create reports.
There are more tasks, and more to tell about the alg syntax, but this post is
already long.
I find this quite powerful for perf testing.
What do you (and you) think?
- Doron
> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
> Key: LUCENE-675
> URL: http://issues.apache.org/jira/browse/LUCENE-675
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Andrzej Bialecki
> Assigned To: Grant Ingersoll
> Attachments: benchmark.patch, BenchmarkingIndexer.pm,
> extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, timedata.zip
>
>
> We need an objective way to measure the performance of Lucene, both indexing
> and querying, on a known corpus. This issue is intended to collect comments
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is
> the original Reuters collection, available from
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
> I propose to use this corpus as a base for benchmarks. The benchmarking
> suite could automatically retrieve it from known locations, and cache it
> locally.