[jira] Commented: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance

Jason Rutherglen (JIRA) Sat, 28 Nov 2009 15:30:46 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783322#action_12783322
 ]


Jason Rutherglen commented on LUCENE-2061:
------------------------------------------

Mike, I tried running nrtBench.py and generated a 2 million doc
index, as I didn't want to wait for the 5 million doc one to finish.

Can you post the queries file you've used? (nrtBench.py was
looking for it.) I'd like to keep things as consistent as
possible between runs.

I haven't seen the same results with regard to the OS managing
small files, and I suspect that users in general will choose a
variety of parameters (e.g. 1 max buffered doc) that make
writing to disk inherently slow. Logically the OS should work as
a write cache; in practice, however, a variety of users seem to
have reported otherwise. Maybe 100 docs works, but that
feels like a fairly narrow guideline for users of NRT.

The latest LUCENE-1313 is a step in a direction that doesn't
change IW internals too much.

> Create benchmark & approach for testing Lucene's near real-time performance
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-2061
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2061
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-2061.patch, LUCENE-2061.patch, LUCENE-2061.patch
>
>
> With the improvements to contrib/benchmark in LUCENE-2050, it's now
> possible to create compelling algs to test indexing & searching
> throughput against a periodically reopened near-real-time reader from
> the IndexWriter.
> Coming out of the discussions in LUCENE-1526, I think to properly
> characterize NRT, we should measure net search throughput as a
> function of both reopen rate (ie how often you get a new NRT reader
> from the writer) and indexing rate.  We should also separately measure
> pure adds vs updates (deletes + adds); the latter is much more work
> for Lucene.
> This can help apps make capacity decisions... and can help us test
> performance of pending improvements for NRT (eg LUCENE-1313,
> LUCENE-2047).
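
The measurement described in the issue (net search throughput as a
function of reopen rate and indexing rate, with pure adds and updates
measured separately) could be organized as a simple parameter sweep.
The following is only a rough sketch, not the actual nrtBench.py:
run_trial is a hypothetical callback assumed to drive one benchmark
run and return queries/sec, and the names sweep and format_table are
made up for illustration.

```python
from itertools import product


def sweep(run_trial, reopen_rates, indexing_rates, modes=("add", "update")):
    """Run one trial per (mode, reopen rate, indexing rate) combination.

    run_trial is a hypothetical callback (an assumption, not part of
    Lucene or nrtBench.py) that performs one benchmark run and returns
    the measured net search throughput in queries/sec.
    """
    results = {}
    for mode, reopen, index in product(modes, reopen_rates, indexing_rates):
        results[(mode, reopen, index)] = run_trial(
            mode=mode, reopens_per_sec=reopen, docs_per_sec=index
        )
    return results


def format_table(results, mode, reopen_rates, indexing_rates):
    """Render one mode's results as a plain-text grid.

    Rows are reopen rates, columns are indexing rates (docs/sec).
    """
    header = "reopen/s".ljust(10) + "".join(f"{r:>12}" for r in indexing_rates)
    lines = [header]
    for reopen in reopen_rates:
        cells = "".join(
            f"{results[(mode, reopen, idx)]:>12.1f}" for idx in indexing_rates
        )
        lines.append(f"{reopen:<10}" + cells)
    return "\n".join(lines)
```

Keeping adds and updates in separate grids follows the issue's point
that updates (deletes + adds) are much more work for Lucene, so mixing
the two modes in one table would blur the comparison.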

