Hi,
Or Lucene is more like Google in this sense, meaning that the time
doesn't depend on the size of the matched result

i found that it takes long time if the result set is bigger(upto 25 sec for
29 M results). But for smaller resultset of size approx 10,000 it takes
approx. 200 ms.





On 6/27/06, Vladimir Olenin <[EMAIL PROTECTED]> wrote:

Thanks, Mike. This info is actually quite helpful. What is 'times 10
rule' you are refering to?

Also, I wonder how Lucene is handling the growth of the result set
returned by the query? In the various search engine implementations I
did myself for several projects that was one of the things which made
reponse time to grow with the size of the result set. Eg, does this
happen with Lucene or not?:

- query 1, returns 1000 ranked results, exec. time is 0.5s
- query 2, returns 10000 ranked results, exec. time is 0.7s
- query 3, returns 100000 ranked results, exec. time is 1.0s
- query 4, returns 1000000 ranked results, exec. time is 3.0s
- query 5, returns 10000000 ranked results, exec. time is 10.0s

By 'ranked results' I mean you can retrieve 'top X' 'best matched'
documents.

Or Lucene is more like Google in this sense, meaning that the time
doesn't depend on the size of the matched result set and the
implementation can statistically (or somehow else) deduce approximate
size of the full result set, while not actually counting every single
document in the set (eg, 'search query returned _approximately_ 54
million documents').

Yet another question would be what is the best book (if there are more
than one), that can be recommended as an introduction as well as
'in-depth' coverage of the latest version of Lucene?

Thanks everyone for answering this post - your feedback is very helpful!

Vlad

-----Original Message-----
From: Mike Streeton [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 27, 2006 2:59 AM
To: java-user@lucene.apache.org
Subject: RE: search performance benchmarks

We recently ran some benchmarks on Linux with 4 xeon cpus and 2gb of
heap (not that this was needed). We managed to easily get 1000 term
based queries a second, this including the query execution time and
retrieving the top 10 documents from the index. We did notice some
contention as adding more clients (threads) kept the same average
execution time but increased the max processing time for some queries.
So the addition of clients caused a queue to build up, but the results
were still sub second with 100 clients, simultaneously executing queries
and using the times 10 rule, this would represent 1000 connected users.

Mike

www.ardentia.com the home of NetSearch
-----Original Message-----
From: Wang, Jeff [mailto:[EMAIL PROTECTED]
Sent: 26 June 2006 19:50
To: java-user@lucene.apache.org
Subject: RE: search performance benchmarks

Performance varies a lot, and depends upon the number of indexes, the
number of fields, and the CPU/memory configuration.  For myself, a 65Gb
source indexed to 1Gb (or so) returns single term queries (oh yeah, the
query makeup also matters a lot) in sub seconds on a Intel dual
processor (each is 3.6Ghz I think.)  I frankly haven't tested out
scalability yet.

Jeff
Emptoris, Inc.

-----Original Message-----
From: Vladimir Olenin [mailto:[EMAIL PROTECTED]
Sent: Monday, June 26, 2006 7:56 AM
To: java-user@lucene.apache.org
Subject: search performance benchmarks

Hi,

I'm evaluating Lucene right now to use as a base for one open source
project. I found some _indexing_ benchmarks on the lucene website
(http://lucene.apache.org/java/docs/benchmarks.html), but, after a short
browsing, couldn't find any 'runtime' performance benchmarks (Query
speed). Only one of the benchmarks contained some reference to the query
execution... Is there any other source of benchmarks I can refer to? Or
probably some heruistic rule that can help to estimate query execution
time?

Thanks.

Vlad

PS: let me know if details of the searched data will help in evaluation
- I'll be able to provide what I know at this point...

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to