Hi,
Or Lucene is more like Google in this sense, meaning that the time
doesn't depend on the size of the matched result
I found that it takes a long time if the result set is bigger (up to 25 sec for 29 M results). But for a smaller result set of approx. 10,000 results it takes approx. 200 ms.

On 6/27/06, Vladimir Olenin <[EMAIL PROTECTED]> wrote:
Thanks, Mike. This info is actually quite helpful. What is the 'times 10 rule' you are referring to?

Also, I wonder how Lucene handles the growth of the result set returned by the query. In the various search engine implementations I did myself for several projects, that was one of the things that made response time grow with the size of the result set. E.g., does this happen with Lucene or not?

- query 1, returns 1000 ranked results, exec. time is 0.5s
- query 2, returns 10000 ranked results, exec. time is 0.7s
- query 3, returns 100000 ranked results, exec. time is 1.0s
- query 4, returns 1000000 ranked results, exec. time is 3.0s
- query 5, returns 10000000 ranked results, exec. time is 10.0s

By 'ranked results' I mean you can retrieve the 'top X' 'best matched' documents. Or Lucene is more like Google in this sense, meaning that the time doesn't depend on the size of the matched result set and the implementation can statistically (or somehow else) deduce the approximate size of the full result set, while not actually counting every single document in the set (e.g., 'search query returned _approximately_ 54 million documents').

Yet another question: what is the best book (if there is more than one) that can be recommended as an introduction as well as 'in-depth' coverage of the latest version of Lucene?

Thanks everyone for answering this post - your feedback is very helpful!

Vlad

-----Original Message-----
From: Mike Streeton [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 27, 2006 2:59 AM
To: java-user@lucene.apache.org
Subject: RE: search performance benchmarks

We recently ran some benchmarks on Linux with 4 Xeon CPUs and 2 GB of heap (not that this was needed). We managed to easily get 1000 term-based queries a second, this including the query execution time and retrieving the top 10 documents from the index. We did notice some contention, as adding more clients (threads) kept the same average execution time but increased the max processing time for some queries.
So the addition of clients caused a queue to build up, but the results were still sub-second with 100 clients simultaneously executing queries; using the times 10 rule, this would represent 1000 connected users.

Mike

www.ardentia.com the home of NetSearch

-----Original Message-----
From: Wang, Jeff [mailto:[EMAIL PROTECTED]
Sent: 26 June 2006 19:50
To: java-user@lucene.apache.org
Subject: RE: search performance benchmarks

Performance varies a lot, and depends upon the number of indexes, the number of fields, and the CPU/memory configuration. For myself, a 65 GB source indexed to 1 GB (or so) returns single-term queries (oh yeah, the query makeup also matters a lot) in under a second on an Intel dual processor (each is 3.6 GHz, I think). I frankly haven't tested out scalability yet.

Jeff
Emptoris, Inc.

-----Original Message-----
From: Vladimir Olenin [mailto:[EMAIL PROTECTED]
Sent: Monday, June 26, 2006 7:56 AM
To: java-user@lucene.apache.org
Subject: search performance benchmarks

Hi,

I'm evaluating Lucene right now to use as a base for one open source project. I found some _indexing_ benchmarks on the Lucene website (http://lucene.apache.org/java/docs/benchmarks.html), but, after a short browsing, couldn't find any 'runtime' performance benchmarks (query speed). Only one of the benchmarks contained some reference to the query execution... Is there any other source of benchmarks I can refer to? Or perhaps some heuristic rule that can help to estimate query execution time?

Thanks.

Vlad

PS: let me know if details of the searched data will help in evaluation - I'll be able to provide what I know at this point...
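
On the question raised in this thread of whether query time grows with the number of matches: the timings reported here (about 200 ms for roughly 10,000 matches versus up to 25 sec for 29 M) are what one would expect if every matching document is scored while only the best few are kept. The Python sketch below illustrates that general top-k pattern with a bounded min-heap. It is only an illustration under that assumption, not Lucene's code; `top_k` and the `(doc_id, score)` input format are hypothetical names chosen for this example.

```python
import heapq


def top_k(scored_matches, k):
    """Scan every match once; keep only the k best in a bounded min-heap.

    scored_matches: iterable of (doc_id, score) pairs (hypothetical format).
    Returns (total_matches, hits), where hits is a list of (score, doc_id)
    pairs sorted best first.
    """
    heap = []   # min-heap of at most k (score, doc_id) pairs
    total = 0   # exact count of matches seen
    for doc_id, score in scored_matches:
        total += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # New hit beats the weakest hit currently kept: swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return total, sorted(heap, reverse=True)


if __name__ == "__main__":
    # Synthetic scores, for illustration only.
    matches = [(doc_id, (doc_id * 37) % 101) for doc_id in range(1000)]
    total, hits = top_k(matches, 10)
    print(total, hits[:3])
```

In this pattern the work per query is proportional to the number of matches (one heap check each), while memory stays bounded by k, so asking for only the top 10 does not by itself make a query with millions of matches cheap.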
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]