Look at the Benchmarks page on Lucene's site. It is not complete (heh, it can never be complete), but it will give you some ideas about Lucene's performance. Feel free to submit your benchmarks, using this template:
http://jakarta.apache.org/lucene/docs/benchmarktemplate.xml

Thank you,
Otis

--- Alex Aw Seat Kiong <[EMAIL PROTECTED]> wrote:
>
> Hi Doug Cutting!
>
> That's really very helpful, thanks to Doug.
> I'm doing performance research on the indexing and searching speed of Lucene.
> So, could you give me more details on:
>
> 1. Searching
> > But if you need to search two million 2kB documents on a 500MHz Pentium with 128MB of RAM in a couple of seconds per query, you're probably okay.
> What were the other hardware specs?
> - SCSI or IDE hard disk? If it's a SCSI hard disk, what are the hard disk and SCSI card models, and its speed (XXXX RPM)?
> - Which OS was used for this performance testing?
> - Which application server was used for this performance testing?
>
> 2. Indexing (assume the hardware and software specs are the same as for the searching server)
> Index space should generally be less than the original document size, right?
> Assume 500MB of disk space for the application:
> Max index size: should be more than 250,000 documents of 2 KB each, right?
> Max indexing speed: ??? documents of 2 KB per hour
>
> Could you share the performance tests that were done with us?
>
> Thank you.
>
> Regards,
> AlexAw
>
> ----- Original Message -----
> From: "Maurice Coyle" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Tuesday, October 28, 2003 6:50 PM
> Subject: Re: large index query time
>
> > that's very helpful, thanks to all who replied.
> >
> > my index is definitely larger than my RAM so i guess the increase in query time is due to an increase in time to open the index/perform a search.
> >
> > thanks again,
> > maurice
> >
> > ----- Original Message -----
> > From: "Tate Avery" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > Sent: Friday, October 24, 2003 5:33 PM
> > Subject: RE: large index query time
> >
> > Below are some posts from Doug (circa 2001) that I found very helpful for understanding Lucene scalability. I am assuming that they are still generally applicable. You might also find them useful.
> >
> > Tate
> >
> > -----------------------------------------------------------
> >
> > Performance for large indices is frequently governed by i/o performance. If an index is larger than RAM, then searches will need to read data from disk. This can quickly become a bottleneck. A search for a term that occurs in a million documents can require over 1MB of data, which can take some time to read. With multiple searching threads, the disk can easily become a bottleneck. Disk arrays can alleviate this; more RAM helps even more!
> >
> > For some folks, queries that take over a second are unacceptable; for others, ten seconds is okay.
> >
> > Performance should be more or less linear: a two-million-document index will be almost twice as slow to search as a one-million-document index. There are lots of factors, including document size, CPU speed, RAM size, and the i/o subsystem, but a rough rule of thumb for Lucene performance might be that, in a "typical" configuration, it can search a million documents per second.
> >
> > So if you need to search 20 million 100kB documents on a 100MHz 386 with 8MB of RAM with sub-second response time, Lucene will probably fail. But if you need to search two million 2kB documents on a 500MHz Pentium with 128MB of RAM in a couple of seconds per query, you're probably okay.
> >
> > - Doug Cutting (10/08/2001)
> >
> > Some more precise statements: the cost to search for a term is proportional to the number of documents that contain that term. The cost to search for a phrase is proportional to the sum of the numbers of occurrences of its constituent terms. The cost to execute a boolean query is the sum of the costs of its sub-queries. Longer documents contain more terms: usually both more unique terms and more occurrences.
> >
> > Total vocabulary size is not a big factor in search performance. When you open an index, Lucene reads one out of every 128 unique terms into a table, so an index with a large number of unique terms will be slower to open. Searching that table for query terms is also slower for bigger indexes, but the time to search that table is not significant in overall performance. Lucene also reads, at index open, one byte per document per indexed field (the normalization factor), so an index with lots of documents and fields will also be slower to open. But once opened, the cost of searching is largely dependent on the frequency characteristics of the query terms. And since IndexReaders and Searchers are thread-safe, you don't need to open indexes very often.
> >
> > - Doug Cutting (10/08/2001)
> >
> > -----Original Message-----
> > From: Dan Quaroni [mailto:[EMAIL PROTECTED]
> > Sent: October 24, 2003 1:33 PM
> > To: 'Lucene Users List'
> > Subject: RE: large index query time
> >
> > My experience is that the query time (and memory usage) can be greatly affected by boolean queries that retrieve lots of results.
> >
> > Are you finding it slow when doing a simple query that should return only a handful of results, or only on more complex queries?
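Doug's cost rules quoted above, together with Dan's point that broad boolean queries are expensive, can be made concrete with a toy model. This is a hypothetical sketch, not Lucene code: `QueryCostModel`, `TermCost`, `PhraseCost`, and `BooleanCost` are invented names, the counts in `main` are made-up example numbers, and it assumes Java 16 or later (for records).

```java
import java.util.List;

// Toy model of the query-cost rules in Doug Cutting's 2001 post.
// Hypothetical names; this is not part of the Lucene API.
public class QueryCostModel {

    // Anything whose relative search cost can be estimated.
    interface CostedQuery {
        long cost();
    }

    // Term: cost is proportional to the number of documents containing the term.
    record TermCost(String term, long docFreq) implements CostedQuery {
        public long cost() {
            return docFreq;
        }
    }

    // Phrase: cost is proportional to the summed occurrence counts of its terms.
    record PhraseCost(List<Long> termOccurrences) implements CostedQuery {
        public long cost() {
            long total = 0;
            for (long n : termOccurrences) {
                total += n;
            }
            return total;
        }
    }

    // Boolean: cost is the sum of the costs of its sub-queries.
    record BooleanCost(List<CostedQuery> clauses) implements CostedQuery {
        public long cost() {
            long total = 0;
            for (CostedQuery clause : clauses) {
                total += clause.cost();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        CostedQuery rare = new TermCost("lucene", 1_000);
        CostedQuery common = new TermCost("the", 1_000_000);
        CostedQuery phrase = new PhraseCost(List.of(5_000L, 20_000L));
        CostedQuery query = new BooleanCost(List.of(rare, common, phrase));
        System.out.println(query.cost()); // 1026000
    }
}
```

In this example the single common term dominates the total, which mirrors Dan's experience: because costs add up, a boolean query can never be cheaper than its most frequent clause.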
> >
> > -----Original Message-----
> > From: Maurice Coyle [mailto:[EMAIL PROTECTED]
> > Sent: Friday, October 24, 2003 1:29 PM
> > To: Lucene Users List
> > Subject: large index query time
> >
> > hi,
> > i recently merged a whole lot of indexes into one big index for testing purposes. however, now the programs i use to search the index are taking much longer. this may be a stupid question (or a very simple one), and please tell me if it is, but should this be the case? i mean, i realise it'll take longer to search over a larger collection, but it's taking an order of magnitude longer. this is the reason i'm asking, since if lucene is capable of handling large-scale search apps, presumably it's set up to search large collections rapidly.
> >
> > maybe there are some steps i can take to speed things up (i optimised the big index when it was finished being created), or something i'm missing? if there is any information i can give that will help diagnose this problem, please specify it.
> >
> > thanks,
> > maurice
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
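The figures in Doug's 2001 posts lend themselves to back-of-envelope estimates for questions like Maurice's and Alex's. The sketch below is hypothetical and not a Lucene API: `LuceneEstimates` and its methods are invented names, it assumes the 128-term interval and one-byte-per-field norms described in those posts still apply to the Lucene version in use, and it treats the "million documents per second" rule of thumb as a fixed rate, which it is not. The document count, field count, and vocabulary size in `main` are made-up example values.

```java
// Back-of-envelope estimates based on the figures in Doug Cutting's 2001 posts.
// Hypothetical helper class, not part of Lucene.
public class LuceneEstimates {

    // Per the post: one out of every 128 unique terms is read into a table at open.
    static final long TERM_INDEX_INTERVAL = 128;

    // Per the post's rule of thumb: roughly a million documents searched per second
    // in a "typical" configuration. Real throughput varies widely with hardware.
    static final double DOCS_PER_SECOND = 1_000_000.0;

    // Entries loaded into the term table when the index is opened (rounded up).
    static long termTableEntries(long uniqueTerms) {
        return (uniqueTerms + TERM_INDEX_INTERVAL - 1) / TERM_INDEX_INTERVAL;
    }

    // Bytes of normalization factors read at open: one byte per document per indexed field.
    static long normBytes(long numDocs, long indexedFields) {
        return numDocs * indexedFields;
    }

    // Rough per-query search time under the linear rule of thumb.
    static double roughSearchSeconds(long numDocs) {
        return numDocs / DOCS_PER_SECOND;
    }

    public static void main(String[] args) {
        long docs = 2_000_000;         // example: two million documents
        long fields = 5;               // example: five indexed fields
        long uniqueTerms = 10_000_000; // example vocabulary size

        System.out.println(termTableEntries(uniqueTerms)); // 78125
        System.out.println(normBytes(docs, fields));       // 10000000
        System.out.println(roughSearchSeconds(docs));      // 2.0
    }
}
```

For two million documents the linear rule gives about two seconds per query, in line with Doug's "couple of seconds per query" example, while the norms alone mean about 10 MB read at open time in this example. That open-time cost is one reason Doug suggests opening the index once and sharing the thread-safe searcher rather than reopening it per query.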