Re: problems with lucene in multithreaded environment
--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Jayant Kumar wrote:
> > Thanks for the patch. It helped in increasing the search speed to a
> > good extent.
>
> Good. I'll commit it. Thanks for testing it.
>
> > But when we tried to give about 100 queries in 10 seconds, then again
> > we found that after about 15 seconds, the response time per query
> > increased.
>
> This still sounds very slow to me. Is your index optimized? What JVM
> are you using?

Yes, our index is optimized and we are using Java version 1.4.2. What would be the difference between a search on an optimized index and one on an unoptimized index?

> You might also consider ramping up your benchmark more slowly, to warm
> the filesystem's cache. So, when you first launch the server, give it a
> few queries at a lower rate, then, after those have completed, try a
> higher rate.

We have tried this out, and since we have lots of RAM, once the indexes go into the OS cache, results come out fast.

> > We were able to simplify the searches further by consolidating the
> > fields in the index, but that resulted in increasing the index size
> > to 2.5 GB, as we required fields 2-5 and fields 1-7 in different
> > searches.
>
> That will slow updates a bit, but searching should be faster.
>
> How about your range searches? Do you know how many terms they match?
> The easiest way to determine this might be to insert a print statement
> in RangeQuery.rewrite() that shows the query before it is returned.
>
> > Our indexes are on the local disk, therefore there is no network i/o
> > involved.
>
> It does look like file i/o is now your bottleneck. The traces below
> show that you're using the compound file format, which combines many
> files into one. When two threads try to read two logically different
> files (.prx and .frq below) they must synchronize when the compound
> format is used. But if your application did not use the compound
> format, this synchronization would not be required. So you should try
> rebuilding your index with the compound format turned off. (The
> fastest way to do this is simply to add and/or delete a single
> document, then re-optimize the index with compound format turned off.
> This will cause the index to be re-written in non-compound format.)

We have changed the IndexWriter to write the index in non-compound format and have noticed a little difference in the response time.

> Is this on linux? If so, please try running 'iostat -x 1' while you
> perform your benchmark (iostat is installed by the 'sysstat' package).
> What percentage is the disk utilized (%util)? What is the percentage
> of idle CPU (%idle)? What is the rate of data that is read (rkB/s)?
> If things really are i/o bound then you might consider spreading the
> data over multiple disks, e.g., with lvm striping or a RAID controller.
>
> If you have a lot of RAM, then you could also consider moving certain
> files of the index onto a ramfs-based drive. For example, moving the
> .tis, .frq and .prx can greatly improve performance. Also, having
> these files in RAM means that the cache does not need to be warmed.
>
> Hope this helps!

Thanks for the help. We will continue interacting with you as and when we face problems. We would also like you to know that Lucene is an excellent search engine compared to any other.

Jayant

Yahoo! India Matrimony: Find your partner online.
http://yahoo.shaadi.com/india-matrimony/

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
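Doug's point above, that two threads reading logically different files inside one compound file must still synchronize on the single underlying stream, can be sketched with a toy model. The class names here are illustrative stand-ins, not Lucene's actual code: one shared, synchronized stream backs several "logical files", so readers of .frq and .prx contend for the same monitor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of the compound-file bottleneck (hypothetical classes,
// not Lucene's real API): every logical file shares one underlying
// stream, so all reads funnel through a single lock.
public class CompoundDemo {
    // Stands in for the single FSInputStream backing a .cfs file.
    static class SharedStream {
        private final byte[] data;
        SharedStream(byte[] data) { this.data = data; }
        // Seek+read must be atomic, hence synchronized: this is the
        // monitor the blocked threads in the stack traces wait on.
        synchronized byte read(long pos) { return data[(int) pos]; }
    }

    // A logical file (.frq, .prx, ...) is just an offset into the
    // shared stream.
    static class LogicalFile {
        private final SharedStream shared;
        private final long offset;
        LogicalFile(SharedStream s, long off) { shared = s; offset = off; }
        byte read(long pos) { return shared.read(offset + pos); }
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = new byte[] {1, 2, 3, 4, 5, 6};
        SharedStream cfs = new SharedStream(raw);
        final LogicalFile frq = new LogicalFile(cfs, 0); // bytes 0-2
        final LogicalFile prx = new LogicalFile(cfs, 3); // bytes 3-5

        // Two threads reading two *different* logical files still
        // contend for the same monitor inside SharedStream.read().
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Integer> a = pool.submit(() -> (int) frq.read(1)); // -> 2
        Future<Integer> b = pool.submit(() -> (int) prx.read(1)); // -> 5
        System.out.println(a.get() + " " + b.get());
        pool.shutdown();
    }
}
```

With separate non-compound files, each logical file would own its own stream and lock, which is why turning the compound format off removes this particular contention.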
Re: problems with lucene in multithreaded environment
Jayant Kumar wrote:
> Thanks for the patch. It helped in increasing the search speed to a
> good extent.

Good. I'll commit it. Thanks for testing it.

> But when we tried to give about 100 queries in 10 seconds, then again
> we found that after about 15 seconds, the response time per query
> increased.

This still sounds very slow to me. Is your index optimized? What JVM are you using?

You might also consider ramping up your benchmark more slowly, to warm the filesystem's cache. So, when you first launch the server, give it a few queries at a lower rate, then, after those have completed, try a higher rate.

> We were able to simplify the searches further by consolidating the
> fields in the index but that resulted in increasing the index size to
> 2.5 GB as we required fields 2-5 and fields 1-7 in different searches.

That will slow updates a bit, but searching should be faster.

How about your range searches? Do you know how many terms they match? The easiest way to determine this might be to insert a print statement in RangeQuery.rewrite() that shows the query before it is returned.

> Our indexes are on the local disk, therefore there is no network i/o
> involved.

It does look like file i/o is now your bottleneck. The traces below show that you're using the compound file format, which combines many files into one. When two threads try to read two logically different files (.prx and .frq below) they must synchronize when the compound format is used. But if your application did not use the compound format, this synchronization would not be required. So you should try rebuilding your index with the compound format turned off. (The fastest way to do this is simply to add and/or delete a single document, then re-optimize the index with compound format turned off. This will cause the index to be re-written in non-compound format.)

Is this on linux? If so, please try running 'iostat -x 1' while you perform your benchmark (iostat is installed by the 'sysstat' package). What percentage is the disk utilized (%util)? What is the percentage of idle CPU (%idle)? What is the rate of data that is read (rkB/s)? If things really are i/o bound then you might consider spreading the data over multiple disks, e.g., with lvm striping or a RAID controller.

If you have a lot of RAM, then you could also consider moving certain files of the index onto a ramfs-based drive. For example, moving the .tis, .frq and .prx can greatly improve performance. Also, having these files in RAM means that the cache does not need to be warmed.

Hope this helps!

Doug

"Thread-23" prio=1 tid=0x08169f38 nid=0x2867 waiting for monitor entry [69bd4000..69bd48c8]
        at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
        - waiting to lock <0x46f1b828> (a org.apache.lucene.store.FSInputStream)
        at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
        at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:58)

"Thread-22" prio=1 tid=0x08159f78 nid=0x2866 waiting for monitor entry [69b53000..69b538c8]
        at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
        - waiting to lock <0x46f1b828> (a org.apache.lucene.store.FSInputStream)
        at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readVInt(InputStream.java:86)
        at org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
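Doug's question about how many terms the range searches match is worth quantifying: RangeQuery.rewrite() expands a range into one clause per matching index term. As a rough illustration, here is the term count for a zero-padded range like [0800 TO 1100] under the assumption that every value in the range appears in the index (the field layout is hypothetical; the real count comes from the term dictionary):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates why a RangeQuery can be expensive: it rewrites into a
// BooleanQuery with one clause per index term inside the range. This
// enumerates the candidate terms for a zero-padded numeric range,
// a simplified stand-in for walking the term dictionary.
public class RangeExpansion {
    static List<String> expand(int lo, int hi) {
        List<String> terms = new ArrayList<>();
        for (int t = lo; t <= hi; t++)
            terms.add(String.format("%04d", t)); // "0800", "0801", ...
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = expand(800, 1100);
        // 301 terms -> up to 301 BooleanQuery clauses for this field alone.
        System.out.println(terms.size() + " terms: " + terms.get(0)
                           + " .. " + terms.get(terms.size() - 1));
    }
}
```

Printing the rewritten query, as Doug suggests, shows the real clause count for your own index, which may be smaller if not every value occurs.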
Re: problems with lucene in multithreaded environment
Thanks for the patch. It helped in increasing the search speed to a good extent. But when we tried to give about 100 queries in 10 seconds, then again we found that after about 15 seconds, the response time per query increased. Enclosed is the dump which we took after about 30 seconds of starting the search. The maximum query time has reduced from 200-300 seconds to about 50 seconds. We were able to simplify the searches further by consolidating the fields in the index, but that resulted in increasing the index size to 2.5 GB, as we required fields 2-5 and fields 1-7 in different searches. Our indexes are on the local disk, therefore there is no network i/o involved.

Thanks
Jayant

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Doug Cutting wrote:
> > Please tell me if you are able to simplify your queries and if that
> > speeds things. I'll look into a ThreadLocal-based solution too.
>
> I've attached a patch that should help with the thread contention,
> although I've not tested it extensively.
>
> I still don't fully understand why your searches are so slow, though.
> Are the indexes stored on the local disk of the machine? Indexes
> accessed over the network can be very slow.
>
> Anyway, give this patch a try. Also, if anyone else can try this and
> report back whether it makes multi-threaded searching faster, or
> anything else slower, or is buggy, that would be great.
> Thanks,
> Doug
Re: problems with lucene in multithreaded environment
Doug Cutting wrote:
> Please tell me if you are able to simplify your queries and if that
> speeds things. I'll look into a ThreadLocal-based solution too.

I've attached a patch that should help with the thread contention, although I've not tested it extensively.

I still don't fully understand why your searches are so slow, though. Are the indexes stored on the local disk of the machine? Indexes accessed over the network can be very slow.

Anyway, give this patch a try. Also, if anyone else can try this and report back whether it makes multi-threaded searching faster, or anything else slower, or is buggy, that would be great.

Thanks,

Doug

Index: src/java/org/apache/lucene/index/TermInfosReader.java
===
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.6
diff -u -u -r1.6 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java	20 May 2004 11:23:53 -	1.6
+++ src/java/org/apache/lucene/index/TermInfosReader.java	4 Jun 2004 21:45:15 -
@@ -29,7 +29,8 @@
   private String segment;
   private FieldInfos fieldInfos;

-  private SegmentTermEnum enumerator;
+  private ThreadLocal enumerators = new ThreadLocal();
+  private SegmentTermEnum origEnum;
   private long size;

   TermInfosReader(Directory dir, String seg, FieldInfos fis)
@@ -38,19 +39,19 @@
     segment = seg;
     fieldInfos = fis;

-    enumerator = new SegmentTermEnum(directory.openFile(segment + ".tis"),
-                                     fieldInfos, false);
-    size = enumerator.size;
+    origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
+                                   fieldInfos, false);
+    size = origEnum.size;
     readIndex();
   }

   public int getSkipInterval() {
-    return enumerator.skipInterval;
+    return origEnum.skipInterval;
   }

   final void close() throws IOException {
-    if (enumerator != null)
-      enumerator.close();
+    if (origEnum != null)
+      origEnum.close();
   }

   /** Returns the number of term/value pairs in the set. */
@@ -58,6 +59,15 @@
     return size;
   }

+  private SegmentTermEnum getEnum() {
+    SegmentTermEnum enum = (SegmentTermEnum)enumerators.get();
+    if (enum == null) {
+      enum = terms();
+      enumerators.set(enum);
+    }
+    return enum;
+  }
+
   Term[] indexTerms = null;
   TermInfo[] indexInfos;
   long[] indexPointers;
@@ -102,16 +112,17 @@
   }

   private final void seekEnum(int indexOffset) throws IOException {
-    enumerator.seek(indexPointers[indexOffset],
-          (indexOffset * enumerator.indexInterval) - 1,
+    getEnum().seek(indexPointers[indexOffset],
+          (indexOffset * getEnum().indexInterval) - 1,
          indexTerms[indexOffset], indexInfos[indexOffset]);
   }

   /** Returns the TermInfo for a Term in the set, or null. */
-  final synchronized TermInfo get(Term term) throws IOException {
+  TermInfo get(Term term) throws IOException {
     if (size == 0) return null;

-    // optimize sequential access: first try scanning cached enumerator w/o seeking
+    // optimize sequential access: first try scanning cached enum w/o seeking
+    SegmentTermEnum enumerator = getEnum();
     if (enumerator.term() != null                 // term is at or past current
       && ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
         || term.compareTo(enumerator.term()) >= 0)) {
@@ -128,6 +139,7 @@

   /** Scans within block for matching term. */
   private final TermInfo scanEnum(Term term) throws IOException {
+    SegmentTermEnum enumerator = getEnum();
     while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
     if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0)
       return enumerator.termInfo();
@@ -136,10 +148,12 @@
   }

   /** Returns the nth term in the set. */
-  final synchronized Term get(int position) throws IOException {
+  final Term get(int position) throws IOException {
     if (size == 0) return null;

-    if (enumerator != null && enumerator.term() != null && position >= enumerator.position &&
+    SegmentTermEnum enumerator = getEnum();
+    if (enumerator != null && enumerator.term() != null &&
+        position >= enumerator.position &&
         position < (enumerator.position + enumerator.indexInterval))
       return scanEnum(position);                  // can avoid seek

@@ -148,6 +162,7 @@
   }

   private final Term scanEnum(int position) throws IOException {
+    SegmentTermEnum enumerator = getEnum();
     while(enumerator.position < position)
       if (!enumerator.next()) return null;
@@ -156,12 +171,13 @@
   }

   /** Returns the position of a Term in the set or -1. */
-  final synchronized long getPosition(Term term) throws IOException {
+  final long getPosition(Term term) throws IOException {
     if (size == 0) return -1;

     int indexOffset = getIndexOffset(term);
     seekEnum(indexOffset);
+    SegmentTermEnum enumerator
Re: problems with lucene in multithreaded environment
Jayant Kumar wrote:
> Please find enclosed jvmdump.txt which contains a dump of our search
> program after about 20 seconds of starting the program. Also enclosed
> is the file queries.txt which contains a few sample search queries.

Thanks for the data. This is exactly what I was looking for.

"Thread-14" prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry [4d61a000..4d61ac18]
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
        - waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)

"Thread-12" prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry [4d51a000..4d51ad18]
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
        - waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)

These are all stuck looking up terms in the dictionary (TermInfos). Things would be much faster if your queries didn't have so many terms.

Query: ( ( ( ( ( FIELD1:proof OR FIELD2:proof OR FIELD3:proof OR FIELD4:proof OR FIELD5:proof OR FIELD6:proof OR FIELD7:proof ) AND ( FIELD1:"george bush" OR FIELD2:"george bush" OR FIELD3:"george bush" OR FIELD4:"george bush" OR FIELD5:"george bush" OR FIELD6:"george bush" OR FIELD7:"george bush" ) ) AND ( FIELD1:script OR FIELD2:script OR FIELD3:script OR FIELD4:script OR FIELD5:script OR FIELD6:script OR FIELD7:script ) ) AND ( ( FIELD1:san OR FIELD2:san OR FIELD3:san OR FIELD4:san OR FIELD5:san OR FIELD6:san OR FIELD7:san ) OR ( ( FIELD1:war OR FIELD2:war OR FIELD3:war OR FIELD4:war OR FIELD5:war OR FIELD6:war OR FIELD7:war ) OR ( ( FIELD1:gulf OR FIELD2:gulf OR FIELD3:gulf OR FIELD4:gulf OR FIELD5:gulf OR FIELD6:gulf OR FIELD7:gulf ) OR ( ( FIELD1:laden OR FIELD2:laden OR FIELD3:laden OR FIELD4:laden OR FIELD5:laden OR FIELD6:laden OR FIELD7:laden ) OR ( ( FIELD1:ttouristeat OR FIELD2:ttouristeat OR FIELD3:ttouristeat OR FIELD4:ttouristeat OR FIELD5:ttouristeat OR FIELD6:ttouristeat OR FIELD7:ttouristeat ) OR ( ( FIELD1:pow OR FIELD2:pow OR FIELD3:pow OR FIELD4:pow OR FIELD5:pow OR FIELD6:pow OR FIELD7:pow ) OR ( FIELD1:bin OR FIELD2:bin OR FIELD3:bin OR FIELD4:bin OR FIELD5:bin OR FIELD6:bin OR FIELD7:bin ) ) ) ) ) ) ) ) ) AND RANGE:([ 0800 TO 1100 ]) AND ( S_IDa:(7 OR 8 OR 9 OR 10 OR 11 OR 12 OR 13 OR 14 OR 15 OR 16 OR 17 ) or S_IDb:(2 ) )

All your queries look for terms in fields 1-7. If you instead combined the contents of fields 1-7 in a single field, and searched that field, then your searches would contain far fewer terms and be much faster.

Also, I don't know how many terms your RANGE queries match, but that could also be introducing large numbers of terms, which would slow things down too.

But, still, you have identified a bottleneck: TermInfosReader caches a TermEnum, and hence access to it must be synchronized. Caching the enum greatly speeds sequential access to terms, e.g., when merging, performing range or prefix queries, etc. Perhaps, however, the cache should be done through a ThreadLocal, giving each thread its own cache and obviating the need for synchronization...

Please tell me if you are able to simplify your queries and if that speeds things. I'll look into a ThreadLocal-based solution too.

Doug
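The ThreadLocal idea Doug floats at the end can be sketched as follows. This is a simplified stand-in, not the actual TermInfosReader code: the shared, synchronized cached enumerator is replaced by a lazily created per-thread one, so lookups no longer block each other.

```java
// The ThreadLocal caching pattern, reduced to its essentials
// (hypothetical names, not Lucene's real classes): each thread gets
// its own mutable scan state, removing the need for 'synchronized'.
public class ThreadLocalCacheDemo {
    // Stands in for the cached SegmentTermEnum and its scan position.
    static class Enumerator {
        int position = 0;            // per-thread scan state
        void seek(int p) { position = p; }
    }

    private final ThreadLocal<Enumerator> enums = new ThreadLocal<>();

    private Enumerator getEnum() {
        Enumerator e = enums.get();
        if (e == null) {             // first use on this thread
            e = new Enumerator();    // the real patch clones the original enum
            enums.set(e);
        }
        return e;
    }

    // No 'synchronized' needed: each thread mutates only its own copy.
    int lookup(int target) {
        Enumerator e = getEnum();
        e.seek(target);
        return e.position;
    }

    public static void main(String[] args) throws Exception {
        ThreadLocalCacheDemo reader = new ThreadLocalCacheDemo();
        Thread t = new Thread(() -> reader.lookup(42));
        t.start();
        t.join();
        // The other thread's seek did not disturb this thread's enumerator.
        System.out.println(reader.lookup(7)); // prints 7
    }
}
```

The trade-off is memory: every searching thread holds its own enumerator, which is why the sequential-access cache was shared in the first place.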
Re: problems with lucene in multithreaded environment
We conducted a test on our search for 500 requests given in 27 seconds. We noticed that in the first 5 seconds, the results were coming in 100 to 500 ms. But as the queue size kept increasing, the response time of the search increased drastically to approx 80-100 seconds. Please find enclosed jvmdump.txt, which contains a dump of our search program after about 20 seconds of starting the program. Also enclosed is the file queries.txt, which contains a few sample search queries. Please note that this is done on a sample of 400,000 documents (450MB) on a P4 having 1GB RAM. Kindly let us know if this helps to identify the cause of the slow response.

Jayant

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Jayant Kumar wrote:
> > We recently tested lucene with an index size of 2 GB which has about
> > 1,500,000 documents, each document having about 25 fields. The
> > frequency of search was about 20 queries per second. This resulted
> > in an average response time of about 20 seconds approx per search.
>
> That sounds slow, unless your queries are very complex. What are your
> queries like?
>
> > What we observed was that lucene queues the queries and does not
> > release them until the results are found. so the queries that have
> > come in later take up about 500 seconds. Please let us know whether
> > there is a technique to optimize lucene in such circumstances.
>
> Multiple queries executed from different threads using a single
> searcher should not queue, but should run in parallel. A technique to
> find out where threads are queueing is to get a thread dump and see
> where all of the threads are stuck. In Solaris and Linux, sending the
> JVM a SIGQUIT will give a thread dump. On Windows, use Control-Break.
>
> Doug

"Thread-14" prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry [4d61a000..4d61ac18]
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
        - waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
        at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
        at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
        at org.apache.lucene.search.Hits.<init>(Hits.java:43)
        at org.apache.lucene.search.Searcher.search(Searcher.java:33)
        at org.apache.lucene.search.Searcher.search(Searcher.java:27)
        at resdex.searchinc.getHits(searchinc.java:752)
        at resdex.searchinc.Search(searchinc.java:943)
        at resdex.searchinctest.conductTestSearch(searchinctest.java:99)
        at resdex.Server$Handler.run(Server.java:64)
        at java.lang.Thread.run(Thread.java:534)

"Thread-12" prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry [4d51a000..4d51ad18]
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
        - waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
        at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
        at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
        at org.apache.lucene.search.Hits.<init>(Hits.java:43)
        at org.apache.lucene.search.Searcher.search(Searcher.java:33)
        at org.apache.lucene.search.Searcher.search(Searcher.java:27)
        at resdex.searchinc.getHits(searchinc.java:752)
        at resdex.searchinc.Search(searchinc.java:943)
        at resdex.searchincte
Re: problems with lucene in multithreaded environment
I noticed delays when concurrent threads query an IndexSearcher too. Our index is about 550MB with about 850,000 docs, each doc with 20-30 fields, of which only 3 are indexed. Our queries are not very complex -- just 3 required term queries.

This is what my test did:
- initialize an array of terms that are known to appear in the index
- initialize an IndexSearcher
- start a number of threads that query the IndexSearcher; each thread picks random terms that are known to appear in the indexed Keyword fields, builds a boolean query, and then extracts all 20-30 fields from the first 10 hits
- each thread waits .5 seconds between queries and does this 30 times

Typical queries returned 20-100 hits.

With just one thread: 30 queries ran over a span of about 20 seconds. Search time for each query generally took 40ms to 75ms. The longest search time was 445ms, but searches that took more than 100ms were rare.

With 5 threads: 150 queries ran over a span of 62 seconds. Search time for each query for the most part increased to 120ms to 300ms. Big delays were more prevalent and took 3 or 4 seconds.

With 10 or more threads things got bad, and I didn't run enough tests, but most searches took 1 to 2 seconds and some searches did take 20 to 30 seconds. When I ran the test with 5 concurrent threads each doing one query, search times were like 100ms to 200ms with a max of 700ms.

I have not looked into the Lucene code much and I didn't think queries were queued. I ran my test with -DdisableLuceneLocks on the command line, but I wasn't sure it did anything. I ran the test on Lucene 1.3 final on my PowerBook G4, and the tests ran with a lot of other processes going on. I was interested in this discussion because I could not figure out the delay if queries are run in parallel.

On Jun 2, 2004, at 9:32 PM, Doug Cutting wrote:

> Jayant Kumar wrote:
> > We recently tested lucene with an index size of 2 GB which has about
> > 1,500,000 documents, each document having about 25 fields. The
> > frequency of search was about 20 queries per second. This resulted
> > in an average response time of about 20 seconds approx per search.
>
> That sounds slow, unless your queries are very complex. What are your
> queries like?
>
> > What we observed was that lucene queues the queries and does not
> > release them until the results are found. so the queries that have
> > come in later take up about 500 seconds. Please let us know whether
> > there is a technique to optimize lucene in such circumstances.
>
> Multiple queries executed from different threads using a single
> searcher should not queue, but should run in parallel. A technique to
> find out where threads are queueing is to get a thread dump and see
> where all of the threads are stuck. In Solaris and Linux, sending the
> JVM a SIGQUIT will give a thread dump. On Windows, use Control-Break.
>
> Doug
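The benchmark described above can be reduced to a small harness. Everything here is a sketch with a dummy search standing in for IndexSearcher.search() (the names are made up); plug in real queries against a shared searcher to reproduce the per-thread-count latency comparison:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Minimal multi-threaded search benchmark: N threads each run Q
// queries against one shared "searcher"; we record total elapsed time
// and the worst single-query latency.
public class SearchBench {
    // Stand-in for IndexSearcher.search(); replace with real searches.
    static int dummySearch(String query) {
        return query.hashCode() & 0xff;   // pretend "hit count"
    }

    // Returns { total elapsed nanos, max single-query nanos }.
    static long[] run(int threads, int queriesPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong maxNanos = new AtomicLong();
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int q = 0; q < queriesPerThread; q++) {
                    long t0 = System.nanoTime();
                    dummySearch("query" + q);
                    long dt = System.nanoTime() - t0;
                    maxNanos.accumulateAndGet(dt, Math::max);
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return new long[] { System.nanoTime() - start, maxNanos.get() };
    }

    public static void main(String[] args) throws Exception {
        long[] r = run(5, 30);   // 5 threads x 30 queries, like the test above
        System.out.println("total ms: " + r[0] / 1_000_000
                           + ", max query ms: " + r[1] / 1_000_000);
    }
}
```

Running it at 1, 5, and 10 threads against a real searcher would show whether worst-case latency grows with concurrency, which is the contention signature discussed in this thread.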
Re: problems with lucene in multithreaded environment
Jayant Kumar wrote:
> We recently tested lucene with an index size of 2 GB which has about
> 1,500,000 documents, each document having about 25 fields. The
> frequency of search was about 20 queries per second. This resulted in
> an average response time of about 20 seconds approx per search.

That sounds slow, unless your queries are very complex. What are your queries like?

> What we observed was that lucene queues the queries and does not
> release them until the results are found. so the queries that have
> come in later take up about 500 seconds. Please let us know whether
> there is a technique to optimize lucene in such circumstances.

Multiple queries executed from different threads using a single searcher should not queue, but should run in parallel. A technique to find out where threads are queueing is to get a thread dump and see where all of the threads are stuck. In Solaris and Linux, sending the JVM a SIGQUIT will give a thread dump. On Windows, use Control-Break.

Doug
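Besides SIGQUIT and Control-Break, a dump can also be taken in-process. Note this relies on Thread.getAllStackTraces(), which requires Java 5 or later, newer than the 1.4 JVM discussed in this thread:

```java
import java.util.Map;

// An in-process alternative to 'kill -QUIT <pid>': print every live
// thread's name, state, and stack so you can see where threads block.
public class DumpThreads {
    static void dump() {
        for (Map.Entry<Thread, StackTraceElement[]> e
                 : Thread.getAllStackTraces().entrySet()) {
            System.out.println("\"" + e.getKey().getName() + "\" state="
                               + e.getKey().getState());
            for (StackTraceElement frame : e.getValue())
                System.out.println("\tat " + frame);
        }
    }

    public static void main(String[] args) {
        dump();   // at minimum, the main thread will be listed
    }
}
```

Threads shown as BLOCKED "waiting for monitor entry" on the same lock, as in the dumps later in this thread, are the queueing Doug describes.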
problems with lucene in multithreaded environment
We recently tested lucene with an index size of 2 GB which has about 1,500,000 documents, each document having about 25 fields. The frequency of search was about 20 queries per second. This resulted in an average response time of about 20 seconds approx per search. What we observed was that lucene queues the queries and does not release them until the results are found, so the queries that have come in later take up about 500 seconds. Please let us know whether there is a technique to optimize lucene in such circumstances.

Please note that we have created a single object for the searcher (IndexSearcher) and all queries are passed to this searcher only. We are using a P4 dual-processor machine with 6 GB of RAM. We need results at the rate of about 60 queries/second at peak load. Is there a way to optimize lucene to get this performance from this machine? What other ways can I optimize lucene for this output?

Regards
Jayant