Re: problems with lucene in multithreaded environment

2004-06-08 Thread Jayant Kumar
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Jayant Kumar wrote:
> > Thanks for the patch. It helped in increasing the
> > search speed to a good extent.
> 
> Good.  I'll commit it.  Thanks for testing it.
> 
> > But when we tried to give about 100 queries in 10 seconds, then
> > again we found that after about 15 seconds, the response time per
> > query increased.
> 
> This still sounds very slow to me.  Is your index
> optimized?  What JVM 
> are you using?

Yes, our index is optimized and we are using Java
version 1.4.2. What would be the difference between
a search on an optimized index and one on an
unoptimized index?

> 
> You might also consider ramping up your benchmark
> more slowly, to warm 
> the filesystem's cache.  So, when you first launch
> the server, give it a 
> few queries at a lower rate, then, after those have
> completed, try a 
> higher rate.
> 

We have tried this out and, since we have lots of RAM,
once the indexes go into the OS cache, results come
out fast.

> > We were able to simplify the searches further by consolidating the
> > fields in the index, but that resulted in increasing the index size
> > to 2.5 GB, as we required fields 2-5 and fields 1-7 in different
> > searches.
> 
> That will slow updates a bit, but searching should
> be faster.
> 
> How about your range searches?  Do you know how many
> terms they match? 
> The easiest way to determine this might be to insert
> a print statement 
> in RangeQuery.rewrite() that shows the query before
> it is returned.
> 
> > Our indexes are on the local disk, therefore there is no network
> > i/o involved.
> 
> It does look like file i/o is now your bottleneck.  The traces below
> show that you're using the compound file format, which combines many
> files into one.  When two threads try to read two logically different
> files (.prx and .frq below) they must synchronize when the compound
> format is used.  But if your application did not use the compound
> format this synchronization would not be required.  So you should try
> rebuilding your index with the compound format turned off.  (The
> fastest way to do this is simply to add and/or delete a single
> document, then re-optimize the index with compound format turned off.
> This will cause the index to be re-written in non-compound format.)

We have changed the IndexWriter to write the index in
non-compound format and have noticed a little
difference in the response time.

> 
> Is this on linux?  If so, please try running 'iostat
> -x 1' while you 
> perform your benchmark (iostat is installed by the
> 'sysstat' package). 
> What percentage is the disk utilized (%util)?  What
> is the percentage of 
> idle CPU (%idle)?  What is the rate of data that is
> read (rkB/s)?  If 
> things really are i/o bound then you might consider
> spreading the data 
> over multiple disks, e.g., with lvm striping or a
> RAID controller.
> 
> If you have a lot of RAM, then you could also
> consider moving certain 
> files of the index onto a ramfs-based drive.  For
> example, moving the 
> .tis, .frq and .prx can greatly improve performance.
>  Also, having these 
> files in RAM means that the cache does not need to
> be warmed.
> 
> Hope this helps!

Thanks for the help. We will continue interacting with
you as and when we face problems. We would also like
you to know that lucene is an excellent search engine
compared to any other.

Jayant
  



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: problems with lucene in multithreaded environment

2004-06-07 Thread Doug Cutting
Jayant Kumar wrote:
Thanks for the patch. It helped in increasing the
search speed to a good extent.
Good.  I'll commit it.  Thanks for testing it.
But when we tried to
give about 100 queries in 10 seconds, then again we
found that after about 15 seconds, the response time
per query increased.
This still sounds very slow to me.  Is your index optimized?  What JVM 
are you using?

You might also consider ramping up your benchmark more slowly, to warm 
the filesystem's cache.  So, when you first launch the server, give it a 
few queries at a lower rate, then, after those have completed, try a 
higher rate.

We were able to simplify the searches further by
consolidating the fields in the index but that
resulted in increasing the index size to 2.5 GB as we
required fields 2-5 and fields 1-7 in different
searches.
That will slow updates a bit, but searching should be faster.
How about your range searches?  Do you know how many terms they match? 
The easiest way to determine this might be to insert a print statement 
in RangeQuery.rewrite() that shows the query before it is returned.

Our indexes are on the local disk, therefore
there is no network i/o involved.
It does look like file i/o is now your bottleneck.  The traces below show 
that you're using the compound file format, which combines many files 
into one.  When two threads try to read two logically different files 
(.prx and .frq below) they must synchronize when the compound format is 
used.  But if your application did not use the compound format this 
synchronization would not be required.  So you should try rebuilding 
your index with the compound format turned off.  (The fastest way to do 
this is simply to add and/or delete a single document, then re-optimize 
the index with compound format turned off.  This will cause the index to 
be re-written in non-compound format.)
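[Editor's note: Doug's parenthetical can be sketched roughly as follows, assuming the Lucene 1.3/1.4-era IndexWriter API; the index path and analyzer choice are placeholders, not from the thread.]

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class RebuildNonCompound {
    public static void main(String[] args) throws Exception {
        // Open the existing index (create == false, so nothing is erased).
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        writer.setUseCompoundFile(false); // write .tis/.frq/.prx etc. as separate files
        // optimize() merges all segments, rewriting them with the current
        // compound-file setting.  If the index is already optimized there may
        // be nothing to merge, hence Doug's add-and/or-delete-a-document
        // trick to force a rewrite.
        writer.optimize();
        writer.close();
    }
}
```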

Is this on linux?  If so, please try running 'iostat -x 1' while you 
perform your benchmark (iostat is installed by the 'sysstat' package). 
What percentage is the disk utilized (%util)?  What is the percentage of 
idle CPU (%idle)?  What is the rate of data that is read (rkB/s)?  If 
things really are i/o bound then you might consider spreading the data 
over multiple disks, e.g., with lvm striping or a RAID controller.

If you have a lot of RAM, then you could also consider moving certain 
files of the index onto a ramfs-based drive.  For example, moving the 
.tis, .frq and .prx can greatly improve performance.  Also, having these 
files in RAM means that the cache does not need to be warmed.

Hope this helps!
Doug
 > "Thread-23" prio=1 tid=0x08169f38 nid=0x2867 waiting for monitor 
entry [69bd4000..69bd48c8]
at 
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
- waiting to lock <0x46f1b828> (a org.apache.lucene.store.FSInputStream)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at 
org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:58)
"Thread-22" prio=1 tid=0x08159f78 nid=0x2866 waiting for monitor entry 
[69b53000..69b538c8]
at 
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
- waiting to lock <0x46f1b828> (a org.apache.lucene.store.FSInputStream)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:86)
at org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)


Re: problems with lucene in multithreaded environment

2004-06-05 Thread Jayant Kumar
Thanks for the patch. It helped in increasing the
search speed to a good extent. But when we tried to
give about 100 queries in 10 seconds, then again we
found that after about 15 seconds, the response time
per query increased. Enclosed is the dump which we
took after about 30 seconds of starting the search.
The maximum query time has reduced from 200-300
seconds to about 50 seconds.

We were able to simplify the searches further by
consolidating the fields in the index but that
resulted in increasing the index size to 2.5 GB as we
required fields 2-5 and fields 1-7 in different
searches. Our indexes are on the local disk, therefore
there is no network i/o involved.

Thanks
Jayant

--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Doug Cutting wrote:
> > Please tell me if you are able to simplify your queries and if
> > that speeds things.  I'll look into a ThreadLocal-based solution too.
> 
> I've attached a patch that should help with the thread contention,
> although I've not tested it extensively.
> 
> I still don't fully understand why your searches are
> so slow, though. 
> Are the indexes stored on the local disk of the
> machine?  Indexes 
> accessed over the network can be very slow.
> 
> Anyway, give this patch a try.  Also, if anyone else
> can try this and 
> report back whether it makes multi-threaded
> searching faster, or 
> anything else slower, or is buggy, that would be
> great.
> 
> Thanks,
> 
> Doug

Re: problems with lucene in multithreaded environment

2004-06-04 Thread Doug Cutting
Doug Cutting wrote:
Please tell me if you are able to simplify your queries and if that 
speeds things.  I'll look into a ThreadLocal-based solution too.
I've attached a patch that should help with the thread contention, 
although I've not tested it extensively.

I still don't fully understand why your searches are so slow, though. 
Are the indexes stored on the local disk of the machine?  Indexes 
accessed over the network can be very slow.

Anyway, give this patch a try.  Also, if anyone else can try this and 
report back whether it makes multi-threaded searching faster, or 
anything else slower, or is buggy, that would be great.

Thanks,
Doug
Index: src/java/org/apache/lucene/index/TermInfosReader.java
===
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.6
diff -u -u -r1.6 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java	20 May 2004 11:23:53 -	1.6
+++ src/java/org/apache/lucene/index/TermInfosReader.java	4 Jun 2004 21:45:15 -
@@ -29,7 +29,8 @@
   private String segment;
   private FieldInfos fieldInfos;
 
-  private SegmentTermEnum enumerator;
+  private ThreadLocal enumerators = new ThreadLocal();
+  private SegmentTermEnum origEnum;
   private long size;
 
   TermInfosReader(Directory dir, String seg, FieldInfos fis)
@@ -38,19 +39,19 @@
 segment = seg;
 fieldInfos = fis;
 
-enumerator = new SegmentTermEnum(directory.openFile(segment + ".tis"),
-			   fieldInfos, false);
-size = enumerator.size;
+origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
+   fieldInfos, false);
+size = origEnum.size;
 readIndex();
   }
 
   public int getSkipInterval() {
-return enumerator.skipInterval;
+return origEnum.skipInterval;
   }
 
   final void close() throws IOException {
-if (enumerator != null)
-  enumerator.close();
+if (origEnum != null)
+  origEnum.close();
   }
 
   /** Returns the number of term/value pairs in the set. */
@@ -58,6 +59,15 @@
 return size;
   }
 
+  private SegmentTermEnum getEnum() {
+SegmentTermEnum enum = (SegmentTermEnum)enumerators.get();
+if (enum == null) {
+  enum = terms();
+  enumerators.set(enum);
+}
+return enum;
+  }
+
   Term[] indexTerms = null;
   TermInfo[] indexInfos;
   long[] indexPointers;
@@ -102,16 +112,17 @@
   }
 
   private final void seekEnum(int indexOffset) throws IOException {
-enumerator.seek(indexPointers[indexOffset],
-	  (indexOffset * enumerator.indexInterval) - 1,
+getEnum().seek(indexPointers[indexOffset],
+	  (indexOffset * getEnum().indexInterval) - 1,
 	  indexTerms[indexOffset], indexInfos[indexOffset]);
   }
 
   /** Returns the TermInfo for a Term in the set, or null. */
-  final synchronized TermInfo get(Term term) throws IOException {
+  TermInfo get(Term term) throws IOException {
 if (size == 0) return null;
 
-// optimize sequential access: first try scanning cached enumerator w/o seeking
+// optimize sequential access: first try scanning cached enum w/o seeking
+SegmentTermEnum enumerator = getEnum();
 if (enumerator.term() != null // term is at or past current
 	&& ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
 	|| term.compareTo(enumerator.term()) >= 0)) {
@@ -128,6 +139,7 @@
 
   /** Scans within block for matching term. */
   private final TermInfo scanEnum(Term term) throws IOException {
+SegmentTermEnum enumerator = getEnum();
 while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
 if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0)
   return enumerator.termInfo();
@@ -136,10 +148,12 @@
   }
 
   /** Returns the nth term in the set. */
-  final synchronized Term get(int position) throws IOException {
+  final Term get(int position) throws IOException {
 if (size == 0) return null;
 
-if (enumerator != null && enumerator.term() != null && position >= enumerator.position &&
+SegmentTermEnum enumerator = getEnum();
+if (enumerator != null && enumerator.term() != null &&
+position >= enumerator.position &&
 	position < (enumerator.position + enumerator.indexInterval))
   return scanEnum(position);		  // can avoid seek
 
@@ -148,6 +162,7 @@
   }
 
   private final Term scanEnum(int position) throws IOException {
+SegmentTermEnum enumerator = getEnum();
 while(enumerator.position < position)
   if (!enumerator.next())
 	return null;
@@ -156,12 +171,13 @@
   }
 
   /** Returns the position of a Term in the set or -1. */
-  final synchronized long getPosition(Term term) throws IOException {
+  final long getPosition(Term term) throws IOException {
 if (size == 0) return -1;
 
 int indexOffset = getIndexOffset(term);
 seekEnum(indexOffset);
 
+SegmentTermEnum enumerator

Re: problems with lucene in multithreaded environment

2004-06-04 Thread Doug Cutting
Jayant Kumar wrote:
Please find enclosed jvmdump.txt which contains a dump
of our search program after about 20 seconds of
starting the program.
Also enclosed is the file queries.txt which contains
a few sample search queries.
Thanks for the data.  This is exactly what I was looking for.
"Thread-14" prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry 
[4d61a000..4d61ac18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
"Thread-12" prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry 
[4d51a000..4d51ad18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
These are all stuck looking terms up in the dictionary (TermInfos). 
Things would be much faster if your queries didn't have so many terms.

Query : (  (  (  (  (  FIELD1: proof OR  FIELD2: proof OR  FIELD3: proof OR  FIELD4: proof OR  FIELD5: proof OR  FIELD6: proof OR  FIELD7: proof ) AND (  FIELD1: "george bush" OR  FIELD2: "george bush" OR  FIELD3: "george bush" OR  FIELD4: "george bush" OR  FIELD5: "george bush" OR  FIELD6: "george bush" OR  FIELD7: "george bush" )  ) AND (  FIELD1: script OR  FIELD2: script OR  FIELD3: script OR  FIELD4: script OR  FIELD5: script OR  FIELD6: script OR  FIELD7: script )  ) AND (  (  FIELD1: san OR  FIELD2: san OR  FIELD3: san OR  FIELD4: san OR  FIELD5: san OR  FIELD6: san OR  FIELD7: san ) OR (  (  FIELD1: war OR  FIELD2: war OR  FIELD3: war OR  FIELD4: war OR  FIELD5: war OR  FIELD6: war OR  FIELD7: war ) OR (  (  FIELD1: gulf OR  FIELD2: gulf OR  FIELD3: gulf OR  FIELD4: gulf OR  FIELD5: gulf OR  FIELD6: gulf OR  FIELD7: gulf ) OR (  (  FIELD1: laden OR  FIELD2: laden OR  FIELD3: laden OR  FIELD4: laden OR  FIELD5: laden OR  FIELD6: laden OR  FIELD7: laden ) OR (  ( 
FIELD1: ttouristeat OR  FIELD2: ttouristeat OR  FIELD3: ttouristeat OR  FIELD4: 
ttouristeat OR  FIELD5: ttouristeat OR  FIELD6: ttouristeat OR  FIELD7: ttouristeat ) 
OR (  (  FIELD1: pow OR  FIELD2: pow OR  FIELD3: pow OR  FIELD4: pow OR  FIELD5: pow 
OR  FIELD6: pow OR  FIELD7: pow ) OR (  FIELD1: bin OR  FIELD2: bin OR  FIELD3: bin OR 
 FIELD4: bin OR  FIELD5: bin OR  FIELD6: bin OR  FIELD7: bin )  )  )  )  )  )  )  )  ) 
AND  RANGE: ([ 0800 TO 1100 ]) AND  (  S_IDa: (7 OR 8 OR 9 OR 10 OR 11 OR 12 OR 13 OR 
14 OR 15 OR 16 OR 17 )  or  S_IDb: (2 )  )
All your queries look for terms in fields 1-7.  If you instead combined 
the contents of fields 1-7 in a single field, and searched that field, 
then your searches would contain far fewer terms and be much faster.
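[Editor's note: combining the fields at index time might look like this; a sketch against the Lucene 1.x Field API, with illustrative field and method names that are not from the thread.]

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class CombinedFieldDoc {
    // values[i] holds the text that previously went into FIELD(i+1).
    static Document build(String[] values) {
        Document doc = new Document();
        StringBuffer all = new StringBuffer();
        for (int i = 0; i < values.length; i++) {
            // Keep the original fields if they are still needed individually.
            doc.add(Field.Text("FIELD" + (i + 1), values[i]));
            all.append(values[i]).append(' ');
        }
        // One catch-all field: a query like contents:proof then replaces
        // (FIELD1:proof OR FIELD2:proof OR ... OR FIELD7:proof).
        doc.add(Field.UnStored("contents", all.toString()));
        return doc;
    }
}
```

The price, as noted in the thread, is a larger index, since the combined text is tokenized a second time.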

Also, I don't know how many terms your RANGE queries match, but that 
could also be introducing large numbers of terms which would slow things 
down too.

But, still, you have identified a bottleneck: TermInfosReader caches a 
TermEnum and hence access to it must be synchronized.  Caching the enum 
greatly speeds sequential access to terms, e.g., when merging, 
performing range or prefix queries, etc.  Perhaps however the cache 
should be done through a ThreadLocal, giving each thread its own cache 
and obviating the need for synchronization...
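[Editor's note: the ThreadLocal idea can be sketched without any Lucene code. The Cursor class below is a hypothetical stand-in for the cached SegmentTermEnum, and the sketch uses generics for brevity, whereas the 1.4-era patch uses a raw ThreadLocal.]

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadCursor {
    // Counts how many per-thread cursors get created.
    static final AtomicInteger created = new AtomicInteger();

    static class Cursor {
        int position; // each thread advances its own private copy
        Cursor() { created.incrementAndGet(); }
    }

    // Lazily hands each thread its own Cursor; no synchronization needed.
    static final ThreadLocal<Cursor> cursors = new ThreadLocal<Cursor>() {
        protected Cursor initialValue() { return new Cursor(); }
    };

    public static void main(String[] args) throws Exception {
        Runnable task = new Runnable() {
            public void run() {
                Cursor c = cursors.get(); // same object on every call in this thread
                for (int i = 0; i < 1000; i++) c.position++;
                System.out.println(Thread.currentThread().getName()
                        + " position=" + c.position);
            }
        };
        Thread a = new Thread(task, "t1");
        Thread b = new Thread(task, "t2");
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("cursors created: " + created.get());
    }
}
```

Each thread scans its own cursor, so the sequential-access caching survives while the shared lock disappears; this is the trade the eventual patch makes.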

Please tell me if you are able to simplify your queries and if that 
speeds things.  I'll look into a ThreadLocal-based solution too.

Doug


Re: problems with lucene in multithreaded environment

2004-06-03 Thread Jayant Kumar
We conducted a test on our search for 500 requests
given in 27 seconds. We noticed that in the first 5
seconds, the results were coming in 100 to 500 ms. But
as the queue size kept increasing, the response time
of the search increased drastically to approx 80-100
seconds. 

Please find enclosed jvmdump.txt which contains a dump
of our search program after about 20 seconds of
starting the program.

Also enclosed is the file queries.txt which contains
a few sample search queries.

Please note that this is done on a sample of 400,000
documents (450MB) on P4 having 1GB RAM.

Kindly let us know if this helps to identify the cause
of slow response.

Jayant

--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Jayant Kumar wrote:
> > We recently tested lucene with an index size of 2 GB
> > which has about 1,500,000 documents, each document
> > having about 25 fields. The frequency of search was
> > about 20 queries per second. This resulted in an
> > average response time of about 20 seconds approx per search.
> 
> That sounds slow, unless your queries are very complex.  What are
> your queries like?
> 
> > What we observed was that lucene queues the queries and does not
> > release them until the results are found. so the queries that have
> > come in later take up about 500 seconds. Please let us know whether
> > there is a technique to optimize lucene in such circumstances.
> 
> Multiple queries executed from different threads
> using a single searcher 
> should not queue, but should run in parallel.  A
> technique to find out 
> where threads are queueing is to get a thread dump
> and see where all of 
> the threads are stuck.  In Solaris and Linux,
> sending the JVM a SIGQUIT 
> will give a thread dump.  On Windows, use
> Control-Break.
> 
> Doug


"Thread-14" prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry 
[4d61a000..4d61ac18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
at org.apache.lucene.search.Hits.<init>(Hits.java:43)
at org.apache.lucene.search.Searcher.search(Searcher.java:33)
at org.apache.lucene.search.Searcher.search(Searcher.java:27)
at resdex.searchinc.getHits(searchinc.java:752)
at resdex.searchinc.Search(searchinc.java:943)
at resdex.searchinctest.conductTestSearch(searchinctest.java:99)
at resdex.Server$Handler.run(Server.java:64)
at java.lang.Thread.run(Thread.java:534)

"Thread-12" prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry 
[4d51a000..4d51ad18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock <0x44c95228> (a org.apache.lucene.index.TermInfosReader)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
at org.apache.lucene.search.Hits.<init>(Hits.java:43)
at org.apache.lucene.search.Searcher.search(Searcher.java:33)
at org.apache.lucene.search.Searcher.search(Searcher.java:27)
at resdex.searchinc.getHits(searchinc.java:752)
at resdex.searchinc.Search(searchinc.java:943)
at resdex.searchincte

Re: problems with lucene in multithreaded environment

2004-06-03 Thread Supun Edirisinghe
I noticed delays when concurrent threads query an IndexSearcher too.
our index is about 550MB with about 850,000 docs, each doc with 20-30 
fields, of which only 3 are indexed. Our queries are not very complex -- 
just 3 required term queries.

this is what my test did: initialize an array of terms that are known
to appear in the indexed Keyword fields; initialize an IndexSearcher;
and start a number of threads that query the IndexSearcher. Each thread
picks random terms that are known to appear in the indexed Keyword
fields, builds a boolean query, extracts all 20-30 fields from the
first 10 hits, then waits .5 seconds; each thread does this 30 times.
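[Editor's note: a stripped-down version of that benchmark shape. A stand-in search() method replaces the real IndexSearcher so the contention pattern is visible without an index; all names here are made up.]

```java
public class SearchBench {
    // Stand-in for a searcher call; synchronized to mimic a shared,
    // contended resource.  A real IndexSearcher.search() is not one big
    // synchronized method, but the shared TermInfosReader lock discussed
    // in this thread has a similar serializing effect.
    static synchronized int search(String query) {
        try { Thread.sleep(5); } catch (InterruptedException e) { } // fake I/O
        return query.length(); // fake hit count
    }

    public static void main(String[] args) throws Exception {
        final int threadCount = 5, queriesPerThread = 30;
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    long start = System.currentTimeMillis();
                    int hits = 0;
                    for (int i = 0; i < queriesPerThread; i++)
                        hits += search("term" + id + "_" + i);
                    System.out.println("thread " + id + ": " + hits + " hits in "
                            + (System.currentTimeMillis() - start) + " ms");
                }
            });
            threads[t].start();
        }
        for (int t = 0; t < threadCount; t++) threads[t].join();
    }
}
```

With the synchronized keyword in place, all five threads queue on one lock and per-thread time grows with thread count; removing it lets the fake searches overlap, which is what one would hope for from a shared searcher.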

typical queries returned 20 - 100 hits
with just one thread: 30 queries ran over a span of about 20 seconds. 
search time for each query generally took 40ms to 75ms. The longest 
search time was 445ms, but searches that took more than 100ms were rare.

with 5 threads: 150 queries ran over a span of 62 seconds. search time 
for each query for the most part increased to 120ms to 300ms. big 
delays were more prevalent and took 3 or 4 seconds.

with 10 or more threads things got bad, and I didn't run enough tests, 
but most searches took 1 to 2 seconds and some searches did take 20 to 
30 seconds.

when I ran the test with 5 concurrent threads each doing one query, 
search times were like 100ms to 200ms with a max of 700ms.

I have not looked into the Lucene code much, and I didn't think queries 
were queued.

I ran my test with -DdisableLuceneLocks on the command line, but I 
wasn't sure it did anything.

I ran the test on Lucene 1.3 final on my PowerBook G4, and the tests 
ran with a lot of other processes going on.

I was interested in this discussion because I could not figure out the 
delay if queries are run in parallel.

On Jun 2, 2004, at 9:32 PM, Doug Cutting wrote:
Jayant Kumar wrote:
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds approx
per search.
That sounds slow, unless your queries are very complex.  What are your 
queries like?

What we observed was that lucene queues
the queries and does not release them until the
results are found, so the queries that have come in
later take about 500 seconds. Please let us know
whether there is a technique to optimize lucene in
such circumstances.
Multiple queries executed from different threads using a single 
searcher should not queue, but should run in parallel.  A technique to 
find out where threads are queueing is to get a thread dump and see 
where all of the threads are stuck.  In Solaris and Linux, sending the 
JVM a SIGQUIT will give a thread dump.  On Windows, use Control-Break.

Doug


Re: problems with lucene in multithreaded environment

2004-06-02 Thread Doug Cutting
Jayant Kumar wrote:
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds approx
per search.
That sounds slow, unless your queries are very complex.  What are your 
queries like?

What we observed was that lucene queues
the queries and does not release them until the
results are found, so the queries that have come in
later take about 500 seconds. Please let us know
whether there is a technique to optimize lucene in
such circumstances. 
Multiple queries executed from different threads using a single searcher 
should not queue, but should run in parallel.  A technique to find out 
where threads are queueing is to get a thread dump and see where all of 
the threads are stuck.  In Solaris and Linux, sending the JVM a SIGQUIT 
will give a thread dump.  On Windows, use Control-Break.
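[Editor's note: on JVMs newer than the 1.4.x discussed here (Java 5 and later), the same dump can also be captured in-process, which is handy inside a test harness.]

```java
import java.util.Map;

public class DumpThreads {
    public static void main(String[] args) {
        // Thread.getAllStackTraces() (Java 5+) returns a snapshot of every
        // live thread's stack, much like the SIGQUIT / Control-Break dump.
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            System.out.println("\"" + e.getKey().getName() + "\" state="
                    + e.getKey().getState());
            for (StackTraceElement frame : e.getValue())
                System.out.println("\tat " + frame);
        }
    }
}
```

Threads shown as BLOCKED on the same monitor are the ones queueing.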

Doug


problems with lucene in multithreaded environment

2004-06-02 Thread Jayant Kumar
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds approx
per search. What we observed was that lucene queues
the queries and does not release them until the
results are found, so the queries that have come in
later take about 500 seconds. Please let us know
whether there is a technique to optimize lucene in
such circumstances. 

Please note that we have created a single object for
the searcher (IndexSearcher) and all queries are
passed to this searcher only. We are using a P4 dual
processor machine with 6 GB of RAM. We need results at
the rate of about 60 queries/second at peak load. Is
there a way to optimize lucene to get this performance
from this machine? What other ways can I optimize
lucene for this output?

Regards
Jayant


