Re: Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer

2010-01-07 Thread Ryan McKinley

can  you submit a patch to JIRA?


On Jan 7, 2010, at 10:23 AM, Attila Babo wrote:


While inserting a large pile of documents using
StreamingUpdateSolrServer I've found a race condition as all Runner
instances stopped while the blocking queue was full. The attached
patch solves the problem, to minify it all indentation has been
removed.

Index: src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java

===
--- src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java	(revision

888167)
+++ src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java	(working

copy)
@@ -82,6 +82,7 @@
  log.info( "starting runner: {}" , this );
  PostMethod method = null;
  try {
+do {
RequestEntity request = new RequestEntity() {
  // we don't know the length
  public long getContentLength() { return -1; }
@@ -142,6 +143,7 @@
  msg.append( "request: "+method.getURI() );
  handleError( new Exception( msg.toString() ) );
}
+}  while( ! queue.isEmpty());
  }
  catch (Throwable e) {
handleError( e );
@@ -149,6 +151,7 @@
  finally {
try {
  // make sure to release the connection
+  if(method != null)
  method.releaseConnection();
}
catch( Exception ex ){}
@@ -195,11 +198,11 @@

  queue.put( req );

+synchronized( runners ) {
  if( runners.isEmpty()
|| (queue.remainingCapacity() < queue.size()
 && runners.size() < threadCount) )
  {
-synchronized( runners ) {
  Runner r = new Runner();
  scheduler.execute( r );
  runners.add( r );

===

This patch has been tested with millions of document inserted to Solr,
before that I was unable to inject all of our documents as the
following scenario happened. We have a BlockingQueue called runners to
handle requests, at one point the queue was emptied by the Runner
threads, they all stopped processing new items but sent the collected
items to Solr. Solr was busy so that toke a long time, during that the
client filled the queue again. As all worker threads were instantiated
there were no way to create new Runners to handle the queue so it was
growing to upper limit. When the next item was about to put into the
queue it was blocked and the race condition just happened.

Patch 1, 2:
Inside the Runner.run method I've added a do while loop to prevent the
Runner to quit while there are new requests, this handles the problem
of new requests added while Runner is sending the previous batch.

Patch 3
Validity check of method variable is not strictly necessary, just a
code clean up.

Patch 4
The last part of the patch is to move synchronized outside of
conditional to avoid a situation where runners change while evaluating
it.

Your comments and critique are welcome!

Attila




Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer

2010-01-07 Thread Attila Babo
While inserting a large pile of documents using
StreamingUpdateSolrServer I've found a race condition as all Runner
instances stopped while the blocking queue was full. The attached
patch solves the problem, to minify it all indentation has been
removed.

Index: 
src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
===
--- src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java  
(revision
888167)
+++ src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java  
(working
copy)
@@ -82,6 +82,7 @@
   log.info( "starting runner: {}" , this );
   PostMethod method = null;
   try {
+do {
 RequestEntity request = new RequestEntity() {
   // we don't know the length
   public long getContentLength() { return -1; }
@@ -142,6 +143,7 @@
   msg.append( "request: "+method.getURI() );
   handleError( new Exception( msg.toString() ) );
 }
+}  while( ! queue.isEmpty());
   }
   catch (Throwable e) {
 handleError( e );
@@ -149,6 +151,7 @@
   finally {
 try {
   // make sure to release the connection
+  if(method != null)
   method.releaseConnection();
 }
 catch( Exception ex ){}
@@ -195,11 +198,11 @@

   queue.put( req );

+synchronized( runners ) {
   if( runners.isEmpty()
 || (queue.remainingCapacity() < queue.size()
  && runners.size() < threadCount) )
   {
-synchronized( runners ) {
   Runner r = new Runner();
   scheduler.execute( r );
   runners.add( r );

===

This patch has been tested with millions of document inserted to Solr,
before that I was unable to inject all of our documents as the
following scenario happened. We have a BlockingQueue called runners to
handle requests, at one point the queue was emptied by the Runner
threads, they all stopped processing new items but sent the collected
items to Solr. Solr was busy so that toke a long time, during that the
client filled the queue again. As all worker threads were instantiated
there were no way to create new Runners to handle the queue so it was
growing to upper limit. When the next item was about to put into the
queue it was blocked and the race condition just happened.

Patch 1, 2:
Inside the Runner.run method I've added a do while loop to prevent the
Runner to quit while there are new requests, this handles the problem
of new requests added while Runner is sending the previous batch.

Patch 3
Validity check of method variable is not strictly necessary, just a
code clean up.

Patch 4
The last part of the patch is to move synchronized outside of
conditional to avoid a situation where runners change while evaluating
it.

Your comments and critique are welcome!

Attila