Re: Solrj : ConcurrentUpdateSolrClient based on QueueSize and Time

2018-02-23 Thread Santosh Narayan
Thanks Jason. Hope this can be fixed in the next update of SolrJ.



On Thu, Feb 22, 2018 at 10:49 AM, Jason Gerlowski wrote:

> My apologies Santosh.  I added that comment a few releases back based
> on a misunderstanding I've only recently been disabused of.  I will
> correct it.
>
> Anyway, Shawn's explanation above is correct.  The queueSize parameter
> doesn't control batching, as he clarified.  Sorry for the trouble.
>
> Best,
>
> Jason
>
> On Wed, Feb 21, 2018 at 8:50 PM, Santosh Narayan wrote:
> > Thanks for the explanation Shawn. Very helpful. I think I got misled by
> > the JavaDoc text for
> > *ConcurrentUpdateSolrClient.Builder.withQueueSize*
> > /**
> >  * The number of documents to batch together before sending to Solr. If
> >  * not set, this defaults to 10.
> >  */
> > public Builder withQueueSize(int queueSize) {
> >   if (queueSize <= 0) {
> >     throw new IllegalArgumentException("queueSize must be a positive integer.");
> >   }
> >   this.queueSize = queueSize;
> >   return this;
> > }
> >
> >
> >
> > On Thu, Feb 22, 2018 at 9:41 AM, Shawn Heisey wrote:
> >
> >> On 2/21/2018 7:41 AM, Santosh Narayan wrote:
> >> > Maybe it is my understanding of the documentation. As per the
> >> > JavaDoc, ConcurrentUpdateSolrClient buffers all added documents and
> >> > writes them into open HTTP connections.
> >> >
> >> > So I thought that this class would buffer documents on the client side
> >> > itself till the QueueSize is reached and then send all the cached
> >> > documents together in one HTTP request. Is this not the case?
> >>
> >> That's not how it's designed.
> >>
> >> What ConcurrentUpdateSolrClient does differently than HttpSolrClient or
> >> CloudSolrClient is return control immediately to your program when you
> >> send an update, and begin processing that update in the background.  If
> >> you send a LOT of updates very quickly, then the queue will get larger,
> >> and will typically be processed in parallel by multiple threads.  The
> >> client won't wait for the queue to fill.  Processing of the first update
> >> you send should begin right after you add it.
> >>
> >> Something to consider:  Because control is returned to your program
> >> immediately, and the response is always a success, your program will
> >> never be informed about any problems with your adds when you use the
> >> concurrent client.  The concurrent client is a great choice for initial
> >> bulk indexing, because it offers multi-threaded indexing without any
> >> need to handle the threads yourself.  But you don't get any kind of
> >> error handling.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Solrj : ConcurrentUpdateSolrClient based on QueueSize and Time

2018-02-21 Thread Santosh Narayan
Thanks for the explanation Shawn. Very helpful. I think I got misled by the
JavaDoc text for
*ConcurrentUpdateSolrClient.Builder.withQueueSize*
/**
 * The number of documents to batch together before sending to Solr. If
 * not set, this defaults to 10.
 */
public Builder withQueueSize(int queueSize) {
  if (queueSize <= 0) {
    throw new IllegalArgumentException("queueSize must be a positive integer.");
  }
  this.queueSize = queueSize;
  return this;
}



On Thu, Feb 22, 2018 at 9:41 AM, Shawn Heisey  wrote:

> On 2/21/2018 7:41 AM, Santosh Narayan wrote:
> > Maybe it is my understanding of the documentation. As per the
> > JavaDoc, ConcurrentUpdateSolrClient buffers all added documents and
> > writes them into open HTTP connections.
> >
> > So I thought that this class would buffer documents on the client side
> > itself till the QueueSize is reached and then send all the cached
> > documents together in one HTTP request. Is this not the case?
>
> That's not how it's designed.
>
> What ConcurrentUpdateSolrClient does differently than HttpSolrClient or
> CloudSolrClient is return control immediately to your program when you
> send an update, and begin processing that update in the background.  If
> you send a LOT of updates very quickly, then the queue will get larger,
> and will typically be processed in parallel by multiple threads.  The
> client won't wait for the queue to fill.  Processing of the first update
> you send should begin right after you add it.
>
> Something to consider:  Because control is returned to your program
> immediately, and the response is always a success, your program will
> never be informed about any problems with your adds when you use the
> concurrent client.  The concurrent client is a great choice for initial
> bulk indexing, because it offers multi-threaded indexing without any
> need to handle the threads yourself.  But you don't get any kind of
> error handling.
>
> Thanks,
> Shawn
>
>
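
To make the behaviour Shawn describes concrete, here is a toy model built only on java.util.concurrent. It is not SolrJ's code and makes no claim about how ConcurrentUpdateSolrClient is implemented; FireAndForgetSender, its constructor parameters and the onError callback are all hypothetical names chosen for this sketch. It shows two things: work starts as soon as the first item is added (the queue only bounds pending work, it is not a batch threshold), and add() itself never reports a failure, so errors are only visible through a separate callback.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical toy model, NOT SolrJ code: items are handed to background
// threads immediately; the bounded queue only holds work that is waiting.
final class FireAndForgetSender<T> implements AutoCloseable {
  private final ExecutorService workers;
  private final Consumer<T> send;            // stand-in for the real network call
  private final Consumer<Throwable> onError; // the only place failures show up

  FireAndForgetSender(int threadCount, int queueSize,
                      Consumer<T> send, Consumer<Throwable> onError) {
    // Assumption of this sketch: when the queue is full, the calling thread
    // runs the task itself. This is a simplification, not a claim about SolrJ.
    this.workers = new ThreadPoolExecutor(
        threadCount, threadCount, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueSize),
        new ThreadPoolExecutor.CallerRunsPolicy());
    this.send = send;
    this.onError = onError;
  }

  // Returns as soon as the item is handed off; the caller is never told
  // whether the send succeeded.
  void add(T item) {
    workers.execute(() -> {
      try {
        send.accept(item);
      } catch (RuntimeException e) {
        onError.accept(e);
      }
    });
  }

  // Stops accepting work and waits (up to a minute) for pending items.
  @Override
  public void close() {
    workers.shutdown();
    try {
      workers.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With a queue size of 50, adding a single item to this model still results in that item being sent right away; nothing waits for 49 more to arrive.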


Re: Solrj : ConcurrentUpdateSolrClient based on QueueSize and Time

2018-02-21 Thread Santosh Narayan
Hi Shawn,
Maybe it is my understanding of the documentation. As per the
JavaDoc, ConcurrentUpdateSolrClient
buffers all added documents and writes them into open HTTP connections.

So I thought that this class would buffer documents on the client side
itself till the QueueSize is reached and then send all the cached documents
together in one HTTP request. Is this not the case?

On Wed, Feb 21, 2018 at 7:26 PM, Shawn Heisey  wrote:

> On 2/21/2018 1:21 AM, Santosh Narayan wrote:
>
>> I'm using ConcurrentUpdateSolrClient to push data into Solr. Currently,
>> I'm initializing it as follows:
>>
>> ConcurrentUpdateSolrClient client = new
>> ConcurrentUpdateSolrClient.Builder(serverUrl).withThreadCount(100).withQueueSize(50).build();
>>
>> This works fine when there are 50 requests coming in a short span of
>> time (a few seconds). The problem is when there aren't many requests. If
>> there are not many requests for, say, 5 minutes, then the queue size may
>> not touch 50 and the data is not sent to the Solr Server. Is there a way I
>> can add another condition to this, where I can say either the QueueSize is
>> 50 or a timeout of 60 seconds, whichever comes first? This way, in case
>> there are fewer requests, the records that came in would get pushed to the
>> server every 60 seconds.
>>
>
> The client should begin processing requests as soon as they are added, not
> when the queue fills up.  If you're seeing something different, then either
> there's a bug in ConcurrentUpdateSolrClient or your code is doing something
> very unusual.  Can you share the rest of the code using that client
> object?  What version of SolrJ are you using?
>
> Thanks,
> Shawn
>
>


Solrj : ConcurrentUpdateSolrClient based on QueueSize and Time

2018-02-21 Thread Santosh Narayan
Hi all,
I'm using ConcurrentUpdateSolrClient to push data into Solr. Currently, I'm
initializing it as follows:

ConcurrentUpdateSolrClient client = new
ConcurrentUpdateSolrClient.Builder(serverUrl).withThreadCount(100).withQueueSize(50).build();

This works fine when there are 50 requests coming in a short span of time (a
few seconds). The problem is when there aren't many requests. If there are
not many requests for, say, 5 minutes, then the queue size may not touch 50
and the data is not sent to the Solr Server. Is there a way I can add
another condition to this, where I can say either the QueueSize is 50 or a
timeout of 60 seconds, whichever comes first? This way, in case there are
fewer requests, the records that came in would get pushed to the server
every 60 seconds.

Thanks in advance for your guidance.
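
For readers who do want the "50 documents or 60 seconds, whichever comes first" behaviour asked about here, one option is to batch in application code before handing documents to any client. The following is only a sketch under stated assumptions: SizeOrTimeBatcher is a hypothetical helper that is not part of SolrJ, the sink is a plain Consumer standing in for whatever actually sends the batch, and the time trigger is a periodic flush rather than a per-document deadline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, NOT part of SolrJ: hands a batch to the sink when
// maxBatchSize items are buffered or when the periodic timer fires.
final class SizeOrTimeBatcher<T> implements AutoCloseable {
  private final int maxBatchSize;
  private final Consumer<List<T>> sink;
  private final List<T> buffer = new ArrayList<>();
  private final ScheduledExecutorService timer;

  SizeOrTimeBatcher(int maxBatchSize, long maxDelay, TimeUnit unit,
                    Consumer<List<T>> sink) {
    if (maxBatchSize <= 0 || maxDelay <= 0) {
      throw new IllegalArgumentException("maxBatchSize and maxDelay must be positive.");
    }
    this.maxBatchSize = maxBatchSize;
    this.sink = sink;
    this.timer = Executors.newSingleThreadScheduledExecutor();
    // Periodic flush: a buffered item waits at most roughly maxDelay.
    timer.scheduleWithFixedDelay(this::flushQuietly, maxDelay, maxDelay, unit);
  }

  // Buffers the item; sends the batch on the caller's thread once it is full.
  void add(T item) {
    List<T> batch = null;
    synchronized (buffer) {
      buffer.add(item);
      if (buffer.size() >= maxBatchSize) {
        batch = drain();
      }
    }
    if (batch != null) {
      sink.accept(batch); // called outside the lock so add() is not held up
    }
  }

  // Sends whatever is buffered, if anything.
  void flush() {
    List<T> batch;
    synchronized (buffer) {
      batch = drain();
    }
    if (!batch.isEmpty()) {
      sink.accept(batch);
    }
  }

  // An exception escaping a scheduled task cancels all later runs, so the
  // timer path catches and reports sink failures instead of rethrowing.
  private void flushQuietly() {
    try {
      flush();
    } catch (RuntimeException e) {
      System.err.println("timed flush failed: " + e);
    }
  }

  // Caller must hold the buffer lock.
  private List<T> drain() {
    List<T> batch = new ArrayList<>(buffer);
    buffer.clear();
    return batch;
  }

  // Stops the timer and sends any remaining items.
  @Override
  public void close() {
    timer.shutdown();
    flush();
  }

  public static void main(String[] args) {
    try (SizeOrTimeBatcher<String> batcher = new SizeOrTimeBatcher<>(
        50, 60, TimeUnit.SECONDS, batch -> System.out.println("sending " + batch))) {
      batcher.add("doc-1");
      batcher.add("doc-2");
    } // close() flushes the two buffered items
  }
}
```

In a SolrJ program the sink could wrap a call such as SolrClient.add(Collection<SolrInputDocument>) on a regular HttpSolrClient, with the checked exceptions handled inside the lambda; that would also give the application a place to see indexing errors.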