[jira] [Commented] (SOLR-9824) Documents indexed in bulk are replicated using too many HTTP requests

David Smiley (JIRA) Fri, 09 Dec 2016 05:15:08 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735278#comment-15735278
 ]


David Smiley commented on SOLR-9824:
------------------------------------

Ok I get we want a flag to articulate that we want long-running threads.  It's 
really long-running _connections_ specifically what we want; it sounds clearer 
to say it that way IMO.  Maybe longLastingThreads/Connections could be 
synonymous with a poll time of MAX_VALUE?
Some other review comments:
* longLastingThreads (perhaps better renamed longLastingConnections) should be 
final.
* Both places where poll() is invoked, has the same surrounding try-catch 
InterruptedException logic; it could be extracted to a method that returns the 
Update (possibly null).
** say what's the harm in removing the {{if !longLastingThreads}} so that it 
doesn't propagate; it just finishes the connection. That sounds like the right 
behavior to do always.
* I presume you're going to effectively revert SOLR-7333 and all related 
changes?

RE Futures: I get what you're saying but I think it introduces needless 
complexity.  We're conditionally creating a Map, which needs to be populated 
and get items removed under the safety synchronized.  Runners need to be 
submitted differently to get Futures.  I think that's much more complex than 
saving the current Thread to a field on the Runner.

bq. (RE interruption) It's fine - in our case we know all the outstanding 
documents have been added to the queue by the time we are in the 
blockuntilfinished block. We don't access CUSS in a multi threaded manner. Once 
we are in blockUntilFinished and the queue is empty, we know we are just 
interrupting poll.

The spot where blockUntilFinished triggers interruption (via Future.cancel) 
indeed only occurs when the queue is empty (and we can be sure of that, and 
that no new docs will get added.  But the Runner could be in the middle of 
sending the last document it got.  I have no idea if somewhere internal to 
doing that, it might cause the sending of that document to not send and/or 
throw.

> Documents indexed in bulk are replicated using too many HTTP requests
> ---------------------------------------------------------------------
>
>                 Key: SOLR-9824
>                 URL: https://issues.apache.org/jira/browse/SOLR-9824
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.3
>            Reporter: David Smiley
>         Attachments: SOLR-9824.patch, SOLR-9824.patch, SOLR-9824.patch
>
>
> This takes awhile to explain; bear with me. While working on bulk indexing 
> small documents, I looked at the logs of my SolrCloud nodes.  I noticed that 
> shards would see an /update log message every ~6ms which is *way* too much.  
> These are requests from one shard (that isn't a leader/replica for these docs 
> but the recipient from my client) to the target shard leader (no additional 
> replicas).  One might ask why I'm not sending docs to the right shard in the 
> first place; I have a reason but it's besides the point -- there's a real 
> Solr perf problem here and this probably applies equally to 
> replicationFactor>1 situations too.  I could turn off the logs but that would 
> hide useful stuff, and it's disconcerting to me that so many short-lived HTTP 
> requests are happening, somehow at the bequest of DistributedUpdateProcessor. 
>  After lots of analysis and debugging and hair pulling, I finally figured it 
> out.  
> In SOLR-7333 ([~tpot]) introduced an optimization called 
> {{UpdateRequest.isLastDocInBatch()}} in which ConcurrentUpdateSolrClient will 
> poll with a '0' timeout to the internal queue, so that it can close the 
> connection without it hanging around any longer than needed.  This part makes 
> sense to me.  Currently the only spot that has the smarts to set this flag is 
> {{JavaBinUpdateRequestCodec.unmarshal.readOuterMostDocIterator()}} at the 
> last document.  So if a shard received docs in a javabin stream (but not 
> other formats) one would expect the _last_ document to have this flag.  
> There's even a test.  Docs without this flag get the default poll time; for 
> javabin it's 25ms.  Okay.
> I _suspect_ that if someone used CloudSolrClient or HttpSolrClient to send 
> javabin data in a batch, the intended efficiencies of SOLR-7333 would apply.  
> I didn't try. In my case, I'm using ConcurrentUpdateSolrClient (and BTW 
> DistributedUpdateProcessor uses CUSC too).  CUSC uses the RequestWriter 
> (defaulting to javabin) to send each document separately without any leading 
> marker or trailing marker.  For the XML format by comparison, there is a 
> leading and trailing marker (<stream> ... </stream>).  Since there's no outer 
> container for the javabin unmarshalling to detect the last document, it marks 
> _every_ document as {{req.lastDocInBatch()}}!  Ouch!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-9824) Documents indexed in bulk are replicated using too many HTTP requests

Reply via email to