[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075421#comment-14075421 ]

Shalin Shekhar Mangar commented on SOLR-6264:
---------------------------------------------

+1 LGTM

> optimize with waitSearcher=true leads to serial execution across all replicas
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-6264
>                 URL: https://issues.apache.org/jira/browse/SOLR-6264
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Timothy Potter
>            Assignee: Mark Miller
>         Attachments: SOLR-6264.patch, SOLR-6264.patch
>
>
> Regardless of whether one agrees with optimizing, when you execute an
> optimize request using waitSearcher=true, the requests from the controller
> node are sent to each replica in the collection serially.
> You can send the optimize command to the update handler for a collection on
> any node in the cluster. For instance, if I had a collection named "foo":
> curl -i -v http://localhost:8984/solr/foo/update --data-binary '<optimize maxSegments="1" waitSearcher="true"/>' -H 'Content-type:application/xml'
> The node that receives this request will collect the URLs of all "live"
> replicas in the collection (not just leaders) (see
> DistributedUpdateProcessor#getCollectionUrls) and then forward the commit
> request to each of them. On the surface, the code looks like it forwards the
> request asynchronously to all replicas. However, this is not actually what
> happens; the commit requests to each replica in the collection will be
> processed serially when using waitSearcher=true (because
> ConcurrentUpdateSolrServer's background queue processing is bypassed for
> commits).
> Bottom line: if you request the collection to be optimized, the request gets
> forwarded around as you'd expect, but it is processed synchronously, one
> replica after another, so it can take a long time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070415#comment-14070415 ]

Timothy Potter commented on SOLR-6264:
--------------------------------------

Patch looks good. I ran it through my scenario (the one in the issue description): the optimize was definitely sent to all replicas in parallel and finished in less than half the previous runtime.
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069776#comment-14069776 ]

Mark Miller commented on SOLR-6264:
-----------------------------------

bq. with waitSearcher=true

I think you will get the same thing with a pure commit and no docs or deletes.
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069690#comment-14069690 ]

Mark Miller commented on SOLR-6264:
-----------------------------------

I still have to finish it up and run some tests - just a quick jam out for comment.
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069684#comment-14069684 ]

Mark Miller commented on SOLR-6264:
-----------------------------------

Here is a rough patch with what I'm thinking.
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069630#comment-14069630 ]

Timothy Potter commented on SOLR-6264:
--------------------------------------

We probably only want to use the thread pool for commits (and optimizes) ... for other update requests, we probably don't want to spawn a thread that spawns runners, right?
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069583#comment-14069583 ]

Mark Miller commented on SOLR-6264:
-----------------------------------

Perhaps we have to put in a thread pool and ensure the async path of SolrCmdDistributor#submit really is async by putting it on another thread and making the error handling thread safe. I'm not sure - it will take a bit of thought to trace it all out.
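For readers following the thread, the thread-pool idea floated in this comment could be sketched roughly as follows. This is not the attached patch and not Solr code: the class ParallelCommitSketch, the ReplicaCommit interface and the forwardInParallel method are hypothetical names made up for illustration, and a real implementation would issue an HTTP commit where this sketch runs an arbitrary callback. It only shows the two ingredients mentioned above: one pooled task per replica, and errors gathered in a thread-safe collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelCommitSketch {
    // Hypothetical per-replica operation; a real one would send the commit over HTTP.
    interface ReplicaCommit {
        void commit(String replicaUrl) throws Exception;
    }

    // Runs one blocking commit per replica on a pool, waits for all of them,
    // and returns the failures that were recorded in a thread-safe queue.
    static List<String> forwardInParallel(List<String> replicaUrls, ReplicaCommit op) {
        Queue<String> errors = new ConcurrentLinkedQueue<>();
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, replicaUrls.size()));
        for (String url : replicaUrls) {
            pool.submit(() -> {
                try {
                    op.commit(url); // blocks this worker only, not the caller's loop
                } catch (Exception e) {
                    errors.add(url + ": " + e.getMessage());
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted ones keep running
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                errors.add("timed out waiting for replicas");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            errors.add("interrupted while waiting for replicas");
        }
        return new ArrayList<>(errors);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("replica1", "replica2", "replica3");
        long start = System.nanoTime();
        List<String> errors = forwardInParallel(replicas, url -> {
            Thread.sleep(100); // simulated commit latency
            if (url.equals("replica2")) {
                throw new IllegalStateException("simulated failure");
            }
        });
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("errors: " + errors);
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```

Sizing the pool to the number of replicas is just the simplest choice for a sketch; a shared, bounded pool would be the more likely choice in a server.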
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069584#comment-14069584 ]

Mark Miller commented on SOLR-6264:
-----------------------------------

bq. I think it is true for commits too.

It's certainly true for commits - it only happens for optimize because optimize rides on commits.
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069577#comment-14069577 ]

Timothy Potter commented on SOLR-6264:
--------------------------------------

Yes, we do, which is why this is tricky to see ;-) SolrCmdDistributor.distribCommit has a for loop that calls submit:

    for (Node node : nodes) {
      submit(new Req(cmd.toString(), node, uReq, false));
    }

Each submit uses a different CUSS (ConcurrentUpdateSolrServer), of course, but the for loop is blocked because the "async" submit is actually synchronous: ConcurrentUpdateSolrServer skips the runners part if the request is a commit. I only stumbled upon this by looking at the timestamps of the requests; I realized they were running serially and then scratched my head a bit, because I know StreamingSolrServers and CUSS pretty well at this point.

I think it is true for commits too.
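To make the effect described in this comment concrete, here is a minimal, self-contained Java sketch. It does not use any Solr classes: SerialCommitSketch, blockingCommit and forwardSerially are hypothetical names, and Thread.sleep stands in for a commit that waits on the replica (as with waitSearcher=true). It only shows that a loop over calls that block runs them one at a time, even when each call targets a different replica.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SerialCommitSketch {
    // Number of simulated commits currently in flight, and the peak value observed.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for forwarding a commit to one replica and
    // waiting until that replica has finished.
    static void blockingCommit(String replicaUrl, long millis) {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Same shape as the loop quoted above: the call looks like a fire-and-forget
    // submit, but it returns only after the replica is done, so the loop
    // cannot reach the next replica until then.
    static int forwardSerially(List<String> replicaUrls, long millis) {
        peak.set(0);
        for (String url : replicaUrls) {
            blockingCommit(url, millis);
        }
        return peak.get(); // peak number of commits that overlapped
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("replica1", "replica2", "replica3");
        long start = System.nanoTime();
        int maxConcurrent = forwardSerially(replicas, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("peak in-flight commits: " + maxConcurrent);
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```

Because everything runs on the calling thread, the peak never exceeds one and the total time grows with the number of replicas, which matches the serial timestamps mentioned in the comment.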
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069574#comment-14069574 ]

Mark Miller commented on SOLR-6264:
-----------------------------------

I see - it waits on each ConcurrentUpdateSolrServer#request call as it loops through them. Interesting. Good find, fairly ugly, let's fix it.
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069570#comment-14069570 ]

Mark Miller commented on SOLR-6264:
-----------------------------------

{quote}
waitSearcher=true (because ConcurrentUpdateSolrServer's background queue processing is by-passed for commits).
{quote}

But don't we use a different ConcurrentUpdateSolrServer for each Solr URL?
[jira] [Commented] (SOLR-6264) optimize with waitSearcher=true leads to serial execution across all replicas
[ https://issues.apache.org/jira/browse/SOLR-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069552#comment-14069552 ]

Yonik Seeley commented on SOLR-6264:
------------------------------------

Good catch! Is this true for commit also?