[ https://issues.apache.org/jira/browse/SOLR-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632585#comment-16632585 ]
Varun Thacker commented on SOLR-12708:
--------------------------------------

{quote}I have to be honest and admit that I copied the full block from {{CreateShardCmd.java}}. I think the code is doing the right thing there. In both branches of the {{if}} the code checks if the main {{results}} has success/failure node already, and creates if necessary. Then adds the corresponding {{addResult}} field into the main one. The only difference is that the failure recalled before the {{if}} block.
{quote}
I think the usage of this code block is correct in CreateShardCmd but not where we are using it in the patch. Here's why:

CreateShardCmd is one core admin API call, so the response is either a success or a failure. Hence the if-else block covers it.

In this patch, there are multiple add-replica calls whose responses we are acknowledging. So some replicas can come back with success while others fail. If there is a failure, we will just add the failure response back to the results and not the success ones.

The way we process requests and responses is very complicated for some reason, and we should improve it in general. But do you see what I am seeing here?

> Async collection actions should not hide failures
> -------------------------------------------------
>
>                 Key: SOLR-12708
>                 URL: https://issues.apache.org/jira/browse/SOLR-12708
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: Admin UI, Backup/Restore
>    Affects Versions: 7.4
>            Reporter: Mano Kovacs
>            Assignee: Varun Thacker
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Async collection API may hide failures compared to sync version.
> [OverseerCollectionMessageHandler::processResponses|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/OverseerCollectionMessageHandler.java#L744]
> structures errors differently in the response, that hides failures from most
> evaluators. RestoreCmd did not receive, nor handle async addReplica issues.
> Sample create collection sync and async result with invalid solrconfig.xml:
> {noformat}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":32104},
>   "failure":{
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard1_replica_n1': Unable to create core [name4_shard1_replica_n1] Caused by: The content of elements must consist of well-formed character data or markup.",
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard2_replica_n2': Unable to create core [name4_shard2_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup.",
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard1_replica_n2': Unable to create core [name4_shard1_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup.",
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard2_replica_n1': Unable to create core [name4_shard2_replica_n1] Caused by: The content of elements must consist of well-formed character data or markup."}
> }
> {noformat}
> vs async:
> {noformat}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":3},
>   "success":{
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":12}},
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":3}},
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":11}},
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":12}}},
>   "myTaskId2709146382836":{
>     "responseHeader":{
>       "status":0,
>       "QTime":1},
>     "STATUS":"failed",
>     "Response":"Error CREATEing SolrCore 'name_shard2_replica_n2': Unable to create core [name_shard2_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup."},
>   "status":{
>     "state":"completed",
>     "msg":"found [myTaskId] in completed tasks"}}
> {noformat}
> Proposing adding failure node to the results, keeping backward compatible but
> correct result.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)