[
https://issues.apache.org/jira/browse/SOLR-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624052#comment-16624052
]
Varun Thacker commented on SOLR-12708:
--------------------------------------
Hi Mano,
Thanks for the patch!
I'm curious about the 10-minute timeout on the latch countdown. Shouldn't we
wait forever here?
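To make the trade-off concrete, here is a minimal sketch of the two waiting styles using only java.util.concurrent. The class and method names (LatchWaitSketch, awaitBounded, awaitUnbounded) are made up for illustration and are not taken from the patch. The point is that the timed await returns false when the timeout expires, so the caller has to check the return value, while the untimed await blocks until the count reaches zero or the thread is interrupted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Solr or of the patch.
public class LatchWaitSketch {

    /** Waits at most the given time; returns false if the latch did not reach zero in time. */
    static boolean awaitBounded(CountDownLatch latch, long timeout, TimeUnit unit)
            throws InterruptedException {
        return latch.await(timeout, unit);
    }

    /** Blocks until the latch reaches zero or the calling thread is interrupted. */
    static void awaitUnbounded(CountDownLatch latch) throws InterruptedException {
        latch.await();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch pending = new CountDownLatch(1);

        // Nobody counts down yet, so the bounded wait gives up and reports it.
        boolean finished = awaitBounded(pending, 50, TimeUnit.MILLISECONDS);
        System.out.println("finished within timeout: " + finished);

        // Once the callback has fired, the unbounded wait returns immediately.
        pending.countDown();
        awaitUnbounded(pending);
        System.out.println("remaining count: " + pending.getCount());
    }
}
```

With a bounded wait, a false return has to be turned into an explicit failure in the response; otherwise a timeout would be one more way for an async action to hide an error.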
Also, we're treating success and failure differently here. If the add
replica call fails, we add the failure back to the main response, but if
it succeeds, we end up skipping it (at this point
results.get("success") will always be null).
{code:java}
ocmh.addReplica(clusterState, new ZkNodeProps(propMap), addResult, () -> {
  countDownLatch.countDown();
  Object addResultFailure = addResult.get("failure");
  if (addResultFailure != null) {
    SimpleOrderedMap failure = (SimpleOrderedMap) results.get("failure");
    if (failure == null) {
      failure = new SimpleOrderedMap();
      results.add("failure", failure);
    }
    failure.addAll((NamedList) addResultFailure);
  } else {
    SimpleOrderedMap success = (SimpleOrderedMap) results.get("success");
    if (success == null) {
      success = new SimpleOrderedMap();
      results.add("success", success);
    }
    success.addAll((NamedList) addResult.get("success"));
  }
});{code}
Can't we do this instead, which appends the results directly to the main
object? We already do this for the remaining add replicas as the last step of
the restore:
{code:java}
ocmh.addReplica(clusterState, new ZkNodeProps(propMap), results, () -> {
  countDownLatch.countDown();
});{code}
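If there turns out to be a reason to keep a separate per-call result object, the two near-identical branches in the first snippet could at least be folded into one helper. The following is only a sketch under stated assumptions: it uses plain java.util.Map as a stand-in for Solr's NamedList/SimpleOrderedMap, and the names ResultMergeSketch and mergeSection are hypothetical. Note that a Map keeps one value per key, whereas the real response can repeat a key such as "localhost:8983_solr", so this shows the control flow rather than the exact data structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; Map stands in for Solr's NamedList here.
public class ResultMergeSketch {

    /**
     * Copies one section ("success" or "failure") of a single call's result
     * into the shared result, creating the section on first use.
     * Does nothing when the single result has no such section.
     */
    @SuppressWarnings("unchecked")
    static void mergeSection(Map<String, Object> shared, Map<String, Object> single, String key) {
        Object section = single.get(key);
        if (!(section instanceof Map)) {
            return; // nothing reported under this key
        }
        Map<String, Object> target = (Map<String, Object>)
                shared.computeIfAbsent(key, k -> new LinkedHashMap<String, Object>());
        target.putAll((Map<String, Object>) section);
    }

    public static void main(String[] args) {
        Map<String, Object> results = new LinkedHashMap<>();   // shared response
        Map<String, Object> addResult = new LinkedHashMap<>(); // one call's response

        Map<String, Object> failure = new LinkedHashMap<>();
        failure.put("node1", "Unable to create core");
        addResult.put("failure", failure);

        // Both sections go through the same code path; a missing one is skipped.
        mergeSection(results, addResult, "failure");
        mergeSection(results, addResult, "success");
        System.out.println(results);
    }
}
```

Passing results straight into addReplica, as suggested above, avoids the need for any such helper.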
> Async collection actions should not hide failures
> -------------------------------------------------
>
> Key: SOLR-12708
> URL: https://issues.apache.org/jira/browse/SOLR-12708
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Admin UI, Backup/Restore
> Affects Versions: 7.4
> Reporter: Mano Kovacs
> Assignee: Varun Thacker
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The async collection API may hide failures that the sync version reports.
> [OverseerCollectionMessageHandler::processResponses|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/OverseerCollectionMessageHandler.java#L744]
> structures errors differently in the response, which hides failures from most
> evaluators. RestoreCmd neither received nor handled async addReplica issues.
> Sample sync and async create collection results with an invalid solrconfig.xml:
> {noformat}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":32104},
>   "failure":{
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard1_replica_n1': Unable to create core [name4_shard1_replica_n1] Caused by: The content of elements must consist of well-formed character data or markup.",
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard2_replica_n2': Unable to create core [name4_shard2_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup.",
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard1_replica_n2': Unable to create core [name4_shard1_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup.",
>     "localhost:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://localhost:8983/solr: Error CREATEing SolrCore 'name4_shard2_replica_n1': Unable to create core [name4_shard2_replica_n1] Caused by: The content of elements must consist of well-formed character data or markup."}
> }
> {noformat}
> vs async:
> {noformat}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":3},
>   "success":{
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":12}},
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":3}},
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":11}},
>     "localhost:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":12}}},
>   "myTaskId2709146382836":{
>     "responseHeader":{
>       "status":0,
>       "QTime":1},
>     "STATUS":"failed",
>     "Response":"Error CREATEing SolrCore 'name_shard2_replica_n2': Unable to create core [name_shard2_replica_n2] Caused by: The content of elements must consist of well-formed character data or markup."},
>   "status":{
>     "state":"completed",
>     "msg":"found [myTaskId] in completed tasks"}}
> {noformat}
> Proposing to add a failure node to the results, which keeps the response
> backward compatible while making it correct.