[jira] [Commented] (SOLR-11392) StreamExpressionTest.testParallelExecutorStream fails too frequently

Alan Woodward (JIRA) Fri, 29 Sep 2017 02:44:19 -0700

    [ https://issues.apache.org/jira/browse/SOLR-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185583#comment-16185583 ]


Alan Woodward commented on SOLR-11392:
--------------------------------------

It looks like there are two issues here:

1) A previous test (in the case I'm looking at, testExecutorStream) is creating 
the mainCorpus collection, but for some reason it's created with replicas named 
_n1 and _n3:

{code}
[junit4]   2> 151484 INFO  (qtp1541434343-1456) [    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={async=147c5276-5e91-4a73-912a-0025669a97ec&replicationFactor=1&collection.configName=conf&name=mainCorpus&nrtReplicas=1&action=CREATE&numShards=2&wt=javabin&version=2} status=0 QTime=1
[junit4]   2> 151485 INFO  (qtp1541434343-1455) [    ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :requeststatus with params requestid=147c5276-5e91-4a73-912a-0025669a97ec&action=REQUESTSTATUS&wt=javabin&version=2 and sendToOCPQueue=true
[junit4]   2> 151486 INFO  (qtp1541434343-1455) [    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={requestid=147c5276-5e91-4a73-912a-0025669a97ec&action=REQUESTSTATUS&wt=javabin&version=2} status=0 QTime=1
[junit4]   2> 151488 INFO  (OverseerThreadFactory-647-thread-5) [    ] o.a.s.c.CreateCollectionCmd Create collection mainCorpus
[junit4]   2> 151488 INFO  (OverseerCollectionConfigSetProcessor-98740321293107209-127.0.0.1:65381_solr-n_0000000000) [    ] o.a.s.c.OverseerTaskQueue Response ZK path: /overseer/collection-queue-work/qnr-0000000038 doesn't exist.  Requestor may have disconnected from ZooKeeper
[junit4]   2> 151610 INFO  (OverseerStateUpdate-98740321293107209-127.0.0.1:65381_solr-n_0000000000) [    ] o.a.s.c.o.SliceMutator createReplica() {
[junit4]   2>   "operation":"ADDREPLICA",
[junit4]   2>   "collection":"mainCorpus",
[junit4]   2>   "shard":"shard1",
[junit4]   2>   "core":"mainCorpus_shard1_replica_n1",
[junit4]   2>   "state":"down",
[junit4]   2>   "base_url":"http://127.0.0.1:65381/solr",
[junit4]   2>   "type":"NRT"}
[junit4]   2> 151616 INFO  (OverseerStateUpdate-98740321293107209-127.0.0.1:65381_solr-n_0000000000) [    ] o.a.s.c.o.SliceMutator createReplica() {
[junit4]   2>   "operation":"ADDREPLICA",
[junit4]   2>   "collection":"mainCorpus",
[junit4]   2>   "shard":"shard2",
[junit4]   2>   "core":"mainCorpus_shard2_replica_n3",
[junit4]   2>   "state":"down",
[junit4]   2>   "base_url":"http://127.0.0.1:65394/solr",
[junit4]   2>   "type":"NRT"}
{code}

This is a bit weird, but it works fine.  At the end of the test, the collection 
is deleted.  

Then testParallelExecutorStream starts up, and it too creates a 'mainCorpus' 
collection, only this time with replicas named _n1 and _n2, as you'd expect.  The 
bug appears when the cluster's existing CloudSolrClient is used to send 
updates to the newly recreated collection.  The cluster state provider still 
has state cached from the previous test, so it thinks that the relevant 
replicas to send data to are _n1 and _n3.  But when it gets a 404 back from the 
(no longer existing) _n3 replica, it doesn't invalidate its cache and retry; 
it just fails.  This looks like a genuine bug in CloudSolrClient.  
[~noble.paul], I think you're best placed to know how to fix this.
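The missing invalidate-and-retry step can be modelled in a few lines. This is a hypothetical, self-contained sketch, not CloudSolrClient's actual code: StaleRoutingClient, ReplicaNotFoundException, seedCache and the Function standing in for ZooKeeper are all made-up names. It assumes that a 404 from a cached replica means the routing entry is stale, so the client drops that collection's cache entry, re-reads the live state and retries exactly once.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical model of a client that caches collection -> replica routing.
// This is NOT CloudSolrClient's code; it only illustrates invalidate-and-retry.
class StaleRoutingClient {

    /** Hypothetical signal that a replica answered HTTP 404 (its core is gone). */
    static class ReplicaNotFoundException extends RuntimeException {
        ReplicaNotFoundException(String replica) {
            super("404 from " + replica);
        }
    }

    private final Function<String, List<String>> liveState; // stand-in for ZooKeeper
    private final Set<String> liveCores;                     // cores that really exist
    private final Map<String, List<String>> cache = new HashMap<>();

    StaleRoutingClient(Function<String, List<String>> liveState, Set<String> liveCores) {
        this.liveState = liveState;
        this.liveCores = liveCores;
    }

    /** Simulated send: a core that no longer exists behaves like a 404. */
    private void sendTo(String replica) {
        if (!liveCores.contains(replica)) {
            throw new ReplicaNotFoundException(replica);
        }
    }

    /** Simplification: contacts every cached replica instead of routing per document. */
    private List<String> sendToAll(String collection) {
        List<String> replicas = cache.computeIfAbsent(collection, liveState);
        for (String replica : replicas) {
            sendTo(replica);
        }
        return replicas;
    }

    /** On a 404, drops the cached entry, re-reads live state and retries once. */
    List<String> update(String collection) {
        try {
            return sendToAll(collection);
        } catch (ReplicaNotFoundException e) {
            cache.remove(collection);       // invalidate the stale routing entry
            return sendToAll(collection);   // refresh from live state and retry
        }
    }

    /** Simulates state left in the cache by an earlier test. */
    void seedCache(String collection, List<String> replicas) {
        cache.put(collection, replicas);
    }

    public static void main(String[] args) {
        Map<String, List<String>> zk = new HashMap<>();
        List<String> current = Arrays.asList(
            "mainCorpus_shard1_replica_n1", "mainCorpus_shard2_replica_n2");
        zk.put("mainCorpus", current);
        StaleRoutingClient client = new StaleRoutingClient(zk::get, new HashSet<>(current));
        // Stale entry from the previous incarnation of the collection.
        client.seedCache("mainCorpus", Arrays.asList(
            "mainCorpus_shard1_replica_n1", "mainCorpus_shard2_replica_n3"));
        // prints [mainCorpus_shard1_replica_n1, mainCorpus_shard2_replica_n2]
        System.out.println(client.update("mainCorpus"));
    }
}
```

Retrying only once keeps a genuinely missing replica from turning into an endless loop; if the refreshed state is also wrong, the error still surfaces to the caller.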

A workaround for the test is to use different collection names for the 
different tests.
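One way to apply that workaround, sketched under the assumption that each test derives its collection names from its own method name. TestCollectionNames.forTest is a hypothetical helper, not part of Solr's test framework, and the sanitising pattern is a conservative assumption about which characters are safe in a collection name.

```java
import java.util.regex.Pattern;

// Hypothetical helper: gives every test its own collection names so that
// no two tests ever reuse (and therefore never share cached state for) "mainCorpus".
final class TestCollectionNames {

    // Assumed-safe character set: letters, digits, underscore, period, hyphen.
    // Anything else is replaced with '_'.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9_.-]");

    private TestCollectionNames() {}

    /** Returns e.g. "mainCorpus_testParallelExecutorStream" for base "mainCorpus". */
    static String forTest(String base, String testMethod) {
        return UNSAFE.matcher(base + "_" + testMethod).replaceAll("_");
    }

    public static void main(String[] args) {
        String a = forTest("mainCorpus", "testExecutorStream");
        String b = forTest("mainCorpus", "testParallelExecutorStream");
        System.out.println(a);           // mainCorpus_testExecutorStream
        System.out.println(b);           // mainCorpus_testParallelExecutorStream
        System.out.println(a.equals(b)); // false
    }
}
```

The resulting names would then be passed to CollectionAdminRequest.createCollection(...) and substituted into the stream expression strings in place of the hard-coded mainCorpus. Because testExecutorStream and testParallelExecutorStream no longer share a name, a stale cache entry left by one test is never consulted by the other.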

> StreamExpressionTest.testParallelExecutorStream fails too frequently
> --------------------------------------------------------------------
>
>                 Key: SOLR-11392
>                 URL: https://issues.apache.org/jira/browse/SOLR-11392
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>
> I've never been able to reproduce the failure, but Jenkins runs fail frequently 
> with the following error:
> {code}
> Stack Trace:
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://127.0.0.1:38180/solr/workQueue_shard2_replica_n3: Expected mime type application/octet-stream but got text/html. <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
> <title>Error 404 </title>
> </head>
> <body>
> <h2>HTTP ERROR: 404</h2>
> <p>Problem accessing /solr/workQueue_shard2_replica_n3/update. Reason:
> <pre>    Can not find: /solr/workQueue_shard2_replica_n3/update</pre></p>
> <hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.3.20.v20170531</a><hr/>
> </body>
> </html>
> {code}
> What appears to be happening is that the test framework is having trouble 
> setting up the collection.
> Here is the test code:
> {code}
>   @Test
>   public void testParallelExecutorStream() throws Exception {
>     CollectionAdminRequest.createCollection("workQueue", "conf", 2, 1).process(cluster.getSolrClient());
>     AbstractDistribZkTestBase.waitForRecoveriesToFinish("workQueue", cluster.getSolrClient().getZkStateReader(),
>         false, true, TIMEOUT);
>     CollectionAdminRequest.createCollection("mainCorpus", "conf", 2, 1).process(cluster.getSolrClient());
>     AbstractDistribZkTestBase.waitForRecoveriesToFinish("mainCorpus", cluster.getSolrClient().getZkStateReader(),
>         false, true, TIMEOUT);
>     CollectionAdminRequest.createCollection("destination", "conf", 2, 1).process(cluster.getSolrClient());
>     AbstractDistribZkTestBase.waitForRecoveriesToFinish("destination", cluster.getSolrClient().getZkStateReader(),
>         false, true, TIMEOUT);
>     UpdateRequest workRequest = new UpdateRequest();
>     UpdateRequest dataRequest = new UpdateRequest();
>     for (int i = 0; i < 500; i++) {
>       workRequest.add(id, String.valueOf(i), "expr_s", "update(destination, batchSize=50, search(mainCorpus, q=id:"+i+", rows=1, sort=\"id asc\", fl=\"id, body_t, field_i\"))");
>       dataRequest.add(id, String.valueOf(i), "body_t", "hello world "+i, "field_i", Integer.toString(i));
>     }
>     workRequest.commit(cluster.getSolrClient(), "workQueue");
>     dataRequest.commit(cluster.getSolrClient(), "mainCorpus");
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
