[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906293#comment-16906293
 ] 

Christine Poerschke commented on SOLR-13240:
--------------------------------------------

Thanks [~goodman] revising the patch, this time it passed QA (yay!) and I've 
gone ahead and pulled out two small separate bits from it into the commit above 
(to potentially make for an easier diff on the remaining changes).

So as a next step then I was trying to better understand the {{TestPolicy}} 
changes:
 * for TestPolicy.testReplicaCountSuggestions just a comment seems sufficient
 * for TestPolicy.testNodeLostMultipleReplica it was a bit tricky to understand 
first of all what the existing test actually does; with that in mind the 
just-attached patch includes two small refactors to surface the sequential 
nature of part of the test – would appreciate any thoughts on that.

With the small refactor in place we can more easily see that two sub-tests are 
being changed:
 * "lets change the policy such that all replicas that were on node1 can now 
fit on node2"
 * "now lets change the policy such that a node can have 2 shard2 replicas"

Now to the tricky bit. My interpretion of the revised test expectations is that 
now then the "a node can have 2 shardX replicas" condition is no longer being 
met i.e. previously r3 and r5 were expected to be on node2 but now r3 and r5 
are expected on different nodes. So whilst technically speaking the test 
passes, the coverage it intended to provide has been lost and so further 
adjustments to that sub-test might be needed to preserve the intended coverage. 
What do you think?

> UTILIZENODE action results in an exception
> ------------------------------------------
>
>                 Key: SOLR-13240
>                 URL: https://issues.apache.org/jira/browse/SOLR-13240
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 7.6
>            Reporter: Hendrik Haddorp
>            Priority: Major
>         Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
>     "status":500,
>     "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
>     "msg":"Comparison method violates its general contract!",
>     "rspCode":-1},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"Comparison method violates its general contract!",
>     "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
>  
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)\n\tat
>  java.lang.Thread.run(Thread.java:748)\n",
>     "code":500}} 
> The logs show this additional exception:
> 2019-02-10 00:09:00.539 ERROR 
> (OverseerThreadFactory-1268-thread-38-processing-n:agent2:9151_solr) [   ] 
> o.a.s.c.a.c.OverseerCollectionMessageHandler Operation utilizenode 
> failed:java.lang.IllegalArgumentException: Comparison method violates its 
> general contract!
>     at java.util.TimSort.mergeLo(TimSort.java:777)
>     at java.util.TimSort.mergeAt(TimSort.java:514)
>     at java.util.TimSort.mergeCollapse(TimSort.java:439)
>     at java.util.TimSort.sort(TimSort.java:245)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.tryEachNode(MoveReplicaSuggester.java:50)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.init(MoveReplicaSuggester.java:38)
>     at 
> org.apache.solr.client.solrj.cloud.autoscaling.Suggester.getSuggestion(Suggester.java:187)
>     at 
> org.apache.solr.cloud.api.collections.UtilizeNodeCmd.call(UtilizeNodeCmd.java:100)
>     at 
> org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:259)
>     at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:478)
>     at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) 
> I suspect this to be caused by this comparator: 
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/MoveReplicaSuggester.java#L98
> In case it is possible that both compared replicas are leaders the result 
> would not be correct.
> see also the mail thread about this:
> https://www.mail-archive.com/search?l=solr-u...@lucene.apache.org&q=subject:%22Re%5C%3A+CloudSolrClient+getDocCollection%22&o=newest&f=1



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to