[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reopened SOLR-13445:
-----------------------------

Jenkins has found at least 2 problems with the new 
RoutingToNodesWithPropertiesTest class...

[https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.x-Linux/536/]
----
First: a reproducing failing seed (on branch_8x)...
{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=RoutingToNodesWithPropertiesTest -Dtests.method=test -Dtests.seed=13525A4073A0EB3F -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=zh-HK -Dtests.timezone=Brazil/Acre -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 0.45s J1 | RoutingToNodesWithPropertiesTest.test <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: Hitting same zone after 10 queries
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([13525A4073A0EB3F:9B06659ADD5C86C7]:0)
   [junit4]    >        at org.apache.solr.cloud.RoutingToNodesWithPropertiesTest.test(RoutingToNodesWithPropertiesTest.java:251)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}
At a glance, the problem seems to be that the test assumes that if it tries a 
query 10 times, at least one of those queries will hit 2 nodes in different 
"zones". But there's no guarantee of that, it's pure dumb luck. It's like 
having a test that calls {{random().nextInt(2)}} in a loop 10 times and asserts 
that it got a value of "0" in at least one iteration ... it's statistically 
going to fail some fixed percentage of the time.
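
To put a rough number on that analogy, here is a minimal sketch. It assumes 
each try is independent and succeeds with probability 0.5, exactly as in the 
{{random().nextInt(2)}} comparison; the real per-query odds in the test depend 
on the cluster layout and are not stated here. The class and method names are 
hypothetical.

```java
public class FlakyOdds {

    /**
     * Probability that every one of `tries` independent attempts misses,
     * when a single attempt succeeds with probability `pHit`.
     */
    static double allMissProbability(double pHit, int tries) {
        return Math.pow(1.0 - pHit, tries);
    }

    public static void main(String[] args) {
        // Assumption: a 50% chance per try, mirroring random().nextInt(2).
        // The actual chance of a query crossing zones is not known from the report.
        double p = allMissProbability(0.5, 10);
        System.out.println("chance that all 10 tries miss: " + p);
        System.out.println("that is about 1 run in " + Math.round(1.0 / p));
    }
}
```

Under that assumption the all-miss case is (1/2)^10, or 1 run in 1024: rare, 
but certain to show up eventually on a CI server that runs the suite with a 
fresh seed every time.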
----
Second: when Jenkins tries to reproduce the seed, it runs with 
{{-Dtests.dups=5}}, but this causes an initialization failure in the 
BeforeClass method ... I'm not certain, but at a glance I'm guessing this is 
because of static variables that aren't being cleaned up in the AfterClass 
method?
{noformat}
   [junit4] ERROR   0.00s J2 | RoutingToNodesWithPropertiesTest (suite) <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: expected:<us-west1> but was:<null>
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([13525A4073A0EB3F]:0)
   [junit4]    >        at org.apache.solr.cloud.RoutingToNodesWithPropertiesTest.setupCluster(RoutingToNodesWithPropertiesTest.java:115)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
 {noformat}
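
For illustration only, here is a plain-Java sketch of how leftover static 
state could produce that kind of "expected:<us-west1> but was:<null>" failure 
when a suite is repeated inside one JVM. This is not the actual test code: the 
flag, the {{sketch.zone}} property name, and the method bodies are all 
hypothetical stand-ins for a BeforeClass/AfterClass pair, and the real cause 
has not been confirmed.

```java
import java.util.Objects;

public class StaticLeakSketch {

    // Hypothetical static flag that outlives a single suite run.
    static boolean zoneConfigured = false;

    // Stand-in for a @BeforeClass method: configures the zone only once.
    static void setupCluster() {
        if (!zoneConfigured) {
            System.setProperty("sketch.zone", "us-west1");
            zoneConfigured = true;
        }
        String zone = System.getProperty("sketch.zone");
        if (!Objects.equals("us-west1", zone)) {
            throw new AssertionError("expected:<us-west1> but was:<" + zone + ">");
        }
    }

    // Stand-in for an @AfterClass method: clears the property, and resets
    // the static flag only when asked to.
    static void tearDownCluster(boolean resetStatics) {
        System.clearProperty("sketch.zone");
        if (resetStatics) {
            zoneConfigured = false;
        }
    }

    /** Repeats the suite `dups` times in this JVM; returns how many setups failed. */
    static int runSuite(int dups, boolean resetStatics) {
        // Start every experiment from a clean slate.
        zoneConfigured = false;
        System.clearProperty("sketch.zone");
        int failures = 0;
        for (int i = 0; i < dups; i++) {
            try {
                setupCluster();
            } catch (AssertionError e) {
                failures++;
            }
            tearDownCluster(resetStatics);
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures without static reset: " + runSuite(5, false));
        System.out.println("failures with static reset:    " + runSuite(5, true));
    }
}
```

In this sketch the first run passes and every later run fails unless the 
teardown also resets the static flag, which is the pattern to look for if the 
guess about AfterClass cleanup is right.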

> Preferred replicas on nodes with same system properties as the query master
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-13445
>                 URL: https://issues.apache.org/jira/browse/SOLR-13445
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>             Fix For: master (9.0), 8.2
>
>         Attachments: SOLR-13445.patch, SOLR-13445.patch, SOLR-13445.patch
>
>
> Currently, Solr chooses a random replica for each shard to fan out the query 
> request. However, this presents a problem when running Solr in multiple 
> availability zones.
> If one availability zone fails then it affects all Solr nodes because they 
> will try to connect to Solr nodes in the failed availability zone until the 
> request times out. This can lead to a buildup of threads on each Solr node 
> until the node runs out of memory, resulting in a cascading failure.
> This issue tries to solve this problem by adding
> * another shardPreference param named {{node.sysprop}}, so the query will be 
> routed to nodes with the same defined system properties as the current one.
> * default shardPreferences for the whole cluster, which will be stored in 
> {{/clusterprops.json}}.
> * a cache that fetches other nodes' system properties whenever /live_nodes 
> changes.



