[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-13 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838635#comment-16838635
 ] 

Tomás Fernández Löbbe commented on SOLR-13445:
--

Nice! Thanks for adding this Dat!

> Preferred replicas on nodes with same system properties as the query master
> ---
>
> Key: SOLR-13445
> URL: https://issues.apache.org/jira/browse/SOLR-13445
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Cao Manh Dat
> Assignee: Cao Manh Dat
> Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: SOLR-13445.patch, SOLR-13445.patch, SOLR-13445.patch
>
>
> Currently, Solr chooses a random replica for each shard to fan out the query 
> request. However, this presents a problem when running Solr in multiple 
> availability zones.
> If one availability zone fails, it affects all Solr nodes because they 
> will try to connect to Solr nodes in the failed availability zone until the 
> request times out. This can lead to a buildup of threads on each Solr node 
> until the node runs out of memory, which results in a cascading failure.
> This issue tries to solve this problem by adding:
> * another shardPreference param named {{node.sysprop}}, so the query will be 
> routed to nodes with the same defined system properties as the current one.
> * default shardPreferences for the whole cluster, which will be stored in 
> {{/clusterprops.json}}.
> * a cacher that fetches other nodes' system properties whenever /live_nodes 
> changes.
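
The routing idea in the description can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the function names, the node names, and the "zone" property are all hypothetical, and the exact request syntax for {{node.sysprop}} is not shown in this thread, so the sketch takes property names directly.

```python
from typing import Dict, List

# Hypothetical model: node name -> system properties reported by that node.
NodeProps = Dict[str, Dict[str, str]]


def same_props(local: Dict[str, str], remote: Dict[str, str], keys: List[str]) -> bool:
    """True when every listed property is defined locally and equal on both nodes."""
    return all(k in local and local.get(k) == remote.get(k) for k in keys)


def order_replicas(replicas: List[str], local_node: str,
                   props: NodeProps, keys: List[str]) -> List[str]:
    """Move replicas hosted on nodes that share the local node's properties to the front.

    Replicas are identified by the node that hosts them. sorted() is stable,
    so the relative order inside each group is preserved.
    """
    local = props.get(local_node, {})
    return sorted(
        replicas,
        key=lambda node: not same_props(local, props.get(node, {}), keys),
    )


# Example: n1 and n3 run in zone "a", n2 runs in zone "b".
props = {"n1": {"zone": "a"}, "n2": {"zone": "b"}, "n3": {"zone": "a"}}
preferred = order_replicas(["n2", "n3", "n1"], "n1", props, ["zone"])
```

With the sample data, a query coordinated by n1 would try n3 and n1 before n2, so a failure of zone "b" is less likely to tie up threads on nodes in zone "a". If the local node does not define the property at all, the order is left unchanged.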



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-10 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837327#comment-16837327
 ] 

ASF subversion and git services commented on SOLR-13445:


Commit 3e300e1a94d74dcdea0496ca9c908deb7313db0e in lucene-solr's branch 
refs/heads/branch_8x from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3e300e1 ]

SOLR-13445: Hardness the test





[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-10 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837326#comment-16837326
 ] 

ASF subversion and git services commented on SOLR-13445:


Commit 6a06bcd47007870d4148d1f131bbdbd8f0924a31 in lucene-solr's branch 
refs/heads/master from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6a06bcd ]

SOLR-13445: Hardness the test





[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-10 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837323#comment-16837323
 ] 

Cao Manh Dat commented on SOLR-13445:
-

Hi [~hossman], thanks a lot for your comment. 
It seems that with -Dtests.dups, static instances are reused, which makes the 
code query nodes from previous tests that are already gone.
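
A rough sketch of the failure mode described above, assuming a process-wide cache that outlives a single test run. The class and method names are hypothetical and this is not the fix that was committed; it only shows why stale entries survive repeated runs and how pruning against the live node set removes them.

```python
from typing import Dict, Iterable, Set


class SysPropsCache:
    """Hypothetical stand-in for a static (process-wide) cache of node properties."""

    # Class attribute: shared by every test that runs in the same process.
    _cache: Dict[str, Dict[str, str]] = {}

    @classmethod
    def put(cls, node: str, props: Dict[str, str]) -> None:
        cls._cache[node] = dict(props)

    @classmethod
    def known_nodes(cls) -> Set[str]:
        return set(cls._cache)

    @classmethod
    def retain(cls, live_nodes: Iterable[str]) -> None:
        """Drop entries for nodes that are no longer live."""
        live = set(live_nodes)
        for node in list(cls._cache):
            if node not in live:
                del cls._cache[node]


# First run registers a node, then that node goes away.
SysPropsCache.put("127.0.0.1:1000_solr", {"zone": "a"})
# Second run in the same process starts a different node.
SysPropsCache.put("127.0.0.1:2000_solr", {"zone": "a"})
stale = SysPropsCache.known_nodes()   # still contains the node from the first run

SysPropsCache.retain(["127.0.0.1:2000_solr"])
fresh = SysPropsCache.known_nodes()   # only the live node remains
```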




[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835796#comment-16835796
 ] 

ASF subversion and git services commented on SOLR-13445:


Commit 2ec14f0323e0ed28d36ac984f79c00890b26271d in lucene-solr's branch 
refs/heads/branch_8x from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2ec14f0 ]

SOLR-13445: Fix precommit





[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835795#comment-16835795
 ] 

ASF subversion and git services commented on SOLR-13445:


Commit 81cfbcd0096b85d98c38dec038e2934bfaa271ca in lucene-solr's branch 
refs/heads/master from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=81cfbcd ]

SOLR-13445: Fix precommit





[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835756#comment-16835756
 ] 

ASF subversion and git services commented on SOLR-13445:


Commit 8a1b966165339f017e0f1afb736b0afb939a0510 in lucene-solr's branch 
refs/heads/branch_8x from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8a1b966 ]

SOLR-13445: Preferred replicas on nodes with same system properties as the 
query master





[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835754#comment-16835754
 ] 

ASF subversion and git services commented on SOLR-13445:


Commit 6b5b74bc9c9576913a5124eec138938e09037dad in lucene-solr's branch 
refs/heads/master from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6b5b74b ]

SOLR-13445: Preferred replicas on nodes with same system properties as the 
query master





[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-08 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835486#comment-16835486
 ] 

Cao Manh Dat commented on SOLR-13445:
-

I made a minor change to the patch because of a TestHttpShardHandlerFactory failure.




[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-07 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834405#comment-16834405
 ] 

Shalin Shekhar Mangar commented on SOLR-13445:
--

Thanks Dat. A few comments:

# Minor nit: Rename HttpShardHandlerFactory#sameMetric to hasSameMetric
# Can you do an exponential backoff in NodesSysPropsCacher#fetchRemoteProps?
# The RoutingToNodesWithPropertiesTest needs a better check than comparing 
shardAddress. The reason is that shardAddress is set by the GET_TOP_IDS phase 
but not by other phases such as GET_FIELDS. Use TrackingShardHandlerFactory 
instead.
# I agree with Andrzej that the fix to SolrClientNodeStateProvider should go to 
a separate issue so that it can be backported to 7_7 if needed.
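
For the second point above, an exponential backoff could look roughly like this. It is only a sketch under assumptions: the function name, the default delays, and the attempt limit are made up for illustration and do not reflect what NodesSysPropsCacher#fetchRemoteProps actually does.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_backoff(fetch: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.5,
                       max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as e:  # a real implementation would catch narrower errors
            last_error = e
            if attempt < max_attempts - 1:
                # Delays grow as base, 2*base, 4*base, ... and are capped at max_delay.
                sleep(min(base_delay * (2 ** attempt), max_delay))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The sleep function is injectable so that a test can record the delays instead of actually waiting.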




[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-03 Thread Andrzej Bialecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832610#comment-16832610
 ] 

Andrzej Bialecki  commented on SOLR-13445:
--

The bug in {{SolrClientNodeStateProvider}} should be fixed in other active 
branches too, regardless of this improvement.




[jira] [Commented] (SOLR-13445) Preferred replicas on nodes with same system properties as the query master

2019-05-03 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832436#comment-16832436
 ] 

Cao Manh Dat commented on SOLR-13445:
-

I had several private conversations with [~shalinmangar] about how to deal with 
this issue, and he helped a lot. Thanks [~shalinmangar].

Besides implementing the features mentioned in the description, the attached 
patch also fixes an issue in {{SolrClientNodeStateProvider}}: it always retried 
querying metrics from other nodes, even when the query had just succeeded. 
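
The intended behaviour of such a retry loop can be sketched as below. This is an illustration only, with hypothetical names; it is not the code in {{SolrClientNodeStateProvider}}. The point is simply that a successful call must end the loop.

```python
from typing import Callable, Optional, TypeVar

R = TypeVar("R")


def query_metrics(query: Callable[[], R], max_retries: int = 3) -> R:
    """Run query() up to max_retries times and stop at the first success."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries):
        try:
            # Returning here is what prevents further calls after a success.
            return query()
        except ConnectionError as e:
            last_error = e
    raise RuntimeError("metrics query failed") from last_error
```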
