[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847059#comment-17847059
 ] 

ASF subversion and git services commented on SOLR-17275:


Commit 53963292db39803996f76a5229ac3b91c6265b5e in solr's branch 
refs/heads/branch_9_6 from aparnasuresh85
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=53963292db3 ]

SOLR-17275: Revert SolrJ ZkClientClusterStateProvider SOLR-17153 (#2463)

SolrJ ZkClientClusterStateProvider: revert SOLR-17153 for perf regression when 
aliases are used.  Other parts of that issue are not reverted.  Affects only 
Solr 9.6.

Thanks Rafał Harabień for reporting the problem.

(cherry picked from commit f073dc86e1ab714c3e8eaa4a989698feb90f8a27)


> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Blocker
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additiona

[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847057#comment-17847057
 ] 

ASF subversion and git services commented on SOLR-17275:


Commit 9954fc467e70d3e261f17656c2819fcc2e8faffe in solr's branch 
refs/heads/branch_9x from aparnasuresh85
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=9954fc467e7 ]

SOLR-17275: Revert SolrJ ZkClientClusterStateProvider SOLR-17153 (#2463)

SolrJ ZkClientClusterStateProvider: revert SOLR-17153 for perf regression when 
aliases are used.  Other parts of that issue are not reverted.  Affects only 
Solr 9.6.

Thanks Rafał Harabień for reporting the problem.

(cherry picked from commit f073dc86e1ab714c3e8eaa4a989698feb90f8a27)


> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Blocker
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional

[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847054#comment-17847054
 ] 

ASF subversion and git services commented on SOLR-17275:


Commit f073dc86e1ab714c3e8eaa4a989698feb90f8a27 in solr's branch 
refs/heads/main from aparnasuresh85
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=f073dc86e1a ]

SOLR-17275: Revert SolrJ ZkClientClusterStateProvider SOLR-17153 (#2463)

SolrJ ZkClientClusterStateProvider: revert SOLR-17153 for perf regression when 
aliases are used.  Other parts of that issue are not reverted.  Affects only 
Solr 9.6.

Thanks Rafał Harabień for reporting the problem.

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Blocker
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-16 Thread Aparna Suresh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847033#comment-17847033
 ] 

Aparna Suresh commented on SOLR-17275:
--

Indeed I confirmed the same once I saw your response. Somehow my test didnt 
capture the interaction with Zk with how it was designed, but debugging 
revealed the true issue.

 

Another reason to revert the logic is to have CloudSolrClient not try to be 
up-to-date with global cluster state, which was the intention of the parent 
Jira - SOLR-17153.

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Blocker
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847007#comment-17847007
 ] 

Rafał Harabień commented on SOLR-17275:
---

[~aparnasuresh] I think communication with ZK actually happens: 
{noformat}
tryLazyCollection.get() {noformat}
calls
{noformat}
ZkStateReader::LazyCollectionRef::get(false) // allowCached = false{noformat}
-> cachedDocCollection will be null
-> shouldFetch will be true
-> ZkStateReader::getCollectionLive will be called-> 
ZkStateReader::fetchCollectionState will be called-> zkClient.getData() will be 
called
I confirmed it in a debugger.

Synchronized block may cause some small delay too, because we handle a 
relatively big traffic, but sending ZK request sounds like a bigger problem to 
me.

Anyway PR looks good to me so I'm looking forward to 9.6.1 release :)

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Blocker
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-

[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-15 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846758#comment-17846758
 ] 

David Smiley commented on SOLR-17275:
-

I'll merge the change tomorrow morning to ensure anyone else has an opportunity 
to review.
It's surprising to see that forceUpdateCollection has this global 
synchronization lock, and that it was *this*, not some extra ZK call, that is 
the root of the problem.

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Blocker
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-15 Thread Aparna Suresh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846729#comment-17846729
 ] 

Aparna Suresh commented on SOLR-17275:
--

I have reverted the change on the client side to call forceUpdateCollection() 
every time an alias is resolved. I think the performance impact was in terms of 
introducing contention in forceUpdateCollection(), since it synchronizes on 
ZkStateReader object, but should otherwise return pretty quickly, since calls 
to both 
{code:java}
clusterState.getCollectionRef(collection){code}
 and 
{code:java}
tryLazyCollection.get(){code}
 should return null for an alias.
 
Nevertheless the revert makes sense since the call to 
{code:java}
CloudSolrClient.resolveAlias(){code}
 should not ultimately call 
{code:java}
ZkStateReader.forceUpdateCollection(){code}
 since the latter is relevant only for collections. 

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Blocker
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---

[jira] [Commented] (SOLR-17275) Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases

2024-05-10 Thread Aparna Suresh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845438#comment-17845438
 ] 

Aparna Suresh commented on SOLR-17275:
--

Thanks for reporting. I will check in a fix by early next week.

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> 
>
> Key: SOLR-17275
> URL: https://issues.apache.org/jira/browse/SOLR-17275
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 9.6.0
> Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>Reporter: Rafał Harabień
>Priority: Major
> Fix For: 9.6.1
>
> Attachments: image-2024-05-06-17-23-42-236.png
>
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (at ~14:30 was deployed the new version):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>   at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>   -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>   at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>   at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>   at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>   at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>   ...
>   Number of locked synchronizers = 1
>   - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
>  function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
>  commit
>  changed the logic in ZkClientClusterStateProvider.getState so the mentioned 
> function gets called if clusterState.getCollectionRef [returns 
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
>  In 9.5.0 it wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (collectionStates map contains only collections). 
> In my application all collections are referenced using aliases so I guess 
> that's why I can see the regression in Solr response time.
> I am not familiar with the code enough to prepare a PR but I hope this 
> insight will be enough to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org