thelabdude opened a new pull request #2128:
URL: https://github.com/apache/lucene-solr/pull/2128


   # Description
   
   Backport of #2067 to `branch_8x`
   
   For collections with many shards (or aliases with many collections and some 
shards), `CloudSolrStream` will end up creating a new `HttpSolrClient` for 
every `SolrStream` it opens because the cache key is the full core URL, such 
as: `http://127.0.0.1:63460/solr/collection4_shard4_replica_n6/`
   
   In addition, `CloudSolrStream#getSlices` was calling 
`clusterState.getCollectionsMap()`, which eagerly loads every 
`LazyCollectionRef` from ZK. That is unnecessary here, and on clusters with many 
collections it can slow down streaming expression execution.
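   
   To make the cost of resolving every lazy reference concrete, here is a 
minimal sketch that uses plain Java suppliers in place of Solr's cluster state 
classes. `LazyRefSketch`, its methods, and the `collectionN` names are 
hypothetical; the counter stands in for a round trip to ZK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyRefSketch {

    // Builds n lazy references; each one bumps the counter when it is resolved.
    static Map<String, Supplier<String>> lazyRefs(int n, AtomicInteger loads) {
        Map<String, Supplier<String>> refs = new LinkedHashMap<>();
        for (int i = 1; i <= n; i++) {
            final String name = "collection" + i;
            refs.put(name, () -> {
                loads.incrementAndGet(); // pretend this is a fetch from ZK
                return "state-of-" + name;
            });
        }
        return refs;
    }

    // Resolves every reference up front, then picks the one that was wanted.
    static int loadsWhenResolvingAll(int n, String wanted) {
        AtomicInteger loads = new AtomicInteger();
        Map<String, String> resolved = new LinkedHashMap<>();
        lazyRefs(n, loads).forEach((name, ref) -> resolved.put(name, ref.get()));
        resolved.get(wanted);
        return loads.get();
    }

    // Resolves only the reference that was asked for.
    static int loadsWhenResolvingOne(int n, String wanted) {
        AtomicInteger loads = new AtomicInteger();
        Supplier<String> ref = lazyRefs(n, loads).get(wanted);
        if (ref != null) {
            ref.get();
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println(loadsWhenResolvingAll(100, "collection4")); // prints 100
        System.out.println(loadsWhenResolvingOne(100, "collection4")); // prints 1
    }
}
```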
   
   # Solution
   
   In this PR, I've introduced a new ctor in `SolrStream` to pass the Replica's 
`baseUrl` and `core` as separate parameters. This leads to reusing the same 
`HttpSolrClient` for the same node because the cache key is now 
`http://127.0.0.1:63460/solr/`. I chose this new ctor approach because 
`CloudSolrStream` is not the only consumer of `SolrStream` and it knows how the 
list of URLs was constructed from cluster state, so it can safely make the 
decision about passing the core and reusing clients.
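   
   As a rough illustration of why the cache key matters, the sketch below uses 
a toy map-backed cache rather than Solr's real client cache. `FakeClient`, 
`FakeClientCache`, and the replica names other than 
`collection4_shard4_replica_n6` are hypothetical stand-ins, not code from this 
PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClientCacheSketch {

    /** Stand-in for an HTTP client; it only records the URL it was built for. */
    static final class FakeClient {
        final String url;
        FakeClient(String url) { this.url = url; }
    }

    /** Toy cache that creates one client per distinct key. */
    static final class FakeClientCache {
        private final Map<String, FakeClient> clients = new HashMap<>();
        FakeClient get(String key) { return clients.computeIfAbsent(key, FakeClient::new); }
        int size() { return clients.size(); }
    }

    /** Counts how many clients the toy cache creates for the given keys. */
    static int clientsCreated(List<String> keys) {
        FakeClientCache cache = new FakeClientCache();
        for (String key : keys) {
            cache.get(key);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        String baseUrl = "http://127.0.0.1:63460/solr/";
        // Hypothetical replicas that all live on the same node.
        List<String> cores = Arrays.asList(
            "collection4_shard2_replica_n2",
            "collection4_shard3_replica_n4",
            "collection4_shard4_replica_n6");

        List<String> coreUrlKeys = new ArrayList<>();
        List<String> baseUrlKeys = new ArrayList<>();
        for (String core : cores) {
            coreUrlKeys.add(baseUrl + core + "/"); // old key: full core URL
            baseUrlKeys.add(baseUrl);              // new key: node base URL only
        }

        System.out.println(clientsCreated(coreUrlKeys)); // prints 3
        System.out.println(clientsCreated(baseUrlKeys)); // prints 1
    }
}
```

   With the full core URL as the key, the number of clients grows with the 
number of replicas queried; with the base URL it grows only with the number of 
nodes.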
   
   When the request is sent to the remote core, we need to add the core name to 
the path. This happens in `SolrStream#constructParser`. This method was public 
and took a `SolrClient` (even though `SolrStream` already has an `HttpSolrClient` 
created in the `open` method); I've made it private and changed it to use the 
client created in the `open` method.
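   
   The path handling can be pictured with the small sketch below. It is not 
the code from `SolrStream#constructParser`; `CorePathSketch`, `requestPath`, 
and `fullUrl` are hypothetical helpers, and using `/stream` as the handler 
path is an assumption made for the example.

```java
public class CorePathSketch {

    // Prefixes the handler path with the core name when a core was supplied.
    static String requestPath(String core, String handler) {
        if (core == null || core.isEmpty()) {
            return handler;
        }
        return "/" + core + handler;
    }

    // Joins the node base URL and the request path without doubling the slash.
    static String fullUrl(String baseUrl, String core, String handler) {
        String trimmed = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        return trimmed + requestPath(core, handler);
    }

    public static void main(String[] args) {
        String baseUrl = "http://127.0.0.1:63460/solr/";
        // prints http://127.0.0.1:63460/solr/collection4_shard4_replica_n6/stream
        System.out.println(fullUrl(baseUrl, "collection4_shard4_replica_n6", "/stream"));
        // prints http://127.0.0.1:63460/solr/stream
        System.out.println(fullUrl(baseUrl, null, "/stream"));
    }
}
```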
   
   # Tests
   
   Added a new test `testCloudStreamClientCache` in `StreamingTest` to verify 
that the `SolrStream`s created by `CloudSolrStream` have the correct `baseUrl` 
(without the core).
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



