tboeghk commented on PR #2783: URL: https://github.com/apache/solr/pull/2783#issuecomment-3104512660
In addition to the great summary of @ardatezcan1 above, here are my practical tips and real-world scenarios for running Solr in a high-rpm, low-to-medium-dataset environment (like e-commerce applications).

__Best practices using Solr in high rpm environments__

Before starting to optimize your Solr setup, make sure to have strong observability in place. In addition to the [Solr Prometheus and Grafana](https://solr.apache.org/guide/solr/latest/deployment-guide/monitoring-with-prometheus-and-grafana.html) setup, I strongly recommend setting up the [Node Exporter](https://github.com/prometheus/node_exporter) to gather and correlate machine metrics.

* __Use Solr in cloud mode__: Running Solr in cloud mode against a ZooKeeper ensemble is a prerequisite for the following best practices. Cloud mode enables easy addition and removal of Solr cluster nodes depending on the current traffic.
* __Sharding__: Request processing in Solr is a single-threaded operation. The larger your dataset, the more latency you add to request processing. The only (sustainable) way to make query processing a multi-threaded operation is to shard your index. Depending on your workload, you could simply run multiple Solr instances on the same machine; I recommend a single Solr instance per machine, though.
* __Sharding strategies__: If your query processing strategy uses [collapse (and expand or grouping)](https://solr.apache.org/guide/solr/latest/query-guide/collapse-and-expand-results.html), make sure to put all documents with the same grouping key on the same shard. Adjust the [document routing](https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#document-routing) and set `router.field` to your grouping key.
* __Indexing and optimization strategies__: Indexing into a live collection adds significant latency to your search requests. Each commit flushes the internal caches, and those caches keep Solr running fast. Avoid any unnecessary cache flushes!
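The routing and commit settings above can be sketched against the Collections and Config APIs. This is a minimal sketch; the collection name `products`, the grouping field `group_id`, and the concrete timings are hypothetical examples, not values from the discussion:

```shell
# Create a sharded collection that routes documents by the grouping key,
# so collapse/expand on group_id never has to cross shard boundaries.
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=products&numShards=4&replicationFactor=2\
&router.name=compositeId&router.field=group_id"

# Commit sparingly: hard-commit for durability without opening a new
# searcher, and soft-commit (which flushes the caches) only rarely.
curl -X POST "http://localhost:8983/solr/products/config" \
  -H 'Content-Type: application/json' -d '{
    "set-property": {
      "updateHandler.autoCommit.maxTime": 60000,
      "updateHandler.autoCommit.openSearcher": false,
      "updateHandler.autoSoftCommit.maxTime": 300000
    }
  }'
```

With `openSearcher=false`, hard commits persist data without discarding the warmed caches; only the (rare) soft commit makes new documents visible.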
* __Optimize your index__: Manually optimizing your index is generally not recommended, but it delivers the best performance, as deleted documents are pruned from the index.
* __Rotate collections__: For smaller to medium datasets it might be a good strategy to periodically index your data into a new collection instead of updating an existing one. That way, request caches stay warm for the lifetime of a collection, and a manual optimize is possible. Use [collection aliases](https://solr.apache.org/guide/solr/latest/deployment-guide/aliases.html) to switch clients to the new collection.
* __Use dedicated node setups__: In high-traffic environments, a separation of concerns becomes more important. Use dedicated node types and machine sizings/setups for optimal performance tailored to each machine's role.
  * __Indexer__: Used solely for indexing products. Set up as the [`TLOG`](https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#types-of-replicas) replica type. Must not be used for request processing: exclude `TLOG` replicas from request processing using the [`shards.preference`](https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#shards-preference-parameter) parameter configured on your request handlers.
  * __Data__: Set up as a `PULL` replica. Replicates its index from the indexer nodes via SolrCloud. Using `TLOG` and `PULL` replicas keeps indexing load off the query-serving data nodes (unlike `NRT` replicas, where every replica indexes documents itself).
  * __Coordinator__: In sharded SolrCloud setups, these nodes coordinate the distributed request flow and assemble the final search result. This is a very CPU-intensive operation that is usually shared among the data nodes. Dedicated [coordinator nodes](https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html#coordinator-role) move the compute overhead of coordinating distributed requests off the data nodes.
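The replica-type and alias mechanics above can be sketched as follows. The collection and alias names (`products_v2`, `products`) and host names are hypothetical; the API actions and parameters are the standard Collections API ones:

```shell
# Add a query-serving PULL replica for one shard of a rotated collection.
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA\
&collection=products_v2&shard=shard1&type=PULL"

# Keep queries off the TLOG indexer replicas by preferring PULL replicas.
# shards.preference can also be set as a default on the request handler.
curl "http://localhost:8983/solr/products_v2/select?q=*:*\
&shards.preference=replica.type:PULL"

# Atomically switch clients to the freshly built (and optimized) collection.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS\
&name=products&collections=products_v2"

# Start a dedicated coordinator node (Solr 9.x node roles).
bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181 \
  -Dsolr.node.roles=coordinator:on,data:off
```

Because the alias swap is atomic, clients querying `products` never see a half-built collection.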
Adding coordinator nodes to a SolrCloud setup drops the resource usage on data nodes significantly. To make full use of coordinator nodes, direct all incoming request traffic to them.

* __JVM tuning__: I highly recommend running Solr on the _G1GC garbage collector_. Keep in mind the golden rule of leaving 50% of machine RAM to the OS disk cache on data and indexer nodes (i.e. size the heap at no more than half of RAM). As coordinator nodes are stateless, you can boost their performance significantly with the _ZGC garbage collector_: it slashes collection pauses from milliseconds to sub-millisecond.
* __Cloud setup__: Most SolrCloud setups will run in some kind of cloud environment. Here are some tips for setting up an elastic Solr cloud environment.
  * __Autoscaling__: Use a dedicated autoscaling group for each node type and each shard. Use tags to mark which instance should replicate which shard. Configure your heap settings dynamically and allow a wide range of instance types. Build a custom script to replicate data upon instance start, and use the Solr Collections API to [remove a node from the cluster](https://solr.apache.org/guide/solr/latest/deployment-guide/cluster-node-management.html#deletenode) during instance termination.
  * __Spot instances__: Coordinator and data nodes are great candidates for spot instances. This saves a significant share of your cloud spend.
  * __ARM instance types__: Utilize ARM instance types wherever possible. The Solr Docker image is also pre-built for ARM architectures. ARM CPUs offer the best bang for the buck and a more consistent response latency (as they are not power-managed).

If you need more information, or help compiling the whole of this into a single document, let me know!
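The JVM settings and the instance-termination hook described above can be sketched via `solr.in.sh` and the Collections API. The heap size and node name are hypothetical examples for a 32 GB machine:

```shell
# solr.in.sh on data/indexer nodes: G1GC, heap at most half of machine RAM
# so the other half stays available as OS disk cache.
SOLR_HEAP="16g"
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=100"

# solr.in.sh on stateless coordinator nodes: ZGC for minimal pauses.
# GC_TUNE="-XX:+UseZGC"

# Instance-termination hook: drain the node before the machine goes away.
curl "http://localhost:8983/solr/admin/collections?action=DELETENODE\
&node=ip-10-0-1-23.ec2.internal:8983_solr"
```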
