In the /etc/jetty.xml delivered with Solr, maxThreads is set to 10,000.
Why is maxThreads for Jetty with Solr set so high?
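For reference, the setting lives in the thread pool section of jetty.xml. A fragment along these lines is what I mean (class and element names vary between Jetty versions, so treat this as a sketch, not the exact shipped file):

```xml
<!-- Sketch of the thread pool section in jetty.xml; the exact class
     name depends on the Jetty version bundled with Solr. -->
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
  </New>
</Set>
```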


The default case for a Solr instance is that processing is isolated to
machines controlled by the owner: Performance is dictated by local CPU,
memory & storage latency. There is no waiting for external services.

With this in mind, I would not expect throughput to rise after a certain
number of concurrent searches: When CPU and storage are both maxed out,
starting new threads just increases the processing time of the already
running ones.

A fairly beefy machine nowadays might be 24 Hyper-Threaded cores with
SSDs in RAID 0 as the backend. Let's say that CPU is the bottleneck here.
We multiply by 2 (way too much) for the Hyper-Threading and by another 2
to compensate for locks. Back of the envelope says that more than ~100
concurrent searches on that machine will not increase throughput.
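Written out, the back-of-envelope looks like this; the inputs (24 cores, the two factors of 2) are the assumptions from the paragraph above, not measurements:

```python
# Back-of-envelope estimate of useful search concurrency on a
# CPU-bound Solr box. All inputs are assumptions, not measurements.
def useful_concurrency(cores, ht_factor=2, lock_factor=2):
    """Rough upper bound on concurrent searches that still add throughput."""
    return cores * ht_factor * lock_factor

# 24 Hyper-Threaded cores, x2 for HT (generous), x2 for lock waits:
print(useful_concurrency(24))  # 96, i.e. roughly 100
```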

Fewer CPUs or slower storage would only lower that number. Cache hits
are a wild card here, as they take very little CPU, but it would take a
cache hit rate of 99% for the 10,000 threads to yield increased
throughput.



Setting maxThreads "too high" is bad, according to
http://67-23-9-112.static.slicehost.net/doc/optimization.html

The obvious problem for Solr is memory usage, as some searches require a
non-trivial amount of temporary heap space; notably faceting, which
needs a bitset for the hits plus structures for counting (int[] or
HashMap, depending on implementation). A modest index of 5M documents
with 100K unique values in a facet field (used with fc or DocValues)
takes 1MByte+ of memory for a single search, or 10GByte+ with 10,000.
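The arithmetic behind those numbers, as I count it (one bit per document for the hit set, one int counter per unique facet value; both sizes are assumptions from the example above):

```python
# Rough per-search heap cost of faceting on a 5M-document index with
# 100K unique facet values. Sizes are assumptions, not measurements.
docs = 5_000_000
unique_values = 100_000

bitset_bytes = docs / 8             # one bit per document for the hit set
counter_bytes = unique_values * 4   # int[] with one counter per value

per_search_mb = (bitset_bytes + counter_bytes) / 1_000_000
print(per_search_mb)            # ~1 MByte per search
print(per_search_mb * 10_000)   # ~10 GByte with 10,000 concurrent searches
```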

The potentially huge discrepancy between the memory requirements under
light load vs. heavy load seems like a trap. An unplanned burst of
requests might very well bring the installation down.

Alternatively one could over-allocate memory to guard against OOM, but
since throughput does not increase past a given number of concurrent
searches (where "a given number" is, by my logic, far less than 10,000),
that memory is essentially wasted: Queueing would give the same
throughput with lower memory requirements.


By the logic above, maxThreads of 100 or maybe 200 would be an
appropriate default for Jetty with Solr. So why the 10,000?

- Toke Eskildsen, State and University Library, Denmark



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
