[
https://issues.apache.org/jira/browse/SOLR-17665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926192#comment-17926192
]
Luke Kot-Zaniewski edited comment on SOLR-17665 at 2/11/25 9:49 PM:
--------------------------------------------------------------------
As [~dsmiley] already alluded to, the reason simply setting multiThreaded=false
doesn't revert to the previous behavior/performance is that once
indexSearcherExecutorThreads>0, the Solr core creates a new thread pool
executor (TPE) and passes it to Lucene's IndexSearcher. Once Lucene's
IndexSearcher has an Executor, it is liable to use it in ways that aren't
governed by Solr's multiThreaded flag. One way to ensure that multiThreaded
completely controls this feature is to create a separate IndexSearcher (via
SolrIndexSearcher) without any Executor, to be used when multiThreaded=false.
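To make the "separate searcher without an Executor" idea concrete, here is a
rough sketch in plain Java. It is only an illustration of the selection logic
under assumptions of mine: SketchSearcher and SearcherSelection are
hypothetical stand-ins, not Lucene's IndexSearcher or Solr's
SolrIndexSearcher, and the real change would have to deal with searcher
lifecycle, caches and reference counting that this ignores.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an index searcher; NOT Lucene's or Solr's class.
final class SketchSearcher {
    // null means "no executor": all work stays on the calling thread.
    private final Executor executor;

    SketchSearcher(Executor executor) {
        this.executor = executor;
    }

    boolean hasExecutor() {
        return executor != null;
    }
}

// Hypothetical holder that keeps two searchers and picks one per request,
// so that the request flag alone decides whether an executor is reachable.
final class SearcherSelection {
    private final SketchSearcher parallel;
    private final SketchSearcher serial = new SketchSearcher(null);

    SearcherSelection(Executor pool) {
        // pool may be null (executor disabled); then both searchers are serial.
        this.parallel = new SketchSearcher(pool);
    }

    SketchSearcher forRequest(boolean multiThreaded) {
        return multiThreaded ? parallel : serial;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            SearcherSelection selection = new SearcherSelection(pool);
            System.out.println("multiThreaded=false -> executor attached: "
                + selection.forRequest(false).hasExecutor());
            System.out.println("multiThreaded=true  -> executor attached: "
                + selection.forRequest(true).hasExecutor());
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of keeping two instances is that code deeper in the search path
never has to consult the request flag: if the searcher it was handed has no
executor, there is nothing for it to parallelize on.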
Still, the performance issues are quite puzzling and troubling. In case it is
of any use, I am attaching JFR profiles of 3 different configurations of the
benchmarks discussed on the mailing list under "Identifying performance issues
in Solr 9.7/9.8":
1. multiThreaded=false, indexSearcherExecutorThreads=cores (default)
[^flight-9.8-mt-false-executor-enabled.jfr]
2. multiThreaded=false, indexSearcherExecutorThreads=-1
[^flight-9.8-executor-disabled.jfr]
3. multiThreaded=true, indexSearcherExecutorThreads=cores (default)
[^flight-9.8-mt-true.jfr]
One spot where I noticed a large divergence is in TermStates, which the
benchmark's term queries depend on. The reason I looked at TermStates more
closely is that it is the one part of Lucene (that I could find) that
parallelizes work on the aforementioned executor even if you set
multiThreaded=false in your requests. I was surprised that disabling the
thread pool causes significantly _more_ samples to appear in
TermStates::build and that this _reduces_ the latency of each query. In the
JFR with multiThreaded=false but with indexSearcherExecutorThreads=cores,
there are considerably fewer samples captured in TermStates::build, yet the
term queries suffer from worse latency.
Setting multiThreaded=true doesn't seem to really improve the performance of
this benchmark on my 16-core machine (WSL on Windows with a 16-core
i7-11850H). The benchmark appears to run 1 query at a time, so it's possible
the benefits of parallelism are never reached at this level of load (the low
CPU utilization reported in the JFR recording seems to support this). But the
~4X degradation of single-query latency with the custom executor enabled is
suspicious.
> Perf regression: overhead of multiThreaded=false should be nothing
> ------------------------------------------------------------------
>
> Key: SOLR-17665
> URL: https://issues.apache.org/jira/browse/SOLR-17665
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 9.7
> Reporter: David Smiley
> Priority: Major
> Attachments: flight-9.8-executor-disabled.jfr,
> flight-9.8-mt-false-executor-enabled.jfr, flight-9.8-mt-true.jfr
>
>
> SOLR-13350 introduced {{multiThreaded}} query param, defaulting to false,
> thus opt-in. But there is still a serious performance impact; only setting
> {{indexSearcherExecutorThreads}} to -1 in solr.xml mitigates the regression.