JiaBaoGao created SOLR-17670:
--------------------------------
Summary: Fix unnecessary memory allocation caused by a large
reRankDocs param
Key: SOLR-17670
URL: https://issues.apache.org/jira/browse/SOLR-17670
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: JiaBaoGao
The reRank function has a reRankDocs parameter that specifies the number of
documents to re-rank. While testing its performance impact, I observed that
queries become progressively slower as this parameter increases.
Counterintuitively, the slowdown continues even after the parameter value
exceeds the total number of documents in the index. I therefore investigated
the code:
For a query containing re-ranking, such as:
{code:json}
{
  "start": "0",
  "rows": 10,
  "fl": "ID,score",
  "q": "*:*",
  "rq": "{!rerank reRankQuery='{!func} 100' reRankDocs=1000000000 reRankWeight=2}"
}
{code}
The current execution logic is as follows:
1. Perform normal retrieval using the q parameter.
2. Re-score all documents retrieved in the q phase using the rq parameter.
During the retrieval in phase 1 (using q), a TopScoreDocCollector is created.
Internally, this builds a PriorityQueue backed by an Object[] whose length is
allocated up front from reRankDocs, so it grows with reRankDocs without any
limit.
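A minimal sketch of the allocation pattern described above (illustrative only, not Lucene's actual TopScoreDocCollector code; the class name is hypothetical): the collector's backing array is sized from the requested hit count, not from how many documents the index actually holds.

```java
// Sketch: a top-N collector that, like the behavior described above,
// eagerly allocates its backing Object[] from the requested numHits.
class TopNCollectorSketch {
    final Object[] heap;

    TopNCollectorSketch(int numHits) {
        // With numHits = reRankDocs = 1000000000, this is a multi-gigabyte
        // allocation request even if the index holds only a few thousand docs.
        heap = new Object[numHits];
    }
}
```

This is why increasing reRankDocs keeps slowing the query (and can OOM) even past the index size: the allocation depends only on the parameter.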
On my local test cluster with limited JVM memory, this can even trigger an OOM,
causing the Solr node to crash. I can also reproduce the OOM situation using
the SolrCloudTestCase unit test.
I think capping the length of the Object[] with
searcher.getIndexReader().maxDoc() in ReRankCollector would resolve this
issue: once reRankDocs exceeds maxDoc, memory allocation would no longer
grow.
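A hedged sketch of the proposed cap (illustrative, not the actual patch; the class and method names are hypothetical, though searcher.getIndexReader().maxDoc() is the real Lucene accessor referenced above):

```java
// Sketch of the proposed fix: clamp the collector size to the number of
// documents in the index before allocating, as ReRankCollector could do
// with the value of searcher.getIndexReader().maxDoc().
class ReRankClampSketch {
    static int effectiveLen(int reRankDocs, int maxDoc) {
        // Never allocate more heap slots than there are documents.
        return Math.min(reRankDocs, maxDoc);
    }
}
```

With this clamp, reRankDocs=1000000000 against a 5000-document index would allocate only 5000 slots.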
--
This message was sent by Atlassian Jira
(v8.20.10#820010)