Huy Le created LUCENE-8800:
------------------------------
Summary: FieldsReader#terms poor performance on an index with many fields
Key: LUCENE-8800
URL: https://issues.apache.org/jira/browse/LUCENE-8800
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs
Affects Versions: 8.0
Reporter: Huy Le
Attachments: Screen Shot 2019-05-15 at 5.08.26 pm.png
We have experienced poor performance on an index with many fields whose names
share a common prefix. Stack sampling with JProfiler showed a hotspot in the method
FieldsReader#terms.
!Screen Shot 2019-05-15 at 5.08.26 pm.png!
Looking at the source code, I see that a TreeMap is used to map field names to
FieldsProducer instances, which means each lookup incurs O(log N) string comparisons.
{code:java}
private static class FieldsReader extends FieldsProducer {
  ...
  private final Map<String,FieldsProducer> fields = new TreeMap<>();
  ...
  @Override
  public Terms terms(String field) throws IOException {
    FieldsProducer fieldsProducer = fields.get(field);
    return fieldsProducer == null ? null : fieldsProducer.terms(field);
  }
}
{code}
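To make the O(log N) cost concrete, here is a small standalone sketch (not Lucene code) that fills a TreeMap with 6000 hypothetical customfield_* names using a counting comparator, then reports how many String comparisons a single get() performs. The class and method names are made up for illustration. Since TreeMap is a red-black tree, a lookup among 6000 keys should need at most about 2 * log2(6001), roughly 25, comparisons.

{code:java}
import java.util.Comparator;
import java.util.TreeMap;

// Illustrative sketch: counts String comparisons made by TreeMap.get().
// Field names are hypothetical, following the "customfield_*" pattern.
public class TreeMapLookupCost {
    static int comparisons = 0;

    public static int comparisonsForLookup(int numFields, String key) {
        // Comparator that delegates to String.compareTo and counts each call.
        Comparator<String> counting = (a, b) -> {
            comparisons++;
            return a.compareTo(b);
        };
        TreeMap<String, Integer> fields = new TreeMap<>(counting);
        for (int i = 0; i < numFields; i++) {
            fields.put("customfield_" + i, i);
        }
        comparisons = 0; // reset so that only the lookup below is counted
        fields.get(key);
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(comparisonsForLookup(6000, "customfield_4321"));
    }
}
{code}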
The problem becomes much worse when field names are long and share a common
prefix, because each comparison has to walk through the shared prefix before it
finds a differing character (and through the entire string when the names are equal).
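The following sketch models why a shared prefix hurts. charsExamined is a hypothetical helper that mirrors the character-by-character loop behind String.compareTo; it is not a JDK method. Recent JDKs may compare several characters at a time, but the work still grows with the length of the shared prefix.

{code:java}
// Illustrative model of lexicographic comparison cost.
// charsExamined is a hypothetical helper, not part of the JDK.
public class PrefixComparisonCost {
    // Returns how many character positions are inspected before the result is known.
    public static int charsExamined(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i + 1; // the differing character was the (i+1)-th one inspected
            }
        }
        return limit; // every shared position matched; the length difference decides
    }

    public static void main(String[] args) {
        System.out.println(charsExamined("customfield_1234", "customfield_1235")); // 16
        System.out.println(charsExamined("apple", "banana")); // 1
    }
}
{code}

With the 12-character prefix customfield_, every comparison between two such names inspects at least 13 characters, and a TreeMap lookup repeats that for each node on the search path.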
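In our case, the index has around 6000 fields of the form customfield_*. Could we
change the TreeMap to a HashMap to improve the situation, or to a LinkedHashMap if
we want to preserve the sorted order? (A LinkedHashMap iterates in insertion order,
so it keeps the sorted order as long as the fields are inserted in sorted order.)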
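A minimal sketch of the proposed change, assuming a hypothetical Producer interface in place of Lucene's FieldsProducer so that it compiles on its own. Copying a TreeMap into a LinkedHashMap keeps the sorted iteration order (the TreeMap hands entries over in sorted order, and LinkedHashMap preserves insertion order), while get() becomes a hash lookup.

{code:java}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch; Producer is a hypothetical stand-in for FieldsProducer.
public class FieldLookupSketch {
    public interface Producer {
        String terms(String field);
    }

    // Sort once via TreeMap, then copy into a LinkedHashMap:
    // iteration stays sorted, but lookups no longer walk a tree.
    public static Map<String, Producer> sortedHashIndex(Map<String, Producer> unsorted) {
        return new LinkedHashMap<>(new TreeMap<>(unsorted));
    }

    // Same shape as FieldsReader#terms: null when the field is unknown.
    public static String terms(Map<String, Producer> fields, String field) {
        Producer p = fields.get(field);
        return p == null ? null : p.terms(field);
    }

    public static void main(String[] args) {
        Map<String, Producer> raw = new HashMap<>();
        for (String name : new String[] {"customfield_2", "customfield_10", "customfield_1"}) {
            raw.put(name, f -> "terms of " + f);
        }
        Map<String, Producer> fields = sortedHashIndex(raw);
        System.out.println(fields.keySet()); // [customfield_1, customfield_10, customfield_2]
        System.out.println(terms(fields, "customfield_10")); // terms of customfield_10
        System.out.println(terms(fields, "missing")); // null
    }
}
{code}

With a hash-based map, a hit costs one hashCode computation (cached by String after the first call on a given instance) plus one equals check, instead of O(log N) compareTo calls along the tree path.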
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)