[ https://issues.apache.org/jira/browse/LUCENE-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Huy Le updated LUCENE-8800:
---------------------------
    Summary: FieldsReader#terms poor performance on a index with many field names sharing common prefix  (was: FieldsReader#terms poor performance on a index with many fields)

> FieldsReader#terms poor performance on a index with many field names sharing common prefix
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8800
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8800
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 8.0
>            Reporter: Huy Le
>            Priority: Major
>         Attachments: Screen Shot 2019-05-15 at 5.08.26 pm.png
>
>
> We have experienced poor performance on an index with many fields whose
> names share a common prefix. Stack sampling with JProfiler showed a hotspot
> in the method FieldsReader#terms.
> !Screen Shot 2019-05-15 at 5.08.26 pm.png!
> Looking at the source code, I saw that a TreeMap is used to map field names
> to FieldsProducer instances, which means a lookup incurs O(log N) string
> comparisons.
> {code:java}
> private static class FieldsReader extends FieldsProducer {
>   ...
>   private final Map<String,FieldsProducer> fields = new TreeMap<>();
>   ...
>   @Override
>   public Terms terms(String field) throws IOException {
>     FieldsProducer fieldsProducer = fields.get(field);
>     return fieldsProducer == null ? null : fieldsProducer.terms(field);
>   }
> }
> {code}
> The problem becomes much worse when field names are long and share a common
> prefix, because each comparison has to scan past the shared prefix before it
> reaches the first differing character.
> In our case, the index has around 6000 fields of the form customfield_*. I
> wonder whether we could improve the situation by changing the TreeMap to a
> HashMap, or to a LinkedHashMap populated in sorted order if we want to
> preserve the sorted iteration order.
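
To make the cost described above concrete, here is a minimal standalone sketch. It is not Lucene code. The class FieldLookupSketch, its helper methods, and the generated names customfield_0 ... customfield_5999 are all hypothetical, chosen only to mimic the "around 6000 fields" case. It counts how many String comparisons a single TreeMap lookup performs, and shows that copying a sorted map into a LinkedHashMap keeps the same iteration order while lookups go through hashing instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Lucene.
class FieldLookupSketch {

    /** Builds assumed field names that all share the prefix "customfield_". */
    static List<String> fieldNames(int count) {
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            names.add("customfield_" + i);
        }
        return names;
    }

    /** Returns the number of String comparisons made by one TreeMap#get call. */
    static long comparisonsPerLookup(List<String> names, String key) {
        long[] counter = {0};
        // Same ordering as the natural String order, but every call is counted.
        TreeMap<String, Integer> tree = new TreeMap<>((a, b) -> {
            counter[0]++;
            return a.compareTo(b);
        });
        for (int i = 0; i < names.size(); i++) {
            tree.put(names.get(i), i);
        }
        counter[0] = 0; // ignore the comparisons made while building the map
        tree.get(key);
        return counter[0];
    }

    /** Copies a sorted map into a LinkedHashMap, keeping its iteration order. */
    static Map<String, Integer> toLinkedHashMap(TreeMap<String, Integer> sorted) {
        return new LinkedHashMap<>(sorted);
    }

    public static void main(String[] args) {
        List<String> names = fieldNames(6000);
        String key = "customfield_4321";

        // Each counted comparison walks the 12-character shared prefix first.
        System.out.println("TreeMap comparisons for one lookup: "
                + comparisonsPerLookup(names, key));

        TreeMap<String, Integer> sorted = new TreeMap<>();
        for (int i = 0; i < names.size(); i++) {
            sorted.put(names.get(i), i);
        }
        Map<String, Integer> linked = toLinkedHashMap(sorted);

        System.out.println("Hashed lookup finds the same value: "
                + linked.get(key).equals(sorted.get(key)));
        System.out.println("Iteration order preserved: "
                + new ArrayList<>(linked.keySet()).equals(new ArrayList<>(sorted.keySet())));
    }
}
```

A hashed lookup still needs one hashCode computation (which String caches after the first call) and typically one equals check on the matching key, so it is not free, but it avoids repeating the prefix scan at every level of the tree. Whether this would help inside FieldsReader depends on how the sorted order is used elsewhere, which this sketch does not cover.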
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org