[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754984#comment-16754984 ]
Simon Willnauer commented on LUCENE-8662:
-----------------------------------------

{noformat}
If we think that it's a trap, we should remove the default impl and make it abstract (in 8.0).
{noformat}

I agree with this. I think it can be trappy, and such an expert API shouldn't be. Let's make it abstract?

> Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
> ----------------------------------------------------------------
>
>                 Key: LUCENE-8662
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8662
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0
>            Reporter: jefferyyuan
>            Priority: Major
>              Labels: query
>             Fix For: 8.0, 7.7
>
>         Attachments: output of test program.txt
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Recently, in production, we found that Solr uses a lot of memory (more than 10 GB) during recovery or commit for a small index (3.5 GB).
> The stack trace is:
>
> {code:java}
> Thread 0x4d4b115c0
>   at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125)
>   at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V (SegmentTermsEnumFrame.java:157)
>   at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:786)
>   at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnumFrame.java:538)
>   at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (SegmentTermsEnum.java:757)
>   at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; (FilterLeafReader.java:185)
>   at org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z (TermsEnum.java:74)
>   at org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J (SolrIndexSearcher.java:823)
>   at org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:204)
>   at org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (UpdateLog.java:786)
>   at org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; (VersionInfo.java:194)
>   at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z (DistributedUpdateProcessor.java:1051)
> {code}
>
> We reproduced the problem locally with the following Lucene code:
> {code:java}
> import java.io.IOException;
> import java.nio.file.Paths;
> import org.apache.lucene.index.*;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.BytesRef;
>
> public class SeekExactRepro {
>   public static void main(String[] args) throws IOException {
>     // ExitableDirectoryReader wraps each TermsEnum in a FilterLeafReader.FilterTermsEnum subclass
>     try (FSDirectory index = FSDirectory.open(Paths.get("the-index"));
>          IndexReader reader = new ExitableDirectoryReader(DirectoryReader.open(index),
>              new QueryTimeoutImpl(1000 * 60 * 5))) {
>       BytesRef text = new BytesRef("the-id");
>       for (LeafReaderContext lf : reader.leaves()) {
>         Terms terms = lf.reader().terms("id");
>         if (terms == null) continue; // this segment has no "id" field
>         TermsEnum te = terms.iterator();
>         System.out.println(te.seekExact(text));
>       }
>     }
>   }
> }
> {code}
>
> I added System.out.println("ord: " + ord); to codecs.blocktree.SegmentTermsEnum.getFrame(int); see the attached "output of test program.txt".
>
> We found the root cause: FilterLeafReader.FilterTermsEnum does not override seekExact(BytesRef), so it falls back to the base-class TermsEnum.seekExact(BytesRef), which is built on seekCeil and is very inefficient in this case:
> {code:java}
> // Default in TermsEnum: correct, but every lookup goes through seekCeil
> public boolean seekExact(BytesRef text) throws IOException {
>   return seekCeil(text) == SeekStatus.FOUND;
> }
> {code}
> The fix is simple: override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum so that it delegates to the wrapped enum:
> {code:java}
> @Override
> public boolean seekExact(BytesRef text) throws IOException {
>   // forward to the wrapped TermsEnum's own seekExact instead of the seekCeil-based default
>   return in.seekExact(text);
> }
> {code}
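The delegation trap can be illustrated without Lucene at all. The following is a hypothetical, simplified model, not Lucene's actual classes: SimpleTermsEnum, SortedArrayTermsEnum, NaiveFilterTermsEnum and FixedFilterTermsEnum are invented names that only mimic the shape of TermsEnum and FilterLeafReader.FilterTermsEnum, and terms are plain Strings rather than BytesRef. The model only counts which method of the wrapped enum each wrapper reaches; it does not reproduce the block-loading cost seen in the stack trace.

```java
import java.util.Arrays;

// Hypothetical, Lucene-free model of the trap described in LUCENE-8662.
// None of these classes are Lucene API; they only mimic the shape of
// TermsEnum and FilterLeafReader.FilterTermsEnum, using String terms.
public class SeekExactDelegationDemo {

    enum SeekStatus { FOUND, NOT_FOUND, END }

    // Stand-in for TermsEnum: seekExact has a default built on seekCeil.
    static abstract class SimpleTermsEnum {
        abstract SeekStatus seekCeil(String text);

        boolean seekExact(String text) {
            return seekCeil(text) == SeekStatus.FOUND;
        }
    }

    // Stand-in for a codec enum that has its own seekExact path.
    // It only counts calls; it does not model real I/O cost.
    static class SortedArrayTermsEnum extends SimpleTermsEnum {
        private final String[] terms;
        int ceilCalls = 0;
        int exactCalls = 0;

        SortedArrayTermsEnum(String... terms) {
            this.terms = terms.clone();
            Arrays.sort(this.terms); // binarySearch requires sorted input
        }

        @Override
        SeekStatus seekCeil(String text) {
            ceilCalls++;
            int idx = Arrays.binarySearch(terms, text);
            if (idx >= 0) {
                return SeekStatus.FOUND;
            }
            int insertionPoint = -idx - 1; // index of the next larger term
            return insertionPoint < terms.length ? SeekStatus.NOT_FOUND : SeekStatus.END;
        }

        @Override
        boolean seekExact(String text) {
            exactCalls++;
            return Arrays.binarySearch(terms, text) >= 0;
        }
    }

    // Wrapper that forwards seekCeil but inherits the default seekExact (the bug).
    static class NaiveFilterTermsEnum extends SimpleTermsEnum {
        protected final SimpleTermsEnum in;

        NaiveFilterTermsEnum(SimpleTermsEnum in) {
            this.in = in;
        }

        @Override
        SeekStatus seekCeil(String text) {
            return in.seekCeil(text);
        }
    }

    // Wrapper with the proposed fix: forward seekExact to the wrapped enum.
    static class FixedFilterTermsEnum extends NaiveFilterTermsEnum {
        FixedFilterTermsEnum(SimpleTermsEnum in) {
            super(in);
        }

        @Override
        boolean seekExact(String text) {
            return in.seekExact(text);
        }
    }

    public static void main(String[] args) {
        SortedArrayTermsEnum a = new SortedArrayTermsEnum("id-1", "id-2", "the-id");
        boolean foundNaive = new NaiveFilterTermsEnum(a).seekExact("the-id");
        // prints "naive: found=true ceilCalls=1 exactCalls=0"
        System.out.println("naive: found=" + foundNaive + " ceilCalls=" + a.ceilCalls + " exactCalls=" + a.exactCalls);

        SortedArrayTermsEnum b = new SortedArrayTermsEnum("id-1", "id-2", "the-id");
        boolean foundFixed = new FixedFilterTermsEnum(b).seekExact("the-id");
        // prints "fixed: found=true ceilCalls=0 exactCalls=1"
        System.out.println("fixed: found=" + foundFixed + " ceilCalls=" + b.ceilCalls + " exactCalls=" + b.exactCalls);
    }
}
```

With the naive wrapper, the wrapped enum's own seekExact is never reached and every lookup goes through seekCeil, which is the path visible in the reported stack trace. Declaring seekExact abstract in the base class, as proposed in the comment, would make NaiveFilterTermsEnum fail to compile until it picks an implementation, turning a silent slowdown into a compile-time error.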