[ https://issues.apache.org/jira/browse/LUCENE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219528#comment-14219528 ]
Uwe Schindler edited comment on LUCENE-6065 at 11/20/14 4:11 PM:
-----------------------------------------------------------------

Maybe I was a little bit too complicated in my explanation, sorry. The main problem I have is: _a public search API where all public methods are final and the whole implementation is protected_, which is a horror when it comes to the delegation pattern used by a filtering API. This feels like Analyzer, which is unintuitive ([~mikemccand] also pointed to the complexity of analysis in his mailing-list post about making a better Lucene) :-)


was (Author: thetaphi):
Maybe I was a little bit too complicated in my explanation, sorry. The main problem I have is: _a public search API where all public methods are final and the whole implementation is protected_

> remove "foreign readers" from merge, fix LeafReader instead.
> ------------------------------------------------------------
>
>                 Key: LUCENE-6065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6065
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>         Attachments: LUCENE-6065.patch
>
>
> Currently, SegmentMerger supports two classes of citizens being merged:
> # SegmentReader
> # "foreign reader" (e.g. some FilterReader)
> It does an instanceof check and executes the merge differently. In the
> SegmentReader case: stored fields and term vectors are bulk-merged, norms and
> docvalues are transferred directly without piling up on the heap, CRC32
> verification runs with IO locality of the data being merged, etc. Otherwise,
> we treat it as a "foreign" reader and it's slow.
> This is just the low level; it gets worse as you wrap with more stuff. A
> great example there is SortingMergePolicy: not only will it have the
> low-level slowdowns listed above, it will e.g. cache/pile up OrdinalMaps for
> all string docvalues fields being merged and other silliness that just makes
> matters worse.
> Another use case is 5.0 users wishing to upgrade from fieldcache to
> docvalues. This should be possible to implement with a simple incremental
> transition based on a mergepolicy that uses UninvertingReader. But we
> shouldn't populate internal fieldcache entries unnecessarily on merge and
> spike RAM until all those segment cores are released, and other things like
> bulk merge of stored fields and not piling up norms should still work: it's
> completely unrelated.
> There are more problems we can fix if we clean this up:
> checkindex/checkreader can run efficiently without needing to spike RAM the
> way merging does, we can remove the checkIntegrity() method completely from
> LeafReader, since it can always be accomplished on producers, etc. In general
> it would be nice to just have one codepath for merging that is as efficient
> as we can make it, and to support things like index modifications during
> merge.
> I spent a few weeks writing 3 different implementations to fix this
> (interface, optional abstract class, "fix LeafReader"), and the latter is the
> only one I don't completely hate: I think our APIs should be efficient for
> indexing as well as search.
> So the proposal is simple: refactor LeafReader to just require the producer
> APIs as abstract methods (and FilterReaders should work on that). The
> search-oriented APIs can just be final methods that defer to those.
> So we would add 5 abstract methods, but implement 10 current methods as final
> based on those, and then merging would always be efficient.
> {code}
>   // new abstract codec-based apis
>   /**
>    * Expert: retrieve thread-private TermVectorsReader
>    * @throws AlreadyClosedException if this reader is closed
>    * @lucene.internal
>    */
>   protected abstract TermVectorsReader getTermVectorsReader();
>   /**
>    * Expert: retrieve thread-private StoredFieldsReader
>    * @throws AlreadyClosedException if this reader is closed
>    * @lucene.internal
>    */
>   protected abstract StoredFieldsReader getFieldsReader();
>
>   /**
>    * Expert: retrieve underlying NormsProducer
>    * @throws AlreadyClosedException if this reader is closed
>    * @lucene.internal
>    */
>   protected abstract NormsProducer getNormsReader();
>
>   /**
>    * Expert: retrieve underlying DocValuesProducer
>    * @throws AlreadyClosedException if this reader is closed
>    * @lucene.internal
>    */
>   protected abstract DocValuesProducer getDocValuesReader();
>
>   /**
>    * Expert: retrieve underlying FieldsProducer
>    * @throws AlreadyClosedException if this reader is closed
>    * @lucene.internal
>    */
>   protected abstract FieldsProducer getPostingsReader();
>   // user/search oriented public apis based on the above
>   public final Fields fields();
>   public final void document(int, StoredFieldVisitor);
>   public final Fields getTermVectors(int);
>   public final NumericDocValues getNumericDocValues(String);
>   public final Bits getDocsWithField(String);
>   public final BinaryDocValues getBinaryDocValues(String);
>   public final SortedDocValues getSortedDocValues(String);
>   public final SortedNumericDocValues getSortedNumericDocValues(String);
>   public final SortedSetDocValues getSortedSetDocValues(String);
>   public final NumericDocValues getNormValues(String);
> {code}
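
For readers unfamiliar with the shape being debated, here is a minimal, self-contained sketch of the "abstract protected producer, final public accessor" pattern from the issue description, together with a filter that delegates at the producer level. None of these types are Lucene classes: `NormsSource`, `SketchLeafReader`, `MapBackedReader`, `SketchFilterReader` and `DoublingNormsReader` are hypothetical stand-ins reduced to a single producer, and the sketch makes no claim about how the attached patch implements it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a codec producer such as NormsProducer (not a Lucene type).
interface NormsSource {
    long norm(String field, int docId);
}

// Simplified reader: one abstract producer accessor, one final public method on top of it.
abstract class SketchLeafReader {
    /** Codec-level access; merge code would work against this. */
    protected abstract NormsSource getNormsReader();

    /** Search-level API: final, always defers to the producer. */
    public final long getNormValue(String field, int docId) {
        return getNormsReader().norm(field, docId);
    }
}

// Concrete reader backed by an in-memory map (assumption, for illustration only).
final class MapBackedReader extends SketchLeafReader {
    private final Map<String, long[]> norms;

    MapBackedReader(Map<String, long[]> norms) {
        this.norms = norms;
    }

    @Override
    protected NormsSource getNormsReader() {
        return (field, docId) -> norms.get(field)[docId];
    }
}

// Filter that delegates at the producer level rather than overriding search methods.
class SketchFilterReader extends SketchLeafReader {
    protected final SketchLeafReader in;

    SketchFilterReader(SketchLeafReader in) {
        this.in = in;
    }

    @Override
    protected NormsSource getNormsReader() {
        // Compiles only because both classes share a package. From another package,
        // Java rejects a protected call through a reference of the superclass type.
        return in.getNormsReader();
    }
}

// Example filter: wraps the delegate's producer and doubles every norm value.
final class DoublingNormsReader extends SketchFilterReader {
    DoublingNormsReader(SketchLeafReader in) {
        super(in);
    }

    @Override
    protected NormsSource getNormsReader() {
        NormsSource delegate = super.getNormsReader();
        return (field, docId) -> 2 * delegate.norm(field, docId);
    }
}

class LeafReaderSketch {
    public static void main(String[] args) {
        Map<String, long[]> norms = new HashMap<>();
        norms.put("body", new long[] {3, 7});
        SketchLeafReader base = new MapBackedReader(norms);
        SketchLeafReader doubled = new DoublingNormsReader(base);
        System.out.println(base.getNormValue("body", 1));    // prints 7
        System.out.println(doubled.getNormValue("body", 1)); // prints 14
    }
}
```

The comment in `SketchFilterReader` may be part of what the edited comment calls a horror: a filter written outside the reader's package can override the protected accessors, but it cannot call them on the wrapped instance, so plain delegation needs help from the base class or from a same-package wrapper.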