> > Therefore the DocMap is initialized only when the > merge actually executes ... what is there more to postpone?
Agreed. However, what I am asking is, if there is an alternative to DocMap, will that be better? Plz read-on And besides, if the segments are already sorted, you should return a null DocMap, > like Lucene code does ... What I am trying to say is, my individual segments are sorted. However, when a merge combines "N" individual sorted-segments, there needs to be a global sort-order for writing the new segment. Passing null DocMap won't work here, no? DocMap is one-way of bringing the global order during a merge. Another way is to use something like a MergedIterator<SegmentReader> instead of DocMap, which doesn't need any memory I was trying to get a heads-up on these 2 approaches. Please do let me know if I have understood correctly -- Ravi On Tue, Jun 17, 2014 at 5:42 PM, Shai Erera <ser...@gmail.com> wrote: > > > > I am afraid the DocMap still maintains doc-id mappings till merge and I > am > > trying to avoid it... > > > > What do you mean 'till merge'? The method OneMerge.getMergeReaders() is > called only when the merge is executed, not when the MergePolicy decided to > merge those segments. Therefore the DocMap is initialized only when the > merge actually executes ... what is there more to postpone? > > And besides, if the segments are already sorted, you should return a null > DocMap, like Lucene code does ... > > If I miss your point, I'd appreciate if you can point me to a code example, > preferably in Lucene source, which demonstrates the problem. > > Shai > > > On Tue, Jun 17, 2014 at 3:03 PM, Ravikumar Govindarajan < > ravikumar.govindara...@gmail.com> wrote: > > > I am afraid the DocMap still maintains doc-id mappings till merge and I > am > > trying to avoid it... > > > > I think lucene itself has a MergeIterator in o.a.l.util package. > > > > A MergePolicy can wrap a simple MergeIterator for iterating docs across > > different AtomicReaders in correct sort-order for a given field/term > > > > That should be fine right? > > > > -- > > Ravi > > > > -- > > Ravi > > > > > > On Tue, Jun 17, 2014 at 1:24 PM, Shai Erera <ser...@gmail.com> wrote: > > > > > loadSortTerm is your method right? In the current Sorter.sort > > > implementation, I see this code: > > > > > > boolean sorted = true; > > > for (int i = 1; i < maxDoc; ++i) { > > > if (comparator.compare(i-1, i) > 0) { > > > sorted = false; > > > break; > > > } > > > } > > > if (sorted) { > > > return null; > > > } > > > > > > Perhaps you can write similar code? > > > > > > Also note that the sorting interface has changed, I think in 4.8, and > now > > > you don't really need to implement a Sorter, but rather pass a > SortField, > > > if that works for you. > > > > > > Shai > > > > > > > > > On Tue, Jun 17, 2014 at 9:41 AM, Ravikumar Govindarajan < > > > ravikumar.govindara...@gmail.com> wrote: > > > > > > > Shai, > > > > > > > > This is the code snippet I use inside my class... > > > > > > > > public class MySorter extends Sorter { > > > > > > > > @Override > > > > > > > > public DocMap sort(AtomicReader reader) throws IOException { > > > > > > > > final Map<Integer, BytesRef> docVsId = loadSortTerm(reader); > > > > > > > > final Sorter.DocComparator comparator = new Sorter.DocComparator() > { > > > > > > > > @Override > > > > > > > > public int compare(int docID1, int docID2) { > > > > > > > > BytesRef v1 = docVsId.get(docID1); > > > > > > > > BytesRef v2 = docVsId.get(docID2); > > > > > > > > return v1.compareTo(v2); > > > > > > > > } > > > > > > > > }; > > > > > > > > return sort(reader.maxDoc(), comparator); > > > > > > > > } > > > > } > > > > > > > > My Problem is, the "AtomicReader" passed to Sorter.sort method is > > > actually > > > > a SlowCompositeReader, composed of a list of AtomicReaders each of > > which > > > is > > > > already sorted. > > > > > > > > I find this "loadSortTerm(compositeReader)" to be a bit heavy where > it > > > > tries to all load the doc-to-term mappings eagerly... > > > > > > > > Are there some alternatives for this? > > > > > > > > -- > > > > Ravi > > > > > > > > > > > > On Tue, Jun 17, 2014 at 10:58 AM, Shai Erera <ser...@gmail.com> > wrote: > > > > > > > > > I'm not sure that I follow ... where do you see DocMap being loaded > > up > > > > > front? Specifically, Sorter.sort may return null of the readers are > > > > already > > > > > sorted ... I think we already optimized for the case where the > > readers > > > > are > > > > > sorted. > > > > > > > > > > Shai > > > > > > > > > > > > > > > On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan < > > > > > ravikumar.govindara...@gmail.com> wrote: > > > > > > > > > > > I am planning to use SortingMergePolicy where all the > > > > merge-participating > > > > > > segments are already sorted... I understand that I need to > define a > > > > > DocMap > > > > > > with old-new doc-id mappings. > > > > > > > > > > > > Is it possible to optimize the eager loading of DocMap and make > it > > > kind > > > > > of > > > > > > lazy load on-demand? > > > > > > > > > > > > Ex: Pass List<AtomicReader> to the caller and ask for next > new-old > > > doc > > > > > > mapping.. > > > > > > > > > > > > Since my segments are already sorted, I could save on memory a > > > > little-bit > > > > > > this way, instead of loading the full DocMap upfront > > > > > > > > > > > > -- > > > > > > Ravi > > > > > > > > > > > > > > > > > > > > >