Hi guys: Looking at the 2.9 sorting algorithm, and while trying to understand FieldComparator class, I was wondering about the following optimization: (I am using StringOrdValComparator as an example)
Currently we have 1 instance of per segment data structure, e.g. (ords,vals etc.), and we keep track of the reader context using the readerGen array. We keep track of the top numHits elemnents by keeping a reference to the bottom element. To convert from readerContext1 to readerContext2, we need a mapping from value->ord, and the cost is a binary search (this is can be done N*numhits times). For some instances of custom sort extension, this lookup maybe expensive. How about instead, we keep a N instances of segment data structure, all compares within segment is done with ord compare. We never to across segment compare until the end. Where a merge is done on N instances using the Comparator. This way, we avoid the lookup from value->ord, and can keep the simpler API: ScoreDocComparator, which is much easier to extend for custom sorting. It is a higher memory cost, but it is an extra (N-1)*(numhits) times the data structure, where N is the number of segments, and is not too large. I am probably missing some intricacies with the current sorting algorithm. If so, please shine some light. Thanks -John