I was planning to use ETSC in conjunction with SortingMergePolicy and got
stuck.
In ETSC, we have
    @Override
    public void collect(int doc) throws IOException {
      in.collect(doc);
      if (++numCollected >= numDocsToCollect) {
        throw new CollectionTerminatedException();
      }
    }
I understand this collector is per-segment. I have one doubt regarding it.
Since a global sort ordering is difficult, I collect hits for each segment
and return the final "numDocsToCollect" results using a PQ.
If my "numDocsToCollect" = 50 and the number of segments = 15, then
collector.collect() will be called up to 750 times (50 x 15).
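
To make the arithmetic concrete, here is a minimal, Lucene-free sketch of
what I mean. The class PerSegmentTopN, its Result holder and the int arrays
standing in for already-sorted segments are all made up for illustration;
it only mimics the per-segment cut-off and counts how often a collect()
would fire, it is not how Lucene itself merges hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PerSegmentTopN {

    // Hypothetical holder: merged top hits plus the number of collect() calls.
    public static final class Result {
        public final List<Integer> topValues;
        public final int collectCalls;

        Result(List<Integer> topValues, int collectCalls) {
            this.topValues = topValues;
            this.collectCalls = collectCalls;
        }
    }

    // Each inner array stands in for one segment whose docs are already
    // sorted ascending by the sort key (smaller value = better hit).
    public static Result collectTopN(int[][] sortedSegments, int numDocsToCollect) {
        // Max-heap: the head is the worst hit currently kept, so it is
        // the one evicted when the queue grows past numDocsToCollect.
        PriorityQueue<Integer> pq = new PriorityQueue<>(Comparator.reverseOrder());
        int collectCalls = 0;

        for (int[] segment : sortedSegments) {
            int numCollected = 0;
            for (int value : segment) {
                collectCalls++;            // one collect() per visited doc
                pq.add(value);
                if (pq.size() > numDocsToCollect) {
                    pq.poll();             // drop the current worst hit
                }
                if (++numCollected >= numDocsToCollect) {
                    break;                 // early termination for this segment
                }
            }
        }

        List<Integer> top = new ArrayList<>(pq);
        Collections.sort(top);
        return new Result(top, collectCalls);
    }

    public static void main(String[] args) {
        // 15 segments of 1000 docs each, numDocsToCollect = 50.
        int[][] segments = new int[15][1000];
        for (int s = 0; s < 15; s++) {
            for (int d = 0; d < 1000; d++) {
                segments[s][d] = d * 15 + s; // ascending within each segment
            }
        }
        Result r = collectTopN(segments, 50);
        System.out.println("collect() calls: " + r.collectCalls); // 15 * 50 = 750
        System.out.println("hits returned:   " + r.topValues.size());
    }
}
```

So each segment stops after at most 50 docs, but the work still adds up
across segments before the PQ trims it back down to 50.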
When I use a SortField instead, TopFieldDocs does the sorting across all
segments and collector.collect() will be called only 50 times...
Assuming a stored-field seek for every collector.collect(), is it still
advisable to persist with ETSC? Was it introduced as a trade-off between
memory and disk?
Any help is much appreciated.
--
Ravi