jainankitk commented on code in PR #16091:
URL: https://github.com/apache/lucene/pull/16091#discussion_r3352921692
##########
lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java:
##########
@@ -550,11 +554,26 @@ public static Query createJoinQuery(
break;
case None:
if (min <= 1 && max == Integer.MAX_VALUE) {
- GlobalOrdinalsCollector globalOrdinalsCollector =
- new GlobalOrdinalsCollector(joinField, ordinalMap, valueCount);
- searcher.search(rewrittenFromQuery, globalOrdinalsCollector);
+ LongBitSet collectedOrds =
+ searcher.search(
+ rewrittenFromQuery,
+ new CollectorManager<GlobalOrdinalsCollector, LongBitSet>() {
+ @Override
+ public GlobalOrdinalsCollector newCollector() {
+ return new GlobalOrdinalsCollector(joinField,
finalOrdinalMap, valueCount);
+ }
Review Comment:
I am wondering if workloads with `allowSegmentPartitions=true` allocate
fresh `LongBitSet(segmentValueCount)` by invoking
`SegmentLocalCollector.getLeafCollector(ctx)` once per partition per segment.
Also, are we doing slightly more overall work by deferring the global ordinals
remap to the reduction phase.
While I don't have good proposal that works well for all scenarios, maybe a
hybrid approach. I was also thinking about single shared global bitset with
atomic word writes, but maybe the concurrency overhead outweighs the concurrent
collection benefit?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]