clintropolis commented on a change in pull request #7133: Time Ordering On Scans
URL: https://github.com/apache/incubator-druid/pull/7133#discussion_r270201441
##########
File path:
processing/src/main/java/org/apache/druid/query/scan/ScanQueryLimitRowIterator.java
##########
@@ -81,12 +109,26 @@ public ScanResultValue next()
} else {
// last batch
// single batch length is <= Integer.MAX_VALUE, so this should not overflow
- int left = (int) (limit - count);
+ int numLeft = (int) (limit - count);
count = limit;
- return new ScanResultValue(batch.getSegmentId(), batch.getColumns(), events.subList(0, left));
+ return new ScanResultValue(batch.getSegmentId(), batch.getColumns(), events.subList(0, numLeft));
+ }
+ } else {
+ // Perform single-event ScanResultValue batching at the outer level. Each scan result value from the yielder
+ // in this case will only have one event so there's no need to iterate through events.
+ int batchSize = query.getBatchSize();
+ List<Object> eventsToAdd = new ArrayList<>(batchSize);
+ List<String> columns = new ArrayList<>();
+ while (eventsToAdd.size() < batchSize && !yielder.isDone() && count < limit) {
+ ScanResultValue srv = yielder.get();
+ // Only replace once using the columns from the first event
+ columns = columns.isEmpty() ? srv.getColumns() : columns;
+ eventsToAdd.add(Iterables.getOnlyElement((List<Object>) srv.getEvents()));
+ yielder = yielder.next(null);
+ count++;
}
+ return new ScanResultValue(null, columns, eventsToAdd);
Review comment:
I think the only way to know what `segmentId` means at all is to read the docs,
and `null` isn't particularly intuitive here either. But you're right, I don't
think `segmentId` is the correct thing to use for this. It probably makes more
sense to add a new `interval` property to the JSON and to `ScanResultValue`
that holds the interval spanned by the min and max event timestamps, which
seems useful for both ordered and unordered scan queries (if it's not too
painful for unordered). I don't think this needs to be done in this PR, but it
may be a nice follow-up to consider.
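
To make the `interval` idea a bit more concrete, here is a rough sketch of how
a per-batch min/max could be derived in a single pass. Everything in it is an
assumption for illustration only: `BatchInterval` and its fields are
hypothetical names, not existing Druid classes, and it assumes each event's
timestamp is already available as epoch milliseconds. It does not rely on the
batch being time-ordered, so the same pass would work for unordered scans.

```java
import java.util.List;

// Hypothetical helper, not part of Druid: holds the min and max event
// timestamps (epoch millis, an assumption) seen in one batch of events.
final class BatchInterval {
    final long minMillis;
    final long maxMillis;

    BatchInterval(long minMillis, long maxMillis) {
        this.minMillis = minMillis;
        this.maxMillis = maxMillis;
    }

    // Single pass over the batch; does not require the timestamps to be sorted.
    static BatchInterval of(List<Long> timestamps) {
        if (timestamps.isEmpty()) {
            throw new IllegalArgumentException("empty batch has no interval");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long t : timestamps) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return new BatchInterval(min, max);
    }
}
```

For a time-ordered batch the loop could be skipped in favor of reading the
first and last events, but the single pass keeps the sketch valid for both
cases.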