sohami commented on a change in pull request #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r207750658
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
##########
@@ -190,11 +213,21 @@ public IterOutcome next() {
if (isNewSchema) {
// Even when recordCount = 0, we should return return OK_NEW_SCHEMA if current reader presents a new schema.
// This could happen when data sources have a non-trivial schema with 0 row.
- container.buildSchema(SelectionVectorMode.NONE);
+ if (firstRuntimeFiltered) {
+ container.buildSchema(SelectionVectorMode.TWO_BYTE);
+ runtimeFiltered = true;
+ } else {
+ container.buildSchema(SelectionVectorMode.NONE);
+ }
Review comment:
In general, I am concerned about ScanBatch generating different types of
output container at runtime. No other operator does that after the buildSchema
phase, and it increases the chance of introducing bugs. When a RecordBatch
returns a selection vector, the general convention is that the record count is
dictated by that vector, but here we are relying on a separate variable,
`recordCount`. We also need to be extra careful to set the SV2 correctly, both
on a schema change and when the runtimeFiltered flag is applied.
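
To make the convention concrete, here is a minimal sketch. `BatchSketch` is a hypothetical stand-in, not Drill's `RecordBatch` or `SelectionVector2` API, and modelling the SV2 as a `char[]` of row indices is an assumption made only for illustration:

```java
// Hypothetical model of a batch; NOT Drill's RecordBatch API.
final class BatchSketch {
    final int[] values; // column data, one entry per physical row
    final char[] sv2;   // two-byte row indices, or null for mode NONE

    BatchSketch(int[] values, char[] sv2) {
        this.values = values;
        this.sv2 = sv2;
    }

    // With a selection vector present, it alone dictates the record count;
    // there is no separate counter that could drift out of sync.
    int recordCount() {
        return sv2 != null ? sv2.length : values.length;
    }

    // Materialize the visible rows, following the selection vector if any.
    int[] selected() {
        if (sv2 == null) {
            return values.clone();
        }
        int[] out = new int[sv2.length];
        for (int i = 0; i < sv2.length; i++) {
            out[i] = values[sv2[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        BatchSketch b = new BatchSketch(new int[] {10, 20, 30, 40}, new char[] {1, 3});
        System.out.println(b.recordCount()); // 2 selected rows out of 4 physical rows
    }
}
```

The point is only that downstream code should be able to ask the selection vector for the count, rather than trusting a second variable maintained alongside it.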
I think the reason for doing it this way is to avoid the extra copy by
RemovingRecordBatch when no records are filtered out by the bloom filter
condition. But that copy will still happen if, say, some records are filtered
in one batch, which moves ScanBatch from SelectionVectorMode NONE to TWO_BYTE,
and none of the records in later batches are filtered out.
My recommendation is to use a global query-level option to determine when the
BloomFilter can be applied, and to use that information to add a
FilterOperator on top of Scan. Filter will do exactly the same thing (i.e.
apply an SV2) based on the condition obtained from the RuntimeFilter. Until
FilterOperator gets the RuntimeFilter information, it will just pass the
batches through as is from Scan. This way Scan doesn't have to duplicate
Filter's SV2 logic. @amansinha100 - Do you have any recommendation for this?
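
A rough sketch of the operator I have in mind, under stated assumptions: `PassThroughFilterSketch`, `setRuntimeFilter` and `apply` are hypothetical names, not existing Drill classes or methods, and the bloom filter is reduced to a plain `IntPredicate` on the join key. In this version the operator always emits a selection vector and simply selects every row until the filter arrives, so its output mode never changes. That is one possible reading of "pass through", not a settled design:

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical filter-on-top-of-scan; NOT an existing Drill operator.
final class PassThroughFilterSketch {
    // Stays null until the runtime (bloom) filter has been delivered.
    private IntPredicate runtimeFilter;

    void setRuntimeFilter(IntPredicate filter) {
        this.runtimeFilter = filter;
    }

    // Returns the selected row indices for one batch of join keys.
    // Assumes at most 65536 rows per batch so every index fits in a char.
    char[] apply(int[] joinKeys) {
        char[] sv2 = new char[joinKeys.length];
        int selected = 0;
        for (int i = 0; i < joinKeys.length; i++) {
            // No filter yet: keep every row, so the batch passes through unchanged.
            if (runtimeFilter == null || runtimeFilter.test(joinKeys[i])) {
                sv2[selected++] = (char) i;
            }
        }
        return Arrays.copyOf(sv2, selected);
    }

    public static void main(String[] args) {
        PassThroughFilterSketch filter = new PassThroughFilterSketch();
        int[] keys = {5, 8, 13, 21};
        System.out.println(filter.apply(keys).length); // all rows kept before the filter arrives
        filter.setRuntimeFilter(k -> k % 2 == 1);      // stand-in for a bloom filter probe
        System.out.println(filter.apply(keys).length); // only matching rows kept afterwards
    }
}
```

With this split, Scan keeps a single output mode and all selection-vector handling lives in one operator.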