sohami commented on a change in pull request #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r207750658

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
##########
@@ -190,11 +213,21 @@ public IterOutcome next() {
         if (isNewSchema) {
           // Even when recordCount = 0, we should return return OK_NEW_SCHEMA if current reader presents a new schema.
           // This could happen when data sources have a non-trivial schema with 0 row.
-          container.buildSchema(SelectionVectorMode.NONE);
+          if (firstRuntimeFiltered) {
+            container.buildSchema(SelectionVectorMode.TWO_BYTE);
+            runtimeFiltered = true;
+          } else {
+            container.buildSchema(SelectionVectorMode.NONE);
+          }

Review comment:
In general, I am concerned about ScanBatch generating different types of output container at runtime. No other operator does that after the buildSchema phase, and it increases the chance of introducing bugs. When a RecordBatch returns an SV along with its data, the general convention is that the record count is dictated by the SV, but here we are relying on a separate variable, `recordCount`. We also need to be extra careful to set the SV2 correctly, both on schema change and when the runtimeFiltered flag is applied.

I think the reason for doing it this way is to avoid the extra copy by RemovingRecordBatch when no records are filtered out by the bloom filter condition. But that copy will still happen when, say, some records are filtered out in one batch, which moves ScanBatch from SelectionVectorMode.NONE to TWO_BYTE, and none of the records in later batches are filtered out.

My recommendation is to use a global query-level option to determine when the BloomFilter can be applied, and to use that information to add a FilterOperator on top of Scan, since Filter will do exactly the same thing (i.e. apply an SV2) based on the condition obtained from the RuntimeFilter. Until the FilterOperator gets the runTimeFilter information, it will just pass the batches from Scan through as is.
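
To illustrate the recommendation, here is a minimal sketch in plain Java. It is not Drill's operator API: `RuntimeFilterSketch`, `setRuntimeFilter` and `select` are hypothetical names, a `LongPredicate` stands in for the bloom filter membership test, and an `int[]` of surviving row indices stands in for the SV2. The point it shows is that the filter always emits a selection vector, so the row count comes from the selection vector alone, and every row passes through until the runtime filter arrives.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

/** Hypothetical stand-in for a Filter operator placed above Scan; not Drill's API. */
final class RuntimeFilterSketch {
  // Membership test derived from the runtime (bloom) filter; null until it arrives.
  private LongPredicate runtimeFilter;

  /** Called once the runtime filter becomes available. */
  void setRuntimeFilter(LongPredicate filter) {
    this.runtimeFilter = filter;
  }

  /**
   * Returns the selection vector for one batch: the indices of the rows that survive.
   * The row count is always selection.length, never a separate counter.
   */
  int[] select(long[] joinKeys) {
    int[] selection = new int[joinKeys.length];
    int selected = 0;
    for (int row = 0; row < joinKeys.length; row++) {
      // Without a runtime filter, every row passes through.
      if (runtimeFilter == null || runtimeFilter.test(joinKeys[row])) {
        selection[selected++] = row;
      }
    }
    return Arrays.copyOf(selection, selected);
  }

  public static void main(String[] args) {
    RuntimeFilterSketch filter = new RuntimeFilterSketch();
    long[] batch = {10L, 11L, 12L, 13L};
    System.out.println(Arrays.toString(filter.select(batch))); // [0, 1, 2, 3]
    filter.setRuntimeFilter(key -> key % 2 == 0); // assumed filter: keep even keys
    System.out.println(Arrays.toString(filter.select(batch))); // [0, 2]
  }
}
```

Because the output shape is the same before and after the runtime filter shows up, the upstream scan in this sketch never has to switch selection vector modes mid-stream.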

With the filtering done in a Filter operator on top of Scan, Scan doesn't have to duplicate Filter's SV2 logic. @amansinha100 - Do you have any recommendation for this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services