[GitHub] sohami commented on a change in pull request #1334: DRILL-6385: Support JPPD feature

GitBox Sun, 05 Aug 2018 17:29:24 -0700

sohami commented on a change in pull request #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r207750658


 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
 ##########
 @@ -190,11 +213,21 @@ public IterOutcome next() {
         if (isNewSchema) {
           // Even when recordCount = 0, we should return return OK_NEW_SCHEMA if current reader presents a new schema.
           // This could happen when data sources have a non-trivial schema with 0 row.
-          container.buildSchema(SelectionVectorMode.NONE);
+          if (firstRuntimeFiltered) {
+            container.buildSchema(SelectionVectorMode.TWO_BYTE);
+            runtimeFiltered = true;
+          } else {
+            container.buildSchema(SelectionVectorMode.NONE);
+          }
 
 Review comment:
   In general, I am concerned about ScanBatch generating different types of
output container at runtime. No other operator does that after the buildSchema
phase, and it increases the chance of introducing bugs. When a RecordBatch
returns a selection vector (SV) along with its data, the general convention is
that the record count is dictated by the SV, but here we rely on a separate
variable, `recordCount`. We also need to be extra careful to set the SV2
correctly, both on schema change and when the runtimeFiltered flag is applied.
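
   To make the convention concrete, here is a minimal sketch in plain Java. It
does not use Drill's classes: `SketchBatch` and its fields are hypothetical
stand-ins, and a null `sv2` models SelectionVectorMode.NONE. It only shows that
once an SV2 is attached, the SV2 decides both the record count and which rows
are read.

```java
/** Hypothetical stand-in for a batch with one int column; not Drill's RecordBatch API. */
final class SketchBatch {
    private final int[] values;   // underlying column data
    private final char[] sv2;     // two-byte selection vector; null models SelectionVectorMode.NONE
    private final int sv2Count;   // number of valid entries in sv2

    /** Batch without a selection vector. */
    SketchBatch(int[] values) {
        this(values, null, 0);
    }

    /** Batch with a two-byte selection vector holding sv2Count row indices. */
    SketchBatch(int[] values, char[] sv2, int sv2Count) {
        this.values = values;
        this.sv2 = sv2;
        this.sv2Count = sv2Count;
    }

    /** With an SV2 attached, the SV2 (not the data vector) dictates the record count. */
    int recordCount() {
        return sv2 == null ? values.length : sv2Count;
    }

    /** Reads the i-th logical record, going through the SV2 when one is attached. */
    int get(int i) {
        if (i < 0 || i >= recordCount()) {
            throw new IndexOutOfBoundsException("record " + i);
        }
        return sv2 == null ? values[i] : values[sv2[i]];
    }
}
```

   A consumer that reads only `recordCount()` and `get(i)` stays correct in
either mode. Tracking the count in a second variable next to the SV2 is where
the two can drift apart.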
   
   I think the reason for doing it this way is to avoid the extra copy by
RemovingRecordBatch when no records are filtered out by the bloom filter
condition. But that copy will still happen in some cases: say some records are
filtered out of one batch, which moves ScanBatch from SelectionVectorMode NONE
to TWO_BYTE, and none of the records in later batches are filtered out. Those
later batches still carry an SV2.
   
   My recommendation is to use a global query-level option to determine when
the BloomFilter can be applied, and to use that information to add a
FilterOperator on top of Scan, since Filter does exactly the same thing (i.e.
applies an SV2) based on the condition obtained from the RuntimeFilter. Until
the FilterOperator gets the RuntimeFilter information, it just passes the
batches through as is from Scan. This way Scan doesn't have to duplicate
Filter's SV2 logic. @amansinha100 - Do you have any recommendation for this?
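
   A rough sketch of the idea, again in plain Java with hypothetical names
(`RuntimeFilterSketch`, `setRuntimeFilter`, `select`) rather than Drill's
operator API. One assumption of mine: the operator always emits an SV2-style
result, using an identity selection until the runtime filter arrives, so the
output shape never changes after the schema is built. The data itself is never
copied here.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Hypothetical filter operator placed above the scan; names are illustrative, not Drill's. */
final class RuntimeFilterSketch {
    // Null until the runtime (bloom) filter arrives from the join side.
    private volatile IntPredicate runtimeFilter;

    void setRuntimeFilter(IntPredicate filter) {
        this.runtimeFilter = filter;
    }

    /**
     * Returns the selected row indices of one batch as an SV2-style char[].
     * Before the filter arrives every row is selected (pass-through).
     */
    char[] select(int[] joinKeys) {
        // A two-byte selection vector can only address row indices 0..65535.
        if (joinKeys.length > Character.MAX_VALUE + 1) {
            throw new IllegalArgumentException("batch too large for a two-byte selection vector");
        }
        IntPredicate filter = runtimeFilter;  // read once so the whole batch sees one state
        char[] sv2 = new char[joinKeys.length];
        int selected = 0;
        for (int row = 0; row < joinKeys.length; row++) {
            if (filter == null || filter.test(joinKeys[row])) {
                sv2[selected++] = (char) row;
            }
        }
        return Arrays.copyOf(sv2, selected);
    }
}
```

   With this shape the scan keeps a single output mode, and the question of
when the filter becomes available is isolated in one operator.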

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
