sohami commented on a change in pull request #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r207749408
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
##########
@@ -226,6 +259,93 @@ public IterOutcome next() {
}
}
+  /**
+   *
+   * @return true means rows are filtered by the RuntimeFilter, false means not affected by the RuntimeFilter.
+   * @throws SchemaChangeException
+   */
+  private boolean applyRuntimeFilter() throws SchemaChangeException {
+    RuntimeFilterWritable runtimeFilterWritable = context.getRuntimeFilter();
+    if (runtimeFilterWritable == null) {
+      return false;
+    }
+    if (recordCount <= 0) {
+      return false;
+    }
+    List<BloomFilter> bloomFilters = runtimeFilterWritable.unwrap();
+    if (hash64 == null) {
+      ValueVectorHashHelper hashHelper = new ValueVectorHashHelper(this, context);
+      try {
+        //generate hash helper
+        this.toFilterFields = runtimeFilterWritable.getRuntimeFilterBDef().getProbeFieldsList();
+        List<LogicalExpression> hashFieldExps = new ArrayList<>();
+        List<TypedFieldId> typedFieldIds = new ArrayList<>();
+        for (String toFilterField : toFilterFields) {
+          SchemaPath schemaPath = new SchemaPath(new PathSegment.NameSegment(toFilterField), ExpressionPosition.UNKNOWN);
+          TypedFieldId typedFieldId = container.getValueVectorId(schemaPath);
+          this.field2id.put(toFilterField, typedFieldId.getFieldIds()[0]);
+          typedFieldIds.add(typedFieldId);
+          ValueVectorReadExpression toHashFieldExp = new ValueVectorReadExpression(typedFieldId);
+          hashFieldExps.add(toHashFieldExp);
+        }
+        hash64 = hashHelper.getHash64(hashFieldExps.toArray(new LogicalExpression[hashFieldExps.size()]), typedFieldIds.toArray(new TypedFieldId[typedFieldIds.size()]));
+      } catch (Exception e) {
+        throw UserException.internalError(e).build(logger);
+      }
+    }
+    selectionVector2.allocateNew(recordCount);
+    //To make each independent bloom filter work together to construct a final filter result: BitSet.
+    BitSet bitSet = new BitSet(recordCount);
+    for (int i = 0; i < toFilterFields.size(); i++) {
+      BloomFilter bloomFilter = bloomFilters.get(i);
+      String fieldName = toFilterFields.get(i);
+      computeBitSet(field2id.get(fieldName), bloomFilter, bitSet);
+    }
+
+    int svIndex = 0;
+    int tmpFilterRows = 0;
+    for (int i = 0; i < recordCount; i++) {
+      boolean contain = bitSet.get(i);
+      if (contain) {
+        selectionVector2.setIndex(svIndex, i);
+        svIndex++;
+      } else {
+        tmpFilterRows++;
+      }
+    }
+    selectionVector2.setRecordCount(svIndex);
+    if (tmpFilterRows > 0 && tmpFilterRows == recordCount) {
+      //all rows of the batch was filtered
+      recordCount = 0;
+      selectionVector2.clear();
+      logger.debug("filter {} rows by the RuntimeFilter", tmpFilterRows);
+      //return false to avoid unnecessary SV2 memory copy work
+      return false;
+    }
+    if (tmpFilterRows > 0 && tmpFilterRows != recordCount) {
+      //partial of the rows was filtered
+      totalFilterRows = totalFilterRows + tmpFilterRows;
+      recordCount = svIndex;
+      logger.debug("filter {} rows by the RuntimeFilter", tmpFilterRows);
+      return true;
+    }
+    selectionVector2.clear();
Review comment:
In this case it means none of the rows are filtered out.
There is a bug here. Consider a case where a few rows were filtered for one
batch, so `applyRuntimeFilter` returns true with `selectionVector2` correctly
allocated and set. Since this is the first batch with some filtered rows, the
caller will create a batch in SV2 mode and set `runtimeFiltered=true`. Now the
next batch comes in and none of its rows are filtered. Here we will clear
**selectionVector2**, but the schema of the returned batch is still SV2 mode
with the record count set to the number of rows in the batch. The downstream
operator will throw an IndexOutOfBoundsException while trying to access the
first record, since the **selectionVector2** buffer has been cleared.
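
One way I could imagine keeping the buffer and the reported mode consistent (a
rough sketch only, written against the fields already used in
`applyRuntimeFilter`, not a concrete suggestion for the final fix) is to fill
the SV2 with an identity mapping instead of clearing it when nothing was
filtered, so a reader that has already switched to SV2 mode never sees a
released buffer:

    // Sketch only: when no rows of this batch were filtered, keep the SV2
    // usable instead of clearing it, so that a downstream operator already
    // reading in SV2 mode does not dereference a cleared buffer.
    if (tmpFilterRows == 0) {
      for (int i = 0; i < recordCount; i++) {
        selectionVector2.setIndex(i, i);   // identity mapping: row i -> i
      }
      selectionVector2.setRecordCount(recordCount);
      return true;                         // batch stays SV2-backed
    }

Alternatively, the caller could remember that it has already switched the batch
to SV2 mode and handle the "nothing filtered" case there; the sketch is only
meant to illustrate keeping the SV2 buffer and the batch schema in sync.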