sohami commented on a change in pull request #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r207749408
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
##########
@@ -226,6 +259,93 @@ public IterOutcome next() {
}
}
+  /**
+   *
+   * @return true means rows are filtered by the RuntimeFilter, false means not affected by the RuntimeFilter.
+   * @throws SchemaChangeException
+   */
+  private boolean applyRuntimeFilter() throws SchemaChangeException {
+    RuntimeFilterWritable runtimeFilterWritable = context.getRuntimeFilter();
+    if (runtimeFilterWritable == null) {
+      return false;
+    }
+    if (recordCount <= 0) {
+      return false;
+    }
+    List<BloomFilter> bloomFilters = runtimeFilterWritable.unwrap();
+    if (hash64 == null) {
+      ValueVectorHashHelper hashHelper = new ValueVectorHashHelper(this, context);
+      try {
+        //generate hash helper
+        this.toFilterFields = runtimeFilterWritable.getRuntimeFilterBDef().getProbeFieldsList();
+        List<LogicalExpression> hashFieldExps = new ArrayList<>();
+        List<TypedFieldId> typedFieldIds = new ArrayList<>();
+        for (String toFilterField : toFilterFields) {
+          SchemaPath schemaPath = new SchemaPath(new PathSegment.NameSegment(toFilterField), ExpressionPosition.UNKNOWN);
+          TypedFieldId typedFieldId = container.getValueVectorId(schemaPath);
+          this.field2id.put(toFilterField, typedFieldId.getFieldIds()[0]);
+          typedFieldIds.add(typedFieldId);
+          ValueVectorReadExpression toHashFieldExp = new ValueVectorReadExpression(typedFieldId);
+          hashFieldExps.add(toHashFieldExp);
+        }
+        hash64 = hashHelper.getHash64(hashFieldExps.toArray(new LogicalExpression[hashFieldExps.size()]), typedFieldIds.toArray(new TypedFieldId[typedFieldIds.size()]));
+      } catch (Exception e) {
+        throw UserException.internalError(e).build(logger);
+      }
+    }
+    selectionVector2.allocateNew(recordCount);
+    //To make each independent bloom filter work together to construct a final filter result: BitSet.
+    BitSet bitSet = new BitSet(recordCount);
+    for (int i = 0; i < toFilterFields.size(); i++) {
+      BloomFilter bloomFilter = bloomFilters.get(i);
+      String fieldName = toFilterFields.get(i);
+      computeBitSet(field2id.get(fieldName), bloomFilter, bitSet);
+    }
+
+    int svIndex = 0;
+    int tmpFilterRows = 0;
+    for (int i = 0; i < recordCount; i++) {
+      boolean contain = bitSet.get(i);
+      if (contain) {
+        selectionVector2.setIndex(svIndex, i);
+        svIndex++;
+      } else {
+        tmpFilterRows++;
+      }
+    }
+    selectionVector2.setRecordCount(svIndex);
+    if (tmpFilterRows > 0 && tmpFilterRows == recordCount) {
+      //all rows of the batch was filtered
+      recordCount = 0;
+      selectionVector2.clear();
+      logger.debug("filter {} rows by the RuntimeFilter", tmpFilterRows);
+      //return false to avoid unnecessary SV2 memory copy work
+      return false;
+    }
+    if (tmpFilterRows > 0 && tmpFilterRows != recordCount) {
+      //partial of the rows was filtered
+      totalFilterRows = totalFilterRows + tmpFilterRows;
+      recordCount = svIndex;
+      logger.debug("filter {} rows by the RuntimeFilter", tmpFilterRows);
+      return true;
+    }
+    selectionVector2.clear();
Review comment:
In this case it means none of the rows are filtered out.
There is a bug here. Consider a case where a few rows were filtered for one
batch, so `applyRuntimeFilter` returns true with `selectionVector2` correctly
allocated and set. Since this is the first batch with some filtered rows, the
caller will create a batch in SV2 mode and set `runtimeFiltered=true`. Now the
next batch comes in and none of its rows are filtered. Here we will clear
**selectionVector2**, but the schema of the returned batch is still SV2 mode
with the record count set to the number of rows in the batch. The downstream
operator will throw an IndexOutOfBoundsException while trying to access the
first record, since the **selectionVector2** buffer has been cleared.
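
One way I could imagine keeping the buffer and the reported mode consistent (a
rough sketch only, written against the fields already used in
`applyRuntimeFilter`, not a concrete suggestion for the final fix) is to fill
the SV2 with an identity mapping instead of clearing it when nothing was
filtered, so a reader that has already switched to SV2 mode never sees a
released buffer:

    // Sketch only: when no rows of this batch were filtered, keep the SV2
    // usable instead of clearing it, so that a downstream operator already
    // reading in SV2 mode does not dereference a cleared buffer.
    if (tmpFilterRows == 0) {
      for (int i = 0; i < recordCount; i++) {
        selectionVector2.setIndex(i, i);   // identity mapping: row i -> i
      }
      selectionVector2.setRecordCount(recordCount);
      return true;                         // batch stays SV2-backed
    }

Alternatively, the caller could remember that it has already switched the batch
to SV2 mode and handle the "nothing filtered" case there; the sketch is only
meant to illustrate keeping the SV2 buffer and the batch schema in sync.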