weijietong commented on a change in pull request #1600: DRILL-6947: fix
RuntimeFilter memory leak
URL: https://github.com/apache/drill/pull/1600#discussion_r250454504
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/filter/RuntimeFilterRecordBatch.java
##########
@@ -224,21 +224,39 @@ private void applyRuntimeFilter() throws
SchemaChangeException {
setupHashHelper();
//To make each independent bloom filter work together to construct a final
filter result: BitSet.
BitSet bitSet = new BitSet(originalRecordCount);
- for (int i = 0; i < toFilterFields.size(); i++) {
- BloomFilter bloomFilter = bloomFilters.get(i);
- String fieldName = toFilterFields.get(i);
- computeBitSet(field2id.get(fieldName), bloomFilter, bitSet);
- }
+
+ int filterSize = toFilterFields.size();
int svIndex = 0;
- for (int i = 0; i < originalRecordCount; i++) {
- boolean contain = bitSet.get(i);
- if (contain) {
- sv2.setIndex(svIndex, i);
- svIndex++;
- } else {
- filteredRows++;
+ if (filterSize == 1) {
+ BloomFilter bloomFilter = bloomFilters.get(0);
+ String fieldName = toFilterFields.get(0);
+ int fieldId = field2id.get(fieldName);
+ for (int rowIndex = 0; rowIndex < originalRecordCount; rowIndex++) {
+ long hash = hash64.hash64Code(rowIndex, 0, fieldId);
+ boolean contain = bloomFilter.find(hash);
+ if (contain) {
+ sv2.setIndex(svIndex, rowIndex);
+ svIndex ++;
+ }
+ }
+ } else {
+ for (int i = 0; i < toFilterFields.size(); i++) {
+ BloomFilter bloomFilter = bloomFilters.get(i);
+ String fieldName = toFilterFields.get(i);
+ computeBitSet(field2id.get(fieldName), bloomFilter, bitSet);
+ }
+ for (int i = 0; i < originalRecordCount; i++) {
+ boolean contain = bitSet.get(i);
+ if (contain) {
+ sv2.setIndex(svIndex, i);
+ svIndex++;
+ } else {
+ filteredRows++;
+ }
Review comment:
As a vector memory format, I prefer one calculation per column execution
model which is data local friendly. This also applies to the HashJoin's
hashing phase which currently is not that way.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services