Hello Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/15612

to look at the new patch set (#13).

Change subject: IMPALA-9176: shared null-aware anti-join build
......................................................................

IMPALA-9176: shared null-aware anti-join build

This switches null-aware anti-join (NAAJ) to use shared join builds with
mt_dop > 0. To support this, all access to the join build data structures
from the probe side is made read-only.

NAAJ requires iterating over rows from build partitions at various steps
in the algorithm, and before this patch that iteration was not thread-safe.
We previously avoided the problem by having a separate builder for each
join node and duplicating the data.

The main challenge was iteration over null_aware_partition()->build_rows()
from the probe side: the stream uses an embedded read iterator, so it was
not thread-safe (each probe thread would be trying to use the same
iterator). The solution is to extend BufferedTupleStream to allow multiple
read iterators into a pinned, read-only stream. Each probe thread can then
iterate over the stream independently with no thread-safety issues.

As part of the BufferedTupleStream changes, I partially abstracted
ReadIterator from the rest of BufferedTupleStream, but decided against a
complete refactor to avoid excessive churn in this patch set, i.e. much
BufferedTupleStream code still accesses internal fields of ReadIterator.

Also fix a pre-existing bug in the grouping aggregator where Spill() hit a
DCHECK because the hash table was destroyed unnecessarily on OOM. This was
flushed out by the parameter change in test_spilling.

Testing:
Added a test to buffered-tuple-stream-test for multiple readers of a BTS.

Tweaked test_spilling_naaj_no_deny_reservation to use a smaller minimum
reservation, required to keep the test passing with the new, lower memory
requirement.

Updated a TPC-H planner test where resource requirements slightly
decreased for the NAAJ.

Ran the NAAJ tests in test_spilling.py with TSAN enabled and confirmed
there were no data races.

Ran exhaustive tests, which passed after fixing IMPALA-9611. Ran core
tests with ASAN. Ran backend tests with TSAN.

Perf:
I ran this query, which exercises EvaluateNullProbe() heavily:

  select l_orderkey, l_partkey, l_suppkey, l_linenumber
  from tpch30_parquet.lineitem
  where l_suppkey = 4162 and l_shipmode = 'AIR' and l_returnflag = 'A'
    and l_shipdate > '1993-01-01'
    and if(l_orderkey > 5500000, NULL, l_orderkey) not in (
      select if(o_orderkey % 2 = 0, NULL, o_orderkey + 1)
      from orders
      where l_orderkey = o_orderkey)
  order by 1, 2, 3, 4;

With this change it went from ~13s to ~11s running on a single impalad,
because of the inlining of CreateOutputRow() and EvalConjuncts().

I also ran TPC-H SF 30 on Parquet with mt_dop=4; there was no change in
performance.
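To make the multiple-reader idea concrete, here is a minimal standalone
sketch of the technique described above: the read position moves out of the
stream and into a per-caller iterator object, so several probe threads can
scan the same pinned, read-only data without sharing mutable state. This is
not Impala code; the names (RowStream, ReadIterator::Next, etc.) are
hypothetical and do not match the actual BufferedTupleStream API.

  // Standalone illustration only; all names are hypothetical.
  #include <cstdio>
  #include <string>
  #include <thread>
  #include <vector>

  // A pinned, read-only "stream": built once, then never mutated.
  class RowStream {
   public:
    void AddRow(std::string row) { rows_.push_back(std::move(row)); }

    // Per-caller iterator: the read position lives here, not in the stream,
    // so any number of threads can scan the same data independently.
    class ReadIterator {
     public:
      explicit ReadIterator(const RowStream& stream) : stream_(stream) {}
      bool Next(const std::string** row) {
        if (pos_ >= stream_.rows_.size()) return false;
        *row = &stream_.rows_[pos_++];
        return true;
      }
     private:
      const RowStream& stream_;
      size_t pos_ = 0;
    };

   private:
    std::vector<std::string> rows_;
  };

  int main() {
    RowStream stream;
    for (int i = 0; i < 4; ++i) stream.AddRow("row " + std::to_string(i));

    // Each "probe thread" gets its own iterator over the shared build data.
    auto probe = [&stream](int tid) {
      RowStream::ReadIterator it(stream);
      const std::string* row;
      while (it.Next(&row)) printf("thread %d saw %s\n", tid, row->c_str());
    };
    std::thread t1(probe, 1), t2(probe, 2);
    t1.join();
    t2.join();
    return 0;
  }

The sketch only shows why per-reader state removes the data race; it does not
model pinning, reservations, or spilling.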
Change-Id: I95ead761430b0aa59a4fb2e7848e47d1bf73c1c9
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
A be/src/exec/blocking-join-node.inline.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
A be/src/exec/exec-node.inline.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/non-grouping-aggregator.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/select-node-ir.cc
M be/src/exec/unnest-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
M tests/query_test/test_spilling.py
45 files changed, 835 insertions(+), 441 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/15612/13

--
To view, visit http://gerrit.cloudera.org:8080/15612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95ead761430b0aa59a4fb2e7848e47d1bf73c1c9
Gerrit-Change-Number: 15612
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>