Hello Csaba Ringhofer, Joe McDonnell, Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24089
to look at the new patch set (#20).
Change subject: IMPALA-14850: Codegen tuple DeepCopy for hash join
......................................................................
IMPALA-14850: Codegen tuple DeepCopy for hash join
Created a codegen'd version of BufferedTupleStream::DeepCopy.
In this patch, the codegen'd function is only used by
PartitionedHashJoinBuilder.
This patch does not do proper codegen for collection types; instead,
it calls the interpreted code for them.
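To illustrate the collection fallback, here is a minimal C++ sketch
under stated assumptions, not Impala's implementation: the real codegen
emits LLVM IR at runtime from the tuple descriptor, while this sketch
writes the "specialized" function by hand. SlotKind, SlotDesc,
InterpretedCopySlot, InterpretedCopy, SpecializedCopy and the tuple
layout are all hypothetical. The specialized copy has offsets and sizes
baked in as constants, but for the collection slot it calls the same
interpreted helper that the generic path uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified slot metadata (not Impala's SlotDescriptor).
enum class SlotKind { kFixed, kCollection };

struct SlotDesc {
  SlotKind kind;
  int offset;  // byte offset of the slot inside the tuple
  int size;    // byte size of the slot's fixed-length part
};

// Illustration only: counts how often the interpreted collection path runs.
int g_interpreted_collection_calls = 0;

// Interpreted path: the layout is known only at runtime, so each slot is
// dispatched on its kind.
void InterpretedCopySlot(const SlotDesc& s, const uint8_t* src, uint8_t* dst) {
  assert(src != dst);  // sanity check; memcpy also needs non-overlapping buffers
  if (s.kind == SlotKind::kCollection) {
    ++g_interpreted_collection_calls;
    // A real deep copy would also copy the collection's items here.
  }
  std::memcpy(dst + s.offset, src + s.offset, s.size);
}

void InterpretedCopy(const std::vector<SlotDesc>& slots, const uint8_t* src,
                     uint8_t* dst) {
  for (const SlotDesc& s : slots) InterpretedCopySlot(s, src, dst);
}

// Hand-written stand-in for a codegen'd copy of one hypothetical layout:
// an 8-byte slot at offset 0, a 4-byte slot at offset 8 and a 16-byte
// collection slot at offset 16. Offsets and sizes are constants and the
// slot loop is unrolled; the collection slot is not specialized and falls
// back to the interpreted helper.
void SpecializedCopy(const uint8_t* src, uint8_t* dst) {
  std::memcpy(dst + 0, src + 0, 8);
  std::memcpy(dst + 8, src + 8, 4);
  const SlotDesc collection_slot{SlotKind::kCollection, 16, 16};
  InterpretedCopySlot(collection_slot, src, dst);
}
```

In this sketch both functions write the same bytes for the layout; the
specialized one only skips the runtime walk over slot metadata for the
non-collection slots.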
Using Tuple's TryDeepCopy* functions for BufferedTupleStream was
considered, but it is better for BufferedTupleStream to keep its own
DeepCopy, because the two differ:
- BufferedTupleStream doesn't copy tuples serially: it first copies
  the fixed-length parts of all tuples, then the string data of all
  tuples, then the collection data of all tuples.
- BufferedTupleStream's DeepCopy doesn't set the strings' pointers.
  This also applies when copying a string from a collection.
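A minimal C++ sketch of the phased order from the first item, assuming
a hypothetical tuple made of a FixedPart (an id plus one string slot)
whose characters live out of line. FixedPart and PhasedCopy are
illustrative names, not Impala code, and the output is a plain byte
vector rather than stream pages. It also mirrors the second item: the
string slot is copied as-is, so its pointer is not set to the copied
bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-length part of a tuple with one string slot.
struct FixedPart {
  int64_t id;
  const char* str_ptr;  // pointer half of the string slot
  int32_t str_len;      // length half of the string slot
};

// Phased copy: pass 1 writes the fixed-length parts of all tuples,
// pass 2 appends the string data of all tuples. A third pass for
// collection data is omitted. Each string slot is copied bit for bit,
// so str_ptr in the output still points at the source bytes.
std::vector<uint8_t> PhasedCopy(const std::vector<FixedPart>& tuples) {
  std::vector<uint8_t> out;
  // Pass 1: fixed-length parts, back to back.
  for (const FixedPart& t : tuples) {
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(&t);
    out.insert(out.end(), bytes, bytes + sizeof(FixedPart));
  }
  // Pass 2: string data, in the same tuple order.
  for (const FixedPart& t : tuples) {
    assert(t.str_len >= 0);
    out.insert(out.end(), t.str_ptr, t.str_ptr + t.str_len);
  }
  return out;
}
```

For two tuples (1, "ab") and (2, "xyz"), a serial copy would write
fixed part 1, "ab", fixed part 2, "xyz"; this sketch writes fixed part 1,
fixed part 2, "ab", "xyz".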
Measurements:
Measured with the following query:
  select straight_join l_orderkey, o_custkey, o_orderkey, l_partkey
  from tpch30.orders left join /*+broadcast*/ tpch30.lineitem
  on o_orderkey = l_orderkey where o_totalprice < 0;
where tpch30 was generated by:
  bin/load-data.py -s 30 -f --workloads tpch \
    --table_formats text/none,parquet/snap
Before:
BuildRowsPartitionTime: 3s996ms
After:
BuildRowsPartitionTime: 2s139ms
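Read as a ratio (plain arithmetic on the two counters above, for this
single query only), the change corresponds to:

```latex
\frac{3.996\,\mathrm{s}}{2.139\,\mathrm{s}} \approx 1.87\times
\qquad
\frac{3.996 - 2.139}{3.996} \approx 46.5\%\ \text{less BuildRowsPartitionTime}
```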
Testing:
Added tests to buffered-tuple-stream-test.cc that compare the results
of the codegen'd and basic DeepCopy variants of BufferedTupleStream
for different data types.
Change-Id: I63e32babdbaf56095478c6c66afb9cb91189f946
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
A be/src/exec/partitioned-hash-join-builder.inline.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/buffered-tuple-stream-ir.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/spillable-row-batch-queue.h
A be/src/runtime/tuple-row-ir.cc
M be/src/runtime/tuple-row.h
15 files changed, 741 insertions(+), 179 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/24089/20
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63e32babdbaf56095478c6c66afb9cb91189f946
Gerrit-Change-Number: 24089
Gerrit-PatchSet: 20
Gerrit-Owner: Balazs Hevele
Gerrit-Reviewer: Balazs Hevele
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith