Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24089
to look at the new patch set (#15).
Change subject: WIP IMPALA-2744: Codegen for tuple DeepCopy - part1
......................................................................
WIP IMPALA-2744: Codegen for tuple DeepCopy - part1
Created a codegen'd version of BufferedTupleStream::DeepCopy.
The codegen'd function is currently used by PartitionedHashJoinBuilder.
TODO: Use it in other systems that use BufferedTupleStream:
-AnalyticEvalNode
-GroupingAggregator
-SpillableRowBatchQueue/BufferedPlanRootSink
Using Tuple's TryDeepCopy* functions for BufferedTupleStream was
considered, but it is better for the stream to keep its own DeepCopy
because the two differ:
-BufferedTupleStream doesn't copy tuples serially: it first
copies the "fixed len" parts of all tuples, then the
"string data" of all tuples, then the "collection data" of
all tuples.
-BufferedTupleStream's DeepCopy doesn't set the pointers of the
copied strings. This also applies when copying a string from a
collection.
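The copy order described above can be illustrated with a small standalone
sketch. It is not the code in this patch: ToyTuple, StringSlot,
DeepCopyTwoPass and ReadString are hypothetical names, each tuple has a
single string slot, and collections are left out. It only shows the idea of
writing all fixed-length parts first, then all string bytes, without
re-pointing the copied string slots.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Impala's real types.
struct StringSlot {
  const char* ptr;
  uint32_t len;
};

struct ToyTuple {
  int64_t id;
  StringSlot name;
};

// Bytes needed for all fixed-length parts followed by all string data.
inline size_t SerializedSize(const std::vector<ToyTuple>& tuples) {
  size_t size = tuples.size() * sizeof(ToyTuple);
  for (const ToyTuple& t : tuples) size += t.name.len;
  return size;
}

// Pass 1 copies the fixed-length part of every tuple. Pass 2 appends the
// string bytes of every tuple. The copied slots keep the source pointer
// value: nothing is re-pointed at the new buffer.
inline std::vector<uint8_t> DeepCopyTwoPass(const std::vector<ToyTuple>& tuples) {
  std::vector<uint8_t> out(SerializedSize(tuples));
  uint8_t* pos = out.data();
  for (const ToyTuple& t : tuples) {
    std::memcpy(pos, &t, sizeof(ToyTuple));
    pos += sizeof(ToyTuple);
  }
  for (const ToyTuple& t : tuples) {
    if (t.name.len > 0) std::memcpy(pos, t.name.ptr, t.name.len);
    pos += t.name.len;
  }
  assert(pos == out.data() + out.size());
  return out;
}

// Reader side: since the stored pointers are not valid for the copy, the
// string location is recomputed from the lengths of the preceding tuples.
inline std::string ReadString(const std::vector<uint8_t>& buf,
                              size_t num_tuples, size_t index) {
  assert(index < num_tuples);
  const uint8_t* var_data = buf.data() + num_tuples * sizeof(ToyTuple);
  ToyTuple t;
  for (size_t i = 0;; ++i) {
    std::memcpy(&t, buf.data() + i * sizeof(ToyTuple), sizeof(ToyTuple));
    if (i == index) break;
    var_data += t.name.len;
  }
  return std::string(reinterpret_cast<const char*>(var_data), t.name.len);
}
```

In this toy layout the reader derives each string's address from the stored
lengths, which is why the writer can skip fixing up pointers during the copy.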
Measurements:
Measured with the following query:
select straight_join l_orderkey, o_custkey, o_orderkey, l_partkey
from tpch30.orders left join tpch30.lineitem on o_orderkey = l_orderkey
where o_totalprice<0
Where tpch30 is generated by:
bin/load-data.py -s 30 -f --workloads tpch
--table_formats text/none,parquet/snap
Before:
BuildRowsPartitionTime: 2s530ms
After:
BuildRowsPartitionTime: 1s866ms
Testing:
Added tests to buffered-tuple-stream-test.cc that compare the results
of the codegen'd and basic DeepCopy variants of BufferedTupleStream
across different data types.
Change-Id: I63e32babdbaf56095478c6c66afb9cb91189f946
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
A be/src/exec/partitioned-hash-join-builder.inline.h
M be/src/runtime/CMakeLists.txt
A be/src/runtime/buffered-tuple-stream-ir.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/spillable-row-batch-queue.h
A be/src/runtime/tuple-row-ir.cc
M be/src/runtime/tuple-row.h
15 files changed, 736 insertions(+), 176 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/24089/15
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I63e32babdbaf56095478c6c66afb9cb91189f946
Gerrit-Change-Number: 24089
Gerrit-PatchSet: 15
Gerrit-Owner: Balazs Hevele <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>