[ https://issues.apache.org/jira/browse/IMPALA-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731435#comment-17731435 ]
Quanlong Huang commented on IMPALA-12204:
-----------------------------------------

FWIW, here is the stacktrace to where the string is added:
{noformat}
    @          0x35d28f7  impala::DataSink::Open()
    @          0x36b295d  impala::PhjBuilder::Open()
    @          0x3779788  impala::BlockingJoinNode::SendBuildInputToSink<>()
    @          0x37742ff  impala::BlockingJoinNode::ProcessBuildInputAndOpenProbe()
    @          0x36daab7  impala::PartitionedHashJoinNode::Open()
    @          0x3696fac  impala::NestedLoopJoinNode::Open()
    @          0x371a809  impala::SubplanNode::GetNext()
    @          0x37edb98  impala::AggregationNode::Open()
    @          0x2b41779  impala::FragmentInstanceState::Open()
    @          0x2b3e1c0  impala::FragmentInstanceState::Exec()
    @          0x2a2b5ef  impala::QueryState::ExecFInstance()
    @          0x297ac77  boost::function0<>::operator()()
    @          0x34ae6f0  impala::Thread::SuperviseThread()
    @          0x34baaf9  boost::_bi::list5<>::operator()<>()
    @          0x34ba94c  boost::_bi::bind_t<>::operator()()
    @          0x4b8fd77  thread_proxy
    @     0x7f42a69ee6db  start_thread
    @     0x7f42a376361f  clone
{noformat}
Code snippet:
{code:cpp}
Status DataSink::Open(RuntimeState* state) {
  DCHECK_EQ(output_exprs_.size(), output_expr_evals_.size());
  for (const string& codegen_msg : sink_config_.codegen_status_msgs_) {
    profile_->AppendExecOption(codegen_msg);
  }
  return ScalarExprEvaluator::Open(output_expr_evals_, state);
}
{code}

> Redundant codegen info of HashJoinBuilder inside a subplan
> ----------------------------------------------------------
>
>                 Key: IMPALA-12204
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12204
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> In the query profile, the info strings of a hash join builder contain an
> ExecOption with content like "Build Side Codegen Enabled, Hash Table
> Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN
> node, this string can be repeated many times, since the SUBPLAN node opens
> its right child many times. This can blow up the profile size.
> I can reproduce this with the following query:
> {code:sql}
> select count(*) from
>   tpch_nested_parquet.customer c1,
>   tpch_nested_parquet.customer c2,
>   (select x.* from c1.c_orders x, c2.c_orders y
>    where x.o_orderkey = y.o_orderkey) v
> where c1.c_custkey = c2.c_custkey;{code}
> In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
> {noformat}
> 08:SUBPLAN
> |  row-size=56B cardinality=1.50M
> |
> |--06:NESTED LOOP JOIN [CROSS JOIN]
> |  |  row-size=56B cardinality=10
> |  |
> |  |--02:SINGULAR ROW SRC
> |  |     row-size=40B cardinality=1
> |  |
> |  05:HASH JOIN [INNER JOIN]
> |  |  hash predicates: x.o_orderkey = y.o_orderkey
> |  |  row-size=16B cardinality=10
> |  |
> |  |--04:UNNEST [c2.c_orders y]
> |  |     row-size=0B cardinality=10
> |  |
> |  03:UNNEST [c1.c_orders x]
> |     row-size=0B cardinality=10
> {noformat}
> The query profile has very long strings:
> {noformat}
> Hash Join Builder (join_node_id=5):
>   ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled,...
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
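
The mechanism described in the issue can be illustrated with a minimal, self-contained C++ sketch. ToyProfile, ToySink, and ExecOptionsAfterOpens are hypothetical names made up for this example and are not Impala classes. The append_once guard is only one possible way to avoid the repetition and is not necessarily the fix chosen for IMPALA-12204.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime profile; NOT Impala's RuntimeProfile.
class ToyProfile {
 public:
  // Appends an option to a comma-separated string, mimicking the shape of
  // the ExecOption line shown in the profile excerpt.
  void AppendExecOption(const std::string& opt) {
    if (!exec_options_.empty()) exec_options_ += ", ";
    exec_options_ += opt;
  }
  const std::string& exec_options() const { return exec_options_; }

 private:
  std::string exec_options_;
};

// Hypothetical sink whose Open() may be called once per subplan iteration.
class ToySink {
 public:
  ToySink(std::vector<std::string> msgs, bool append_once)
      : codegen_msgs_(std::move(msgs)), append_once_(append_once) {}

  void Open() {
    // Assumed guard: skip the append on every Open() after the first one.
    if (append_once_ && opened_before_) return;
    for (const std::string& msg : codegen_msgs_) {
      profile_.AppendExecOption(msg);
    }
    opened_before_ = true;
  }

  const ToyProfile& profile() const { return profile_; }

 private:
  std::vector<std::string> codegen_msgs_;
  bool append_once_;
  bool opened_before_ = false;
  ToyProfile profile_;
};

// Simulates a parent node that re-opens its child `num_opens` times and
// returns the resulting ExecOption string.
std::string ExecOptionsAfterOpens(int num_opens, bool append_once) {
  ToySink sink({"Build Side Codegen Enabled",
                "Hash Table Construction Codegen Enabled"},
               append_once);
  for (int i = 0; i < num_opens; ++i) sink.Open();
  return sink.profile().exec_options();
}
```

Without the guard, the string grows linearly with the number of Open() calls, which matches the repeated ExecOption content in the profile excerpt. With the guard, ExecOptionsAfterOpens(3, true) returns the same string as a single open.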