[ https://issues.apache.org/jira/browse/IMPALA-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731438#comment-17731438 ]
Quanlong Huang commented on IMPALA-12204:
-----------------------------------------

I went through all the other Open() methods and found that UnionNode::Open() can also append strings to the profile repeatedly. Code snippet:
{code:cpp}
Status UnionNode::Open(RuntimeState* state) {
  ...
  if (is_codegen_status_added_ && num_const_scalar_expr_to_be_codegened_ == 0
      && !const_exprs_lists_.empty()) {
    runtime_profile_->AppendExecOption("Codegen Disabled for const scalar expressions");
  }
  return Status::OK();
}
{code}
The following query hits the issue:
{code:sql}
select count(*) from
  tpch_nested_parquet.customer c1,
  tpch_nested_parquet.customer c2,
  (
    select x.o_orderkey from c1.c_orders x
    union all
    select y.o_orderkey from c2.c_orders y
    union all
    select 100
  ) v
where c1.c_custkey = c2.c_custkey;{code}
The UnionNode is inside the subplan:
{code:sql}
08:SUBPLAN
|  row-size=40B cardinality=3.00M
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
|  |  row-size=40B cardinality=20
|  |
|  |--02:SINGULAR ROW SRC
|  |     row-size=40B cardinality=1
|  |
|  03:UNION
|  |  row-size=0B cardinality=20
|  |
|  |--05:UNNEST [c2.c_orders y]
|  |     row-size=0B cardinality=10
|  |
|  04:UNNEST [c1.c_orders x]
|     row-size=0B cardinality=10
{code}
I saw the string "Codegen Disabled for const scalar expressions" repeated many times in the profile:
{noformat}
UNION_NODE (id=3):
  ExecOption: Codegen Disabled for const scalar expressions, Codegen Disabled for const scalar expressions, Codegen Disabled for const scalar expressions, Codegen Disabled for const scalar expressions,...
{noformat}

> Redundant codegen info of HashJoinBuilder inside a subplan
> ----------------------------------------------------------
>
>                 Key: IMPALA-12204
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12204
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> In the query profile, the info strings of a hash join builder contain an
> ExecOption with content like "Build Side Codegen Enabled, Hash Table
> Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN
> node, this string can be repeated many times because the SUBPLAN node opens
> the right child many times. This can blow up the profile size.
> I can reproduce this with the following query:
> {code:sql}
> select count(*) from
>   tpch_nested_parquet.customer c1,
>   tpch_nested_parquet.customer c2,
>   (select x.* from c1.c_orders x, c2.c_orders y
>    where x.o_orderkey = y.o_orderkey) v
> where c1.c_custkey = c2.c_custkey;{code}
> In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
> {noformat}
> 08:SUBPLAN
> |  row-size=56B cardinality=1.50M
> |
> |--06:NESTED LOOP JOIN [CROSS JOIN]
> |  |  row-size=56B cardinality=10
> |  |
> |  |--02:SINGULAR ROW SRC
> |  |     row-size=40B cardinality=1
> |  |
> |  05:HASH JOIN [INNER JOIN]
> |  |  hash predicates: x.o_orderkey = y.o_orderkey
> |  |  row-size=16B cardinality=10
> |  |
> |  |--04:UNNEST [c2.c_orders y]
> |  |     row-size=0B cardinality=10
> |  |
> |  03:UNNEST [c1.c_orders x]
> |     row-size=0B cardinality=10
> {noformat}
> The query profile has very long strings:
> {noformat}
>   Hash Join Builder (join_node_id=5):
>     ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen
> Enabled,...
> {noformat}
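
To illustrate the pattern reported in this issue, here is a minimal standalone sketch of why an append inside Open() grows the profile when a subplan re-opens its child once per input row, and how a once-only guard would keep it bounded. Everything in it is hypothetical: ToyProfile, NaiveNode, GuardedNode, the exec_option_added_ flag and OpenRepeatedly are illustration-only names, not Impala classes, and this is not necessarily how IMPALA-12204 is or should be fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a runtime profile; NOT Impala's RuntimeProfile.
class ToyProfile {
 public:
  // Records one option string; every call adds another entry.
  void AppendExecOption(const std::string& option) { options_.push_back(option); }

  std::size_t NumOptions() const { return options_.size(); }

  // Joins the recorded options with ", ", similar to the ExecOption line above.
  std::string ExecOption() const {
    std::string joined;
    for (std::size_t i = 0; i < options_.size(); ++i) {
      if (i > 0) joined += ", ";
      joined += options_[i];
    }
    return joined;
  }

 private:
  std::vector<std::string> options_;
};

// Hypothetical node that appends on every Open(), as in the reported pattern.
class NaiveNode {
 public:
  explicit NaiveNode(ToyProfile* profile) : profile_(profile) {}
  void Open() {
    profile_->AppendExecOption("Codegen Disabled for const scalar expressions");
  }

 private:
  ToyProfile* profile_;
};

// Same node with an assumed once-only guard (exec_option_added_ is made up here).
class GuardedNode {
 public:
  explicit GuardedNode(ToyProfile* profile) : profile_(profile) {}
  void Open() {
    if (exec_option_added_) return;  // Already reported; skip on re-open.
    profile_->AppendExecOption("Codegen Disabled for const scalar expressions");
    exec_option_added_ = true;
  }

 private:
  ToyProfile* profile_;
  bool exec_option_added_ = false;
};

// Simulates a subplan opening its child once per input row and returns
// how many option strings ended up in the profile.
template <typename Node>
std::size_t OpenRepeatedly(int num_rows) {
  assert(num_rows >= 0);
  ToyProfile profile;
  Node node(&profile);
  for (int i = 0; i < num_rows; ++i) node.Open();
  return profile.NumOptions();
}
```

Calling OpenRepeatedly<NaiveNode>(n) records n copies of the string, while OpenRepeatedly<GuardedNode>(n) records one for any n > 0. A per-node flag is only one possible approach; making the append itself idempotent would be another.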