[ 
https://issues.apache.org/jira/browse/IMPALA-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731438#comment-17731438
 ] 

Quanlong Huang commented on IMPALA-12204:
-----------------------------------------

I went through all the other Open() methods and found that UnionNode::Open() 
could also append strings repeatedly to the profile. Code snippet:
{code:cpp}
Status UnionNode::Open(RuntimeState* state) {
  ...
  if (is_codegen_status_added_ && num_const_scalar_expr_to_be_codegened_ == 0
      && !const_exprs_lists_.empty()) {
    runtime_profile_->AppendExecOption("Codegen Disabled for const scalar expressions");
  }
  return Status::OK();
} {code}
The following query will hit the issue:
{code:sql}
select count(*) from
  tpch_nested_parquet.customer c1,
  tpch_nested_parquet.customer c2,
  (
    select x.o_orderkey from c1.c_orders x
    union all
    select y.o_orderkey from c2.c_orders y
    union all
    select 100
  ) v
where c1.c_custkey = c2.c_custkey;{code}
A UnionNode is inside the subplan:
{code:sql}
08:SUBPLAN
|  row-size=40B cardinality=3.00M
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
|  |  row-size=40B cardinality=20
|  |
|  |--02:SINGULAR ROW SRC
|  |     row-size=40B cardinality=1
|  |
|  03:UNION
|  |  row-size=0B cardinality=20
|  |
|  |--05:UNNEST [c2.c_orders y]
|  |     row-size=0B cardinality=10
|  |
|  04:UNNEST [c1.c_orders x]
|     row-size=0B cardinality=10 {code}
The profile then shows the string "Codegen Disabled for const scalar 
expressions" repeated in the ExecOption of the UnionNode:
{noformat}
UNION_NODE (id=3):
  ExecOption: Codegen Disabled for const scalar expressions, Codegen Disabled 
for const scalar expressions, Codegen Disabled for const scalar expressions, 
Codegen Disabled for const scalar expressions,...{noformat}
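
To illustrate the mechanism, here is a minimal standalone sketch of a node whose Open() is called once per subplan iteration, with and without a "record only once" guard. It is not Impala code and not necessarily how the issue is fixed: FakeProfile, FakeUnionNode, exec_option_added_ and SimulateSubplan are hypothetical names made up for this example, and only the option string is taken from the snippet above.

```cpp
#include <string>

// Hypothetical stand-in for a runtime profile; NOT Impala's RuntimeProfile.
class FakeProfile {
 public:
  // Appends an option to the comma-separated ExecOption string.
  void AppendExecOption(const std::string& opt) {
    if (!exec_option_.empty()) exec_option_ += ", ";
    exec_option_ += opt;
  }
  const std::string& exec_option() const { return exec_option_; }

 private:
  std::string exec_option_;
};

// Hypothetical node whose Open() runs once per subplan iteration.
class FakeUnionNode {
 public:
  explicit FakeUnionNode(bool guard) : guard_(guard) {}

  void Open() {
    // With the guard enabled, only the first Open() records the option.
    if (guard_ && exec_option_added_) return;
    profile_.AppendExecOption("Codegen Disabled for const scalar expressions");
    exec_option_added_ = true;
  }

  const FakeProfile& profile() const { return profile_; }

 private:
  bool guard_;                      // assumption: toggles the once-only check
  bool exec_option_added_ = false;  // assumption: per-node "already added" flag
  FakeProfile profile_;
};

// Opens the same node `iterations` times, as a subplan would reopen its
// child for each input row, and returns the resulting ExecOption string.
std::string SimulateSubplan(int iterations, bool guard) {
  FakeUnionNode node(guard);
  for (int i = 0; i < iterations; ++i) node.Open();
  return node.profile().exec_option();
}
```

Without the guard, the ExecOption string grows with every Open() call, which is the profile growth described here. With the guard, it holds a single entry no matter how many times the node is reopened.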

> Redundant codegen info of HashJoinBuilder inside a subplan
> ----------------------------------------------------------
>
>                 Key: IMPALA-12204
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12204
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> In the query profile, the info strings of a hash join builder contain an 
> ExecOption that has content like "Build Side Codegen Enabled, Hash Table 
> Construction Codegen Enabled". When there is a HashJoin node inside a SUBPLAN 
> node, this string could be repeated many times since the SUBPLAN node will 
> open the right child many times. This could blow up the profile size.
> I can reproduce this by the following query:
> {code:sql}
> select count(*) from
>   tpch_nested_parquet.customer c1,
>   tpch_nested_parquet.customer c2,
>   (select x.* from c1.c_orders x, c2.c_orders y
>   where x.o_orderkey = y.o_orderkey) v
> where c1.c_custkey = c2.c_custkey;{code}
> In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
> {noformat}
> 08:SUBPLAN
> |  row-size=56B cardinality=1.50M
> |
> |--06:NESTED LOOP JOIN [CROSS JOIN]
> |  |  row-size=56B cardinality=10
> |  |
> |  |--02:SINGULAR ROW SRC
> |  |     row-size=40B cardinality=1
> |  |
> |  05:HASH JOIN [INNER JOIN]
> |  |  hash predicates: x.o_orderkey = y.o_orderkey
> |  |  row-size=16B cardinality=10
> |  |
> |  |--04:UNNEST [c2.c_orders y]
> |  |     row-size=0B cardinality=10
> |  |
> |  03:UNNEST [c1.c_orders x]
> |     row-size=0B cardinality=10
>  {noformat}
> The query profile has super long strings:
> {noformat}
> Hash Join Builder (join_node_id=5):
>   ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen 
> Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen 
> Enabled,...
> {noformat}


