[ https://issues.apache.org/jira/browse/IMPALA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Becker resolved IMPALA-11067.
------------------------------------
    Resolution: Fixed

> Unify struct subexpressions in rows
> -----------------------------------
>
>                 Key: IMPALA-11067
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11067
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Daniel Becker
>            Assignee: Daniel Becker
>            Priority: Major
>              Labels: complextype, nested_types
>
> If a column appears multiple times in the select list, it is not duplicated 
> in the row under the hood: we recognise that the result columns reference 
> the same underlying column, so the row size does not increase. As a 
> baseline, with each column selected once:
>  
> {code:java}
> explain select id, outer_struct from 
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct from 
> functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
> With the id column duplicated, the row size is still 64B:
>  
> {code:java}
> explain select id, id, outer_struct from 
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, id, outer_struct from 
> functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
> However, if we query a struct together with a subfield of the same struct, 
> we do not reuse the existing slot in the row but duplicate the 
> subexpression, increasing the row size from 64B to 80B:
>  
> {code:java}
> explain select id, outer_struct, outer_struct.inner_struct2 from 
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct, outer_struct.inner_struct2 from 
> functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=80B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
>  
>  
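The difference between the two behaviours can be sketched as a toy row-size calculation. This is only an illustration under stated assumptions, not Impala's planner code: the byte sizes and the `other_fields` name are hypothetical, chosen solely so that the totals line up with the 64B and 80B reported in the plans above (the issue does not give the real slot layout), and the function names are made up for this sketch.

```python
# Hypothetical leaf sizes in bytes. Only the totals (64 and 80) come from the
# plans in the issue; the per-field breakdown is an assumption.
LEAF_SIZES = {
    ("id",): 4,
    ("outer_struct", "other_fields"): 44,   # hypothetical placeholder field
    ("outer_struct", "inner_struct2"): 16,
}


def leaves_under(path):
    """Return every leaf path covered by `path` (prefix match)."""
    return [leaf for leaf in LEAF_SIZES if leaf[:len(path)] == path]


def row_size_duplicating(select_list):
    """Identical paths share a slot, but a struct and one of its subfields
    are treated as unrelated expressions, so the subfield is stored twice."""
    distinct_paths = set(select_list)
    return sum(LEAF_SIZES[leaf]
               for path in distinct_paths
               for leaf in leaves_under(path))


def row_size_unified(select_list):
    """Each leaf is materialised once, no matter how many select-list
    expressions (struct or subfield) refer to it."""
    needed = set()
    for path in select_list:
        needed.update(leaves_under(path))
    return sum(LEAF_SIZES[leaf] for leaf in needed)


if __name__ == "__main__":
    query = [("id",), ("outer_struct",), ("outer_struct", "inner_struct2")]
    print(row_size_duplicating(query), row_size_unified(query))
```

In this model, selecting `id` twice leaves both totals unchanged, while adding `outer_struct.inner_struct2` next to `outer_struct` grows only the duplicating total; the unified variant keeps the row at the size of the struct alone.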



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
