[ https://issues.apache.org/jira/browse/HUDI-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Zhang updated HUDI-4245:
----------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> Support nested fields in Column Stats Index
> -------------------------------------------
>
>                 Key: HUDI-4245
>                 URL: https://issues.apache.org/jira/browse/HUDI-4245
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.14.0
>
>
> Currently only root-level fields are supported in the Column Stats Index, 
> even though there is no reason we couldn't support nested fields as well: 
> columnar file formats store nested fields as _nested columns,_ i.e. as 
> columns named by the field's path within the struct it belongs to. 
>  
> For example, the following schema: 
> {code:java}
> c1: StringType
> c2: StructType(Seq(StructField("foo", StringType))){code}
> would be stored in Parquet as two leaf columns, "c1: string" and 
> "c2.foo: string". This means Parquet already collects statistics for all 
> nested fields, and we just need to make sure we propagate them into the 
> Column Stats Index.
>  
> Original GH issue:
> [https://github.com/apache/hudi/issues/5804#issuecomment-1152983029]
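The idea in the issue above can be sketched as follows: flatten a nested schema into dotted leaf-column paths (as in `c2.foo`) and keep min/max/null-count statistics per leaf. This is a minimal illustration under stated assumptions, not Hudi's implementation. The `Primitive`, `Field` and `Struct` classes are hypothetical stand-ins for Spark's `StringType`/`StructField`/`StructType`, and `leaf_columns`/`column_stats` are hypothetical helpers that compute statistics from in-memory records. The issue instead proposes reusing the statistics Parquet already collects for each leaf column.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple, Union

# Hypothetical stand-ins for Spark's StringType / StructField / StructType
# (illustration only, not the Spark or Hudi API).
@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "string"

@dataclass(frozen=True)
class Field:
    name: str
    dtype: "DataType"

@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]

DataType = Union[Primitive, Struct]

def leaf_columns(schema: Struct, prefix: str = "") -> List[Tuple[str, str]]:
    """Flatten a nested schema into (dotted path, primitive type) pairs,
    mirroring how a columnar format lays out nested fields as leaf columns."""
    out: List[Tuple[str, str]] = []
    for f in schema.fields:
        path = prefix + f.name
        if isinstance(f.dtype, Struct):
            # Recurse into the struct, extending the dotted path.
            out.extend(leaf_columns(f.dtype, path + "."))
        else:
            out.append((path, f.dtype.name))
    return out

def get_path(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; a null parent yields None."""
    value: Any = record
    for part in path.split("."):
        if value is None:
            return None
        value = value.get(part)
    return value

def column_stats(schema: Struct, records: Sequence[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Compute min/max/null_count for every leaf column, keyed by dotted path."""
    stats: Dict[str, Dict[str, Any]] = {}
    for path, _ in leaf_columns(schema):
        values = [v for v in (get_path(r, path) for r in records) if v is not None]
        stats[path] = {
            "min": min(values) if values else None,
            "max": max(values) if values else None,
            "null_count": len(records) - len(values),
        }
    return stats

# The schema from the issue: c1: string, c2: struct<foo: string>
schema = Struct((
    Field("c1", Primitive("string")),
    Field("c2", Struct((Field("foo", Primitive("string")),))),
))
records = [
    {"c1": "a", "c2": {"foo": "x"}},
    {"c1": "b", "c2": {"foo": "z"}},
    {"c1": "c", "c2": None},
]

print(leaf_columns(schema))  # [('c1', 'string'), ('c2.foo', 'string')]
print(column_stats(schema, records))
```

Because each leaf is keyed by its full dotted path, `c2.foo` gets its own statistics entry alongside `c1`. In this sketch a null parent struct is counted as a null in each of its leaves, which is an assumption of the example rather than something the issue specifies.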



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
