[ https://issues.apache.org/jira/browse/HUDI-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-4245:
----------------------------------
    Priority: Critical  (was: Blocker)

> Support nested fields in Column Stats Index
> -------------------------------------------
>
>                 Key: HUDI-4245
>                 URL: https://issues.apache.org/jira/browse/HUDI-4245
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> Currently, only root-level fields are supported in the Column Stats Index.
> There is no reason we cannot support nested fields as well, given that
> columnar file formats store nested fields as _nested columns_, i.e. as
> columns named after the field together with the struct it belongs to.
>
> For example, the following schema:
> {code:java}
> c1: StringType
> c2: StructType(Seq(StructField("foo", StringType))){code}
> would be stored in Parquet as "c1: string", "c2.foo: string". This means
> Parquet already collects statistics for all the nested fields, and we
> just need to make sure we propagate them into the Column Stats Index.
>
> Original GH issue:
> [https://github.com/apache/hudi/issues/5804#issuecomment-1152983029]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
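
As a rough illustration of the flattening described in the issue, the sketch below walks a nested schema and lists its leaf columns under dotted names, so that `c2: struct<foo: string>` yields `c2.foo`. It is not Hudi or Parquet code: the dict-based schema model and the `leaf_columns` helper are assumptions made purely for illustration.

```python
# Hypothetical schema model (not a Hudi or Spark API):
# a leaf field maps to a type-name string, a struct maps to a nested dict.
def leaf_columns(schema, prefix=""):
    """Yield (dotted_column_name, type_name) for every leaf field."""
    for name, field_type in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field_type, dict):
            # Struct: descend, carrying the parent path as the prefix.
            yield from leaf_columns(field_type, path)
        else:
            # Leaf: this is the column a columnar format would store.
            yield path, field_type


# The schema from the issue: c1 is a string, c2 is a struct with one field.
schema = {"c1": "string", "c2": {"foo": "string"}}
print(list(leaf_columns(schema)))
# [('c1', 'string'), ('c2.foo', 'string')]
```

Under this model, supporting nested fields in a column-level index amounts to keying the statistics by the dotted leaf name rather than by the root-level field name.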