[ 
https://issues.apache.org/jira/browse/HUDI-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-8555.
-------------------------------------
    Resolution: Fixed

> Fix nested field col stats generation for log files 
> ----------------------------------------------------
>
>                 Key: HUDI-8555
>                 URL: https://issues.apache.org/jira/browse/HUDI-8555
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> Out of the box, we generate col stats only for top-level fields, but users 
> have the option to override the columns for which they need Hudi to 
> generate col stats.
>  
> When we tested with a nested field, we realized that we have a gap here. Hudi 
> generates col stats for base files properly even for nested fields, but 
> col stats are not generated for nested fields in log files. 
> [https://github.com/apache/hudi/blob/fa5878d9c46f5c824ae56a9ad56ef90b0bc37a19/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java#L443]
>  
> The linked code snippet honors only top-level fields. 
>  
> So, we have two fixes here. 
> Fix1: Let's avoid generating stats for nested fields even for base files. 
> Also throw an exception if someone explicitly sets a nested field with 
> "hoodie.metadata.index.column.stats.column.list". 
> Fix2: Follow-up to support nested field col stats generation. 
>  
> Fix1 is a blocker for the 1.0 release. Maybe we can punt Fix2 for later. 
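
The validation described in Fix1 could look roughly like the sketch below. It is illustrative only and is not the code that was merged for this issue: the class ColStatsColumnValidator and its methods are hypothetical names, and it assumes that a nested field is written as a dotted path (for example "address.city") in the comma-separated column list. Only the config key is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Hudi's API.
final class ColStatsColumnValidator {

  // Config key quoted in the issue description.
  static final String COLUMN_LIST_KEY =
      "hoodie.metadata.index.column.stats.column.list";

  // Splits the comma-separated config value into trimmed, non-empty names.
  static List<String> parseColumns(String configValue) {
    List<String> columns = new ArrayList<>();
    if (configValue == null) {
      return columns;
    }
    for (String raw : configValue.split(",")) {
      String name = raw.trim();
      if (!name.isEmpty()) {
        columns.add(name);
      }
    }
    return columns;
  }

  // Assumption: nested fields are expressed as dotted paths.
  static boolean isNested(String column) {
    return column.contains(".");
  }

  // Returns the parsed columns, or throws if any of them is a nested field.
  static List<String> validate(String configValue) {
    List<String> columns = parseColumns(configValue);
    for (String column : columns) {
      if (isNested(column)) {
        throw new IllegalArgumentException(
            "Nested field '" + column + "' is not supported in " + COLUMN_LIST_KEY);
      }
    }
    return columns;
  }
}
```

With this sketch, validate("id, ts") returns the two top-level column names, while validate("id,address.city") throws an IllegalArgumentException naming the nested column.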



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
