yihua commented on PR #11579:
URL: https://github.com/apache/hudi/pull/11579#issuecomment-2251769299

   > I really feel we should cut down on the number of columns we generate stats 
for out of the box. I have seen OSS users give col stats a try, and because it 
takes a long time to populate col stats when the schema is wide, they give up 
on it. They don't know why it's slow, just that the experience is not good, so 
they disable col stats and move on.
   
   This PR makes the col_stats and partition_stats indexes consistent in which 
columns they index. Before this PR, if no value is specified in 
`hoodie.metadata.index.column.stats.column.list`, column stats are generated 
for all columns, while partition stats are not generated at all.
   
   We can cut down the number of columns for which column stats are generated 
by default. That should be tackled in a separate PR.
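
   Until the default changes, the column list can be narrowed explicitly. The 
snippet below is a minimal sketch of that, not part of this PR: the helper 
`column_stats_options` and the column names are hypothetical, and it assumes 
the list is passed as a comma-separated string alongside the usual metadata 
and column stats enable flags.

```python
# Hypothetical column names; replace with the columns your queries filter on.
FILTER_COLUMNS = ["event_ts", "customer_id", "region"]


def column_stats_options(columns):
    """Build Hudi write options that limit column stats to `columns`.

    Illustrative helper only; it is not part of Hudi.
    """
    if not columns:
        # Leaving the list unset falls back to the default column selection,
        # which is exactly what this sketch is trying to avoid.
        raise ValueError("pass at least one column name")
    return {
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": "true",
        # Config key named in the comment above; value is comma-separated.
        "hoodie.metadata.index.column.stats.column.list": ",".join(columns),
    }


hudi_options = column_stats_options(FILTER_COLUMNS)
# These options would typically be handed to a Spark writer, for example:
# df.write.format("hudi").options(**hudi_options)...
```

   Keeping the list to the columns that actually appear in query predicates 
should limit the indexing cost on wide schemas.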
