yihua commented on PR #11579: URL: https://github.com/apache/hudi/pull/11579#issuecomment-2251769299
> I really feel we should cut down on the number of columns we generate stats for out of the box. I have seen OSS users give col stats a try, and since it takes a lot of time to populate col stats when their schema is wide, they give up on col stats. They don't know why it's slow, just that the experience is not good, so they disable col stats and move on.

This PR makes the col_stats and partition_stats indexes behave consistently in terms of which columns they are generated for. Before this PR, if no value was specified in `hoodie.metadata.index.column.stats.column.list`, column stats were generated for all columns, while partition stats were not generated at all. We can reduce the number of columns for which column stats are generated by default, but that should be tackled in a separate PR.
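A minimal Python sketch of the column-selection rule described above, for illustration only. The config key is quoted from the comment; the helpers `resolve_indexed_columns` and `columns_per_index` are hypothetical and are not Hudi's actual code. The sketch also assumes that "consistent" means both indexes fall back to all schema columns when the list is unset, which is an inference rather than something stated in the comment.

```python
from typing import Dict, List

# Config key quoted from the comment; the logic below is a hypothetical model.
COLUMN_LIST_KEY = "hoodie.metadata.index.column.stats.column.list"


def resolve_indexed_columns(props: Dict[str, str], schema_cols: List[str]) -> List[str]:
    """Hypothetical helper: columns to index, given writer props and the table schema."""
    raw = props.get(COLUMN_LIST_KEY, "")
    # Parse the comma-separated list, dropping whitespace and empty entries.
    listed = [c.strip() for c in raw.split(",") if c.strip()]
    # Assumed fallback when the list is unset: every column in the schema.
    return listed if listed else list(schema_cols)


def columns_per_index(props: Dict[str, str], schema_cols: List[str]) -> Dict[str, List[str]]:
    """Hypothetical: both indexes share one resolution step, so they cannot diverge."""
    cols = resolve_indexed_columns(props, schema_cols)
    return {"column_stats": list(cols), "partition_stats": list(cols)}


if __name__ == "__main__":
    schema = ["id", "ts", "name", "payload"]
    # No list configured: both indexes cover all four columns.
    print(columns_per_index({}, schema))
    # Explicit list: both indexes cover only "id" and "ts".
    print(columns_per_index({COLUMN_LIST_KEY: "id, ts"}, schema))
```

With a wide schema and no list set, this fallback indexes every column, which is the cost the quoted comment is concerned about; narrowing that default is the separate change proposed above.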