zhihuihong commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-927513816
Have you tried using clustering after inserting the data? My job also created many 7 MB files, and I used clustering to reorganize the data layout. I don't know how to change the 7 MB setting either (maybe it is some ratio of hoodie.parquet.max.file.size?), but clustering works. Below are some posts from the Hudi website for your reference:
https://hudi.apache.org/blog/2021/01/27/hudi-clustering-intro
https://hudi.apache.org/blog/2021/08/23/async-clustering
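As a rough illustration, the sketch below builds the writer options that turn on inline clustering. It assumes the option keys described in the clustering intro post (`hoodie.clustering.inline`, `hoodie.clustering.inline.max.commits`, `hoodie.clustering.plan.strategy.small.file.limit`, `hoodie.clustering.plan.strategy.target.file.max.bytes`). The helper `clustering_write_options` is hypothetical, and the byte values are illustrative rather than recommendations. Please check the key names against the Hudi version you are running.

```python
def clustering_write_options(table_name: str,
                             target_file_bytes: int = 1024 * 1024 * 1024,
                             small_file_limit_bytes: int = 600 * 1024 * 1024,
                             max_commits: int = 4) -> dict:
    """Hypothetical helper: Hudi writer options that enable inline clustering.

    Defaults are illustrative only (1 GB target files, 600 MB small-file limit).
    Record key, precombine and partition options are intentionally omitted.
    """
    return {
        "hoodie.table.name": table_name,
        # Run clustering as part of the write, once every `max_commits` commits.
        "hoodie.clustering.inline": "true",
        "hoodie.clustering.inline.max.commits": str(max_commits),
        # Files smaller than this are candidates to be rewritten by clustering.
        "hoodie.clustering.plan.strategy.small.file.limit": str(small_file_limit_bytes),
        # Upper bound on the size of the files that clustering produces.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(target_file_bytes),
    }


if __name__ == "__main__":
    for key, value in clustering_write_options("my_table").items():
        print(f"{key}={value}")
```

With PySpark, these would be merged with your existing record key, precombine and partition options and passed to `df.write.format("hudi").options(**opts)`. The async clustering post linked above describes running the same rewrite in the background instead of inline.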