Is there a recommended way to get the files in each partition close to a predetermined size, say by byte size or by number of rows? Or should we use repartition and partitionBy when saving in Iceberg format?
For my test, I saved about 750 GB of data using the hourly partition spec, and the same data using the daily partition spec. Hourly partition: 2,404K files in total, average size 297 KB, 3.4K files per partition. Daily partition: 618K files in total, average size 1.2 MB, 20K files per partition.

I would like to go with the hourly partition spec to support our queries, but is there a way to reduce the number of files in a partition? I understand each file would be bigger. If we could increase the file size to even 1 GB each, that would reduce the S3 query costs, as they are based on the number of S3 requests.

Thanks,
Sandeep
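
P.S. A rough back-of-the-envelope check of the hourly numbers above, as a minimal sketch. It assumes decimal units (1 GB = 1000**3 bytes) and that the data is spread evenly across partitions, neither of which I have verified. The function and constant names are only for illustration. Under those assumptions each hourly partition holds roughly 1 GB, so a 1 GB target would mean only a couple of files per hour.

```python
import math

# Figures quoted in this email (hourly partition spec).
TOTAL_BYTES = 750 * 1000**3           # about 750 GB, decimal units assumed
HOURLY_FILES = 2_404_000              # 2,404K files in total
FILES_PER_HOURLY_PARTITION = 3_400    # 3.4K files per partition
TARGET_FILE_BYTES = 1000**3           # hoped-for file size of 1 GB


def estimate(total_bytes, total_files, files_per_partition, target_file_bytes):
    """Estimate partition count, bytes per partition, and files needed
    per partition at the target file size (assumes evenly sized partitions)."""
    partitions = total_files / files_per_partition
    bytes_per_partition = total_bytes / partitions
    files_needed = max(1, math.ceil(bytes_per_partition / target_file_bytes))
    return partitions, bytes_per_partition, files_needed


partitions, bytes_per_partition, files_needed = estimate(
    TOTAL_BYTES, HOURLY_FILES, FILES_PER_HOURLY_PARTITION, TARGET_FILE_BYTES
)
print(f"hourly partitions:       {partitions:.0f}")
print(f"GB per hourly partition: {bytes_per_partition / 1000**3:.2f}")
print(f"files per partition at target size: {files_needed}")
```

So the goal would be on the order of one or two files per hourly partition instead of 3.4K.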
