Is there a recommended way to get the files in each partition close to a
predetermined size, say by byte size or by number of rows? Or should we
use repartition and partitionBy when saving in Iceberg format?

For my test, I saved about 750 GB of data using the hourly partition spec
and the same data using the daily partition spec.

Hourly partition: 2404K files in total, avg size 297 KB, 3.4K files per
partition.
Daily partition: 618K files in total, avg size 1.2 MB, 20K files per
partition.

I would like to go with the hourly partition spec to support our queries,
but is there a way to reduce the number of files in a partition? I
understand each file would then be larger. If we could increase the file
size to even 1 GB each, that would reduce S3 query costs, since they are
based on the number of S3 requests.
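
As a rough sanity check on what a 1 GB target would mean, here is a
back-of-the-envelope sketch in Python. It only uses the rounded figures
above; the function name and the choice of decimal gigabytes are my own
assumptions, so treat the results as estimates rather than measurements.

```python
import math

# About 750 GB written in the test (decimal GB assumed for simplicity).
TOTAL_BYTES = 750 * 1000**3
# Hypothetical target file size of 1 GB.
TARGET_FILE_BYTES = 1000**3


def files_for_target(total_files, files_per_partition, target_file_bytes):
    """Estimate partition count, bytes per partition, and files per
    partition if every file were close to target_file_bytes."""
    partitions = total_files / files_per_partition
    bytes_per_partition = TOTAL_BYTES / partitions
    files_needed = math.ceil(bytes_per_partition / target_file_bytes)
    return partitions, bytes_per_partition, files_needed


# Hourly spec: 2404K files in total, 3.4K files per partition.
hourly = files_for_target(2_404_000, 3_400, TARGET_FILE_BYTES)
# Daily spec: 618K files in total, 20K files per partition.
daily = files_for_target(618_000, 20_000, TARGET_FILE_BYTES)

for name, (parts, size, files) in (("hourly", hourly), ("daily", daily)):
    print(f"{name}: ~{parts:.0f} partitions, "
          f"~{size / 1000**3:.2f} GB each, "
          f"{files} file(s) per partition at the target size")
```

By this estimate there are roughly 707 hourly partitions holding about
1 GB each, so a 1 GB target would leave only one or two files per hour
instead of about 3.4K.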

thanks
Sandeep
