This is no longer accurate, since we now do have a "fan-out" writer for Spark. But the original idea here is that it is far more efficient to open a single file handle at a time and write to it than to open a new file handle every time the same Spark task encounters a new partition. The fanout writer instead just opens a new handle whenever it sees a new partition and keeps writing through all of them.
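
For anyone who wants to try it, a minimal sketch of enabling the fanout writer on a single write (the table and DataFrame names are just placeholders):

    // Sketch: append to a partitioned Iceberg table with the fanout writer
    // enabled, so each task keeps one open file handle per partition it
    // encounters instead of requiring a local sort on the partition columns.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("fanout-example").getOrCreate()
    val df = spark.table("db.staging_events")   // placeholder source

    df.writeTo("db.events")                     // placeholder Iceberg table
      .option("fanout-enabled", "true")         // Iceberg's Spark write option
      .append()
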
Now, that said, the local sort is only required for the default writer. For the best performance, i.e. making as few files as possible, setting the write distribution mode to "hash" forces a real shuffle but eliminates this issue by making sure each Spark task writes to a single partition (or a single set of partitions) in order. We need to update this document to talk about distribution modes, especially since "hash" will become the default soon and the current text is really only useful for manual tuning. If your data is already organized the way you want, setting the distribution mode to "none" avoids that shuffle. If you don't care about multiple file handles being open at the same time, you can set the fanout writer option. With "none" plus the fanout writer you will basically write in the fastest way possible, at the expense of memory at write time and possibly generating many files if your data isn't organized.
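
For concreteness, here is roughly how you'd pick a mode today (table and DataFrame names are placeholders; the per-write option mirrors the table property):

    // Sketch: set the distribution mode as a table property.
    //   "hash"  - shuffle so each task owns its partition(s); fewest files
    //   "none"  - no shuffle; relies on the data already being organized
    //   "range" - range-distribute by the table's sort order
    spark.sql("""
      ALTER TABLE db.events
      SET TBLPROPERTIES ('write.distribution-mode' = 'hash')
    """)

    // Or override it for a single write with the corresponding write option:
    df.writeTo("db.events")
      .option("distribution-mode", "none")  // skip the shuffle for pre-organized data
      .append()
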
On Tue, Mar 7, 2023 at 9:46 PM Manu Zhang <[email protected]> wrote:

> Hi all,
>
> As per
> https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-partitioned-tables,
> sort is required for Spark writing to a partitioned table. Does anyone know
> the reason behind it? If this is to avoid creating too many small files,
> isn't shuffle/repartition sufficient?
>
> Thanks,
> Manu