Spark does allow appending new files to bucketed tables. When the data is read back, Spark combines the multiple files belonging to the same bucket into the same partition.
Having said that, you need to be careful with bucketing, especially when appending, to avoid generating lots of small files, so you may need to run a periodic compaction job. If you are simply appending daily snapshots, you could consider using date partitions instead.

From: Rishi Shah <rishishah.s...@gmail.com>
Date: Thursday, May 30, 2019 at 10:43 PM
To: "user @spark" <user@spark.apache.org>
Subject: [pyspark 2.3+] Bucketing with sort - incremental data load?

Hi All,

Can we use bucketing with sorting functionality to save data incrementally (say daily)? I understand bucketing is supported in Spark only with saveAsTable; however, can this be used with mode "append" instead of "overwrite"? My understanding was that bucketing requires rewriting the entire table every time. Can someone advise?

--
Regards,
Rishi Shah