Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Georg Heiler
sing date partitions, instead? >> *From:* Rishi Shah >> *Date:* Thursday, May 30, 2019 at 10:43 PM >> *To:* "user @spark" >> *Subject:* [pyspark 2.3+] Bucketing with sort - incremental data load? >> Hi A

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Rishi Shah
odically running a compaction job. > If you’re simply appending daily snapshots, then you could just consider using date partitions, instead? > *From:* Rishi Shah > *Date:* Thursday, May 30, 2019 at 10:43 PM > *To:* "user @spark" > *Subje
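
For the date-partition suggestion quoted above, a minimal sketch of what a daily append might look like. The helper name `append_daily_snapshot`, the column name `snapshot_date`, and the output path are hypothetical, not from the thread; only `partitionBy`, `mode`, and `parquet` are standard `DataFrameWriter` calls.

```python
def append_daily_snapshot(df, path, date_col="snapshot_date"):
    """Append one day's DataFrame under `path`, partitioned by `date_col`.

    Assumption: `df` already carries a `date_col` column holding the load
    date. Each distinct date value lands in its own sub-directory, so earlier
    days are left untouched by later appends.
    """
    df.write.partitionBy(date_col).mode("append").parquet(path)


def demo():
    # Requires pyspark; imported lazily so the helper itself has no hard
    # dependency. The rows and the /tmp path are made up for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    day = spark.createDataFrame(
        [(1, "a", "2019-05-30"), (2, "b", "2019-05-30")],
        ["id", "value", "snapshot_date"],
    )
    append_daily_snapshot(day, "/tmp/daily_snapshots_sketch")
    spark.stop()
```

Queries that filter on the date column can then skip whole directories, which may be enough if the daily loads are never joined on a bucketing key.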

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Silvio Fiorito
yspark 2.3+] Bucketing with sort - incremental data load? Hi All, Can we use bucketing with sorting functionality to save data incrementally (say daily)? I understand bucketing is supported in Spark only with saveAsTable; however, can this be used with mode "append" instead of "over

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Gourav Sengupta
Hi Rishi, I think that if you are using sorting and then appending data locally, there will be no need to bucket data, and you are good with external tables that way. Regards, Gourav On Fri, May 31, 2019 at 3:43 AM Rishi Shah wrote: > Hi All, > > Can we use bucketing with sorting functionality to

[pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-30 Thread Rishi Shah
Hi All, Can we use bucketing with sorting functionality to save data incrementally (say daily)? I understand bucketing is supported in Spark only with saveAsTable; however, can this be used with mode "append" instead of "overwrite"? My understanding of bucketing was that you need to rewrite
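
To make the question concrete, here is a rough sketch of the write being asked about: `bucketBy` plus `sortBy` combined with `mode("append")` and `saveAsTable`. The helper name `append_bucketed`, the table name, the bucket count, and the sample rows are all hypothetical. It assumes every append repeats the same bucket count and columns as the existing table, and it has not been checked against every Spark version.

```python
def append_bucketed(df, table_name, num_buckets, key_col):
    """Append `df` to a bucketed, sorted table, creating it on the first call.

    Assumption: `num_buckets` and `key_col` match the bucketing of the
    existing table on every later call. Each append writes its own files per
    bucket, so rows are sorted within a file, not across a whole bucket.
    """
    (
        df.write
        .bucketBy(num_buckets, key_col)
        .sortBy(key_col)
        .mode("append")
        .saveAsTable(table_name)
    )


def demo():
    # Requires pyspark; imported lazily so the helper itself has no hard
    # dependency. Table name and rows are made up for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    day1 = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    day2 = spark.createDataFrame([(3, "c"), (1, "d")], ["id", "value"])
    append_bucketed(day1, "events_bucketed_sketch", 4, "id")
    append_bucketed(day2, "events_bucketed_sketch", 4, "id")
    spark.table("events_bucketed_sketch").show()
    spark.stop()
```

Because each daily append adds new files to every bucket, the file count grows over time, which is presumably why a periodic compaction job comes up in the replies.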