Re: Static partitioning in partitionBy()
On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia wrote:

Hi All,

Is there a way I can provide static partitions in partitionBy()?

Like:

    df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save

The code above fails with the following error, because Spark looks for a column named `c=c1` in df:

    org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found in schema struct;

Thanks,
Shubham

----

From: Burak Yavuz
Sent: Tuesday, May 7, 2019 9:35:10 AM
To: Shubham Chaurasia
Cc: dev; u...@spark.apache.org
Subject: Re: Static partitioning in partitionBy()

It depends on the data source. Delta Lake (https://delta.io) allows you to do it with .option("replaceWhere", "c = c1"). With other file formats, you can write directly into the partition directory (tablePath/c=c1), but you lose atomicity.

----

On Wed, May 8, 2019 at 10:36 AM Felix Cheung wrote:

You could:

    df.filter(col("c") === "c1").write.partitionBy("c").save

It could hit some data-skew problems, but it might work for you.

----

Shubham Chaurasia:

Thanks.
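Felix's filter-then-partitionBy workaround can be sketched in plain Python (not Spark) to show what it does logically: keep only the rows where the partition column has the static value, then bucket the surviving rows by that column so each bucket corresponds to one `c=<value>` output directory. The `filter_and_partition` helper and the sample rows below are hypothetical illustrations, not part of any Spark API.

```python
from collections import defaultdict

def filter_and_partition(rows, column, value):
    """Keep rows where rows[column] == value, then bucket them by
    partition-directory name, mimicking filter(...).partitionBy(...)."""
    kept = [r for r in rows if r[column] == value]
    buckets = defaultdict(list)
    for row in kept:
        buckets[f"{column}={row[column]}"].append(row)
    return dict(buckets)

rows = [
    {"c": "c1", "x": 1},
    {"c": "c2", "x": 2},
    {"c": "c1", "x": 3},
]
print(filter_and_partition(rows, "c", "c1"))
# {'c=c1': [{'c': 'c1', 'x': 1}, {'c': 'c1', 'x': 3}]}
```

Because the filter runs first, only the single static partition is written, which is what makes this a stand-in for true static partitioning; the skew risk Felix mentions comes from all matching rows sharing one partition value.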
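Burak's suggestion of writing directly into `tablePath/c=c1` relies on the Hive-style partition-path convention, where each static partition value becomes a `column=value` directory under the table root. A minimal stdlib sketch of that convention (the helper names here are hypothetical, not a Spark or Hive API):

```python
from pathlib import PurePosixPath

def partition_dir(table_path, column, value):
    """Build the directory a static partition's files would land in,
    following the Hive-style tablePath/column=value layout."""
    return str(PurePosixPath(table_path) / f"{column}={value}")

def parse_partition(dir_name):
    """Recover (column, value) from a partition directory name."""
    column, _, value = dir_name.partition("=")
    return column, value

print(partition_dir("/data/my_table", "c", "c1"))  # /data/my_table/c=c1
print(parse_partition("c=c1"))                     # ('c', 'c1')
```

This is also why `partitionBy("c=c1")` fails in the original question: `partitionBy` expects a column name, and the `=value` part is something the writer derives from the data, not something you pass in.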