Re: Static partitioning in partitionBy()
Hi Burak,

Hurray, so you finally made Delta open source :) I always meant to ask TD: is there any chance we could get the streaming graphs back in the Spark UI? That would just be wonderful.

Hi Shubham,

There is always an easier way and a super fancy way to solve a problem, and filtering the data before persisting it is the simple way. Similarly, a simple way to handle data skew is to use Spark's monotonically_increasing_id function together with the modulus operator. As for the fancy way, I am sure someone in the world is working on it for mere mortals like me :)

Regards,
Gourav Sengupta

On Wed, May 8, 2019 at 1:41 PM Shubham Chaurasia wrote:
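The salting idea Gourav mentions can be sketched without Spark at all. This is a plain-Python illustration (not Spark API): each row gets a monotonically increasing id, the id modulo N becomes a salt, and partitioning on (key, salt) spreads one hot key over N buckets. The row data and N=4 are made up for the demo.

```python
# Plain-Python sketch of salting with a monotonically increasing id and
# the modulus operator. In Spark this would be
# monotonically_increasing_id() % N; here enumerate() stands in for it.
from collections import Counter

N = 4  # number of salt buckets (assumption for the demo)
rows = [{"c": "c1"}] * 10 + [{"c": "c2"}] * 2  # heavily skewed toward c1

salted = [
    (row["c"], i % N)  # (key, salt) replaces the bare key
    for i, row in enumerate(rows)
]

buckets = Counter(salted)
# The 10 "c1" rows now land in 4 distinct (key, salt) buckets instead of 1.
print(sorted(buckets.items()))
```

After salting, any downstream grouping or repartitioning on the composite key distributes the hot key's rows across workers; a second, cheap aggregation over the original key recombines the partial results.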
Re: Static partitioning in partitionBy()
Thanks

On Wed, May 8, 2019 at 10:36 AM Felix Cheung wrote:
Re: Static partitioning in partitionBy()
You could:

df.filter(col("c") === "c1").write.partitionBy("c").save

It could run into some data skew problems, but it might work for you.

From: Burak Yavuz
Sent: Tuesday, May 7, 2019 9:35:10 AM
To: Shubham Chaurasia
Cc: dev; user@spark.apache.org
Subject: Re: Static partitioning in partitionBy()
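The filter-then-partitionBy approach Felix suggests can be mimicked in plain Python (no Spark required) to show what ends up on disk: the rows are filtered to the single partition value first, then written under a Hive-style `c=c1` directory with the partition column encoded in the path rather than in the files. The file layout and names below are illustrative, not Spark's exact output.

```python
# Plain-Python sketch of filter + partitionBy("c") for a single static
# partition value. Spark would produce part files under table/c=c1/ in
# a similar shape; this demo writes newline-delimited JSON instead.
import json
import os
import tempfile

rows = [{"c": "c1", "v": 1}, {"c": "c2", "v": 2}, {"c": "c1", "v": 3}]

table = tempfile.mkdtemp()
part_rows = [r for r in rows if r["c"] == "c1"]  # df.filter(col("c") === "c1")
part_dir = os.path.join(table, "c=c1")           # partitionBy("c") layout
os.makedirs(part_dir, exist_ok=True)
with open(os.path.join(part_dir, "part-00000.json"), "w") as f:
    for r in part_rows:
        # the partition column lives in the path, not in the file contents
        f.write(json.dumps({k: v for k, v in r.items() if k != "c"}) + "\n")

print(sorted(os.listdir(table)))  # ['c=c1']
```

Because only the filtered partition's directory is touched, this gives the effect of writing one static partition; the skew caveat is that all rows for that single value go through whatever parallelism the filtered DataFrame has.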
Re: Static partitioning in partitionBy()
It depends on the data source. Delta Lake (https://delta.io) allows you to do it with .option("replaceWhere", "c = c1"). With other file formats, you can write directly into the partition directory (tablePath/c=c1), but you lose atomicity.

On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia wrote:
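The `tablePath/c=c1` layout Burak refers to is the Hive-style partition-path convention. A tiny helper makes the construction explicit; note that the URL-style escaping here is only an approximation of Spark/Hive's own partition-value escaping rules, so treat this as illustrative.

```python
# Illustrative builder for a Hive-style partition directory path,
# e.g. /data/events + ("c", "c1") -> /data/events/c=c1.
# Escaping via urllib quoting is an approximation of Spark/Hive rules.
from urllib.parse import quote


def partition_path(table_path: str, col: str, value: str) -> str:
    return f"{table_path}/{col}={quote(value, safe='')}"


print(partition_path("/data/events", "c", "c1"))        # /data/events/c=c1
print(partition_path("/data/events", "dt", "2019/05"))  # slash gets escaped
```

Writing directly into such a directory bypasses the committer protocol, which is why the write is not atomic: a reader can observe a half-written partition, and a failed job leaves partial files behind. Delta's replaceWhere avoids this by committing the replacement transactionally.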
Static partitioning in partitionBy()
Hi All,

Is there a way I can provide static partitions in partitionBy()?

Like:

df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save

The above code gives the following error, as it tries to find a column `c=c1` in df:

org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found in schema struct;

Thanks,
Shubham