Re: Static partitioning in partitionBy()

2019-05-08 Thread Gourav Sengupta
Hi Burak,
Hurray, so you finally made Delta open source :)
I have always wanted to ask TD: is there any chance we could get the
streaming graphs back in the Spark UI? That would just be wonderful.

Hi Shubham,
there is always an easy way and a super fancy way to solve a problem;
filtering the data before persisting is the simple way. Similarly, a simple
way to handle data skew is to use Spark's monotonically_increasing_id
function together with the modulus operator. As for the fancy way, I am
sure someone in the world is working on it for mere mortals like me :)
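
For illustration, a minimal sketch of that salting idea, assuming a
hypothetical bucket count of 16 and an example output path (neither comes
from the thread):

import org.apache.spark.sql.functions.{lit, monotonically_increasing_id, pmod}

// spread each partition's rows across 16 salt buckets to even out skew
val salted = df.withColumn("bucket",
  pmod(monotonically_increasing_id(), lit(16)))
salted.write.mode("overwrite").partitionBy("c", "bucket").save("/tmp/out")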


Regards,
Gourav Sengupta







Re: Static partitioning in partitionBy()

2019-05-08 Thread Shubham Chaurasia
Thanks



Re: Static partitioning in partitionBy()

2019-05-07 Thread Felix Cheung
You could do

df.filter(col("c") === "c1").write.partitionBy("c").save

It could run into some data skew problems, but it might work for you.






Re: Static partitioning in partitionBy()

2019-05-07 Thread Burak Yavuz
It depends on the data source. Delta Lake (https://delta.io) allows you to
do it with the .option("replaceWhere", "c = c1"). With other file formats,
you can write directly into the partition directory (tablePath/c=c1), but
you lose atomicity.
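
To make both options concrete, a hedged sketch (the table path, the parquet
format for the non-Delta case, and the string-typed partition column are
assumptions, not details from the thread):

import org.apache.spark.sql.functions.col

// Delta Lake: atomically replace only the c=c1 partition
df.filter(col("c") === "c1")
  .write.format("delta").mode("overwrite")
  .option("replaceWhere", "c = 'c1'")
  .save("/data/table")

// other formats: write straight into the partition directory, losing atomicity
df.filter(col("c") === "c1")
  .drop("c")  // the partition value lives in the directory name, not the files
  .write.mode("overwrite")
  .parquet("/data/table/c=c1")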



Static partitioning in partitionBy()

2019-05-07 Thread Shubham Chaurasia
Hi All,

Is there a way I can provide static partitions in partitionBy()?

Like:
df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save

The above code gives the following error, as it tries to find a column
named `c=c1` in df.

org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found
in schema struct;

Thanks,
Shubham
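

partitionBy expects bare column names, so "c=c1" above is parsed as a
literal (and nonexistent) column name, which is what the AnalysisException
is complaining about. A minimal sketch of the filter-first workaround
suggested in the replies above (only the trailing save call is assumed):

import org.apache.spark.sql.functions.col

// write just the c=c1 slice; partitionBy("c") creates the .../c=c1/ directory
// note: overwrite semantics here depend on the data source
df.filter(col("c") === "c1")
  .write.mode("overwrite")
  .format("MyDataSource")
  .partitionBy("c")
  .save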