Hi Pradeep,
Here is a way to partition your data into different files by calling
repartition() on the dataframe:
df.repartition(12, $"Month")
  .write
  .format(...)
This assumes you want to partition by a "Month" column that has 12
distinct values. Each partition will be stored in its own output file.
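A more complete sketch of the same idea, for Spark 1.x with the spark-csv
package (the paths, the input format, and the tab delimiter are my
assumptions, and sc is the SparkContext from the shell):

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._  // enables the $"Month" column syntax

  // Placeholder input; substitute however you build your dataframe.
  val df = sqlContext.read.parquet("/path/to/input")

  // Hash-partition rows by Month into 12 partitions, so all rows sharing a
  // Month value land in the same part file, then write tab-delimited CSV.
  df.repartition(12, $"Month")
    .write
    .format("com.databricks.spark.csv")  // spark-csv package
    .option("delimiter", "\t")
    .save("/path/to/output")

Note this gives one part file per partition under a single output directory,
not one directory per column value.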
Hi,
I don't want to reduce the number of partitions. The job should write files
depending upon the column value.
I am trying to understand how reducing the partition count will make it work.
Regards,
Pradeep
> On May 9, 2016, at 6:42 PM, Gourav Sengupta wrote:
>
> Hi,
>
> It's supported; try to
Hi,
It's supported; try to use coalesce(1) (the spelling may be wrong) and after
that do the partitioning.
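Roughly, with a writer that supports partitionBy (the built-in CSV writer
does from Spark 2.0 onwards; the "month" column name and the output path
below are placeholders), the combination would look like:

  // Shrink to a single partition first, then let partitionBy create one
  // subdirectory per distinct value of the "month" column.
  df.coalesce(1)
    .write
    .partitionBy("month")
    .option("sep", "\t")  // tab-delimited output
    .csv("/path/to/output")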
Regards,
Gourav
On Mon, May 9, 2016 at 7:12 PM, Mail.com wrote:
> Hi,
>
> I have to write a tab-delimited file and need one directory for each
> unique value of a
Hi,
I have to write a tab-delimited file and need one directory for each
unique value of a column.
I tried using spark-csv with partitionBy, and it seems it is not supported.
Is there any other option available for doing this?
Regards,
Pradeep
Hello,
I want to save a Spark job's result as LZO-compressed CSV files partitioned
by one or more columns.
Given that partitionBy is not supported by spark-csv, is there any
recommendation for achieving this in user code?
One quick option is to
i) cache the result dataframe,
ii) get the unique values of the partition column.
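A sketch of that workaround, assuming the remaining step is to filter the
cached dataframe on each distinct value and write each subset to its own
directory (the writePerValue helper, column name, paths, and the Spark 2.x
CSV writer are my assumptions; LZO compression is omitted here because it
needs the hadoop-lzo codec configured on the cluster):

  import org.apache.spark.sql.DataFrame

  // Manual stand-in for partitionBy: one output directory per distinct
  // value of the chosen column. Names and paths are placeholders.
  def writePerValue(df: DataFrame, colName: String, baseDir: String): Unit = {
    df.cache()  // i) cache so the repeated filters don't recompute the source

    // ii) collect the distinct values of the partition column to the driver
    val values = df.select(colName).distinct().collect().map(_.get(0))

    // iii) filter the rows for each value and write them out separately
    for (v <- values) {
      df.filter(df(colName) === v)
        .write
        .option("sep", "\t")  // tab-delimited output
        .csv(s"$baseDir/$colName=$v")
    }

    df.unpersist()
  }

One caveat: this runs one Spark job per distinct value, so it only makes
sense for a modest number of values.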