Re: Spark-csv- partitionBy

2016-05-10 Thread Xinh Huynh
Hi Pradeep, Here is a way to partition your data into different files, by calling repartition() on the dataframe: df.repartition(12, $"Month").write.format(...) This assumes you want to partition by a "Month" column with 12 distinct values. Each partition will be stored in a separate output file.
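A minimal sketch of this approach, using the Spark 1.6-era API with the external spark-csv package (the input path, output path, and "Month" column are assumptions for illustration). One caveat: repartition hash-partitions the rows, so two month values can collide into the same partition; this yields separate files, but not a strict one-value-per-file guarantee.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("repartition-by-month"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical input: a CSV file with a "Month" column
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("input/data.csv")

    // Hash-partition the rows by Month into 12 partitions; each partition
    // is written out as its own part file under the output directory.
    df.repartition(12, $"Month")
      .write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("output/by-month")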

Re: Spark-csv- partitionBy

2016-05-10 Thread Mail.com
Hi, I don't want to reduce the number of partitions; I need the files to be written out according to the column value. I am trying to understand how reducing the partition count would make this work. Regards, Pradeep

Re: Spark-csv- partitionBy

2016-05-09 Thread Gourav Sengupta
Hi, it's supported; try using coalesce(1) (I may have the spelling wrong) and after that do the partitioning. Regards, Gourav
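For what this suggestion might look like in code: the external spark-csv package does not honor DataFrameWriter.partitionBy (the subject of this thread), but the built-in CSV source that shipped with Spark 2.0 does, so the coalesce-then-partition idea can be sketched as below (the "Month" column and paths are assumptions, not from the thread).

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("coalesce-then-partition").getOrCreate()

    // Hypothetical input with a "Month" column
    val df = spark.read.option("header", "true").csv("input/data.csv")

    // coalesce(1) forces all data through a single task, so this only suits
    // small outputs; partitionBy then creates one Month=<value>/ sub-directory
    // per distinct value of the column.
    df.coalesce(1)
      .write
      .partitionBy("Month")
      .option("sep", "\t")    // tab-delimited output
      .csv("output/by-month")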

Spark-csv- partitionBy

2016-05-09 Thread Mail.com
Hi, I have to write a tab-delimited file and need one directory for each unique value of a column. I tried using spark-csv with partitionBy, and it seems it is not supported. Is there any other option available for doing this? Regards, Pradeep
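For reference, a sketch of the kind of call being attempted here; combining partitionBy with the external spark-csv source is what the thread reports as unsupported (the column name and paths are illustrative, and df is assumed to be loaded as in the earlier sketch).

    df.write
      .format("com.databricks.spark.csv")
      .option("delimiter", "\t")   // tab-delimited, as required
      .partitionBy("someColumn")   // hypothetical column; not supported by spark-csv
      .save("output/partitioned")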

spark-csv partitionBy

2016-02-09 Thread Srikanth
Hello, I want to save a Spark job's result as LZO-compressed CSV files partitioned by one or more columns. Given that partitionBy is not supported by spark-csv, is there any recommendation for achieving this in user code? One quick option is to i) cache the result dataframe, ii) get the unique values of the partition column(s), and iii) filter on each value and write that subset to its own directory.
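A sketch of that workaround, under the assumptions that the partition column is "Month", the paths are illustrative, and the hadoop-lzo codec is on the classpath (LZO is not bundled with Spark).

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("manual-partition-by"))
    val sqlContext = new SQLContext(sc)

    val result = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("input/data.csv")    // hypothetical input

    result.cache()               // i) cache so each per-value scan is cheap

    // ii) collect the distinct values of the partition column to the driver
    val months = result.select("Month").distinct().collect().map(_.getString(0))

    // iii) filter the cached dataframe on each value and write that subset
    //      to its own directory, LZO-compressed
    for (m <- months) {
      result.filter(result("Month") === m)
        .write
        .format("com.databricks.spark.csv")
        .option("codec", "com.hadoop.compression.lzo.LzopCodec") // needs hadoop-lzo
        .save(s"output/Month=$m") // Hive-style directory per value
    }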