Re: SparkSQL: Reading data from hdfs and storing into multiple paths

2015-10-02 Thread Michael Armbrust
Once you convert your data to a DataFrame (look at spark-csv), try df.write.partitionBy("", "mm").save("...").

On Thu, Oct 1, 2015 at 4:11 PM, haridass saisriram <haridass.saisri...@gmail.com> wrote:
> Hi,
>
> I am trying to find a simple example to read a data file on HDFS. The
> file
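The suggestion in the reply can be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not the exact code from the thread: the column name `year` is a hypothetical stand-in (only `mm` is legible in the message), the helper names `partition_dir` and `write_partitioned` are made up for illustration, and the built-in CSV reader assumes Spark 2.x or later, whereas at the time of the thread CSV input went through the spark-csv package.

```python
import posixpath

# "mm" comes from the thread; "year" is a hypothetical name for the
# year column, which is not legible in the archived message.
PARTITION_COLS = ["year", "mm"]


def partition_dir(base, row, cols=PARTITION_COLS):
    """Build the Hive-style directory (col=value/col=value) for one row.

    This mirrors the layout that DataFrameWriter.partitionBy produces
    under the output path; it does not touch Spark or HDFS itself.
    """
    parts = ["{}={}".format(col, row[col]) for col in cols]
    return posixpath.join(base, *parts)


def write_partitioned(spark, src, dst, cols=PARTITION_COLS):
    """Read a CSV file with a header row and write it partitioned by `cols`.

    `spark` is an existing SparkSession; `src` and `dst` are placeholder
    paths supplied by the caller (for example HDFS URIs).
    """
    # Without schema inference every column is read as a string,
    # so a month such as "09" keeps its leading zero.
    df = spark.read.csv(src, header=True)
    # One sub-directory is created per distinct (year, mm) combination.
    df.write.partitionBy(*cols).csv(dst)
```

Note that the resulting directories are named after the column and its value (for example `mm=09`), not after the bare value alone, which is what lets Spark recover the partition columns when the data is read back.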

SparkSQL: Reading data from hdfs and storing into multiple paths

2015-10-01 Thread haridass saisriram
Hi,

I am trying to find a simple example to read a data file on HDFS. The file has the following format:

a , b , c ,,mm
a1,b1,c1,2015,09
a2,b2,c2,2014,08

I would like to read this file and store it in HDFS partitioned by year and month. Something like this:

/path/to/hdfs//mm

I want to