Spark with Parquet
Hi All, I want to store a CSV text file in Parquet format in HDFS and then do some processing on it in Spark. My search for a way to do this came up empty; most of the help I found was for Parquet with Impala. Any guidance here? Thanks!
Re: Spark with Parquet
Spark uses the Hadoop InputFormat and OutputFormat classes, so you can simply create a JobConf to read the data and pass that to SparkContext.hadoopFile. There are some examples for Parquet usage here: http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/ and here: http://engineering.ooyala.com/blog/using-parquet-and-scrooge-spark.

Matei

On Apr 27, 2014, at 11:41 PM, Sai Prasanna wrote:
> Hi All,
>
> I want to store a csv-text file in Parquet format in HDFS and then do some
> processing in Spark.
>
> Somehow my search to find the way to do was futile. More help was available
> for parquet with impala.
>
> Any guidance here? Thanks !!
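For reference, the Hadoop InputFormat route described above could be sketched roughly as follows. This is only an illustration under several assumptions: it uses SparkContext.newAPIHadoopFile with a Job configuration rather than hadoopFile with a JobConf, because ParquetInputFormat implements the newer mapreduce API; it assumes the parquet-avro library is on the classpath (package names as in current Apache Parquet releases, which differ from the 2014 ones); and the object and method names are made up for the example.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.avro.AvroReadSupport
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the name is not from the thread.
object ParquetViaInputFormat {

  // Reads a Parquet file or directory through the Hadoop InputFormat API
  // and returns each record rendered as a string.
  def readAsStrings(sc: SparkContext, path: String): RDD[String] = {
    // Job.getInstance copies the configuration, so the settings below
    // do not leak into the SparkContext's shared Hadoop configuration.
    val job = Job.getInstance(sc.hadoopConfiguration)

    // Tell Parquet to materialize rows as Avro GenericRecord objects.
    ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[GenericRecord]])

    sc.newAPIHadoopFile(
      path,
      classOf[ParquetInputFormat[GenericRecord]],
      classOf[Void],          // Parquet's InputFormat has no meaningful key
      classOf[GenericRecord],
      job.getConfiguration
    ).map { case (_, record) =>
      // GenericRecord is not Serializable, so convert it to a plain value
      // before any shuffle or collect.
      record.toString
    }
  }
}
```

Calling ParquetViaInputFormat.readAsStrings(sc, "hdfs:///some/path") would then give an RDD[String] to process further; the path here is a placeholder.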
Re: Spark with Parquet
1. Create a Hive table x.
2. Load your CSV data into table x: LOAD DATA INPATH 'file/path' INTO TABLE x;
3. Create a Hive table y with the same structure as x, but add STORED AS PARQUET.
4. Run: INSERT OVERWRITE TABLE y SELECT * FROM x;

This gets you Parquet files under /user/hive/warehouse/y (as an example). You can use that path for your processing in Spark.
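Once Hive has written the Parquet files, picking them up from Spark might look like the sketch below. It assumes Spark 2.x or later (the SparkSession API) and the example warehouse path mentioned above; the object and method names are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the name is not from the thread.
object ReadHiveParquet {

  // Loads every Parquet file under the given directory into one DataFrame.
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read.parquet(path)

  def main(args: Array[String]): Unit = {
    // No master is set here, so this is meant to be launched with spark-submit.
    val spark = SparkSession.builder().appName("ReadHiveParquet").getOrCreate()

    // Example path from the message above; adjust to your warehouse location.
    val df = load(spark, "/user/hive/warehouse/y")

    df.printSchema()                 // column names and types come from the Parquet metadata
    println(s"rows: ${df.count()}")

    spark.stop()
  }
}
```

From there the DataFrame can be filtered, joined, or registered as a temporary view like any other.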
Re: Spark with Parquet
Something like this should work:

    // By default every column is read as a string. Set option("inferSchema", "true")
    // or provide a schema if the guessed schema is not accurate.
    val df = sparkSession.read.csv("myfile.csv")
    df.write.parquet("myfile.parquet")

Mohit Jaggi
Founder, Data Orchard LLC
www.dataorchardllc.com
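A slightly fuller sketch of the same conversion with an explicit schema, in case it helps. It assumes Spark 2.x or later and a CSV file with a header row; the two columns (id, name), the object name, and the convert helper are invented for the example and should be replaced with your own.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper object; the name is not from the thread.
object CsvToParquet {

  // Hypothetical schema: replace with the real columns of your CSV file.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  // Reads a CSV file with the schema above and writes it back out as Parquet.
  def convert(spark: SparkSession, csvPath: String, parquetPath: String): Unit = {
    val df = spark.read
      .option("header", "true") // assumes the first line holds column names
      .schema(schema)           // explicit schema, so no type guessing is needed
      .csv(csvPath)

    // Overwrite lets the job be re-run without failing on an existing output directory.
    df.write.mode(SaveMode.Overwrite).parquet(parquetPath)
  }

  def main(args: Array[String]): Unit = {
    // No master is set here, so this is meant to be launched with spark-submit.
    val spark = SparkSession.builder().appName("CsvToParquet").getOrCreate()
    convert(spark, args(0), args(1)) // e.g. HDFS input and output paths
    spark.stop()
  }
}
```

The Parquet output can be read back later with spark.read.parquet on the same path, and it keeps the column types declared in the schema.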