Fwd: Saving very large data sets as Parquet on S3

2014-10-24 Thread Daniel Mahler
I am trying to convert some JSON logs to Parquet and save them on S3. In principle this is just:

    import org.apache.spark._
    val sqlContext = new sql.SQLContext(sc)
    val data = sqlContext.jsonFile("s3n://source/path/*/*", 10e-8)
    data.registerAsTable("data")
    data.saveAsParquetFile("s3n://target/path")

This ...
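For reference, a self-contained version of the snippet above, sketched for the Spark 1.1-era API the message uses (the paths and the 10e-8 schema-sampling ratio come from the message; the application scaffolding around them is an assumption):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JsonToParquet {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("JsonToParquet"))
        val sqlContext = new SQLContext(sc)

        // Sample only a tiny fraction of the input to infer the schema;
        // a full inference pass over very large logs would be expensive.
        val data = sqlContext.jsonFile("s3n://source/path/*/*", 10e-8)
        data.registerAsTable("data")

        // Write the converted records back to S3 as Parquet.
        data.saveAsParquetFile("s3n://target/path")

        sc.stop()
      }
    }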

Re: Saving very large data sets as Parquet on S3

2014-10-24 Thread Haoyuan Li
Daniel,

Currently, having Tachyon will at least help on the input part in this case.

Haoyuan

On Fri, Oct 24, 2014 at 2:01 PM, Daniel Mahler dmah...@gmail.com wrote:
> I am trying to convert some JSON logs to Parquet and save them on S3.
> In principle this is just import org.apache.spark._ ...
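A sketch of what that suggestion could look like, assuming a Tachyon master at tachyon://master:19998, the logs already loaded into Tachyon under a hypothetical /logs path, and the Tachyon client on the classpath so that tachyon:// URIs resolve through the Hadoop filesystem API:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read the JSON logs through Tachyon instead of fetching them
    // from S3 on every pass over the data.
    val data = sqlContext.jsonFile("tachyon://master:19998/logs/*/*", 10e-8)
    data.registerAsTable("data")

    // The Parquet output still goes straight to S3.
    data.saveAsParquetFile("s3n://target/path")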

Saving very large data sets as Parquet on S3

2014-10-20 Thread Daniel Mahler
I am trying to convert some JSON logs to Parquet and save them on S3. In principle this is just:

    import org.apache.spark._
    val sqlContext = new sql.SQLContext(sc)
    val data = sqlContext.jsonFile("s3n://source/path/*/*", 10e-8)
    data.registerAsTable("data")
    data.saveAsParquetFile("s3n://target/path")

This ...