I am trying to convert some json logs to Parquet and save them on S3.
In principle this is just
import org.apache.spark._
val sqlContext = new sql.SQLContext(sc)
val data = sqlContext.jsonFile("s3n://source/path/*/*", 10e-8)
data.registerTempTable("data")
data.saveAsParquetFile("s3n://target/path")
This
Daniel,
Currently, adding Tachyon will at least help on the input side in this case.
Haoyuan
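A minimal sketch of what this suggestion could look like, assuming Spark 1.x with a Tachyon-backed off-heap store already configured for the cluster (the paths mirror the placeholders from the original question; whether `OFF_HEAP` caching pays off depends on how many passes are made over the input):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

val sqlContext = new SQLContext(sc)

// Read the JSON once, then cache the parsed rows off-heap.
// In Spark 1.x, StorageLevel.OFF_HEAP stores blocks in Tachyon
// when the cluster is configured with a Tachyon master, so
// repeated scans avoid re-reading and re-parsing from S3.
val data = sqlContext.jsonFile("s3n://source/path/*/*")
data.persist(StorageLevel.OFF_HEAP)

data.registerTempTable("data")
data.saveAsParquetFile("s3n://target/path")
```

Note that this only speeds up the input/scan side, as Haoyuan says; the Parquet write to S3 is unaffected.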
On Fri, Oct 24, 2014 at 2:01 PM, Daniel Mahler dmah...@gmail.com wrote: