A couple of questions:
1. "sqlContext.jsonFile" reads a JSON file, infers the schema of the data
stored in it, and returns a SchemaRDD. I could also create a SchemaRDD by
reading the file as text (which returns RDD[String]) and then calling the
"jsonRDD" method. My question: is the "jsonFile" way of creating a SchemaRDD
slower than the second method, perhaps because jsonFile needs to infer the
schema while jsonRDD, given a schema, just applies it to the dataset?

 The workflow I am thinking of is:
   (a) For the first data set, use "jsonFile" and infer the schema.
   (b) Save the schema somewhere.
   (c) For later data sets, create an RDD[String] and use the "jsonRDD"
       method with the saved schema to convert it to a SchemaRDD.
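
In code, I imagine the workflow would look roughly like the sketch below.
It assumes the "jsonRDD" overload that takes an explicit StructType. The
object name and the HDFS paths are made-up placeholders, and the step that
saves the schema is only a comment, since that is my second question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver object; the name and paths are placeholders.
object JsonSchemaReuse {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("json-schema-reuse"))
    val sqlContext = new SQLContext(sc)

    // First step: let jsonFile scan the first data set and infer its schema.
    val first = sqlContext.jsonFile("hdfs:///data/first.json")
    val schema = first.schema // a StructType

    // Second step: save `schema` somewhere (not shown; see question 2).

    // Third step: read a later data set as plain text and apply the known
    // schema, so that no inference pass over the data should be needed.
    val lines = sc.textFile("hdfs:///data/later.json") // RDD[String]
    val later = sqlContext.jsonRDD(lines, schema)
    later.printSchema()

    sc.stop()
  }
}
```

Here the schema is only kept in memory between the first and third steps,
so this does not yet cover reusing it across separate jobs.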

2. What is the best way to store a schema? More specifically, how can I
serialize a StructType and store it in HDFS so that I can load it later?

Rakesh Nair
