A couple of questions:

1. "sqlContext.jsonFile" reads a JSON file, infers the schema of the stored data, and returns a SchemaRDD. I could also create a SchemaRDD by reading the file as text (which returns an RDD[String]) and then calling the "jsonRDD" method. My question: is the "jsonFile" way of creating a SchemaRDD slower than the second method (perhaps because jsonFile has to infer the schema, whereas jsonRDD just applies a schema to the dataset)?
The workflow I am thinking of is:

   1. For the first data set, use "jsonFile" and infer the schema.
   2. Save the schema somewhere.
   3. For later data sets, create an RDD[String] and then use the "jsonRDD" method to convert the RDD[String] to a SchemaRDD.

2. What is the best way to store a schema? Put differently, how can I serialize a StructType and store it in HDFS so that I can load it later?

--
Regards,
Rakesh Nair