Re: Spark - HiveContext - Unstructured Json

2014-10-22 Thread Harivardan Jayaraman

For me, inference is less of an issue than persistence. Imagine a streaming application where the input is JSON whose format can vary from row to row and cannot be determined in advance. I can use `sqlContext.jsonRDD`, but once I have the `SchemaRDD`, there is no way for me to update the

Re: Spark - HiveContext - Unstructured Json

2014-10-21 Thread Cheng Lian

You can resort to `SQLContext.jsonFile(path: String, samplingRatio: Double)` and set `samplingRatio` to 1.0, so that every row is scanned and all the columns can be inferred. You can also use `SQLContext.applySchema` to specify your own schema (which is a `StructType`).

On 10/22/14 5:56 AM, Harivardan Jayaraman
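
To illustrate why the sampling ratio matters when rows differ in shape, here is a minimal plain-Python sketch. It is not Spark's implementation and does not call Spark at all: `infer_columns`, `conform`, and the sample rows are hypothetical names and data made up for this example. It only looks at top-level keys, whereas real inference also has to reconcile types and nested fields.

```python
import json
import random


def infer_columns(json_lines, sampling_ratio=1.0, seed=0):
    """Return the sorted union of top-level keys seen in the sampled rows.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    if not 0.0 < sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be in (0, 1]")
    rng = random.Random(seed)
    columns = set()
    for line in json_lines:
        # With sampling_ratio == 1.0 no row is skipped, so no key is missed.
        if sampling_ratio < 1.0 and rng.random() >= sampling_ratio:
            continue
        columns.update(json.loads(line).keys())
    return sorted(columns)


def conform(line, columns):
    """Project one JSON row onto a fixed column list, using None for gaps."""
    record = json.loads(line)
    return tuple(record.get(name) for name in columns)


# Made-up rows whose shape varies from row to row.
rows = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "tags": ["x"]}',
    '{"id": 3, "geo": {"lat": 0.0}}',
]

all_columns = infer_columns(rows, sampling_ratio=1.0)
second_row = conform(rows[1], all_columns)
```

With a ratio below 1.0, a key that appears only in skipped rows never reaches the inferred column list, which is the failure mode that a ratio of 1.0 avoids. The `conform` step mirrors the idea behind supplying an explicit schema: every row is mapped onto the same fixed set of columns, with missing fields left empty.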