I have a very large JSON dataset and I want to avoid having Spark scan
the data to infer the schema. Since I already know the data, I would
prefer to provide the schema myself with

sqlContext.read().schema(mySchema).json(jsonFilePath)

However, the problem is that the JSON data format is a bit unusual:

[
{
  "apiTypeName": "someApi",
  "allFieldsAndValues": {
    "Field_1": "Value",
    "Field_2": "Value",
    "Field_3": 779.0,
    "Field_4": "Value",
    "Field_5": true
  }
},
{
  "apiTypeName": "someApi",
  "allFieldsAndValues": {
    "Field_1": "Value",
    "Field_2": "Value",
    "Field_3": 779.0,
    "Field_4": "Value",
    "Field_5": true
  }
}
]

I can't seem to construct a schema for this kind of data that Spark
could use instead of inferring one on its own. Every way I have tried
to build the schema from combinations of StructType, StructField, and
Array, Spark wouldn't pick it up as I intend it to.
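
For illustration, here is the kind of schema I have been building (a
minimal sketch with the Java API; sqlContext and jsonFilePath as above,
and the schema describes a single record of the array):

import java.util.Arrays;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Nested struct for the "allFieldsAndValues" object; types mirror the sample above
StructType allFieldsAndValues = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("Field_1", DataTypes.StringType, true),
    DataTypes.createStructField("Field_2", DataTypes.StringType, true),
    DataTypes.createStructField("Field_3", DataTypes.DoubleType, true),
    DataTypes.createStructField("Field_4", DataTypes.StringType, true),
    DataTypes.createStructField("Field_5", DataTypes.BooleanType, true)
));

// Top-level schema: each JSON object has apiTypeName plus the nested struct
StructType mySchema = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("apiTypeName", DataTypes.StringType, true),
    DataTypes.createStructField("allFieldsAndValues", allFieldsAndValues, true)
));

DataFrame df = sqlContext.read().schema(mySchema).json(jsonFilePath);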


Any help is appreciated.


