Re: Spark SQL Nested Array of JSON with empty field

2016-06-05 Thread Ewan Leith
The Spark JSON reader is unforgiving of things like missing elements in some JSON records, or mixed types. If you want to pass invalid JSON files through Spark, you're best off doing an initial parse through the Jackson APIs against a defined schema first; then you can set types like Option[String]
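A rough sketch of that pre-parse step, assuming jackson-module-scala is on the classpath (the Person case class and field names beyond those in the thread are illustrative):

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Fields that may be absent are declared Option[String], so a missing
// key deserializes cleanly instead of failing the whole read.
case class Person(firstname: String, middlename: Option[String], lastname: String)

val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

// A record with no "middlename" key still parses; the field is simply empty.
val p = mapper.readValue("""{"firstname":"Jack","lastname":"Nelson"}""", classOf[Person])
```

Records that fail even this parse can be routed to a dead-letter path, and the survivors handed to Spark with a known-good shape.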

Re: Spark SQL Nested Array of JSON with empty field

2016-06-03 Thread Christian Hellström
If that's your JSON file, then the first problem is that it's incorrectly formatted. Apart from that, you can just read the JSON into a DataFrame with sqlContext.read.json() and then select directly on the DataFrame without having to register a temporary table: jsonDF.select("firstname",
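A minimal sketch of that suggestion, assuming a Spark 1.x sqlContext and a file "people.json" (hypothetical name) with one JSON object per line:

```scala
// Infer the schema from the file and query columns directly;
// no registerTempTable / SQL string needed.
val jsonDF = sqlContext.read.json("people.json")

jsonDF.select("firstname", "lastname").show()

// Nested fields are reachable with dot notation.
jsonDF.select("address.city").show()
```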

Spark SQL Nested Array of JSON with empty field

2016-06-03 Thread Jerry Wong
Hi, I ran into a problem with an empty field in a nested JSON file with Spark SQL. For instance, here are two lines of the JSON file:

{ "firstname": "Jack", "lastname": "Nelson", "address": { "state": "New York", "city": "New York" } }
{ "firstname": "Landy", "middlename": "Ken", "lastname":
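One way to handle keys that appear in only some records (a hedged sketch, not from the thread) is to give the reader an explicit schema, so a field like "middlename" that is absent from a record comes back as null instead of depending on what inference saw:

```scala
import org.apache.spark.sql.types._

// Declare every field the data might carry, including the nested struct.
val schema = StructType(Seq(
  StructField("firstname", StringType),
  StructField("middlename", StringType),
  StructField("lastname", StringType),
  StructField("address", StructType(Seq(
    StructField("state", StringType),
    StructField("city", StringType))))))

val df = sqlContext.read.schema(schema).json("people.json")
df.show()  // records without "middlename" show null in that column
```

This also skips the schema-inference pass over the data, which helps on large inputs.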