I have been using Spark SQL to read in JSON data, like so:
val myJsonFile = sqc.jsonFile(args("myLocation"))
myJsonFile.registerTempTable("myTable")
sqc.sql("mySQLQuery").map { row =>
  myFunction(row)
}

Then, in myFunction(row), I can read the various columns with the
Row.getX methods. However, these methods only work for basic types (string,
int, ...).
I was having trouble reading columns that are arrays or maps (i.e.
nested JSON objects).
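
For reference, this is the kind of access that works fine for the basic
types (the column positions and types here are made up for illustration):

// Plain accessors on Row work for primitive columns
val name = row.getString(0)   // a top-level string column
val count = row.getInt(1)     // a top-level int column
val score = row.getDouble(2)  // a top-level double column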

I am now using Spark 1.2 from the Cloudera snapshot, and I noticed that
there is a new method, getAs. I was able to use it to read, for example, an
array of strings like so:
t.getAs[Buffer[CharSequence]](12)
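
For that to compile I need scala.collection.mutable.Buffer in scope, and
mapping over the elements gives me plain Strings (using the same column
index as above):

import scala.collection.mutable.Buffer

// The JSON array column comes back as a Buffer of CharSequence;
// calling toString on each element yields ordinary Scala strings.
val tags: Seq[String] = t.getAs[Buffer[CharSequence]](12).map(_.toString)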

However, if I try to read a column with a nested JSON object like this:
t.getAs[Map[String, Any]](11)

I get the following error:
java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
scala.collection.immutable.Map
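
From the exception it looks like Spark represents a nested JSON object as a
nested Row (a GenericRow) rather than a Map, so presumably something like
this would work (just a sketch; the column index is the one above, but the
field position inside the nested object is made up):

import org.apache.spark.sql.Row

// Read the struct column as a nested Row instead of a Map...
val nested = t.getAs[Row](11)
// ...and then pull individual fields out of it by position.
val innerValue = nested.getString(0)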

Is that really the intended way to read such a field? Am I just missing
something small, or should I be looking for a completely different approach
to reading JSON?

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini
