Re: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-07 Thread dmt
Is there a workaround? My dataset contains billions of rows, and it would be nice to ignore/exclude the few lines that are badly formatted.
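
One possible mitigation, sketched under the assumption of Spark 1.6+ where the JSON reader accepts a parse mode: DROPMALFORMED asks the reader to silently skip records it cannot parse. Whether this mode also catches the struct-vs-array mismatch discussed later in the thread depends on the Spark version, so treat it as something to try rather than a guaranteed fix; the schema and path below are hypothetical.

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.types._

    val sqlContext = new SQLContext(sc)

    // Hypothetical schema matching the record shape discussed in this thread
    val schema = StructType(Seq(
      StructField("request", StructType(Seq(
        StructField("user", StructType(Seq(
          StructField("id", LongType)
        )))
      )))
    ))

    // DROPMALFORMED (Spark 1.6+) drops records the JSON parser rejects
    // instead of failing the whole job
    val df = sqlContext.read
      .option("mode", "DROPMALFORMED")
      .schema(schema)
      .json("hdfs:///path/to/data.json") // hypothetical path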

Re: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-07 Thread dmt
I have found why the exception is raised. I had defined a JSON schema, using org.apache.spark.sql.types.StructType, that expects this kind of record: { "request": { "user": { "id": 123 } } }. There is a bad record in my dataset that defines the field "user" as an array instead of a struct.
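
For anyone hitting the same trace, here is a minimal sketch reproducing the mismatch described above; the schema and sample records are reconstructed from this thread, not taken from the real dataset.

    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("request", StructType(Seq(
        StructField("user", StructType(Seq(
          StructField("id", LongType)
        )))
      )))
    ))

    // The first record matches the schema ("user" is a struct); the second
    // defines "user" as an array, which the parser materializes as a
    // GenericArrayData where an InternalRow is expected.
    val records = sc.parallelize(Seq(
      """{ "request": { "user": { "id": 123 } } }""",
      """{ "request": { "user": [ { "id": 123 } ] } }"""
    ))

    val df = sqlContext.read.schema(schema).json(records)
    df.select("request.user.id").show()
    // => java.lang.ClassCastException: GenericArrayData cannot be cast
    //    to InternalRow, once the bad record is evaluated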

org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-02 Thread dmt
java.lang.ClassCastException: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:50)
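
To track down which input lines actually trigger the trace above, one option is to pre-scan the raw text with a plain JSON parser before applying the strict schema. A sketch, assuming Jackson (which ships with Spark) and a hypothetical input path:

    import com.fasterxml.jackson.databind.ObjectMapper

    val raw = sc.textFile("hdfs:///path/to/data.json") // hypothetical path

    // Keep the lines where "user" is an array (or that do not parse at all)
    // so they can be inspected; negating the predicate excludes them
    // before the schema-based read.
    val badLines = raw.filter { line =>
      try {
        // per-line mapper avoids closure serialization issues
        val node = new ObjectMapper().readTree(line)
        node.path("request").path("user").isArray
      } catch {
        case _: Exception => true
      }
    }

    badLines.take(10).foreach(println)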