Re: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-07 Thread dmt
Is there a workaround? My dataset contains billions of rows, and it would be nice to ignore/exclude the few lines that are badly formatted.
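One possible workaround, sketched below rather than tested: read the file as plain text first, drop the lines whose "user" field is not a JSON object, and only apply the strict schema to what survives. The schema, input path, and variable names here are illustrative, and json4s (which ships with Spark 1.x) is just one convenient way to do the per-line check; sc and sqlContext are the usual spark-shell handles.

import org.apache.spark.sql.types._
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Schema that expects "user" to be a struct (illustrative).
val schema = StructType(Seq(
  StructField("request", StructType(Seq(
    StructField("user", StructType(Seq(
      StructField("id", LongType)
    )))
  )))
))

// Read raw lines, keeping only those where "user" is a JSON object.
val rawLines = sc.textFile("/path/to/data.json")
val validLines = rawLines.filter { line =>
  try {
    (parse(line) \ "request" \ "user") match {
      case _: JObject => true   // shape matches the schema
      case _          => false  // array, missing, or wrong type: drop it
    }
  } catch {
    case _: Throwable => false  // unparseable line: drop it
  }
}

// Apply the strict schema only to the validated lines.
val df = sqlContext.read.schema(schema).json(validLines)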

Re: org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-07 Thread dmt
I have found why the exception is raised. I have defined a JSON schema, using org.apache.spark.sql.types.StructType, that expects this kind of record: { "request": { "user": { "id": 123 } } }. There is a bad record in my dataset that defines the field "user" as an array instead of an object.
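For concreteness, the expected shape corresponds to a StructType along these lines (a reconstruction from the record above, not the exact schema from the job):

import org.apache.spark.sql.types._

// Expects: { "request": { "user": { "id": 123 } } }
val schema = StructType(Seq(
  StructField("request", StructType(Seq(
    StructField("user", StructType(Seq(
      StructField("id", LongType)
    )))
  )))
))

// Offending record shape: { "request": { "user": [ ... ] } }
// When "user" arrives as a JSON array, Spark materialises it as a
// GenericArrayData, and the cast to the InternalRow expected for a
// struct field fails with the exception in the subject line.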

org.apache.spark.sql.types.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow

2016-03-02 Thread dmt
Hi, the following error is raised using Spark 1.5.2 or 1.6.0, in standalone mode, on my computer. Has anyone had the same problem, and do you know what might cause this exception? Thanks in advance.

16/03/02 15:12:27 WARN TaskSetManager: Lost task 9.0 in stage 0.0 (TID 9, 192.168.1.36):
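A minimal sketch of the kind of job that can trigger this class of error (assumed setup; the real dataset is read from files and is much larger):

import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("request", StructType(Seq(
    StructField("user", StructType(Seq(
      StructField("id", LongType)
    )))
  )))
))

// One line matches the schema; the other declares "user" as an array.
val lines = sc.parallelize(Seq(
  """{ "request": { "user": { "id": 123 } } }""",
  """{ "request": { "user": [ { "id": 123 } ] } }"""
))

// With an explicit schema the type check happens at execution time,
// so the mismatch surfaces as a task failure during the action.
sqlContext.read.schema(schema).json(lines).collect()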