Hi, I am experiencing a weird error that suddenly popped up in my unit tests. I have a couple of HDFS files in JSON format, and my test basically creates a JsonRDD from them and then issues a very simple SQL query over it. This used to work fine, but now I suddenly get:
15:58:49.039 [Executor task launch worker-1] ERROR executor.Executor - Exception in task 1.0 in stage 29.0 (TID 117)
scala.MatchError: 14452800566866169008 (of class java.math.BigInteger)
        at org.apache.spark.sql.json.JsonRDD$.toLong(JsonRDD.scala:282)
        at org.apache.spark.sql.json.JsonRDD$.enforceCorrectType(JsonRDD.scala:353)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1$$anonfun$apply$12.apply(JsonRDD.scala:381)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1.apply(JsonRDD.scala:380)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1.apply(JsonRDD.scala:365)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.sql.json.JsonRDD$.org$apache$spark$sql$json$JsonRDD$$asRow(JsonRDD.scala:365)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$jsonStringToRow$1.apply(JsonRDD.scala:38)
        at org.apache.spark.sql.json.JsonRDD$$anonfun$jsonStringToRow$1.apply(JsonRDD.scala:38)
        ...
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

The stack trace contains none of my classes, so it's a bit hard to track down where this starts. The code of JsonRDD.toLong is in fact:

    private def toLong(value: Any): Long = {
      value match {
        case value: java.lang.Integer => value.asInstanceOf[Int].toLong
        case value: java.lang.Long    => value.asInstanceOf[Long]
      }
    }

so if value is a java.math.BigInteger, toLong fails with a MatchError. Now I'm wondering where this comes from (I haven't touched this component in a while, nor upgraded Spark etc.), but in particular I would like to know how to work around it.

Thanks
Tobias
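For what it's worth, a quick sanity check in plain Scala (outside of Spark, nothing Spark-specific assumed; the object name here is just mine) confirms that the literal from the MatchError simply does not fit into a signed 64-bit Long, which would explain why the JSON parser hands Spark a java.math.BigInteger instead of a Long:

```scala
import java.math.BigInteger

object LongRangeCheck {
  def main(args: Array[String]): Unit = {
    // The literal from the MatchError message above:
    val offending = new BigInteger("14452800566866169008")

    // A signed Long has 63 magnitude bits plus a sign bit, so anything
    // with a bitLength above 63 cannot be represented as a Long.
    println(offending.bitLength)                          // prints 64
    println(BigInteger.valueOf(Long.MaxValue).bitLength)  // prints 63

    // Equivalently, the value is strictly larger than Long.MaxValue
    // (9223372036854775807):
    println(offending.compareTo(BigInteger.valueOf(Long.MaxValue)) > 0)  // prints true
  }
}
```

If that is the cause, then presumably some value in one of my JSON files recently grew past the Long range, so schema inference still reports a LongType column while the parser now produces a BigInteger for that field — which would also explain why this started "suddenly" without any code or Spark changes on my side.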