Hello all, after upgrading Spark to 1.3.0 (and making the necessary code changes), an issue appeared that leaves me unable to handle Date fields (java.sql.Date) with the Spark SQL module. An exception is thrown when I try to execute an SQL query on a DataFrame (see below).
When I tried to examine the cause of the exception, I found that it happens when the framework collects column statistics on DataFrames. In particular, the method gatherStats in org.apache.spark.sql.columnar.DateColumnStats is inherited from IntColumnStats and thus handles the column value as an Integer, which causes this kind of error.

Now the question is: what is the right type for a Date field in Spark SQL DataFrames?
- According to the documentation for org.apache.spark.sql.types.DateType, it represents java.sql.Date (which doesn't work now, although it worked fine before Spark 1.3.0).
- JvmType in org.apache.spark.sql.types.DateType points to Int.
- According to the implementation of JdbcRDD, it looks like DateType is still used for java.sql.Date fields, so it seems that an attempt to read date fields from a JDBC table using Spark SQL will most likely end up with an error as well.

So what is the type handled by org.apache.spark.sql.types.DateType? Is it Int, or is it still java.sql.Date? If it is an Int, what is the exact meaning of the number, and how does one convert to/from Date (sql.Date, util.Date, JodaTime, ...)? Thank you for your help.
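Based on JvmType pointing to Int, my working guess is that the number counts whole days since the Unix epoch (1970-01-01). This is a sketch of the conversion I would use under that assumption (DateTypeConversion is my own helper, not a Spark API, and it ignores time-zone offsets for simplicity):

```scala
import java.sql.Date
import java.util.concurrent.TimeUnit

// Working assumption: DateType's Int is the number of days since the
// Unix epoch (1970-01-01). Not confirmed against the Spark sources.
object DateTypeConversion {
  private val MillisPerDay: Long = TimeUnit.DAYS.toMillis(1)

  /** java.sql.Date -> days since epoch (floorDiv handles pre-1970 dates). */
  def toDays(d: Date): Int = Math.floorDiv(d.getTime, MillisPerDay).toInt

  /** Days since epoch -> java.sql.Date at midnight of that day (UTC millis). */
  def fromDays(days: Int): Date = new Date(days.toLong * MillisPerDay)
}
```

If the internal representation really is days-since-epoch, something like this should round-trip a Date through the Int form, but confirmation from someone who knows the Catalyst internals would be appreciated.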
Best regards,
R.Krist

Stack trace of the exception reported since the upgrade to 1.3.0:

java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
	at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:105) ~[scala-library-2.11.6.jar:na]
	at org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(rows.scala:83) ~[spark-catalyst_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.sql.columnar.IntColumnStats.gatherStats(ColumnStats.scala:191) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.sql.columnar.NullableColumnBuilder$class.appendFrom(NullableColumnBuilder.scala:56) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.sql.columnar.NativeColumnBuilder.org$apache$spark$sql$columnar$compression$CompressibleColumnBuilder$$super$appendFrom(ColumnBuilder.scala:87) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.appendFrom(CompressibleColumnBuilder.scala:78) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.sql.columnar.NativeColumnBuilder.appendFrom(ColumnBuilder.scala:87) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:135) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:111) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:242) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.scheduler.Task.run(Task.scala:64) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) ~[spark-core_2.11-1.3.0.jar:1.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_11]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_11]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11]

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ClassCastException-processing-date-fields-using-spark-SQL-since-1-3-0-tp22522.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.