Hello guys,

after upgrading Spark to 1.3.0 (and making the necessary code changes), an
issue appeared that prevents me from handling Date fields (java.sql.Date) with
the Spark SQL module. An exception is thrown whenever I execute an SQL query
on a DataFrame (see the stack trace below).

When I examined the cause of the exception, I found that it happens when the
framework collects column statistics on DataFrames - in particular:

the gatherStats method in org.apache.spark.sql.columnar.DateColumnStats is
inherited from IntColumnStats and therefore handles the column value as an
Integer, which causes this kind of error.
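A minimal sketch of the failing pattern (the schema and names are
illustrative; I'm assuming the sc and sqlContext provided by spark-shell):

    import java.sql.Date
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("occurredOn", DateType)))

    // The Rows carry java.sql.Date values directly, as they did before 1.3.0.
    val rows = sc.parallelize(Seq(Row(1, Date.valueOf("2015-04-01"))))
    val df = sqlContext.createDataFrame(rows, schema)

    // Caching forces the in-memory columnar build, which runs
    // DateColumnStats.gatherStats and throws the ClassCastException below.
    df.cache()
    df.count()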
Now the question is: what is the right type for a Date field in Spark SQL
DataFrames?
- according to the documentation for org.apache.spark.sql.types.DateType, it
represents java.sql.Date (which no longer works, although it worked fine
before Spark 1.3.0)
- JvmType in org.apache.spark.sql.types.DateType points to Int
- according to the implementation of JdbcRDD, it looks like it still uses
DateType for java.sql.Date fields, so it seems to me that an attempt to read
a JDBC table containing date fields using Spark SQL will most likely end up
with the same error (see the sketch below)
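For example, a load along these lines (using the JDBC data source from 1.3;
the connection URL and table name are placeholders) would presumably hit the
same code path once the result is cached:

    // Placeholder URL and table; any table with a DATE column should do.
    val jdbcDF = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:postgresql://dbhost/mydb",
      "dbtable" -> "events"))

    jdbcDF.cache()   // builds the columnar cache and gathers column stats
    jdbcDF.count()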

So what is the type handled by org.apache.spark.sql.types.DateType? Is it
Int, or is it still java.sql.Date? If it is an Int - what is the exact
meaning of the number, and how does one convert it to/from a date
(java.sql.Date, java.util.Date, Joda-Time...)?
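In case it helps: my current (unverified) assumption is that the Int counts
days since the Unix epoch, 1970-01-01. Under that assumption the conversion
would look roughly like this, ignoring timezone subtleties:

    import java.sql.Date
    import java.util.concurrent.TimeUnit

    // Assumption: DateType's internal Int is the number of days since
    // 1970-01-01. Not confirmed by the docs - corrections welcome.
    def dateToDays(d: Date): Int =
      TimeUnit.MILLISECONDS.toDays(d.getTime).toInt

    def daysToDate(days: Int): Date =
      new Date(TimeUnit.DAYS.toMillis(days.toLong))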

Thank you for your help.

Best regards

R.Krist


Stack trace of the exception reported since the upgrade to 1.3.0:
java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
        at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:105) ~[scala-library-2.11.6.jar:na]
        at org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(rows.scala:83) ~[spark-catalyst_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.sql.columnar.IntColumnStats.gatherStats(ColumnStats.scala:191) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.sql.columnar.NullableColumnBuilder$class.appendFrom(NullableColumnBuilder.scala:56) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.sql.columnar.NativeColumnBuilder.org$apache$spark$sql$columnar$compression$CompressibleColumnBuilder$$super$appendFrom(ColumnBuilder.scala:87) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.appendFrom(CompressibleColumnBuilder.scala:78) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.sql.columnar.NativeColumnBuilder.appendFrom(ColumnBuilder.scala:87) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:135) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:111) ~[spark-sql_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:242) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.scheduler.Task.run(Task.scala:64) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) ~[spark-core_2.11-1.3.0.jar:1.3.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_11]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_11]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11]


