Hi Cheng,

Thanks for looking into this. It looks like the bug may be in the Spark Cassandra connector code. Table x is a table in Cassandra.
However, while trying to troubleshoot this issue, I noticed another issue. This time I did not use Cassandra; instead I created a table on the fly. I am not seeing the same exception, but the results do not look right. Here is my complete Spark shell session:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
14/10/10 11:05:11 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.59.135 instead (on interface eth0)
14/10/10 11:05:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/10/10 11:05:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc.

scala> import org.apache.spark.sql._
import org.apache.spark.sql._

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2be5c74d

scala> import sqlContext.createSchemaRDD
import sqlContext.createSchemaRDD

scala> case class X(a: Int, ts: java.sql.Timestamp)
defined class X

scala> val rdd = sc.parallelize(1 to 5).map{ n => X(n, new java.sql.Timestamp(1325548800000L + n*86400000)) }
rdd: org.apache.spark.rdd.RDD[X] = MappedRDD[1] at map at <console>:20

scala> rdd.collect
res0: Array[X] = Array(X(1,2012-01-03 16:00:00.0), X(2,2012-01-04 16:00:00.0), X(3,2012-01-05 16:00:00.0), X(4,2012-01-06 16:00:00.0), X(5,2012-01-07 16:00:00.0))

scala> rdd.registerTempTable("x")

scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01T00:00:00';")
sRdd: org.apache.spark.sql.SchemaRDD = SchemaRDD[4] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
 ExistingRdd [a#0,ts#1], MapPartitionsRDD[6] at mapPartitions at basicOperators.scala:208

scala> sRdd.collect
res2: Array[org.apache.spark.sql.Row] = Array()

Mohammed

From: Cheng Lian [mailto:lian.cs....@gmail.com]
Sent: Friday, October 10, 2014 4:37 AM
To: Mohammed Guller; user@spark.apache.org
Subject: Re: Spark SQL parser bug?

Hi Mohammed,

Would you mind sharing the DDL of the table x and the complete stack trace of the exception you got? A full Spark shell session history would be more than helpful.

PR #2084 was merged into master in August, and the timestamp type is supported in 1.1. I tried the following snippets in the Spark shell (v1.1) and didn't observe this issue:

scala> import org.apache.spark.sql._
import org.apache.spark.sql._

scala> import sc._
import sc._

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6c3441c5

scala> import sqlContext._
import sqlContext._

scala> case class Record(a: Int, ts: java.sql.Timestamp)
defined class Record

scala> makeRDD(Seq.empty[Record], 1).registerTempTable("x")

scala> sql("SELECT a FROM x WHERE ts >= '2012-01-01T00:00:00' AND ts <= '2012-03-31T23:59:59'")
res1: org.apache.spark.sql.SchemaRDD = SchemaRDD[3] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
 ExistingRdd [a#0,ts#1], MapPartitionsRDD[5] at mapPartitions at basicOperators.scala:208

scala> res1.collect()
...
res2: Array[org.apache.spark.sql.Row] = Array()

Cheng

On 10/9/14 10:26 AM, Mohammed Guller wrote:

Hi –

When I run the following Spark SQL query in the Spark shell (version 1.1.0):

val rdd = sqlContext.sql("SELECT a FROM x WHERE ts >= '2012-01-01T00:00:00' AND ts <= '2012-03-31T23:59:59'")

it gives the following error:

rdd: org.apache.spark.sql.SchemaRDD = SchemaRDD[294] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
java.util.NoSuchElementException: head of empty list

The ts column in the where clause holds timestamp data and is of type timestamp. If I replace the string '2012-01-01T00:00:00' in the where clause with its epoch value, then the query works fine. It looks like I have run into the issue described in this pull request: https://github.com/apache/spark/pull/2084

Is that PR not merged in Spark version 1.1.0? Or am I missing something?

Thanks,
Mohammed
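For anyone hitting the same parser issue, here is a minimal sketch (not from the thread above) of one way to work around it until the string-to-timestamp comparison is fixed: build the boundary timestamps in Scala and filter the case-class RDD directly, so the comparison never goes through the SQL parser. It assumes the same rdd of X(a: Int, ts: java.sql.Timestamp) rows created in Mohammed's session, and the SimpleDateFormat pattern is an assumption about how the ISO-style strings should be interpreted (local time zone, second precision).

import java.text.SimpleDateFormat

// Parse the boundary strings outside of Spark SQL (assumed ISO-like pattern).
val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss")
val lower = new java.sql.Timestamp(fmt.parse("2012-01-01T00:00:00").getTime)
val upper = new java.sql.Timestamp(fmt.parse("2012-03-31T23:59:59").getTime)

// rdd is the RDD[X] defined earlier in the session; compare the timestamps
// directly instead of relying on the SQL parser to interpret string literals.
val inRange = rdd.filter(x => !x.ts.before(lower) && !x.ts.after(upper))
inRange.map(_.a).collect()

This sidesteps the SQL string-literal handling entirely. Mohammed's observation that replacing the string with its epoch value makes the query work points at the same underlying problem, namely how the parser converts the string into a timestamp for the comparison.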