Re: sql timestamp timezone bug

2016-03-19 Thread Andy Davidson
Hi Davies > > What's the type of `created`? TimestampType? The 'created' column in Cassandra is a timestamp https://docs.datastax.com/en/cql/3.0/cql/cql_reference/timestamp_type_r.html In the Spark data frame it is a timestamp > > If yes, when created is compared to a string, it will be

Re: sql timestamp timezone bug

2016-03-19 Thread Andy Davidson
For completeness: clearly Spark SQL returned a different data set. In [4]: rawDF.selectExpr("count(row_key) as num_samples", "sum(count) as total", "max(count) as max ").show() +---++-+ |num_samples|total|max|

Re: sql timestamp timezone bug

2016-03-19 Thread Davies Liu
On Thu, Mar 17, 2016 at 3:02 PM, Andy Davidson wrote: > I am using pyspark 1.6.0 and > datastax:spark-cassandra-connector:1.6.0-M1-s_2.10 to analyze time series > data > > The data is originally captured by a spark streaming app and written to > Cassandra. The value

Re: sql timestamp timezone bug

2016-03-19 Thread Davies Liu
In Spark SQL, a timestamp is the number of microseconds since the epoch, so it has nothing to do with time zones. When you compare it against a unix_timestamp or a string, it's better to convert these to timestamps and then compare them. In your case, the where clause should be: where (created > cast('{0}' as
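
The point about microseconds since the epoch can be illustrated outside Spark with a minimal sketch in plain Python. This is not Spark code and not the truncated where clause from the reply; the helper name `to_epoch_micros`, the sample date, and the fixed UTC-7 offset are all assumptions made for illustration. It shows that the same instant written in two time zones maps to a single epoch value, which is why converting both sides of a comparison to timestamps avoids time zone surprises.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(dt):
    """Return whole microseconds since the Unix epoch (hypothetical helper)."""
    if dt.tzinfo is None:
        # A naive datetime has no defined instant, so refuse to guess.
        raise ValueError("expected a timezone-aware datetime")
    return (dt - EPOCH) // timedelta(microseconds=1)


# The same instant written in UTC and in a fixed UTC-7 offset (assumed values).
utc_time = datetime(2016, 3, 17, 22, 2, 0, tzinfo=timezone.utc)
pacific = timezone(timedelta(hours=-7))
local_time = datetime(2016, 3, 17, 15, 2, 0, tzinfo=pacific)

utc_micros = to_epoch_micros(utc_time)
local_micros = to_epoch_micros(local_time)

# Both wall-clock readings denote one instant, so the epoch values match.
print(utc_micros == local_micros)
```

A string such as '2016-03-17 15:02:00' carries no offset, so whichever side parses it has to assume a time zone; doing that conversion explicitly keeps the comparison between two epoch-based values.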