spark git commit: [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings (branch-1.4)

2015-06-22 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 451c8722a -> 65981619b


[SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings (branch-1.4)

This is the branch-1.4 backport of https://github.com/apache/spark/pull/6888.

Below is the original description.

In earlier versions of Spark SQL we cast `TimestampType` and `DateType` to 
`StringType` when they were involved in a binary comparison with a `StringType`.  
This allowed comparing a timestamp with a partial date as a user would expect.
 - `time > "2014-06-10"`
 - `time > "2014"`

In 1.4.0 we instead tried to cast the string to a timestamp.  However, since a 
partial date is not a valid complete timestamp, the cast returns `null`, which 
causes the tuple to be filtered out.

This PR restores the earlier behavior.  Note that we still special-case 
equality so that these comparisons are not affected by the string form omitting 
zeros for subsecond precision.
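
The difference between the two casting directions can be sketched in plain
Scala without Spark. This is only an illustration under stated assumptions:
`java.sql.Timestamp` stands in for `TimestampType`, `None` stands in for SQL
`null`, and the object and method names are hypothetical, not part of Catalyst.

```scala
import java.sql.Timestamp
import scala.util.Try

object PartialDateComparison {
  // Hypothetical stand-in for casting a string to a timestamp.
  // None plays the role of SQL null when the string is not a full timestamp.
  def castToTimestamp(s: String): Option[Timestamp] =
    Try(Timestamp.valueOf(s)).toOption

  // 1.4.0-style comparison: cast the string side to a timestamp first.
  // A null operand leaves no result, so the row would be filtered out.
  def greaterThanViaTimestamp(time: Timestamp, literal: String): Option[Boolean] =
    castToTimestamp(literal).map(ts => time.after(ts))

  // Restored comparison: render the timestamp as a string and compare
  // lexicographically, which also works for partial dates.
  def greaterThanViaString(time: Timestamp, literal: String): Boolean =
    time.toString > literal

  def main(args: Array[String]): Unit = {
    val time = Timestamp.valueOf("2014-06-11 00:00:00")
    println(greaterThanViaTimestamp(time, "2014-06-10")) // None
    println(greaterThanViaString(time, "2014-06-10"))    // true
    println(greaterThanViaString(time, "2014"))          // true
  }
}
```

The string comparison works here because timestamps rendered as
`yyyy-mm-dd hh:mm:ss` sort lexicographically in the same order as
chronologically, so a shorter prefix such as `"2014"` still compares sensibly.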

Author: Michael Armbrust 

Closes #6888 from marmbrus/timeCompareString and squashes the following commits:

bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparison of timestamps/dates with strings

Conflicts:

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

Author: Michael Armbrust 

Closes #6914 from yhuai/timeCompareString-1.4 and squashes the following commits:

9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/65981619
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/65981619
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/65981619

Branch: refs/heads/branch-1.4
Commit: 65981619b26da03f0c5133133e318a180235e96d
Parents: 451c872
Author: Michael Armbrust 
Authored: Mon Jun 22 10:45:33 2015 -0700
Committer: Yin Huai 
Committed: Mon Jun 22 10:45:33 2015 -0700

--
 .../catalyst/analysis/HiveTypeCoercion.scala| 17 --
 .../sql/catalyst/expressions/predicates.scala   |  9 
 .../apache/spark/sql/DataFrameDateSuite.scala   | 56 
 .../org/apache/spark/sql/SQLQuerySuite.scala|  4 ++
 .../scala/org/apache/spark/sql/TestData.scala   |  6 ---
 .../columnar/InMemoryColumnarQuerySuite.scala   |  7 ++-
 6 files changed, 88 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/65981619/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
index fa7968e..6d0f4a0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -242,7 +242,16 @@ trait HiveTypeCoercion {
   case a: BinaryArithmetic if a.right.dataType == StringType =>
 a.makeCopy(Array(a.left, Cast(a.right, DoubleType)))
 
-  // we should cast all timestamp/date/string compare into string compare
+  // For equality between string and timestamp we cast the string to a timestamp
+  // so that things like rounding of subsecond precision does not affect the comparison.
+  case p @ Equality(left @ StringType(), right @ TimestampType()) =>
+p.makeCopy(Array(Cast(left, TimestampType), right))
+  case p @ Equality(left @ TimestampType(), right @ StringType()) =>
+p.makeCopy(Array(left, Cast(right, TimestampType)))
+
+  // We should cast all relative timestamp/date/string comparison into string comparisions
+  // This behaves as a user would expect because timestamp strings sort lexicographically.
+  // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
   case p: BinaryComparison if p.left.dataType == StringType &&
   p.right.dataType == DateType =>
 p.makeCopy(Array(p.left, Cast(p.right, StringType)))
@@ -251,10 +260,12 @@ trait HiveTypeCoercion {
 p.makeCopy(Array(Cast(p.left, StringType), p.right))
   case p: BinaryComparison if p.left.dataType == StringType &&
   p.right.dataType == TimestampType =>
-p.makeCopy(Array(Cast(p.left, TimestampType), p.right))
+p.makeCopy(Array(p.left, Cast(p.right, StringType

spark git commit: [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings

2015-06-19 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/master 9814b971f -> a333a72e0


[SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings

In earlier versions of Spark SQL we cast `TimestampType` and `DateType` to 
`StringType` when they were involved in a binary comparison with a `StringType`.  
This allowed comparing a timestamp with a partial date as a user would expect.
 - `time > "2014-06-10"`
 - `time > "2014"`

In 1.4.0 we instead tried to cast the string to a timestamp.  However, since a 
partial date is not a valid complete timestamp, the cast returns `null`, which 
causes the tuple to be filtered out.

This PR restores the earlier behavior.  Note that we still special-case 
equality so that these comparisons are not affected by the string form omitting 
zeros for subsecond precision.
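
Why equality is handled on the timestamp side can be sketched in plain Scala.
This is an illustration only, under stated assumptions: `java.sql.Timestamp`
stands in for `TimestampType`, and the object and method names are
hypothetical. Note that `java.sql.Timestamp.toString` always prints at least
one fractional digit, which is a different rendering choice from the one this
message describes, but the mismatch it produces is of the same kind.

```scala
import java.sql.Timestamp

object TimestampEquality {
  // String-side equality: depends on how the timestamp happens to be rendered.
  def equalViaString(time: Timestamp, literal: String): Boolean =
    time.toString == literal

  // Timestamp-side equality: parse the literal, then compare values, so the
  // number of fractional zeros written in the literal does not matter.
  // Assumes the literal is a complete yyyy-mm-dd hh:mm:ss[.f...] timestamp.
  def equalViaTimestamp(time: Timestamp, literal: String): Boolean =
    time == Timestamp.valueOf(literal)

  def main(args: Array[String]): Unit = {
    val time = Timestamp.valueOf("2014-01-01 00:00:00")
    println(equalViaString(time, "2014-01-01 00:00:00"))        // false
    println(equalViaTimestamp(time, "2014-01-01 00:00:00"))     // true
    println(equalViaTimestamp(time, "2014-01-01 00:00:00.000")) // true
  }
}
```

Comparing parsed values keeps equality independent of the printed form, while
ordering comparisons can still use the string form to support partial dates.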

Author: Michael Armbrust 

Closes #6888 from marmbrus/timeCompareString and squashes the following commits:

bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparison of timestamps/dates with strings


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a333a72e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a333a72e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a333a72e

Branch: refs/heads/master
Commit: a333a72e029d2546a66b36d6b3458e965430c530
Parents: 9814b97
Author: Michael Armbrust 
Authored: Fri Jun 19 16:54:51 2015 -0700
Committer: Yin Huai 
Committed: Fri Jun 19 16:54:51 2015 -0700

--
 .../catalyst/analysis/HiveTypeCoercion.scala| 17 --
 .../sql/catalyst/expressions/predicates.scala   |  9 
 .../apache/spark/sql/DataFrameDateSuite.scala   | 56 
 .../org/apache/spark/sql/SQLQuerySuite.scala|  4 ++
 .../scala/org/apache/spark/sql/TestData.scala   |  6 ---
 .../columnar/InMemoryColumnarQuerySuite.scala   |  7 ++-
 6 files changed, 88 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a333a72e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
index 8012b22..d4ab1fc 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -277,15 +277,26 @@ trait HiveTypeCoercion {
   case a @ BinaryArithmetic(left, right @ StringType()) =>
 a.makeCopy(Array(left, Cast(right, DoubleType)))
 
-  // we should cast all timestamp/date/string compare into string compare
+  // For equality between string and timestamp we cast the string to a timestamp
+  // so that things like rounding of subsecond precision does not affect the comparison.
+  case p @ Equality(left @ StringType(), right @ TimestampType()) =>
+p.makeCopy(Array(Cast(left, TimestampType), right))
+  case p @ Equality(left @ TimestampType(), right @ StringType()) =>
+p.makeCopy(Array(left, Cast(right, TimestampType)))
+
+  // We should cast all relative timestamp/date/string comparison into string comparisions
+  // This behaves as a user would expect because timestamp strings sort lexicographically.
+  // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
   case p @ BinaryComparison(left @ StringType(), right @ DateType()) =>
 p.makeCopy(Array(left, Cast(right, StringType)))
   case p @ BinaryComparison(left @ DateType(), right @ StringType()) =>
 p.makeCopy(Array(Cast(left, StringType), right))
   case p @ BinaryComparison(left @ StringType(), right @ TimestampType()) =>
-p.makeCopy(Array(Cast(left, TimestampType), right))
+p.makeCopy(Array(left, Cast(right, StringType)))
   case p @ BinaryComparison(left @ TimestampType(), right @ StringType()) =>
-p.makeCopy(Array(left, Cast(right, TimestampType)))
+p.makeCopy(Array(Cast(left, StringType), right))
+
+  // Comparisons between dates and timestamps.
   case p @ BinaryComparison(left @ TimestampType(), right @ DateType()) =>
 p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))
   case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>

http://git-wip-us.apache.org/repos/asf/spark/blob/a333a72e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
-