Github user tanejagagan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17174#discussion_r105333124

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -324,14 +324,22 @@ object TypeCoercion {
           // We should cast all relative timestamp/date/string comparison into string comparisons
           // This behaves as a user would expect because timestamp strings sort lexicographically.
           // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
    -      case p @ BinaryComparison(left @ StringType(), right @ DateType()) =>
    -        p.makeCopy(Array(left, Cast(right, StringType)))
    -      case p @ BinaryComparison(left @ DateType(), right @ StringType()) =>
    -        p.makeCopy(Array(Cast(left, StringType), right))
    -      case p @ BinaryComparison(left @ StringType(), right @ TimestampType()) =>
    -        p.makeCopy(Array(left, Cast(right, StringType)))
    -      case p @ BinaryComparison(left @ TimestampType(), right @ StringType()) =>
    -        p.makeCopy(Array(Cast(left, StringType), right))
    +      // If StringType is foldable then we need to cast String to Date or Timestamp type
    +      // which would give order of magnitude performance gain as well as preserve the behavior
    +      // achieved by expressed above
    +      // TimeStamp(2013-01-01 00:00 ...) < Cast( "2014" as timestamp) = true
    +      case p @ BinaryComparison(left @ StringType(), right) if dateOrTimestampType(right) =>
    +        if (left.foldable) {
    +          p.makeCopy(Array(Cast(left, right.dataType), right))
    --- End diff --

    Yes. You can explicitly cast the string to a timestamp, and the query will then run much faster.
By default, without the cast, the query runs fine but silently picks a very bad plan, with no indication to the user whatsoever, and is about an order of magnitude slower. Other comparisons, such as `time < 'abc'`, will also run just fine; I think these should fail fast and let the user know about the casting problem. Another problem is with BI tools that generate these SQL queries, where the user does not have direct control over the SQL. We came across this issue when the same query ran 10 times faster in Impala than in Spark; investigating that difference led to this bug and therefore to this fix.
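
To illustrate the two points above outside of Spark, here is a minimal sketch in plain Scala. It is not the Catalyst rule itself: `ComparisonSketch`, `stringCompare`, and `typedCompare` are hypothetical names, `LocalDateTime` stands in for a timestamp column, and the fixed `yyyy-MM-dd HH:mm:ss` format is an assumption made only for this example.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object ComparisonSketch {
  // Assumed textual form of a timestamp, for this sketch only.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Shape of the old rule: format every row's timestamp as a string,
  // then compare lexicographically against the string literal.
  def stringCompare(rows: Seq[LocalDateTime], literal: String): Seq[LocalDateTime] =
    rows.filter(ts => ts.format(fmt) < literal)

  // Shape of the proposed rule: convert the constant literal once,
  // then compare each row in its native type.
  def typedCompare(rows: Seq[LocalDateTime], literal: String): Seq[LocalDateTime] = {
    // In this sketch a malformed literal such as "abc" throws here,
    // before any row is examined.
    val bound = LocalDateTime.parse(literal, fmt)
    rows.filter(_.isBefore(bound))
  }
}
```

In this sketch both functions agree on a well-formed literal, but `stringCompare(rows, "abc")` quietly keeps every row (digits sort before letters), while `typedCompare(rows, "abc")` fails immediately. The string path also pays a formatting cost per row, whereas the typed path converts the literal once.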