[ https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833957#comment-16833957 ]
Maxim Gekk edited comment on SPARK-27638 at 5/6/19 4:10 PM:
------------------------------------------------------------

It works with an explicit to_date:
{code:scala}
scala> val ds = spark.range(1).selectExpr("date '2000-01-01' as d")
ds: org.apache.spark.sql.DataFrame = [d: date]

scala> ds.where("d >= to_date('2000-1-1')").show
+----------+
|         d|
+----------+
|2000-01-01|
+----------+
{code}
but without to_date, it compares strings:
{code}
scala> ds.where("d >= '2000-1-1'").explain(true)
== Parsed Logical Plan ==
'Filter ('d >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
d: date
Filter (cast(d#51 as string) >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
LocalRelation <empty>, [d#51]

== Physical Plan ==
LocalTableScan <empty>, [d#51]
{code}
The same holds for '2000-01-01': the date column is cast to string.

> date format yyyy-M-dd string comparison not handled properly
> -------------------------------------------------------------
>
>                 Key: SPARK-27638
>                 URL: https://issues.apache.org/jira/browse/SPARK-27638
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.2
>
>             Reporter: peng bo
>             Priority: Major
>
> The example below works with both MySQL and Hive, but not with Spark:
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and
> string comparison, for partial date support. Please find more details in
> https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and
> MySQL is:
> Hive: casts to Date; partial dates are not supported.
> MySQL: casts to Date; certain "partial dates" are supported by defining
> specific date string parse rules. Check out {{str_to_datetime}} in
> https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are two proposals:
> a. Follow the MySQL parse rules, though some partial date string comparison
> cases still won't be supported.
> b. Cast the String value to Date; if the cast succeeds, compare using
> date.toString, otherwise use the original string.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
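Why the analyzed plan above folds to an empty relation: once the date column is cast to string, the comparison is plain lexicographic ordering, and "2000-01-01" sorts *before* "2000-1-1" because at the first differing position '0' < '1'. A minimal plain-Scala illustration (no Spark required):

```scala
// After Spark's implicit cast, the filter effectively compares these strings.
val castedDate = "2000-01-01" // the date column rendered as a string
val literal    = "2000-1-1"   // the user-supplied literal

// Indices 0..4 ("2000-") match; at index 5, '0' < '1',
// so "2000-01-01" < "2000-1-1" lexicographically.
val satisfiesFilter = castedDate >= literal

println(satisfiesFilter) // false: every row fails the predicate
```

This is why the optimizer can replace the scan with an empty LocalRelation: the predicate is false for every canonically formatted date string in that year/month range.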
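Proposal (b) could be sketched outside Spark as follows. This is a hypothetical helper (the object name, `normalize`, the lenient `yyyy-M-d` pattern, and `dateGte` are all illustrative assumptions, not Spark internals): try to parse the string literal as a date; if parsing succeeds, compare against the date's canonical string form, otherwise fall back to the original string.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical sketch of proposal (b). Not Spark's actual implementation.
object PartialDateCompare {
  // Lenient pattern accepting one- or two-digit month/day, e.g. "2000-1-1".
  private val lenient = DateTimeFormatter.ofPattern("yyyy-M-d")

  // If the literal parses as a date, return its canonical yyyy-MM-dd form;
  // otherwise return the original string unchanged.
  def normalize(s: String): String =
    Try(LocalDate.parse(s, lenient)).map(_.toString).getOrElse(s)

  // Compare a date column value (already in canonical yyyy-MM-dd form)
  // with a string literal, normalizing the literal first.
  def dateGte(dateCol: String, literal: String): Boolean =
    dateCol >= normalize(literal)
}
```

Under this sketch, `dateGte("2000-01-01", "2000-1-1")` holds because the literal normalizes to "2000-01-01", while a non-date literal is left alone and still compared as a raw string, matching the fallback described in proposal (b).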