Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18853#discussion_r140799432
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
    @@ -115,21 +115,46 @@ object TypeCoercion {
        * is a String and the other is not. It also handles when one op is a 
Date and the
        * other is a Timestamp by making the target type to be String.
        */
    -  val findCommonTypeForBinaryComparison: (DataType, DataType) => 
Option[DataType] = {
    -    // We should cast all relative timestamp/date/string comparison into 
string comparisons
    -    // This behaves as a user would expect because timestamp strings sort 
lexicographically.
    -    // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
    -    case (StringType, DateType) => Some(StringType)
    -    case (DateType, StringType) => Some(StringType)
    -    case (StringType, TimestampType) => Some(StringType)
    -    case (TimestampType, StringType) => Some(StringType)
    -    case (TimestampType, DateType) => Some(StringType)
    -    case (DateType, TimestampType) => Some(StringType)
    -    case (StringType, NullType) => Some(StringType)
    -    case (NullType, StringType) => Some(StringType)
    -    case (l: StringType, r: AtomicType) if r != StringType => Some(r)
    -    case (l: AtomicType, r: StringType) if (l != StringType) => Some(l)
    -    case (l, r) => None
    +  private def findCommonTypeForBinaryComparison(
    +      plan: LogicalPlan,
    +      l: DataType,
    +      r: DataType): Option[DataType] =
    +    if (!plan.conf.isHiveTypeCoercionMode) {
    +      (l, r) match {
    +        // We should cast all relative timestamp/date/string comparison 
into string comparisons
    +        // This behaves as a user would expect because timestamp strings 
sort lexicographically.
    +        // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
    --- End diff --
    
    I think this comment should be updated given the latest investigation, 
explaining both the original motivation, and how it is flawed.  Eg.
    
    "Originally spark cast all relative timestamp/date/string comparison into 
string comparisons.  The motivation was that this would lead to natural 
comparisons on simple string inputs for times, eg. TimeStamp(2013-01-01 00:00 
...) < "2014" = true.  However, this leads to other logical inconsistencies, 
eg. TimeStamp(2013-01-01 00:00 ...) < "5" = true.  Also, equals is not 
consistent with other binary comparisions.  Futhermore, comparing to a string 
that does not look like a time at all will still compare, just with an 
unexpected result."


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to