Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/18853#discussion_r140799432 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -115,21 +115,46 @@ object TypeCoercion { * is a String and the other is not. It also handles when one op is a Date and the * other is a Timestamp by making the target type to be String. */ - val findCommonTypeForBinaryComparison: (DataType, DataType) => Option[DataType] = { - // We should cast all relative timestamp/date/string comparison into string comparisons - // This behaves as a user would expect because timestamp strings sort lexicographically. - // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true - case (StringType, DateType) => Some(StringType) - case (DateType, StringType) => Some(StringType) - case (StringType, TimestampType) => Some(StringType) - case (TimestampType, StringType) => Some(StringType) - case (TimestampType, DateType) => Some(StringType) - case (DateType, TimestampType) => Some(StringType) - case (StringType, NullType) => Some(StringType) - case (NullType, StringType) => Some(StringType) - case (l: StringType, r: AtomicType) if r != StringType => Some(r) - case (l: AtomicType, r: StringType) if (l != StringType) => Some(l) - case (l, r) => None + private def findCommonTypeForBinaryComparison( + plan: LogicalPlan, + l: DataType, + r: DataType): Option[DataType] = + if (!plan.conf.isHiveTypeCoercionMode) { + (l, r) match { + // We should cast all relative timestamp/date/string comparison into string comparisons + // This behaves as a user would expect because timestamp strings sort lexicographically. + // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true --- End diff -- I think this comment should be updated given the latest investigation, explaining both the original motivation, and how it is flawed. Eg. "Originally spark cast all relative timestamp/date/string comparison into string comparisons. The motivation was that this would lead to natural comparisons on simple string inputs for times, eg. TimeStamp(2013-01-01 00:00 ...) < "2014" = true. However, this leads to other logical inconsistencies, eg. TimeStamp(2013-01-01 00:00 ...) < "5" = true. Also, equals is not consistent with other binary comparisions. Futhermore, comparing to a string that does not look like a time at all will still compare, just with an unexpected result."
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org