Github user uzadude commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23042#discussion_r234431689

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -138,6 +138,11 @@ object TypeCoercion {
         case (DateType, TimestampType) =>
           if (conf.compareDateTimestampInTimestamp) Some(TimestampType) else Some(StringType)

    +    // to support a popular use case of tables using Decimal(X, 0) for long IDs instead of strings
    +    // see SPARK-26070 for more details
    +    case (n: DecimalType, s: StringType) if n.scale == 0 => Some(DecimalType(n.precision, n.scale))
    --- End diff --

    I personally agree with @cloud-fan that there are a few type pairs that are "definitely safe", and since users are not always in control of their input tables, I believe convenience is more important than strict schema definitions here. Also, since even count() returns a bigint, you would otherwise have to write the filter as 'count(*) > 100L', which would be a huge regression. I believe the "definitely safe" list is very short and we should use it. @mgaido91, in your examples I do agree that Double to Decimal is not safe, and the same goes for String to almost anything. The trivially safe pairs are something like (Long, Int), (Int, Double), (Decimal, Decimal) - where the two can be widened to the same precision and scale - and maybe (Date, Timestamp).
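    To make the idea of a very short "definitely safe" list concrete, here is a rough sketch of how it could be expressed as a TypeCoercion-style pattern match. This is only my illustration of the proposal, not the actual Spark rule; the helper name findSafeCommonType and the exact decimal-widening formula are made up for the example:

        import org.apache.spark.sql.types._

        // Illustrative sketch only (hypothetical helper, not Spark's actual rule):
        // a short "definitely safe" promotion list for binary comparison.
        // Returns the common type both sides can be cast to without losing
        // information, or None if the pair is not on the safe list.
        def findSafeCommonType(left: DataType, right: DataType): Option[DataType] =
          (left, right) match {
            // integral widening is lossless
            case (IntegerType, LongType) | (LongType, IntegerType) => Some(LongType)
            // an Int fits exactly into a Double's 52-bit mantissa
            case (IntegerType, DoubleType) | (DoubleType, IntegerType) => Some(DoubleType)
            // two decimals can be widened to a decimal covering both precisions and scales
            case (d1: DecimalType, d2: DecimalType) =>
              val scale = math.max(d1.scale, d2.scale)
              val intDigits = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
              Some(DecimalType(math.min(intDigits + scale, DecimalType.MAX_PRECISION), scale))
            // a date can always be represented as a timestamp
            case (DateType, TimestampType) | (TimestampType, DateType) => Some(TimestampType)
            // everything else (Double vs Decimal, String vs numeric, ...) stays off the list
            case _ => None
          }

    With something like this, count(*) > 100 still works without the explicit 100L, while Double vs Decimal and String vs numeric comparisons would keep requiring an explicit cast.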