maropu commented on a change in pull request #28673:
URL: https://github.com/apache/spark/pull/28673#discussion_r432804934



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##########
@@ -118,6 +118,31 @@ trait DateTimeFormatterHelper {
         s"before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.", e)
   }
 
+  // When legacy time parser policy set to EXCEPTION, check whether we will get different results
+  // between legacy formatter and new formatter. If new formatter fails but legacy formatter works,
+  // throw a SparkUpgradeException. On the contrary, if the legacy policy set to CORRECTED,
+  // DateTimeParseException will address by the caller side.
+  protected def checkDiffFormatResult[T <: Date](
+      d: T,
+      legacyFormatFunc: T => String): PartialFunction[Throwable, String] = {
+    case e if needConvertToSparkUpgradeException(e) =>
+      val resultCandidate = try {
+      legacyFormatFunc(d)
+    } catch {
+      case _: Throwable => throw e
+    }
+    throw new SparkUpgradeException("3.0", s"Fail to format it to '$resultCandidate' in the new" +
+      s" formatter. You can set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore" +
+      s" the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid" +
+      s" datetime string.", e)
+  }
+
+  private def needConvertToSparkUpgradeException(e: Throwable): Boolean = e match {
+    case _: DateTimeException | _: ArrayIndexOutOfBoundsException
+      if SQLConf.get.legacyTimeParserPolicy == EXCEPTION => true

Review comment:
       Could you add a comment explaining under what conditions
`ArrayIndexOutOfBoundsException` is thrown?
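
       For readers following the thread outside the Spark code base, here is a minimal, self-contained sketch of the handler pattern under review: a `PartialFunction[Throwable, String]` used directly as a `catch` handler. `UpgradeException`, `FormatFallbackSketch`, `needsConversion`, `checkDiff`, and the two formatter functions are hypothetical stand-ins, not Spark APIs, and the `SQLConf` policy check is left out.

```scala
import java.time.DateTimeException

// Hypothetical stand-in for Spark's SparkUpgradeException; not the real class.
final class UpgradeException(message: String, cause: Throwable)
  extends RuntimeException(message, cause)

object FormatFallbackSketch {
  // Mirrors the predicate in the diff, minus the legacy time parser policy check.
  private def needsConversion(e: Throwable): Boolean = e match {
    case _: DateTimeException | _: ArrayIndexOutOfBoundsException => true
    case _ => false
  }

  // Intended use: `try newFormat(d) catch checkDiff(d, legacyFormat)`.
  def checkDiff[T](d: T, legacyFormat: T => String): PartialFunction[Throwable, String] = {
    case e if needsConversion(e) =>
      // If the legacy formatter fails as well, rethrow the original error unchanged.
      val candidate = try legacyFormat(d) catch { case _: Throwable => throw e }
      // Only the new formatter failed, so report the behavior difference.
      throw new UpgradeException(
        s"Fail to format it to '$candidate' in the new formatter.", e)
  }

  def main(args: Array[String]): Unit = {
    // Simulated formatters: the "new" one always fails, the "legacy" one succeeds.
    val failingNewFormat: Int => String =
      _ => throw new ArrayIndexOutOfBoundsException("simulated failure")
    val legacyFormat: Int => String = i => f"$i%011d"

    try {
      try failingNewFormat(1) catch checkDiff(1, legacyFormat)
    } catch {
      case e: UpgradeException => println(e.getMessage)
    }
  }
}
```

       Any exception that `needsConversion` does not match falls through the partial function and propagates as usual, which is why the predicate decides which failures get converted.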

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##########
@@ -105,9 +105,9 @@ trait DateTimeFormatterHelper {
   // between legacy parser and new parser. If new parser fails but legacy parser works, throw a
   // SparkUpgradeException. On the contrary, if the legacy policy set to CORRECTED,
   // DateTimeParseException will address by the caller side.
-  protected def checkDiffResult[T](
+  protected def checkDiffParserResult[T](
       s: String, legacyParseFunc: String => T): PartialFunction[Throwable, T] = {
-    case e: DateTimeException if SQLConf.get.legacyTimeParserPolicy == EXCEPTION =>
+    case e if needConvertToSparkUpgradeException(e) =>

Review comment:
       Can the parser path throw `ArrayIndexOutOfBoundsException`, too?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
##########
@@ -118,6 +118,31 @@ trait DateTimeFormatterHelper {
         s"before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.", e)
   }
 
+  // When legacy time parser policy set to EXCEPTION, check whether we will get different results
+  // between legacy formatter and new formatter. If new formatter fails but legacy formatter works,
+  // throw a SparkUpgradeException. On the contrary, if the legacy policy set to CORRECTED,
+  // DateTimeParseException will address by the caller side.
+  protected def checkDiffFormatResult[T <: Date](
+      d: T,
+      legacyFormatFunc: T => String): PartialFunction[Throwable, String] = {
+    case e if needConvertToSparkUpgradeException(e) =>
+      val resultCandidate = try {
+      legacyFormatFunc(d)
+    } catch {
+      case _: Throwable => throw e
+    }
+    throw new SparkUpgradeException("3.0", s"Fail to format it to '$resultCandidate' in the new" +
+      s" formatter. You can set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore" +
+      s" the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid" +
+      s" datetime string.", e)

Review comment:
       nit: remove the unnecessary `s` prefixes from the string literals that do not interpolate anything.

##########
File path: sql/core/src/test/resources/sql-tests/inputs/datetime.sql
##########
@@ -160,3 +160,7 @@ select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampF
 select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
 select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
 select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
+
+-- fix silent data change in date time formatters
+select from_unixtime(1, 'yyyyyyyyyyy-MM-dd');
+select date_format(date '2018-11-17', 'yyyyyyyyyyy-MM-dd');

Review comment:
       How about adding a test for year patterns with 5 to 10 `y`s, i.e. the case `4 < len(|y....y|) < 11`?




