[ https://issues.apache.org/jira/browse/SPARK-35097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324643#comment-17324643 ]

Simeon Simeonov commented on SPARK-35097:
-----------------------------------------

[~maxgekk] thanks for creating this issue; it came from a problem we 
discovered. 

Reporting the column name alone in the exception message would not be 
sufficient for fast root cause analysis in many situations. Imagine a job that 
reads many different tables and a column name such as {{date}} that is common 
across them. To aid users, we need to narrow down the source of the problem by 
either (a) providing user stack trace information or, if that is for some 
reason impossible or very difficult, (b) providing information about the 
Parquet source with the issue (path, plan info, etc.).
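
To illustrate the pain: without column, path, or plan context in the exception, a user 
has to probe the datetime columns one at a time to find the culprit, roughly along 
these lines (a sketch only; the input path is hypothetical, and the exception may reach 
the driver wrapped in a SparkException when the read runs on executors):

{code:scala}
import org.apache.spark.SparkUpgradeException
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val df = spark.read.parquet("/data/events")  // hypothetical path

// Candidate columns: only date/timestamp columns can trigger the rebase check.
val suspects = df.schema.fields
  .filter(f => Set("date", "timestamp").contains(f.dataType.typeName))
  .map(_.name)

// Walk the cause chain, since a SparkUpgradeException thrown on an executor
// typically surfaces on the driver wrapped in a SparkException.
def causedByUpgrade(t: Throwable): Boolean =
  t != null && (t.isInstanceOf[SparkUpgradeException] || causedByUpgrade(t.getCause))

// Force a full read of each suspect column in isolation and note which ones fail.
val offending = suspects.filter { name =>
  try { df.select(name).foreach(_ => ()); false }
  catch { case t: Throwable => causedByUpgrade(t) }
}

println(s"Columns with ancient datetimes: ${offending.mkString(", ")}")
{code}

Having the column name (and ideally the source path) in the message would make this 
kind of bisecting unnecessary.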

[~angerszhuuu] can either (a) or (b) above be added to your PR?

> Add column name to SparkUpgradeException about ancient datetime
> ---------------------------------------------------------------
>
>                 Key: SPARK-35097
>                 URL: https://issues.apache.org/jira/browse/SPARK-35097
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Max Gekk
>            Priority: Major
>
> The error message:
> {code:java}
> org.apache.spark.SparkUpgradeException: You may get a different result due to 
> the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps 
> before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files 
> may be written by Spark 2.x or legacy versions of Hive, which uses a legacy 
> hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian 
> calendar. See more details in SPARK-31404. You can set 
> spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the 
> datetime values w.r.t. the calendar difference during reading. Or set 
> spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'CORRECTED' to read the 
> datetime values as it is.
> {code}
> doesn't give any clue as to which column causes the issue. We need to improve 
> the message and add the column name to it.
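
For reference, the workaround the quoted message points at is a single read-side 
setting; a minimal sketch of applying it (the input path is hypothetical, and whether 
LEGACY or CORRECTED is appropriate depends on which system wrote the files):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// LEGACY rebases ancient dates/timestamps from the hybrid Julian+Gregorian calendar
// used by Spark 2.x / legacy Hive; CORRECTED reads the stored values as-is.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

val df = spark.read.parquet("/data/with_ancient_dates")  // hypothetical path
df.show()
{code}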


