Naresh P R created SPARK-54697:
----------------------------------

             Summary: Read/Write proleptic dates older than 1582-10-04 via 
Hive/Spark for interoperability
                 Key: SPARK-54697
                 URL: https://issues.apache.org/jira/browse/SPARK-54697
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.7
            Reporter: Naresh P R


e.g.,
{code:java}
create external table test_calendar (writerType string, inputDate date) stored as parquet;
INSERT INTO test.test_calendar values
  ('spark-corrected', CAST('0685-04-12' AS DATE)),
  ('spark-corrected', CAST('1582-10-04' AS DATE)); {code}
Hive writes a flag in the parquet file metadata ({*}writer.date.proleptic{*}) that 
tells Hive-Parquet readers whether the dates in the file use the hybrid 
(Julian + Gregorian) calendar or the proleptic Gregorian calendar. The Hive 
writer takes the value from *hive.parquet.date.proleptic.gregorian* and adds 
*writer.date.proleptic* = true/false to the parquet file metadata.

 

Setting *hive.parquet.date.proleptic.gregorian=true/false* while reading the 
files does not have any effect; the Hive parquet reader depends only on the 
*writer.date.proleptic* metadata stored in each individual file.
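
A minimal sketch of the per-file decision described above, written in Python purely for illustration. Only the key name *writer.date.proleptic* comes from this description; the helper name, the assumption that the value is stored as the string "true"/"false", and the fallback for files without the key are hypothetical and are not taken from the Hive or Spark code.

```python
# Key name as given in this issue; everything else here is illustrative.
PROLEPTIC_KEY = "writer.date.proleptic"


def file_uses_proleptic_dates(key_value_metadata, default_when_missing):
    """Decide per file whether its dates are proleptic Gregorian.

    key_value_metadata: dict of str -> str, standing in for the parquet
        footer key/value metadata of one file (assumed representation).
    default_when_missing: what to assume for files that carry no flag,
        e.g. files written by an engine that does not set it.
    """
    raw = key_value_metadata.get(PROLEPTIC_KEY)
    if raw is None:
        # No flag in the file: fall back to the caller's choice.
        return default_when_missing
    # Assumption: the flag is serialized as "true" or "false".
    return raw.strip().lower() == "true"
```

The point of the sketch is that the decision is taken from each file's own metadata, so a session-level setting is only consulted for files that lack the flag.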

 

It would be better if Spark complied with Hive's *writer.date.proleptic* 
metadata config, i.e., the Spark writer should add 
writer.date.proleptic=true/false to the parquet file metadata, and the Spark 
reader should consider the same metadata instead of relying on 
spark.sql.parquet.datetimeRebaseModeInRead / 
spark.sql.parquet.datetimeRebaseModeInWrite being set to LEGACY/CORRECTED. 
Alternatively, agree on some other common ground so that all readers know 
whether the dates are Julian or Gregorian.

 

Without this common ground, Hive-written files will show wrong results in Spark 
and Spark-written files will show wrong results in Hive.
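
To illustrate how far apart the two interpretations are for the sample dates, here is a small standalone Python sketch. It assumes dates are stored as a count of days since 1970-01-01 and that "hybrid" means the Julian calendar for dates before 1582-10-15 and Gregorian from then on. The function names are made up for this example; the Julian Day Number formula is the standard arithmetic one.

```python
from datetime import date

# Julian Day Number of 1970-01-01 (Gregorian), used as the epoch.
EPOCH_JDN = 2440588


def julian_calendar_jdn(year, month, day):
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def proleptic_days(year, month, day):
    """Days since epoch when the label is read as proleptic Gregorian."""
    # Python's datetime.date uses the proleptic Gregorian calendar.
    return date(year, month, day).toordinal() - date(1970, 1, 1).toordinal()


def hybrid_days(year, month, day):
    """Days since epoch when the label is read in the hybrid calendar."""
    if (year, month, day) < (1582, 10, 15):
        # Before the Gregorian cutover the label is a Julian-calendar date.
        return julian_calendar_jdn(year, month, day) - EPOCH_JDN
    return proleptic_days(year, month, day)


for label in [(685, 4, 12), (1582, 10, 4)]:
    shift = hybrid_days(*label) - proleptic_days(*label)
    print(label, "differs by", shift, "days")
```

For 0685-04-12 the two day counts differ by 3 days and for 1582-10-04 by 10 days, so a reader that assumes the wrong calendar for a file shifts every date before the cutover by that amount.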



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

