Hi @Stamatis Zampetakis, @David,

Our current implementation using DateTimeFormatter is not backward compatible, 
and this leads to migration issues.
One of our customers has this use case, and we don't have a better option for 
migrating them.

Hive 1.2/Spark 2.4 (Shared metastore):
Set VM time zone to Asia/Bangkok.
INSERT INTO parquet_table VALUES ("1400-01-01 00:00:00"); // Here, the Parquet 
writer converts the value to UTC (-07:00:00) and stores it.

Migrate to Hive 3.x/Spark 3.x (Shared metastore):
Set VM time zone to Asia/Bangkok.
SELECT ts FROM parquet_table; // Hive returns a different value, whereas Spark 
(spark.sql.legacy.timeParserPolicy=LEGACY) returns 1400-01-01 00:00:00

It is not easy to change thousands of Hive scripts to handle this difference, 
and doing so adds to the migration cost.
I think it is necessary to enable backward compatibility for a smooth migration. 
Please share your thoughts.

Thanks,
Sankar

From: Ashish Sharma <ashishkumarsharm...@gmail.com>
Sent: 29 September 2021 19:11
To: dev@hive.apache.org; u...@hive.apache.org
Cc: sank...@apache.org
Subject: [EXTERNAL] Raise exception instead of silent change for new 
DateTimeformatter


History

Hive 1.2 -

VM time zone set to Asia/Bangkok

Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
UTC','yyyy-MM-dd HH:mm:ss z'));

Result - 1800-01-01 07:00:00

Implementation details -

SimpleDateFormat formatter = new SimpleDateFormat(pattern);
Long unixtime = formatter.parse(textval).getTime() / 1000;
Date date = new Date(unixtime * 1000L);

https://docs.oracle.com/javase/8/docs/api/java/util/Date.html
 . The official documentation says "Unfortunately, the API for these functions 
was not amenable to internationalization" and "The corresponding methods in 
Date are deprecated". Because of that, this path produces the wrong result.

Latest Hive -

set hive.local.time.zone=Asia/Bangkok;

Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
UTC','yyyy-MM-dd HH:mm:ss z'));

Result - 1800-01-01 06:42:04

Implementation details -

DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern(pattern)
    .toFormatter();

ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval, dtformatter)
    .withZoneSameInstant(ZoneId.of(timezone));
Long dttime = zonedDateTime.toInstant().getEpochSecond();



Problem -

SimpleDateFormat has been replaced with DateTimeFormatter, which is not 
backward compatible. This causes issues when migrating to the new version, 
because older data written using Hive 1.x or 2.x is not compatible with 
DateTimeFormatter.
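
To make the mismatch concrete, here is a minimal standalone sketch (not Hive 
code) that pushes the same string through both APIs. The class and method names 
are made up for illustration, and Locale.US is an assumption. Both parsers 
should agree on the instant; going by the results reported above (07:00:00 vs 
06:42:04), the difference shows up when that instant is rendered in 
Asia/Bangkok, where the legacy java.util path and java.time seem to apply 
different offsets for dates this old.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; none of these names exist in Hive.
public class FormatterComparison {
    static final String IN_PATTERN = "yyyy-MM-dd HH:mm:ss z";
    static final String OUT_PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Legacy path: parse with SimpleDateFormat, as in the Hive 1.2 snippet.
    public static long legacyEpochSecond(String text) {
        try {
            SimpleDateFormat parser = new SimpleDateFormat(IN_PATTERN, Locale.US);
            return Math.floorDiv(parser.parse(text).getTime(), 1000L);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Legacy path: render an epoch second in the given zone via java.util.
    public static String legacyRender(long epochSecond, String zone) {
        SimpleDateFormat out = new SimpleDateFormat(OUT_PATTERN, Locale.US);
        out.setTimeZone(TimeZone.getTimeZone(zone));
        return out.format(new Date(epochSecond * 1000L));
    }

    // New path: parse with DateTimeFormatter, as in the latest-Hive snippet.
    public static long modernEpochSecond(String text) {
        DateTimeFormatter parser = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern(IN_PATTERN)
            .toFormatter(Locale.US);
        return ZonedDateTime.parse(text, parser).toInstant().getEpochSecond();
    }

    // New path: render an epoch second in the given zone via java.time.
    public static String modernRender(long epochSecond, String zone) {
        DateTimeFormatter out = DateTimeFormatter.ofPattern(OUT_PATTERN, Locale.US);
        return Instant.ofEpochSecond(epochSecond).atZone(ZoneId.of(zone)).format(out);
    }

    public static void main(String[] args) {
        String text = "1800-01-01 00:00:00 UTC";
        String zone = "Asia/Bangkok";
        long legacy = legacyEpochSecond(text);
        long modern = modernEpochSecond(text);
        System.out.println("legacy epoch: " + legacy + ", modern epoch: " + modern);
        System.out.println("legacy render: " + legacyRender(legacy, zone));
        System.out.println("modern render: " + modernRender(modern, zone));
    }
}
```

Running main prints both renderings side by side, so the output can be compared 
against the two results quoted above on whatever JDK is in use.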



Solution -

Introduce a config "hive.legacy.timeParserPolicy" with the following values -
1. EXCEPTION - compare the results of SimpleDateFormat and DateTimeFormatter, 
and raise an exception if they don't match
2. LEGACY - use SimpleDateFormat
3. CORRECTED - use DateTimeFormatter

This will help Hive users in the following ways -
1. Migrate to the new version using LEGACY
2. Find values that are not compatible with the new version using EXCEPTION
3. Use the latest date APIs with CORRECTED
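
A rough sketch of how the three policy values could behave is below. This is 
only an illustration of the proposal, not an existing Hive API: the enum, the 
class and the method names are hypothetical, and it assumes the input string 
carries no zone and is interpreted in a zone passed in by the caller (standing 
in for the VM or session time zone).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the proposed policy; not part of Hive.
public class TimeParserPolicySketch {

    // Mirrors the three values proposed for hive.legacy.timeParserPolicy.
    public enum Policy { EXCEPTION, LEGACY, CORRECTED }

    // Legacy behaviour: SimpleDateFormat in the given zone.
    public static long legacyEpochSecond(String text, String pattern, String zone) {
        try {
            SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
            f.setTimeZone(TimeZone.getTimeZone(zone));
            return Math.floorDiv(f.parse(text).getTime(), 1000L);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Corrected behaviour: DateTimeFormatter in the given zone.
    public static long correctedEpochSecond(String text, String pattern, String zone) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern(pattern, Locale.US);
        return LocalDateTime.parse(text, f).atZone(ZoneId.of(zone)).toEpochSecond();
    }

    // Dispatches on the policy; EXCEPTION runs both paths and compares them.
    public static long toEpochSecond(String text, String pattern, String zone, Policy policy) {
        switch (policy) {
            case LEGACY:
                return legacyEpochSecond(text, pattern, zone);
            case CORRECTED:
                return correctedEpochSecond(text, pattern, zone);
            default: // EXCEPTION
                long legacy = legacyEpochSecond(text, pattern, zone);
                long corrected = correctedEpochSecond(text, pattern, zone);
                if (legacy != corrected) {
                    throw new IllegalStateException("Parsers disagree for '" + text
                        + "': legacy=" + legacy + ", corrected=" + corrected);
                }
                return corrected;
        }
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd HH:mm:ss";
        // A recent timestamp: both parsers are expected to agree.
        System.out.println(toEpochSecond("2021-09-29 19:11:00", pattern, "Asia/Bangkok", Policy.EXCEPTION));
        // A very old timestamp: EXCEPTION is expected to flag the mismatch.
        try {
            toEpochSecond("1400-01-01 00:00:00", pattern, "UTC", Policy.EXCEPTION);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something like this, a migration could start on LEGACY, run suspect 
workloads under EXCEPTION to find the affected values, and then move to 
CORRECTED.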

Note: Apache Spark also faces the same issue - 
https://issues.apache.org/jira/browse/SPARK-30668



Hive JIRA - 
https://issues.apache.org/jira/browse/HIVE-25576



Thanks

Ashish Sharma
