Stamatis Zampetakis created HIVE-27199:
------------------------------------------
Summary: Read TIMESTAMP WITH LOCAL TIME ZONE columns from text
files using custom formats
Key: HIVE-27199
URL: https://issues.apache.org/jira/browse/HIVE-27199
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Timestamp values come in many flavors and formats, and no single
representation can satisfy everyone, especially when such values are stored
in plain text/CSV files.
HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that
lets users provide custom timestamp patterns so that TIMESTAMP values
coming from files are parsed correctly.
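For reference, the existing TIMESTAMP support from HIVE-9298 is typically
enabled by setting the property on the table. A minimal sketch, in which the
table name {{ts_plain}} is hypothetical and the pattern is chosen to match an
ISO-style value such as {{2016-05-03T12:26:34}}:
{code:sql}
-- Hypothetical table with a plain TIMESTAMP column stored as text.
CREATE TABLE ts_plain (ts TIMESTAMP);
-- Register an additional pattern; several patterns can be given as a comma-separated list.
ALTER TABLE ts_plain SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");
{code}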
However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ), it is not
possible to use a custom pattern. As a result, when a value does not match the
format expected by the built-in Hive parser, a NULL value is returned.
Consider a text file, F1, with the following values:
{noformat}
2016-05-03 12:26:34
2016-05-03T12:26:34
{noformat}
and a table with a column declared as LTZ:
{code:sql}
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE);
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;
SELECT * FROM ts_table;
-- Result:
-- 2016-05-03 12:26:34.0 US/Pacific
-- NULL
{code}
To give more flexibility to users relying on the TIMESTAMP WITH
LOCAL TIME ZONE datatype, and to align its behavior with the TIMESTAMP type,
this JIRA aims to reuse the {{timestamp.formats}} property for both TIMESTAMP
types.
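As a sketch of the intended usage, assuming the property ends up being set for
LTZ columns exactly as it is for TIMESTAMP columns (the final behavior depends
on the patch), the table from the example above could declare the ISO-style
pattern so that the second line of F1 is no longer read as NULL:
{code:sql}
-- Assumed usage once timestamp.formats is honored for LTZ columns.
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;
SELECT * FROM ts_table;
{code}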
The work here focuses exclusively on simple text files, but the same could be
done for other SerDes, such as JSON.