Stamatis Zampetakis created HIVE-26658:
------------------------------------------
Summary: INT64 Parquet timestamps cannot be mapped to most Hive numeric types
Key: HIVE-26658
URL: https://issues.apache.org/jira/browse/HIVE-26658
Project: Hive
Issue Type: Bug
Components: Parquet, Serializers/Deserializers
Affects Versions: 4.0.0-alpha-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Reading a Parquet file that has a column of primitive type INT64 with logical type
[TIMESTAMP|https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/LogicalTypes.md?plain=1#L337]
raises an error if the corresponding Hive column type is anything other than TIMESTAMP or BIGINT.
Consider a Parquet file (e.g., ts_file.parquet) whose column is described by the following Avro schema:
{code:json}
{
  "name": "eventtime",
  "type": ["null", {
    "type": "long",
    "logicalType": "timestamp-millis"
  }],
  "default": null
}
{code}
Mapping the column to any of the Hive numeric types TINYINT, SMALLINT, INT, FLOAT,
DOUBLE, or DECIMAL and then running a SELECT query fails with an error.
The following snippet can be used to reproduce the problem.
{code:sql}
CREATE TABLE ts_table (eventtime INT) STORED AS PARQUET;
LOAD DATA LOCAL INPATH 'ts_file.parquet' INTO TABLE ts_table;
SELECT * FROM ts_table;
{code}
This is a regression caused by HIVE-21215. Although HIVE-21215 made it possible to read
INT64 timestamp columns as Hive TIMESTAMP, which was not supported before, it also
broke the mapping to every other Hive numeric type. The problem was recently fixed
for the BIGINT type only (HIVE-26612).
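Until the other mappings are restored, one possible workaround is to declare the column as BIGINT and do the conversion inside Hive. This is an unverified sketch that assumes a Hive build that already includes HIVE-26612. The table and view names (ts_table_raw, ts_view) are hypothetical.

{code:sql}
-- Hypothetical names; assumes HIVE-26612 is present so the INT64 timestamp column can be read as BIGINT
CREATE TABLE ts_table_raw (eventtime BIGINT) STORED AS PARQUET;
LOAD DATA LOCAL INPATH 'ts_file.parquet' INTO TABLE ts_table_raw;
-- Convert in a view instead of declaring DOUBLE on the Parquet-backed table itself
CREATE VIEW ts_view AS
SELECT CAST(eventtime AS DOUBLE) AS eventtime FROM ts_table_raw;
SELECT * FROM ts_view;
{code}

The sketch casts to DOUBLE rather than INT because millisecond values exceed the INT maximum (2147483647 ms, roughly 24.8 days) for any timestamp after late January 1970. This only sidesteps the error and does not restore the pre-HIVE-21215 behavior.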
The primary goal of this ticket is to restore backward compatibility, since
these use cases worked before HIVE-21215.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)