Csaba Ringhofer created IMPALA-7723:
---------------------------------------

             Summary: Recognize int64 timestamps in CREATE TABLE LIKE PARQUET
                 Key: IMPALA-7723
                 URL: https://issues.apache.org/jira/browse/IMPALA-7723
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Csaba Ringhofer


IMPALA-5050 adds support for reading int64-encoded Parquet timestamps. These 
columns have int64 physical type, so the converted/logical type has to be used 
to differentiate them from BIGINTs. They can be read either as BIGINTs or as 
TIMESTAMPs, depending on the table's schema.
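
As a rough illustration of that distinction, the sketch below picks a column 
type for an int64 column from its annotation. The helper name and the string 
constants are hypothetical and are not Impala code; TIMESTAMP_MILLIS and 
TIMESTAMP_MICROS are the Parquet converted types for int64 timestamps.

```python
# Hypothetical helper, not part of Impala: choose a SQL type for an int64
# Parquet column based on its converted type annotation.
TIMESTAMP_ANNOTATIONS = {"TIMESTAMP_MILLIS", "TIMESTAMP_MICROS"}


def infer_column_type(physical_type, converted_type=None):
    """Return "TIMESTAMP" for annotated int64 columns, "BIGINT" otherwise."""
    if physical_type != "INT64":
        raise ValueError("this sketch only handles INT64 columns")
    if converted_type in TIMESTAMP_ANNOTATIONS:
        return "TIMESTAMP"
    # Without a timestamp annotation the column is a plain 64-bit integer.
    return "BIGINT"
```

Nanosecond timestamps are deliberately left out of this sketch, since they are 
only expressible through the newer logical type annotation.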

CREATE TABLE LIKE PARQUET could also convert these columns to TIMESTAMP instead 
of BIGINT, but I decided to postpone adding this feature for two reasons:

1. It could break the following possible workflow:
- generate Parquet files (that contain int64 timestamps) with some tool
- use Impala's CREATE TABLE LIKE PARQUET + LOAD DATA to make them accessible as 
a table
- run some queries that rely on interpreting these columns as integers

Adding CAST(col AS BIGINT) to the query would make this even worse: once the 
column is a TIMESTAMP, the cast returns Unix time in seconds instead of the 
stored micros/millis value, without any warning.

2. Adding support for int64 timestamps with nanosecond precision will require 
bumping Impala's parquet-hadoop-bundle dependency to a new major version, 
which may contain incompatible API changes.

Note that parquet-hadoop-bundle is only used in CREATE TABLE LIKE PARQUET. The 
C++ parts of Impala only rely on parquet.thrift, which can be updated more 
easily.
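
To make the unit mismatch from reason 1 concrete, here is a small arithmetic 
sketch. The stored value is made up, and the function only models the cast as 
dropping the sub-second part of a non-negative microsecond timestamp; it is 
not Impala's implementation.

```python
MICROS_PER_SECOND = 1_000_000


def micros_to_unix_seconds(micros):
    """Model a timestamp-to-integer cast that yields whole Unix seconds."""
    return micros // MICROS_PER_SECOND


# Made-up int64 value as it would be stored in a TIMESTAMP_MICROS column.
stored = 1_539_700_000_123_456

# Column declared as BIGINT: the query sees the raw microsecond count.
as_integer = stored

# Column declared as TIMESTAMP and then cast back to an integer:
# the query sees seconds, six orders of magnitude smaller.
as_seconds = micros_to_unix_seconds(stored)

print(as_integer)  # 1539700000123456
print(as_seconds)  # 1539700000
```

A query that compares the column against a microsecond constant would keep 
running after such a schema change, but it would silently match different rows.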



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
