[ https://issues.apache.org/jira/browse/IMPALA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer closed IMPALA-7723.
-----------------------------------
    Resolution: Invalid

> Recognize int64 timestamps in CREATE TABLE LIKE PARQUET
> -------------------------------------------------------
>
>                 Key: IMPALA-7723
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7723
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Minor
>              Labels: parquet
>
> IMPALA-5050 adds support for reading int64-encoded Parquet timestamps. These 
> columns have the int64 physical type, and converted/logical types have to be 
> used to differentiate them from BIGINTs. These columns can be read as either 
> BIGINTs or TIMESTAMPs, depending on the table's schema.
> CREATE TABLE LIKE PARQUET could also convert these columns to TIMESTAMP 
> instead of BIGINT, but I decided to postpone adding this feature for two 
> reasons:
> 1. It could break the following possible workflow:
> - generate Parquet files (that contain int64 timestamps) with some tool
> - use Impala's CREATE TABLE LIKE PARQUET + LOAD DATA to make it accessible as 
> a table
> - run some queries that rely on interpreting these columns as integers
> A CAST(col AS BIGINT) in the query would make this even worse, as it would 
> convert the timestamp to Unix time in seconds instead of micros/millis 
> without any warning.
> 2. Adding support for int64 timestamps with nanosecond precision will need 
> Impala's parquet-hadoop-bundle dependency to be bumped to a new major 
> version, which may contain incompatible API changes.
> Note that parquet-hadoop-bundle is only used in CREATE TABLE LIKE PARQUET. 
> The C++ parts of Impala only rely on parquet.thrift, which can be updated 
> more easily.
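
The unit mismatch described in reason 1 can be illustrated with a small Python sketch. It is not Impala code: the sample value and the helper names (as_bigint, as_timestamp, timestamp_cast_to_bigint) are hypothetical, and the cast helper only models the behaviour stated above, namely that casting a TIMESTAMP to BIGINT yields Unix time in whole seconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical raw value from an int64 Parquet column annotated as
# microseconds since the Unix epoch.
raw_micros = 1_500_000_000_123_456


def as_bigint(raw: int) -> int:
    """Column declared BIGINT: the stored integer comes back unchanged."""
    return raw


def as_timestamp(raw: int) -> datetime:
    """Column declared TIMESTAMP: the integer is decoded as micros since epoch."""
    return EPOCH + timedelta(microseconds=raw)


def timestamp_cast_to_bigint(ts: datetime) -> int:
    """Model of CAST(ts AS BIGINT) as described above: whole Unix seconds."""
    return (ts - EPOCH) // timedelta(seconds=1)


# Same stored bytes, two schemas, two different integers in the query.
print(as_bigint(raw_micros))
print(timestamp_cast_to_bigint(as_timestamp(raw_micros)))
```

A query written against the BIGINT schema would see the microsecond count, while the same query with a cast against the TIMESTAMP schema would see a value a million times smaller, with the sub-second part dropped.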



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
