[jira] [Commented] (HIVE-18323) Vectorization: add the support of timestamp in VectorizedPrimitiveColumnReader for parquet

Vihang Karajgaonkar (JIRA) Tue, 16 Jan 2018 23:27:25 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328390#comment-16328390
 ]


Vihang Karajgaonkar commented on HIVE-18323:
--------------------------------------------

I was porting the patch to branch-2 when I realized that branch-2 still uses 
Java 7, so the following code will not compile there:
{code}
case INT64:
          long seconds = 0;
          long nanoSeconds = 0;
          switch (type.getOriginalType()) {
          case TIMESTAMP_MILLIS:
            long miliSeconds = dataColumn.readLong();
            seconds = miliSeconds / TEN_TO_POW_3;
            nanoSeconds = (miliSeconds - seconds * TEN_TO_POW_3) * TEN_TO_POW_6;
            break;
          default:
            throw new IOException(
                "Unsupported parquet logical type: " + type.getOriginalType() + " for timestamp");
          }
          c.set(rowId, Timestamp.from(Instant.ofEpochSecond(seconds, nanoSeconds)));
{code}

The problem is that {{Timestamp.from(Instant.ofEpochSecond(seconds, nanoSeconds))}} 
uses the {{Instant}} class, which is only available from Java 8. Also, I noticed 
that in {{DataWritableWriter}} the timestampWritable object is written as INT96. 
Do we even support writing timestamps as INT64 in Parquet in Hive? Any idea, 
[~Ferd] [~aihuaxu] [~spena]?
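
For branch-2, one possible Java 7 compatible replacement is sketched below. It builds the {{java.sql.Timestamp}} directly from the seconds and nanoseconds instead of going through {{Instant}}. The class name {{TimestampCompat}} and the method {{toTimestamp}} are hypothetical and not part of the patch; this is only an illustration of the idea, not the change that was committed.

```java
import java.sql.Timestamp;

// Hypothetical helper (not from the patch): a Java 7 stand-in for
// Timestamp.from(Instant.ofEpochSecond(seconds, nanoAdjustment)).
class TimestampCompat {
  private static final long NANOS_PER_SECOND = 1000000000L;

  static Timestamp toTimestamp(long seconds, long nanoAdjustment) {
    // Timestamp.setNanos only accepts values in [0, 999999999], so fold any
    // overflow or negative adjustment into the seconds part first.
    long extraSeconds = nanoAdjustment / NANOS_PER_SECOND;
    long nanos = nanoAdjustment % NANOS_PER_SECOND;
    if (nanos < 0) {
      nanos += NANOS_PER_SECOND;
      extraSeconds -= 1;
    }
    // The constructor takes epoch milliseconds; whole seconds leave nanos at 0.
    Timestamp ts = new Timestamp((seconds + extraSeconds) * 1000L);
    ts.setNanos((int) nanos);
    return ts;
  }
}
```

With such a helper the last line of the snippet could become {{c.set(rowId, TimestampCompat.toTimestamp(seconds, nanoSeconds));}}. For the TIMESTAMP_MILLIS case alone, {{new Timestamp(miliSeconds)}} may already be enough, since that constructor takes epoch milliseconds.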

> Vectorization: add the support of timestamp in 
> VectorizedPrimitiveColumnReader for parquet
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18323
>                 URL: https://issues.apache.org/jira/browse/HIVE-18323
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Vectorization
>    Affects Versions: 3.0.0
>            Reporter: Aihua Xu
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-18323.02.patch, HIVE-18323.03.patch, 
> HIVE-18323.04.patch, HIVE-18323.05.patch, HIVE-18323.06.patch, 
> HIVE-18323.07.patch, HIVE-18323.1.patch
>
>
> {noformat}
> CREATE TABLE `t1`(
>   `ts` timestamp,
>   `s1` string)
> STORED AS PARQUET;
> set hive.vectorized.execution.enabled=true;
> SELECT * from t1 SORT BY s1;
> {noformat}
> This query throws an exception because timestamp is not supported here yet.
> {noformat}
> Caused by: java.io.IOException: java.io.IOException: Unsupported type: optional int96 ts
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
