[jira] [Commented] (HIVE-8128) Improve Parquet Vectorization

Dong Chen (JIRA) Mon, 24 Nov 2014 00:19:33 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222774#comment-14222774
 ]


Dong Chen commented on HIVE-8128:
---------------------------------

To improve Parquet vectorization, I think we need the following changes, which 
should be based on PARQUET-131. These are initial thoughts, and I will make 
them more specific after working on the Parquet side for a while.

The changes below assume that the RecordReader in Hive will receive data of 
type {{ParquetVectorizedRowBatch}}.

1. The next() method of {{VectorizedParquetRecordReader}} should be 
{{next(NullWritable key, ParquetVectorizedRowBatch outputBatch)}}. This will 
let Hive read one vectorized batch of Parquet rows at a time.

2. A {{VectorizedParquetHiveSerDe}} will be added to convert 
{{ParquetVectorizedRowBatch}} to the {{VectorizedRowBatch}} that Hive 
recognizes. To make this conversion efficient, the Parquet vectorized API 
design might take it into account: the more similar the two kinds of row batch 
are, the better.

3. Partition support is already in trunk. Whether it works for Parquet should 
be verified after the main work is done, with changes made if necessary.
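
As a rough illustration of items 1 and 2, here is a minimal, self-contained 
Java sketch. It is not the Hive or Parquet API: every class name in it 
({{NullKey}}, {{ParquetBatchSketch}}, {{HiveBatchSketch}}, 
{{VectorizedReaderSketch}}, {{BatchConverterSketch}}) is a hypothetical 
stand-in, and the single long column is an assumption made only to keep the 
example short. It shows the {{next(key, outputBatch)}} contract filling one 
batch per call, and why a matching batch layout would make the conversion 
cheap.

```java
import java.util.Arrays;

/** Hypothetical stand-in for Hadoop's NullWritable key. */
final class NullKey {
    static final NullKey INSTANCE = new NullKey();
    private NullKey() {}
}

/** Hypothetical Parquet-side batch: one long column, null flags, row count. */
final class ParquetBatchSketch {
    final long[] values;
    final boolean[] isNull;
    int size;

    ParquetBatchSketch(int capacity) {
        values = new long[capacity];
        isNull = new boolean[capacity];
    }
}

/** Hypothetical Hive-side batch, deliberately laid out the same way. */
final class HiveBatchSketch {
    final long[] vector;
    final boolean[] isNull;
    int size;

    HiveBatchSketch(int capacity) {
        vector = new long[capacity];
        isNull = new boolean[capacity];
    }
}

/** Reader following the next(key, outputBatch) shape proposed in item 1. */
final class VectorizedReaderSketch {
    private final long[] source; // stands in for decoded Parquet column data
    private int position;

    VectorizedReaderSketch(long[] source) {
        this.source = source;
    }

    /** Fills outputBatch with up to its capacity; returns false at end of data. */
    boolean next(NullKey key, ParquetBatchSketch outputBatch) {
        int n = Math.min(outputBatch.values.length, source.length - position);
        if (n <= 0) {
            outputBatch.size = 0;
            return false;
        }
        System.arraycopy(source, position, outputBatch.values, 0, n);
        Arrays.fill(outputBatch.isNull, 0, n, false); // sketch data has no nulls
        outputBatch.size = n;
        position += n;
        return true;
    }
}

/** Conversion step from item 2: with matching layouts it is a bulk copy. */
final class BatchConverterSketch {
    static void convert(ParquetBatchSketch in, HiveBatchSketch out) {
        if (in.size > out.vector.length) {
            throw new IllegalArgumentException("output batch too small");
        }
        System.arraycopy(in.values, 0, out.vector, 0, in.size);
        System.arraycopy(in.isNull, 0, out.isNull, 0, in.size);
        out.size = in.size;
    }
}
```

A caller would loop on {{next()}} and convert each batch before handing it to 
the vectorized operators. If the two batch types differed in layout, the 
converter would need a per-row loop instead of the array copies shown here, 
which is the cost item 2 is trying to avoid.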

> Improve Parquet Vectorization
> -----------------------------
>
>                 Key: HIVE-8128
>                 URL: https://issues.apache.org/jira/browse/HIVE-8128
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Dong Chen
>
> What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde, 
> VectorizedOrcSerde) which was partially done in HIVE-5998.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
