[jira] [Updated] (TAJO-30) Parquet Integration

Hyunsik Choi (JIRA) Sun, 23 Mar 2014 21:06:24 -0700

     [ https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hyunsik Choi updated TAJO-30:
-----------------------------

    Attachment: null_handling.patch

Hi David,

Sorry for the late response. The patch looks nice to me, but because of other 
commitments I'm still reviewing it.

While reviewing it, I found that your Parquet work handles NULL values 
differently, so I created a diff that can be applied on top of your work. 
The diff contains a unit test that demonstrates the difference in NULL 
handling. It would be great if you could include this patch in your work.
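
For reference, Parquet itself stores a NULL in an optional column by leaving 
the field out of the record: the writer never emits start/end events for 
it, and the column's definition level records that the value is absent. 
When reading, any field the converters never see stays NULL. The sketch 
below illustrates only that convention; it is not code from either patch, 
and FieldSink, RecordingSink and NullHandlingSketch are hypothetical names 
(FieldSink loosely mirrors the startField/endField calls of parquet-mr's 
RecordConsumer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the events a Parquet record consumer receives. */
interface FieldSink {
  void startMessage();
  void startField(String name, int index);
  void addValue(Object value);
  void endField(String name, int index);
  void endMessage();
}

/** Records every call so the emitted event stream can be inspected. */
class RecordingSink implements FieldSink {
  final List<String> events = new ArrayList<>();
  public void startMessage() { events.add("startMessage"); }
  public void startField(String name, int index) { events.add("startField " + name); }
  public void addValue(Object value) { events.add("value " + value); }
  public void endField(String name, int index) { events.add("endField " + name); }
  public void endMessage() { events.add("endMessage"); }
}

class NullHandlingSketch {
  /** Write side: a NULL is encoded by omitting its field entirely. */
  static void writeRow(String[] names, Object[] values, FieldSink sink) {
    if (names.length != values.length) {
      throw new IllegalArgumentException("names and values must have the same length");
    }
    sink.startMessage();
    for (int i = 0; i < values.length; i++) {
      if (values[i] == null) {
        continue; // no field events; the column writer records the value as absent
      }
      sink.startField(names[i], i);
      sink.addValue(values[i]);
      sink.endField(names[i], i);
    }
    sink.endMessage();
  }

  /** Read side: slots start as null and only fields that were present get filled. */
  static Object[] assembleRow(int width, int[] presentIndexes, Object[] presentValues) {
    Object[] row = new Object[width]; // Java arrays of Object default to null
    for (int i = 0; i < presentIndexes.length; i++) {
      row[presentIndexes[i]] = presentValues[i];
    }
    return row;
  }

  /** Convenience for comparing rows in tests. */
  static boolean sameRow(Object[] a, Object[] b) {
    return Arrays.equals(a, b);
  }
}
```

Writing the row (1, NULL) for columns (id, name) produces events only for 
id, and reading it back yields a row whose name slot is NULL rather than an 
empty string or zero.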

In addition, this patch comments out the unit tests for the Trevni format, 
because our Trevni implementation cannot handle NULL values correctly. That 
problem will be addressed in a separate issue, so you don't need to worry 
about Trevni.

Regards,
Hyunsik

> Parquet Integration
> -------------------
>
>                 Key: TAJO-30
>                 URL: https://issues.apache.org/jira/browse/TAJO-30
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>            Assignee: David Chen
>              Labels: Parquet
>         Attachments: TAJO-30.patch, null_handling.patch
>
>
> Parquet is a columnar storage format developed by Twitter and Cloudera. 
> Implement Parquet (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
>  * {{ParquetScanner}} and {{ParquetAppender}} - {{FileScanner}} and 
> {{FileAppender}} implementations for reading and writing Parquet files.
>  * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and 
> writer that deserialize Parquet records into Tajo Tuples and serialize 
> Tajo Tuples to Parquet.
>  * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform 
> conversion between Parquet and Tajo records.
>  * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's 
> internal representation.
>  * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to 
> materialize a Tajo Tuple.
>  * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.
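
As a rough illustration of the kind of mapping a schema converter such as 
TajoSchemaConverter performs, the sketch below turns a simplified list of 
Tajo columns into Parquet's textual message-type syntax. It is a 
hypothetical example, not the patch's implementation: the TajoType enum is 
a reduced stand-in for Tajo's data types, the message name "tajo_schema" is 
made up, and every column is declared optional so that NULLs can be stored.

```java
import java.util.Map;

class SchemaConversionSketch {
  /** Reduced, hypothetical subset of Tajo column types. */
  enum TajoType { BOOLEAN, INT2, INT4, INT8, FLOAT4, FLOAT8, TEXT, BLOB }

  /** Maps one Tajo type to a Parquet primitive type name. */
  static String toParquetType(TajoType type) {
    switch (type) {
      case BOOLEAN: return "boolean";
      case INT2:    // Parquet has no 16-bit physical type, so widen to int32
      case INT4:    return "int32";
      case INT8:    return "int64";
      case FLOAT4:  return "float";
      case FLOAT8:  return "double";
      case TEXT:    // stored as binary; annotated as UTF8 in toMessageType
      case BLOB:    return "binary";
      default: throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  /** Builds a Parquet message type in its text form, one optional field per column. */
  static String toMessageType(String messageName, Map<String, TajoType> columns) {
    StringBuilder sb = new StringBuilder("message ").append(messageName).append(" {\n");
    for (Map.Entry<String, TajoType> column : columns.entrySet()) {
      sb.append("  optional ") // optional: a missing value means NULL
        .append(toParquetType(column.getValue()))
        .append(' ')
        .append(column.getKey());
      if (column.getValue() == TajoType.TEXT) {
        sb.append(" (UTF8)"); // marks the binary column as a string
      }
      sb.append(";\n");
    }
    return sb.append("}").toString();
  }
}
```

Passing an insertion-ordered map (for example a LinkedHashMap) keeps the 
Parquet fields in the same order as the Tajo columns, which matters because 
fields are addressed by index as well as by name.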



--
This message was sent by Atlassian JIRA
(v6.2#6252)
