[
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyunsik Choi updated TAJO-30:
-----------------------------
Attachment: null_handling.patch
Hi David,
I'm sorry for the late response. The patch looks good to me. Because of other
commitments, I'm still reviewing it.
While reviewing, I found that your Parquet work handles NULL values in a
different way, so I created a diff that can be applied on top of your work.
The diff contains a unit test that shows the difference in NULL handling. I
think it would be great if it were included in your work.
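For illustration only, here is a minimal, self-contained sketch of the kind of
round-trip check such a test performs. It is not the code in
null_handling.patch, and it does not use the Tajo or Parquet APIs; the class
NullRoundTrip, its Column type, and the write/read helpers are hypothetical
names. It assumes a flat optional column stored the way Parquet stores optional
fields: one definition level per row (0 for NULL, 1 for a present value) plus
the non-null values only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullRoundTrip {

    /** Hypothetical in-memory column: one definition level per row,
     *  and a dense list holding only the non-null values. */
    static final class Column {
        final List<Integer> definitionLevels = new ArrayList<>();
        final List<Integer> values = new ArrayList<>();
    }

    /** Encodes rows; a NULL row records level 0 and stores no value. */
    static Column write(List<Integer> rows) {
        Column column = new Column();
        for (Integer row : rows) {
            if (row == null) {
                column.definitionLevels.add(0);
            } else {
                column.definitionLevels.add(1);
                column.values.add(row);
            }
        }
        return column;
    }

    /** Decodes rows; level 0 yields NULL, level 1 consumes the next value. */
    static List<Integer> read(Column column) {
        List<Integer> rows = new ArrayList<>();
        int next = 0;
        for (int level : column.definitionLevels) {
            rows.add(level == 0 ? null : column.values.get(next++));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, null, 3);
        List<Integer> output = read(write(input));
        // The check: NULL must come back as NULL, not as a default value.
        if (!input.equals(output)) {
            throw new AssertionError("NULL was not preserved: " + output);
        }
        System.out.println("round trip preserved NULL: " + output);
    }
}
```

The point of the check is that a NULL written to the file must be read back as
NULL rather than as a default such as 0 or an empty string.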
In addition, this patch comments out the unit tests for the Trevni format
because our Trevni implementation cannot handle NULL values correctly. That
problem will be addressed in a separate issue, so you don't need to worry
about Trevni.
Regards,
Hyunsik
> Parquet Integration
> -------------------
>
> Key: TAJO-30
> URL: https://issues.apache.org/jira/browse/TAJO-30
> Project: Tajo
> Issue Type: New Feature
> Reporter: Hyunsik Choi
> Assignee: David Chen
> Labels: Parquet
> Attachments: TAJO-30.patch, null_handling.patch
>
>
> Parquet is a columnar storage format developed by Twitter. Implement Parquet
> (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
> * {{ParquetScanner}} and {{ParquetAppender}} - FileScanner and FileAppender
> implementations for reading and writing Parquet.
> * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and
> writer for serializing/deserializing to Tajo Tuples.
> * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform
> conversion between Parquet and Tajo records.
> * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's
> internal representation.
> * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to
> materialize a Tajo Tuple.
> * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.