[ 
https://issues.apache.org/jira/browse/TAJO-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated TAJO-710:
----------------------------

    Summary: Add support for nested schemas and complex types  (was: Add 
support for nested schemas)

> Add support for nested schemas and complex types
> ------------------------------------------------
>
>                 Key: TAJO-710
>                 URL: https://issues.apache.org/jira/browse/TAJO-710
>             Project: Tajo
>          Issue Type: New Feature
>          Components: data type
>            Reporter: David Chen
>
> Add support for nested schemas. Here are some ways other systems handle 
> nested schemas:
>  * Pig and Hive use complex data types, such as bags, structs, arrays, etc.
>  * Impala doesn't support nested schemas and simply flattens the schema.
> From the discussion in TAJO-30:
> {quote}
> I have a plan for nested schemas. Currently, Tajo only supports a flat schema 
> like a relational DBMS. So, even if Tajo is extended to a nested data model, 
> it will not break compatibility.
> I'm thinking that Tajo should adopt the Parquet data model (= protobuf or 
> BigQuery). When considering a nested data model, I had two main points in 
> mind, and the Parquet data model satisfies both. The first point is the 
> processing model for nested data. The Parquet data model is the same as that 
> of BigQuery, and BigQuery has already defined a concrete processing model, 
> including flattening, cross products on repeated fields, and aggregation on 
> repeated fields [1][2]. The second point is the file format. Parquet is a 
> native file format for this model and already includes an efficient record 
> assembly method. Besides, Parquet is already mature and widely used in many 
> systems.
> [1] http://research.google.com/pubs/pub36632.html
> [2] https://developers.google.com/bigquery/docs/data
> I'm thinking that we need three stages for this work. First, we can start 
> with a small change to improve our schema system. Then, we will add a 
> physical operator that simply flattens one nested row into a number of flat 
> rows. Finally, we will solve some query optimization issues, such as 
> projection/filter push-down on nested schemas, and will add physical 
> operators that process nested rows directly.
> If you have any ideas, feel free to share them with us.
> Thanks,
> Hyunsik
> {quote}
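
As an illustration of the second stage described in the quote (flattening one nested row into several flat rows, with a cross product over repeated fields), here is a minimal Python sketch. It is not Tajo code and not the operator proposed in this issue. The function name `flatten`, the dotted column names, and the choice to emit a single NULL for an empty repeated field are all assumptions made for this example.

```python
from itertools import product


def flatten(record, prefix=""):
    """Flatten one nested record (a dict) into a list of flat rows (dicts).

    Hypothetical helper for illustration only:
    - nested structs (dicts) become dotted column names, e.g. "name.first"
    - repeated fields (lists) produce one partial row per element
    - the result is the cross product of all fields' partial rows
    """
    per_field = []
    for name, value in record.items():
        path = prefix + name
        if isinstance(value, dict):
            # Nested struct: recurse with an extended column prefix.
            per_field.append(flatten(value, path + "."))
        elif isinstance(value, list):
            # Repeated field: each element contributes its own partial rows.
            partial = []
            for item in value:
                if isinstance(item, dict):
                    partial.extend(flatten(item, path + "."))
                else:
                    partial.append({path: item})
            # Assumption: an empty repeated field yields one NULL value
            # so the parent row is not dropped.
            per_field.append(partial or [{path: None}])
        else:
            # Scalar field: exactly one partial row.
            per_field.append([{path: value}])

    rows = []
    for combo in product(*per_field):
        row = {}
        for part in combo:
            row.update(part)
        rows.append(row)
    return rows


record = {
    "id": 1,
    "name": {"first": "a"},
    "tags": ["x", "y"],
    "links": [{"url": "u1"}, {"url": "u2"}],
}
rows = flatten(record)  # 2 tags x 2 links = 4 flat rows
```

The cross product makes the row count multiply across repeated fields, which is one reason the third stage (operators that process nested rows directly) would matter for larger records.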



--
This message was sent by Atlassian JIRA
(v6.2#6252)
