[
https://issues.apache.org/jira/browse/TAJO-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Chen updated TAJO-710:
----------------------------
Summary: Add support for nested schemas and complex types (was: Add
support for nested schemas)
> Add support for nested schemas and complex types
> ------------------------------------------------
>
> Key: TAJO-710
> URL: https://issues.apache.org/jira/browse/TAJO-710
> Project: Tajo
> Issue Type: New Feature
> Components: data type
> Reporter: David Chen
>
> Add support for nested schemas. Here are some ways other systems handle
> nested schemas:
> * Pig and Hive use complex data types, such as bags, structs, and arrays.
> * Impala doesn't support nested schemas and simply flattens the schema.
> From the discussion in TAJO-30:
> {quote}
> I have a plan for nested schemas. Currently, Tajo only supports a flat schema,
> like a relational DBMS. So, even if Tajo is extended to a nested data model,
> it will not break compatibility.
> I'm thinking that Tajo should adopt the Parquet data model (the same as that
> of protobuf or BigQuery). In considering a nested data model, I had two main
> points in mind, and the Parquet data model satisfies both. The first point is
> the processing model for nested data. The Parquet data model is the same as
> that of BigQuery, and BigQuery has already defined a concrete processing
> model, including flattening, cross products on repeated fields, and
> aggregation on repeated fields [1][2]. The second point is the file format.
> Parquet is a native file format for this model and already includes an
> efficient record assembly method. Besides, Parquet is already mature and
> widely used in many systems.
> [1] http://research.google.com/pubs/pub36632.html
> [2] https://developers.google.com/bigquery/docs/data
> I'm thinking that we need three stages for this work. First, we can start
> with a small change to improve our schema system. Then, we will add a
> physical operator that simply flattens one nested row into a number of flat
> rows. Finally, we will solve some query optimization issues, such as
> projection/filter pushdown on nested schemas, and will add physical
> operators that process nested rows directly.
> If you have any ideas, feel free to share them with us.
> Thanks,
> Hyunsik
> {quote}
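The second stage described in the quote, flattening one nested row into several flat rows, might look roughly like the following. This is only an illustrative sketch in Python, not Tajo code: the function name `flatten_row`, the dotted column naming, and the choice to emit a NULL column for an empty repeated field are all assumptions made for the example. Nested records are modeled as dicts and repeated fields as lists, and several repeated fields in the same row produce a cross product.

```python
from itertools import product
from typing import Any, Dict, List


def flatten_row(row: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Expand one nested row into a list of flat rows (hypothetical helper).

    Nested dicts contribute dotted column names. Lists are treated as
    repeated fields and yield one output row per element, so several
    repeated fields in one row produce their cross product.
    """
    # Each entry holds the alternative partial rows produced by one field.
    per_field: List[List[Dict[str, Any]]] = []
    for name, value in row.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            per_field.append(flatten_row(value, prefix=f"{column}."))
        elif isinstance(value, list):
            expanded: List[Dict[str, Any]] = []
            for element in value:
                if isinstance(element, dict):
                    expanded.extend(flatten_row(element, prefix=f"{column}."))
                else:
                    expanded.append({column: element})
            # Assumption: an empty repeated field keeps the row with a NULL column.
            per_field.append(expanded or [{column: None}])
        else:
            per_field.append([{column: value}])

    # Combine one alternative from every field into each output row.
    flat_rows: List[Dict[str, Any]] = []
    for combination in product(*per_field):
        merged: Dict[str, Any] = {}
        for part in combination:
            merged.update(part)
        flat_rows.append(merged)
    return flat_rows


nested = {"id": 1, "name": {"first": "a"}, "tags": ["x", "y"]}
for flat in flatten_row(nested):
    print(flat)
# {'id': 1, 'name.first': 'a', 'tags': 'x'}
# {'id': 1, 'name.first': 'a', 'tags': 'y'}
```

A real operator would work against typed schemas and a columnar format rather than dicts; the sketch only shows the row-expansion idea.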
--
This message was sent by Atlassian JIRA
(v6.2#6252)