[ 
https://issues.apache.org/jira/browse/TAJO-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986696#comment-13986696
 ] 

Hyunsik Choi commented on TAJO-809:
-----------------------------------

I really like the keyword 'record' and the Scala-style syntax for declaring 
arrays, maps, and unions. Structs, arrays, maps, and unions are all widely used 
in the Hadoop ecosystem, for example in Hive, Avro, and Pig(?). We have to 
support them all.

These days, I'm thinking about their internal representation model and how to 
evaluate nested types during query processing. I'd like to start the discussion 
about them in TAJO-710, and I'll comment there soon.

> Language extension for non-scalar types
> ---------------------------------------
>
>                 Key: TAJO-809
>                 URL: https://issues.apache.org/jira/browse/TAJO-809
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: David Chen
>            Assignee: David Chen
>
> This ticket is to track the work for defining the syntax for nested schemas, 
> maps, arrays, and unions and the work for adding the syntax to the parser. 
> Initially, we can add stubs for the parser endpoints that will then be 
> fleshed out when support for the data type is actually implemented (see other 
> subtasks of TAJO-710).
> I have an idea of a possible DDL syntax for these types, and I would like to 
> get your feedback on it. I considered just using Hive's syntax but I felt 
> that it was not the best syntax for these types.
> Instead of calling nested records "structs" as Hive does, I simply call 
> them records and use the same syntax as for declaring the top-level record 
> fields:
> {code}
> create table record_example (
>     nested_field record (
>       field1 int,
>       field2 double),
>     two_levels_nested record (
>       inner_nested record (
>         field3 string,
>         field4 int),
>       field5 int)
>   ) using parquet;
> {code}
> For arrays, maps, and unions, I am using a syntax inspired by Scala's syntax 
> for generics:
> {code}
> create table array_example (
>     int_array array[int],
>     record_array array[record (
>       field1 int,
>       field2 string)]
>   ) using avro;
> create table map_example (
>     string_to_int map[string, int],
>     int_to_record map[int, record (
>       field1 string,
>       field2 int)]
>   ) using avro;
> create table union_example (
>     integers union[bit, smallint, integer, bigint]
>   ) using parquet;
> {code}
> Of course, it is possible that when we implement these data types, we may 
> make changes to the syntax, but for now, I think we should define an initial 
> language. Once the initial syntax has stabilized, I will write a formal 
> grammar for it.
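
To make the proposed type syntax concrete, here is a minimal sketch of a recursive-descent parser for the type expressions used in the examples above. It is only an illustration under stated assumptions, not Tajo's parser and not the formal grammar mentioned in the description. The grammar it assumes is: `record ( name type, ... )`, `array[type]`, `map[type, type]`, `union[type, ...]`, and any other identifier treated as a scalar type name. The names `tokenize`, `TypeParser`, and `parse_type` are hypothetical.

```python
import re

# Tokens: identifiers/keywords plus the punctuation used by the proposed syntax.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[\[\](),])")


def tokenize(text):
    """Split a type expression into identifier and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class TypeParser:
    """Recursive-descent parser over a token list (assumed grammar, see lead-in)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise ValueError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def next_name(self):
        token = self.peek()
        if token is None or not (token[0].isalpha() or token[0] == "_"):
            raise ValueError(f"expected a name, found {token!r}")
        self.pos += 1
        return token

    def parse_list(self, parse_item):
        """Parse one or more comma-separated items."""
        items = [parse_item()]
        while self.peek() == ",":
            self.pos += 1
            items.append(parse_item())
        return items

    def parse_field(self):
        field_name = self.next_name()
        return (field_name, self.parse_type())

    def parse_type(self):
        name = self.next_name().lower()
        if name == "record":
            self.expect("(")
            fields = self.parse_list(self.parse_field)
            self.expect(")")
            return ("record", fields)
        if name == "array":
            self.expect("[")
            element = self.parse_type()
            self.expect("]")
            return ("array", element)
        if name == "map":
            self.expect("[")
            key = self.parse_type()
            self.expect(",")
            value = self.parse_type()
            self.expect("]")
            return ("map", key, value)
        if name == "union":
            self.expect("[")
            members = self.parse_list(self.parse_type)
            self.expect("]")
            return ("union", members)
        return name  # assumption: any other identifier is a scalar type name


def parse_type(text):
    """Parse a complete type expression into a nested tuple/list tree."""
    parser = TypeParser(tokenize(text))
    result = parser.parse_type()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return result


if __name__ == "__main__":
    print(parse_type("map[int, record (field1 string, field2 int)]"))
```

In this sketch, nested types fall out of the recursion: a record field, array element, map key or value, and union member are each parsed by the same `parse_type` method. A comma directly before a closing parenthesis or bracket is rejected here; whether the final grammar should allow it is an open choice.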



--
This message was sent by Atlassian JIRA
(v6.2#6252)
