[ 
https://issues.apache.org/jira/browse/FLINK-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760985#comment-15760985
 ] 

Fabian Hueske commented on FLINK-5280:
--------------------------------------

Sorry for my late response. I'll try to answer your questions and will comment 
on some of your ideas:

* The order of fields in a {{PojoTypeInfo}} depends on the order in which the 
fields are returned by Java's reflection interfaces. I think the order is 
lexicographic. But as [~jark] said, we can get the index of each field with 
{{PojoTypeInfo.getFieldIndex()}}. This will give us the right indexes. 
* We do not need (and don't want to have) an Avro dependency in 
{{flink-table}}. The mapping should happen in the {{TableSource}} which should 
be located in a connector Maven module.
* A Generic Avro record is a generic holder for data of any schema. The data of 
a generic record object is interpreted using an Avro Schema. The Schema would 
give us the required field names and types. Using the schema, we could 
construct a {{Row}} with possibly nested {{Row}}s and move all data from a 
generic record into a nested {{Row}} object. 
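
To make the schema-driven copy above more concrete, here is a minimal sketch. It deliberately uses neither Avro nor Flink: a {{Map}} stands in for the generic record, an {{Object[]}} stands in for a {{Row}}, and the {{Schema}} class is a hypothetical minimal holder of field names, not Avro's schema class. All names in it are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: Map<String, Object> plays the role of a generic record,
// Object[] plays the role of a Row. Neither Avro nor Flink is used here.
class NestedRowSketch {

    // Hypothetical minimal schema: ordered field names, plus a nested schema
    // for every field that is itself a record.
    static final class Schema {
        final List<String> names = new ArrayList<>();
        final Map<String, Schema> nested = new LinkedHashMap<>();

        Schema field(String name) {
            names.add(name);
            return this;
        }

        Schema record(String name, Schema schema) {
            names.add(name);
            nested.put(name, schema);
            return this;
        }
    }

    // Copies the record into a positional row, following the schema's field
    // order and recursing into nested records.
    static Object[] toRow(Map<String, Object> record, Schema schema) {
        Object[] row = new Object[schema.names.size()];
        for (int i = 0; i < row.length; i++) {
            String name = schema.names.get(i);
            Object value = record.get(name);
            Schema child = schema.nested.get(name);
            if (child != null && value != null) {
                @SuppressWarnings("unchecked")
                Map<String, Object> sub = (Map<String, Object>) value;
                row[i] = toRow(sub, child);
            } else {
                row[i] = value;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Schema address = new Schema().field("city").field("zip");
        Schema user = new Schema().field("name").record("address", address);

        Map<String, Object> addr = new HashMap<>();
        addr.put("city", "Berlin");
        addr.put("zip", "10115");
        Map<String, Object> rec = new HashMap<>();
        rec.put("name", "Ada");
        rec.put("address", addr);

        // prints [Ada, [Berlin, 10115]]
        System.out.println(Arrays.deepToString(toRow(rec, user)));
    }
}
```

In a real {{TableSource}} the same recursion would be driven by the Avro schema of the record and would produce {{Row}} objects; that wiring is not shown here.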

* {{TableSource.getNumberOfFields()}} can be dropped. The question is whether 
this is important enough to break the API. If we decide to touch the interface, 
I'm +1 to remove it.
* I'm not sure about requiring that a {{TableSource}} must return a {{Row}}. In 
case of a Specific Avro record, we would need an additional step to copy the 
first-level Pojo fields into a {{Row}}, which would need some reflection or 
code generation, instead of simply forwarding the Avro object. We could still 
allow any kind of type information and use field names provided by the 
{{TypeInformation}}. If the return type is a Pojo, we would use its field 
names. If the return type is a Tuple, the fields would be named {{f0}}, {{f1}}, 
.... If this is not desired, the {{TableSource}} could return {{Row}}s. If we 
want to rename fields, we have to use {{Row}} as well.
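
The naming rules above (Pojo field names, {{f0}}, {{f1}}, ... for Tuples, and index lookup by name) could look roughly like the following sketch. It uses plain Java reflection only and does not call Flink. The class and method names are hypothetical, and the explicit sort is an assumption that mirrors my guess of a lexicographic order; plain {{Class.getDeclaredFields()}} makes no ordering guarantee.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in sketch using plain reflection; this is not Flink's PojoTypeInfo.
class FieldNameSketch {

    // Example Pojo, declared in non-alphabetical order on purpose.
    static class Person {
        public String name;
        public int age;
        public String city;
    }

    // Field names of a Pojo. Sorting is an assumption made here so that the
    // result does not depend on the order reflection happens to return.
    static List<String> pojoFieldNames(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Default names for a Tuple of the given arity: f0, f1, ...
    static List<String> tupleFieldNames(int arity) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < arity; i++) {
            names.add("f" + i);
        }
        return names;
    }

    // Position of a field by name, or -1 if the field does not exist.
    static int fieldIndex(List<String> names, String fieldName) {
        return names.indexOf(fieldName);
    }

    public static void main(String[] args) {
        List<String> names = pojoFieldNames(Person.class);
        System.out.println(names);                     // prints [age, city, name]
        System.out.println(fieldIndex(names, "name")); // prints 2
        System.out.println(tupleFieldNames(3));        // prints [f0, f1, f2]
    }
}
```

Looking fields up by name rather than relying on declaration order is the point here, which is what {{PojoTypeInfo.getFieldIndex()}} would give us in the actual implementation.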

> Extend TableSource to support nested data
> -----------------------------------------
>
>                 Key: FLINK-5280
>                 URL: https://issues.apache.org/jira/browse/FLINK-5280
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Fabian Hueske
>            Assignee: Ivan Mushketyk
>
> The {{TableSource}} interface does currently only support the definition of 
> flat rows. 
> However, there are several storage formats for nested data that should be 
> supported such as Avro, Json, Parquet, and Orc. The Table API and SQL can 
> also natively handle nested rows.
> The {{TableSource}} interface and the code to register table sources in 
> Calcite's schema need to be extended to support nested data.



