[jira] [Commented] (FLINK-1919) Add HCatOutputFormat for Tuple data types

James Cao (JIRA) Thu, 30 Jul 2015 07:52:09 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647727#comment-14647727
 ]


James Cao commented on FLINK-1919:
----------------------------------

Thanks!

> Add HCatOutputFormat for Tuple data types
> -----------------------------------------
>
>                 Key: FLINK-1919
>                 URL: https://issues.apache.org/jira/browse/FLINK-1919
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Scala API
>            Reporter: Fabian Hueske
>            Assignee: James Cao
>            Priority: Minor
>              Labels: starter
>
> It would be good to have an OutputFormat that can write data to HCatalog 
> tables.
> The Hadoop `HCatOutputFormat` expects `HCatRecord` objects and writes these 
> to HCatalog tables. We can do the same thing by creating these `HCatRecord` 
> objects with a Map function that precedes a `HadoopOutputFormat` that wraps 
> the Hadoop `HCatOutputFormat`.
> Better support for Flink Tuples can be added by implementing a custom 
> `HCatOutputFormat` that also depends on the Hadoop `HCatOutputFormat` but 
> internally converts Flink Tuples to `HCatRecords`. This would also include 
> checking whether the schema of the HCatalog table and the Flink tuples match. For 
> data types other than tuples, the OutputFormat could either require a 
> preceding Map function that converts to `HCatRecords` or let users specify a 
> MapFunction and invoke that internally.
> We already have a Flink `HCatInputFormat` that does this in the reverse 
> direction, i.e., it emits Flink Tuples from HCatalog tables.
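
The schema check mentioned in the description (a tuple's arity and field types must match the table's columns) can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Flink or HCatalog code: a Flink tuple is modeled as an `Object[]`, the HCatalog table schema as a list of a hypothetical `ColumnType` enum, and the resulting record as a `List<Object>`. A real implementation would work with Flink's `Tuple` and HCatalog's `HCatSchema` and `HCatRecord` instead; every name below (`TupleToHCatSketch`, `ColumnType`, `checkSchema`, `toRecord`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// ASSUMPTION: hypothetical stand-ins only. A Flink tuple is modeled as Object[],
// an HCatalog table schema as an ordered List<ColumnType>, and an HCatRecord as List<Object>.
public class TupleToHCatSketch {

    /** Hypothetical, simplified column types, each mapped to the Java class a field must have. */
    enum ColumnType {
        INT(Integer.class), BIGINT(Long.class), STRING(String.class),
        DOUBLE(Double.class), BOOLEAN(Boolean.class);

        final Class<?> javaClass;

        ColumnType(Class<?> javaClass) { this.javaClass = javaClass; }
    }

    /** Rejects a tuple whose arity or field types do not match the schema. */
    static void checkSchema(Object[] tuple, List<ColumnType> schema) {
        if (tuple.length != schema.size()) {
            throw new IllegalArgumentException(
                "Tuple arity " + tuple.length + " does not match schema size " + schema.size());
        }
        for (int i = 0; i < tuple.length; i++) {
            Object field = tuple[i];
            ColumnType expected = schema.get(i);
            // null fields are accepted here, i.e. every column is treated as nullable
            if (field != null && !expected.javaClass.isInstance(field)) {
                throw new IllegalArgumentException(
                    "Field " + i + " has type " + field.getClass().getSimpleName()
                    + " but column type is " + expected);
            }
        }
    }

    /** Validates the tuple against the schema, then copies its fields into a record. */
    static List<Object> toRecord(Object[] tuple, List<ColumnType> schema) {
        checkSchema(tuple, schema);
        return new ArrayList<>(Arrays.asList(tuple));
    }

    public static void main(String[] args) {
        List<ColumnType> schema = Arrays.asList(ColumnType.INT, ColumnType.STRING);
        System.out.println(toRecord(new Object[] {1, "flink"}, schema)); // prints [1, flink]
        try {
            toRecord(new Object[] {"oops", "flink"}, schema);
        } catch (IllegalArgumentException e) {
            // prints: rejected: Field 0 has type String but column type is INT
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This sketch validates every tuple; a real format might instead compare the table schema once against the tuple's declared type information when it is configured, which would avoid per-record checks.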



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
