Fabian Hueske created FLINK-1919:
------------------------------------

             Summary: Add HCatOutputFormat for Tuple data types
                 Key: FLINK-1919
                 URL: https://issues.apache.org/jira/browse/FLINK-1919
             Project: Flink
          Issue Type: New Feature
          Components: Java API, Scala API
            Reporter: Fabian Hueske
            Priority: Minor


It would be good to have an OutputFormat that can write data to HCatalog tables.

The Hadoop `HCatOutputFormat` expects `HCatRecord` objects and writes these to
HCatalog tables. We can already do the same thing by creating these `HCatRecord`
objects with a Map function that precedes a `HadoopOutputFormat` wrapping the
Hadoop `HCatOutputFormat`.
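
A minimal, self-contained sketch of this Map-then-write approach in plain Java
is shown here. It deliberately avoids the real Flink and HCatalog classes:
`SimpleRecord` is a hypothetical stand-in for `HCatRecord`, `RecordSink` is a
hypothetical stand-in for the `HadoopOutputFormat` that wraps the Hadoop
`HCatOutputFormat`, a `java.util.function.Function` plays the role of the Flink
Map function, and a tuple is modeled as an `Object[]`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins only: SimpleRecord plays the role of HCatRecord and
// RecordSink plays the role of the wrapped Hadoop HCatOutputFormat.
// Neither is the real HCatalog or Flink API.
class MapThenWriteSketch {

    /** Positional record, standing in for HCatRecord (hypothetical). */
    static final class SimpleRecord {
        private final List<Object> fields;
        SimpleRecord(List<Object> fields) { this.fields = new ArrayList<>(fields); }
        Object get(int pos) { return fields.get(pos); }
        int size() { return fields.size(); }
    }

    /** Collects records, standing in for the wrapped output format (hypothetical). */
    static final class RecordSink {
        final List<SimpleRecord> written = new ArrayList<>();
        void writeRecord(SimpleRecord r) { written.add(r); }
    }

    /** The "Map function" step: converts each element before it reaches the sink. */
    static <T> void mapAndWrite(List<T> input, Function<T, SimpleRecord> mapper, RecordSink sink) {
        for (T element : input) {
            sink.writeRecord(mapper.apply(element));
        }
    }

    public static void main(String[] args) {
        // Two 2-field "tuples" modeled as Object[] (stand-ins for Flink's Tuple2).
        List<Object[]> tuples = Arrays.asList(
            new Object[] {1, "alice"},
            new Object[] {2, "bob"});
        RecordSink sink = new RecordSink();
        mapAndWrite(tuples, t -> new SimpleRecord(Arrays.asList(t)), sink);
        System.out.println(sink.written.size() + " records, first name = " + sink.written.get(0).get(1));
    }
}
```

In a real job, the Map function's output type would be dictated by
`HadoopOutputFormat`, which hands key/value pairs to the wrapped Hadoop format;
the sketch only shows where the user-written conversion sits in the pipeline.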

Better support for Flink Tuples can be added by implementing a custom
`HCatOutputFormat` that also depends on the Hadoop `HCatOutputFormat` but
internally converts Flink Tuples to `HCatRecord`s. This would also include
checking whether the schema of the HCatalog table matches the Flink tuples. For
data types other than tuples, the OutputFormat could either require a preceding
Map function that converts to `HCatRecord`s or let users specify a MapFunction
and invoke it internally.
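
One possible shape of the tuple-specific conversion, including the schema
check, is sketched here under the same caveats: `FieldSchema` is a hypothetical
stand-in for an HCatalog column definition, an `Object[]` stands in for a Flink
Tuple, and the returned `List<Object>` stands in for an `HCatRecord`. The check
compares the tuple arity with the column count and each field's Java type with
the column's expected type, and it accepts nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: FieldSchema stands in for an HCatalog column,
// Object[] stands in for a Flink Tuple, and the returned List<Object>
// stands in for an HCatRecord. None of these are the real APIs.
class TupleSchemaCheckSketch {

    static final class FieldSchema {
        final String name;
        final Class<?> type;
        FieldSchema(String name, Class<?> type) { this.name = name; this.type = type; }
    }

    /** Throws if the tuple's arity or field types do not match the table schema. */
    static void checkSchema(List<FieldSchema> tableSchema, Object[] tuple) {
        if (tuple.length != tableSchema.size()) {
            throw new IllegalArgumentException("Tuple arity " + tuple.length
                + " does not match table column count " + tableSchema.size());
        }
        for (int i = 0; i < tuple.length; i++) {
            FieldSchema col = tableSchema.get(i);
            Object value = tuple[i];
            // Null values are accepted for any column in this sketch.
            if (value != null && !col.type.isInstance(value)) {
                throw new IllegalArgumentException("Field " + i + " (" + col.name + ") expected "
                    + col.type.getSimpleName() + " but got " + value.getClass().getSimpleName());
            }
        }
    }

    /** Validates a tuple against the schema, then copies it into a positional record. */
    static List<Object> toRecord(List<FieldSchema> tableSchema, Object[] tuple) {
        checkSchema(tableSchema, tuple);
        return new ArrayList<>(Arrays.asList(tuple));
    }

    public static void main(String[] args) {
        List<FieldSchema> schema = Arrays.asList(
            new FieldSchema("id", Integer.class),
            new FieldSchema("name", String.class));
        System.out.println(toRecord(schema, new Object[] {1, "alice"}));
        try {
            toRecord(schema, new Object[] {"oops", "bob"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

For simplicity the sketch validates every record. A real implementation would
more likely compare the table schema against the tuple's type information once,
when the output format is configured, and only convert fields per record.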

We already have a Flink `HCatInputFormat` that does this in the reverse
direction, i.e., it emits Flink Tuples read from HCatalog tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)