GitHub user jamescao opened a pull request:

    https://github.com/apache/flink/pull/1079

    [FLINK-1919] Add HCatOutputFormat

    [FLINK-1919]
    Add `HCatOutputFormat` for Tuple data types for the Java and Scala APIs, 
and fix a bug in the Scala API's `HCatInputFormat` for Hive complex types.
    The Java API checks whether the schemas of the HCatalog table and the 
Flink tuples match if the user provides a `TypeInformation` in the 
constructor. For data types other than tuples, the `OutputFormat` requires a 
preceding Map function that converts them to `HCatRecord`s.
    The Scala API checks whether the schemas of the HCatalog table and the 
Scala tuples match. For data types other than Scala tuples, the 
`OutputFormat` likewise requires a preceding Map function that converts them 
to `HCatRecord`s. The Scala API requires the user to import 
`org.apache.flink.api.scala._` so that the type can be captured by the Scala 
macro.
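    For illustration, such a preceding map from a non-tuple type to 
`HCatRecord`s might look like the sketch below. The `Person` case class, the 
table layout, and the commented-out `HCatOutputFormat` constructor are 
assumptions for this sketch, not the exact API of this PR:

```scala
import org.apache.flink.api.scala._  // needed so the Scala macro can capture the type
import org.apache.hive.hcatalog.data.{DefaultHCatRecord, HCatRecord}

// Hypothetical non-tuple type; the HCatalog table is assumed to have two
// matching columns (name string, age int).
case class Person(name: String, age: Int)

val env = ExecutionEnvironment.getExecutionEnvironment
val persons = env.fromElements(Person("alice", 30), Person("bob", 25))

// Non-tuple types need a preceding map that converts them to HCatRecords:
val records: DataSet[HCatRecord] = persons.map { p =>
  val r = new DefaultHCatRecord(2)  // two fields, matching the table schema
  r.set(0, p.name)
  r.set(1, p.age)
  r: HCatRecord
}

// The records could then be written with the HCatOutputFormat added in this
// PR; the constructor arguments (database, table) shown are an assumption:
// records.output(new HCatOutputFormat[HCatRecord]("mydb", "mytable"))
```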
    The HCatalog jar in Maven Central is compiled against hadoop1, which is 
not compatible with the Hive jars used for testing, so a Cloudera HCatalog 
jar is pulled into the pom for testing purposes. It can be removed if not 
required.
    Java `List` and `Map` cannot be cast to Scala `List` and `Map`, so 
`JavaConverters` is used to fix a bug in the `HCatInputFormat` Scala API.
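    The `JavaConverters` point can be shown in isolation with plain Scala 
(no Flink or HCatalog needed): the `java.util.List` and `java.util.Map` that 
HCatalog hands back cannot be cast to Scala collections, only converted:

```scala
import scala.collection.JavaConverters._

val javaList: java.util.List[String] = java.util.Arrays.asList("a", "b")
// javaList.asInstanceOf[List[String]] would throw a ClassCastException at
// runtime; convert instead of casting:
val scalaList: List[String] = javaList.asScala.toList

val javaMap = new java.util.HashMap[String, Integer]()
javaMap.put("x", 1)
val scalaMap: Map[String, Integer] = javaMap.asScala.toMap
```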
    
    @chiwanpark @rmetzger 
    I have changed the HCatalog jar to the Apache version. That requires 
moving the hcatalog module to the hadoop1 profile.
    @chiwanpark 
    I have addressed most of your comments, except the one regarding the 
verification of exceptions in the tests. I feel it's better to verify the 
exception at the point where it is expected to be thrown. With a method-wide 
annotation we cannot be sure where in the test method the exception is 
thrown, which is unsafe especially for common exception types such as 
`IOException`. I did remove the tests' dependency on the exception error 
message.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jamescao/flink hcatbranch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1079
    
----
commit b226ff06aa37a84b3203fe012be321f4161f2b03
Author: James Cao <james...@outlook.com>
Date:   2015-08-06T01:52:45Z

    add HCatOutputFormat
    java api and scala api
    fix scala HCatInputFormat bug for complex type
    moved hcatalog module to hadoop1 profile.
    Modify the surefire configuration for hcatalog tests.
    Addressed review comments from the first PR.
    remove unused import

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
