[ https://issues.apache.org/jira/browse/FLINK-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743517#comment-14743517 ]

ASF GitHub Bot commented on FLINK-2167:
---------------------------------------

GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/1127

    [FLINK-2167] [table] Add fromHCat() to TableEnvironment

    This PR introduces input format interfaces (so-called `TableSource`s) for the Table API. There are two types of TableSources:
    
    - `AdaptiveTableSource`s can adapt their output to the requirements of the plan. Although the output schema stays the same, the TableSource can react to field resolution and/or predicates internally and return adapted DataSet/DataStream versions in the "translate" step.
    - `StaticTableSource`s are an easy way to provide the Table API with additional input formats without much implementation effort (e.g. for `fromCsvFile()`).
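    
    To make the distinction concrete, here is a minimal plain-Java sketch. All names in it (`TableSource`, `StaticTableSource`, `AdaptiveTableSource`, `notifyResolvedField`, `notifyPredicate`, `readRows`, `InMemoryAdaptiveSource`) are hypothetical and are not the signatures in this PR, and a `List<Object[]>` stands in for a DataSet/DataStream. A static source would only implement `readRows()` and ignore the plan; the adaptive source below records what the planner reports and returns rows with the same schema, leaving fields the query never resolved as `null`.
    
    ```java
    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;
    
    // Hypothetical sketch, not the PR's interfaces.
    // A List<Object[]> stands in for a DataSet/DataStream.
    interface TableSource {
        String[] fieldNames();      // output schema, identical for every query
        List<Object[]> readRows();  // stand-in for the "translate" step
    }
    
    // Static source: ignores the plan and always returns complete rows.
    interface StaticTableSource extends TableSource { }
    
    // Adaptive source: the planner reports what the query needs before translation.
    interface AdaptiveTableSource extends TableSource {
        void notifyResolvedField(String field, boolean filtering); // hypothetical callback
        void notifyPredicate(String predicate);                   // hypothetical callback
    }
    
    // Example adaptive source that only materializes fields the query resolved.
    class InMemoryAdaptiveSource implements AdaptiveTableSource {
        private final String[] schema;
        private final List<Object[]> data;
        private final Set<String> resolved = new LinkedHashSet<>();
        private final List<String> predicates = new ArrayList<>();
    
        InMemoryAdaptiveSource(String[] schema, List<Object[]> data) {
            this.schema = schema;
            this.data = data;
        }
    
        public String[] fieldNames() { return schema; }
    
        public void notifyResolvedField(String field, boolean filtering) {
            // A field is needed whether it is used for filtering or for selection.
            resolved.add(field);
        }
    
        public void notifyPredicate(String predicate) {
            predicates.add(predicate); // a real source could push these down
        }
    
        List<String> predicates() { return predicates; }
    
        public List<Object[]> readRows() {
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : data) {
                Object[] adapted = new Object[schema.length]; // schema stays the same
                for (int i = 0; i < schema.length; i++) {
                    if (resolved.contains(schema[i])) {
                        adapted[i] = row[i]; // unresolved fields stay null
                    }
                }
                out.add(adapted);
            }
            return out;
        }
    }
    ```
    
    In this sketch the planner would call the `notify*` methods before `readRows()`, mirroring the idea that an adaptive source learns about the query before the "translate" step.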
    
    TableSources have been deeply integrated into the Table API.
    
    The TableEnvironment now requires the newly introduced `AbstractExecutionEnvironment` (the common superclass of all ExecutionEnvironments for DataSets and DataStreams).
    
    An example of an AdaptiveTableSource can be found in `HCatTableSource`. HCatTableSource supports predicate pushdown as well as selection (i.e. projection) pushdown to HCatalog. Only predicates on partition columns are pushed to HCatalog. Unresolved fields are not read from HCatalog and remain `null` within the Table API's rows.
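    
    The partition-column rule can be sketched as follows, assuming each predicate arrives as one conjunct (AND term) of the filter together with the set of fields it references. The `FieldPredicate` and `PartitionPushdown` classes are hypothetical and are not the actual `HCatTableSource` code. A conjunct is handed to HCatalog only if every field it references is a partition column; the sketch only decides what to push down and assumes Flink still evaluates every predicate itself.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;
    
    // Hypothetical model of one conjunct: its text plus the fields it references.
    final class FieldPredicate {
        final String expression;
        final Set<String> fields;
    
        FieldPredicate(String expression, Set<String> fields) {
            this.expression = expression;
            this.fields = fields;
        }
    }
    
    final class PartitionPushdown {
        // Returns the conjuncts that reference only partition columns and can
        // therefore be passed to HCatalog as a partition filter.
        static List<String> pushable(List<FieldPredicate> predicates, Set<String> partitionCols) {
            List<String> result = new ArrayList<>();
            for (FieldPredicate p : predicates) {
                if (!p.fields.isEmpty() && partitionCols.containsAll(p.fields)) {
                    result.add(p.expression);
                }
            }
            return result;
        }
    }
    ```
    
    A conjunct that mixes a partition column with a regular column (e.g. `partCol===col1`) is not pushed, because HCatalog could not evaluate it on partition metadata alone.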
    
    A simple example looks like this:
    ```
    TableEnvironment t = new TableEnvironment(env);
    t.fromHCat("database", "table")
      .filter("partCol==='5'")
      .select("col1, col2");
    ```
    
    Here is what the TableSources see for a more complex query:
    
    ```
    getTableJava(tableSource1)
      .filter("a===5 || a===6")
      .select("a as a4, b as b4, c as c4")
      .filter("b4===7")
      .join(getTableJava(tableSource2))
      .where("a===a4 && c==='Test' && c4==='Test2'");
    
    // Result predicates for tableSource1:
    //  List("a===5 || a===6", "b===7", "c==='Test2'")
    // Result predicates for tableSource2:
    //  List("c==='Test'")
    // Result resolved fields for tableSource1 (true = filtering, false = selection):
    //  Set(("a", true), ("a", false), ("b", true), ("b", false), ("c", false), ("c", true))
    // Result resolved fields for tableSource2 (true = filtering, false = selection):
    //  Set(("a", true), ("c", true))
    ```
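    
    The predicates reported to tableSource1 use the original field names (`b===7`, `c==='Test2'`) although the query filtered on the aliases `b4` and `c4`. A minimal sketch of that renaming step, assuming a hypothetical alias map built from `select("a as a4, b as b4, c as c4")` and a hypothetical `AliasResolver` class; single-quoted string literals are copied unchanged, so a literal such as `'a4'` is not rewritten.
    
    ```java
    import java.util.Map;
    
    final class AliasResolver {
        // Rewrites every identifier that is a known alias back to its source field.
        // Text inside single-quoted string literals is left untouched.
        static String resolve(String predicate, Map<String, String> aliasToField) {
            StringBuilder out = new StringBuilder();
            int i = 0;
            while (i < predicate.length()) {
                char ch = predicate.charAt(i);
                if (ch == '\'') {
                    // Copy the whole literal including both quotes (or to the end if unterminated).
                    int end = predicate.indexOf('\'', i + 1);
                    if (end < 0) end = predicate.length() - 1;
                    out.append(predicate, i, end + 1);
                    i = end + 1;
                } else if (Character.isJavaIdentifierStart(ch)) {
                    int start = i;
                    while (i < predicate.length()
                            && Character.isJavaIdentifierPart(predicate.charAt(i))) {
                        i++;
                    }
                    String name = predicate.substring(start, i);
                    out.append(aliasToField.getOrDefault(name, name));
                } else {
                    out.append(ch); // operators, digits, whitespace
                    i++;
                }
            }
            return out.toString();
        }
    }
    ```
    
    For example, `AliasResolver.resolve("b4===7", Map.of("b4", "b"))` yields `b===7`, matching the predicate listed for tableSource1.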
    
    
    HCatTableSource has no tests yet, but I will implement them soon. First, I would appreciate some general feedback.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink TableApiHcat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1127
    
----
commit f245604caccd8f97c1d6eabf16968dab3aa47572
Author: twalthr <twal...@apache.org>
Date:   2015-07-09T09:57:05Z

    [FLINK-2167] [table] Add fromHCat() to TableEnvironment

----


> Add fromHCat() to TableEnvironment
> ----------------------------------
>
>                 Key: FLINK-2167
>                 URL: https://issues.apache.org/jira/browse/FLINK-2167
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API
>    Affects Versions: 0.9
>            Reporter: Fabian Hueske
>            Assignee: Timo Walther
>            Priority: Minor
>              Labels: starter
>
> Add a {{fromHCat()}} method to the {{TableEnvironment}} to read a {{Table}} 
> from an HCatalog table.
> The implementation could reuse Flink's HCatInputFormat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
