[ 
https://issues.apache.org/jira/browse/FLINK-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise updated FLINK-20955:
--------------------------------
    Fix Version/s:     (was: 1.12.0)
                   1.13.0

> Refactor HBase Source in accordance with FLIP-27
> ------------------------------------------------
>
>                 Key: FLINK-20955
>                 URL: https://issues.apache.org/jira/browse/FLINK-20955
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / HBase
>            Reporter: Moritz Manner
>            Assignee: Moritz Manner
>            Priority: Major
>             Fix For: 1.13.0
>
>
> The HBase connector source implementation should be updated in accordance 
> with [FLIP-27: Refactor Source 
> Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface].
> One source should map to one table in HBase. Users can specify which 
> columns or column families to watch; each change to a watched column emits a 
> record with change type, table, column family, column, value, and timestamp.
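
The record described above could be modelled roughly as follows. This is only a sketch: the class name HBaseChangeRecord, the ChangeType values, and the field types are assumptions made for illustration, since the issue lists the fields but does not define a type for them.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical shape of one emitted change record; not part of the proposal. */
final class HBaseChangeRecord {

    /** Assumed change types; the issue does not enumerate them. */
    enum ChangeType { PUT, DELETE }

    final ChangeType type;
    final String table;
    final String columnFamily;
    final String column;
    final byte[] value;   // raw cell bytes, copied defensively
    final long timestamp; // cell timestamp in milliseconds (assumption)

    HBaseChangeRecord(ChangeType type, String table, String columnFamily,
                      String column, byte[] value, long timestamp) {
        this.type = type;
        this.table = table;
        this.columnFamily = columnFamily;
        this.column = column;
        this.value = value.clone();
        this.timestamp = timestamp;
    }

    /** Renders the value as UTF-8 text, which only suits textual cells. */
    String valueAsString() {
        return new String(value, StandardCharsets.UTF_8);
    }
}
```
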
> h3. Idea
> The new Flink HBase Source makes use of the internal [replication mechanism 
> of HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source 
> registers with the HBase cluster and receives all WAL (write-ahead log) edits 
> written to HBase. From those WAL edits the Source creates the DataStream. 
> h3. Split
> We have not yet decided what information a Split should contain. There are 
> two options: 
>  # There is only one Split per Source, and the Split contains all the 
> information necessary to connect to HBase. The SourceReader that processes 
> the Split receives all WAL edits for all tables and keeps only the relevant 
> edits. 
>  # There are multiple Splits per Source, each Split representing one HBase 
> Region to read from. This assumes that it is possible to receive WAL edits 
> from only a specific HBase Region rather than all WAL edits. This would be 
> preferable because it allows multiple regions to be processed in parallel, 
> but we still need to figure out how to achieve this.
> In both cases the Split will contain information about the HBase instance and 
> table. 
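
With the single-Split option, the reader has to discard edits it is not interested in. A minimal sketch of that filtering step, assuming hypothetical Edit and Split stand-ins (neither is a Flink or HBase class, and the hbaseInstance string is only a placeholder for whatever connection information the Split ends up carrying):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of reader-side filtering for the single-Split option. Names are hypothetical. */
final class WalEditFilterSketch {

    /** Stand-in for one WAL edit; a real edit would arrive via HBase replication. */
    static final class Edit {
        final String table;
        final String columnFamily;
        final String column;

        Edit(String table, String columnFamily, String column) {
            this.table = table;
            this.columnFamily = columnFamily;
            this.column = column;
        }
    }

    /** Stand-in for the Split: HBase instance, table, and watched column families. */
    static final class Split {
        final String hbaseInstance; // placeholder for connection info (assumption)
        final String table;
        final Set<String> watchedFamilies;

        Split(String hbaseInstance, String table, Set<String> watchedFamilies) {
            this.hbaseInstance = hbaseInstance;
            this.table = table;
            this.watchedFamilies = new HashSet<>(watchedFamilies);
        }
    }

    /** Keeps only edits for the Split's table and its watched column families. */
    static List<Edit> relevantEdits(Split split, List<Edit> edits) {
        List<Edit> kept = new ArrayList<>();
        for (Edit edit : edits) {
            if (edit.table.equals(split.table)
                    && split.watchedFamilies.contains(edit.columnFamily)) {
                kept.add(edit);
            }
        }
        return kept;
    }
}
```

The cost of this option is visible in the loop: every edit for every table reaches the reader, and most of them may be thrown away.
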
> h3. Split Enumerator
> Depending on which kind of Split we choose, the split enumerator will either 
> connect to HBase and fetch all relevant regions, or create just one Split.
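
The two enumerator behaviours can be sketched side by side. The names here are hypothetical, and the region list is passed in as a parameter; a real enumerator would look the regions up in HBase, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of the two enumeration strategies. Names are hypothetical. */
final class SplitEnumerationSketch {

    /** Stand-in for the Split; region is null when one Split covers the whole table. */
    static final class Split {
        final String hbaseInstance;
        final String table;
        final String region;

        Split(String hbaseInstance, String table, String region) {
            this.hbaseInstance = hbaseInstance;
            this.table = table;
            this.region = region;
        }
    }

    /**
     * Creates one Split per region, or a single Split for the whole table.
     * The regions argument stands in for a lookup against HBase.
     */
    static List<Split> enumerate(String hbaseInstance, String table,
                                 List<String> regions, boolean onePerRegion) {
        if (!onePerRegion) {
            return Collections.singletonList(new Split(hbaseInstance, table, null));
        }
        List<Split> splits = new ArrayList<>();
        for (String region : regions) {
            splits.add(new Split(hbaseInstance, table, region));
        }
        return splits;
    }
}
```
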



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
