[ https://issues.apache.org/jira/browse/FLINK-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvid Heise updated FLINK-20955:
--------------------------------
    Fix Version/s:     (was: 1.12.0)
                   1.13.0

> Refactor HBase Source in accordance with FLIP-27
> ------------------------------------------------
>
>                 Key: FLINK-20955
>                 URL: https://issues.apache.org/jira/browse/FLINK-20955
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / HBase
>            Reporter: Moritz Manner
>            Assignee: Moritz Manner
>            Priority: Major
>             Fix For: 1.13.0
>
>
> The HBase connector source implementation should be updated in accordance
> with [FLIP-27: Refactor Source Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface].
> One source should map to one table in HBase. Users can specify which
> columns or column families to watch; each change in one of the watched
> columns triggers a record with change type, table, column family, column,
> value, and timestamp.
> h3. Idea
> The new Flink HBase Source makes use of the internal [replication mechanism
> of HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source
> registers with the HBase cluster and receives all WAL edits written to
> HBase. From those WAL edits the Source can create the DataStream.
> h3. Split
> We're still not 100% sure what information a Split should contain. We have
> the following possibilities:
> # There is only one Split per Source, and the Split contains all the
> information necessary to connect to HBase. The SourceReader that processes
> the Split receives all WAL edits for all tables and keeps only the relevant
> edits.
> # There are multiple Splits per Source, each Split representing one HBase
> Region to read from. This assumes that it is possible to receive WAL edits
> from a specific HBase Region only, rather than all WAL edits. This would be
> preferable as it allows parallel processing of multiple regions, but we
> still need to figure out how this can be done.
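
To make the two Split options above concrete, here is a minimal sketch in
plain Java. It is not taken from the issue or from any existing connector:
the class name HBaseSplitSketch, its fields, and the accepts() filter are
all hypothetical. The only thing borrowed from Flink is the idea that a
split exposes a String id (as SourceSplit#splitId() does). A null region
stands for option 1 (one Split for the whole table); a non-null region
stands for option 2 (one Split per HBase Region).

```java
import java.util.Objects;

/** Hypothetical split model for illustration; not part of Flink or HBase. */
final class HBaseSplitSketch {
    final String hbaseInstance; // how to reach HBase, e.g. a quorum string (assumed)
    final String table;         // the single table this source maps to
    final String region;        // null = option 1, one split for the whole table

    private HBaseSplitSketch(String hbaseInstance, String table, String region) {
        this.hbaseInstance = Objects.requireNonNull(hbaseInstance);
        this.table = Objects.requireNonNull(table);
        this.region = region;
    }

    /** Option 1: a single split that covers the whole table. */
    static HBaseSplitSketch forTable(String hbaseInstance, String table) {
        return new HBaseSplitSketch(hbaseInstance, table, null);
    }

    /** Option 2: one split per region of the table. */
    static HBaseSplitSketch forRegion(String hbaseInstance, String table, String region) {
        return new HBaseSplitSketch(hbaseInstance, table, Objects.requireNonNull(region));
    }

    /** Stable identifier, mirroring the role of SourceSplit#splitId() in Flink. */
    String splitId() {
        return region == null ? table : table + "/" + region;
    }

    /** Reader-side filter: keep only WAL edits that belong to this split. */
    boolean accepts(String editTable, String editRegion) {
        if (!table.equals(editTable)) {
            return false; // edit for another table, dropped in both options
        }
        return region == null || region.equals(editRegion);
    }
}
```

With option 1 the filter only checks the table, so a single reader sees and
discards edits for every other table. With option 2 the same filter also
checks the region, which only pays off if HBase can be asked to deliver
edits per region in the first place.
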
> In both cases the Split will contain information about the HBase instance
> and table.
> h3. Split Enumerator
> Depending on which Split design we choose, the split enumerator will either
> connect to HBase and fetch all relevant regions, or just create one Split.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
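
The split enumerator behaviour described in the issue could look roughly
like the sketch below. It is plain Java with no Flink or HBase dependency,
and everything in it is an assumption made for illustration: the class
SplitPlannerSketch, the perRegion flag, and the "table/region" id format
are hypothetical. How the region list is obtained from HBase is
deliberately left out; it is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical planning step of a split enumerator; illustration only. */
final class SplitPlannerSketch {

    /**
     * Returns the ids of the splits to hand out.
     *
     * @param table       the HBase table the source maps to
     * @param regionNames regions of that table, looked up elsewhere (assumed input)
     * @param perRegion   false = one split for the table, true = one split per region
     */
    static List<String> planSplits(String table, List<String> regionNames, boolean perRegion) {
        if (!perRegion) {
            // Single-split design: no region lookup is needed at all.
            return Collections.singletonList(table);
        }
        // Per-region design: one split id for each region of the table.
        List<String> ids = new ArrayList<>();
        for (String region : regionNames) {
            ids.add(table + "/" + region);
        }
        return ids;
    }
}
```

For example, planSplits("orders", List.of("r1", "r2"), true) yields two
split ids, one per region, while the same call with perRegion set to false
yields a single id for the whole table.
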