Re: Topic: CDC for Flink Source

Bharath Vissapragada Fri, 04 Dec 2020 14:37:23 -0800

cc: dev@ bcc: user@

We are also building a database change stream with HBase/Phoenix data as
the source (for another event bus implementation), and we hope to open
source it soon. We had similar discussions and ended up taking an SEP-like
approach
<https://www.ngdata.com/the-hbase-side-effect-processor-and-hbase-replication-monitoring/>.
For us it was critical not to have external service dependencies or
third-party code running inside HBase (which is the case if we take a
coprocessor or a custom ReplicationEndPoint approach).

Also, building something like this on top of the replication framework has
a lot of benefits: it already handles checkpointing, it has built-in
backpressure handling (in case of downstream delays), and it has been
battle tested for a while. There are some edge cases with ordering
guarantees because of quirks in our replication design, but we plan to fix
those too in the coming months.
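
To make the checkpointing and backpressure point concrete, here is a toy
sketch in Python. It is not HBase code and not our implementation; the
names ReplicationSource, BoundedSink, ship and replicate are made up for
illustration. The idea it shows is only this: the source advances its
checkpoint after the sink accepts a batch, so a slow consumer stalls
shipping instead of losing edits.

```python
from collections import deque


class BoundedSink:
    """Hypothetical downstream consumer with a bounded buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def replicate(self, batch):
        # Refuse the whole batch if it does not fit; this is the
        # backpressure signal seen by the source.
        if len(self.queue) + len(batch) > self.capacity:
            return False
        self.queue.extend(batch)
        return True

    def drain(self, n):
        # Simulate the consumer catching up on up to n buffered edits.
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]


class ReplicationSource:
    """Toy stand-in for a WAL shipper (not an HBase API)."""

    def __init__(self, wal_entries, batch_size=2):
        self.wal = list(wal_entries)
        self.batch_size = batch_size
        self.checkpoint = 0  # index of the first entry not yet acknowledged

    def ship(self, sink):
        """Ship batches until the WAL is drained or the sink pushes back.

        Returns the number of entries acknowledged during this call.
        """
        shipped = 0
        while self.checkpoint < len(self.wal):
            end = self.checkpoint + self.batch_size
            batch = self.wal[self.checkpoint:end]
            if not sink.replicate(batch):
                break  # keep the checkpoint where it is and retry later
            self.checkpoint += len(batch)
            shipped += len(batch)
        return shipped


source = ReplicationSource(["e1", "e2", "e3", "e4", "e5"], batch_size=2)
sink = BoundedSink(capacity=3)

first = source.ship(sink)    # second batch is refused, checkpoint stays put
drained = sink.drain(2)      # consumer catches up
second = source.ship(sink)   # shipping resumes from the checkpoint
```

Because a refused batch leaves the checkpoint untouched, the same edits
are offered again on the next call, which is the at-least-once behaviour
you would want from a change stream.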

On Fri, Dec 4, 2020 at 7:09 AM Leon Bein <[email protected]> wrote:

> Hi @all,
>
> we are currently developing an HBase source for Flink with the new API
> (FLIP-27
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface>).
> For this, we try to implement change data capturing on HBase.
>
> What are your opinions on which approach to pursue?
> Options that we found so far include
>
>   * using Coprocessors, e.g. RegionObserver
>   * reading the WAL e.g. with ProtobufLogReader (which is marked as
>     LimitedPrivate)
>   * and using ReplicationEndpoint (which also seems to be more internal).
>
> Best regards,
> Leon Bein
>
