cc: dev@
bcc: user@

We are also building a database change stream with HBase/Phoenix data as the source (for another event bus implementation), and we hope to open source it soon. We had similar discussions and ended up taking an approach similar to SEP <https://www.ngdata.com/the-hbase-side-effect-processor-and-hbase-replication-monitoring/>. For us it was critical to not have external service dependencies or third-party code running inside HBase code (which is the case if we take a coprocessor or custom ReplicationEndpoint approach).
Also, building something like this on top of the replication framework has a lot of benefits: it already handles checkpointing, has built-in backpressure handling (in case of downstream delays), and has been battle-tested for a while. There are some edge cases with ordering guarantees because of quirks in our replication design, but we plan to fix those too in the coming months.

On Fri, Dec 4, 2020 at 7:09 AM Leon Bein <leon.b...@student.hpi.de> wrote:

> Hi @all,
>
> we are currently developing an HBase source for Flink with the new API
> (FLIP-27
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface>).
> For this, we try to implement change data capturing on HBase.
>
> What are your opinions on which approach to pursue?
> Options that we found so far include
>
> * using Coprocessors, e.g. RegionObserver
> * reading the WAL e.g. with ProtobufLogReader (which is marked as
>   LimitedPrivate)
> * and using ReplicationEndpoint (which also seem to be more internal).
>
> Best regards,
> Leon Bein
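
P.S. To make the checkpointing and backpressure point in the reply above a bit more concrete, here is a rough, self-contained sketch of the general pattern. It is not HBase code and not our implementation: `ChangeStreamShipper`, `flaky_sink`, and the list standing in for a WAL are all hypothetical names made up for illustration. The idea it shows is that the read position only advances after the sink acknowledges a batch, so a slow or failing consumer holds the shipper on the same batch instead of losing edits.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChangeStreamShipper:
    """Hypothetical shipper: sends log entries to a sink in batches and
    remembers the position of the last acknowledged entry."""
    sink: Callable[[List[str]], bool]  # returns True when the batch was accepted
    batch_size: int = 2
    checkpoint: int = 0                # index of the next entry to ship

    def ship(self, log: List[str], max_attempts: int = 3) -> int:
        """Ship entries from the checkpoint onward.

        Returns the number of entries acknowledged during this call.
        Gives up after max_attempts consecutive rejections of one batch.
        """
        shipped = 0
        attempts = 0
        while self.checkpoint < len(log) and attempts < max_attempts:
            batch = log[self.checkpoint:self.checkpoint + self.batch_size]
            if self.sink(batch):
                # Checkpoint moves only after the sink has acknowledged.
                self.checkpoint += len(batch)
                shipped += len(batch)
                attempts = 0
            else:
                # Backpressure: do not read ahead, retry the same batch.
                attempts += 1
        return shipped


# A list standing in for the write-ahead log (assumption for illustration).
log = ["put r1", "put r2", "delete r1", "put r3"]
received: List[str] = []
calls = {"n": 0}


def flaky_sink(batch: List[str]) -> bool:
    """Hypothetical downstream consumer that rejects its second call."""
    calls["n"] += 1
    if calls["n"] == 2:
        return False
    received.extend(batch)
    return True


shipper = ChangeStreamShipper(sink=flaky_sink)
acknowledged = shipper.ship(log)
```

In this sketch the rejected batch is simply retried, so the sink still sees every entry exactly once and in order. A real deployment would also have to persist the checkpoint and cope with a sink that applied a batch but failed to acknowledge it, which is where at-least-once delivery and the ordering edge cases come in.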