The Debezium project [1] is working on building change data capture connectors 
for a variety of databases. MySQL is available now, MongoDB will be soon, and 
PostgreSQL and Oracle are next on our roadmap. 

One way Debezium and Infinispan can be used together is when Infinispan is 
used as a cache for data stored in a database. In this case, Debezium can 
capture the changes to the database and produce a stream of events; a separate 
process can consume these changes and evict the affected entries from an 
Infinispan cache.
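
A rough sketch of that eviction process might look like the following. The 
ChangeEvent class, the "table:key" cache key layout, and the use of a plain 
Map as a stand-in for the Infinispan cache are all assumptions made for 
illustration; none of them is an actual Debezium or Infinispan API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical change event; only the table and primary key are needed for eviction.
final class ChangeEvent {
    final String table;
    final String key;

    ChangeEvent(String table, String key) {
        this.table = table;
        this.key = key;
    }
}

// Consumes change events and evicts the matching cache entries.
class CacheEvictor {
    // Stand-in for an Infinispan cache; a plain Map keeps the sketch self-contained.
    private final Map<String, String> cache;

    CacheEvictor(Map<String, String> cache) {
        this.cache = cache;
    }

    // Returns how many entries were actually evicted.
    int apply(List<ChangeEvent> events) {
        int evicted = 0;
        for (ChangeEvent e : events) {
            // Assumed key layout "table:primaryKey"; removing the entry
            // forces the next read to reload the row from the database.
            if (cache.remove(e.table + ":" + e.key) != null) {
                evicted++;
            }
        }
        return evicted;
    }
}
```

Events for rows that were never cached are simply ignored, so the consumer 
does not need to know what the cache currently holds.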

If Infinispan is to be used as a data store, then it would be useful for 
Debezium to be able to capture those changes so other apps/services can consume 
the changes. First of all, does this make sense? Secondly, if it does, then 
Debezium would need an Infinispan connector, and it’s not clear to me how that 
connector might capture the changes from Infinispan.

Debezium typically monitors the log of transactions/changes that are committed 
to a database. Of course, how this works varies for each type of database. For 
example, MySQL internally produces a transaction log that contains information 
about every committed row change, and MySQL ensures that every committed change 
is included and that non-committed changes are excluded. The MySQL mechanism is 
actually part of the replication mechanism, so slaves update their internal 
state by reading the master’s log. The Debezium MySQL connector [2] simply 
reads the same log.

Infinispan has several mechanisms that may be useful:

* Interceptors - See [3]. This seems pretty straightforward and, IIUC, 
  provides access to all internal operations. However, it’s not clear to me 
  whether a single interceptor will see all the changes in a cluster (perhaps 
  in local and replicated modes) or only those changes that happen on that 
  particular node (in distributed mode). It’s also not clear whether the 
  interceptor is called within the context of the cache’s transaction; if a 
  failure happens at just the wrong time, could a change be made to the cache 
  but not be seen by the interceptor (or vice versa)?
* Cross-site replication - See [4][5]. A potential advantage of this mechanism 
  appears to be that it is defined (more) globally, and it appears to function 
  if the remote backup comes back online after being offline for a period of 
  time.
* State transfer - Is it possible to participate as a non-active member of the 
  cluster, and to effectively read all state transfer activities that occur 
  within the cluster?
* Cache store - Tie into the cache store mechanism, perhaps by wrapping an 
  existing cache store and sitting between the cache and the cache store.
* Monitor the cache store - Don’t monitor Infinispan at all, and instead 
  monitor the store in which Infinispan is storing entries. (This is probably 
  the least attractive, since some stores can’t be monitored, or because the 
  store is persisting an opaque binary value.)
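
To make the "wrap an existing cache store" option more concrete, here is a 
minimal sketch of the decorator idea. The SimpleStore interface, MapStore, and 
CapturingStore are hypothetical names invented for this example; this is not 
Infinispan's persistence SPI, and a real implementation would have to target 
whatever store interfaces Infinispan actually exposes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal store contract; NOT Infinispan's persistence SPI.
interface SimpleStore {
    String read(String key);
    void write(String key, String value);
    boolean delete(String key);
}

// Trivial in-memory store standing in for the "existing cache store".
class MapStore implements SimpleStore {
    private final Map<String, String> data = new HashMap<>();

    public String read(String key) { return data.get(key); }
    public void write(String key, String value) { data.put(key, value); }
    public boolean delete(String key) { return data.remove(key) != null; }
}

// Decorator that sits between the cache and the real store and
// records every successful write and delete that passes through it.
class CapturingStore implements SimpleStore {
    private final SimpleStore delegate;
    final List<String> captured = new ArrayList<>();

    CapturingStore(SimpleStore delegate) {
        this.delegate = delegate;
    }

    public String read(String key) {
        return delegate.read(key); // reads are not changes, so nothing is recorded
    }

    public void write(String key, String value) {
        delegate.write(key, value);
        captured.add("WRITE " + key + "=" + value);
    }

    public boolean delete(String key) {
        boolean removed = delegate.delete(key);
        if (removed) {
            captured.add("DELETE " + key); // only record deletes that removed something
        }
        return removed;
    }
}
```

Note that this sketch records the change only after the delegate call returns, 
so a crash between the two steps would lose the change. That is the same 
failure-window question raised for interceptors above.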

Are there other mechanisms that might be used?

There are a couple of important requirements for change data capture to be able 
to work correctly:

1. Upon initial connection, the CDC connector must be able to obtain a 
   snapshot of all existing data, followed by seeing all changes to data that 
   may have occurred since the snapshot was started. If the connector is 
   stopped/fails, upon restart it needs to be able to reconnect and either see 
   all changes that occurred since it last was capturing changes, or perform a 
   snapshot. (Performing a snapshot upon restart is very inefficient and 
   undesirable.) This works as follows: the CDC connector only records the 
   “offset” in the source’s sequence of events; what this “offset” entails 
   depends on the source. Upon restart, the connector can use this offset 
   information to coordinate with the source where it wants to start reading. 
   (In MySQL and PostgreSQL, every event includes the filename of the log and 
   position in that file. MongoDB includes in each event the monotonically 
   increasing timestamp of the transaction.)
2. No change can be missed, even when things go wrong and components crash.
3. When a new entry is added, the “after” state of the entity will be 
   included. When an entry is updated, the “after” state will be included in 
   the event; if possible, the event should also include the “before” state. 
   When an entry is removed, the “before” state should be included in the 
   event.
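
The offset and before/after requirements can be illustrated with a small 
sketch. It assumes the source exposes its changes as an ordered, append-only 
list and that the offset is simply an index into that list; LogEntry and 
ResumableReader are hypothetical names, and real sources (binlog file and 
position, timestamps, and so on) define their offsets differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical change record carrying the states described above.
final class LogEntry {
    final String key;
    final String before; // null when the entry was newly added
    final String after;  // null when the entry was removed

    LogEntry(String key, String before, String after) {
        this.key = key;
        this.before = before;
        this.after = after;
    }
}

// Reads an append-only log starting from a saved offset.
class ResumableReader {
    private int offset; // index of the next unread entry

    ResumableReader(int savedOffset) {
        this.offset = savedOffset;
    }

    // Returns up to max unread entries and advances the offset past them.
    List<LogEntry> poll(List<LogEntry> log, int max) {
        if (offset >= log.size() || max <= 0) {
            return Collections.emptyList();
        }
        int end = Math.min(log.size(), offset + max);
        List<LogEntry> batch = new ArrayList<>(log.subList(offset, end));
        offset = end;
        return batch;
    }

    // The value a connector would persist so it can resume after a restart.
    int offset() {
        return offset;
    }
}
```

A connector that persists offset() after each batch can be restarted with 
new ResumableReader(savedOffset) and continue without re-reading earlier 
entries or taking a new snapshot, provided the source still retains the log 
from that offset onward.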

Any thoughts or advice would be greatly appreciated.

Best regards,

Randall


[1] http://debezium.io
[2] http://debezium.io/docs/connectors/mysql/
[3] http://infinispan.org/docs/stable/user_guide/user_guide.html#_custom_interceptors_chapter
[4] http://infinispan.org/docs/stable/user_guide/user_guide.html#CrossSiteReplication
[5] https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Replication