Re: [infinispan-dev] Infinispan and change data capture

Emmanuel Bernard Thu, 15 Dec 2016 09:19:57 -0800

> On 15 Dec 2016, at 15:59, Gustavo Fernandes <gust...@infinispan.org> wrote:
> 
> 
> 
> On Thu, Dec 15, 2016 at 2:53 PM, Emmanuel Bernard <emman...@hibernate.org 
> <mailto:emman...@hibernate.org>> wrote:
> 
>> On 15 Dec 2016, at 11:18, Gustavo Fernandes <gust...@infinispan.org 
>> <mailto:gust...@infinispan.org>> wrote:
>> 
>> On Thu, Dec 15, 2016 at 9:54 AM, Emmanuel Bernard <emman...@hibernate.org 
>> <mailto:emman...@hibernate.org>> wrote:
>> The goal is as follows: collect all changes so that they can be pushed to 
>> Debezium and thus Kafka.
>> 
>> This does not require remembering all changes since the beginning of 
>> time in Infinispan. Just enough to:
>> - let Kafka catch up, assuming it is the bottleneck
>> - make sure a change that happened in Infinispan is not lost before it 
>> reaches Kafka (coordinator, owner, replicas dying)
>> 
>> The ability to read back history would then be handled by the Debezium / 
>> Kafka tail, not Infinispan itself.
>> 
>> 
>> Having an embedded Debezium connector pushing everything to Kafka sounds 
>> cool, but what impact would it have on the other stream consumers:
>> 
>> * Remote listeners, which are supported in several clients apart from Java
>> * Continuous Queries (the same)
>> * Spark Stream
>> * Other potential third-party stream processors: Apache Flink, Storm, etc.
>> 
>>  
> 
> Impact as in perf impact? Potential redesign impact? Or are you thinking of 
> another question?
> 
> 
> You mentioned that "The ability to read back history would then be handled by 
> the Debezium / Kafka tail, not Infinispan itself". My question
> was how the other consumers would get access to that history.


Yes, that’s an interesting point.

First off, what we are describing here is an ad-hoc model where we push 
changes to Debezium and then Kafka.
But the underlying temp queue mechanism I described in my Dec 9th email might 
be used to harden the code pushing changes to the consumers you describe, and 
it might even improve the continuous queries engine and the Spark DStream 
integration, I suppose.
Maybe we want a more generic mechanism relying on that temp queue system to 
plug in a list of consumers, and focus on Spark Stream, continuous queries and 
Debezium as a first set of “clients”.
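
To make the generic variant a bit more concrete, here is a minimal sketch of 
what a temp queue with pluggable consumers could look like. Everything in it 
is hypothetical: TempChangeQueue, Change and the consumer names are made up 
for illustration and are not Infinispan or Debezium APIs. It also assumes a 
single node and ignores replication of the queue itself. The only point is 
that a change stays buffered until every registered consumer has acknowledged 
it, and that a full buffer pushes back instead of dropping changes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a bounded, temporary change queue shared by several consumers. */
final class TempChangeQueue {

    /** One captured change; the sequence number is assigned by the queue. */
    static final class Change {
        final long seq;
        final String key;
        final String value;

        Change(long seq, String key, String value) {
            this.seq = seq;
            this.key = key;
            this.value = value;
        }
    }

    private final int capacity;
    private final Deque<Change> pending = new ArrayDeque<>();
    // Consumer name -> highest sequence number that consumer has acknowledged.
    private final Map<String, Long> acked = new HashMap<>();
    private long nextSeq = 1;

    TempChangeQueue(int capacity) {
        this.capacity = capacity;
    }

    /** Registers a consumer; it only sees changes offered after this call. */
    synchronized void register(String consumer) {
        acked.put(consumer, nextSeq - 1);
    }

    /** Buffers a change and returns its sequence number, or -1 if the buffer is full. */
    synchronized long offer(String key, String value) {
        if (pending.size() >= capacity) {
            return -1; // back-pressure: the slowest consumer has to catch up first
        }
        Change change = new Change(nextSeq++, key, value);
        pending.addLast(change);
        return change.seq;
    }

    /** Returns the changes this consumer has not acknowledged yet, oldest first. */
    synchronized List<Change> poll(String consumer) {
        long last = lastAcked(consumer);
        List<Change> out = new ArrayList<>();
        for (Change change : pending) {
            if (change.seq > last) {
                out.add(change);
            }
        }
        return out;
    }

    /** Marks everything up to seq as handled by this consumer, then trims the buffer. */
    synchronized void ack(String consumer, long seq) {
        acked.put(consumer, Math.max(lastAcked(consumer), seq));
        long slowest = Long.MAX_VALUE;
        for (long value : acked.values()) {
            slowest = Math.min(slowest, value);
        }
        // Only changes acknowledged by every registered consumer can be forgotten.
        while (!pending.isEmpty() && pending.peekFirst().seq <= slowest) {
            pending.removeFirst();
        }
    }

    synchronized int size() {
        return pending.size();
    }

    private long lastAcked(String consumer) {
        Long last = acked.get(consumer);
        if (last == null) {
            throw new IllegalArgumentException("unknown consumer: " + consumer);
        }
        return last;
    }
}
```

With "debezium" and "spark" both registered, a change acknowledged by only 
one of them stays in the buffer; it is dropped once the other one acknowledges 
it too. That keeps retention bounded by the slowest consumer rather than by 
the full history.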

As for the ability to read back history, I am happy to force consumers to go 
through a Kafka queue. As others pointed out, if we make Infinispan a durable 
queue system, we are making a different Infinispan from what it is today, and 
this is probably undesirable.

Emmanuel

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
