Hi,

I definitely agree with Pavel's and Evgenii's ideas and comments.

From my point of view, the proposal is not about Apache Ignite
features. The described functionality could be implemented outside of
the Apache Ignite project. Perhaps a Debezium connector or a WAL-G
module is the best candidate for the proposed CDC.

A CDC API would require additional support from the Apache Ignite
community, while the described cases don't need any public API. It is
a really good idea to implement the CDC tool as part of the Debezium or
WAL-G projects. For other cases there is WALIterator.

I also have some comments about IEP-59. I use `>` for quotes from the IEP.

> Many use-cases build on observation and processing changed records.

Yes. But there is a significant difference between Apache Ignite and
RDBMSs like MySQL and PostgreSQL. Apache Ignite is a middleware-class
product, so the required logic can be implemented as part of the
business logic, while an RDBMS doesn't provide such a possibility (you
could implement an optional module for CDC purposes, but *usually* you
can't implement CDC-like functionality as a stored procedure).

> Disadvantages of the CQ in described scenarios:
>
> CQ requires data to be sent over the network

In the normal case, the CDC tool will also send data over the network.
I doubt that hosting an Apache Ignite node on the same server as, for
example, the analytical storage (the consumer) is a good idea.

> CQ parts (filter, transformer) live inside server node JVM so issues in it 
> may affect server node stability.

It may or may not; it depends on the filter/transformer
implementation. Just implement these parts properly. Apache Ignite has
many dangerous pitfalls in user-supplied entities like filters,
transformers, listeners, etc. But we don't invent anything to protect
developers from blocking in all of them, except for failure handling
and blocked-thread detection, of course :) Many concurrency models are
based on similar concepts, and all of them are sensitive to blocking.

> Slow CQ listener leads to increasing of the memory consumption of the server 
> node.

Yes. But CQ can't come for free. Besides, this point really belongs
with the previous one.

> Fails of the CQ listener lead to the loss of the events.

What kind of failures? The CQ listener is the developer's responsibility.
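To make "implement these parts properly" a bit more concrete, here is a minimal plain-Java sketch of a listener that hands events off to a bounded buffer instead of processing them inline. It is NOT Ignite's continuous query listener interface: `onUpdated(String)` and the class name are hypothetical. The notifying thread never blocks and memory stays bounded, but when the consumer falls behind, events are dropped. Which of blocking, memory growth or loss to accept is exactly the developer's decision.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical listener that offloads events so the notifying thread never waits on user code. */
final class OffloadingListenerSketch {
    private final BlockingQueue<String> buffer;
    private final AtomicLong dropped = new AtomicLong();

    OffloadingListenerSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Invoked on the notifying thread: offer() returns immediately instead of blocking. */
    void onUpdated(String event) {
        if (!buffer.offer(event))
            dropped.incrementAndGet(); // Buffer full: this sketch chooses loss over blocking or unbounded memory.
    }

    /** Invoked by the consumer's own thread at its own pace; returns null when nothing is pending. */
    String poll() {
        return buffer.poll();
    }

    long dropped() {
        return dropped.get();
    }
}
```

Replacing `offer` with `put` would trade loss for blocking the notifying thread, and an unbounded queue would trade it for growing memory consumption.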

> The convenient solution should be:
>
> Independence from the server node process (JVM) - issues and failures of the 
> consumer shouldn't lead to server node instability.

It's a very good point, and it is the best argument for implementing
CDC as an external tool.
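For illustration, an external tool needs nothing from the server JVM to find new work: it can simply scan the WAL archive directory. This is a minimal sketch, assuming archived segments are named as a zero-padded 16-digit index followed by `.wal` (check this against the actual archive layout); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical external scanner: finds archived WAL segments newer than the last processed one. */
final class ArchiveScannerSketch {
    // Assumption: archived segment files are named <16-digit index>.wal; anything else is ignored.
    private static final Pattern SEGMENT = Pattern.compile("(\\d{16})\\.wal");

    static List<Path> newSegments(Path archiveDir, long lastProcessed) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(archiveDir)) {
            for (Path f : files) {
                Matcher m = SEGMENT.matcher(f.getFileName().toString());
                if (m.matches() && Long.parseLong(m.group(1)) > lastProcessed)
                    result.add(f);
            }
        }
        // Zero-padded names sort lexicographically in the same order as their indexes.
        result.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));
        return result;
    }
}
```

Polling like this (or a `WatchService`) runs in its own process, so a slow or crashing consumer only affects the tool, not the server node.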

> Notification guarantees and failover - i.e. track and save a pointer to the 
> last consumed record. Continue notification from this pointer in case of 
> restart.

"Notification" is a superfluous word here. There is no need for
notifications; all you need is WAL segments.
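A minimal sketch of the failover part, under stated assumptions: each inner list stands in for the records a WAL iterator would return for one archived segment, and the "pointer" is a hypothetical (segment index, record index) pair saved to a local state file. None of this is a real Ignite API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical CDC consumer loop that remembers the last consumed record across restarts. */
final class CdcPointerSketch {
    private final Path stateFile;

    CdcPointerSketch(Path stateFile) {
        this.stateFile = stateFile;
    }

    /** Returns {segment, record} of the last consumed record, or null if nothing was consumed yet. */
    private long[] loadPointer() throws IOException {
        if (!Files.exists(stateFile))
            return null;
        String[] parts = Files.readString(stateFile, StandardCharsets.UTF_8).trim().split(":");
        return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
    }

    /** Writes a temp file, then renames it over the state file so readers never see a partial pointer. */
    private void savePointer(long segment, long record) throws IOException {
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        Files.writeString(tmp, segment + ":" + record, StandardCharsets.UTF_8);
        Files.move(tmp, stateFile, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Delivers every record after the saved pointer and returns them in order. */
    List<String> consume(List<List<String>> segments) throws IOException {
        long[] last = loadPointer();
        List<String> delivered = new ArrayList<>();
        for (int s = 0; s < segments.size(); s++) {
            List<String> records = segments.get(s);
            for (int r = 0; r < records.size(); r++) {
                boolean seen = last != null && (s < last[0] || (s == last[0] && r <= last[1]));
                if (seen)
                    continue; // Already consumed before the restart.
                delivered.add(records.get(r)); // Stand-in for handing the record to the consumer.
                savePointer(s, r);
            }
        }
        return delivered;
    }
}
```

Saving the pointer after delivering each record gives at-least-once delivery: if the process dies between the two steps, that record is delivered again after restart.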


On Wed, Oct 14, 2020 at 4:05 PM Nikolay Izhikov wrote:
>
> Hello, Evgeni.
>
> > It seems like this solution will have an unpredictable delay for 
> > synchronization for handling events.
>
> It’s true that the CDC solution doesn’t have strict bounds on notification
> delay because of its asynchronous nature.
> But I assume that we will introduce a WAL rollout timeout for CDC cases.
> Please take a look at the ticket [1].
>
> The same approach is used by Oracle and other databases that implement CDC.
>
> Anyway, I treat the notification delay and the separation of an event from
> its consumption as an advantage of CDC, not a downside :)
>
> > Why can't we just implement a Debezium connector for Ignite, for example?
>
> I think we can.
> But, AFAIK, Debezium connectors developed for other databases use CDC
> implementations similar to the proposed one.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13582?src=confmacro
>
>
> > On Oct 14, 2020, at 15:36, Evgenii Zhuravlev wrote:
> >
> > Hi,
> >
> >> On the segment archiving, utility iterates it using existing WALIterator
> >> Wait and respond to some specific events or data changes.
> > It seems like this solution will have an unpredictable delay for
> > synchronization for handling events.
> >
> > Why can't we just implement a Debezium connector for Ignite, for example?
> > https://debezium.io/documentation/reference/1.3/index.html. It is a pretty
> > popular product that uses Kafka underneath.
> >
> > Evgenii
> >
> >
> > On Wed, Oct 14, 2020 at 05:00, Pavel Kovalenko wrote:
> >
> >> This tool can also be used to store snapshots in an external warehouse.
> >>
> >>
> >> On Wed, Oct 14, 2020 at 14:57, Pavel Kovalenko wrote:
> >>
> >>> Hi Nikolay,
> >>>
> >>> The idea is good. But what do you think about integrating these ideas
> >>> into the WAL-G project?
> >>> https://github.com/wal-g/wal-g
> >>> It's a well-known tool that is already used to stream WAL for PostgreSQL,
> >>> MySQL, and MongoDB.
> >>> The advantages are integration with S3, GCP, Azure out of the box,
> >>> encryption, and compression.
> >>>
> >>>
> >>> On Wed, Oct 14, 2020 at 14:21, Nikolay Izhikov wrote:
> >>>
> >>>> Hello, Igniters.
> >>>>
> >>>> I want to start a discussion of the new feature [1]
> >>>>
> >>>> CDC - capture data change. The feature allows the consumer to receive
> >>>> online notifications about data record changes.
> >>>>
> >>>> It can be used in the following scenarios:
> >>>>        * Export data into some warehouse, full-text search, or
> >>>> distributed log system.
> >>>>        * Online statistics and analytics.
> >>>>        * Wait and respond to some specific events or data changes.
> >>>>
> >>>> I propose to implement a new IgniteCDC application as follows:
> >>>>        * Runs on the server node host.
> >>>>        * Watches for the appearance of the WAL archive segments.
> >>>>        * Iterates each segment using the existing WALIterator and
> >>>> notifies the consumer of each record from the segment.
> >>>>
> >>>> IgniteCDC features:
> >>>>        * Independence from the server node process (JVM) - issues and
> >>>> failures of the consumer will not lead to server node instability.
> >>>>        * Notification guarantees and failover - i.e. CDC tracks and saves
> >>>> the pointer to the last consumed record and continues notification from
> >>>> this pointer in case of restart.
> >>>>        * Resilience for the consumer - it's not an issue when a consumer
> >>>> temporarily consumes slower than data appear.
> >>>>
> >>>> WDYT?
> >>>>
> >>>> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-59+CDC+-+Capture+Data+Change
> >>>
> >>>
> >>
>
