Here's a deck of some proposed additions, discussed at one of the NGCC
sessions last fall:

https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf



On Tue, Jan 30, 2018 at 5:10 PM, Andrew Prudhomme <a...@yelp.com> wrote:

> Hi all,
>
> We are currently designing a system that allows our Cassandra clusters to
> produce a stream of data updates. Naturally, we have been evaluating if CDC
> can aid in this endeavor. We have found several challenges in using CDC for
> this purpose.
>
> CDC provides only the mutation as opposed to the full column value, which
> tends to be of limited use for us. Applications might want to know the full
> column value, without having to issue a read back. We also see value in
> being able to publish the full column value both before and after the
> update. This is especially true when deleting a column since this stream
> may be joined with others, or consumers may require other fields to
> properly process the delete.
>
> Additionally, there is some difficulty with processing CDC itself, such as:
> - Updates not being immediately available (addressed by CASSANDRA-12148)
> - Each node providing an independent stream of updates that must be
> unified and deduplicated
>
> Our question is, what is the vision for CDC development? The current
> implementation could work for some use cases, but is a long way from a general
> streaming solution. I understand that the nature of Cassandra makes this
> quite complicated, but are there any thoughts or desires on the future
> direction of CDC?
>
> Thanks
>
>
