Hi Rex,
yes, you can ingest CDC directly into Flink since 1.11.0 [1], but afaik only
through the Table API/SQL currently (which you seem to be using most of the
time anyway). I'd recommend using 1.11.1+ (some bugfixes) or even 1.12.0+
(many new useful features [2]). You can also check the main documentation [3].
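For reference, a minimal sketch of such a CDC source in Flink 1.12 SQL; the
topic, server, and column names here are made up:

  -- Reads a Debezium CDC topic as a changelog table (hypothetical names).
  CREATE TABLE users (
    user_id BIGINT,
    state   STRING
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'dbserver.public.users',
    'properties.bootstrap.servers' = 'kafka:9092',
    'properties.group.id' = 'users-reader',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'debezium-json'  -- interprets Debezium envelopes as INSERT/UPDATE/DELETE
  );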
Thanks Arvid,
I think my confusion lies in misinterpreting the meaning of CDC. We
basically don't want CDC; we just use it to get data into a compacted Kafka
topic where we hold the current state of the world for multiple consumers
to read from. You have described pretty thoroughly where we want to
> We are rereading the topics; at any time we might want a completely
different materialized view for a different web service for some new
application feature. Other jobs / new jobs need to read all the up-to-date
rows from the databases.
> I still don't see how this is the case if everything just
Hi Arvid,
> If you are not rereading the topics, why do you compact them?
We are rereading the topics; at any time we might want a completely
different materialized view for a different web service for some new
application feature. Other jobs / new jobs need to read all the up-to-date
rows from the databases.
Hi Rex,
Your initial question was about the impact of compaction on your CDC
application logic. I have been (unsuccessfully) trying to tell you that you
do not need compaction and that it is counterproductive.
If you are not rereading the topics, why do you compact them? It's lost
compute time and I/O.
Hi Arvid,
I really appreciate the thorough response, but I don't think this
contradicts our use case. In servicing web applications, we're doing nothing
more than taking data from the giant databases we use and performing joins
and denormalizing aggregations, strictly for performance reasons (joining
across a lot of tables at read time is too slow).
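Concretely, the kind of denormalizing join/agg meant here looks something
like this (table names made up for illustration):

  -- Hypothetical denormalization: precompute a per-user aggregate so the
  -- web service reads one flat row instead of joining at request time.
  SELECT u.user_id,
         u.state,
         COUNT(o.order_id) AS order_count
  FROM users u
  LEFT JOIN orders o ON o.user_id = u.user_id
  GROUP BY u.user_id, u.state;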
Hi Rex,
imho log compaction and CDC for historic processes are incompatible at a
conceptual level. Let's take this example:
topic: party membership
+(1, Dem, 2000)
-(1, Dem, 2009)
+(1, Gop, 2009)
Where 1 is the id of a real person.
Now, let's consider you want to count memberships retroactively each year.
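To make the failure mode concrete, here is a hedged sketch in Flink SQL; the
membership_events table and its columns are my assumptions, with 2005 picked
as an example year:

  -- Assume the raw +/- records land in an append-only table
  -- membership_events(person_id, party, op, event_year) (hypothetical).
  -- "Who was a member in 2005?" needs the full history: person 1 counts
  -- as Dem only because +(1, Dem, 2000) exists and -(1, Dem, 2009) came
  -- later. Compaction keyed on person would keep only +(1, Gop, 2009)
  -- and silently drop person 1 from the 2005 Dem count.
  SELECT e.party, COUNT(*) AS members_in_2005
  FROM membership_events e
  LEFT JOIN membership_events r
    ON  r.person_id = e.person_id
    AND r.party = e.party
    AND r.op = '-'
    AND r.event_year <= 2005
  WHERE e.op = '+'
    AND e.event_year <= 2005
    AND r.person_id IS NULL
  GROUP BY e.party;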
Digging around, it looks like Upsert Kafka, which requires a primary key,
will actually do what I want and works with compaction, but it doesn't look
compatible with the Debezium format? Is this on the roadmap?
In the meantime, we're considering consuming from Debezium Kafka (still
compacted) and then writing the current rows back out through Upsert Kafka,
along the lines of the sketch below.
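A minimal sketch of that interim plan in Flink 1.12 SQL (all names
hypothetical, with users being the CDC source table from earlier); note that
upsert-kafka takes plain key/value formats such as json, not debezium-json:

  -- Sink whose topic is meant to be log-compacted. upsert-kafka requires
  -- a PRIMARY KEY and writes one upsert record per key change.
  CREATE TABLE users_compacted (
    user_id BIGINT,
    state   STRING,
    PRIMARY KEY (user_id) NOT ENFORCED
  ) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'users-compacted',
    'properties.bootstrap.servers' = 'kafka:9092',
    'key.format' = 'json',
    'value.format' = 'json'
  );

  -- Continuously maintains the latest row per user_id from the CDC table.
  INSERT INTO users_compacted
  SELECT user_id, state FROM users;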
Does this also imply that it's not safe to compact the initial topic where
data is coming from Debezium? I'd think that Flink's Kafka source would
emit retractions on any existing data with a primary key as new data with
the same pk arrives (in our case all data has primary keys). I guess that
goes
Just to clarify, intermediate topics should in most cases not be compacted,
for exactly these reasons, if your application depends on all intermediate
data. For the final topic, it makes sense. If you also consume intermediate
topics for the web application, one solution is to split them into two
topics (e.g. a full changelog topic plus a compacted one), as sketched below.
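A minimal sketch of that split in Flink 1.12 SQL; everything here (topics,
tables, and the enriched_users view they read from) is made up for
illustration:

  -- 1) Full changelog for downstream Flink jobs; leave this topic uncompacted.
  CREATE TABLE results_changelog (
    user_id BIGINT,
    state   STRING
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'results',
    'properties.bootstrap.servers' = 'kafka:9092',
    'format' = 'debezium-json'
  );

  -- 2) Latest row per key for the web application; this topic can be compacted.
  CREATE TABLE results_latest (
    user_id BIGINT,
    state   STRING,
    PRIMARY KEY (user_id) NOT ENFORCED
  ) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'results-compacted',
    'properties.bootstrap.servers' = 'kafka:9092',
    'key.format' = 'json',
    'value.format' = 'json'
  );

  -- One INSERT per sink (a StatementSet in the Table API can run both in one job).
  INSERT INTO results_changelog SELECT user_id, state FROM enriched_users;
  INSERT INTO results_latest    SELECT user_id, state FROM enriched_users;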
All of our Flink jobs are (currently) used for web applications at the end
of the day. We see a lot of latency spikes from record amplification and we
were at first hoping we could pass intermediate results through Kafka and
compact them to lower the record amplification, but then it hit me that
this would compact away exactly the intermediate records those jobs depend on.
Jan's response is correct, but I'd like to emphasize the impact on a Flink
application.
If the compaction happens before the data arrives in Flink, the
intermediate updates are lost and just the final result appears.
Also, if you restart your Flink application and reprocess older data, it
will naturally see only the already-compacted view of that data.
Hi Rex,
If I understand correctly, you are concerned about the behavior of the Kafka
source in the case of a compacted topic, right? If that is the case, then
this is not directly related to Flink; Flink will expose the behavior
defined by Kafka. You can read about it for instance here [1]. TL;DR - you
are guaranteed to see at least the latest record for each key, but
intermediate updates may be compacted away.
Apologies, forgot to finish. If the Kafka source performs its own
retractions of old data on key (user_id) for every append it receives, it
should resolve this discrepancy, I think.
Again, is this true? Anything else I'm missing?
Thanks!
On Tue, Feb 23, 2021 at 6:12 PM Rex Fenley wrote:
> Hi,
Hi,
I'm concerned about the impact of Kafka's compaction when sending data
between running Flink jobs.
For example, one job produces retract stream records in a sequence like:
(false, (user_id: 1, state: "california")) -- retract
(true, (user_id: 1, state: "ohio")) -- append
Which is consumed by Kafka
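For context, a sketch (made-up names) of the kind of Flink SQL query that
emits exactly that retract pattern:

  -- Tracks the latest state per user. When a new event for user 1 arrives,
  -- Flink first emits (false, (1, "california")) to retract the old row,
  -- then (true, (1, "ohio")) for the new one.
  SELECT user_id, LAST_VALUE(state) AS state
  FROM user_events
  GROUP BY user_id;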