Gwen,

For simplicity, the example I gave in the gist is for a single table with a 
single partition. The salient point is that even for a single topic with one 
partition, without the feature there is no guarantee that one will be able to 
restore a particular checkpoint, as the record at the offset indicated by that 
checkpoint may have been compacted away.

The practical reality is that we are trying to restore the state of a database 
with nearly 1000 tables, each of which has 8 partitions. In this real case there 
are 8000 offsets indicated in each checkpoint. If even a single one of those 
8000 is compacted away, the checkpointed state cannot be reconstructed.

Additionally, we don't really intend to have the consumers of the table topics 
try to keep current. Rather, they will occasionally (say, at 1 AM each day) try 
to build the state of the database as of a recent checkpoint (say, the one from 
midnight). Suppose it takes a while (tens of minutes to hours) to read every 
partition of every table topic up to its target offset from the midnight 
checkpoint. By the time all the consumers have arrived at their designated 
offsets, one of them may have had its target offset compacted away. We would 
then need to select a somewhat later target checkpoint, with its own offsets 
for each topic and partition. How much later? It might well be around the tens 
of minutes to hours it took to read through to the offsets of the original 
target checkpoint, as the compaction that foiled us may have occurred just 
before we reached the goal.
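
To make the restore procedure concrete, the per-partition read loop we have in 
mind is roughly the following. This is only a minimal sketch against the Java 
KafkaConsumer; the topic names, target offsets, and the applyToLocalState step 
are hypothetical placeholders, not our actual restore job.

    import java.time.Duration;
    import java.util.*;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    public class CheckpointRestore {

        // Placeholder: the per-partition target offsets recorded in the midnight checkpoint.
        static Map<TopicPartition, Long> checkpointOffsets() {
            Map<TopicPartition, Long> targets = new HashMap<>();
            targets.put(new TopicPartition("cdc.accounts", 0), 123_456L);
            targets.put(new TopicPartition("cdc.accounts", 1), 120_011L);
            // ... one entry per partition of every table topic (8000 in our case)
            return targets;
        }

        // Placeholder: apply one CDC record (upsert, or delete on a null value) to the restored copy.
        static void applyToLocalState(ConsumerRecord<byte[], byte[]> record) { }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

            Map<TopicPartition, Long> targets = checkpointOffsets();

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.assign(targets.keySet());
                consumer.seekToBeginning(targets.keySet());

                Set<TopicPartition> remaining = new HashSet<>(targets.keySet());
                while (!remaining.isEmpty()) {
                    for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
                        TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                        long target = targets.get(tp);
                        if (record.offset() <= target) {
                            applyToLocalState(record);
                        }
                        // ">=" rather than "==" because the record at the exact target
                        // offset may itself have been compacted away.
                        if (record.offset() >= target && remaining.remove(tp)) {
                            consumer.pause(Collections.singleton(tp));
                        }
                    }
                }
            }
        }
    }

The loop itself has no good way to tell whether a record it needed at or before 
one of the target offsets was compacted away while it was running, which is 
exactly why we would have to fall back to a later checkpoint and start over.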

Really, the issue is that without the feature, while we could eventually 
restore _some_ consistent state, we couldn't be assured of being able to 
restore any particular (recent) one. My comment about never being assured of 
the process terminating simply acknowledges the perhaps small but nonetheless 
finite possibility that the process of chasing checkpoints, looking for one for 
which no partition has yet had its target offset compacted away, could continue 
indefinitely. There is really no condition under which one could be absolutely 
guaranteed that this process would terminate.

The feature addresses this by providing a guarantee that _any_ checkpoint can 
be reconstructed as long as it is within the compaction lag. I would love to be 
convinced that I am in error, but short of that, I frankly would never turn on 
compaction for a CDC use case without it.

As for reducing the number of parameters: I personally see only 
min.compaction.lag.ms as being truly essential. Even the existing ratio setting 
(min.cleanable.dirty.ratio) is secondary in my mind.
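
For what it's worth, with the feature in place I would expect the topic setup 
to stay simple. The following is just a sketch using the Java AdminClient 
(which is newer than the clients discussed in this thread); the topic name, 
partition count, replication factor, and the 24-hour lag are illustrative, not 
a recommendation.

    import java.util.*;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateCompactedTableTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            Map<String, String> configs = new HashMap<>();
            configs.put("cleanup.policy", "compact");
            // Keep at least 24 hours of history uncompacted, so any checkpoint
            // taken within the last day can still be reconstructed.
            configs.put("min.compaction.lag.ms", String.valueOf(24L * 60 * 60 * 1000));

            // Topic name, partition count, and replication factor are illustrative.
            NewTopic topic = new NewTopic("cdc.accounts", 8, (short) 3).configs(configs);

            try (AdminClient admin = AdminClient.create(props)) {
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }

With a setting like that, a restore job only has to confirm that its target 
checkpoint is more recent than min.compaction.lag.ms ago before it starts 
reading.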

Eric

> On May 16, 2016, at 6:42 PM, Gwen Shapira <g...@confluent.io> wrote:
> 
> Hi Eric,
> 
> Thank you for submitting this improvement suggestion.
> 
> Do you mind clarifying the use-case for me?
> 
> Looking at your gist: https://gist.github.com/ewasserman/f8c892c2e7a9cf26ee46
> 
> If my consumer started reading all the CDC topics from the very
> beginning in which they were created, without ever stopping, it is
> obviously guaranteed to see every single consistent state of the
> database.
> If my consumer joined late (let's say after Tq got clobbered by Tr) it
> will get a mixed state, but if it will continue listening on those
> topics, always following the logs to their end, it is guaranteed to
> see a consistent state as soon as a new transaction commits. Am I missing
> anything?
> 
> Basically, I do not understand why you claim: "However, to recover all
> the tables at the same checkpoint, with each independently compacting,
> one may need to move to an even more recent checkpoint when a
> different table had the same read issue with the new checkpoint. Thus
> one could never be assured of this process terminating."
> 
> I mean, it is true that you need to continuously read forward in order
> to get to a consistent state, but why can't you be assured of getting
> there?
> 
> We are doing something very similar in KafkaConnect, where we need a
> consistent view of our configuration. We make sure that if the current
> state is inconsistent (i.e., there is data that is not "committed"
> yet), we continue reading to the log end until we get to a consistent
> state.
> 
> I am not convinced the new functionality is necessary, or even helpful.
> 
> Gwen
> 
> On Mon, May 16, 2016 at 4:07 PM, Eric Wasserman
> <eric.wasser...@gmail.com> wrote:
>> I would like to begin discussion on KIP-58
>> 
>> The KIP is here:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
>> 
>> Jira: https://issues.apache.org/jira/browse/KAFKA-1981
>> 
>> Pull Request: https://github.com/apache/kafka/pull/1168
>> 
>> Thanks,
>> 
>> Eric
