I see what you mean, Eric.

I was unclear on the specifics of your architecture. It sounds like
you have a table somewhere that maps checkpoints to lists of
<topicPartition, offset>.
In that case it is indeed useful to know that if the checkpoint was
written N ms ago, you will be able to find the exact offsets by
looking at the log.
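
Something like this, I'm guessing (a rough sketch only; checkpointTable,
applyToState, id, and props are made-up names, and I'm assuming the new
consumer API):

    import java.util.*;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    // Replay each partition from the start of the compacted log up to the
    // offset recorded in the checkpoint, applying records to the state.
    Map<TopicPartition, Long> targets = checkpointTable.offsetsFor(id); // hypothetical
    KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
    consumer.assign(targets.keySet());
    consumer.seekToBeginning(targets.keySet());
    Set<TopicPartition> remaining = new HashSet<>(targets.keySet());
    while (!remaining.isEmpty()) {
        for (ConsumerRecord<byte[], byte[]> r : consumer.poll(1000)) {
            TopicPartition tp = new TopicPartition(r.topic(), r.partition());
            long target = targets.get(tp);
            if (r.offset() <= target) applyToState(r);    // hypothetical
            if (r.offset() >= target) remaining.remove(tp);
        }
    }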

Reading ahead won't really help in that case, since it sounds like the
state is too large to maintain in memory while reading ahead to a
future checkpoint.
(Different from Jay's abstract case in that regard).

Gwen


On Mon, May 16, 2016 at 9:21 PM, Eric Wasserman
<eric.wasser...@gmail.com> wrote:
> Gwen,
>
> For simplicity, the example I gave in the gist is for a single table with a 
> single partition. The salient point is that even for a single topic with one 
> partition, without the feature there is no guarantee that one will be able 
> to restore a particular checkpoint, as the offset indicated by that 
> checkpoint may have been compacted away.
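>
> You can even observe this directly with the new consumer API. A sketch,
> with consumer, tp, and checkpointOffset assumed already set up; a fetch
> positioned at a compacted-away offset silently returns the next surviving
> record:
>
>     consumer.seek(tp, checkpointOffset);
>     List<ConsumerRecord<byte[], byte[]>> recs = consumer.poll(1000).records(tp);
>     boolean compactedAway = !recs.isEmpty()
>         && recs.get(0).offset() > checkpointOffset; // exact record is gone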
>
> The practical reality is we are trying to restore the state of a database 
> with nearly 1000 tables each of which has 8 partitions. In this real case 
> there are 8000 offsets indicated in each checkpoint. If even a single one of 
> those 8000 is compacted the checkpointed state cannot be reconstructed.
>
> Additionally, we don't really intend to have the consumers of the table 
> topics try to keep current. Rather they will occasionally (say at 1AM each 
> day) try to build the state of the database at a recent checkpoint (say from 
> midnight). Suppose this takes a while (tens of minutes to hours): reading 
> all the partitions of all the table topics, each up to its target offset in 
> the midnight checkpoint. By the time all the consumers have arrived at their 
> designated offsets, one of them may find that its target offset has been 
> compacted away. We would then need to select a new target checkpoint, with 
> its own offsets for each topic and partition, that is a bit later. How much 
> later? It might well be the same tens of minutes to hours it took to read 
> through to the original target's offsets, since the compaction that foiled 
> us may have occurred just before we reached the goal.
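>
> In pseudo-code, the procedure we'd be forced into looks roughly like this
> (every name here is made up; the point is that nothing bounds the loop):
>
>     Checkpoint target = latestCheckpointBefore(midnight);  // hypothetical
>     while (true) {
>         try {
>             restoreAllPartitionsTo(target); // hours of reading; hypothetical
>             break;                          // success: consistent state
>         } catch (OffsetCompactedException e) {    // hypothetical signal
>             target = nextCheckpointAfter(target); // chase a later one
>         }
>     }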
>
> Really the issue is that without the feature, while we could eventually 
> restore _some_ consistent state, we couldn't be assured of being able to 
> restore any particular (recent) one. My comment about never being assured of 
> the process terminating just acknowledges the perhaps small but nonetheless 
> finite possibility that chasing checkpoints, looking for one for which no 
> partition has yet had its target offset compacted away, could continue 
> indefinitely. There is no condition under which one could be absolutely 
> guaranteed that this process would terminate.
>
> The feature addresses this by providing a guarantee that _any_ checkpoint can 
> be reconstructed as long as it is within the compaction lag. I would love to 
> be convinced that I am in error but short of that I frankly would never turn 
> on compaction for a CDC use case without it.
>
> As to reducing the number of parameters: I personally see only 
> min.compaction.lag.ms as truly essential. Even the existing ratio setting is 
> secondary in my mind.
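>
> For concreteness, assuming the KIP lands as proposed, the per-topic
> override would be a single setting, e.g.:
>
>     # never compact records younger than 24 hours; any checkpoint taken
>     # within the last day stays restorable
>     min.compaction.lag.ms=86400000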
>
> Eric
>
>> On May 16, 2016, at 6:42 PM, Gwen Shapira <g...@confluent.io> wrote:
>>
>> Hi Eric,
>>
>> Thank you for submitting this improvement suggestion.
>>
>> Do you mind clarifying the use-case for me?
>>
>> Looking at your gist: https://gist.github.com/ewasserman/f8c892c2e7a9cf26ee46
>>
>> If my consumer started reading all the CDC topics from the very beginning, 
>> when they were created, and never stopped, it is obviously guaranteed to 
>> see every single consistent state of the database.
>> If my consumer joined late (let's say after Tq got clobbered by Tr), it 
>> will get a mixed state, but if it continues listening on those topics, 
>> always following the logs to their end, it is guaranteed to see a 
>> consistent state as soon as a new transaction commits. Am I missing 
>> anything?
>>
>> Basically, I do not understand why you claim: "However, to recover all
>> the tables at the same checkpoint, with each independently compacting,
>> one may need to move to an even more recent checkpoint when a
>> different table had the same read issue with the new checkpoint. Thus
>> one could never be assured of this process terminating."
>>
>> I mean, it is true that you need to continuously read forward in order
>> to get to a consistent state, but why can't you be assured of getting
>> there?
>>
>> We are doing something very similar in Kafka Connect, where we need a 
>> consistent view of our configuration. If the current state is inconsistent 
>> (i.e., there is data that has not been "committed" yet), we continue 
>> reading to the log end until we reach a consistent state.
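>>
>> In outline (not the actual Connect code, just its shape; caughtUpTo,
>> apply, and stateIsConsistent are made-up helpers):
>>
>>     while (true) {
>>         Map<TopicPartition, Long> end = consumer.endOffsets(partitions);
>>         while (!caughtUpTo(consumer, end)) {      // hypothetical
>>             for (ConsumerRecord<byte[], byte[]> r : consumer.poll(1000))
>>                 apply(r);                         // hypothetical
>>         }
>>         if (stateIsConsistent()) break;           // hypothetical
>>     }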
>>
>> I am not convinced the new functionality is necessary, or even helpful.
>>
>> Gwen
>>
>> On Mon, May 16, 2016 at 4:07 PM, Eric Wasserman
>> <eric.wasser...@gmail.com> wrote:
>>> I would like to begin discussion on KIP-58
>>>
>>> The KIP is here:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
>>>
>>> Jira: https://issues.apache.org/jira/browse/KAFKA-1981
>>>
>>> Pull Request: https://github.com/apache/kafka/pull/1168
>>>
>>> Thanks,
>>>
>>> Eric
>
