[ https://issues.apache.org/jira/browse/KAFKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110225#comment-14110225 ]

Jim Hoagland commented on KAFKA-1489:
-------------------------------------

> The per-topic config right now basically provides guarantees for each topic / 
> tenant using Kafka, although it may be tedious, it sounds
> like the right approach to me for hosting multiple applications.

I agree.  Is there anything we can do to make it easier to re-use most of the 
same settings across a set of topics (e.g. the set of topics belonging to a 
single tenant)?  (Of course there is much more that is needed in Kafka to 
really have secure multi-tenancy.)
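
A minimal sketch of what sharing one set of retention settings across a 
tenant's topics could look like; the applyTopicConfig helper, the tenant name, 
and the topic list are all hypothetical, and how such overrides would actually 
be applied is exactly the open question:

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TenantRetentionExample {

    // Hypothetical helper: however per-topic overrides end up being applied,
    // the point is that a single shared map drives all of a tenant's topics.
    static void applyTopicConfig(String topic, Map<String, String> config) {
        System.out.println("would set " + config + " on " + topic);
    }

    public static void main(String[] args) {
        // Shared retention settings for everything owned by one tenant.
        Map<String, String> tenantRetention = new HashMap<>();
        tenantRetention.put("retention.bytes", "1073741824"); // 1 GiB per partition
        tenantRetention.put("retention.ms", "86400000");      // 1 day

        // Hypothetical list of the tenant's topics.
        List<String> tenantTopics = Arrays.asList(
                "tenantA.orders", "tenantA.clicks", "tenantA.audit");

        for (String topic : tenantTopics) {
            applyTopicConfig(topic, tenantRetention);
        }
    }
}
{code}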

>  setting log.retention.size to "-1" indicating we do not want to retain any 
> of its data

If you want to go a bit further on that, you could ask the user to set a 
retention priority; topics with the lowest priority would be cleaned up first.  
However, I'm not sure we want to add the complexity of prioritized deletes in 
the first version.
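
As a rough sketch of that pruning order, assuming a hypothetical per-topic 
"retention.priority" value (no such setting exists today), the emergency 
cleanup would simply consider the lowest-priority topics first:

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PriorityPruneExample {

    // Hypothetical view of a topic carrying an assumed "retention.priority" value.
    record TopicInfo(String name, int retentionPriority, long bytesOnDisk) {}

    public static void main(String[] args) {
        List<TopicInfo> topics = Arrays.asList(
                new TopicInfo("tenantA.audit", 10, 5_000_000_000L),
                new TopicInfo("tenantB.clicks", 1, 9_000_000_000L),
                new TopicInfo("tenantA.orders", 5, 2_000_000_000L));

        // Lowest priority first: these are the candidates an emergency
        // cleanup would prune before touching higher-priority topics.
        topics.stream()
              .sorted(Comparator.comparingInt(TopicInfo::retentionPriority))
              .forEach(t -> System.out.println("prune candidate: " + t.name()));
    }
}
{code}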

> Another issue is that, like Jay said, there is a risk that different nodes 
> may have different sized logs retained for the same partition due 
> to the retention policy, and hence when there is a leader change the consumer 
> clients may get "out-of-range" exception. We also need 
> to be careful handling that case.

Good point.  If we do get into a condition like that, we should return an 
out-of-range error to indicate that some messages may have been skipped.  If 
feasible we should try to coordinate the pruning of a topic across its 
different partitions (at least amongst those whose brokers are online).  Given 
this, I think we should definitely try to have this logic invoked proactively 
rather than reactively, to allow a bit more relaxed timeframe.  Maybe the 
check could be part of the retention cleanup process, though we would probably 
want it to run more frequently than normal retention cleanup.  In any case we 
should make sure that the emergency drop and normal retention cleanup don't 
somehow confuse each other by running at the same time.
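
To make that interplay concrete, here is a rough sketch of how an emergency 
size check might coexist with normal retention cleanup.  The names 
(GlobalRetentionCheck, logDir, globalRetentionBytes, cleanupLock, 
deleteOldestSegment) are all hypothetical; this only illustrates the "share a 
lock, run more frequently" idea, not a proposed implementation:

{code:java}
import java.io.File;
import java.util.concurrent.locks.ReentrantLock;

public class GlobalRetentionCheck {

    // Lock assumed to be shared with the normal retention cleanup thread so
    // the emergency drop and regular cleanup never run at the same time.
    private final ReentrantLock cleanupLock = new ReentrantLock();

    private final File logDir;                 // Kafka log data directory
    private final long globalRetentionBytes;   // assumed global threshold

    public GlobalRetentionCheck(File logDir, long globalRetentionBytes) {
        this.logDir = logDir;
        this.globalRetentionBytes = globalRetentionBytes;
    }

    /** Total bytes currently used under the log directory. */
    long totalBytes(File dir) {
        long sum = 0;
        File[] files = dir.listFiles();
        if (files == null) return 0;
        for (File f : files) {
            sum += f.isDirectory() ? totalBytes(f) : f.length();
        }
        return sum;
    }

    /** Scheduled proactively, more frequently than normal retention cleanup. */
    void maybeDropOldestSegments() {
        cleanupLock.lock();
        try {
            while (totalBytes(logDir) > globalRetentionBytes) {
                // Hypothetical: pick the globally oldest segment (ideally the
                // same choice on every replica, so leaders and followers stay
                // roughly in sync) and delete it.
                if (!deleteOldestSegment()) break;
            }
        } finally {
            cleanupLock.unlock();
        }
    }

    // Placeholder for the actual segment selection/deletion logic.
    boolean deleteOldestSegment() {
        return false; // nothing to do in this sketch
    }
}
{code}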

> Global threshold on data retention size
> ---------------------------------------
>
>                 Key: KAFKA-1489
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1489
>             Project: Kafka
>          Issue Type: New Feature
>          Components: log
>    Affects Versions: 0.8.1.1
>            Reporter: Andras Sereny
>            Assignee: Jay Kreps
>              Labels: newbie
>
> Currently, Kafka has per topic settings to control the size of one single log 
> (log.retention.bytes). With lots of topics of different volume and as they 
> grow in number, it could become tedious to maintain topic level settings 
> applying to a single log. 
> Often, a chunk of disk space is dedicated to Kafka that hosts all logs 
> stored, so it'd make sense to have a configurable threshold to control how 
> much space *all* data in one Kafka log data directory can take up.
> See also:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201406.mbox/browser
> http://mail-archives.apache.org/mod_mbox/kafka-users/201311.mbox/%3c20131107015125.gc9...@jkoshy-ld.linkedin.biz%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)
