Re: log.retention.size

2014-06-11 Thread András Serény

Opened https://issues.apache.org/jira/browse/KAFKA-1489.

Regards,
András

Re: log.retention.size

2014-06-10 Thread Jun Rao
Could you file a jira to track this?

Thanks,

Jun


Re: log.retention.size

2014-06-10 Thread András Serény

Hi Kafka devs,

are there currently any plans to implement the global threshold feature? 
Is there a JIRA about it?

We are considering implementing a solution for this issue (either inside
or outside of Kafka).

Thanks a lot,
András

Re: log.retention.size

2014-05-30 Thread András Serény


Sorry for the delay on this.

Yes, that's right -- it'd be just another term in the chain of 'or'
conditions. Currently it's [per-topic size violated] OR [per-topic time
violated]. With the global condition, it would be

[per-topic size violated] OR [per-topic time violated] OR [global size violated]

In my view, that's fairly simple and intuitive, hence a fine piece of logic.

Regards,
András
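
As a minimal sketch of that OR-chain (all names below are illustrative, not
Kafka's actual code or configuration API), the per-segment check might look
like this in Java:

public class RetentionPredicate {
    // Illustrative sketch, not Kafka code: a segment becomes eligible for
    // deletion when ANY of the three retention conditions is violated.
    static boolean eligible(long topicBytes, long topicBytesLimit,    // log.retention.bytes.per.topic
                            long segmentAgeMs, long topicAgeLimitMs,  // log.retention.hours.per.topic
                            long totalBytes, long globalBytesLimit) { // proposed log.retention.bytes.global
        return topicBytes > topicBytesLimit
                || segmentAgeMs > topicAgeLimitMs
                || totalBytes > globalBytesLimit;
    }

    public static void main(String[] args) {
        // The topic is within its own size and time limits, but the global
        // limit is exceeded, so the segment is still eligible.
        System.out.println(eligible(50L, 100L, 1_000L, 3_600_000L, 900L, 800L)); // true
    }
}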

Re: log.retention.size

2014-05-27 Thread Jun Rao
For log.retention.bytes.per.topic and log.retention.hours.per.topic, the
current interpretation is that those are tight bounds. In other words, a
segment is deleted only when one of those thresholds is violated. To
further satisfy log.retention.bytes.global, the per-topic thresholds may
no longer be tight, i.e., we may need to delete a segment even when the
per-topic threshold is not violated.

Thanks,

Jun


Re: log.retention.size

2014-05-27 Thread András Serény


No, I think more specific settings should get a chance first. I'm
suggesting that, provided there is a segment rolled for a topic, a
violation of *any* of log.retention.bytes.per.topic,
log.retention.hours.per.topic, or a future log.retention.bytes.global
would cause segments to be deleted.


As far as I understand, the current logic says

(1)
for each topic, if there is a segment already rolled {
    mark segments eligible for deletion due to log.retention.hours.for.this.topic
    if log.retention.bytes.for.this.topic is still violated, mark segments
    eligible for deletion due to log.retention.bytes.for.this.topic
}

After this cleanup cycle, there could be another one, taking into account
the global threshold. For instance, something along the lines of

(2)
if after (1) log.retention.bytes.global is still violated, for each topic,
if there is a segment already rolled {
    calculate the required size for this topic (e.g. the proportional size,
    or simply (full size - threshold)/#topics ?)
    mark segments exceeding the required size for deletion
}

Regards,
András
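
A rough Java sketch of pass (2) above -- the Segment type, the oldest-first
ordering, and the equal-share split are all assumptions for illustration,
not Kafka internals:

import java.util.List;
import java.util.Map;

public class GlobalRetentionPass {
    record Segment(long bytes) {}  // hypothetical stand-in for a log segment

    // Marks the oldest rolled segments of each topic until the topic has shed
    // its equal share of the excess, i.e. the "(full size - threshold)/#topics"
    // idea. Assumes each list is ordered oldest-first and that the last
    // (active, un-rolled) segment is never deleted. A real implementation
    // would also have to redistribute the shortfall from small topics.
    static int markForGlobalLimit(Map<String, List<Segment>> topics, long globalLimit) {
        long total = topics.values().stream()
                .flatMap(List::stream).mapToLong(Segment::bytes).sum();
        if (total <= globalLimit) return 0;  // pass (2) runs only if pass (1) left a violation
        long perTopicCut = (total - globalLimit) / topics.size();
        int marked = 0;
        for (List<Segment> segments : topics.values()) {
            long cut = 0;
            for (int i = 0; i < segments.size() - 1 && cut < perTopicCut; i++) {
                cut += segments.get(i).bytes();  // oldest first
                marked++;
            }
        }
        return marked;  // number of segments marked for deletion
    }
}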


Re: log.retention.size

2014-05-23 Thread Jun Rao
Yes, that's possible. There is a default log.retention.bytes for every
topic. By introducing a global threshold, we may have to delete data from
logs whose size is smaller than log.retention.bytes. So, are you saying
that the global threshold has precedence?

Thanks,

Jun


Re: log.retention.size

2014-05-23 Thread András Serény


Hi Kafka users,

this feature would also be very useful for us. With lots of topics of
different volume (and as they grow in number), it could become tedious to
maintain topic-level settings.

As a start, I think uniform reduction is a good idea. Logs wouldn't be
retained as long as you want, but that's already the case when a
log.retention.bytes setting is specified. As for early rolling, I don't
think it's necessary: currently, if there is no log segment eligible for
deletion, the log.retention.bytes and log.retention.hours settings won't
kick in, so it's possible to exceed these limits, which is completely
fine (please correct me if I'm mistaken here).

All in all, introducing a global threshold doesn't seem to require a
considerable change to the current retention logic.


Regards,
András

Re: log.retention.size

2014-05-16 Thread vinh
Agreed…a global knob is a bit tricky for exactly the reason you've identified.  
Perhaps the problem could be simplified though by considering the context and 
purpose of Kafka.  I would use a persistent message queue because I want to 
guarantee that data/messages don't get lost.  But, since Kafka is not meant to 
be a long term storage solution (other products can be used for that), I would 
clarify that guarantee to apply only to the most recent messages up until a 
certain configured threshold (e.g. max 24 hrs, max 500GB, etc.).  Once those 
thresholds are reached, old messages are deleted first.

To ensure no message loss (up to a limit), I must ensure Kafka is highly 
available.  There's a small chance that the message deletion rate is the same 
as the receive rate.  For example, when the incoming volume is so high that 
the size threshold is reached before the time threshold.  But, I may be ok with 
that because if Kafka goes down, it can cause upstream applications to fail.  
This can result in higher losses overall, and particularly of the most *recent* 
messages.

In other words, in a persistent but ephemeral message queue, I would give 
higher precedence to recent messages over older ones.  On the flip side, by 
allowing Kafka to go down when a disk is full, applications are forced to deal 
with the issue.  This adds complexity to apps, but perhaps it's not a bad 
thing.  After all, in scalability, all apps should be designed to handle 
failure.

Having said that, next is to decide which messages to delete first.  I believe 
that's a separate issue and has its own complexities, too.

The main idea though is that a global knob would provide flexibility, even if 
not used.  From an operations perspective, if we can't ensure HA for all 
applications/components, it would be good if we can for at least some of the 
core ones, like Kafka.  This is much easier said than done though.


Re: log.retention.size

2014-05-05 Thread Jun Rao
Yes, your understanding is correct. A global knob that controls aggregate
log size may make sense. What would be the expected behavior when that
limit is reached? Would you reduce the retention uniformly across all
topics? Then, it just means that some of the logs may not be retained as
long as you want. Also, we need to think through what happens when every
log has only 1 segment left and yet the total size still exceeds the limit.
Do we roll log segments early?

Thanks,

Jun
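
One hedged reading of "reduce the retention uniformly": scale every topic's
retained bytes by the same factor so the aggregate meets the limit. The
topic names and sizes below are made up for illustration:

import java.util.Map;

public class UniformReduction {
    public static void main(String[] args) {
        long globalLimit = 1_000_000_000L;  // assumed 1 GB aggregate limit
        Map<String, Long> topicBytes =      // made-up per-topic log sizes
                Map.of("clicks", 900_000_000L, "audit", 300_000_000L);
        long total = topicBytes.values().stream().mapToLong(Long::longValue).sum();
        if (total > globalLimit) {
            double scale = (double) globalLimit / total;  // same factor for every topic
            topicBytes.forEach((topic, bytes) -> System.out.printf(
                    "%s: keep at most %d of %d bytes%n", topic, (long) (bytes * scale), bytes));
        }
    }
}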


Re: log.retention.size

2014-05-04 Thread vinh
Thanks Jun.  So if I understand this correctly, there really is no master 
property to control the total aggregate size of all Kafka data files on a 
broker.

log.retention.size and log.file.size are great for managing data at the 
application level.  In our case, application needs change frequently, and 
performance itself is an ever evolving feature.  This means various configs are 
constantly changing, like topics, # of partitions, etc.

What rarely changes though is provisioned hardware resources.  So a setting to 
control the total aggregate size of Kafka logs (or persisted data, for better 
clarity) would definitely simplify things at an operational level, regardless 
what happens at the application level.


Re: log.retention.size

2014-05-02 Thread Jun Rao
log.retention.size controls the total size in a log dir (per partition).
log.file.size controls the size of each log segment in the log dir.

Thanks,

Jun
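
A back-of-envelope illustration of that per-partition semantics, using the
numbers from vinh's original post (quoted below); the two-partition count is
only an assumption, but it would explain 200GB observed against a 100GB
setting:

public class RetentionMath {
    public static void main(String[] args) {
        long retentionPerLogDir = 107_374_182_400L;  // log.retention.size (100GB), applied per partition
        int partitionsOnBroker = 2;                  // hypothetical partition count on the broker
        System.out.println("worst-case retained bytes: "
                + retentionPerLogDir * partitionsOnBroker);  // ~200GB, not 100GB
    }
}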


Re: log.retention.size

2014-05-01 Thread vinh
In the 0.7 docs, the descriptions for log.retention.size and log.file.size sound 
very much the same.  In particular, that they apply to a single log file (or 
log segment file).

http://kafka.apache.org/07/configuration.html

I'm beginning to think there is no setting to control the max aggregate size of 
all logs.  If this is correct, what would be a good approach to enforce this 
requirement?  In my particular scenario, I have a lot of data being written to 
Kafka at a very high rate.  So a 1TB disk can easily be filled up in 24hrs or 
so.  One option is to add more Kafka brokers to add more disk space to the 
pool, but I'd like to avoid that and see if I can simply configure Kafka to not 
write more than 1TB aggregate.  Else, Kafka will OOM and kill itself, and 
possibly crash the node itself because the disk is full.


On May 1, 2014, at 9:21 PM, vinh  wrote:

> Using Kafka 0.7.2, I have the following in server.properties:
> 
> log.retention.hours=48
> log.retention.size=107374182400
> log.file.size=536870912
> 
> My interpretation of this is:
> a) a single log segment file over 48hrs old will be deleted
> b) the total combined size of *all* logs is 100GB
> c) a single log segment file is limited to 500MB in size before a new segment 
> file is spawned
> d) a "log file" can be composed of many "log segment files"
> 
> But, even after setting the above, I find that the total combined size of all 
> Kafka logs on disk is 200GB right now.  Isn't log.retention.size supposed to 
> limit it to 100GB?  Am I missing something?  The docs are not really clear, 
> especially when it comes to distinguishing between a "log file" and a "log 
> segment file".
> 
> I have disk monitoring.  But like anything else in software, even monitoring 
> can fail.  Via configuration, I'd like to make sure that Kafka does not write 
> more than the available disk space.  Or something like log4j, where I can set 
> a max number of log files and the max size per file, which essentially allows 
> me to set a max aggregate size limit across all logs.
> 
> Thanks,
> -Vinh