Re: question about log.retention.bytes

2013-08-28 Thread Jay Kreps
I think the problem is that there is no way to understand the meaning of
that config from the docs, so people keep asking over and over again. The
docs make it sound like it is per topic and just say it is the maximum
size before it is deleted, which makes no sense...

-jay

On Tuesday, August 27, 2013, Jun Rao wrote:

 For the first question, yes.

 For the second one, this is documented in
 http://kafka.apache.org/documentation.html#brokerconfigs

 Note that all per topic configuration properties below have the format of
 csv (e.g., topic1:value1,topic2:value2).
 Thanks,
 Jun



 On Tue, Aug 27, 2013 at 11:52 AM, Yu, Libo libo...@citi.com
 wrote:

  Hi Jun,
 
  In a previous email thread
  (http://markmail.org/search/?q=kafka+log.retention.bytes#query:kafka%20log.retention.bytes+page:1+mid:qnt4pbq47goii2ui+state:results),
  you said log.retention.bytes is for each partition. Could you clarify
  that?
 
  Say I have a topic with three partitions, and I want to limit the disk
  space to 1 GB for each partition.
  Then log.retention.bytes should be set to 1 GB (not 3 GB). Is that right?
 
  If I want to use log.retention.bytes.per.topic to set the same limit,
  should it be set to 1 GB or 3 GB?
 
  Libo
 
 
  -Original Message-
  From: Jun Rao [mailto:jun...@gmail.com]
  Sent: Saturday, August 17, 2013 12:40 AM
  To: users@kafka.apache.org
  Subject: Re: question about log.retention.bytes
 
  log.retention.bytes is for all topics that are not included in
  log.retention.bytes.per.topic (which defines a map of topic -> size).
 
  Currently, we don't have a total size limit across all topics.
 
  Thanks,
 
  Jun
 
 
  On Fri, Aug 16, 2013 at 2:00 PM, Paul Christian
  pchrist...@salesforce.com wrote:
 
   According to the Kafka 0.8 documentation under broker configuration,
   there are these parameters and their definitions.
  
   log.retention.bytes -1 The maximum size of the log before deleting it
   log.retention.bytes.per.topic  The maximum size of the log for some
   specific topic before deleting it
  
   I'm curious what the first value 'log.retention.bytes' is for if the
   second one is for per topic logs, because aren't all logs generated
   per topic? Is this an aggregate value across topics?
  
   Related question, is there a parameter for kafka where you can say
   only hold this much TOTAL data across all topics (logs/index
   together)? I.e. our hosts have this much available space and so value
   log.retention.whatever.aggregate == 75% total disk space.
  
 



Re: question about log.retention.bytes

2013-08-21 Thread Jun Rao
All per topic configuration properties below have the format of csv (e.g.,
topic1:value1,topic2:value2). Updated our website to make it clear.
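
As a rough illustration of that csv format (a sketch only, not Kafka's own
parser): the helper name parse_per_topic is made up for this example, and
the values are assumed to be plain byte counts rather than strings such as
"15MB".

```python
MB = 1024 * 1024


def parse_per_topic(csv_value):
    """Parse 'topic1:value1,topic2:value2' into a {topic: bytes} dict.

    Hypothetical helper for illustration; assumes integer byte values.
    """
    result = {}
    for entry in csv_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty value or a trailing comma
        # Split on the last colon so the value is always the final field.
        topic, sep, value = entry.rpartition(":")
        if not sep or not topic:
            raise ValueError(f"expected topic:value, got {entry!r}")
        result[topic] = int(value)
    return result


# Topics A and B from the question, written as one csv value in bytes.
per_topic = parse_per_topic(f"A:{15 * MB},B:{20 * MB}")
print(per_topic)
```

Under these assumptions the two guessed lines would collapse into a single
property whose value is A:15728640,B:20971520.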

Thanks,

Jun


On Tue, Aug 20, 2013 at 6:16 AM, Paul Christian
pchrist...@salesforce.com wrote:

 Jun,

 For my first example is that syntax correct? I.e.

 log.retention.bytes.per.topic.A = 15MB
 log.retention.bytes.per.topic.B = 20MB

 I totally guessed there and was wondering if I guessed right? Otherwise is
 there a document with the proper formatting to fill out this map?

 Thank you,

 Paul



Re: question about log.retention.bytes

2013-08-20 Thread Paul Christian
Jun,

For my first example is that syntax correct? I.e.

log.retention.bytes.per.topic.A = 15MB
log.retention.bytes.per.topic.B = 20MB

I totally guessed there and was wondering if I guessed right? Otherwise is
there a document with the proper formatting to fill out this map?

Thank you,

Paul


Re: question about log.retention.bytes

2013-08-20 Thread Paul Christian
Neha,

Correct, that is my question. We want to investigate capping our disk usage
so we don't fill up our hard drives. If you have any recommended
configurations or documents on these settings, please let us know.

Thank you,

Paul



On Tue, Aug 20, 2013 at 6:16 AM, Paul Christian
pchrist...@salesforce.com wrote:

 Jun,

 For my first example is that syntax correct? I.e.

 log.retention.bytes.per.topic.A = 15MB
 log.retention.bytes.per.topic.B = 20MB

 I totally guessed there and was wondering if I guessed right? Otherwise is
 there a document with the proper formatting to fill out this map?

 Thank you,

 Paul






Re: question about log.retention.bytes

2013-08-19 Thread Paul Christian
Hi Jun,

Thank you for your reply. I'm still a little fuzzy on the concept.

Are you saying I can have topics A, B and C with

log.retention.bytes.per.topic.A = 15MB
log.retention.bytes.per.topic.B = 20MB
log.retention.bytes = 30MB

And thus topic C will get the value 30MB? Since it's not defined like the
others' 'per topic'?

log.retention.bytes is for all topics that are not included in
log.retention.bytes.per.topic
(which defines a map of topic -> size).

Otherwise, log.retention.bytes.per.topic and log.retention.bytes seem very
similar to me.

Additionally, we've experimented with this value on our test cluster where
we set the log.retention.bytes to 11MB as a test. Below is a snippet from
our server.properties.

# A size-based retention policy for logs. Segments are pruned from the log
# as long as the remaining segments don't drop below log.retention.bytes.
log.retention.bytes=11534336

Here is a ls -lh from one of the topics

-rw-r--r-- 1 kafka service  10M Aug 19 15:45 07021913.index
-rw-r--r-- 1 kafka service 114M Aug 19 15:45 07021913.log

The index file appears to be reflected in the property
log.index.size.max.bytes, but the log just keeps going.
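
One way to read the comment in that server.properties snippet is as a rule
over whole segment files, oldest first. The sketch below models only that
stated rule; it is not Kafka's actual code, prune_by_size is a made-up name,
and it assumes the 114M file in the listing is the partition's only segment
(the listing shows a single .log file).

```python
MB = 1024 * 1024


def prune_by_size(segment_sizes, retention_bytes):
    """Return the segment sizes kept after size-based pruning.

    segment_sizes is ordered oldest first. A segment is dropped only if
    the segments that remain still total at least retention_bytes.
    A negative retention_bytes is treated as "no size limit".
    """
    kept = list(segment_sizes)
    if retention_bytes < 0:
        return kept
    excess = sum(kept) - retention_bytes
    while kept and excess - kept[0] >= 0:
        excess -= kept.pop(0)  # drop the oldest whole segment
    return kept


# One 114 MB segment against an 11 MB limit: removing it would leave
# 0 bytes, which is below the limit, so nothing is pruned.
print(len(prune_by_size([114 * MB], 11 * MB)))

# Five 10 MB segments against the same limit: pruning stops once
# removing another segment would leave less than 11 MB.
print(len(prune_by_size([10 * MB] * 5, 11 * MB)))
```

Under this reading the limit can only take effect once the log has rolled
into more than one segment, and the roll size is governed by a separate
setting (log.segment.bytes in Kafka 0.8, as far as I know), so a single
large segment would keep growing past log.retention.bytes.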


Re: question about log.retention.bytes

2013-08-19 Thread Jun Rao
For the first question, yes, topic C will get the value of 30MB.
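
The fallback described here can be sketched as a simple lookup. This is an
illustration only: effective_retention_bytes is a hypothetical name, and the
sizes are written as byte counts computed from the MB figures in the
question.

```python
MB = 1024 * 1024


def effective_retention_bytes(topic, per_topic, default_bytes):
    """Return the per-topic limit if one is set, else the broker default.

    per_topic stands in for log.retention.bytes.per.topic and
    default_bytes for log.retention.bytes.
    """
    return per_topic.get(topic, default_bytes)


per_topic = {"A": 15 * MB, "B": 20 * MB}
default_bytes = 30 * MB

for topic in ("A", "B", "C"):
    limit = effective_retention_bytes(topic, per_topic, default_bytes)
    print(topic, limit // MB, "MB")
# A 15 MB
# B 20 MB
# C 30 MB
```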

For the second question, log.retention.bytes only controls the segment log
file size, not the index. Typically, index file size is much smaller than
the log file. The index file of the last (active) segment is presized to
the max index size (defaults to 10MB). However, the size is trimmed as soon
as the segment rolls.

Thanks,

Jun


On Mon, Aug 19, 2013 at 9:22 AM, Paul Christian
pchrist...@salesforce.com wrote:

 Hi Jun,

 Thank you for your reply. I'm still a little fuzzy on the concept.

 Are you saying I can have topic A, B and C and with

 log.retention.bytes.per.topic.A = 15MB
 log.retention.bytes.per.topic.B = 20MB
 log.retention.bytes = 30MB

 And thus topic C will get the value 30MB? Since it's not defined like the
 others' 'per topic'?

 log.retention.bytes is for all topics that are not included in
 log.retention.bytes.per.topic
 (which defines a map of topic -> size).

 Otherwise, log.retention.bytes.per.topic and log.retention.bytes seem very
 similar to me.

 Additionally, we've experimented with this value on our test cluster where
 we set the log.retention.bytes to 11MB as a test. Below is a snippet from
 our server.properties.

 # A size-based retention policy for logs. Segments are pruned from the log
 # as long as the remaining segments don't drop below log.retention.bytes.
 log.retention.bytes=11534336

 Here is a ls -lh from one of the topics

 -rw-r--r-- 1 kafka service  10M Aug 19 15:45 07021913.index
 -rw-r--r-- 1 kafka service 114M Aug 19 15:45 07021913.log

 The index file appears to be reflected in the property
 log.index.size.max.bytes, but the log just keeps going.



Re: question about log.retention.bytes

2013-08-19 Thread Neha Narkhede
Paul,

I'm trying to understand the 2nd problem you reported. Are you saying that
you set log.retention.bytes=11534336 (11MB) but your log nevertheless grew
to 114MB, meaning the config option didn't work as expected?

Thanks,
Neha


On Mon, Aug 19, 2013 at 8:46 PM, Jun Rao jun...@gmail.com wrote:

 For the first question, yes, topic C will get the value of 30MB.

 For the second question, log.retention.bytes only controls the segment log
 file size, not the index. Typically, index file size is much smaller than
 the log file. The index file of the last (active) segment is presized to
 the max index size (defaults to 10MB). However, the size is trimmed as soon
 as the segment rolls.

 Thanks,

 Jun


 On Mon, Aug 19, 2013 at 9:22 AM, Paul Christian
 pchrist...@salesforce.com wrote:

  Hi Jun,
 
  Thank you for your reply. I'm still a little fuzzy on the concept.
 
  Are you saying I can have topic A, B and C and with
 
  log.retention.bytes.per.topic.A = 15MB
  log.retention.bytes.per.topic.B = 20MB
  log.retention.bytes = 30MB
 
  And thus topic C will get the value 30MB? Since it's not defined like the
  others' 'per topic'?
 
  log.retention.bytes is for all topics that are not included in
  log.retention.bytes.per.topic
  (which defines a map of topic -> size).
 
  Otherwise, log.retention.bytes.per.topic and log.retention.bytes seem very
  similar to me.
 
  Additionally, we've experimented with this value on our test cluster where
  we set the log.retention.bytes to 11MB as a test. Below is a snippet from
  our server.properties.
 
  # A size-based retention policy for logs. Segments are pruned from the log
  # as long as the remaining segments don't drop below log.retention.bytes.
  log.retention.bytes=11534336
 
  Here is a ls -lh from one of the topics
 
  -rw-r--r-- 1 kafka service  10M Aug 19 15:45 07021913.index
  -rw-r--r-- 1 kafka service 114M Aug 19 15:45 07021913.log
 
  The index file appears to be reflected in the property
  log.index.size.max.bytes, but the log just keeps going.
 



Re: question about log.retention.bytes

2013-08-16 Thread Jun Rao
log.retention.bytes is for all topics that are not included in
log.retention.bytes.per.topic
(which defines a map of topic -> size).

Currently, we don't have a total size limit across all topics.
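
Without an aggregate setting, one workaround is to derive a per-partition
limit from a disk budget. The arithmetic below is only a sketch under stated
assumptions: the limit applies to each partition, every partition on the
broker gets the same limit, a partition can overshoot its limit by up to
about one segment, and index files are ignored. The function and parameter
names are invented for this example.

```python
GB = 1024 ** 3


def per_partition_limit(disk_bytes, fraction, partitions_on_broker, segment_bytes):
    """Suggest a per-partition retention size that fits a disk budget.

    The budget is fraction * disk_bytes, split evenly across partitions.
    One segment of headroom is subtracted per partition because whole
    segments are only pruned once the remainder still meets the limit.
    """
    if partitions_on_broker <= 0:
        raise ValueError("need at least one partition")
    budget = int(disk_bytes * fraction)
    limit = budget // partitions_on_broker - segment_bytes
    if limit <= 0:
        raise ValueError("budget too small for this many partitions")
    return limit


# Example: a 100 GB disk, a 75% budget, 10 partitions, 1 GB segments.
limit = per_partition_limit(100 * GB, 0.75, 10, 1 * GB)
print(limit, "bytes per partition")
```

The result would go into log.retention.bytes as a byte count. Adding topics
or partitions later shrinks the headroom, so the figure needs recomputing
whenever the partition count on a broker changes.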

Thanks,

Jun


On Fri, Aug 16, 2013 at 2:00 PM, Paul Christian
pchrist...@salesforce.com wrote:

 According to the Kafka 0.8 documentation under broker configuration, there
 are these parameters and their definitions.

 log.retention.bytes -1 The maximum size of the log before deleting it
 log.retention.bytes.per.topic  The maximum size of the log for some
 specific topic before deleting it

 I'm curious what the first value 'log.retention.bytes' is for if the second
 one is for per topic logs, because aren't all logs generated per topic? Is
 this an aggregate value across topics?

 Related question, is there a parameter for kafka where you can say only
 hold this much TOTAL data across all topics (logs/index together)? I.e.
 our hosts have this much available space and so value
 log.retention.whatever.aggregate == 75% total disk space.