Re: question about log.retention.bytes
I think the problem is that there is no way to understand the meaning of that config from the docs, so people keep asking over and over again. The docs make it sound like it is per topic, and they just say it is the maximum size before it is deleted, which makes no sense...
-jay

On Tuesday, August 27, 2013, Jun Rao wrote:
For the first question, yes. For the second one, this is documented in http://kafka.apache.org/documentation.html#brokerconfigs
Note that all per topic configuration properties below have the format of csv (e.g., topic1:value1,topic2:value2).
Thanks, Jun

On Tue, Aug 27, 2013 at 11:52 AM, Yu, Libo libo...@citi.com wrote:
Hi Jun,
In a previous email thread http://markmail.org/search/?q=kafka+log.retention.bytes#query:kafka%20log.retention.bytes+page:1+mid:qnt4pbq47goii2ui+state:results , you said log.retention.bytes is for each partition. Could you clarify that? Say I have a topic with three partitions and I want to limit the disk space to 1 GB for each partition. Then log.retention.bytes should be set to 1 GB (not 3 GB). Is that right? If I want to use log.retention.bytes.per.topic to set the same limit, should it be set to 1 GB or 3 GB?
Libo

-----Original Message-----
From: Jun Rao [mailto:jun...@gmail.com]
Sent: Saturday, August 17, 2013 12:40 AM
To: users@kafka.apache.org
Subject: Re: question about log.retention.bytes
log.retention.bytes is for all topics that are not included in log.retention.bytes.per.topic (which defines a map of topic -> size). Currently, we don't have a total size limit across all topics.
Thanks, Jun

On Fri, Aug 16, 2013 at 2:00 PM, Paul Christian pchrist...@salesforce.com wrote:
According to the Kafka 8 documentation under broker configuration, there are these parameters and their definitions:
log.retention.bytes (default: -1): The maximum size of the log before deleting it
log.retention.bytes.per.topic: The maximum size of the log for some specific topic before deleting it
I'm curious what the first value, 'log.retention.bytes', is for if the second one is for per topic logs, because aren't all logs generated per topic? Is this an aggregate value across topics?
Related question: is there a parameter for Kafka where you can say only hold this much TOTAL data across all topics (logs and indexes together)? I.e. our hosts have this much available space, and so the value log.retention.whatever.aggregate == 75% of total disk space.
Re: question about log.retention.bytes
All per topic configuration properties below have the format of csv (e.g., topic1:value1,topic2:value2). Updated our website to make it clear.
Thanks, Jun

On Tue, Aug 20, 2013 at 6:16 AM, Paul Christian pchrist...@salesforce.com wrote:
Jun, For my first example, is that syntax correct? I.e.
log.retention.bytes.per.topic.A = 15MB
log.retention.bytes.per.topic.B = 20MB
I totally guessed there and was wondering if I guessed right. Otherwise, is there a document with the proper formatting to fill out this map?
Thank you, Paul
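Jun's answer suggests that the per-topic limits go into a single property whose value is a csv of topic:value pairs, rather than one property per topic as Paul guessed. A minimal sketch of building and parsing such a value follows. The helper names are hypothetical, and writing sizes as plain byte counts instead of "15MB" is an assumption based on the log.retention.bytes=11534336 line quoted elsewhere in this thread.

```python
MB = 1024 * 1024


def format_per_topic(limits_bytes):
    """Render {topic: bytes} in the csv form topic1:value1,topic2:value2."""
    return ",".join(f"{topic}:{size}" for topic, size in limits_bytes.items())


def parse_per_topic(csv_value):
    """Parse topic1:value1,topic2:value2 back into {topic: bytes}."""
    limits = {}
    for entry in csv_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        topic, sep, size = entry.rpartition(":")
        if not sep or not topic:
            raise ValueError(f"expected topic:value, got {entry!r}")
        limits[topic] = int(size)  # assumption: sizes are plain byte counts
    return limits


# Paul's example (A = 15MB, B = 20MB) expressed as one property line.
value = format_per_topic({"A": 15 * MB, "B": 20 * MB})
print(f"log.retention.bytes.per.topic={value}")
```

Under those assumptions the printed line would be the single entry to place in server.properties, replacing the two guessed per-topic properties.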
re: question about log.retention.bytes
Jun, For my first example, is that syntax correct? I.e.
log.retention.bytes.per.topic.A = 15MB
log.retention.bytes.per.topic.B = 20MB
I totally guessed there and was wondering if I guessed right. Otherwise, is there a document with the proper formatting to fill out this map?
Thank you, Paul
Re: question about log.retention.bytes
Neha, Correct, that is my question. We want to investigate capping our disk usage so we don't fill up our hard drives. If you have any recommended configurations or documents on these settings, please let us know.
Thank you, Paul
re: question about log.retention.bytes
Hi Jun, Thank you for your reply. I'm still a little fuzzy on the concept. Are you saying I can have topics A, B and C with
log.retention.bytes.per.topic.A = 15MB
log.retention.bytes.per.topic.B = 20MB
log.retention.bytes = 30MB
and thus topic C will get the value 30MB, since it's not defined 'per topic' like the others?
> log.retention.bytes is for all topics that are not included in log.retention.bytes.per.topic (which defines a map of topic -> size).
Otherwise, log.retention.bytes.per.topic and log.retention.bytes seem very similar to me.
Additionally, we've experimented with this value on our test cluster, where we set log.retention.bytes to 11MB as a test. Below is a snippet from our server.properties.
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
log.retention.bytes=11534336
Here is an ls -lh from one of the topics:
-rw-r--r-- 1 kafka service 10M Aug 19 15:45 07021913.index
-rw-r--r-- 1 kafka service 114M Aug 19 15:45 07021913.log
The size of the index file appears to match the property log.index.size.max.bytes, but the log just keeps going.
Re: question about log.retention.bytes
For the first question, yes, topic C will get the value of 30MB.
For the second question, log.retention.bytes only controls the segment log file size, not the index. Typically, index file size is much smaller than the log file. The index file of the last (active) segment is presized to the max index size (defaults to 10MB). However, the size is trimmed as soon as the segment rolls.
Thanks, Jun
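Jun's first answer amounts to a lookup with a fallback: a topic listed in log.retention.bytes.per.topic uses its own value, and any other topic uses log.retention.bytes. The sketch below only illustrates that rule with the numbers from Paul's question. The function name is hypothetical, and this is not Kafka's actual code.

```python
MB = 1024 * 1024


def retention_bytes_for(topic, per_topic, default_bytes):
    """Return the per-topic limit if one is listed, else the broker-wide default."""
    return per_topic.get(topic, default_bytes)


per_topic = {"A": 15 * MB, "B": 20 * MB}  # log.retention.bytes.per.topic
default_bytes = 30 * MB                   # log.retention.bytes

for topic in ("A", "B", "C"):
    limit = retention_bytes_for(topic, per_topic, default_bytes)
    print(f"topic {topic}: {limit // MB} MB")
```

Topic C is not in the map, so it falls through to the 30MB default, which is the behavior Jun confirms.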
Re: question about log.retention.bytes
Paul, I'm trying to understand the 2nd problem you reported. Are you saying that you set log.retention.bytes=11534336 (11MB) but your log nevertheless grew to 114MB, which means the config option didn't really work as expected?
Thanks, Neha
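One possible reading of the 114MB log is suggested by the server.properties comment Paul quoted: segments are pruned only while the remaining segments stay at or above log.retention.bytes. The sketch below is a simplified model of that sentence, not Kafka's implementation. The function name and the 5MB segment sizes are made up for illustration.

```python
MB = 1024 * 1024


def prune(segment_sizes, retention_bytes):
    """Drop oldest segments while the remainder stays >= retention_bytes.

    segment_sizes is ordered oldest first. A negative retention_bytes
    (the documented default of -1) is treated as "no size limit".
    """
    kept = list(segment_sizes)
    if retention_bytes < 0:
        return kept
    total = sum(kept)
    # Only whole segments are removed, never part of one.
    while kept and total - kept[0] >= retention_bytes:
        total -= kept.pop(0)
    return kept


retention = 11534336  # the 11MB value from Paul's server.properties

# A single 114MB segment: removing it would leave 0 bytes, which is
# below the limit, so under this model nothing is pruned.
print(prune([114 * MB], retention))

# Four hypothetical 5MB segments: one is pruned, leaving 15MB.
print(prune([5 * MB] * 4, retention))
```

If this model is close to what the broker does, the limit behaves as a floor enforced at segment granularity, so the on-disk size also depends on how large a segment grows before it rolls (the log.segment.bytes broker setting, as far as I know).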
Re: question about log.retention.bytes
log.retention.bytes is for all topics that are not included in log.retention.bytes.per.topic (which defines a map of topic -> size). Currently, we don't have a total size limit across all topics.
Thanks, Jun

On Fri, Aug 16, 2013 at 2:00 PM, Paul Christian pchrist...@salesforce.com wrote:
According to the Kafka 8 documentation under broker configuration, there are these parameters and their definitions:
log.retention.bytes (default: -1): The maximum size of the log before deleting it
log.retention.bytes.per.topic: The maximum size of the log for some specific topic before deleting it
I'm curious what the first value, 'log.retention.bytes', is for if the second one is for per topic logs, because aren't all logs generated per topic? Is this an aggregate value across topics?
Related question: is there a parameter for Kafka where you can say only hold this much TOTAL data across all topics (logs and indexes together)? I.e. our hosts have this much available space, and so the value log.retention.whatever.aggregate == 75% of total disk space.
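Since there is no aggregate limit, one workaround is to estimate the total yourself and check it against the disk budget Paul describes. The sketch below assumes the limit applies to each partition, as Jun confirms elsewhere in this thread, so a topic's footprint is roughly limit times partition count. All topic names, partition counts, and sizes are hypothetical, and the estimate ignores index files and any overshoot before old segments are pruned.

```python
GB = 1024 ** 3


def estimated_log_bytes(partitions_per_topic, per_topic_limit, default_limit):
    """Sum (per-partition limit x partitions on this broker) over all topics."""
    total = 0
    for topic, partitions in partitions_per_topic.items():
        limit = per_topic_limit.get(topic, default_limit)
        if limit < 0:
            # -1 means no size-based retention, so no bound can be given.
            raise ValueError(f"topic {topic!r} has no size limit")
        total += limit * partitions
    return total


# Hypothetical broker: partitions hosted locally for each topic.
partitions = {"A": 3, "B": 3, "C": 6}
per_topic = {"A": 1 * GB}   # log.retention.bytes.per.topic
default_limit = 2 * GB      # log.retention.bytes

disk_bytes = 100 * GB
budget = disk_bytes * 3 // 4  # the "75% of total disk space" target

estimate = estimated_log_bytes(partitions, per_topic, default_limit)
print(f"estimated {estimate // GB} GB of a {budget // GB} GB budget")
print("fits" if estimate <= budget else "over budget")
```

The check has to be repeated whenever topics or partitions are added, because nothing in the broker enforces the total.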