Consumer offsets resetting
We’re seeing a strange error on our Kafka cluster. Users have reported that their consumer offsets periodically reset, which forces downstream processes to handle a massive number of duplicates.

Our Kafka cluster is currently running 0.10.1.1. Previously the cluster had been at 0.10.0.1. It was upgraded to 0.10.1.0, but we ran into multiple deadlocks, so we downgraded to 0.10.0.1 and then upgraded from 0.10.0.1 to 0.10.1.1.

This is the error we see. It only comes up every few weeks and only on some topics.

[2017-04-12 08:41:14,313] ERROR [ReplicaFetcherThread-1-8], Error for partition [email_priority,10] to broker 8:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)

Has anyone seen this behavior before? Is there any way to prevent this from happening?

Thank you all!

Lawrence Weikum | Sr. Software Engineer | Pandora
1426 Pearl Street, Suite 100, Boulder CO 80302
m 720.203.1578 | lwei...@pandora.com
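[Editor's note, not from the original message: a hedged sketch of why an offset reset turns into duplicates. When a group's committed offset is lost or reported out of range, the Java consumer falls back to its auto.offset.reset policy, and with "earliest" it replays everything still retained. The broker address and group id below are placeholders; the topic name is just reused from the error line above.]

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetResetExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder
        props.put("group.id", "email-priority-consumer"); // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // If the committed offset is missing or out of range, "earliest" replays the whole
        // retained log (duplicates downstream), "latest" skips ahead (possible data loss),
        // and "none" makes the consumer throw instead of silently resetting.
        props.put("auto.offset.reset", "earliest");
        props.put("enable.auto.commit", "true");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("email_priority"));
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
        consumer.close();
    }
}

Setting auto.offset.reset=none at least surfaces a reset as an exception rather than a silent replay.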
Re: Deadlock using latest 0.10.1 Kafka release
We saw this increase when upgrading from 0.9.0.1 to 0.10.0.1. We’re now running on 0.10.1.0, and the FD increase is due to a deadlock, not functionality or new features.

Lawrence Weikum | Software Engineer | Pandora
1426 Pearl Street, Suite 100, Boulder CO 80302
m 720.203.1578 | lwei...@pandora.com

On 11/3/16, 12:42 PM, "Hans Jespersen" <h...@confluent.io> wrote:

The 0.10.1 broker will use more file descriptors than previous releases because of the new timestamp indexes. You should expect and plan for ~33% more file descriptors to be open.

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Thu, Nov 3, 2016 at 10:02 AM, Marcos Juarez <mjua...@gmail.com> wrote:

> We're running into a recurrent deadlock issue in both our production and staging clusters, both using the latest 0.10.1 release. The symptom we noticed was that, in servers in which Kafka producer connections are short-lived, every other day or so we'd see file descriptors being exhausted until the broker is restarted, or the broker runs out of file descriptors and goes down. None of the clients are on 0.10.1 Kafka jars; they're all using previous versions.
>
> When diagnosing the issue, we found that when the system is in that state, using up file descriptors at a really fast rate, the JVM is actually in a deadlock. We did a thread dump from both jstack and VisualVM, and attached those to this email.
>
> This is the interesting bit from the jstack thread dump:
>
> Found one Java-level deadlock:
> =============================
> "executor-Heartbeat":
>   waiting to lock monitor 0x016c8138 (object 0x00062732a398, a kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
>
> "group-metadata-manager-0":
>   waiting to lock monitor 0x011ddaa8 (object 0x00063f1b0cc0, a java.util.LinkedList),
>   which is held by "kafka-request-handler-3"
>
> "kafka-request-handler-3":
>   waiting to lock monitor 0x016c8138 (object 0x00062732a398, a kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
>
> I also noticed the background heartbeat thread (I'm guessing the one called "executor-Heartbeat" above) is new for this release, under the KAFKA-3888 ticket - https://issues.apache.org/jira/browse/KAFKA-3888
>
> We haven't noticed this problem with earlier Kafka broker versions, so I'm guessing maybe this new background heartbeat thread is what introduced the deadlock problem.
>
> That same broker is still in the deadlock scenario; we haven't restarted it, so let me know if you'd like more info/logs/stats from the system before we restart it.
>
> Thanks,
>
> Marcos Juarez
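[Editor's note, not from the original thread: a hedged sketch of one way to watch the broker's file-descriptor count from the same JMX tooling the reporters were already using (jstack, VisualVM). It assumes the broker was started with JMX enabled (e.g. the JMX_PORT environment variable); the host, port, and polling interval are placeholders.]

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerFdWatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; assumes the broker JVM exposes remote JMX.
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbsc = connector.getMBeanServerConnection();
        ObjectName os = new ObjectName("java.lang:type=OperatingSystem");
        while (true) {
            // Both attributes are exposed by the HotSpot OperatingSystem MXBean on Unix.
            long open = (Long) mbsc.getAttribute(os, "OpenFileDescriptorCount");
            long max  = (Long) mbsc.getAttribute(os, "MaxFileDescriptorCount");
            // A steady climb toward max is the symptom described in this thread,
            // worth alerting on before the broker exhausts its descriptors.
            System.out.printf("broker open FDs: %d / %d%n", open, max);
            Thread.sleep(60_000L);
        }
    }
}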
Schema Registry in Staging Environment
Hello, Has anyone used Confluent’s Schema Registry? If so, I’m curious to hear about best practices for using it in a staging environment. Do users typically copy schemas over to the staging environment from production? Are developers allowed to create new schemas in the staging environment? Thanks! Lawrence Weikum
Flickering Kafka Topic
Hello everyone! We seem to be experiencing some odd behavior in Kafka and were wondering if anyone has come across the same issue and if you’ve been able to fix it.

Here’s the setup: 8 brokers in the cluster, Kafka 0.10.0.0. One topic, and only one topic on this cluster, is having issues where ISRs continuously shrink and expand but never stabilize. This happens after roughly 50,000 messages per second come in, and the problem is exacerbated when the rate increases to 110,000 messages per second. Messages are small; total inbound is only about 50 MB/s.

There are no errors in the logs. We just get countless messages like these:

[2016-09-09 12:54:07,147] INFO Partition [topic_a,11] on broker 4: Expanding ISR for partition [topic_a,11] from 4 to 4,2 (kafka.cluster.Partition)
[2016-09-09 12:54:23,070] INFO Partition [topic_a,11] on broker 4: Shrinking ISR for partition [topic_a,11] from 4,2 to 4 (kafka.cluster.Partition)

This topic has transient data that is unimportant after 20 minutes, so losing some due to a cluster shutdown isn’t that important, and we also don’t mind if messages are occasionally dropped. With this in mind we have these settings:

Partitions = 16
Producer ACKs = 1
Replication factor = 2
min.insync.replicas = 1

CPU is sitting fairly idle at ~18%, and a thread dump and profile showed that most threads are sitting idle as well – very little contention if any. We tried increasing the number of partitions from 16 to 24, but that only grew the CPU (from 18% to 23%) and the number of under-replicated partitions.

Any advice or insight is appreciated. Thank you all!

Lawrence
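[Editor's note, not from the original message: a hedged sketch of how the acks=1 setting listed above looks in the Java producer, and what it trades off while the ISR flaps. The broker address, key, and value are placeholders; the topic name is reused from the log lines above.]

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Acks1Producer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=1: only the partition leader must persist the record before acknowledging,
        // so a follower dropping out of the ISR does not stall or fail produces.
        // acks=all would instead wait for the in-sync replicas and, combined with
        // min.insync.replicas, would turn a shrinking ISR into produce errors.
        props.put("acks", "1");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("topic_a", "key", "value"));
        producer.close();
    }
}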
0.10 Metrics Reporter Suggestions
Hello, We’re looking into the functionality of metrics reporters for producers and consumers in Kafka 0.10. Are there any recommended or promising projects, specifically for sending metrics to either StatsD or Graphite? As always, thank you for your help! Lawrence Weikum
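[Editor's note, not from the original message: a hedged skeleton of how a custom reporter plugs into the 0.10 clients via the metric.reporters setting. The callback set shown reflects my understanding of the org.apache.kafka.common.metrics.MetricsReporter interface in that era and may differ slightly between client versions; forwarding to StatsD/Graphite is left as a stub, and the broker address is a placeholder.]

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

public class LoggingMetricsReporter implements MetricsReporter {
    public void configure(Map<String, ?> configs) { }

    public void init(List<KafkaMetric> metrics) {
        for (KafkaMetric metric : metrics) {
            System.out.println("registered metric: " + metric.metricName());
        }
    }

    public void metricChange(KafkaMetric metric) {
        // Called when a metric is added or updated; a StatsD/Graphite reporter
        // would typically record the metric here and flush on a timer.
        System.out.println("metric changed: " + metric.metricName());
    }

    // Present in newer client versions; a harmless extra method otherwise.
    public void metricRemoval(KafkaMetric metric) { }

    public void close() { }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // metric.reporters takes a comma-separated list of MetricsReporter classes.
        props.put("metric.reporters", LoggingMetricsReporter.class.getName());
        new KafkaProducer<String, String>(props).close();
    }
}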
Re: Consumer Offsets and Open FDs
It seems that the log-cleaner is still failing no matter what settings I give it. Here is the full output from one of our brokers:

[2016-07-18 13:00:40,726] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
java.lang.IllegalArgumentException: requirement failed: 192053210 messages in segment __consumer_offsets-15/.log but offset map can fit only 7499. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
        at scala.Predef$.require(Predef.scala:219)
        at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:584)
        at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:580)
        at scala.collection.immutable.Stream$StreamWithFilter.foreach(Stream.scala:570)
        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:580)
        at kafka.log.Cleaner.clean(LogCleaner.scala:322)
        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:230)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:208)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-07-18 13:00:40,732] INFO [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner)

Currently, I have heap allocation up to 64GB, only one log-cleaning thread set to run, and log.cleaner.dedupe.buffer.size is 2GB. I get this error if I try to increase it any more:

WARN [kafka-log-cleaner-thread-0], Cannot use more than 2G of cleaner buffer space per cleaner thread, ignoring excess buffer space... (kafka.log.LogCleaner)

Is there something else I can do to help the broker compact the __consumer_offsets topics?

Thank you again for your help!

Lawrence Weikum

On 7/13/16, 1:06 PM, "Rakesh Vidyadharan" <rvidyadha...@gracenote.com> wrote:

We ran into this as well, and I ended up with the following that works for us.

log.cleaner.dedupe.buffer.size=536870912
log.cleaner.io.buffer.size=2000

On 13/07/2016 14:01, "Lawrence Weikum" <lwei...@pandora.com> wrote:

>Apologies. Here is the full trace from a broker:
>
>[2016-06-24 09:57:39,881] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
>java.lang.IllegalArgumentException: requirement failed: 9730197928 messages in segment __consumer_offsets-36/.log but offset map can fit only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
>        at scala.Predef$.require(Predef.scala:219)
>        at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:584)
>        at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:580)
>        at scala.collection.immutable.Stream$StreamWithFilter.foreach(Stream.scala:570)
>        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:580)
>        at kafka.log.Cleaner.clean(LogCleaner.scala:322)
>        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:230)
>        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:208)
>        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>[2016-06-24 09:57:39,881] INFO [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner)
>
>Is log.cleaner.dedupe.buffer.size a broker setting? What is a good number to set it to?
>
>Lawrence Weikum
>
>On 7/13/16, 11:18 AM, "Manikumar Reddy" <manikumar.re...@gmail.com> wrote:
>
>Can you post the complete error stack trace? Yes, you need to restart the affected brokers.
>You can tweak log.cleaner.dedupe.buffer.size, log.cleaner.io.buffer.size configs.
>
>Some related JIRAs:
>
>https://issues.apache.org/jira/browse/KAFKA-3587
>https://issues.apache.org/jira/browse/KAFKA-3894
>https://issues.apache.org/jira/browse/KAFKA-3915
>
>On Wed, Jul 13, 2016 at 10:36 PM, Lawrence Weikum <lwei...@pandora.com> wrote:
>
>> Oh interesting. I didn’t know about that log file until now.
>>
>> The only error that has been populated among all brokers showing this behavior is:
>>
>> ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
>>
>> Then we see many messages like this:
>>
>> INFO Compaction for partition [__consumer_offsets,30] is resumed (kafka.log.LogCleaner)
>> INFO The cleaning for partition [__consumer_offsets,30] is aborted (kafka.log.LogCleaner)
>>
>> Using Visual VM, I do not see any log-cleaner threads in those brokers. I do see it in the brokers not showing this behavior though.
>>
>> Any idea why the LogCleaner failed?
>>
>> As a temporary fix, should we restart the affected brokers?
>>
>> Thanks again!
>>
>> Lawrence Weikum
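[Editor's note, not from the original thread: a worked sizing sketch that is consistent with the numbers in the traces above. The cleaner's dedupe map stores roughly a 16-byte key hash plus an 8-byte offset per unique key (about 24 bytes per entry, an assumption on my part) and only fills to log.cleaner.io.buffer.load.factor, 0.9 by default. With the default 128 MB dedupe buffer that works out to about 5,033,164 entries, matching the "can fit only 5033164" message quoted above.]

public class CleanerOffsetMapSizing {
    public static void main(String[] args) {
        // Assumed per-entry cost: 16-byte hash of the key + 8-byte offset.
        final int bytesPerEntry = 16 + 8;
        final double loadFactor = 0.9;                 // log.cleaner.io.buffer.load.factor default

        long defaultBuffer = 128L * 1024 * 1024;       // log.cleaner.dedupe.buffer.size default
        long maxBuffer     = 2L * 1024 * 1024 * 1024;  // ~2G cap per cleaner thread (see WARN above)

        System.out.println("entries with default buffer: "
                + (long) (defaultBuffer * loadFactor / bytesPerEntry));   // ~5,033,164
        System.out.println("entries with 2G buffer:      "
                + (long) (maxBuffer * loadFactor / bytesPerEntry));       // ~80,530,636
        // Either way, a single dirty segment claiming 9,730,197,928 messages cannot be
        // deduplicated in one pass, which is why the cleaner thread keeps dying.
    }
}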
Re: Consumer Offsets and Open FDs
Apologies. Here is the full trace from a broker:

[2016-06-24 09:57:39,881] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
java.lang.IllegalArgumentException: requirement failed: 9730197928 messages in segment __consumer_offsets-36/.log but offset map can fit only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
        at scala.Predef$.require(Predef.scala:219)
        at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:584)
        at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:580)
        at scala.collection.immutable.Stream$StreamWithFilter.foreach(Stream.scala:570)
        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:580)
        at kafka.log.Cleaner.clean(LogCleaner.scala:322)
        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:230)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:208)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-06-24 09:57:39,881] INFO [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner)

Is log.cleaner.dedupe.buffer.size a broker setting? What is a good number to set it to?

Lawrence Weikum

On 7/13/16, 11:18 AM, "Manikumar Reddy" <manikumar.re...@gmail.com> wrote:

Can you post the complete error stack trace? Yes, you need to restart the affected brokers.
You can tweak log.cleaner.dedupe.buffer.size, log.cleaner.io.buffer.size configs.

Some related JIRAs:

https://issues.apache.org/jira/browse/KAFKA-3587
https://issues.apache.org/jira/browse/KAFKA-3894
https://issues.apache.org/jira/browse/KAFKA-3915

On Wed, Jul 13, 2016 at 10:36 PM, Lawrence Weikum <lwei...@pandora.com> wrote:

> Oh interesting. I didn’t know about that log file until now.
>
> The only error that has been populated among all brokers showing this behavior is:
>
> ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
>
> Then we see many messages like this:
>
> INFO Compaction for partition [__consumer_offsets,30] is resumed (kafka.log.LogCleaner)
> INFO The cleaning for partition [__consumer_offsets,30] is aborted (kafka.log.LogCleaner)
>
> Using Visual VM, I do not see any log-cleaner threads in those brokers. I do see it in the brokers not showing this behavior though.
>
> Any idea why the LogCleaner failed?
>
> As a temporary fix, should we restart the affected brokers?
>
> Thanks again!
>
> Lawrence Weikum
>
> On 7/13/16, 10:34 AM, "Manikumar Reddy" <manikumar.re...@gmail.com> wrote:
>
> Hi,
>
> Are you seeing any errors in log-cleaner.log? The log-cleaner thread can crash on certain errors.
>
> Thanks
> Manikumar
>
> On Wed, Jul 13, 2016 at 9:54 PM, Lawrence Weikum <lwei...@pandora.com> wrote:
>
> > Hello,
> >
> > We’re seeing a strange behavior in Kafka 0.9.0.1 which occurs about every other week. I’m curious if others have seen it and know of a solution.
> >
> > Setup and scenario:
> >
> > - Brokers initially set up with log compaction turned off
> > - After 30 days, log compaction was turned on
> > - At this time, the number of open FDs was ~30K per broker
> > - After 2 days, the __consumer_offsets topic was compacted fully; open FDs reduced to ~5K per broker
> > - Cluster has been under normal load for roughly 7 days
> > - At the 7-day mark, the __consumer_offsets topic seems to have stopped compacting on two of the brokers, and on those brokers the FD count is up to ~25K
> >
> > We have tried rebalancing the partitions before. The first time, the destination broker compacted the data fine and open FDs were low. The second time, the destination broker kept the FDs open.
> >
> > In all the broker logs, we’re seeing these messages:
> > INFO [Group Metadata Manager on Broker 8]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> >
> > There are only 4 consumers at the moment on the cluster; one topic with 92 partitions.
> >
> > Is there a reason why log compaction may stop working or why the __consumer_offsets topic would start holding thousands of FDs?
> >
> > Thank you all for your help!
> >
> > Lawrence Weikum
Re: Consumer Offsets and Open FDs
Oh interesting. I didn’t know about that log file until now.

The only error that has been populated among all brokers showing this behavior is:

ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)

Then we see many messages like this:

INFO Compaction for partition [__consumer_offsets,30] is resumed (kafka.log.LogCleaner)
INFO The cleaning for partition [__consumer_offsets,30] is aborted (kafka.log.LogCleaner)

Using Visual VM, I do not see any log-cleaner threads in those brokers. I do see it in the brokers not showing this behavior though.

Any idea why the LogCleaner failed?

As a temporary fix, should we restart the affected brokers?

Thanks again!

Lawrence Weikum

On 7/13/16, 10:34 AM, "Manikumar Reddy" <manikumar.re...@gmail.com> wrote:

Hi,

Are you seeing any errors in log-cleaner.log? The log-cleaner thread can crash on certain errors.

Thanks
Manikumar

On Wed, Jul 13, 2016 at 9:54 PM, Lawrence Weikum <lwei...@pandora.com> wrote:

> Hello,
>
> We’re seeing a strange behavior in Kafka 0.9.0.1 which occurs about every other week. I’m curious if others have seen it and know of a solution.
>
> Setup and scenario:
>
> - Brokers initially set up with log compaction turned off
> - After 30 days, log compaction was turned on
> - At this time, the number of open FDs was ~30K per broker
> - After 2 days, the __consumer_offsets topic was compacted fully; open FDs reduced to ~5K per broker
> - Cluster has been under normal load for roughly 7 days
> - At the 7-day mark, the __consumer_offsets topic seems to have stopped compacting on two of the brokers, and on those brokers the FD count is up to ~25K
>
> We have tried rebalancing the partitions before. The first time, the destination broker compacted the data fine and open FDs were low. The second time, the destination broker kept the FDs open.
>
> In all the broker logs, we’re seeing these messages:
> INFO [Group Metadata Manager on Broker 8]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
>
> There are only 4 consumers at the moment on the cluster; one topic with 92 partitions.
>
> Is there a reason why log compaction may stop working or why the __consumer_offsets topic would start holding thousands of FDs?
>
> Thank you all for your help!
>
> Lawrence Weikum
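[Editor's note, not from the original thread: the VisualVM observation above (no log-cleaner thread visible on the affected brokers) can also be checked programmatically. A hedged sketch using the standard remote Thread MXBean, assuming the broker exposes JMX (e.g. JMX_PORT set); the host and port are placeholders, and the thread-name prefix is taken from the log lines in this thread.]

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LogCleanerThreadCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; assumes the broker was started with JMX enabled.
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbsc = connector.getMBeanServerConnection();
        ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

        boolean found = false;
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null && info.getThreadName().startsWith("kafka-log-cleaner-thread")) {
                found = true;
                System.out.println(info.getThreadName() + " state=" + info.getThreadState());
            }
        }
        if (!found) {
            // Matches the symptom in this thread: the cleaner thread died,
            // so compaction silently stopped on that broker.
            System.out.println("No log-cleaner thread found on this broker");
        }
        connector.close();
    }
}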
Consumer Offsets and Open FDs
Hello,

We’re seeing a strange behavior in Kafka 0.9.0.1 which occurs about every other week. I’m curious if others have seen it and know of a solution.

Setup and scenario:

- Brokers initially set up with log compaction turned off
- After 30 days, log compaction was turned on
- At this time, the number of open FDs was ~30K per broker
- After 2 days, the __consumer_offsets topic was compacted fully; open FDs reduced to ~5K per broker
- Cluster has been under normal load for roughly 7 days
- At the 7-day mark, the __consumer_offsets topic seems to have stopped compacting on two of the brokers, and on those brokers the FD count is up to ~25K

We have tried rebalancing the partitions before. The first time, the destination broker compacted the data fine and open FDs were low. The second time, the destination broker kept the FDs open.

In all the broker logs, we’re seeing these messages:

INFO [Group Metadata Manager on Broker 8]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

There are only 4 consumers at the moment on the cluster; one topic with 92 partitions.

Is there a reason why log compaction may stop working or why the __consumer_offsets topic would start holding thousands of FDs?

Thank you all for your help!

Lawrence Weikum
Re: Log Compaction not Deleting on Kafka 0.9.0.1
Fascinating. We are seeing no errors or warnings in the logs after restart. It appears on this broker that the compaction thread is working:

[2016-06-22 10:33:49,179] INFO Rolled new log segment for '__consumer_offsets-28' in 1 ms. (kafka.log.Log)
[2016-06-22 10:34:00,968] INFO Deleting segment 0 from log __consumer_offsets-28. (kafka.log.Log)
[2016-06-22 10:34:00,970] INFO Deleting index /kafka/data/__consumer_offsets-28/.index.deleted (kafka.log.OffsetIndex)
[2016-06-22 10:34:00,992] INFO Deleting segment 2148144095 from log __consumer_offsets-28. (kafka.log.Log)
[2016-06-22 10:34:00,994] INFO Deleting index /kafka/data/__consumer_offsets-28/002148144095.index.deleted (kafka.log.OffsetIndex)
[2016-06-22 10:34:01,002] INFO Deleting segment 3189277822 from log __consumer_offsets-28. (kafka.log.Log)
[2016-06-22 10:34:01,004] INFO Deleting index /kafka/data/__consumer_offsets-28/003189277822.index.deleted (kafka.log.OffsetIndex)
[2016-06-22 10:34:02,019] INFO Deleting segment 3190205744 from log __consumer_offsets-28. (kafka.log.Log)
[2016-06-22 10:34:02,039] INFO Deleting index /kafka/data/__consumer_offsets-28/003190205744.index.deleted (kafka.log.OffsetIndex)

We see the “kafka-log-cleaner-thread” in the JMX. It seems to run about every 50 seconds. From the log-cleaner.log file, we see plenty of this output regarding the partition that’s hogging the FDs:

[2016-06-22 10:44:31,845] INFO Cleaner 0: Beginning cleaning of log __consumer_offsets-28. (kafka.log.LogCleaner)
[2016-06-22 10:44:31,846] INFO Cleaner 0: Building offset map for __consumer_offsets-28... (kafka.log.LogCleaner)
[2016-06-22 10:44:31,878] INFO Cleaner 0: Building offset map for log __consumer_offsets-28 for 1 segments in offset range [3204124461, 3205052375). (kafka.log.LogCleaner)
[2016-06-22 10:44:32,870] INFO Cleaner 0: Offset map for log __consumer_offsets-28 complete. (kafka.log.LogCleaner)
[2016-06-22 10:44:32,871] INFO Cleaner 0: Cleaning log __consumer_offsets-28 (discarding tombstones prior to Tue Jun 21 10:43:19 PDT 2016)... (kafka.log.LogCleaner)
[2016-06-22 10:44:32,871] INFO Cleaner 0: Cleaning segment 0 in log __consumer_offsets-28 (last modified Tue Jun 21 22:39:18 PDT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2016-06-22 10:44:32,888] INFO Cleaner 0: Swapping in cleaned segment 0 for segment(s) 0 in log __consumer_offsets-28. (kafka.log.LogCleaner)
[2016-06-22 10:44:32,889] INFO Cleaner 0: Cleaning segment 2148144095 in log __consumer_offsets-28 (last modified Wed Jun 22 10:42:31 PDT 2016) into 2148144095, retaining deletes. (kafka.log.LogCleaner)
[2016-06-22 10:44:32,889] INFO Cleaner 0: Cleaning segment 3203196540 in log __consumer_offsets-28 (last modified Wed Jun 22 10:43:19 PDT 2016) into 2148144095, retaining deletes. (kafka.log.LogCleaner)
[2016-06-22 10:44:32,905] INFO Cleaner 0: Swapping in cleaned segment 2148144095 for segment(s) 2148144095,3203196540 in log __consumer_offsets-28. (kafka.log.LogCleaner)
[2016-06-22 10:44:32,905] INFO Cleaner 0: Cleaning segment 3204124461 in log __consumer_offsets-28 (last modified Wed Jun 22 10:44:21 PDT 2016) into 3204124461, retaining deletes. (kafka.log.LogCleaner)
[2016-06-22 10:44:33,834] INFO Cleaner 0: Swapping in cleaned segment 3204124461 for segment(s) 3204124461 in log __consumer_offsets-28. (kafka.log.LogCleaner)
[2016-06-22 10:44:33,836] INFO [kafka-log-cleaner-thread-0], Log cleaner thread 0 cleaned log __consumer_offsets-28 (dirty section = [3204124461, 3205052375])
        100.0 MB of log processed in 2.0 seconds (50.3 MB/sec).
        Indexed 100.0 MB in 1.0 seconds (97.6 Mb/sec, 51.5% of total time)
        Buffer utilization: 0.0%
        Cleaned 100.0 MB in 1.0 seconds (103.6 Mb/sec, 48.5% of total time)
        Start size: 100.0 MB (928,011 messages)
        End size: 0.0 MB (97 messages)
        100.0% size reduction (100.0% fewer messages) (kafka.log.LogCleaner)

But there are none of the actual delete messages that a properly-working broker shows for a different partition.

Lawrence Weikum

On 6/22/16, 11:28 AM, "Tom Crayford" <tcrayf...@heroku.com> wrote:

kafka-log-cleaner-thread-0
Log Compaction not Deleting on Kafka 0.9.0.1
We seem to be having a strange issue with a cluster of ours, specifically with the __consumer_offsets topic.

When we brought the cluster online, log compaction was turned off. Realizing our mistake, we turned it on, but only after the topic had over 31,018,699,972 offsets committed to it.

Log compaction seems to have worked and to be working properly. The logs show that every partition has been compacted, and many pieces have been marked for deletion. The problem is that not all partitions are having their older logs deleted. Some partitions will grow to 19 log files, but a few seconds later will have only 13. One partition in particular, though, still has all of its log files, all 19,000 of them, and this never seems to change, only grow as new offsets come in. Restarting that broker doesn’t seem to help.

We’ve checked the broker settings on everything as well:

log.cleaner.enable = true
log.cleanup.policy = delete
cleanup.policy = compact

Has anyone encountered this issue before?

Thank you all for the help!

Lawrence Weikum
Re: JVM Optimizations
Hi Tom,

Currently we’re using the default settings – no special tuning whatsoever. I think the kafka-run-class.sh has this:

# Memory options
if [ -z "$KAFKA_HEAP_OPTS" ]; then
  KAFKA_HEAP_OPTS="-Xmx256M"
fi

# JVM performance options
if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
  KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true"
fi

Is this the confluent doc you were referring to? http://docs.confluent.io/2.0.0/kafka/deployment.html

Thanks!

Lawrence Weikum

On 6/9/16, 1:32 PM, "Tom Crayford" <tcrayf...@heroku.com> wrote:

>Hi Lawrence,
>
>What JVM options were you using? There's a few pages in the confluent docs on JVM tuning iirc. We simply use the G1 and a 4GB Max heap and things work well (running many thousands of clusters).
>
>Thanks
>Tom Crayford
>Heroku Kafka
>
>On Thursday, 9 June 2016, Lawrence Weikum <lwei...@pandora.com> wrote:
>
>> Hello all,
>>
>> We’ve been running a benchmark test on a Kafka cluster of ours running 0.9.0.1 – slamming it with messages to see when/if things might break. During our test, we caused two brokers to throw OutOfMemory errors (looks like from the Heap) even though each machine still has 43% of the total memory unused.
>>
>> I’m curious what JVM optimizations are recommended for Kafka brokers? Or if there aren’t any that are recommended, what are some optimizations others are using to keep the brokers running smoothly?
>>
>> Best,
>>
>> Lawrence Weikum
JVM Optimizations
Hello all,

We’ve been running a benchmark test on a Kafka cluster of ours running 0.9.0.1 – slamming it with messages to see when/if things might break. During our test, we caused two brokers to throw OutOfMemory errors (apparently from the heap) even though each machine still had 43% of its total memory unused.

I’m curious what JVM optimizations are recommended for Kafka brokers? Or if there aren’t any that are recommended, what are some optimizations others are using to keep the brokers running smoothly?

Best,

Lawrence Weikum
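[Editor's note, not from the original thread: a small sketch of why a broker can hit OutOfMemoryError while the machine still shows plenty of free RAM. The heap can only grow to -Xmx (256M with the default KAFKA_HEAP_OPTS quoted in the reply above), regardless of what the OS has available; free system memory mostly benefits the page cache, not the broker heap. Run with the broker's own heap settings, or pointed at the broker over JMX, to see the actual limit.]

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapVsSystemMemory {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // "max" is capped by -Xmx; an OutOfMemoryError occurs when the heap cannot
        // grow past this, even if the operating system still has free RAM.
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}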
SchemaException error on 0.10.0.0-snapshot
Setting up a basic word count job, similar to the one from confluent-2.1.0-alpha1 (WordCountJob). Mine does the same thing essentially, just within our namespace. Coming across this error though, which is strange because the code for the moment is identical with the exception of the topic names.

Exception in thread "StreamThread-1" org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1731681, only 54 bytes available
        at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75)
        at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
        at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
        at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
        at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)

Is there something I'm missing?

Thanks for the help.

Lawrence

Begin full output:

/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/bin/java -Didea.launcher.port=7539 "-Didea.launcher.bin.path=/Applications/IntelliJ IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath
Topic retention configuration clarification
Using 0.9.0.1. I'm building a new topic that should keep data for much longer than the brokers' default, say at least a year, before deleting messages.

http://kafka.apache.org/documentation.html says setting "retention.ms" for the topic will adjust the time, but I cannot find out what unit of time Kafka uses for this. "ms" would suggest milliseconds, so a year would be about 3.15e+10 milliseconds. This seems like an uncomfortably high number to give.

Can anyone else confirm that the time unit for the topic config "retention.ms" is milliseconds? Is there also a "retention.minutes" that's just undocumented?

Thanks!

Lawrence Weikum
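[Editor's note, not from the original post: retention.ms is indeed interpreted in milliseconds, so a quick arithmetic check of the one-year figure looks like this.]

public class RetentionMsForOneYear {
    public static void main(String[] args) {
        // retention.ms is a per-topic override expressed in milliseconds.
        long oneYearMs = 365L * 24 * 60 * 60 * 1000;
        System.out.println("retention.ms for ~1 year = " + oneYearMs);  // 31536000000 (~3.15e10)
    }
}

For comparison, the broker-level defaults can be given as log.retention.hours, log.retention.minutes, or log.retention.ms, but the topic-level override uses the milliseconds form.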
Default behavior for full broker
Hello, I'm curious about the expected or default behavior that might occur if a broker in the system has filled up. By that I mean when a broker has used all of its memory and disk space. Is the node simply removed from the system until space is cleared? As I'm thinking through this a little more, if that is the case and the node is removed, some partitions in the system may be marked as under-replicated and cause a cascading effect where partitions are re-replicated and cause other nodes to fill up. Has that ever happened? Does Kafka have a contingency plan for such a scenario? Thank you so much for your insight and all of your hard work! Lawrence