[jira] [Created] (KAFKA-6465) Add a metric for the number of records per log
Ivan Babrou created KAFKA-6465:
--
Summary: Add a metric for the number of records per log
Key: KAFKA-6465
URL: https://issues.apache.org/jira/browse/KAFKA-6465
Project: Kafka
Issue Type: Bug
Reporter: Ivan Babrou

Currently there are log metrics for:
* Start offset
* End offset
* Size in bytes
* Number of segments

I propose adding another metric to track the number of record batches in the log. This should give operators an idea of how much batching is happening on the producers. Having this metric in one place seems easier than scraping the metric from each producer.

Having an absolute counter may be infeasible (batches are not assigned sequential IDs), but a gauge should be OK. Average batch size can be calculated as (end offset - start offset) / number of batches. This will be heavily skewed for logs with long retention, though.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
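If the proposed gauge existed, the average batch size could be derived from it together with the existing start/end offset metrics. A minimal sketch of that calculation; the batch count is a hypothetical input, not an existing Kafka metric:

{code:java}
// Illustration only: combines the existing start/end offset metrics with the
// proposed (hypothetical) record-batch count gauge.
public final class BatchMetricsExample {
    // (endOffset - startOffset) is the number of records currently in the log,
    // so dividing by the number of batches yields average records per batch.
    static double averageBatchSize(long startOffset, long endOffset, long numBatches) {
        if (numBatches <= 0) {
            return 0.0;
        }
        return (double) (endOffset - startOffset) / numBatches;
    }

    public static void main(String[] args) {
        // Example: 1,000,000 records spread over 50,000 batches -> 20 records per batch.
        System.out.println(averageBatchSize(9_000_000L, 10_000_000L, 50_000L));
    }
}
{code}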
[jira] [Created] (KAFKA-6444) Kafka consumers and producers get confused by dualstack A + AAAA DNS records
Ivan Babrou created KAFKA-6444:
--
Summary: Kafka consumers and producers get confused by dualstack A + AAAA DNS records
Key: KAFKA-6444
URL: https://issues.apache.org/jira/browse/KAFKA-6444
Project: Kafka
Issue Type: Bug
Affects Versions: 0.11.0.1
Reporter: Ivan Babrou

We have hostnames with both A (IPv4) and AAAA (IPv6) DNS records. Kafka is configured to listen only on IPv6 by manually setting the IP to listen on and advertise outside. Brokers have no issue communicating between themselves, because they are not given the option to resolve hostnames and pick an IP protocol version.

Consumers and producers have to use bootstrap hostnames and do not try to connect over IPv6 at all; they are stuck in SYN_SENT over IPv4:

{noformat}
syslog-ng 12621 999 123u IPv6 2411122889 0t0 TCP 192.168.0.2:41626->192.168.0.1:9092 (SYN_SENT)
{noformat}

This happened to the consumer in the syslog-ng Kafka output plugin:
* https://github.com/balabit/syslog-ng/issues/1835

It also happened to a Flink consumer, although I do not have any more info about that one.

We fixed the issue by only providing AAAA records for bootstrapping.

Previously we saw the opposite problem with dualstack: software that does not implement Happy Eyeballs connects only to the IPv6 address, which is firewalled. This happened to SSH (the client gets stuck if you don't supply -4) and Go (https://github.com/golang/go/issues/5), to give a couple of examples.

The solution for this is Happy Eyeballs: https://en.wikipedia.org/wiki/Happy_Eyeballs

Kafka clients should connect to IPv6 first and then fall back to IPv4 if it is not available.

There is also KAFKA-3347.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
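This is not what the Kafka clients do today, but a minimal sketch of the IPv6-first-with-IPv4-fallback behaviour the report asks for, using only standard JDK APIs (host, port and timeout are placeholders):

{code:java}
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public final class DualStackConnectSketch {
    // Try every resolved address, IPv6 candidates first, and return the first socket that connects.
    static Socket connectPreferIpv6(String host, int port, int timeoutMs) throws IOException {
        List<InetAddress> addrs = new ArrayList<>(Arrays.asList(InetAddress.getAllByName(host)));
        // Sort so Inet6Address entries are attempted before IPv4 ones.
        addrs.sort(Comparator.comparingInt((InetAddress a) -> a instanceof Inet6Address ? 0 : 1));
        IOException last = null;
        for (InetAddress addr : addrs) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(addr, port), timeoutMs);
                return socket; // connected; the caller owns the socket now
            } catch (IOException e) {
                socket.close();
                last = e; // remember the failure and fall back to the next address
            }
        }
        throw last != null ? last : new IOException("no addresses resolved for " + host);
    }
}
{code}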
[jira] [Resolved] (KAFKA-6441) FetchRequest populates buffer of size MinBytes, even if response is smaller
[ https://issues.apache.org/jira/browse/KAFKA-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Babrou resolved KAFKA-6441.
Resolution: Invalid

> FetchRequest populates buffer of size MinBytes, even if response is smaller
> ---
>
> Key: KAFKA-6441
> URL: https://issues.apache.org/jira/browse/KAFKA-6441
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.11.0.1
> Reporter: Ivan Babrou
>
> We're using the Sarama Go client as the consumer, but I don't think it's relevant. The producer is syslog-ng with Kafka output. I'm not quite sure which log format Kafka itself is using, but I can assume 0.11.0.0, because that's what is set in the topic settings.
> Our FetchRequest has minSize = 16MB, maxSize = 64, maxWait = 500ms. For a silly reason, Kafka decides to reply with at least a minSize buffer containing just one 1KB log message. When Sarama was using the older consumer API, everything was okay. When we upgraded to the 0.11.0.0 consumer API, consumer traffic for a 125Mbit/s topic spiked to 55000Mbit/s on the wire and the consumer wasn't even able to keep up.
> A 1KB message in a 16MB buffer is 1,600,000% overhead. I don't think there's any valid reason to do this.
> It's also mildly annoying that there is no tag 0.11.0.1 in git; looking at changes is harder than it should be.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (KAFKA-6441) FetchRequest populates buffer of size MinBytes, even if response is smaller
Ivan Babrou created KAFKA-6441:
--
Summary: FetchRequest populates buffer of size MinBytes, even if response is smaller
Key: KAFKA-6441
URL: https://issues.apache.org/jira/browse/KAFKA-6441
Project: Kafka
Issue Type: Bug
Affects Versions: 0.11.0.1
Reporter: Ivan Babrou

We're using the Sarama Go client as the consumer, but I don't think it's relevant. The producer is syslog-ng with Kafka output. I'm not quite sure which log format Kafka itself is using, but I can assume 0.11.0.0, because that's what is set in the topic settings.

Our FetchRequest has minSize = 16MB, maxSize = 64, maxWait = 500ms. For a silly reason, Kafka decides to reply with at least a minSize buffer containing just one 1KB log message. When Sarama was using the older consumer API, everything was okay. When we upgraded to the 0.11.0.0 consumer API, consumer traffic for a 125Mbit/s topic spiked to 55000Mbit/s on the wire and the consumer wasn't even able to keep up.

A 1KB message in a 16MB buffer is 1,600,000% overhead. I don't think there's any valid reason to do this.

It's also mildly annoying that there is no tag 0.11.0.1 in git; looking at changes is harder than it should be.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
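For context, Sarama's Fetch.Min / Fetch.Max / MaxWaitTime settings map to the same fetch.* knobs the Java consumer exposes. A sketch of a more conventional configuration on the Java side (broker address, group id and sizes are placeholders; this illustrates what MinBytes controls, it is not a fix for the broker behaviour described above):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public final class FetchSizingExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // fetch.min.bytes is the MinBytes field of FetchRequest: the broker waits until this
        // much data is available or fetch.max.wait.ms elapses. Keeping it small (default 1)
        // avoids asking the broker to accumulate 16MB per response.
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        // fetch.max.bytes caps the total response size instead.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 64 * 1024 * 1024);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // consumer.subscribe(...) / consumer.poll(...) as usual
        }
    }
}
{code}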
[jira] [Created] (KAFKA-6423) Slow shutdown with many open files
Ivan Babrou created KAFKA-6423:
--
Summary: Slow shutdown with many open files
Key: KAFKA-6423
URL: https://issues.apache.org/jira/browse/KAFKA-6423
Project: Kafka
Issue Type: Bug
Affects Versions: 0.11.0.1
Reporter: Ivan Babrou

We have brokers with 20k open files and shutdown is extremely slow, progressing usually at around 60 closed file descriptors per second:

{noformat}
$ while true; do echo $(date) $(sudo ls /proc/6363/fd | wc -l); sleep 1; done
Thu Jan 4 23:00:51 UTC 2018 9770
Thu Jan 4 23:00:52 UTC 2018 9691
Thu Jan 4 23:00:53 UTC 2018 9616
Thu Jan 4 23:00:54 UTC 2018 9561
Thu Jan 4 23:00:55 UTC 2018 9509
Thu Jan 4 23:00:56 UTC 2018 9427
Thu Jan 4 23:00:57 UTC 2018 9350
Thu Jan 4 23:00:58 UTC 2018 9260
Thu Jan 4 23:00:59 UTC 2018 9208
{noformat}

If you strace the process, you can see:

{noformat}
$ sudo strace -f -c -p 6363
strace: Process 6363 attached with 97 threads
^Cstrace: Process 6363 detached
strace: Process 6604 detached
strace: Process 6605 detached
strace: Process 6606 detached
strace: Process 6607 detached
strace: Process 6608 detached
strace: Process 6609 detached
strace: Process 6610 detached
strace: Process 6611 detached
strace: Process 6612 detached
strace: Process 6613 detached
strace: Process 6614 detached
strace: Process 6615 detached
strace: Process 6616 detached
strace: Process 6617 detached
strace: Process 6618 detached
strace: Process 6619 detached
strace: Process 6620 detached
strace: Process 6621 detached
strace: Process 6622 detached
strace: Process 6623 detached
strace: Process 6624 detached
strace: Process 6625 detached
strace: Process 6626 detached
strace: Process 6627 detached
strace: Process 6628 detached
strace: Process 6629 detached
strace: Process 6630 detached
strace: Process 6631 detached
strace: Process 6632 detached
strace: Process 6633 detached
strace: Process 6634 detached
strace: Process 6635 detached
strace: Process 6636 detached
strace: Process 6637 detached
strace: Process 6638 detached
strace: Process 6639 detached
strace: Process 6640 detached
strace: Process 6641 detached
strace: Process 6642 detached
strace: Process 6643 detached
strace: Process 6644 detached
strace: Process 6645 detached
strace: Process 6646 detached
strace: Process 6647 detached
strace: Process 6648 detached
strace: Process 6649 detached
strace: Process 6650 detached
strace: Process 6651 detached
strace: Process 6652 detached
strace: Process 6653 detached
strace: Process 6654 detached
strace: Process 6655 detached
strace: Process 6656 detached
strace: Process 6657 detached
strace: Process 6658 detached
strace: Process 6659 detached
strace: Process 6660 detached
strace: Process 6661 detached
strace: Process 6662 detached
strace: Process 6663 detached
strace: Process 6716 detached
strace: Process 6717 detached
strace: Process 6718 detached
strace: Process 6719 detached
strace: Process 6720 detached
strace: Process 6721 detached
strace: Process 6722 detached
strace: Process 6723 detached
strace: Process 6724 detached
strace: Process 6725 detached
strace: Process 6726 detached
strace: Process 6727 detached
strace: Process 6728 detached
strace: Process 6729 detached
strace: Process 6730 detached
strace: Process 6731 detached
strace: Process 6732 detached
strace: Process 6733 detached
strace: Process 6734 detached
strace: Process 6735 detached
strace: Process 6736 detached
strace: Process 6737 detached
strace: Process 6738 detached
strace: Process 6739 detached
strace: Process 6740 detached
strace: Process 6741 detached
strace: Process 6760 detached
strace: Process 6779 detached
strace: Process 6781 detached
strace: Process 6783 detached
strace: Process 6892 detached
strace: Process 2339 detached
strace: Process 2340 detached
strace: Process 5122 detached
strace: Process 5123 detached
strace: Process 5652 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.19    0.859302       35804        24        19 restart_syscall
 26.60    0.350656         507       692       190 futex
  5.17    0.068142        2271        30           epoll_wait
  1.22    0.016141          56       287           ftruncate
  0.66    0.008679          20       432           close
  0.38    0.005054          35       144           fsync
  0.26    0.003489          12       288           open
  0.19    0.002564           4       720           fstat
  0.15    0.001914           3       576           lseek
  0.14    0.001885          13       144           mmap
  0.01    0.000114           7        16           getrusage
  0.00    0.000060          15         4           write
  0.00    0.000028           5         6           read
  0.00    0.000027           2        12           mprotect
  0.00    0.000019           5         4           epoll_ctl
------ ----------- ----------- --------- --------- ----------------
{noformat}
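Back-of-the-envelope arithmetic for the numbers above (purely illustrative; the inputs are taken from the report, not from Kafka code):

{code:java}
// At ~60 closed descriptors per second, a broker with ~20,000 open files
// needs on the order of 5-6 minutes just to close its files on shutdown.
public final class ShutdownEstimate {
    static double estimatedShutdownSeconds(int openFds, double closedFdsPerSecond) {
        return openFds / closedFdsPerSecond;
    }

    public static void main(String[] args) {
        System.out.printf("~%.0f seconds%n", estimatedShutdownSeconds(20_000, 60.0));
    }
}
{code}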
[jira] [Created] (KAFKA-6414) Inverse replication for replicas that are far behind
Ivan Babrou created KAFKA-6414:
--
Summary: Inverse replication for replicas that are far behind
Key: KAFKA-6414
URL: https://issues.apache.org/jira/browse/KAFKA-6414
Project: Kafka
Issue Type: Bug
Components: replication
Reporter: Ivan Babrou

Let's suppose the following starting point:
* 1 topic
* 1 partition
* 1 reader
* 24h retention period
* leader outbound bandwidth is 3x of inbound bandwidth (1x replication + 1x reader + 1x slack = total outbound)

In this scenario, when a replica fails and needs to be brought back from scratch, you can catch up at 2x inbound bandwidth (1x regular replication + 1x slack used). 2x catch-up speed means the replica will reach the point where the leader is now in 24h / 2x = 12h. However, in those 12h the oldest 12h of the topic will fall over the retention cliff and will be deleted. There is absolutely no use for this data; it will never be read from the replica in any scenario. And this is not even counting the fact that we still need to replicate 12h more of data that accumulated since the time we started.

My suggestion is to refill sufficiently out-of-sync replicas backwards from the tip: newest segments first, oldest segments last. Then we can stop when we hit the retention cliff and replicate far less data. The lower the ratio of catch-up bandwidth to inbound bandwidth, the higher the returns would be. This will also set a hard cap on recovery time: it will be no higher than the retention period if the catch-up speed is >1x (if it's less, you're forever out of ISR anyway).

What exactly "sufficiently out of sync" means in terms of lag is a topic for debate. The default segment size is 1GiB; I'd say that being more than 1 full segment behind probably warrants this.

As of now, the solution for slow recovery appears to be to reduce retention to speed up recovery, which doesn't seem very friendly.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
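A small sketch of the arithmetic in the example above, assuming the simple model from the description (the class and method names are made up for illustration):

{code:java}
// Illustration of the waste described above: copying a failed replica oldest-first
// at k times the inbound rate takes retention/k hours just for the existing backlog,
// and everything that ages past the retention window in that time has already been deleted.
public final class CatchUpWaste {
    static void describe(double retentionHours, double catchUpRatio) {
        double hoursToCopyBacklog = retentionHours / catchUpRatio;              // 24h / 2x = 12h
        double expiredWhileCopying = Math.min(retentionHours, hoursToCopyBacklog); // oldest 12h
        System.out.printf(
            "Copying the %.0fh backlog takes %.0fh; by then the oldest %.0fh of what was copied%n"
          + "has fallen out of retention, and %.0fh of new data still has to be replicated.%n",
            retentionHours, hoursToCopyBacklog, expiredWhileCopying, hoursToCopyBacklog);
    }

    public static void main(String[] args) {
        describe(24.0, 2.0);
    }
}
{code}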
[jira] [Created] (KAFKA-6013) Controller getting stuck
Ivan Babrou created KAFKA-6013:
--
Summary: Controller getting stuck
Key: KAFKA-6013
URL: https://issues.apache.org/jira/browse/KAFKA-6013
Project: Kafka
Issue Type: Bug
Affects Versions: 0.11.0.0, 0.11.0.1
Reporter: Ivan Babrou

It looks like a new issue in 0.11.0.0, and 0.11.0.1 still has it.

We upgraded one of the clusters from 0.11.0.0 to 0.11.0.1 by shutting down 28 machines at once (a single rack). When the nodes came up, none of them progressed past these log lines:

{noformat}
Oct 05 02:17:42 mybroker14 kafka[32940]: INFO Kafka version : 0.11.0.1 (org.apache.kafka.common.utils.AppInfoParser)
Oct 05 02:17:42 mybroker14 kafka[32940]: INFO Kafka commitId : c2a0d5f9b1f45bf5 (org.apache.kafka.common.utils.AppInfoParser)
Oct 05 02:17:42 mybroker14 kafka[32940]: INFO [Kafka Server 10014], started (kafka.server.KafkaServer)
{noformat}

There was no indication in the controller node's logs that it picked up the rebooted nodes.

This happened multiple times during the upgrade: once per rack, plus some on top of that. A reboot took ~20m, and all nodes in a single rack rebooted in parallel.

The fix was to restart the controller node, but that did not go cleanly either:

{noformat}
ivan@mybroker26:~$ sudo journalctl --since 01:00 -u kafka | fgrep 'Error during controlled shutdown' -A1
Oct 05 01:57:41 mybroker26 kafka[37409]: WARN [Kafka Server 10026], Error during controlled shutdown, possibly because leader movement took longer than the configured controller.socket.timeout.ms and/or request.timeout.ms: Connection to 10026 was disconnected before the response was read (kafka.server.KafkaServer)
Oct 05 01:57:46 mybroker26 kafka[37409]: WARN [Kafka Server 10026], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
--
Oct 05 01:58:16 mybroker26 kafka[37409]: WARN [Kafka Server 10026], Error during controlled shutdown, possibly because leader movement took longer than the configured controller.socket.timeout.ms and/or request.timeout.ms: Connection to 10026 was disconnected before the response was read (kafka.server.KafkaServer)
Oct 05 01:58:18 mybroker26 kafka[37409]: INFO Rolled new log segment for 'requests-40' in 3 ms. (kafka.log.Log)
--
Oct 05 01:58:51 mybroker26 kafka[37409]: WARN [Kafka Server 10026], Error during controlled shutdown, possibly because leader movement took longer than the configured controller.socket.timeout.ms and/or request.timeout.ms: Connection to 10026 was disconnected before the response was read (kafka.server.KafkaServer)
Oct 05 01:58:56 mybroker26 kafka[37409]: WARN [Kafka Server 10026], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
{noformat}

I'm unable to reproduce the issue by just restarting or even rebooting one broker; the controller picks it up:

{noformat}
Oct 05 03:18:18 mybroker83 kafka[37402]: INFO [Controller 10083]: Newly added brokers: 10001, deleted brokers: , all live brokers: ...
{noformat}

KAFKA-5028 happened in 0.11.0.0, so it's likely related.

cc [~ijuma]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (KAFKA-5687) Retention settings are inconsistent between broker and topic
Ivan Babrou created KAFKA-5687:
--
Summary: Retention settings are inconsistent between broker and topic
Key: KAFKA-5687
URL: https://issues.apache.org/jira/browse/KAFKA-5687
Project: Kafka
Issue Type: Bug
Reporter: Ivan Babrou

Topic configurations:
* retention.ms

Broker configurations:
* log.retention.hours
* log.retention.minutes
* log.retention.ms

First of all, it's impossible to set topic retention time in hours or minutes. Second, a "seconds" version is missing between "ms" and "minutes".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
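For reference, topic-level retention can currently only be expressed in milliseconds. A minimal sketch using the Java AdminClient (the broker address and topic name are placeholders):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

// Sketch: because only retention.ms exists at the topic level, "12 hours" has to be
// written out as 43200000 ms, while the broker default could simply use log.retention.hours=12.
public final class TopicRetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "mytopic");
            Config retention = new Config(Collections.singleton(
                    new ConfigEntry("retention.ms", String.valueOf(12L * 60 * 60 * 1000)))); // 12h in ms
            admin.alterConfigs(Collections.singletonMap(topic, retention)).all().get();
        }
    }
}
{code}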
[jira] [Created] (KAFKA-5633) Clarify another scenario of unclean leader election
Ivan Babrou created KAFKA-5633:
--
Summary: Clarify another scenario of unclean leader election
Key: KAFKA-5633
URL: https://issues.apache.org/jira/browse/KAFKA-5633
Project: Kafka
Issue Type: Bug
Reporter: Ivan Babrou

You don't need to lose all replicas of a partition to end up in an unclean leader election scenario; it's enough to lose just one. The leading replica can get into a state where it kicks everything else out of the ISR because it has network issues, and then it can just die, leaving the partition leaderless. This is what we saw:

{noformat}
Jul 24 18:05:53 broker-10029 kafka[4104]: INFO Partition [requests,9] on broker 10029: Shrinking ISR for partition [requests,9] from 10029,10016,10072 to 10029 (kafka.cluster.Partition)
{noformat}

{noformat}
Topic: requests Partition: 9 Leader: -1 Replicas: 10029,10072,10016 Isr: 10029
{noformat}

Disabled unclean leader election is the default behavior in 0.11.0.0+, but I don't think the docs are completely clear about the implications. Before the change you could silently lose data if the scenario described above happened, but now you can grind your whole pipeline to a halt when just one node has issues. My understanding is that to avoid this you'd want to have min.insync.replicas > 1 and acks > 1 (probably all).

It's also worth documenting how to force leader election when unclean leader election is disabled. I assume it can be accomplished by switching unclean.leader.election.enable on and off again for the problematic topic, but being crystal clear about this in the docs would be tremendously helpful.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
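A sketch of the settings mentioned above. The broker address, topic name and serializers are placeholders, and the topic would additionally need min.insync.replicas=2 set out of band (e.g. via kafka-configs.sh):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// Producer side of the recommendation above: with acks=all and the topic's
// min.insync.replicas=2, a leader that has shrunk the ISR to itself rejects
// writes instead of silently accepting data that only it has.
public final class DurableProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("requests", "key", "value"));
        }
    }
}
{code}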
[jira] [Created] (KAFKA-5348) kafka-consumer-groups.sh refuses to remove groups without ids
Ivan Babrou created KAFKA-5348:
--
Summary: kafka-consumer-groups.sh refuses to remove groups without ids
Key: KAFKA-5348
URL: https://issues.apache.org/jira/browse/KAFKA-5348
Project: Kafka
Issue Type: Bug
Components: admin
Affects Versions: 0.10.2.0
Reporter: Ivan Babrou

In zookeeper I have:

{noformat}
[zk: foo(CONNECTED) 37] ls /kafka/logs/consumers/console-consumer-4107
[offsets]
{noformat}

This consumer group also shows up when I list consumer groups:

{noformat}
$ /usr/local/kafka/bin/kafka-consumer-groups.sh --zookeeper foo:2181/kafka/logs --list | fgrep console-consumer-4107
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).
console-consumer-4107
{noformat}

But I cannot remove this group:

{noformat}
$ /usr/local/kafka/bin/kafka-consumer-groups.sh --zookeeper 36zk1.in.pdx.cfdata.org:2181/kafka/logs --delete --group console-consumer-4107
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).
Error: Delete for group 'console-consumer-4107' failed because group does not exist.
{noformat}

I ran tcpdump and it turns out that the /ids path is checked:

{noformat}
$.e.P.fP...&..<...//kafka/logs/consumers/console-consumer-4107/ids.
{noformat}

I think Kafka should not check for /ids; it should check for the group path itself instead.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
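A sketch of the check suggested above using the plain ZooKeeper client; this is not the actual ConsumerGroupCommand code, and the connection string and paths are copied from the example output:

{code:java}
import org.apache.zookeeper.ZooKeeper;

// Sketch only: test whether the group znode itself exists instead of its /ids child,
// which is absent for groups that only have committed offsets left.
public final class GroupExistsSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("foo:2181", 30_000, event -> { });
        try {
            String group = "/kafka/logs/consumers/console-consumer-4107";
            boolean existsByGroupPath = zk.exists(group, false) != null;        // true in the example
            boolean existsByIdsPath = zk.exists(group + "/ids", false) != null; // false in the example
            System.out.println("group path: " + existsByGroupPath + ", ids path: " + existsByIdsPath);
        } finally {
            zk.close();
        }
    }
}
{code}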
[jira] [Created] (KAFKA-4576) Log segments close to max size break on fetch
Ivan Babrou created KAFKA-4576:
--
Summary: Log segments close to max size break on fetch
Key: KAFKA-4576
URL: https://issues.apache.org/jira/browse/KAFKA-4576
Project: Kafka
Issue Type: Bug
Components: log
Affects Versions: 0.10.1.1
Reporter: Ivan Babrou

We are running Kafka 0.10.1.1~rc1 (it's the same as 0.10.1.1). Max segment size is set to 2147483647 globally, that's 1 byte less than max signed int32.

Every now and then we see failures like this:

{noformat}
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: ERROR [Replica Manager on Broker 1006]: Error processing fetch operation on partition [mytopic,11], offset 483579108587 (kafka.server.ReplicaManager)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: java.lang.IllegalStateException: Failed to read complete buffer for targetOffset 483686627237 startPosition 2145701130 in /disk/data0/kafka-logs/mytopic-11/483571890786.log
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.log.FileMessageSet.searchForOffsetWithSize(FileMessageSet.scala:145)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.log.LogSegment.translateOffset(LogSegment.scala:128)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.log.LogSegment.read(LogSegment.scala:180)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.log.Log.read(Log.scala:563)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:567)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:606)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:605)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at scala.collection.Iterator$class.foreach(Iterator.scala:893)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:605)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:469)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.KafkaApis.handle(KafkaApis.scala:79)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at java.lang.Thread.run(Thread.java:745)
{noformat}

{noformat}
...
-rw-r--r-- 1 kafka kafka          0 Dec 25 15:15 483557418204.timeindex
-rw-r--r-- 1 kafka kafka       9496 Dec 25 15:26 483564654488.index
-rw-r--r-- 1 kafka kafka 2145763964 Dec 25 15:26 483564654488.log
-rw-r--r-- 1 kafka kafka          0 Dec 25 15:26 483564654488.timeindex
-rw-r--r-- 1 kafka kafka       9576 Dec 25 15:37 483571890786.index
-rw-r--r-- 1 kafka kafka 2147483644 Dec 25 15:37 483571890786.log
-rw-r--r-- 1 kafka kafka          0 Dec 25 15:37 483571890786.timeindex
-rw-r--r-- 1 kafka kafka       9568 Dec 25 15:48 483579135712.index
-rw-r--r-- 1 kafka kafka 2146791360 Dec 25 15:48 483579135712.log
-rw-r--r-- 1 kafka kafka          0 Dec 25 15:48 483579135712.timeindex
-rw-r--r-- 1 kafka kafka       9408 Dec 25 15:59 483586374164.index
...
{noformat}

Here 483571890786.log is just 3 bytes below the max size.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
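One hedged way to read the listing above, assuming the failure is tied to segments ending within a few bytes of the signed 32-bit limit: keep some headroom between segment.bytes and Integer.MAX_VALUE. The numbers below mirror the report, the max.message.bytes value is hypothetical, and this is illustrative arithmetic rather than an official sizing rule:

{code:java}
// With segment.bytes set to Integer.MAX_VALUE (2147483647), the 483571890786.log
// segment grew to 2147483644 bytes, only 3 bytes below the int32 cap.
public final class SegmentHeadroom {
    public static void main(String[] args) {
        long observedSegment = 2_147_483_644L; // size of 483571890786.log from the report
        long maxMessageBytes = 1_000_000L;     // hypothetical max.message.bytes, for illustration

        System.out.println("headroom to int32 limit: " + (Integer.MAX_VALUE - observedSegment) + " bytes");
        // A more conservative setting keeps room for a full message below the int32 limit.
        long conservativeSegmentBytes = Integer.MAX_VALUE - maxMessageBytes;
        System.out.println("example conservative segment.bytes: " + conservativeSegmentBytes);
    }
}
{code}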