[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387828#comment-14387828 ]

Sriharsha Chintalapani commented on KAFKA-1646:
-----------------------------------------------

[~jkreps] Any guidance on the latest patch?

Improve consumer read performance for Windows
---------------------------------------------

                Key: KAFKA-1646
                URL: https://issues.apache.org/jira/browse/KAFKA-1646
            Project: Kafka
         Issue Type: Improvement
         Components: log
   Affects Versions: 0.8.1.1
        Environment: Windows
           Reporter: xueqiang wang
           Assignee: xueqiang wang
             Labels: newbie, patch
        Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch, KAFKA-1646_20150312_200352.patch

This patch is for the Windows platform only. On Windows, if more than one replica is writing to disk, the segment log files end up fragmented (not contiguous) on disk, and consumer read performance drops significantly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch does not affect file allocation on other filesystems, since it only adds statements guarded by checks like 'if(Os.isWindows)' or methods used only on Windows.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
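The core idea of the patch, preallocating a segment's full size when it is rolled so the filesystem can lay it out contiguously, can be sketched as follows. This is a minimal stand-in in Python, not Kafka's actual Scala code; the file name and segment size are illustrative.

```python
import os
import tempfile

def roll_segment(path, preallocate_bytes=0):
    """Create a new log segment, optionally preallocating disk space.

    Preallocation (the approach the KAFKA-1646 patch takes on Windows)
    reserves the full segment size up front so NTFS can lay the file out
    contiguously, instead of growing it in small fragmented extents on
    every append.
    """
    f = open(path, "w+b")
    if preallocate_bytes > 0:
        f.truncate(preallocate_bytes)  # reserve space; contents read as zeros
        f.seek(0)                      # appends still start at offset 0
    return f

seg_dir = tempfile.mkdtemp()
seg = roll_segment(os.path.join(seg_dir, "00000000000000000000.log"),
                   preallocate_bytes=1024 * 1024)
print(os.path.getsize(seg.name))  # 1048576
```

The real fix also has to truncate the unused trailing zeros back off (note the truncate-off-trailing-zeros attachment), since a preallocated segment is larger than its logical end offset when the broker restarts.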
[jira] [Commented] (KAFKA-1983) TestEndToEndLatency can be unreliable after hard kill
[ https://issues.apache.org/jira/browse/KAFKA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386040#comment-14386040 ]

Sriharsha Chintalapani commented on KAFKA-1983:
-----------------------------------------------

[~gchao] Here are the steps:
1) Start a single-node Kafka server.
2) Create a topic "test".
3) ./bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency localhost:9092 localhost:2181 test 10 1000 1
   (USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect topic num_messages consumer_fetch_max_wait producer_acks)
4) Hard-kill TestEndToEndLatency.
5) Re-running step 3 causes it to report artificially low latency.

TestEndToEndLatency can be unreliable after hard kill
-----------------------------------------------------

                Key: KAFKA-1983
                URL: https://issues.apache.org/jira/browse/KAFKA-1983
            Project: Kafka
         Issue Type: Improvement
           Reporter: Jun Rao
           Assignee: Grayson Chao
             Labels: newbie

If you hard-kill TestEndToEndLatency, the committed offset remains the last checkpointed one, but more messages have since been appended after that offset. When TestEndToEndLatency restarts, the consumer resumes from the last checkpointed offset and reports very low latency, since it never has to wait for a new message to be produced before reading the next one.
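The mechanism behind step 5 can be shown with a toy model: a consumer that resumes behind a backlog reads already-appended messages instantly, so the measured "end-to-end" time collapses. This is a hypothetical simulation, not the tool's actual code; the 10 ms sleep stands in for a real produce round-trip.

```python
import time

class ToyLog:
    """A toy single-partition log for simulating TestEndToEndLatency."""
    def __init__(self):
        self.messages = []

def measure(log, committed_offset, produce_first):
    """Average per-message latency: time from just before a read until the
    next message is available at the consumer's current offset."""
    latencies = []
    offset = committed_offset
    for _ in range(3):
        start = time.perf_counter()
        if produce_first:
            time.sleep(0.01)               # stand-in for a produce round-trip
            log.messages.append(b"m")
        # Consume the message at `offset`; if it already exists (backlog
        # left behind by a hard kill), this "read" returns immediately.
        assert offset < len(log.messages)
        offset += 1
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

# Normal run: each read must wait for a fresh produce.
normal = measure(ToyLog(), committed_offset=0, produce_first=True)

# After a hard kill: 10 messages were appended past the checkpointed offset,
# so every read is served from the existing backlog.
stale = ToyLog()
stale.messages = [b"m"] * 10
after_kill = measure(stale, committed_offset=0, produce_first=False)

print(normal > after_kill)  # backlog reads look artificially fast
```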
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384810#comment-14384810 ]

Sriharsha Chintalapani commented on KAFKA-1461:
-----------------------------------------------

Updated reviewboard https://reviews.apache.org/r/31366/diff/ against branch origin/trunk

Replica fetcher thread does not implement any back-off behavior
---------------------------------------------------------------

                Key: KAFKA-1461
                URL: https://issues.apache.org/jira/browse/KAFKA-1461
            Project: Kafka
         Issue Type: Improvement
         Components: replication
   Affects Versions: 0.8.1.1
           Reporter: Sam Meder
           Assignee: Sriharsha Chintalapani
             Labels: newbie++
            Fix For: 0.8.3
        Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch, KAFKA-1461_2015-03-27_15:31:11.patch

The current replica fetcher thread retries in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection-refused exception, leaving several replica fetcher threads spinning in a tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps somewhat.
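The fix amounts to sleeping a configurable interval between failed fetch attempts instead of retrying immediately. A minimal sketch of that retry-with-back-off pattern (in Python, not Kafka's actual Scala fetcher; the callable, parameter names, and attempt limit are illustrative):

```python
import time

def fetch_with_backoff(fetch, backoff_ms=1000, max_attempts=5):
    """Retry a failing fetch with a back-off between attempts, instead of
    spinning in a tight loop (the behavior KAFKA-1461 fixes).

    `fetch` is any callable that raises on error, e.g. a connection-refused
    failure when the partition leader is down.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise                      # give up after the last attempt
            time.sleep(backoff_ms / 1000.0)  # back off before retrying

calls = []
def flaky_fetch():
    """Fails twice with 'connection refused', then succeeds."""
    calls.append(time.monotonic())
    if len(calls) < 3:
        raise ConnectionError("connection refused")
    return b"records"

result = fetch_with_backoff(flaky_fetch, backoff_ms=10)
print(result, len(calls))  # b'records' 3
```

With a tight loop, the two failures above would burn CPU back-to-back; with the back-off, consecutive attempts are spaced at least `backoff_ms` apart.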
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384813#comment-14384813 ]

Sriharsha Chintalapani commented on KAFKA-1461:
-----------------------------------------------

[~guozhang] addressed your last review. Please take a look. Thanks.
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani updated KAFKA-1461:
------------------------------------------
    Attachment: KAFKA-1461_2015-03-27_15:31:11.patch
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384922#comment-14384922 ]

Sriharsha Chintalapani commented on KAFKA-1461:
-----------------------------------------------

Updated reviewboard https://reviews.apache.org/r/31366/diff/ against branch origin/trunk
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani updated KAFKA-1461:
------------------------------------------
    Attachment: KAFKA-1461_2015-03-27_16:56:45.patch
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384932#comment-14384932 ]

Sriharsha Chintalapani commented on KAFKA-1461:
-----------------------------------------------

Updated reviewboard https://reviews.apache.org/r/31366/diff/ against branch origin/trunk
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani updated KAFKA-1461:
------------------------------------------
    Attachment: KAFKA-1461_2015-03-27_17:02:32.patch
[jira] [Commented] (KAFKA-2052) zookeeper.connect does not work when specifying multiple zk nodes with chroot
[ https://issues.apache.org/jira/browse/KAFKA-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381516#comment-14381516 ]

Sriharsha Chintalapani commented on KAFKA-2052:
-----------------------------------------------

[~uohzxela] The syntax is to add the chroot once, at the end of the zookeeper.connect string. Here is an example:

zookeeper.connect=zookeeper-staging-a-1.bezurk.org:2181,zookeeper-staging-b-1.bezurk.org:2181/kafka

Here is the text from https://kafka.apache.org/08/configuration.html:

"Zookeeper also allows you to add a chroot path which will make all kafka data for this cluster appear under a particular path. This is a way to setup multiple Kafka clusters or other applications on the same zookeeper cluster. To do this give a connection string in the form hostname1:port1,hostname2:port2,hostname3:port3/chroot/path which would put all this cluster's data under the path /chroot/path. Note that you must create this path yourself prior to starting the broker and consumers must use the same connection string."

Closing this as invalid. Please re-open if necessary.
zookeeper.connect does not work when specifying multiple zk nodes with chroot
-----------------------------------------------------------------------------

                Key: KAFKA-2052
                URL: https://issues.apache.org/jira/browse/KAFKA-2052
            Project: Kafka
         Issue Type: Bug
         Components: config
           Reporter: Alex Jiao Ziheng
           Priority: Minor

I wish to specify multiple zk nodes with chroot strings in the server.properties config file, but Kafka is unable to separate the zk nodes properly in the presence of chroot strings:

{code:title=server.properties|borderStyle=solid}
zookeeper.connect=zookeeper-staging-a-1.bezurk.org:2181/kafka,zookeeper-staging-b-1.bezurk.org:2181/kafka
{code}

After running {code}service kafka start{code}, here are the error logs:

{code:title=kafka_init_stdout.log|borderStyle=solid}
[2015-03-26 13:58:21,330] INFO [Kafka Server 1], Connecting to zookeeper on zookeeper-staging-b-1.bezurk.org:2181/kafka,zookeeper-staging-a-1.bezurk.org:2181/kafka (kafka.server.KafkaServer)
[2015-03-26 13:58:21,438] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
...
{code}
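The rule from the docs quoted above, the chroot appears exactly once, after the last host:port pair, can be captured in a small parser. This is an illustrative sketch, not Kafka's or ZkClient's actual parsing code:

```python
def parse_zk_connect(connect):
    """Split a zookeeper.connect string into (host list, chroot).

    Per the Kafka docs, the chroot is written once, after the final
    host:port pair: host1:port1,host2:port2/chroot/path. A chroot written
    per host (as in this report) leaves a comma inside the chroot part,
    which means the string is malformed.
    """
    slash = connect.find("/")
    if slash == -1:
        return connect.split(","), "/"          # no chroot: ZK root
    hosts, chroot = connect[:slash], connect[slash:]
    if "," in chroot:
        raise ValueError(
            "chroot must appear once, after the last host:port pair")
    return hosts.split(","), chroot

hosts, chroot = parse_zk_connect(
    "zookeeper-staging-a-1.bezurk.org:2181,"
    "zookeeper-staging-b-1.bezurk.org:2181/kafka")
print(hosts, chroot)  # two hosts, chroot '/kafka'
```

Applied to the reporter's string, `a:2181/kafka,b:2181/kafka`, the first `/` cuts the string too early, which is exactly why the broker ends up treating the remainder as one opaque chroot.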
[jira] [Resolved] (KAFKA-2052) zookeeper.connect does not work when specifying multiple zk nodes with chroot
[ https://issues.apache.org/jira/browse/KAFKA-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani resolved KAFKA-2052.
-------------------------------------------
    Resolution: Invalid
[jira] [Commented] (KAFKA-2050) Avoid calling .size() on java.util.ConcurrentLinkedQueue
[ https://issues.apache.org/jira/browse/KAFKA-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380809#comment-14380809 ]

Sriharsha Chintalapani commented on KAFKA-2050:
-----------------------------------------------

[~timbrooks] The changes look good to me, but can you send the patch using kafka-patch-review.py? See https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review

Avoid calling .size() on java.util.ConcurrentLinkedQueue
--------------------------------------------------------

                Key: KAFKA-2050
                URL: https://issues.apache.org/jira/browse/KAFKA-2050
            Project: Kafka
         Issue Type: Bug
         Components: network
           Reporter: Tim Brooks
           Assignee: Jun Rao
        Attachments: dont_call_queue_size.patch

Generally, it is preferable to avoid calling .size() on a java.util.ConcurrentLinkedQueue: it is an O(n) operation, since it must iterate through all the nodes, and calling it on every pass through the loop makes the problem worse under high load. The same functionality can be achieved by just polling and checking for null. This is more imperative and less functional, but that seems acceptable for lower-level networking code.
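The poll-and-check-for-null pattern the description recommends looks like this. The sketch is in Python (where the `null` signal becomes an `IndexError` on `popleft`); in Java the loop body would be `while ((item = queue.poll()) != null)`:

```python
from collections import deque

def drain(queue):
    """Drain a queue by polling until it is empty, instead of asking for
    its size up front.

    java.util.ConcurrentLinkedQueue.size() walks every node (O(n)),
    whereas poll() just detaches the head; an empty-queue signal from
    poll() is the natural loop terminator.
    """
    drained = []
    while True:
        try:
            item = queue.popleft()   # analogue of poll(): take the head
        except IndexError:           # analogue of poll() returning null
            break
        drained.append(item)
    return drained

q = deque(range(5))
print(drain(q), len(q))  # [0, 1, 2, 3, 4] 0
```

Note that a size-then-loop approach is also racy on a concurrent queue: the size can change between the check and the reads, which the poll-until-empty form avoids entirely.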
[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work
[ https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378741#comment-14378741 ]

Sriharsha Chintalapani commented on KAFKA-2046:
-----------------------------------------------

[~clarkhaskins] Can you add details on how big the cluster was? Also, do you have the state-change.log files from the brokers where the log data was not deleted?

Delete topic still doesn't work
-------------------------------

                Key: KAFKA-2046
                URL: https://issues.apache.org/jira/browse/KAFKA-2046
            Project: Kafka
         Issue Type: Bug
           Reporter: Clark Haskins

I just attempted to delete a 128-partition topic with all inbound producers stopped. The result was as follows:
- The /admin/delete_topics znode was empty
- The topic under /brokers/topics was removed
- The Kafka topics command showed that the topic was removed

However, the data on disk on each broker was not deleted, and the topic has not yet been re-created by starting up the inbound mirror maker. Let me know what additional information is needed.
[jira] [Assigned] (KAFKA-2046) Delete topic still doesn't work
[ https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani reassigned KAFKA-2046:
---------------------------------------------
    Assignee: Sriharsha Chintalapani
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379162#comment-14379162 ]

Sriharsha Chintalapani commented on KAFKA-1461:
-----------------------------------------------

[~guozhang] Thanks for the review. Can you please take a look at my reply to your comment?
[jira] [Commented] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378335#comment-14378335 ]

Sriharsha Chintalapani commented on KAFKA-1684:
-----------------------------------------------

[~gwenshap] If you have a patch available for KAFKA-1928, can you please upload it? I can modify my SSL and Kerberos patches to match the new code.

Implement TLS/SSL authentication
--------------------------------

                Key: KAFKA-1684
                URL: https://issues.apache.org/jira/browse/KAFKA-1684
            Project: Kafka
         Issue Type: Sub-task
         Components: security
   Affects Versions: 0.9.0
           Reporter: Jay Kreps
           Assignee: Sriharsha Chintalapani
        Attachments: KAFKA-1684.patch, KAFKA-1684.patch

Add an SSL port to the configuration and advertise it as part of the metadata request. If the SSL port is configured, the socket server will need to add a second Acceptor thread to listen on it. Connections accepted on this port will need to go through the SSL handshake before being registered with a Processor for request processing. SSL requests and responses may need to be wrapped or unwrapped using the SSLEngine that was initialized by the acceptor. This wrapping and unwrapping is very similar to what will need to be done for SASL-based authentication schemes, so we should have a uniform interface that covers both, and we will need to store the instance in the session with the request. The socket server will have to use this object when reading and writing requests. We will need to take care with FetchRequests, as the current FileChannel.transferTo mechanism is incompatible with wrap/unwrap, so we can only use this optimization for unencrypted sockets that don't require userspace translation (wrapping).
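The transferTo incompatibility mentioned at the end can be illustrated by contrasting the two send paths. This is a conceptual Python sketch only: real TLS needs SSLEngine and key material, so a trivial XOR transform stands in for wrap/unwrap here, purely to show where the bytes must pass through userspace.

```python
import io

def send_plaintext(src, dst):
    """Plaintext path: bytes can go straight from the file to the socket,
    which is what FileChannel.transferTo does via zero-copy sendfile."""
    dst.write(src.read())

def send_wrapped(src, dst, wrap):
    """Encrypted path: every chunk must be pulled into userspace and run
    through wrap() (SSLEngine.wrap in the Java implementation) before it
    reaches the socket, so transferTo/sendfile cannot be used."""
    while True:
        chunk = src.read(4)
        if not chunk:
            break
        dst.write(wrap(chunk))

# Stand-in "cipher": byte-wise XOR. NOT real TLS; it only marks the
# userspace translation step that wrap/unwrap would perform.
xor = lambda data: bytes(b ^ 0x5A for b in data)

plain_out, tls_out = io.BytesIO(), io.BytesIO()
send_plaintext(io.BytesIO(b"kafka-records"), plain_out)
send_wrapped(io.BytesIO(b"kafka-records"), tls_out, xor)

print(plain_out.getvalue())     # the original bytes, untouched
print(xor(tls_out.getvalue()))  # "unwrapping" recovers the original bytes
```

This is why the description restricts the zero-copy optimization to unencrypted connections: once a per-chunk transform sits between file and socket, the kernel can no longer move the data on its own.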
[jira] [Commented] (KAFKA-1507) Using GetOffsetShell against non-existent topic creates the topic unintentionally
[ https://issues.apache.org/jira/browse/KAFKA-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376040#comment-14376040 ]

Sriharsha Chintalapani commented on KAFKA-1507:
-----------------------------------------------

[~jkreps] Since create/update topic requests are part of KIP-4: is your proposal that when the producer throws errors like UnknownTopicOrPartition, users should catch the error and use AdminClient to create the topic? I still see a benefit in allowing users to pass in their required topic config (partitions, replication, etc.) and, if no topic exists, send a CreateTopicRequest. If that is not desirable, then per your suggestion we need to implement AdminClient? In that case users can use AdminUtils, and we should modify AdminUtils to send requests to the broker instead of directly to zookeeper. This would also help KAFKA-1688, as all create/update/delete requests would go through the broker authorizer. Let me know if this is what you are thinking.

Using GetOffsetShell against non-existent topic creates the topic unintentionally
---------------------------------------------------------------------------------

                Key: KAFKA-1507
                URL: https://issues.apache.org/jira/browse/KAFKA-1507
            Project: Kafka
         Issue Type: Bug
   Affects Versions: 0.8.1.1
        Environment: centos
           Reporter: Luke Forehand
           Assignee: Sriharsha Chintalapani
           Priority: Minor
             Labels: newbie
        Attachments: KAFKA-1507.patch, KAFKA-1507.patch, KAFKA-1507_2014-07-22_10:27:45.patch, KAFKA-1507_2014-07-23_17:07:20.patch, KAFKA-1507_2014-08-12_18:09:06.patch, KAFKA-1507_2014-08-22_11:06:38.patch, KAFKA-1507_2014-08-22_11:08:51.patch

A typo when using the GetOffsetShell command can cause a topic to be created which cannot be deleted (because deletion is still in progress):

./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka10:9092,kafka11:9092,kafka12:9092,kafka13:9092 --topic typo --time 1
./kafka-topics.sh --zookeeper stormqa1/kafka-prod --describe --topic typo
Topic:typo  PartitionCount:8  ReplicationFactor:1  Configs:
  Topic: typo  Partition: 0  Leader: 10  Replicas: 10  Isr: 10
...
[jira] [Assigned] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani reassigned KAFKA-1684:
---------------------------------------------
    Assignee: Sriharsha Chintalapani  (was: Ivan Lyutov)
[jira] [Resolved] (KAFKA-2031) Executing scripts that invoke kafka-run-class.sh results in 'permission denied to create log dir' warning.
[ https://issues.apache.org/jira/browse/KAFKA-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani resolved KAFKA-2031. --- Resolution: Duplicate Executing scripts that invoke kafka-run-class.sh results in 'permission denied to create log dir' warning. -- Key: KAFKA-2031 URL: https://issues.apache.org/jira/browse/KAFKA-2031 Project: Kafka Issue Type: Bug Components: packaging Affects Versions: 0.8.1.1 Reporter: Manikandan Narayanaswamy Priority: Minor Kafka-run-class.sh script expects LOG_DIR variable to be set, and if this variable is not set, it defaults log dir to $base_dir/logs. And in the event the executor of the script does not have the right permissions, it would lead to errors such as: {noformat} mkdir: cannot create directory `/usr/lib/kafka/bin/../logs': Permission denied. {noformat} Proposing one way to make this more configurable is by introducing kafka-env.sh. Sourcing this file would export LOG_DIR and potentially other variables if need be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
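The resolution order the report describes (honor LOG_DIR when set, otherwise default to $base_dir/logs, which unprivileged users often cannot create) can be illustrated with a small hypothetical helper; this is not Kafka code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative model of kafka-run-class.sh's log-dir logic: an explicit
// LOG_DIR wins; otherwise fall back to <base_dir>/logs, the default that
// triggers "Permission denied" for users without write access there.
class LogDirResolver {
    static Path resolve(String logDirEnv, Path baseDir) {
        return (logDirEnv != null && !logDirEnv.isEmpty())
                ? Paths.get(logDirEnv)
                : baseDir.resolve("logs");
    }
}
```

Sourcing a kafka-env.sh that exports LOG_DIR (the KAFKA-1566 proposal) corresponds to always taking the first branch.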
[jira] [Commented] (KAFKA-2031) Executing scripts that invoke kafka-run-class.sh results in 'permission denied to create log dir' warning.
[ https://issues.apache.org/jira/browse/KAFKA-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368239#comment-14368239 ] Sriharsha Chintalapani commented on KAFKA-2031: --- [~mnarayan] we've a JIRA to add kafka-env.sh here https://issues.apache.org/jira/browse/KAFKA-1566 Executing scripts that invoke kafka-run-class.sh results in 'permission denied to create log dir' warning. -- Key: KAFKA-2031 URL: https://issues.apache.org/jira/browse/KAFKA-2031 Project: Kafka Issue Type: Bug Components: packaging Affects Versions: 0.8.1.1 Reporter: Manikandan Narayanaswamy Priority: Minor Kafka-run-class.sh script expects LOG_DIR variable to be set, and if this variable is not set, it defaults log dir to $base_dir/logs. And in the event the executor of the script does not have the right permissions, it would lead to errors such as: {noformat} mkdir: cannot create directory `/usr/lib/kafka/bin/../logs': Permission denied. {noformat} Proposing one way to make this more configurable is by introducing kafka-env.sh. Sourcing this file would export LOG_DIR and potentially other variables if need be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-1912) Create a simple request re-routing facility
[ https://issues.apache.org/jira/browse/KAFKA-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1912: - Assignee: Sriharsha Chintalapani Create a simple request re-routing facility --- Key: KAFKA-1912 URL: https://issues.apache.org/jira/browse/KAFKA-1912 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Fix For: 0.8.3 We are accumulating a lot of requests that have to be directed to the correct server. This makes sense for high volume produce or fetch requests. But it is silly to put the extra burden on the client for the many miscellaneous requests such as fetching or committing offsets and so on. This adds a ton of practical complexity to the clients with little or no payoff in performance. We should add a generic request-type agnostic re-routing facility on the server. This would allow any server to accept a request and forward it to the correct destination, proxying the response back to the user. Naturally it needs to do this without blocking the thread. The result is that a client implementation can choose to be optimally efficient and manage a local cache of cluster state and attempt to always direct its requests to the proper server OR it can choose simplicity and just send things all to a single host and let that host figure out where to forward it. I actually think we should implement this more or less across the board, but some requests such as produce and fetch require more logic to proxy since they have to be scattered out to multiple servers and gathered back to create the response. So these could be done in a second phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
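The proxying behavior described above — any broker accepts a request and, if it is not the owner of the target resource, forwards it and relays the response — can be sketched with a toy router. All names here are illustrative; Kafka's real request plumbing would be non-blocking and would need scatter-gather for produce/fetch, as the ticket notes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy sketch of a request-type-agnostic re-routing facility: a simple
// client can send everything to one broker, which handles the request
// locally when it owns the topic and proxies to the owner otherwise.
class RoutingSketch {
    final String brokerId;
    final Map<String, String> ownerByTopic;              // cluster metadata
    final Map<String, Function<String, String>> brokers; // id -> remote handler

    RoutingSketch(String brokerId, Map<String, String> ownerByTopic,
                  Map<String, Function<String, String>> brokers) {
        this.brokerId = brokerId;
        this.ownerByTopic = ownerByTopic;
        this.brokers = brokers;
    }

    // Handle locally when we own the topic, otherwise forward and relay.
    String handle(String topic, String request) {
        String owner = ownerByTopic.get(topic);
        if (brokerId.equals(owner)) {
            return brokerId + ":" + request;          // local processing
        }
        return brokers.get(owner).apply(request);     // proxy to owner
    }
}
```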
[jira] [Assigned] (KAFKA-2007) update offsetrequest for more useful information we have on broker about partition
[ https://issues.apache.org/jira/browse/KAFKA-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-2007: - Assignee: Sriharsha Chintalapani update offsetrequest for more useful information we have on broker about partition -- Key: KAFKA-2007 URL: https://issues.apache.org/jira/browse/KAFKA-2007 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Sriharsha Chintalapani Fix For: 0.8.3 this will need a KIP via [~jkreps] in KIP-6 discussion about KAFKA-1694 The other information that would be really useful to get would be information about partitions--how much data is in the partition, what are the segment offsets, what is the log-end offset (i.e. last offset), what is the compaction point, etc. I think that done right this would be the successor to the very awkward OffsetRequest we have today. This is not really blocking that ticket and could happen before/after and has a lot of other useful purposes and is important to get done so tracking it here in this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366377#comment-14366377 ] Sriharsha Chintalapani commented on KAFKA-1566: --- [~nehanarkhede] patch applies against the trunk cleanly now. Can you please take a look. Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch, KAFKA-1566_2015-03-17_17:01:38.patch, KAFKA-1566_2015-03-17_17:19:23.patch It would be useful (especially for automated deployments) to have an environment configuration file that could be sourced from the launcher files (e.g. kafka-run-server.sh). This is how this could look like kafka-env.sh {code} export KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseCompressedOops -XX:+DisableExplicitGC -Djava.awt.headless=true \ -XX:+UseG1GC -XX:PermSize=48m -XX:MaxPermSize=48m -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35' % export KAFKA_HEAP_OPTS='-Xmx1G -Xms1G' % export KAFKA_LOG4J_OPTS=-Dkafka.logs.dir=/var/log/kafka {code} kafka-server-start.sh {code} ... source $base_dir/config/kafka-env.sh ... {code} This approach is consistent with Hadoop and HBase. However the idea here is to be able to set these values in a single place without having to edit startup scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366347#comment-14366347 ] Sriharsha Chintalapani commented on KAFKA-1566: --- Updated reviewboard https://reviews.apache.org/r/29724/diff/ against branch origin/trunk Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch, KAFKA-1566_2015-03-17_17:01:38.patch It would be useful (especially for automated deployments) to have an environment configuration file that could be sourced from the launcher files (e.g. kafka-run-server.sh). This is how this could look like kafka-env.sh {code} export KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseCompressedOops -XX:+DisableExplicitGC -Djava.awt.headless=true \ -XX:+UseG1GC -XX:PermSize=48m -XX:MaxPermSize=48m -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35' % export KAFKA_HEAP_OPTS='-Xmx1G -Xms1G' % export KAFKA_LOG4J_OPTS=-Dkafka.logs.dir=/var/log/kafka {code} kafka-server-start.sh {code} ... source $base_dir/config/kafka-env.sh ... {code} This approach is consistent with Hadoop and HBase. However the idea here is to be able to set these values in a single place without having to edit startup scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1566: -- Attachment: KAFKA-1566_2015-03-17_17:01:38.patch Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch, KAFKA-1566_2015-03-17_17:01:38.patch It would be useful (especially for automated deployments) to have an environment configuration file that could be sourced from the launcher files (e.g. kafka-run-server.sh). This is how this could look like kafka-env.sh {code} export KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseCompressedOops -XX:+DisableExplicitGC -Djava.awt.headless=true \ -XX:+UseG1GC -XX:PermSize=48m -XX:MaxPermSize=48m -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35' % export KAFKA_HEAP_OPTS='-Xmx1G -Xms1G' % export KAFKA_LOG4J_OPTS=-Dkafka.logs.dir=/var/log/kafka {code} kafka-server-start.sh {code} ... source $base_dir/config/kafka-env.sh ... {code} This approach is consistent with Hadoop and HBase. However the idea here is to be able to set these values in a single place without having to edit startup scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1566: -- Attachment: KAFKA-1566_2015-03-17_17:19:23.patch Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch, KAFKA-1566_2015-03-17_17:01:38.patch, KAFKA-1566_2015-03-17_17:19:23.patch It would be useful (especially for automated deployments) to have an environment configuration file that could be sourced from the launcher files (e.g. kafka-run-server.sh). This is how this could look like kafka-env.sh {code} export KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseCompressedOops -XX:+DisableExplicitGC -Djava.awt.headless=true \ -XX:+UseG1GC -XX:PermSize=48m -XX:MaxPermSize=48m -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35' % export KAFKA_HEAP_OPTS='-Xmx1G -Xms1G' % export KAFKA_LOG4J_OPTS=-Dkafka.logs.dir=/var/log/kafka {code} kafka-server-start.sh {code} ... source $base_dir/config/kafka-env.sh ... {code} This approach is consistent with Hadoop and HBase. However the idea here is to be able to set these values in a single place without having to edit startup scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366373#comment-14366373 ] Sriharsha Chintalapani commented on KAFKA-1566: --- Updated reviewboard https://reviews.apache.org/r/29724/diff/ against branch origin/trunk Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch, KAFKA-1566_2015-03-17_17:01:38.patch, KAFKA-1566_2015-03-17_17:19:23.patch It would be useful (especially for automated deployments) to have an environment configuration file that could be sourced from the launcher files (e.g. kafka-run-server.sh). This is how this could look like kafka-env.sh {code} export KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseCompressedOops -XX:+DisableExplicitGC -Djava.awt.headless=true \ -XX:+UseG1GC -XX:PermSize=48m -XX:MaxPermSize=48m -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35' % export KAFKA_HEAP_OPTS='-Xmx1G -Xms1G' % export KAFKA_LOG4J_OPTS=-Dkafka.logs.dir=/var/log/kafka {code} kafka-server-start.sh {code} ... source $base_dir/config/kafka-env.sh ... {code} This approach is consistent with Hadoop and HBase. However the idea here is to be able to set these values in a single place without having to edit startup scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461_2015-03-17_16:03:33.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366262#comment-14366262 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Updated reviewboard https://reviews.apache.org/r/31366/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, KAFKA-1461_2015-03-12_13:54:51.patch, KAFKA-1461_2015-03-17_16:03:33.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1912) Create a simple request re-routing facility
[ https://issues.apache.org/jira/browse/KAFKA-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1912: -- Assignee: (was: Sriharsha Chintalapani) Create a simple request re-routing facility --- Key: KAFKA-1912 URL: https://issues.apache.org/jira/browse/KAFKA-1912 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Fix For: 0.8.3 We are accumulating a lot of requests that have to be directed to the correct server. This makes sense for high volume produce or fetch requests. But it is silly to put the extra burden on the client for the many miscellaneous requests such as fetching or committing offsets and so on. This adds a ton of practical complexity to the clients with little or no payoff in performance. We should add a generic request-type agnostic re-routing facility on the server. This would allow any server to accept a request and forward it to the correct destination, proxying the response back to the user. Naturally it needs to do this without blocking the thread. The result is that a client implementation can choose to be optimally efficient and manage a local cache of cluster state and attempt to always direct its requests to the proper server OR it can choose simplicity and just send things all to a single host and let that host figure out where to forward it. I actually think we should implement this more or less across the board, but some requests such as produce and fetch require more logic to proxy since they have to be scattered out to multiple servers and gathered back to create the response. So these could be done in a second phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364344#comment-14364344 ] Sriharsha Chintalapani commented on KAFKA-1646: --- [~waldenchen] It looks like the patch is against the 0.8.1.1 branch; can you send us a patch against trunk? Improve consumer read performance for Windows - Key: KAFKA-1646 URL: https://issues.apache.org/jira/browse/KAFKA-1646 Project: Kafka Issue Type: Improvement Components: log Affects Versions: 0.8.1.1 Environment: Windows Reporter: xueqiang wang Assignee: xueqiang wang Labels: newbie, patch Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch, KAFKA-1646_20150312_200352.patch This patch is for the Windows platform only. On Windows, if more than one replica is writing to disk, the segment log files will not be consistent on disk and consumer read performance drops greatly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch doesn't affect file allocation on other filesystems, since it only adds statements like 'if(Os.iswindow)' or adds methods used on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-1685) Implement TLS/SSL tests
[ https://issues.apache.org/jira/browse/KAFKA-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1685: - Assignee: Sriharsha Chintalapani Implement TLS/SSL tests --- Key: KAFKA-1685 URL: https://issues.apache.org/jira/browse/KAFKA-1685 Project: Kafka Issue Type: Sub-task Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Fix For: 0.9.0 We need to write a suite of unit tests for TLS authentication. This should be doable with a junit integration test. We can use the simple authorization plugin with only a single user whitelisted. The test can start the server and then connects with and without TLS and validates that access is only possible when authenticated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1682) Security for Kafka
[ https://issues.apache.org/jira/browse/KAFKA-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362644#comment-14362644 ] Sriharsha Chintalapani commented on KAFKA-1682: --- Also 5. KAFKA-1688(Authorization), by [~parthrbhatt] , depending on KAFKA-1684 KAFKA-1686 Security for Kafka -- Key: KAFKA-1682 URL: https://issues.apache.org/jira/browse/KAFKA-1682 Project: Kafka Issue Type: New Feature Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Parent ticket for security. Wiki and discussion is here: https://cwiki.apache.org/confluence/display/KAFKA/Security -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1682) Security for Kafka
[ https://issues.apache.org/jira/browse/KAFKA-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362644#comment-14362644 ] Sriharsha Chintalapani edited comment on KAFKA-1682 at 3/16/15 1:14 AM: Also 5. KAFKA-1688(Authorization), by [~parth.brahmbhatt] , depending on KAFKA-1684 KAFKA-1686 was (Author: sriharsha): Also 5. KAFKA-1688(Authorization), by [~parthrbhatt] , depending on KAFKA-1684 KAFKA-1686 Security for Kafka -- Key: KAFKA-1682 URL: https://issues.apache.org/jira/browse/KAFKA-1682 Project: Kafka Issue Type: New Feature Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Parent ticket for security. Wiki and discussion is here: https://cwiki.apache.org/confluence/display/KAFKA/Security -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361539#comment-14361539 ] Sriharsha Chintalapani commented on KAFKA-1684: --- [~junrao] Definitely we can go that route. I would like to get more details on what needs to be done for KAFKA-1928. Currently the SSLChannel can be used by both the client and the socket server. Also we need to get KAFKA-1809 in, which sets up the framework for multiple ports. [~gwenshap] KAFKA-1928 is currently assigned to you; if you are not actively working on it, can I take it? Implement TLS/SSL authentication Key: KAFKA-1684 URL: https://issues.apache.org/jira/browse/KAFKA-1684 Project: Kafka Issue Type: Sub-task Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Assignee: Ivan Lyutov Attachments: KAFKA-1684.patch, KAFKA-1684.patch Add an SSL port to the configuration and advertise this as part of the metadata request. If the SSL port is configured the socket server will need to add a second Acceptor thread to listen on it. Connections accepted on this port will need to go through the SSL handshake prior to being registered with a Processor for request processing. SSL requests and responses may need to be wrapped or unwrapped using the SSLEngine that was initialized by the acceptor. This wrapping and unwrapping is very similar to what will need to be done for SASL-based authentication schemes. We should have a uniform interface that covers both of these and we will need to store the instance in the session with the request. The socket server will have to use this object when reading and writing requests. We will need to take care with the FetchRequests as the current FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we can only use this optimization for unencrypted sockets that don't require userspace translation (wrapping). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359384#comment-14359384 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Updated reviewboard https://reviews.apache.org/r/31927/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, KAFKA-1461_2015-03-12_13:54:51.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461_2015-03-12_13:54:51.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, KAFKA-1461_2015-03-12_13:54:51.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359469#comment-14359469 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Thanks [~junrao], I'll incorporate your and [~guozhang]'s feedback in RB 31366. Yes, we can add that condition at partitionMapCond.await(200L) and use fetchBackoffMs. I'll send an updated PR for it. Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch, KAFKA-1461_2015-03-12_13:54:51.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
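The back-off shape discussed in the comment — after a failed fetch, wait on a condition for fetchBackoffMs instead of spinning, so a partition reassignment can still wake the thread early — can be sketched as follows. This is an illustrative model, not the actual ReplicaFetcherThread code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Toy model of a fetcher loop with back-off: failed fetches trigger a
// timed, interruptible wait on a condition (mirroring the
// partitionMapCond.await(fetchBackoffMs) idea) rather than a tight retry.
class BackoffFetcher {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition partitionMapCond = lock.newCondition();
    private final long fetchBackoffMs;
    int attempts = 0;

    BackoffFetcher(long fetchBackoffMs) { this.fetchBackoffMs = fetchBackoffMs; }

    // Retry `fetch` until it succeeds, backing off between failures.
    boolean fetchWithBackoff(BooleanSupplier fetch, int maxAttempts) {
        while (attempts < maxAttempts) {
            attempts++;
            if (fetch.getAsBoolean()) return true;
            lock.lock();
            try {
                // Wakeable sleep: e.g. addPartitions() could signal here
                // to cut the back-off short instead of a fixed Thread.sleep.
                partitionMapCond.await(fetchBackoffMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            } finally {
                lock.unlock();
            }
        }
        return false;
    }
}
```

Compared with a bare sleep, the condition wait keeps the thread responsive to shutdown and to partition-map changes while still avoiding the connection-refused spin described in the ticket.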
[jira] [Commented] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357665#comment-14357665 ] Sriharsha Chintalapani commented on KAFKA-1684: --- [~junrao] [~jkreps] [~gwenshap] [~jjkoshy] Hi everyone, I am trying to set up a pattern for both SSL and Kerberos auth. The above patch includes SSL auth for the socket server. Please ignore the changes around the KafkaServer and SocketServer way of creating acceptors, as that work is being done in KAFKA-1809. Based on the earlier comments I didn't extend SocketChannel; instead, SSLChannel is more of a wrapper around a SocketChannel. The SSL handshake is done using non-blocking I/O and without any Thread.sleep calls. Since SocketServer.Processor depends on SocketChannel selection, I had to use a ConcurrentHashMap that stores the SocketChannel as the key and the ChannelWrapper as the value. The other option I have is to extend SocketChannel or use the SelectionKey attachment, but we are already using that attachment for requests and responses. The patch also contains a unit test in SocketServerTest which generates self-signed certs, runs the socket server using them, and sends a request. I also have a GSSChannel for KAFKA-1686 which does the Kerberos auth in a similar pattern to SSLChannel. Once this patch gets reviewed I'll update the SSL and Kerberos patches based on the feedback. I appreciate any feedback on the above patch. Implement TLS/SSL authentication Key: KAFKA-1684 URL: https://issues.apache.org/jira/browse/KAFKA-1684 Project: Kafka Issue Type: Sub-task Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Assignee: Ivan Lyutov Attachments: KAFKA-1684.patch, KAFKA-1684.patch Add an SSL port to the configuration and advertise this as part of the metadata request. If the SSL port is configured the socket server will need to add a second Acceptor thread to listen on it. 
Connections accepted on this port will need to go through the SSL handshake prior to being registered with a Processor for request processing. SSL requests and responses may need to be wrapped or unwrapped using the SSLEngine that was initialized by the acceptor. This wrapping and unwrapping is very similar to what will need to be done for SASL-based authentication schemes. We should have a uniform interface that covers both of these and we will need to store the instance in the session with the request. The socket server will have to use this object when reading and writing requests. We will need to take care with the FetchRequests as the current FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we can only use this optimization for unencrypted sockets that don't require userspace translation (wrapping). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357660#comment-14357660 ] Sriharsha Chintalapani commented on KAFKA-1684: --- Created reviewboard https://reviews.apache.org/r/31958/diff/ against branch origin/trunk Implement TLS/SSL authentication Key: KAFKA-1684 URL: https://issues.apache.org/jira/browse/KAFKA-1684 Project: Kafka Issue Type: Sub-task Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Assignee: Ivan Lyutov Attachments: KAFKA-1684.patch, KAFKA-1684.patch Add an SSL port to the configuration and advertise this as part of the metadata request. If the SSL port is configured the socket server will need to add a second Acceptor thread to listen on it. Connections accepted on this port will need to go through the SSL handshake prior to being registered with a Processor for request processing. SSL requests and responses may need to be wrapped or unwrapped using the SSLEngine that was initialized by the acceptor. This wrapping and unwrapping is very similar to what will need to be done for SASL-based authentication schemes. We should have a uniform interface that covers both of these and we will need to store the instance in the session with the request. The socket server will have to use this object when reading and writing requests. We will need to take care with the FetchRequests as the current FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we can only use this optimization for unencrypted sockets that don't require userspace translation (wrapping). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1684: -- Attachment: KAFKA-1684.patch Implement TLS/SSL authentication Key: KAFKA-1684 URL: https://issues.apache.org/jira/browse/KAFKA-1684 Project: Kafka Issue Type: Sub-task Components: security Affects Versions: 0.9.0 Reporter: Jay Kreps Assignee: Ivan Lyutov Attachments: KAFKA-1684.patch, KAFKA-1684.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357263#comment-14357263 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Updated reviewboard https://reviews.apache.org/r/31927/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop. To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461_2015-03-11_10:41:26.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461_2015-03-11_18:17:51.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357927#comment-14357927 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Updated reviewboard https://reviews.apache.org/r/31927/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358040#comment-14358040 ] Sriharsha Chintalapani commented on KAFKA-1461: --- [~charmalloc] Since there aren't any interface changes, I am not sure a KIP is necessary. Of course, we did add a new config, replica.fetch.backoff.ms. If this warrants a KIP, then I can write one up. Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
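For illustration, the new setting discussed above would be set in a broker's server.properties like this (the value shown is an arbitrary example, not a documented default):

```properties
# back off instead of retrying in a tight loop when a replica fetch fails
replica.fetch.backoff.ms=1000
```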
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356341#comment-14356341 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Created reviewboard https://reviews.apache.org/r/31927/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356344#comment-14356344 ] Sriharsha Chintalapani commented on KAFKA-1461: --- [~junrao] [~guozhang] Please take a look at the above patch. Let me know if that's what you have in mind. I also added replica.fetch.backoff.ms and controller.socket.timeout.ms to TestUtils.createBrokerConfig; this reduced the total test run time from 15 mins to under 10 mins on my machine. Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch, KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354139#comment-14354139 ] Sriharsha Chintalapani commented on KAFKA-1461: --- [~junrao] I'll work on it tomorrow and finish up the patch. A few questions on your recommendations. "In AbstractFetcherThread, simply back off based on the configured time, if it hits an exception when doing a fetch." So instead of handling partition errors, if there is an exception while fetching, we will just back off the AbstractFetcherThread and wait until the configured time has elapsed or a condition is met? Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
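The back-off being discussed can be sketched as follows (a simplified stand-in for AbstractFetcherThread, not Kafka's actual code; the condition-wait is what would let a shutdown signal cut the sleep short instead of a plain Thread.sleep):

```java
// Sketch: on a fetch exception, back off for the configured time instead of
// retrying in a tight loop. All names and values here are illustrative.
public class BackoffSketch {
    static int attempts = 0;

    // Stand-in for the fetch call; fails twice, then succeeds.
    static void fetch() {
        attempts++;
        if (attempts < 3) throw new RuntimeException("connection refused");
    }

    public static void main(String[] args) throws InterruptedException {
        long backoffMs = 10; // stands in for replica.fetch.backoff.ms
        Object condition = new Object(); // shutdown could notify() this
        while (true) {
            try {
                fetch();
                break; // success
            } catch (RuntimeException e) {
                synchronized (condition) {
                    condition.wait(backoffMs); // back off instead of spinning
                }
            }
        }
        System.out.println("fetched after " + attempts + " attempts");
    }
}
```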
[jira] [Commented] (KAFKA-2010) unit tests sometimes are slow during shutdown
[ https://issues.apache.org/jira/browse/KAFKA-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352329#comment-14352329 ] Sriharsha Chintalapani commented on KAFKA-2010: --- [~junrao] It might be due to this https://issues.apache.org/jira/browse/KAFKA-1887?focusedCommentId=14330093page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14330093 . KAFKA-1971 removed healthCheck.shutdown(), which doesn't remove the brokerId, causing controllerChannelManager.shutdown() to throw an exception as noted in my comment in KAFKA-1887. unit tests sometimes are slow during shutdown - Key: KAFKA-2010 URL: https://issues.apache.org/jira/browse/KAFKA-2010 Project: Kafka Issue Type: Bug Reporter: Jun Rao Assignee: Jun Rao Fix For: 0.8.3 Our unit tests in trunk seem to be slower than before. The slowness seems to be due to a handful of tests. For example, if you run the following test, sometimes it can take more than 40 secs, while normally it takes less than 10 secs. ./gradlew -i cleanTest core:test --tests kafka.admin.AddPartitionsTest.testIncrementPartitions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted
Sriharsha Chintalapani created KAFKA-2000: - Summary: Delete consumer offsets from kafka once the topic is deleted Key: KAFKA-2000 URL: https://issues.apache.org/jira/browse/KAFKA-2000 Project: Kafka Issue Type: Bug Reporter: Sriharsha Chintalapani Assignee: Sriharsha Chintalapani -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340162#comment-14340162 ] Sriharsha Chintalapani commented on KAFKA-1866: --- [~nehanarkhede] I updated the patch; can you please take a look? Thanks. LogStartOffset gauge throws exceptions after log.delete() - Key: KAFKA-1866 URL: https://issues.apache.org/jira/browse/KAFKA-1866 Project: Kafka Issue Type: Bug Reporter: Gian Merlino Assignee: Sriharsha Chintalapani Attachments: KAFKA-1866.patch, KAFKA-1866_2015-02-10_22:50:09.patch, KAFKA-1866_2015-02-11_09:25:33.patch The LogStartOffset gauge does logSegments.head.baseOffset, which throws NoSuchElementException on an empty list, which can occur after a delete() of the log. This makes life harder for custom MetricsReporters, since they have to deal with .value() possibly throwing an exception. Locally we're dealing with this by having Log.delete() also call removeMetric on all the gauges. That also has the benefit of not having a bunch of metrics floating around for logs that the broker is not actually handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
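The failure mode described above can be sketched like this (illustrative Java, not Kafka's Scala code; the reporter's actual fix removes the gauges in Log.delete(), while this sketch only demonstrates the exception and a defensive read):

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch: a gauge computed as logSegments.head.baseOffset throws
// NoSuchElementException once the segment list is empty after delete().
public class GaugeSketch {
    // Empty after a log delete(); names here are illustrative.
    static List<Long> segmentBaseOffsets = Collections.emptyList();

    static long logStartOffsetUnsafe() {
        return segmentBaseOffsets.iterator().next(); // throws on empty list
    }

    static long logStartOffsetSafe() {
        // Defensive read: return a sentinel instead of throwing.
        return segmentBaseOffsets.isEmpty() ? -1L : segmentBaseOffsets.get(0);
    }

    public static void main(String[] args) {
        try {
            logStartOffsetUnsafe();
            System.out.println("no exception");
        } catch (NoSuchElementException e) {
            System.out.println("unsafe gauge threw");
        }
        System.out.println("safe gauge: " + logStartOffsetSafe());
    }
}
```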
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340163#comment-14340163 ] Sriharsha Chintalapani commented on KAFKA-1852: --- [~jjkoshy] Updated the patch as per your suggestions; can you please take a look? Thanks. OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, KAFKA-1852_2015-02-18_13:13:17.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1852: -- Attachment: KAFKA-1852_2015-02-27_13:50:34.patch OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, KAFKA-1852_2015-02-18_13:13:17.patch, KAFKA-1852_2015-02-27_13:50:34.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340874#comment-14340874 ] Sriharsha Chintalapani commented on KAFKA-1852: --- Updated reviewboard https://reviews.apache.org/r/29912/diff/ against branch origin/trunk OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, KAFKA-1852_2015-02-18_13:13:17.patch, KAFKA-1852_2015-02-27_13:50:34.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1596) Exception in KafkaScheduler while shutting down
[ https://issues.apache.org/jira/browse/KAFKA-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1596: -- Resolution: Duplicate Status: Resolved (was: Patch Available) This issue was already fixed as part of KAFKA-1760. Exception in KafkaScheduler while shutting down --- Key: KAFKA-1596 URL: https://issues.apache.org/jira/browse/KAFKA-1596 Project: Kafka Issue Type: Bug Reporter: Joel Koshy Labels: newbie Attachments: kafka-1596.patch Saw this while trying to reproduce KAFKA-1577. It is very minor and won't happen in practice but annoying nonetheless. {code} [2014-08-14 18:03:56,686] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient) [2014-08-14 18:03:56,776] INFO Loading logs. (kafka.log.LogManager) [2014-08-14 18:03:56,783] INFO Logs loading complete. (kafka.log.LogManager) [2014-08-14 18:03:57,120] INFO Starting log cleanup with a period of 30 ms. (kafka.log.LogManager) [2014-08-14 18:03:57,124] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager) [2014-08-14 18:03:57,158] INFO Awaiting socket connections on 0.0.0.0:9092. 
(kafka.network.Acceptor) [2014-08-14 18:03:57,160] INFO [Socket Server on Broker 0], Started (kafka.network.SocketServer) ^C[2014-08-14 18:03:57,203] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer) [2014-08-14 18:03:57,211] INFO [Socket Server on Broker 0], Shutting down (kafka.network.SocketServer) [2014-08-14 18:03:57,222] INFO [Socket Server on Broker 0], Shutdown completed (kafka.network.SocketServer) [2014-08-14 18:03:57,226] INFO [Replica Manager on Broker 0]: Shut down (kafka.server.ReplicaManager) [2014-08-14 18:03:57,228] INFO [ReplicaFetcherManager on broker 0] shutting down (kafka.server.ReplicaFetcherManager) [2014-08-14 18:03:57,233] INFO [ReplicaFetcherManager on broker 0] shutdown completed (kafka.server.ReplicaFetcherManager) [2014-08-14 18:03:57,274] INFO [Replica Manager on Broker 0]: Shut down completely (kafka.server.ReplicaManager) [2014-08-14 18:03:57,276] INFO Shutting down. (kafka.log.LogManager) [2014-08-14 18:03:57,296] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$) [2014-08-14 18:03:57,297] INFO Shutdown complete. (kafka.log.LogManager) [2014-08-14 18:03:57,301] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable) java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.schedule(KafkaScheduler.scala:95) at kafka.server.ReplicaManager.startup(ReplicaManager.scala:138) at kafka.server.KafkaServer.startup(KafkaServer.scala:112) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:28) at kafka.Kafka$.main(Kafka.scala:46) at kafka.Kafka.main(Kafka.scala) [2014-08-14 18:03:57,324] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer) [2014-08-14 18:03:57,326] INFO Terminate ZkClient event thread. 
(org.I0Itec.zkclient.ZkEventThread) [2014-08-14 18:03:57,329] INFO Session: 0x147d5b0a51a closed (org.apache.zookeeper.ZooKeeper) [2014-08-14 18:03:57,329] INFO EventThread shut down (org.apache.zookeeper.ClientCnxn) [2014-08-14 18:03:57,329] INFO [Kafka Server 0], shut down completed (kafka.server.KafkaServer) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
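The guard that produces the IllegalStateException in the log above can be sketched as follows (a simplified stand-in for kafka.utils.KafkaScheduler, not its real implementation): schedule() requires startup() to have been called first, which is why scheduling during an interrupted startup/shutdown sequence fails.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a started-state guard; task names here are illustrative.
public class SchedulerSketch {
    private final AtomicBoolean started = new AtomicBoolean(false);

    public void startup() { started.set(true); }

    public void schedule(String taskName) {
        // Mirrors ensureStarted(): scheduling before startup is an error.
        if (!started.get())
            throw new IllegalStateException("Kafka scheduler has not been started");
        System.out.println("scheduled " + taskName);
    }

    public static void main(String[] args) {
        SchedulerSketch s = new SchedulerSketch();
        try {
            s.schedule("highwatermark-checkpoint"); // before startup(): throws
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        s.startup();
        s.schedule("highwatermark-checkpoint"); // after startup(): succeeds
    }
}
```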
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Attachment: KAFKA-1461.patch Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335164#comment-14335164 ] Sriharsha Chintalapani commented on KAFKA-1461: --- Created reviewboard https://reviews.apache.org/r/31366/diff/ against branch origin/trunk Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1461: -- Status: Patch Available (was: Open) Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335166#comment-14335166 ] Sriharsha Chintalapani commented on KAFKA-1461: --- [~guozhang] Thanks for the pointers. Can you please take a look at the patch when you get a chance? Replica fetcher thread does not implement any back-off behavior --- Key: KAFKA-1461 URL: https://issues.apache.org/jira/browse/KAFKA-1461 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 0.8.1.1 Reporter: Sam Meder Assignee: Sriharsha Chintalapani Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1461.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1757) Can not delete Topic index on Windows
[ https://issues.apache.org/jira/browse/KAFKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334389#comment-14334389 ] Sriharsha Chintalapani commented on KAFKA-1757: --- patch merged by [~jkreps] closing this as fixed. Can not delete Topic index on Windows - Key: KAFKA-1757 URL: https://issues.apache.org/jira/browse/KAFKA-1757 Project: Kafka Issue Type: Bug Components: log Affects Versions: 0.8.2.0 Reporter: Lukáš Vyhlídka Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1757.patch, lucky-v.patch When running the Kafka 0.8.2-Beta (Scala 2.10) on Windows, an attempt to delete the Topic throwed an error: ERROR [KafkaApi-1] error when handling request Name: StopReplicaRequest; Version: 0; CorrelationId: 38; ClientId: ; DeletePartitions: true; ControllerId: 0; ControllerEpoch: 3; Partitions: [test,0] (kafka.server.KafkaApis) kafka.common.KafkaStorageException: Delete of index .index failed. at kafka.log.LogSegment.delete(LogSegment.scala:283) at kafka.log.Log$$anonfun$delete$1.apply(Log.scala:608) at kafka.log.Log$$anonfun$delete$1.apply(Log.scala:608) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.log.Log.delete(Log.scala:608) at kafka.log.LogManager.deleteLog(LogManager.scala:375) at kafka.cluster.Partition$$anonfun$delete$1.apply$mcV$sp(Partition.scala:144) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:139) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:139) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.utils.Utils$.inWriteLock(Utils.scala:543) at kafka.cluster.Partition.delete(Partition.scala:139) at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:158) at 
kafka.server.ReplicaManager$$anonfun$stopReplicas$3.apply(ReplicaManager.scala:191) at kafka.server.ReplicaManager$$anonfun$stopReplicas$3.apply(ReplicaManager.scala:190) at scala.collection.immutable.Set$Set1.foreach(Set.scala:74) at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:190) at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:96) at kafka.server.KafkaApis.handle(KafkaApis.scala:59) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59) at java.lang.Thread.run(Thread.java:744) When I have investigated the issue I figured out that the index file (in my environment it was C:\tmp\kafka-logs\----0014-0\.index) was locked by the kafka process and the OS did not allow to delete that file. I tried to fix the problem in source codes and when I added close() method call into LogSegment.delete(), the Topic deletion started to work. I will add here (not sure how to upload the file during issue creation) a diff with the changes I have made so You can take a look on that whether it is reasonable or not. It would be perfect if it could make it into the product... In the end I would like to say that on Linux the deletion works just fine... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
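The fix described above, closing the index file's handle before deleting it, can be sketched like this (illustrative Java; the actual change is a close() call in LogSegment.delete(), which is Scala). Windows refuses to delete a file with an open handle, while Linux allows it, which is why the bug only shows on Windows:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: close the channel backing an index file before deleting the file.
public class DeleteSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for a segment's .index file; the name is illustrative.
        File index = File.createTempFile("00000000000000000000", ".index");
        FileChannel channel = new RandomAccessFile(index, "rw").getChannel();
        channel.write(ByteBuffer.wrap(new byte[]{1, 2, 3}));

        channel.close();                  // the added close() call
        boolean deleted = index.delete(); // now succeeds on Windows as well
        System.out.println("deleted=" + deleted);
    }
}
```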
[jira] [Updated] (KAFKA-1757) Can not delete Topic index on Windows
[ https://issues.apache.org/jira/browse/KAFKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1757: -- Resolution: Fixed Status: Resolved (was: Patch Available) Can not delete Topic index on Windows - Key: KAFKA-1757 URL: https://issues.apache.org/jira/browse/KAFKA-1757 Project: Kafka Issue Type: Bug Components: log Affects Versions: 0.8.2.0 Reporter: Lukáš Vyhlídka Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1757.patch, lucky-v.patch When running Kafka 0.8.2-Beta (Scala 2.10) on Windows, an attempt to delete a topic threw an error: ERROR [KafkaApi-1] error when handling request Name: StopReplicaRequest; Version: 0; CorrelationId: 38; ClientId: ; DeletePartitions: true; ControllerId: 0; ControllerEpoch: 3; Partitions: [test,0] (kafka.server.KafkaApis) kafka.common.KafkaStorageException: Delete of index .index failed. at kafka.log.LogSegment.delete(LogSegment.scala:283) at kafka.log.Log$$anonfun$delete$1.apply(Log.scala:608) at kafka.log.Log$$anonfun$delete$1.apply(Log.scala:608) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.log.Log.delete(Log.scala:608) at kafka.log.LogManager.deleteLog(LogManager.scala:375) at kafka.cluster.Partition$$anonfun$delete$1.apply$mcV$sp(Partition.scala:144) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:139) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:139) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.utils.Utils$.inWriteLock(Utils.scala:543) at kafka.cluster.Partition.delete(Partition.scala:139) at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:158) at kafka.server.ReplicaManager$$anonfun$stopReplicas$3.apply(ReplicaManager.scala:191) at kafka.server.ReplicaManager$$anonfun$stopReplicas$3.apply(ReplicaManager.scala:190) at scala.collection.immutable.Set$Set1.foreach(Set.scala:74) at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:190) at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:96) at kafka.server.KafkaApis.handle(KafkaApis.scala:59) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59) at java.lang.Thread.run(Thread.java:744) When I investigated the issue I found that the index file (in my environment it was C:\tmp\kafka-logs\----0014-0\.index) was locked by the Kafka process and the OS did not allow the file to be deleted. I tried to fix the problem in the source code, and when I added a close() call into LogSegment.delete(), topic deletion started to work. I will add here (not sure how to upload the file during issue creation) a diff with the changes I made so you can take a look and judge whether it is reasonable or not. It would be perfect if it could make it into the product... Finally, I would like to note that on Linux the deletion works just fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
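The close-before-delete behavior described in the report above can be sketched in plain Java. This is an illustration of the Windows file-locking semantics, not Kafka's actual LogSegment code; `deleteIndex` is a hypothetical helper:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseBeforeDelete {
    // On Windows, deleting a file that still has an open handle (or an
    // active memory map) fails; on Linux the unlink succeeds even while
    // the file is open. Releasing the handle first, as the reporter did
    // by adding close() to LogSegment.delete(), makes the delete work on
    // both platforms. Hypothetical helper, not Kafka's LogSegment code.
    static void deleteIndex(Path index) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(index.toFile(), "rw");
        try {
            // ... index file is in use here ...
        } finally {
            raf.close(); // without this, Files.delete fails on Windows
        }
        Files.delete(index);
    }

    public static void main(String[] args) throws IOException {
        Path index = Files.createTempFile("topic-0", ".index");
        deleteIndex(index);
        System.out.println(Files.exists(index)); // false
    }
}
```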
[jira] [Commented] (KAFKA-1724) Errors after reboot in single node setup
[ https://issues.apache.org/jira/browse/KAFKA-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334148#comment-14334148 ] Sriharsha Chintalapani commented on KAFKA-1724: --- [~junrao] Yes. We have isStarted in KafkaScheduler, which gets set after it's started, and in shutdown we check isStarted and go through the shutdown process. Tested it in a cluster to reproduce; I don't see any errors. Errors after reboot in single node setup Key: KAFKA-1724 URL: https://issues.apache.org/jira/browse/KAFKA-1724 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2.0 Reporter: Ciprian Hacman Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1724.patch In a single node setup, after reboot, Kafka logs show the following: {code} [2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up (kafka.controller.KafkaController) [2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete (kafka.controller.KafkaController) [2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: {jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} stored data: {jmx_port:-1,timestamp:1413994171579,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} (kafka.utils.ZkUtils$) [2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node [{jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092}] at /brokers/ids/0 a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$) [2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] (org.I0Itec.zkclient.ZkEventThread) java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at 
kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134) at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 with address ip-10-91-142-54.eu-west-1.compute.internal:9092. (kafka.utils.ZkUtils$) [2014-10-22 16:37:28,849] INFO [Kafka Server 0], started (kafka.server.KafkaServer) [2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) {code} The last log line repeats forever and is correlated with errors on the app side. Restarting Kafka fixes the errors. 
Steps to reproduce (with help from the mailing list):
# start zookeeper
# start kafka-broker
# create a topic or start a producer writing to a topic
# stop zookeeper
# stop kafka-broker (the broker shutdown runs into WARN Session 0x14938d9dc010001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) java.net.ConnectException: Connection refused)
# kill -9 kafka-broker
# restart zookeeper and then kafka-broker, which leads to the error above
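The guard described in the comment above (an isStarted flag checked in shutdown) can be sketched as follows. `GuardedScheduler` is a hypothetical stand-in, not Kafka's actual KafkaScheduler:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the fix: shutdown() is a no-op unless startup() ran first,
// so a controller resignation that fires before the scheduler ever
// started no longer throws IllegalStateException.
public class GuardedScheduler {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private ScheduledExecutorService executor;

    public synchronized void startup() {
        executor = Executors.newScheduledThreadPool(1);
        started.set(true);
    }

    // Returns true if a running scheduler was actually shut down,
    // false when there was nothing to do.
    public synchronized boolean shutdown() {
        if (!started.getAndSet(false)) {
            return false; // never started (or already shut down)
        }
        executor.shutdownNow();
        return true;
    }

    public static void main(String[] args) {
        GuardedScheduler s = new GuardedScheduler();
        s.shutdown();              // safe: no-op before startup
        s.startup();
        System.out.println(s.shutdown()); // true
    }
}
```

Calling shutdown() on a scheduler that was never started simply returns instead of throwing, which is the behavior the comment says was verified in a cluster.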
[jira] [Commented] (KAFKA-1724) Errors after reboot in single node setup
[ https://issues.apache.org/jira/browse/KAFKA-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334088#comment-14334088 ] Sriharsha Chintalapani commented on KAFKA-1724: --- [~junrao] Thanks for the comments on the patch. So it looks like this is already fixed in the trunk. We can close this JIRA. Errors after reboot in single node setup Key: KAFKA-1724 URL: https://issues.apache.org/jira/browse/KAFKA-1724 Project: Kafka Issue Type: Bug Affects Versions: 0.8.2.0 Reporter: Ciprian Hacman Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1724.patch In a single node setup, after reboot, Kafka logs show the following: {code} [2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up (kafka.controller.KafkaController) [2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete (kafka.controller.KafkaController) [2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: {jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} stored data: {jmx_port:-1,timestamp:1413994171579,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092} (kafka.utils.ZkUtils$) [2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node [{jmx_port:-1,timestamp:1413995842465,host:ip-10-91-142-54.eu-west-1.compute.internal,version:1,port:9092}] at /brokers/ids/0 a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$) [2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] (org.I0Itec.zkclient.ZkEventThread) java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at 
kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134) at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 with address ip-10-91-142-54.eu-west-1.compute.internal:9092. (kafka.utils.ZkUtils$) [2014-10-22 16:37:28,849] INFO [Kafka Server 0], started (kafka.server.KafkaServer) [2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) [2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor) {code} The last log line repeats forever and is correlated with errors on the app side. Restarting Kafka fixes the errors. 
Steps to reproduce (with help from the mailing list):
# start zookeeper
# start kafka-broker
# create a topic or start a producer writing to a topic
# stop zookeeper
# stop kafka-broker (the broker shutdown runs into WARN Session 0x14938d9dc010001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) java.net.ConnectException: Connection refused)
# kill -9 kafka-broker
# restart zookeeper and then kafka-broker, which leads to the error above
[jira] [Commented] (KAFKA-1976) transient unit test failure in ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown
[ https://issues.apache.org/jira/browse/KAFKA-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332514#comment-14332514 ] Sriharsha Chintalapani commented on KAFKA-1976: --- [~junrao] I covered this issue as part of this JIRA https://issues.apache.org/jira/browse/KAFKA-1887 transient unit test failure in ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown --- Key: KAFKA-1976 URL: https://issues.apache.org/jira/browse/KAFKA-1976 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.3 Reporter: Jun Rao Saw the following failure a few times. kafka.api.test.ProducerFailureHandlingTest testNotEnoughReplicasAfterBrokerShutdown FAILED org.scalatest.junit.JUnitTestFailedError: Expected NotEnoughReplicasException when producing to topic with fewer brokers than min.insync.replicas at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:149) at org.scalatest.Assertions$class.fail(Assertions.scala:711) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:149) at kafka.api.test.ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown(ProducerFailureHandlingTest.scala:352) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1566: -- Attachment: KAFKA-1566_2015-02-21_21:57:02.patch Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch It would be useful (especially for automated deployments) to have an environment configuration file that could be sourced from the launcher files (e.g. kafka-run-server.sh). This is how it could look: kafka-env.sh {code} export KAFKA_JVM_PERFORMANCE_OPTS='-XX:+UseCompressedOops -XX:+DisableExplicitGC -Djava.awt.headless=true \ -XX:+UseG1GC -XX:PermSize=48m -XX:MaxPermSize=48m -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35' export KAFKA_HEAP_OPTS='-Xmx1G -Xms1G' export KAFKA_LOG4J_OPTS='-Dkafka.logs.dir=/var/log/kafka' {code} kafka-server-start.sh {code} ... source $base_dir/config/kafka-env.sh ... {code} This approach is consistent with Hadoop and HBase. However, the idea here is to be able to set these values in a single place without having to edit startup scripts.
[jira] [Commented] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332047#comment-14332047 ] Sriharsha Chintalapani commented on KAFKA-1566: --- Updated reviewboard https://reviews.apache.org/r/29724/diff/ against branch origin/trunk Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch
[jira] [Commented] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332028#comment-14332028 ] Sriharsha Chintalapani commented on KAFKA-1566: --- [~nehanarkhede] will rebase and send a new patch. [~omkreddy] makes sense, will add it to the new patch. Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch
[jira] [Commented] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332048#comment-14332048 ] Sriharsha Chintalapani commented on KAFKA-1566: --- [~nehanarkhede] can you please check the latest patch? I am able to apply it cleanly on the trunk. Thanks. Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch, KAFKA-1566_2015-02-21_21:57:02.patch
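The sourcing line proposed for kafka-server-start.sh can be made tolerant of a missing env file. The existence guard below is an assumption for illustration, not part of the attached patch:

```shell
#!/usr/bin/env bash
base_dir=$(dirname "$0")/..

# Source the optional environment file only if it exists, so a missing
# config/kafka-env.sh does not abort server startup.
if [ -f "$base_dir/config/kafka-env.sh" ]; then
    . "$base_dir/config/kafka-env.sh"
fi

echo "heap opts: ${KAFKA_HEAP_OPTS:-<default>}"
```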
[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330090#comment-14330090 ] Sriharsha Chintalapani commented on KAFKA-1887: --- Created reviewboard https://reviews.apache.org/r/31256/diff/ against branch origin/trunk controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1887.patch We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at 
scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1887: -- Attachment: KAFKA-1887.patch controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1887.patch We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
[jira] [Updated] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1887: -- Attachment: KAFKA-1887_2015-02-21_01:12:25.patch controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330094#comment-14330094 ] Sriharsha Chintalapani commented on KAFKA-1887: --- Updated reviewboard https://reviews.apache.org/r/31256/diff/ against branch origin/trunk controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
[jira] [Updated] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1887: -- Status: Patch Available (was: Open) controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1887.patch We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330093#comment-14330093 ] Sriharsha Chintalapani commented on KAFKA-1887: --- [~nehanarkhede] I moved KafkaController.shutdown() followed by KafkaHealthCheck.shutdown() above SocketServer.shutdown(). 1) Moving the kafkaHealthCheck shutdown below the controller shutdown doesn't trigger ReplicaStateMachine.BrokerChangeListener(). 2) Because of 1, controllerContext.controllerChannelManager.removeBroker doesn't get called for the current brokerId, and it continues to exist in controllerContext.controllerChannelManager.brokerStateInfo. 3) When kafkaController.shutdown() gets called it calls controllerChannelManager.shutdown(), which goes through removeExistingBroker for the brokerId whose SocketServer is already shut down, causing removeExistingBroker().brokerStateInfo(brokerId).channel.disconnect() to throw an exception. Because of this exception, KafkaBroker.shutdown() slows down. In the above patch I moved KafkaController.shutdown and KafkaHealthCheck.shutdown above SocketServer.shutdown(). controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1887.patch We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. 
Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330093#comment-14330093 ] Sriharsha Chintalapani edited comment on KAFKA-1887 at 2/21/15 5:53 PM: [~nehanarkhede] I moved KafkaController.shutdown() followed by KafkaHealthCheck.shutdown() above SocketServer.shutdown(). 1) Moving the kafkaHealthCheck shutdown below the controller shutdown doesn't trigger ReplicaStateMachine.BrokerChangeListener(). 2) Because of 1, controllerContext.controllerChannelManager.removeBroker doesn't get called for the current brokerId, and it continues to exist in controllerContext.controllerChannelManager.brokerStateInfo. 3) When kafkaController.shutdown() gets called it calls controllerChannelManager.shutdown(), which goes through removeExistingBroker for the brokerId whose SocketServer is already shut down, causing removeExistingBroker().brokerStateInfo(brokerId).channel.disconnect() to throw an exception. Because of this exception, KafkaBroker.shutdown() slows down. In the above patch I moved KafkaController.shutdown and KafkaHealthCheck.shutdown above SocketServer.shutdown() was (Author: sriharsha): [~nehanarkhede] I moved KafkaController.shutdown() followed by KafkaHealthCheck.shutdown() above SocketServer.shutdown(). 1) Moving kafkaHealthCheck below controller shutdown doesn't trigger ReplicaStateMachine.BrokerChangeListener() 2) because of 1 controllerContext.controllerChannelManager.removeBroker for the current brokerId and it continues exist in controllerContext.controllerChannelManager.brokerStateInfo. 3) when kafkaController.shutdown() gets called it calls controllerChannelManager.shutdown() and it will go through removeExistingBroker for the brokerId whose SocketServer is shutdown causing removeExistingBroker().brokerStateInfo(brokerId).channel.disconnect() throw an exception . Because of this exception KafkaBroker.shutdown() is slowing down. 
In the above patch moved KafkaController.shutdown and KafkaHealthCheck.shutdown above SocketServer.shutdown() controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1887.patch, KAFKA-1887_2015-02-21_01:12:25.patch We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at 
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at
[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329122#comment-14329122 ] Sriharsha Chintalapani commented on KAFKA-1887: --- [~nehanarkhede] Another problem I forgot to add in my comment above: with this fix, i.e. moving kafkaHealthCheck below the controller shutdown, broker controlled shutdown can fail. Here is the log on a 2-node cluster with a topic with 2 partitions and replication factor 2. This happens after I shut down the second broker and then shut down the controller; I'll get more details on this. [2015-02-20 08:24:34,548] INFO [Kafka Server 0], Starting controlled shutdown (kafka.server.KafkaServer) [2015-02-20 08:24:34,569] INFO [Kafka Server 0], Remaining partitions to move: [test3,0],[test3,1] (kafka.server.KafkaServer) [2015-02-20 08:24:34,570] INFO [Kafka Server 0], Error code from controller: 0 (kafka.server.KafkaServer) [2015-02-20 08:24:39,575] WARN [Kafka Server 0], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer) [2015-02-20 08:24:39,596] INFO [Kafka Server 0], Remaining partitions to move: [test3,0],[test3,1] (kafka.server.KafkaServer) [2015-02-20 08:24:39,596] INFO [Kafka Server 0], Error code from controller: 0 (kafka.server.KafkaServer) ^C[2015-02-20 08:24:44,598] WARN [Kafka Server 0], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer) [2015-02-20 08:24:44,617] INFO [Kafka Server 0], Remaining partitions to move: [test3,0],[test3,1] (kafka.server.KafkaServer) [2015-02-20 08:24:44,617] INFO [Kafka Server 0], Error code from controller: 0 (kafka.server.KafkaServer) [2015-02-20 08:24:49,620] WARN [Kafka Server 0], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer) [2015-02-20 08:24:49,621] INFO Closing socket connection to /192.168.202.1. 
(kafka.network.Processor) [2015-02-20 08:24:49,621] WARN [Kafka Server 0], Proceeding to do an unclean shutdown as all the controlled shutdown attempts failed (kafka.server.KafkaServer) controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at 
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at
[jira] [Issue Comment Deleted] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1887: -- Comment: was deleted (was: [~nehanarkhede] Another problem I forgot add in my comment above with this fix i.e moving kafkaHealthCheck below the controller shutdown causing broker controlled shutdown to fail . Here is the log on a 2 node cluster with 2 partitions, 2 replication factor topic. This happens after I shutdown second broker and next shutdown the controller I'll get more details on this. [2015-02-20 08:24:34,548] INFO [Kafka Server 0], Starting controlled shutdown (kafka.server.KafkaServer) [2015-02-20 08:24:34,569] INFO [Kafka Server 0], Remaining partitions to move: [test3,0],[test3,1] (kafka.server.KafkaServer) [2015-02-20 08:24:34,570] INFO [Kafka Server 0], Error code from controller: 0 (kafka.server.KafkaServer) [2015-02-20 08:24:39,575] WARN [Kafka Server 0], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer) [2015-02-20 08:24:39,596] INFO [Kafka Server 0], Remaining partitions to move: [test3,0],[test3,1] (kafka.server.KafkaServer) [2015-02-20 08:24:39,596] INFO [Kafka Server 0], Error code from controller: 0 (kafka.server.KafkaServer) ^C[2015-02-20 08:24:44,598] WARN [Kafka Server 0], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer) [2015-02-20 08:24:44,617] INFO [Kafka Server 0], Remaining partitions to move: [test3,0],[test3,1] (kafka.server.KafkaServer) [2015-02-20 08:24:44,617] INFO [Kafka Server 0], Error code from controller: 0 (kafka.server.KafkaServer) [2015-02-20 08:24:49,620] WARN [Kafka Server 0], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer) [2015-02-20 08:24:49,621] INFO Closing socket connection to /192.168.202.1. 
(kafka.network.Processor) [2015-02-20 08:24:49,621] WARN [Kafka Server 0], Proceeding to do an unclean shutdown as all the controlled shutdown attempts failed (kafka.server.KafkaServer) ) controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at 
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329318#comment-14329318 ] Sriharsha Chintalapani commented on KAFKA-1887: --- This can be fixed by moving KafkaHealthcheck.shutdown() after controller.shutdown(), as suggested by [~nehanarkhede] and [~gwenshap]. We also need to move kafkaHealthCheck.start() before controller.start(); otherwise we will see the same error in state-change.log after the broker starts. The patch below causes the unit test run time to go from 9 min 30 sec to 15 min on my machine, and also causes an intermittent test failure in ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown, as producer.send.get gets a NotEnoughReplicasAfterAppendException instead of a NotEnoughReplicasException (probably not related to this patch). I am looking into the test slowness with the patch, but if you have any idea/fix for this please go ahead and take the jira; I don't want to hold up the 0.8.2.1 release. I'll update the jira as soon as I have a fix. 
{code}
+    /* tell everyone we are alive */
+    kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, config.advertisedHostName, config.advertisedPort, config.zkSessionTimeoutMs, zkClient)
+    kafkaHealthcheck.startup()
+
     /* start kafka controller */
     kafkaController = new KafkaController(config, zkClient, brokerState)
     kafkaController.startup()
@@ -152,10 +156,6 @@ class KafkaServer(val config: KafkaConfig, time: Time = SystemTime) extends Logg
     topicConfigManager = new TopicConfigManager(zkClient, logManager)
     topicConfigManager.startup()
 
-    /* tell everyone we are alive */
-    kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, config.advertisedHostName, config.advertisedPort, config.zkSessionTimeoutMs, zkClient)
-    kafkaHealthcheck.startup()
-
     /* register broker metrics */
     registerStats()
@@ -310,8 +310,6 @@ class KafkaServer(val config: KafkaConfig, time: Time = SystemTime) extends Logg
     if (canShutdown) {
       Utils.swallow(controlledShutdown())
       brokerState.newState(BrokerShuttingDown)
-      if(kafkaHealthcheck != null)
-        Utils.swallow(kafkaHealthcheck.shutdown())
       if(socketServer != null)
         Utils.swallow(socketServer.shutdown())
       if(requestHandlerPool != null)
@@ -329,6 +327,8 @@ class KafkaServer(val config: KafkaConfig, time: Time = SystemTime) extends Logg
       Utils.swallow(consumerCoordinator.shutdown())
       if(kafkaController != null)
         Utils.swallow(kafkaController.shutdown())
+      if(kafkaHealthcheck != null)
+        Utils.swallow(kafkaHealthcheck.shutdown())
       if(zkClient != null)
         Utils.swallow(zkClient.close())
{code}
controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 We always see the following error in state-change log on shutting down the last broker. 
[2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
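The shutdown reordering in the patch above can be sketched generically. The following is a hypothetical illustration (the class `OrderedShutdown`, `Step`, and `shutdownAll` are invented names for this sketch, not Kafka's actual code): run the shutdown steps in a fixed order, swallowing failures in the spirit of kafka.utils.Utils.swallow so one failing component cannot abort or stall the rest of the sequence.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the ordering idea from the patch above: the socket
 * server goes down first, then the controller, and the health check (cluster
 * membership) last. Failures are swallowed, like kafka.utils.Utils.swallow,
 * so later steps still run.
 */
public class OrderedShutdown {

    /** A named shutdown step whose action may throw. */
    static class Step {
        final String name;
        final Runnable action;
        Step(String name, Runnable action) { this.name = name; this.action = action; }
    }

    /**
     * Runs every step in order, swallowing failures and returning the names
     * of the steps that completed cleanly.
     */
    static List<String> shutdownAll(List<Step> steps) {
        List<String> completed = new ArrayList<>();
        for (Step s : steps) {
            try {
                s.action.run();
                completed.add(s.name);
            } catch (RuntimeException e) {
                // Swallowed: log and continue so the broker still shuts down.
                System.out.println("swallowed failure in " + s.name + ": " + e.getMessage());
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Step> order = new ArrayList<>();
        // Order from the patch: socket server, then controller, then health check.
        order.add(new Step("socketServer", () -> {}));
        order.add(new Step("kafkaController", () -> {}));
        order.add(new Step("kafkaHealthcheck", () -> {}));
        System.out.println(shutdownAll(order));
        // prints [socketServer, kafkaController, kafkaHealthcheck]
    }
}
```

The point of swallowing is visible when a middle step throws: the remaining steps still execute, which is exactly why the disconnect exception described in the comments slowed shutdown down rather than aborting it outright.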
[jira] [Updated] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1852: -- Attachment: KAFKA-1852_2015-02-18_13:13:17.patch OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, KAFKA-1852_2015-02-18_13:13:17.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326566#comment-14326566 ] Sriharsha Chintalapani commented on KAFKA-1852: --- Updated reviewboard https://reviews.apache.org/r/29912/diff/ against branch origin/trunk OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, KAFKA-1852_2015-02-18_13:13:17.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326973#comment-14326973 ] Sriharsha Chintalapani commented on KAFKA-1965: --- [~yasuhiro.matsuda] Changes look good to me. Can you attach it to the review board? Here are the instructions: https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length of the delay are stored. However, all it needs is one timestamp: the due time of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
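The "leaner DelayedItem" idea from KAFKA-1965 can be sketched as follows. This is a hypothetical illustration (the class `LeanDelayedItem` and its methods are invented for this sketch, not Kafka's actual code): store only the absolute due time and derive the remaining delay on demand, instead of keeping both a creation timestamp and a delay length.

```java
/**
 * Hypothetical sketch of a leaner delayed item: one stored field (the due
 * time) replaces the creation-timestamp + delay-length pair, since the
 * remaining delay can always be recomputed from "now".
 */
public class LeanDelayedItem implements Comparable<LeanDelayedItem> {
    private final long dueMs;  // absolute due time, the only stored field

    public LeanDelayedItem(long delayMs, long nowMs) {
        long due = nowMs + delayMs;
        // Guard against overflow for very large delays.
        this.dueMs = (delayMs > 0 && due < 0) ? Long.MAX_VALUE : due;
    }

    /** Remaining delay relative to nowMs; non-positive once the item is due. */
    public long remainingMs(long nowMs) {
        return dueMs - nowMs;
    }

    /** Items expire in due-time order, so a delay queue pops the earliest due item first. */
    @Override
    public int compareTo(LeanDelayedItem other) {
        return Long.compare(this.dueMs, other.dueMs);
    }
}
```

This mirrors the contract of `java.util.concurrent.Delayed`, where `getDelay` is computed against the current time rather than stored.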
[jira] [Commented] (KAFKA-1867) liveBroker list not updated on a cluster with no topics
[ https://issues.apache.org/jira/browse/KAFKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324659#comment-14324659 ] Sriharsha Chintalapani commented on KAFKA-1867: --- [~nehanarkhede] pinging for a review. Thanks. liveBroker list not updated on a cluster with no topics --- Key: KAFKA-1867 URL: https://issues.apache.org/jira/browse/KAFKA-1867 Project: Kafka Issue Type: Bug Reporter: Jun Rao Assignee: Sriharsha Chintalapani Fix For: 0.8.3 Attachments: KAFKA-1867.patch, KAFKA-1867.patch, KAFKA-1867_2015-01-25_21:07:47.patch Currently, when there is no topic in a cluster, the controller doesn't send any UpdateMetadataRequest to the broker when it starts up. As a result, the liveBroker list in metadataCache is empty. This means that we will return incorrect broker list in TopicMetatadataResponse. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323316#comment-14323316 ] Sriharsha Chintalapani commented on KAFKA-1852: --- Updated reviewboard https://reviews.apache.org/r/29912/diff/ against branch origin/trunk OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319327#comment-14319327 ] Sriharsha Chintalapani commented on KAFKA-1852: --- Updated reviewboard https://reviews.apache.org/r/29912/diff/ against branch origin/trunk OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1852: -- Attachment: KAFKA-1852_2015-02-12_16:46:10.patch OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319332#comment-14319332 ] Sriharsha Chintalapani commented on KAFKA-1852: --- Thanks [~jjkoshy] for the review. I added your suggestion; please take a look when you get a chance. OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, KAFKA-1852_2015-02-12_16:46:10.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318312#comment-14318312 ] Sriharsha Chintalapani commented on KAFKA-1887: --- [~gwenshap] I am worried about moving kafkaHealthCheck below, as it won't be able to trigger onBrokerFailure in KafkaController. In a cluster this is probably fine, as another controller will get elected. Also, this change causes ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown to fail, as producer.send.get gets a NotEnoughReplicasAfterAppendException instead of a NotEnoughReplicasException. controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. 
Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1866: -- Attachment: KAFKA-1866_2015-02-11_09:25:33.patch LogStartOffset gauge throws exceptions after log.delete() - Key: KAFKA-1866 URL: https://issues.apache.org/jira/browse/KAFKA-1866 Project: Kafka Issue Type: Bug Reporter: Gian Merlino Assignee: Sriharsha Chintalapani Attachments: KAFKA-1866.patch, KAFKA-1866_2015-02-10_22:50:09.patch, KAFKA-1866_2015-02-11_09:25:33.patch The LogStartOffset gauge does logSegments.head.baseOffset, which throws NoSuchElementException on an empty list, which can occur after a delete() of the log. This makes life harder for custom MetricsReporters, since they have to deal with .value() possibly throwing an exception. Locally we're dealing with this by having Log.delete() also call removeMetric on all the gauges. That also has the benefit of not having a bunch of metrics floating around for logs that the broker is not actually handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
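The failure mode in this report is easy to reproduce in miniature. The sketch below is hypothetical illustration code, not Kafka's implementation: a gauge that reads the head of the segment list (as logSegments.head.baseOffset does) throws once the list is empty, while a guarded variant avoids the exception. Kafka's actual fix, per the description, is to remove the metric in Log.delete(); the guard here just makes the failure mode concrete.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

public class LogStartOffsetGaugeSketch {
    // Hypothetical stand-in for kafka.log.LogSegment; only baseOffset matters here.
    static class Segment {
        final long baseOffset;
        Segment(long baseOffset) { this.baseOffset = baseOffset; }
    }

    // Mirrors the buggy gauge: reading the head of an empty segment list
    // throws NoSuchElementException, which a MetricsReporter then sees
    // when it calls value() after Log.delete().
    static long unsafeLogStartOffset(LinkedList<Segment> segments) {
        return segments.getFirst().baseOffset;
    }

    // Guarded variant: returns a sentinel instead of throwing.
    static long safeLogStartOffset(LinkedList<Segment> segments) {
        return segments.isEmpty() ? -1L : segments.getFirst().baseOffset;
    }

    public static void main(String[] args) {
        LinkedList<Segment> segments = new LinkedList<>();
        segments.add(new Segment(42L));
        System.out.println(unsafeLogStartOffset(segments)); // 42
        segments.clear(); // what Log.delete() effectively does to the list
        System.out.println(safeLogStartOffset(segments));   // -1
        try {
            unsafeLogStartOffset(segments);
        } catch (NoSuchElementException e) {
            System.out.println("gauge threw " + e.getClass().getSimpleName());
        }
    }
}
```

Removing the metric on delete, as the reporter did locally, has the extra benefit noted above: no stale gauges linger for logs the broker no longer hosts.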
[jira] [Commented] (KAFKA-1757) Can not delete Topic index on Windows
[ https://issues.apache.org/jira/browse/KAFKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316454#comment-14316454 ] Sriharsha Chintalapani commented on KAFKA-1757: --- [~junrao] Can you please review the patch? Thanks. Can not delete Topic index on Windows - Key: KAFKA-1757 URL: https://issues.apache.org/jira/browse/KAFKA-1757 Project: Kafka Issue Type: Bug Components: log Affects Versions: 0.8.2.0 Reporter: Lukáš Vyhlídka Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 Attachments: KAFKA-1757.patch, lucky-v.patch When running Kafka 0.8.2-Beta (Scala 2.10) on Windows, an attempt to delete the Topic threw an error: ERROR [KafkaApi-1] error when handling request Name: StopReplicaRequest; Version: 0; CorrelationId: 38; ClientId: ; DeletePartitions: true; ControllerId: 0; ControllerEpoch: 3; Partitions: [test,0] (kafka.server.KafkaApis) kafka.common.KafkaStorageException: Delete of index .index failed. at kafka.log.LogSegment.delete(LogSegment.scala:283) at kafka.log.Log$$anonfun$delete$1.apply(Log.scala:608) at kafka.log.Log$$anonfun$delete$1.apply(Log.scala:608) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.log.Log.delete(Log.scala:608) at kafka.log.LogManager.deleteLog(LogManager.scala:375) at kafka.cluster.Partition$$anonfun$delete$1.apply$mcV$sp(Partition.scala:144) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:139) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:139) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.utils.Utils$.inWriteLock(Utils.scala:543) at kafka.cluster.Partition.delete(Partition.scala:139) at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:158) at 
kafka.server.ReplicaManager$$anonfun$stopReplicas$3.apply(ReplicaManager.scala:191) at kafka.server.ReplicaManager$$anonfun$stopReplicas$3.apply(ReplicaManager.scala:190) at scala.collection.immutable.Set$Set1.foreach(Set.scala:74) at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:190) at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:96) at kafka.server.KafkaApis.handle(KafkaApis.scala:59) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59) at java.lang.Thread.run(Thread.java:744) When I investigated the issue I figured out that the index file (in my environment it was C:\tmp\kafka-logs\----0014-0\.index) was locked by the kafka process and the OS did not allow that file to be deleted. I tried to fix the problem in the source code, and when I added a close() method call into LogSegment.delete(), the Topic deletion started to work. I will add here (not sure how to upload the file during issue creation) a diff with the changes I have made so you can take a look and judge whether it is reasonable or not. It would be perfect if it could make it into the product... In the end I would like to say that on Linux the deletion works just fine... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
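The essence of the reporter's fix (release the file handle before deleting) can be sketched as follows. All names here are an editor's hypothetical illustration, not Kafka's LogSegment code. On Windows/NTFS, deleting a file that still has an open handle fails, which produces the KafkaStorageException quoted above; on Linux the delete succeeds either way, which is why the bug is Windows-only. The demo uses a temp file so it runs anywhere.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IndexDeleteSketch {
    // The fix in essence: close first, then delete. The close() call is the
    // one the reporter added to LogSegment.delete(); without it the index
    // file stays locked on Windows and Files.delete() fails.
    static void closeThenDelete(FileChannel channel, Path file) throws IOException {
        channel.close();    // release the OS-level handle
        Files.delete(file); // now succeeds on NTFS as well
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a segment's .index file.
        Path index = Files.createTempFile("segment", ".index");
        FileChannel ch = FileChannel.open(index,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        closeThenDelete(ch, index);
        System.out.println(Files.exists(index)); // false
    }
}
```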
[jira] [Commented] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316572#comment-14316572 ] Sriharsha Chintalapani commented on KAFKA-1866: --- Updated reviewboard https://reviews.apache.org/r/30084/diff/ against branch origin/trunk LogStartOffset gauge throws exceptions after log.delete() - Key: KAFKA-1866 URL: https://issues.apache.org/jira/browse/KAFKA-1866 Project: Kafka Issue Type: Bug Reporter: Gian Merlino Assignee: Sriharsha Chintalapani Attachments: KAFKA-1866.patch, KAFKA-1866_2015-02-10_22:50:09.patch, KAFKA-1866_2015-02-11_09:25:33.patch The LogStartOffset gauge does logSegments.head.baseOffset, which throws NoSuchElementException on an empty list, which can occur after a delete() of the log. This makes life harder for custom MetricsReporters, since they have to deal with .value() possibly throwing an exception. Locally we're dealing with this by having Log.delete() also call removeMetric on all the gauges. That also has the benefit of not having a bunch of metrics floating around for logs that the broker is not actually handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1566) Kafka environment configuration (kafka-env.sh)
[ https://issues.apache.org/jira/browse/KAFKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316455#comment-14316455 ] Sriharsha Chintalapani commented on KAFKA-1566: --- [~jkreps] [~nehanarkhede] Can you please review this? kafka-env.sh will give users the flexibility of defining a custom java_home. Kafka environment configuration (kafka-env.sh) -- Key: KAFKA-1566 URL: https://issues.apache.org/jira/browse/KAFKA-1566 Project: Kafka Issue Type: Improvement Components: tools Reporter: Cosmin Lehene Assignee: Sriharsha Chintalapani Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1566.patch It would be useful (especially for automated deployments) to have an environment configuration file that could be sourced from the launcher files (e.g. kafka-run-server.sh). This is how it could look:
kafka-env.sh
{code}
export KAFKA_JVM_PERFORMANCE_OPTS='-XX:+UseCompressedOops -XX:+DisableExplicitGC -Djava.awt.headless=true \
-XX:+UseG1GC -XX:PermSize=48m -XX:MaxPermSize=48m -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35'
export KAFKA_HEAP_OPTS='-Xmx1G -Xms1G'
export KAFKA_LOG4J_OPTS=-Dkafka.logs.dir=/var/log/kafka
{code}
kafka-server-start.sh
{code}
...
source $base_dir/config/kafka-env.sh
...
{code}
This approach is consistent with Hadoop and HBase. However the idea here is to be able to set these values in a single place without having to edit startup scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316447#comment-14316447 ] Sriharsha Chintalapani commented on KAFKA-1852: --- [~jjkoshy] pinging for a review. Thanks. OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani reassigned KAFKA-1887: - Assignee: Sriharsha Chintalapani controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at 
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1926) Replace kafka.utils.Utils with o.a.k.common.utils.Utils
[ https://issues.apache.org/jira/browse/KAFKA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316319#comment-14316319 ] Sriharsha Chintalapani commented on KAFKA-1926: --- [~tongli] you should make the patch against trunk. Replace kafka.utils.Utils with o.a.k.common.utils.Utils --- Key: KAFKA-1926 URL: https://issues.apache.org/jira/browse/KAFKA-1926 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.2.0 Reporter: Jay Kreps Labels: newbie, patch Attachments: KAFKA-1926.patch There is currently a lot of duplication between the Utils class in common and the one in core. Our plan has been to deprecate duplicate code in the server and replace it with the new common code. As such we should evaluate each method in the scala Utils and do one of the following: 1. Migrate it to o.a.k.common.utils.Utils if it is a sensible general purpose utility in active use that is not Kafka-specific. If we migrate it we should really think about the API and make sure there is some test coverage. A few things in there are kind of funky and we shouldn't just blindly copy them over. 2. Create a new class ServerUtils or ScalaUtils in kafka.utils that will hold any utilities that really need to make use of Scala features to be convenient. 3. Delete it if it is not used, or has a bad api. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1887) controller error message on shutting the last broker
[ https://issues.apache.org/jira/browse/KAFKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317678#comment-14317678 ] Sriharsha Chintalapani commented on KAFKA-1887: --- [~junrao] Shutting down the controller before shutting down the broker is not a good idea, as the broker does a controlledShutdown which sends a ControlledShutdownRequest, and there won't be any controller handling these requests. Moreover, if we don't shut down the broker first and it happens to be the last broker in the cluster, it won't go through onBrokerFailover, which moves the partitions to the offline state. I am not sure there is a good way to fix this, as it does go through the expected steps. Please let me know if you have any ideas. controller error message on shutting the last broker Key: KAFKA-1887 URL: https://issues.apache.org/jira/browse/KAFKA-1887 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Sriharsha Chintalapani Priority: Minor Fix For: 0.8.3 We always see the following error in state-change log on shutting down the last broker. [2015-01-20 13:21:04,036] ERROR Controller 0 epoch 3 initiated state change for partition [test,0] from OfflinePartition to OnlinePartition failed (state.change.logger) kafka.common.NoReplicaOnlineException: No replica for partition [test,0] is alive. 
Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:357) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117) at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:446) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) at 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) at kafka.utils.Utils$.inLock(Utils.scala:535) at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1926) Replace kafka.utils.Utils with o.a.k.common.utils.Utils
[ https://issues.apache.org/jira/browse/KAFKA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314920#comment-14314920 ] Sriharsha Chintalapani commented on KAFKA-1926: --- [~tongli] you might want to check these: https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review and https://cwiki.apache.org/confluence/display/KAFKA/Kafka+patch+review+tool. You can submit the patch using the Kafka patch review tool; it will upload the patch to https://reviews.apache.org and attach a link here on JIRA. Replace kafka.utils.Utils with o.a.k.common.utils.Utils --- Key: KAFKA-1926 URL: https://issues.apache.org/jira/browse/KAFKA-1926 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.2.0 Reporter: Jay Kreps Labels: newbie, patch Attachments: KAFKA-1926-p1.patch There is currently a lot of duplication between the Utils class in common and the one in core. Our plan has been to deprecate duplicate code in the server and replace it with the new common code. As such we should evaluate each method in the scala Utils and do one of the following: 1. Migrate it to o.a.k.common.utils.Utils if it is a sensible general purpose utility in active use that is not Kafka-specific. If we migrate it we should really think about the API and make sure there is some test coverage. A few things in there are kind of funky and we shouldn't just blindly copy them over. 2. Create a new class ServerUtils or ScalaUtils in kafka.utils that will hold any utilities that really need to make use of Scala features to be convenient. 3. Delete it if it is not used, or has a bad api. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315682#comment-14315682 ] Sriharsha Chintalapani commented on KAFKA-1866: --- Updated reviewboard https://reviews.apache.org/r/30084/diff/ against branch origin/trunk LogStartOffset gauge throws exceptions after log.delete() - Key: KAFKA-1866 URL: https://issues.apache.org/jira/browse/KAFKA-1866 Project: Kafka Issue Type: Bug Reporter: Gian Merlino Assignee: Sriharsha Chintalapani Attachments: KAFKA-1866.patch, KAFKA-1866_2015-02-10_22:50:09.patch The LogStartOffset gauge does logSegments.head.baseOffset, which throws NoSuchElementException on an empty list, which can occur after a delete() of the log. This makes life harder for custom MetricsReporters, since they have to deal with .value() possibly throwing an exception. Locally we're dealing with this by having Log.delete() also call removeMetric on all the gauges. That also has the benefit of not having a bunch of metrics floating around for logs that the broker is not actually handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1866) LogStartOffset gauge throws exceptions after log.delete()
[ https://issues.apache.org/jira/browse/KAFKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1866: -- Attachment: KAFKA-1866_2015-02-10_22:50:09.patch LogStartOffset gauge throws exceptions after log.delete() - Key: KAFKA-1866 URL: https://issues.apache.org/jira/browse/KAFKA-1866 Project: Kafka Issue Type: Bug Reporter: Gian Merlino Assignee: Sriharsha Chintalapani Attachments: KAFKA-1866.patch, KAFKA-1866_2015-02-10_22:50:09.patch The LogStartOffset gauge does logSegments.head.baseOffset, which throws NoSuchElementException on an empty list, which can occur after a delete() of the log. This makes life harder for custom MetricsReporters, since they have to deal with .value() possibly throwing an exception. Locally we're dealing with this by having Log.delete() also call removeMetric on all the gauges. That also has the benefit of not having a bunch of metrics floating around for logs that the broker is not actually handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)