[jira] [Commented] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019904#comment-14019904 ] Jun Rao commented on KAFKA-1298: Thanks for the patch. This patch seems to have made the unit tests significantly slower than before. For example, the following test took ~26 secs before the patch and more than 2 mins after the patch. ./gradlew -Dtest.single=AddPartitionsTest cleanTest core:test Controlled shutdown tool doesn't seem to work out of the box Key: KAFKA-1298 URL: https://issues.apache.org/jira/browse/KAFKA-1298 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Labels: usability Attachments: KAFKA-1298.patch Download Kafka and try to use our shutdown tool. Got this: bin/kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper localhost:2181 --broker 0 [2014-03-06 16:58:23,636] ERROR Operation failed due to controller failure (kafka.admin.ShutdownBroker$) java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused] at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340) at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:249) at kafka.admin.ShutdownBroker$.kafka$admin$ShutdownBroker$$invokeShutdown(ShutdownBroker.scala:56) at kafka.admin.ShutdownBroker$.main(ShutdownBroker.scala:109) at kafka.admin.ShutdownBroker.main(ShutdownBroker.scala) Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused] at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101) at 
com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185) at javax.naming.InitialContext.lookup(InitialContext.java:392) at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888) at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858) at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257) ... 4 more Caused by: java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601) at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198) at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184) at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322) at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source) at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97) ... 9 more Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431) at java.net.Socket.connect(Socket.java:527) at java.net.Socket.connect(Socket.java:476) at java.net.Socket.init(Socket.java:373) at java.net.Socket.init(Socket.java:187) at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22) at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128) at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595) ... 14 more Oh god, RMI?!!!??? Presumably this is because we stopped setting the JMX port by default. 
This is good because setting the JMX port breaks the quickstart which requires running multiple nodes on a single machine. The root cause imo is just using RMI here instead of our regular RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
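As a hedged illustration of what the tool is doing under the hood: ShutdownBroker reaches the broker over JMX, which in the default JDK implementation rides on RMI. The sketch below (host and port are placeholder values, and `JmxUrlDemo` is a hypothetical class, not part of Kafka) shows the kind of RMI-based JMX service URL such a connection uses:

```java
import javax.management.remote.JMXServiceURL;

public class JmxUrlDemo {
    // Build the RMI-backed JMX service URL a tool like ShutdownBroker would
    // use to reach a broker's MBean server.
    static JMXServiceURL brokerJmxUrl(String host, int port) throws Exception {
        return new JMXServiceURL(
            String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
    }

    public static void main(String[] args) throws Exception {
        // Connecting via JMXConnectorFactory.connect(url) fails with the
        // ConnectException above when the broker was started without a JMX
        // port exposed (e.g. no JMX_PORT environment variable set).
        System.out.println(brokerJmxUrl("localhost", 9999));
    }
}
```

This is why the stack trace above is all RMI internals: the JNDI lookup of the `jmxrmi` stub in the RMI registry is the first network call, and it is the one that gets connection-refused.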
Re: Review Request 21937: Patch for KAFKA-1316
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21937/#review44663 --- Thanks for the patch. Could we write a short description in the comments of how the refactored producer works? For example, (1) what NetworkClient manages; (2) how metadata is refreshed, etc.? clients/src/main/java/org/apache/kafka/clients/ClientRequest.java https://reviews.apache.org/r/21937/#comment79112 Apache header. clients/src/main/java/org/apache/kafka/common/protocol/types/Schema.java https://reviews.apache.org/r/21937/#comment79470 Let's use the same license header as in the HEADER file. clients/src/main/java/org/apache/kafka/clients/NetworkClient.java https://reviews.apache.org/r/21937/#comment79489 Couldn't find where this is called in the producer or sender. - Jun Rao On June 3, 2014, 9:33 p.m., Jay Kreps wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21937/ --- (Updated June 3, 2014, 9:33 p.m.) Review request for kafka. Bugs: KAFKA-1316 https://issues.apache.org/jira/browse/KAFKA-1316 Repository: kafka Description --- KAFKA-1316 Refactor a reusable NetworkClient interface out of Sender. 
Diffs - clients/src/main/java/org/apache/kafka/clients/ClientRequest.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/ClientResponse.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/ClusterConnectionStates.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/ConnectionState.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/KafkaClient.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/NetworkClient.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/NodeConnectionState.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java d15562a968d9e4b08f26b8d30986881adfe29e31 clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 9b1f5653548ba90defdae43940a5554066770b0a clients/src/main/java/org/apache/kafka/common/network/Selector.java 3e358985ed72a894a71d683acc7460695d6f2056 clients/src/main/java/org/apache/kafka/common/protocol/types/Schema.java 68b8827f3bdd64580e1b443fce5b8c63152dd94a clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java 428968cd38a7b12991f87868bf759926ff7e594e clients/src/main/java/org/apache/kafka/common/requests/ProduceRequest.java PRE-CREATION clients/src/main/java/org/apache/kafka/common/requests/ProduceResponse.java 6fa4a58f5f9792776a647e8f682d7faadc0d1556 clients/src/test/java/org/apache/kafka/clients/MockClient.java PRE-CREATION clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java PRE-CREATION clients/src/test/java/org/apache/kafka/clients/producer/RecordAccumulatorTest.java c4072ae90fb58101a67f83054fbe0b8349e71c2e clients/src/test/java/org/apache/kafka/clients/producer/SenderTest.java 3ef692ca3e9fb83868a9da6f30c0705bb3d0aed2 clients/src/test/java/org/apache/kafka/common/utils/MockTime.java cda8e644587aad9a8c9c96c222edc1ba27de1fb0 
core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala cd4ca2fa77763b090c6ad4ba4a5d46a6a8b76698 Diff: https://reviews.apache.org/r/21937/diff/ Testing --- Thanks, Jay Kreps
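Jun's question above (what NetworkClient manages, how metadata is refreshed) can be made concrete with a toy sketch. The class below is hypothetical, not the actual InFlightRequests from the diff, but it shows the kind of per-node bookkeeping such a client layer keeps: requests are queued per broker when sent and completed in send order as responses arrive.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class InFlightDemo {
    // One deque of outstanding requests per broker node id.
    static final Map<Integer, Deque<String>> inFlight = new HashMap<>();

    // Record a request as in flight to the given node.
    static void send(int node, String request) {
        inFlight.computeIfAbsent(node, n -> new ArrayDeque<>()).addFirst(request);
    }

    // A response completes the oldest outstanding request for that node,
    // since a broker answers requests on a connection in order.
    static String completeOldest(int node) {
        return inFlight.get(node).pollLast();
    }

    public static void main(String[] args) {
        send(0, "metadata");
        send(0, "produce");
        System.out.println(completeOldest(0)); // -> metadata
    }
}
```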
[jira] [Commented] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020007#comment-14020007 ] Sriharsha Chintalapani commented on KAFKA-1298: --- [~junrao] This patch set controlled.shutdown.enable to true by default, so it kicks off a controlled shutdown after each test in AddPartitionsTest, which causes the delay in execution. With controlled shutdown disabled, the tests pass in ~22 secs. I can send a patch disabling controlled shutdown for the AddPartitionsTest servers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020016#comment-14020016 ] Neha Narkhede commented on KAFKA-1298: -- [~sriharsha] The unit tests use a utility in TestUtils to create the server config. In that utility, we can specifically turn off controlled shutdown, but let's leave it on by default in KafkaConfig. -- This message was sent by Atlassian JIRA (v6.2#6252)
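Neha's suggestion above (production default stays on, the test utility overrides it) can be sketched. The real TestUtils is Scala; the class below is a hypothetical Java Properties sketch of the same layering, using the actual broker property name `controlled.shutdown.enable`:

```java
import java.util.Properties;

public class TestConfigDemo {
    // Production-facing defaults: controlled shutdown stays enabled.
    static Properties brokerDefaults() {
        Properties p = new Properties();
        p.setProperty("controlled.shutdown.enable", "true");
        return p;
    }

    // Test-only config: start from the defaults, then override the one
    // property that makes per-test broker shutdown slow.
    static Properties testBrokerConfig() {
        Properties p = brokerDefaults();
        p.setProperty("controlled.shutdown.enable", "false");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(testBrokerConfig().getProperty("controlled.shutdown.enable"));
    }
}
```

The point of the layering is that the default visible to users is unchanged; only test servers opt out.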
[jira] [Closed] (KAFKA-1443) Add delete topic to topic commands and update DeleteTopicCommand
[ https://issues.apache.org/jira/browse/KAFKA-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-1443. Add delete topic to topic commands and update DeleteTopicCommand Key: KAFKA-1443 URL: https://issues.apache.org/jira/browse/KAFKA-1443 Project: Kafka Issue Type: New Feature Reporter: Timothy Chen Assignee: Timothy Chen Attachments: KAFKA-1443.patch, KAFKA-1443.patch, KAFKA-1443_2014-06-04_20:51:44.patch Add delete topic option to current topic commands -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22063: Patch for KAFKA-1472
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22063/#review44913 --- Ship it! Ship It! - Guozhang Wang On June 6, 2014, 5:42 a.m., Dong Lin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22063/ --- (Updated June 6, 2014, 5:42 a.m.) Review request for kafka. Bugs: KAFKA-1472 https://issues.apache.org/jira/browse/KAFKA-1472 Repository: kafka Description --- global compression rate Diffs - clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 9b1f5653548ba90defdae43940a5554066770b0a clients/src/main/java/org/apache/kafka/common/record/Compressor.java 0fa6dd2d6aad3dbe33c9c05406931caae4d8ecf5 clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java 428968cd38a7b12991f87868bf759926ff7e594e Diff: https://reviews.apache.org/r/22063/diff/ Testing --- Thanks, Dong Lin
Re: Review Request 22131: Patch for KAFKA-1477
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22131/#review44914 --- Ship it! Looks like the changes we made in this commit https://github.com/relango/kafka/commit/0ec255e94973df995c43818bb09d1246440aded9 are not included in the patch. We made those changes to fix a BadVersion error thrown by ZooKeeper. Hopefully they don't happen anymore with the latest code in trunk; if the error comes back we can create another patch, since it is not related to security. So it's fine to leave them out. - Rajasekar Elango On June 3, 2014, 10:53 a.m., Ivan Lyutov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22131/ --- (Updated June 3, 2014, 10:53 a.m.) Review request for kafka. Bugs: KAFKA-1477 https://issues.apache.org/jira/browse/KAFKA-1477 Repository: kafka Description --- Updated according to the requested changes: refactoring, minor edits. Reverted gradle version. Added SSL for Kafka. Minor fixes, cleanup. Refactoring. Fixed test compilation error. 
Diffs - config/client.keystore PRE-CREATION config/client.public-key PRE-CREATION config/client.security.properties PRE-CREATION config/consumer.properties 7343cbc28cf8b8de3f096d09c2be955bea73164f config/producer.properties 39d65d7c6c21f4fccd7af89be6ca12a088d5dd98 config/server.keystore PRE-CREATION config/server.properties c9e923aed8551e0797b1ea6f69628b277faf8f48 config/server.public-key PRE-CREATION config/server.security.properties PRE-CREATION core/src/main/scala/kafka/api/FetchRequest.scala a8b73acd1a813284744359e8434cb52d22063c99 core/src/main/scala/kafka/client/ClientUtils.scala ba5fbdcd9e60f953575e529325caf4c41e22f22d core/src/main/scala/kafka/cluster/Broker.scala 9407ed21fbbd57edeecd888edc32bea6a05d95b3 core/src/main/scala/kafka/common/UnknownKeyStoreException.scala PRE-CREATION core/src/main/scala/kafka/consumer/ConsoleConsumer.scala 1a16c691683dda0c53f316e3c4797ea38e776574 core/src/main/scala/kafka/consumer/ConsumerConfig.scala 1cf2f62ba02e4aa66bfa7575865e5d57baf82212 core/src/main/scala/kafka/consumer/ConsumerFetcherManager.scala b9e2bea7b442a19bcebd1b350d39541a8c9dd068 core/src/main/scala/kafka/consumer/SimpleConsumer.scala 0e64632210385ef63c2ad3445b55ac4f37a63df2 core/src/main/scala/kafka/controller/ControllerChannelManager.scala 8763968fbff697e4c5c98ab1274627c192a4d26a core/src/main/scala/kafka/network/BlockingChannel.scala eb7bb14d94cb3648c06d4de36a3b34aacbde4556 core/src/main/scala/kafka/network/SocketServer.scala 4976d9c3a66bc965f5870a0736e21c7b32650bab core/src/main/scala/kafka/network/security/AuthConfig.scala PRE-CREATION core/src/main/scala/kafka/network/security/KeyStores.scala PRE-CREATION core/src/main/scala/kafka/network/security/SSLSocketChannel.scala PRE-CREATION core/src/main/scala/kafka/network/security/SecureAuth.scala PRE-CREATION core/src/main/scala/kafka/network/security/store/JKSInitializer.scala PRE-CREATION core/src/main/scala/kafka/producer/ConsoleProducer.scala a2af988d99a94a20291d6a2dc9bec73197f1b756 
core/src/main/scala/kafka/producer/ProducerConfig.scala 3cdf23dce3407f1770b9c6543e3a8ae8ab3ff255 core/src/main/scala/kafka/producer/ProducerPool.scala 43df70bb461dd3e385e6b20396adef3c4016a3fc core/src/main/scala/kafka/producer/SyncProducer.scala 489f0077512d9a69be81649c490274964290fa40 core/src/main/scala/kafka/producer/SyncProducerConfig.scala 69b2d0c11bb1412ce76d566f285333c806be301a core/src/main/scala/kafka/server/AbstractFetcherThread.scala 3b15254f32252cf824d7a292889ac7662d73ada1 core/src/main/scala/kafka/server/KafkaConfig.scala ef75b67b67676ae5b8931902cbc8c0c2cc72c0d3 core/src/main/scala/kafka/server/KafkaHealthcheck.scala 4acdd70fe9c1ee78d6510741006c2ece65450671 core/src/main/scala/kafka/server/KafkaServer.scala c22e51e0412843ec993721ad3230824c0aadd2ba core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala 19df757d75fdbb3ff0b434b6cb10338ff5cc32da core/src/main/scala/kafka/tools/GetOffsetShell.scala fba652e3716a67b04431fc46790ad255201b639f core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala 91f072816418040a396a0cad26bc889f539dadd6 core/src/main/scala/kafka/tools/SimpleConsumerShell.scala 747e07280cce72d621acbc771337b909a9b2487e core/src/main/scala/kafka/utils/ZkUtils.scala fcbe269b6057b45793ea95f357890d5d6922e8d4 core/src/test/scala/unit/kafka/admin/AddPartitionsTest.scala fcd5eee09fc1831e7fac4c3f1151e9708dc6f5f1 core/src/test/scala/unit/kafka/integration/TopicMetadataTest.scala 35dc071b1056e775326981573c9618d8046e601d
[jira] [Updated] (KAFKA-1482) Transient test failures for kafka.admin.DeleteTopicTest
[ https://issues.apache.org/jira/browse/KAFKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1482: - Labels: newbie (was: ) Transient test failures for kafka.admin.DeleteTopicTest --- Key: KAFKA-1482 URL: https://issues.apache.org/jira/browse/KAFKA-1482 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Jun Rao Labels: newbie Fix For: 0.8.2 A couple of test cases have timing related transient test failures: kafka.admin.DeleteTopicTest testPartitionReassignmentDuringDeleteTopic FAILED junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test path not deleted even after a replica is restarted at junit.framework.Assert.fail(Assert.java:47) at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:578) at kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:333) at kafka.admin.DeleteTopicTest.testPartitionReassignmentDuringDeleteTopic(DeleteTopicTest.scala:197) kafka.admin.DeleteTopicTest testDeleteTopicDuringAddPartition FAILED junit.framework.AssertionFailedError: Replica logs not deleted after delete topic is complete at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:338) at kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:216) kafka.admin.DeleteTopicTest testRequestHandlingDuringDeleteTopic FAILED org.scalatest.junit.JUnitTestFailedError: fails with exception at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:102) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142) at org.scalatest.Assertions$class.fail(Assertions.scala:664) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142) at kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:123) Caused by: 
org.scalatest.junit.JUnitTestFailedError: Test should fail because the topic is being deleted at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142) at org.scalatest.Assertions$class.fail(Assertions.scala:644) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142) at kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:120) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1483) Split Brain about Leader Partitions
[ https://issues.apache.org/jira/browse/KAFKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1483: - Labels: newbie++ (was: ) Split Brain about Leader Partitions --- Key: KAFKA-1483 URL: https://issues.apache.org/jira/browse/KAFKA-1483 Project: Kafka Issue Type: Improvement Reporter: Guozhang Wang Assignee: Jun Rao Labels: newbie++ Fix For: 0.9.0 Today in the server there are two places storing the leader partition info: 1) leaderPartitions list in the ReplicaManager. 2) leaderBrokerIdOpt in the Partition. 1) is used as the ground truth to decide if the server is the current leader for serving requests; 2) is used as the ground truth for reporting leader counts metrics, etc and for the background Shrinking-ISR thread to decide which partition to check. There is a risk that these two ground truth caches are not consistent, and we'd better only make one of them as the ground truth. -- This message was sent by Atlassian JIRA (v6.2#6252)
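A toy illustration of the risk described in KAFKA-1483 (the classes below are hypothetical stand-ins, not Kafka's ReplicaManager or Partition): when the same fact is written into two caches by separate code paths, one missed update makes the two "ground truths" disagree.

```java
import java.util.HashSet;
import java.util.Set;

public class LeaderCacheDemo {
    // Two caches of the same fact, updated by separate code paths, standing in
    // for ReplicaManager's leaderPartitions list and Partition's leaderBrokerIdOpt.
    static final Set<String> servingCache = new HashSet<>(); // used to serve requests
    static final Set<String> metricsCache = new HashSet<>(); // used for metrics / ISR thread

    static void becomeLeader(String topicPartition) {
        servingCache.add(topicPartition);
        metricsCache.add(topicPartition);
    }

    static void resignLeaderBuggy(String topicPartition) {
        servingCache.remove(topicPartition);
        // Bug: metricsCache not updated -> leader-count metrics and the
        // background ISR-shrink thread now disagree with request serving.
    }

    public static void main(String[] args) {
        becomeLeader("topicA-0");
        resignLeaderBuggy("topicA-0");
        System.out.println(servingCache.equals(metricsCache)); // caches have diverged
    }
}
```

Collapsing to a single source of truth, as the ticket suggests, removes this class of bug by construction: there is nothing to drift.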
[jira] [Updated] (KAFKA-1483) Split Brain about Leader Partitions
[ https://issues.apache.org/jira/browse/KAFKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1483: - Issue Type: Improvement (was: Bug) -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 22309: Patch for KAFKA-1298
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22309/ --- Review request for kafka. Bugs: KAFKA-1298 https://issues.apache.org/jira/browse/KAFKA-1298 Repository: kafka Description --- KAFKA-1298. Controlled shutdown tool doesn't seem to work out of the box. Disabling controlled shutdown to speed up the tests. Diffs - core/src/test/scala/unit/kafka/utils/TestUtils.scala 4da0f2c245f75ff0dcab4ecf0af085ab9f8da1bb Diff: https://reviews.apache.org/r/22309/diff/ Testing --- Thanks, Sriharsha Chintalapani
[jira] [Updated] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-1298: -- Attachment: KAFKA-1298.patch Controlled shutdown tool doesn't seem to work out of the box Key: KAFKA-1298 URL: https://issues.apache.org/jira/browse/KAFKA-1298 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Labels: usability Attachments: KAFKA-1298.patch, KAFKA-1298.patch Download Kafka and try to use our shutdown tool. Got this: bin/kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper localhost:2181 --broker 0 [2014-03-06 16:58:23,636] ERROR Operation failed due to controller failure (kafka.admin.ShutdownBroker$) java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused] at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340) at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:249) at kafka.admin.ShutdownBroker$.kafka$admin$ShutdownBroker$$invokeShutdown(ShutdownBroker.scala:56) at kafka.admin.ShutdownBroker$.main(ShutdownBroker.scala:109) at kafka.admin.ShutdownBroker.main(ShutdownBroker.scala) Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused] at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101) at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185) at javax.naming.InitialContext.lookup(InitialContext.java:392) at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888) at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858) at 
javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257) ... 4 more Caused by: java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601) at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198) at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184) at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322) at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source) at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97) ... 9 more Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431) at java.net.Socket.connect(Socket.java:527) at java.net.Socket.connect(Socket.java:476) at java.net.Socket.init(Socket.java:373) at java.net.Socket.init(Socket.java:187) at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22) at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128) at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595) ... 14 more Oh god, RMI?!!!??? Presumably this is because we stopped setting the JMX port by default. This is good because setting the JMX port breaks the quickstart which requires running multiple nodes on a single machine. The root cause imo is just using RMI here instead of our regular RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
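The stack trace above shows the shutdown tool reaching the broker over JMX, which rides on RMI; that is why a missing JMX port surfaces as `java.rmi.ConnectException`. A minimal Java sketch of the kind of service URL such a tool builds (the host, port, and `brokerJmxUrl` helper here are illustrative assumptions, not Kafka's actual code; `jmxrmi` is the conventional JNDI name for a JMX agent):

```java
import javax.management.remote.JMXServiceURL;

public class JmxShutdownSketch {
    // Builds the JMX-over-RMI service URL a JMX-based admin tool would
    // connect to. Host and port are placeholders; the broker only listens
    // for JMX when started with a JMX port configured (e.g. JMX_PORT=9999).
    static JMXServiceURL brokerJmxUrl(String host, int port) throws Exception {
        return new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = brokerJmxUrl("localhost", 9999);
        // JMXConnectorFactory.connect(url) would be the next step; that is
        // the call that fails with java.rmi.ConnectException when nothing
        // is listening on the JMX port, as in the stack trace above.
        System.out.println(url);
    }
}
```

Constructing the URL does not touch the network; only the subsequent `JMXConnectorFactory.connect` call does, which is why the failure appears at connect time rather than at tool startup.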
[jira] [Commented] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020046#comment-14020046 ] Sriharsha Chintalapani commented on KAFKA-1298: --- Created reviewboard https://reviews.apache.org/r/22309/diff/ against branch origin/trunk Controlled shutdown tool doesn't seem to work out of the box Key: KAFKA-1298 URL: https://issues.apache.org/jira/browse/KAFKA-1298 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Labels: usability Attachments: KAFKA-1298.patch, KAFKA-1298.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1469) Util.abs function does not return correct absolute values for negative values
[ https://issues.apache.org/jira/browse/KAFKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020050#comment-14020050 ] Neha Narkhede commented on KAFKA-1469: -- +1. Thanks for the patch. Util.abs function does not return correct absolute values for negative values - Key: KAFKA-1469 URL: https://issues.apache.org/jira/browse/KAFKA-1469 Project: Kafka Issue Type: Bug Reporter: Joel Koshy Labels: newbie, patch Attachments: KAFKA-1469.patch Reported by Russell Melick. [edit: I don't think this affects correctness of the places that use the abs utility since we just need it to return a consistent positive value, but we should fix this nonetheless]
{code}
/**
 * Get the absolute value of the given number. If the number is Int.MinValue return 0.
 * This is different from java.lang.Math.abs or scala.math.abs in that they return Int.MinValue (!).
 */
def abs(n: Int) = n & 0x7fffffff
{code}
For negative integers, it does not return the absolute value. It does appear to do what the comment says for Int.MinValue though. For example,
{code}
scala> -1 & 0x7fffffff
res8: Int = 2147483647
scala> -2 & 0x7fffffff
res9: Int = 2147483646
scala> -2147483647 & 0x7fffffff
res11: Int = 1
scala> -2147483648 & 0x7fffffff
res12: Int = 0
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
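The masking behavior discussed above is easy to reproduce. Here is a small Java sketch (integer semantics on the JVM match Scala's) showing why masking off the sign bit is not a true absolute value, but does yield a consistent non-negative result for every input, including `Int.MinValue`, where `Math.abs` itself overflows:

```java
public class AbsDemo {
    // The utility under discussion: clears the sign bit rather than negating.
    static int maskedAbs(int n) {
        return n & 0x7fffffff;
    }

    public static void main(String[] args) {
        // Not a true absolute value for negative inputs...
        System.out.println(maskedAbs(-1)); // 2147483647, not 1
        System.out.println(maskedAbs(-2)); // 2147483646, not 2
        // ...but always non-negative, even for Integer.MIN_VALUE,
        // where Math.abs overflows and returns a negative number.
        System.out.println(maskedAbs(Integer.MIN_VALUE)); // 0
        System.out.println(Math.abs(Integer.MIN_VALUE));  // -2147483648 (!)
    }
}
```

This matches the edit note in the issue: callers such as partitioners only need a stable non-negative number, so correctness is unaffected even though the name is misleading.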
[jira] [Updated] (KAFKA-1469) Util.abs function does not return correct absolute values for negative values
[ https://issues.apache.org/jira/browse/KAFKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1469: - Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to trunk Util.abs function does not return correct absolute values for negative values - Key: KAFKA-1469 URL: https://issues.apache.org/jira/browse/KAFKA-1469 Project: Kafka Issue Type: Bug Reporter: Joel Koshy Labels: newbie, patch Attachments: KAFKA-1469.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020054#comment-14020054 ] Sriharsha Chintalapani commented on KAFKA-1298: --- [~nehanarkhede] Yes I added the config there. Thanks. [~junrao] Thanks for catching it. Controlled shutdown tool doesn't seem to work out of the box Key: KAFKA-1298 URL: https://issues.apache.org/jira/browse/KAFKA-1298 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Labels: usability Attachments: KAFKA-1298.patch, KAFKA-1298.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (KAFKA-1469) Util.abs function does not return correct absolute values for negative values
[ https://issues.apache.org/jira/browse/KAFKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-1469. Util.abs function does not return correct absolute values for negative values - Key: KAFKA-1469 URL: https://issues.apache.org/jira/browse/KAFKA-1469 Project: Kafka Issue Type: Bug Reporter: Joel Koshy Labels: newbie, patch Attachments: KAFKA-1469.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22131: Patch for KAFKA-1477
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22131/#review44927 --- config/server.properties https://reviews.apache.org/r/22131/#comment79513 Should we set the secure property to false to make Kafka run in non-secure mode by default, so it won't impact existing users? - Rajasekar Elango On June 3, 2014, 10:53 a.m., Ivan Lyutov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22131/ --- (Updated June 3, 2014, 10:53 a.m.) Review request for kafka. Bugs: KAFKA-1477 https://issues.apache.org/jira/browse/KAFKA-1477 Repository: kafka Description --- Updated according to requested changes: refactoring, minor edits. Reverted gradle version Added SSL for Kafka Minor fixes, cleanup Refactoring Fixed tests compilation error. Diffs - config/client.keystore PRE-CREATION config/client.public-key PRE-CREATION config/client.security.properties PRE-CREATION config/consumer.properties 7343cbc28cf8b8de3f096d09c2be955bea73164f config/producer.properties 39d65d7c6c21f4fccd7af89be6ca12a088d5dd98 config/server.keystore PRE-CREATION config/server.properties c9e923aed8551e0797b1ea6f69628b277faf8f48 config/server.public-key PRE-CREATION config/server.security.properties PRE-CREATION core/src/main/scala/kafka/api/FetchRequest.scala a8b73acd1a813284744359e8434cb52d22063c99 core/src/main/scala/kafka/client/ClientUtils.scala ba5fbdcd9e60f953575e529325caf4c41e22f22d core/src/main/scala/kafka/cluster/Broker.scala 9407ed21fbbd57edeecd888edc32bea6a05d95b3 core/src/main/scala/kafka/common/UnknownKeyStoreException.scala PRE-CREATION core/src/main/scala/kafka/consumer/ConsoleConsumer.scala 1a16c691683dda0c53f316e3c4797ea38e776574 core/src/main/scala/kafka/consumer/ConsumerConfig.scala 1cf2f62ba02e4aa66bfa7575865e5d57baf82212 core/src/main/scala/kafka/consumer/ConsumerFetcherManager.scala b9e2bea7b442a19bcebd1b350d39541a8c9dd068 core/src/main/scala/kafka/consumer/SimpleConsumer.scala 
0e64632210385ef63c2ad3445b55ac4f37a63df2 core/src/main/scala/kafka/controller/ControllerChannelManager.scala 8763968fbff697e4c5c98ab1274627c192a4d26a core/src/main/scala/kafka/network/BlockingChannel.scala eb7bb14d94cb3648c06d4de36a3b34aacbde4556 core/src/main/scala/kafka/network/SocketServer.scala 4976d9c3a66bc965f5870a0736e21c7b32650bab core/src/main/scala/kafka/network/security/AuthConfig.scala PRE-CREATION core/src/main/scala/kafka/network/security/KeyStores.scala PRE-CREATION core/src/main/scala/kafka/network/security/SSLSocketChannel.scala PRE-CREATION core/src/main/scala/kafka/network/security/SecureAuth.scala PRE-CREATION core/src/main/scala/kafka/network/security/store/JKSInitializer.scala PRE-CREATION core/src/main/scala/kafka/producer/ConsoleProducer.scala a2af988d99a94a20291d6a2dc9bec73197f1b756 core/src/main/scala/kafka/producer/ProducerConfig.scala 3cdf23dce3407f1770b9c6543e3a8ae8ab3ff255 core/src/main/scala/kafka/producer/ProducerPool.scala 43df70bb461dd3e385e6b20396adef3c4016a3fc core/src/main/scala/kafka/producer/SyncProducer.scala 489f0077512d9a69be81649c490274964290fa40 core/src/main/scala/kafka/producer/SyncProducerConfig.scala 69b2d0c11bb1412ce76d566f285333c806be301a core/src/main/scala/kafka/server/AbstractFetcherThread.scala 3b15254f32252cf824d7a292889ac7662d73ada1 core/src/main/scala/kafka/server/KafkaConfig.scala ef75b67b67676ae5b8931902cbc8c0c2cc72c0d3 core/src/main/scala/kafka/server/KafkaHealthcheck.scala 4acdd70fe9c1ee78d6510741006c2ece65450671 core/src/main/scala/kafka/server/KafkaServer.scala c22e51e0412843ec993721ad3230824c0aadd2ba core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala 19df757d75fdbb3ff0b434b6cb10338ff5cc32da core/src/main/scala/kafka/tools/GetOffsetShell.scala fba652e3716a67b04431fc46790ad255201b639f core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala 91f072816418040a396a0cad26bc889f539dadd6 core/src/main/scala/kafka/tools/SimpleConsumerShell.scala 747e07280cce72d621acbc771337b909a9b2487e 
core/src/main/scala/kafka/utils/ZkUtils.scala fcbe269b6057b45793ea95f357890d5d6922e8d4 core/src/test/scala/unit/kafka/admin/AddPartitionsTest.scala fcd5eee09fc1831e7fac4c3f1151e9708dc6f5f1 core/src/test/scala/unit/kafka/integration/TopicMetadataTest.scala 35dc071b1056e775326981573c9618d8046e601d core/src/test/scala/unit/kafka/network/SocketServerTest.scala 62fb02cf02d3876b9804d756c4bf8514554cc836 core/src/test/scala/unit/kafka/utils/TestUtils.scala 4da0f2c245f75ff0dcab4ecf0af085ab9f8da1bb
Re: Review Request 21937: Patch for KAFKA-1316
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21937/#review44921 --- clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java https://reviews.apache.org/r/21937/#comment79509 Rename ByteBufferSend.complete() to completed()? clients/src/main/java/org/apache/kafka/clients/NetworkClient.java https://reviews.apache.org/r/21937/#comment79507 Maybe rename to getReady()? clients/src/main/java/org/apache/kafka/clients/NetworkClient.java https://reviews.apache.org/r/21937/#comment79508 numInFlightRequests()? clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java https://reviews.apache.org/r/21937/#comment79516 I renamed now to nowMs since we have both millisecond and nanosecond values used in the producer metrics, and I felt it is better to indicate which values are in milliseconds and which are in nanoseconds. - Guozhang Wang On June 3, 2014, 9:33 p.m., Jay Kreps wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21937/ --- (Updated June 3, 2014, 9:33 p.m.) Review request for kafka. Bugs: KAFKA-1316 https://issues.apache.org/jira/browse/KAFKA-1316 Repository: kafka Description --- KAFKA-1316 Refactor a reusable NetworkClient interface out of Sender. 
Diffs - clients/src/main/java/org/apache/kafka/clients/ClientRequest.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/ClientResponse.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/ClusterConnectionStates.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/ConnectionState.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/KafkaClient.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/NetworkClient.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/NodeConnectionState.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java d15562a968d9e4b08f26b8d30986881adfe29e31 clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 9b1f5653548ba90defdae43940a5554066770b0a clients/src/main/java/org/apache/kafka/common/network/Selector.java 3e358985ed72a894a71d683acc7460695d6f2056 clients/src/main/java/org/apache/kafka/common/protocol/types/Schema.java 68b8827f3bdd64580e1b443fce5b8c63152dd94a clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java 428968cd38a7b12991f87868bf759926ff7e594e clients/src/main/java/org/apache/kafka/common/requests/ProduceRequest.java PRE-CREATION clients/src/main/java/org/apache/kafka/common/requests/ProduceResponse.java 6fa4a58f5f9792776a647e8f682d7faadc0d1556 clients/src/test/java/org/apache/kafka/clients/MockClient.java PRE-CREATION clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java PRE-CREATION clients/src/test/java/org/apache/kafka/clients/producer/RecordAccumulatorTest.java c4072ae90fb58101a67f83054fbe0b8349e71c2e clients/src/test/java/org/apache/kafka/clients/producer/SenderTest.java 3ef692ca3e9fb83868a9da6f30c0705bb3d0aed2 clients/src/test/java/org/apache/kafka/common/utils/MockTime.java cda8e644587aad9a8c9c96c222edc1ba27de1fb0 
core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala cd4ca2fa77763b090c6ad4ba4a5d46a6a8b76698 Diff: https://reviews.apache.org/r/21937/diff/ Testing --- Thanks, Jay Kreps
Re: Review Request 21937: Patch for KAFKA-1316
On June 6, 2014, 3:18 a.m., Neha Narkhede wrote: clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java, line 58 https://reviews.apache.org/r/21937/diff/3/?file=603197#file603197line58 Now we will end up with potentially two Senders - one for the producer's state machine and another for the consumer's state machine. Can we rename this one to something like ProducerSender? I think this is fine: this Sender is under kafka.clients.producer.internal, the consumer Sender will be under consumer.internal, and there will not likely be classes that need to import both. - Guozhang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21937/#review44850 --- On June 3, 2014, 9:33 p.m., Jay Kreps wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21937/ --- (Updated June 3, 2014, 9:33 p.m.) Review request for kafka. Bugs: KAFKA-1316 https://issues.apache.org/jira/browse/KAFKA-1316 Repository: kafka Description --- KAFKA-1316 Refactor a reusable NetworkClient interface out of Sender. 
Diff: https://reviews.apache.org/r/21937/diff/ Testing --- Thanks, Jay Kreps
[jira] [Commented] (KAFKA-824) java.lang.NullPointerException in commitOffsets
[ https://issues.apache.org/jira/browse/KAFKA-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020084#comment-14020084 ] Andrew Olson commented on KAFKA-824: I have reported this issue to the zkclient project: https://github.com/sgroschupf/zkclient/issues/25 java.lang.NullPointerException in commitOffsets Key: KAFKA-824 URL: https://issues.apache.org/jira/browse/KAFKA-824 Project: Kafka Issue Type: Bug Components: consumer Affects Versions: 0.7.2 Reporter: Yonghui Zhao Assignee: Neha Narkhede Attachments: ZkClient.0.3.txt, ZkClient.0.4.txt, screenshot-1.jpg Neha Narkhede Yes, I have. Unfortunately, I never quite got around to fixing it. My guess is that it is caused due to a race condition between the rebalance thread and the offset commit thread when a rebalance is triggered or the client is being shutdown. Do you mind filing a bug? 2013/03/25 12:08:32.020 WARN [ZookeeperConsumerConnector] [] 0_lu-ml-test10.bj-1364184411339-7c88f710 exception during commitOffsets java.lang.NullPointerException at org.I0Itec.zkclient.ZkConnection.writeData(ZkConnection.java:111) at org.I0Itec.zkclient.ZkClient$10.call(ZkClient.java:813) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:809) at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:777) at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103) at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251) at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248) at scala.collection.Iterator$class.foreach(Iterator.scala:631) at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:549) at scala.collection.IterableLike$class.foreach(IterableLike.scala:79) at 
scala.collection.JavaConversions$JCollectionWrapper.foreach(JavaConversions.scala:570) at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:248) at kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:246) at scala.collection.Iterator$class.foreach(Iterator.scala:631) at kafka.utils.Pool$$anon$1.foreach(Pool.scala:53) at scala.collection.IterableLike$class.foreach(IterableLike.scala:79) at kafka.utils.Pool.foreach(Pool.scala:24) at kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246) at kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232) at kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126) at kafka.utils.Utils$$anon$2.run(Utils.scala:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) -- This message was sent by Atlassian JIRA (v6.2#6252)
Build failed in Jenkins: Kafka-trunk #200
See https://builds.apache.org/job/Kafka-trunk/200/changes Changes: [neha.narkhede] KAFKA-1443 Add delete topic option to topic commands; reviewed by Neha Narkhede [neha.narkhede] KAFKA-1469 Util.abs function does not return correct absolute values for negative values; patched by Guozhang Wang and Neha Narkhede -- [...truncated 2168 lines...] java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:144) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:125) at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:32) at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33) at kafka.consumer.ZookeeperConsumerConnectorTest.kafka$integration$KafkaServerTestHarness$$super$setUp(ZookeeperConsumerConnectorTest.scala:35) at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:35) at kafka.consumer.ZookeeperConsumerConnectorTest.setUp(ZookeeperConsumerConnectorTest.scala:57) kafka.consumer.ConsumerIteratorTest testConsumerIteratorDeduplicationDeepIterator FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:144) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:125) at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:32) at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33) at 
kafka.consumer.ConsumerIteratorTest.kafka$integration$KafkaServerTestHarness$$super$setUp(ConsumerIteratorTest.scala:36) at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:35) at kafka.consumer.ConsumerIteratorTest.setUp(ConsumerIteratorTest.scala:61) kafka.consumer.ConsumerIteratorTest testConsumerIteratorDecodingFailure FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:144) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:125) at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:32) at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33) at kafka.consumer.ConsumerIteratorTest.kafka$integration$KafkaServerTestHarness$$super$setUp(ConsumerIteratorTest.scala:36) at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:35) at kafka.consumer.ConsumerIteratorTest.setUp(ConsumerIteratorTest.scala:61) kafka.producer.AsyncProducerTest testInvalidPartition PASSED kafka.producer.AsyncProducerTest testProducerQueueSize PASSED kafka.producer.AsyncProducerTest testProduceAfterClosed PASSED kafka.producer.AsyncProducerTest testBatchSize PASSED kafka.producer.AsyncProducerTest testQueueTimeExpired PASSED kafka.producer.AsyncProducerTest testPartitionAndCollateEvents PASSED kafka.producer.AsyncProducerTest testSerializeEvents PASSED kafka.producer.AsyncProducerTest testNoBroker PASSED kafka.producer.AsyncProducerTest testIncompatibleEncoder PASSED kafka.producer.AsyncProducerTest testRandomPartitioner PASSED kafka.producer.AsyncProducerTest testFailedSendRetryLogic PASSED kafka.producer.AsyncProducerTest testJavaProducer PASSED 
kafka.producer.AsyncProducerTest testInvalidConfiguration PASSED kafka.producer.ProducerTest testUpdateBrokerPartitionInfo FAILED java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:144) at org.apache.zookeeper.server.NIOServerCnxn$Factory.init(NIOServerCnxn.java:125) at
[jira] [Commented] (KAFKA-1475) Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know
[ https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020139#comment-14020139 ] Hang Qi commented on KAFKA-1475: Hi Guozhang, Thanks for your reply. I checked the code between Kafka 0.8.0 and 0.8.1, and this piece of code is the same in both. The function createEphemeralPathExpectConflictHandleZKBug first calls createEphemeralPathExpectConflict and uses the checker to handle ZkNodeExistsException. However, createEphemeralPathExpectConflict itself has some logic to handle ZkNodeExistsException:

    /**
     * Create an ephemeral node with the given path and data.
     * Throw NodeExistsException if the node already exists.
     */
    def createEphemeralPathExpectConflict(client: ZkClient, path: String, data: String): Unit = {
      try {
        createEphemeralPath(client, path, data)
      } catch {
        case e: ZkNodeExistsException => {
          // this can happen when there is connection loss; make sure the data is what we intend to write
          var storedData: String = null
          try {
            storedData = readData(client, path)._1
          } catch {
            case e1: ZkNoNodeException => // the node disappeared; treat as if node existed and let caller handle this
            case e2: Throwable => throw e2
          }
          if (storedData == null || storedData != data) {
            info("conflict in " + path + " data: " + data + " stored data: " + storedData)
            throw e
          } else {
            // otherwise, the creation succeeded, return normally
            info(path + " exists with value " + data + " during connection loss; this is ok")
          }
        }
        case e2: Throwable => throw e2
      }
    }

We observed the log line info(path + " exists with value " + data + " during connection loss; this is ok"), which comes from createEphemeralPathExpectConflict.
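The conflict-handling path above boils down to: try to create; on a pre-existing node, read back and accept only matching data. A minimal sketch of that decision with an in-memory map standing in for the ZooKeeper znode store (hypothetical stand-in, not Kafka code):

```java
import java.util.HashMap;
import java.util.Map;

public class EphemeralConflictSketch {
    // In-memory stand-in for the ZooKeeper znode store (hypothetical).
    static Map<String, String> store = new HashMap<>();

    // Mirrors createEphemeralPathExpectConflict: create, and if the node
    // already exists, accept it only when the stored data matches ours.
    static boolean createExpectConflict(String path, String data) {
        if (!store.containsKey(path)) {
            store.put(path, data);  // creation succeeded
            return true;
        }
        String stored = store.get(path);
        if (stored != null && stored.equals(data)) {
            // same data: treat as our own create that succeeded during connection loss
            return true;
        }
        // genuine conflict: caller sees the NodeExists failure
        return false;
    }

    public static void main(String[] args) {
        System.out.println(createExpectConflict("/consumers/a/ids/c1", "sub-v1")); // true: fresh create
        System.out.println(createExpectConflict("/consumers/a/ids/c1", "sub-v1")); // true: same data, ok
        System.out.println(createExpectConflict("/consumers/a/ids/c1", "sub-v2")); // false: conflict
    }
}
```

The trap discussed in this thread is the middle case: the data can match even when the node belongs to a different (lost) session, so "same data" is not proof of ownership.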
Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know --- Key: KAFKA-1475 URL: https://issues.apache.org/jira/browse/KAFKA-1475 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 0.8.0 Environment: linux, rhel 6.4 Reporter: Hang Qi Assignee: Neha Narkhede Labels: consumer Attachments: 5055aeee-zk.txt

We encountered an issue of consumers not consuming messages in production. (This consumer has its own consumer group and consumes just one topic of 3 partitions.) Based on the logs, we have the following findings:

1. The Zookeeper session expired; the Kafka high-level consumer detected this event, released the old broker partition ownership, and re-registered the consumer.
2. Upon creating the ephemeral path in Zookeeper, it found that the path still existed and tried to read the content of the node.
3. After reading back the content, it found the content was the same as what it was going to write, so it logged [ZkClient-EventThread-428-ZK/kafka] (kafka.utils.ZkUtils$) - /consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value { pattern:static, subscription:{ TOPIC: 1}, timestamp:1400846114845, version:1 } during connection loss; this is ok, and did nothing.
4. After that, it threw an exception indicating that the cause was org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/consumerA/ids/consumerA-1400815740329-5055aeee during rebalance.
5. After all retries failed, it gave up, and the LeaderFinderThread and FetcherThread were left stopped.

Step 3 looks very weird. Checking the code, there is a timestamp contained in the stored data, so this may be caused by a Zookeeper issue. What I am wondering is whether it is possible to let the application (Kafka client users) know that the underlying LeaderFinderThread and FetcherThread are stopped, e.g. by allowing the application to register a callback or by throwing an exception (by invalidating the KafkaStream iterator, for example). To me, it is not reasonable for the Kafka client to shut down everything, wait for the next rebalance, and leave the application waiting on iterator.hasNext() without knowing that something is wrong underneath. I've read the wiki about the Kafka 0.9 consumer rewrite, and there is a ConsumerRebalanceCallback interface, but I am not sure how long it will take to be ready, or how long it would take for us to migrate. Please help to look at this issue. Thanks very much! -- This message was sent by Atlassian JIRA (v6.2#6252)
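One shape the requested notification could take is a small listener that the client invokes at the point where the rebalance retries give up. This is purely a hypothetical sketch of the idea (the interface and method names are invented here; the 0.8 consumer provides no such API):

```java
// Hypothetical listener a client could register to learn that the consumer's
// background threads (leader finder / fetchers) have stopped. Invented names.
interface ConsumerFailureListener {
    void onFetcherStopped(String consumerId, Throwable cause);
}

public class FailureNotifySketch {
    private ConsumerFailureListener listener;

    void register(ConsumerFailureListener l) { this.listener = l; }

    // Would be called where the rebalance retries give up,
    // instead of stopping the threads silently.
    void giveUpRebalance(String consumerId, Throwable cause) {
        if (listener != null) listener.onFetcherStopped(consumerId, cause);
    }

    public static void main(String[] args) {
        FailureNotifySketch c = new FailureNotifySketch();
        c.register((id, cause) ->
            System.out.println("fetchers stopped for " + id + ": " + cause.getMessage()));
        c.giveUpRebalance("consumerA-1400815740329", new RuntimeException("rebalance failed"));
    }
}
```

The ConsumerRebalanceCallback mentioned for the 0.9 rewrite covers rebalance events; the reporter's request is the narrower case of being told when the consumer has given up entirely.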
[jira] [Commented] (KAFKA-1475) Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know
[ https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020148#comment-14020148 ] Guozhang Wang commented on KAFKA-1475: -- In this case, is the consumer using the same timestamp after the session timeout to re-register? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1475) Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know
[ https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020269#comment-14020269 ] Hang Qi commented on KAFKA-1475: Not exactly. Let me recap. In the original description, step 3:

3. After reading back the content, it found the content was the same as what it was going to write, so it logged [ZkClient-EventThread-428-ZK/kafka] (kafka.utils.ZkUtils$) - /consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value {pattern:static, subscription:{TOPIC: 1}, timestamp:1400846114845, version:1}

The timestamp is 1400846114845 = 11:55:14,845 UTC 2014, which is 07:55:14,845 EDT (our log uses EDT). Looking at my second comment and also the logs in the attachment:

1. 07:55:14 the zk client got a session expire and propagated the event to the watcher. The Kafka client then tried to recreate the ephemeral node '/kafka8Agg/consumers/myconsumergroup/ids/myconsumergroup_ooo0001-1400815740329-5055aeee'. That's the time when the Kafka client became aware that the zk session had expired and tried to re-create the ephemeral node. However, based on the following log:
3. 07:55:26 the zk sender did not hear from the server for the connectReq response, so it closed the connection and tried to reconnect.
4. 07:55:40 the zk sender established a connection with mbus0005 and got session 0x545f6dc6f510757.
5. 07:55:40 the zk sender got the response for the create-ephemeral-node request; the zk server responded node exists (response code -110).

The zk client successfully connected to the zk cluster at 07:55:40 and then found the ephemeral node had been created with the same content it was about to write. But the owner of this ephemeral node is 163808312244176699 = 0x245f6cac692073b, which is neither the session before the expire (0x345f6cac6ed071d) nor the afterward session (0x545f6dc6f510757). So I find this very weird; my guess is that between 07:55:14 and 07:55:26 the zk client somehow sent the create request to mbus0002, and mbus0002 processed it successfully, but the response was not read by the client; session 0x245f6cac692073b was created at that time. -- This message was sent by Atlassian JIRA (v6.2#6252)
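The epoch-millisecond conversion used in this analysis can be checked directly with java.time; 1400846114845 is the timestamp from the znode content (a quick verification sketch):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimestampCheck {
    public static void main(String[] args) {
        // timestamp field from the consumer registry znode
        Instant t = Instant.ofEpochMilli(1400846114845L);
        System.out.println(t);  // 2014-05-23T11:55:14.845Z
        // the reporter's logs are in US Eastern time (EDT in May)
        ZonedDateTime edt = t.atZone(ZoneId.of("America/New_York"));
        System.out.println(edt.toLocalTime());  // 07:55:14.845
    }
}
```

This confirms the 11:55:14,845 UTC / 07:55:14,845 EDT figures quoted in the comment.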
Re: Review Request 19731: Patch for KAFKA-1328
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19731/#review44964 --- If a consumer fails to establish a connection to zookeeper, how is that indicated? Also, just to confirm I understand the unsubscribe API correctly: if I have a large number of ephemeral consumers, will unsubscribe remove the consumer from zookeeper? Is that what the TODO: rebalance means? - Baran Nohutcuoglu On May 20, 2014, 11:34 p.m., Neha Narkhede wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19731/ --- (Updated May 20, 2014, 11:34 p.m.) Review request for kafka. Bugs: KAFKA-1328 https://issues.apache.org/jira/browse/KAFKA-1328 Repository: kafka Description --- Fixed inconsistent javadoc for position(), committed() and offsetsBeforeTime() Converted ellipsis to Collection in a bunch of places, removed TimeUnit from the poll() API 1. Improved documentation on the position() API 2. Changed signature of commit API to remove Future and include a sync flag Included Jun's review suggestions part 2, except change to the commit() API since it needs more thought Review comments from Jun and Guozhang Checked in ConsumerRecordMetadata Fixed the javadoc usage examples in KafkaConsumer to match the API changes Changed the signature of poll to return Map<String, ConsumerRecordMetadata> to organize the ConsumerRecords around topic and then optionally around partition. This will serve the group management as well as custom partition subscription use cases 1. Changed the signature of poll() to return Map<String, List<ConsumerRecord>> 2. Changed ConsumerRecord to throw an exception if an error is detected for the partition.
For example, if a single large message is larger than the total memory just for that partition, we don't want poll() to throw an exception since that will affect the processing of the remaining partitions as well Fixed MockConsumer to make subscribe(topics) and subscribe(partitions) mutually exclusive Changed the package to org.apache.kafka.clients.consumer from kafka.clients.consumer Changed the package to org.apache.kafka.clients.consumer from kafka.clients.consumer 1. Removed the commitAsync() APIs 2. Changed the commit() APIs to return a Future Fixed configs to match the producer side configs for metrics Renamed AUTO_COMMIT_ENABLE_CONFIG to ENABLE_AUTO_COMMIT_CONFIG Addressing review comments from Tim and Guozhang Rebasing after producer side config cleanup Added license headers Cleaned javadoc for ConsumerConfig Fixed minor indentation in ConsumerConfig Improve docs on ConsumerConfig 1. Added ClientUtils 2. Added basic constructor implementation for KafkaConsumer Improved MockConsumer Chris's feedback and also consumer rewind example code Added commit() and commitAsync() APIs to the consumer and updated docs and examples to reflect that 1. Added consumer usage examples to javadoc 2. Changed signature of APIs that accept or return offsets from list of offsets to map of offsets Improved example for using ConsumerRebalanceCallback Improved example for using ConsumerRebalanceCallback Included Jun's review comments and renamed positions to seek. Also included position() Changes to javadoc for positions() Changed the javadoc for ConsumerRebalanceCallback Changing unsubscribe to also take in var args for topic list Incorporated first round of feedback from Jay, Pradeep and Mattijs on the mailing list Updated configs Javadoc for consumer complete Completed docs for Consumer and ConsumerRebalanceCallback. Added MockConsumer Added the initial interfaces and related documentation for the consumer. 
More docs required to complete the public API Diffs - clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRebalanceCallback.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java PRE-CREATION clients/src/main/java/org/apache/kafka/clients/consumer/OffsetMetadata.java PRE-CREATION
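As a usage sketch of the poll() shape discussed in the description, a Map<String, List<ConsumerRecord>> keyed by topic, with a stub record type standing in for the class under review (the names here are illustrative, not the review's actual API):

```java
import java.util.List;
import java.util.Map;

public class PollShapeSketch {
    // Minimal stand-in for the ConsumerRecord type under review (hypothetical).
    record ConsumerRecord(String topic, int partition, long offset, String value) {}

    public static void main(String[] args) {
        // Shape under discussion: poll() returns records grouped by topic,
        // so a per-partition error need not abort processing of the rest.
        Map<String, List<ConsumerRecord>> polled = Map.of(
            "test_1", List.of(new ConsumerRecord("test_1", 0, 42L, "hello")));
        for (Map.Entry<String, List<ConsumerRecord>> e : polled.entrySet())
            for (ConsumerRecord r : e.getValue())
                System.out.println(e.getKey() + "-" + r.partition()
                        + " @" + r.offset() + ": " + r.value());
    }
}
```

Grouping by topic first (then optionally by partition) is what lets the error-on-access design described above localize failures to one partition's record list.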
[jira] [Commented] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020446#comment-14020446 ] Jun Rao commented on KAFKA-1298: Sriharsha, Thanks for the patch. Do we know why the controlled shutdown takes that long in the case with just one broker? I was thinking this should only add a few ms of overhead. So, instead of turning controlled shutdown off in the unit tests, perhaps we should just improve the performance of controlled shutdown? Looking at the code, it seems that if the replication factor is 1, the controller can just send the ack back immediately w/o having to send any requests like StopReplica. Controlled shutdown tool doesn't seem to work out of the box Key: KAFKA-1298 URL: https://issues.apache.org/jira/browse/KAFKA-1298 Project: Kafka Issue Type: Improvement Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Labels: usability Attachments: KAFKA-1298.patch, KAFKA-1298.patch Download Kafka and try to use our shutdown tool.
Got this: bin/kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper localhost:2181 --broker 0 [2014-03-06 16:58:23,636] ERROR Operation failed due to controller failure (kafka.admin.ShutdownBroker$) java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused] at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340) at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:249) at kafka.admin.ShutdownBroker$.kafka$admin$ShutdownBroker$$invokeShutdown(ShutdownBroker.scala:56) at kafka.admin.ShutdownBroker$.main(ShutdownBroker.scala:109) at kafka.admin.ShutdownBroker.main(ShutdownBroker.scala) Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused] at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101) at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185) at javax.naming.InitialContext.lookup(InitialContext.java:392) at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888) at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858) at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257) ... 
4 more Caused by: java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is: java.net.ConnectException: Connection refused at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601) at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198) at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184) at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322) at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source) at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97) ... 9 more Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431) at java.net.Socket.connect(Socket.java:527) at java.net.Socket.connect(Socket.java:476) at java.net.Socket.init(Socket.java:373) at java.net.Socket.init(Socket.java:187) at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22) at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128) at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595) ... 14 more Oh god, RMI?!!!??? Presumably this is because we stopped setting the JMX port by default. This is good because setting the JMX port breaks the quickstart which requires running multiple nodes on a single machine. The root cause imo is just using RMI here instead of our regular RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
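For context, the JNDI/RMI path the shutdown tool goes through looks roughly like the following; the host, port, and URL here are placeholders, and JMXConnectorFactory.connect is the call that fails as above when the broker exports no JMX port (a sketch, not the tool's code):

```java
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxShutdownSketch {
    public static void main(String[] args) throws Exception {
        // JNDI/RMI form of a JMX URL; localhost:9999 is a placeholder port.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        System.out.println(url.getProtocol());  // rmi
        // This connect is what throws "Failed to retrieve RMIServer stub"
        // when nothing is listening on the JMX port:
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            System.out.println("connected: " + jmxc.getConnectionId());
        } catch (java.io.IOException e) {
            System.out.println("connect failed: " + e.getMessage());
        }
    }
}
```

This is exactly the dependency the comment objects to: the tool only works if the broker was started with a JMX port exported, which the defaults no longer do.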
NoReplicaOnlineException with 0.8.1.1
Hi, I am running 0.8.1.1 with one Zookeeper and one broker. I created a topic 'test2' as below: /kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic test2 I noticed this exception in the state-change.log. Why is this occurring? Should I be running more brokers if I have more than one partition? *Exception:* kafka.common.NoReplicaOnlineException: No replica for partition [test2,2] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)] at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61) at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336) at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99) at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95) at scala.collection.Iterator$class.foreach(Iterator.scala:772) at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45) at scala.collection.mutable.HashMap.foreach(HashMap.scala:95) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742) at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:96) at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:68) at
kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:312) at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:162) at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:63) at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:49) at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47) at kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:47) at kafka.utils.Utils$.inLock(Utils.scala:538) at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:47) at kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:637) at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633) at kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:633) at kafka.utils.Utils$.inLock(Utils.scala:538) at kafka.controller.KafkaController.startup(KafkaController.scala:633) at kafka.server.KafkaServer.startup(KafkaServer.scala:96) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34) at kafka.Kafka$.main(Kafka.scala:46) at kafka.Kafka.main(Kafka.scala) Thanks, Prakash
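The election failure above reduces to: the leader must be an assigned replica that is also in the live broker set, and with live brokers Set() and assigned replicas List(0) that intersection is empty, hence NoReplicaOnlineException. A simplified sketch of that rule (not the controller's actual code):

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class LeaderSelectSketch {
    // Simplified: the leader is the first assigned replica that is currently live.
    static Optional<Integer> selectLeader(List<Integer> assigned, Set<Integer> live) {
        return assigned.stream().filter(live::contains).findFirst();
    }

    public static void main(String[] args) {
        // The situation in the log: broker 0 assigned, no brokers live yet
        // (the controller elected itself before broker 0 registered in ZK).
        System.out.println(selectLeader(List.of(0), Set.of()));   // empty -> NoReplicaOnlineException
        // Once broker 0 is registered, election succeeds.
        System.out.println(selectLeader(List.of(0), Set.of(0)));  // Optional[0]
    }
}
```

Since the exception fires during controller startup while the broker's own ZK registration is still in flight, it is typically transient on a single-broker setup rather than a sign that more brokers are required.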
[jira] [Commented] (KAFKA-1298) Controlled shutdown tool doesn't seem to work out of the box
[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020461#comment-14020461 ] Sriharsha Chintalapani commented on KAFKA-1298: --- [~junrao] Shouldn't we take the partition to the offline state and stop the replica? I will test this part again, but when I was initially working on this I tested returning immediately if the replication factor is 1, and that resulted in an error during the shutdown. Without this patch, if I enable controlled shutdown it takes the same amount of time, although in that case an exception happens and it returns with controlled shutdown success. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1438) Migrate kafka client tools
[ https://issues.apache.org/jira/browse/KAFKA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020462#comment-14020462 ] Jun Rao commented on KAFKA-1438: Thanks for the patch. It seems that after the patch, the message count reported for the producer is always 0 in system tests. _test_case_name : testcase_0001 _test_class_name : ReplicaBasicTest arg : bounce_broker : false arg : broker_type : leader arg : message_producing_free_time_sec : 15 arg : num_iteration : 1 arg : num_messages_to_produce_per_producer_call : 50 arg : num_partition : 1 arg : replica_factor : 3 arg : sleep_seconds_between_producer_calls : 1 validation_status : No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 15500 No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 15500 No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 15500 Unique messages from consumer on [test_1] : 15500 Unique messages from producer on [test_1] : 0 Validate for data matched on topic [test_1] across replicas : PASSED Validate for merged log segment checksum in cluster [source] : PASSED Migrate kafka client tools -- Key: KAFKA-1438 URL: https://issues.apache.org/jira/browse/KAFKA-1438 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Sriharsha Chintalapani Labels: newbie, tools, usability Fix For: 0.8.2 Attachments: KAFKA-1438.patch, KAFKA-1438_2014-05-27_11:45:29.patch, KAFKA-1438_2014-05-27_12:16:00.patch, KAFKA-1438_2014-05-27_17:08:59.patch, KAFKA-1438_2014-05-28_08:32:46.patch, KAFKA-1438_2014-05-28_08:36:28.patch, KAFKA-1438_2014-05-28_08:40:22.patch, KAFKA-1438_2014-05-30_11:36:01.patch, KAFKA-1438_2014-05-30_11:38:46.patch, KAFKA-1438_2014-05-30_11:42:32.patch Currently the console/perf client tools scatter across different packages, we'd better to: 1. Move Consumer/ProducerPerformance and SimpleConsumerPerformance to tools and remove the perf sub-project. 2. 
Move ConsoleConsumer from kafka.consumer to kafka.tools. 3. Move other consumer related tools from kafka.consumer to kafka.tools. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1475) Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know
[ https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020463#comment-14020463 ] Guozhang Wang commented on KAFKA-1475: -- I see. This seems to me a ZkClient issue, such that it did not give all the events in the event queue and back to the caller. We have some similar findings before (https://issues.apache.org/jira/browse/KAFKA-1387). Do you see this issue often? Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know --- Key: KAFKA-1475 URL: https://issues.apache.org/jira/browse/KAFKA-1475 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 0.8.0 Environment: linux, rhel 6.4 Reporter: Hang Qi Assignee: Neha Narkhede Labels: consumer Attachments: 5055aeee-zk.txt We encounter an issue of consumers not consuming messages in production. ( this consumer has its own consumer group, and just consumes one topic of 3 partitions.) Based on the logs, we have following findings: 1. Zookeeper session expires, kafka highlevel consumer detected this event, and released old broker parition ownership and re-register consumer. 2. Upon creating ephemeral path in Zookeeper, it found that the path still exists, and try to read the content of the node. 3. After read back the content, it founds the content is same as that it is going to write, so it logged as [ZkClient-EventThread-428-ZK/kafka] (kafka.utils.ZkUtils$) - /consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value { pattern:static, subscription:{ TOPIC: 1}, timestamp:1400846114845, version:1 } during connection loss; this is ok, and doing nothing. 4. After that, it throws exception indicated that the cause is org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/consumerA/ids/consumerA-1400815740329-5055aeee during rebalance. 5. After all retries failed, it gave up retry and left the LeaderFinderThread, FetcherThread stopped. 
Step 3 looks very weird; checking the code, the timestamp is contained in the stored data, so this may be caused by a Zookeeper issue. But what I am wondering is whether it is possible to let the application (Kafka client users) know that the underlying LeaderFinderThread and FetcherThread are stopped, for example by allowing the application to register a callback or by throwing an exception (by invalidating the KafkaStream iterator, for instance). For me, it is not reasonable for the Kafka client to shut down everything and wait for the next rebalance, letting the application wait on iterator.hasNext() without knowing that something is wrong underneath. I've read the wiki about the Kafka 0.9 consumer rewrite, and there is a ConsumerRebalanceCallback interface, but I am not sure how long it will take to be ready, and how long it will take for us to migrate. Please help to look at this issue. Thanks very much! -- This message was sent by Atlassian JIRA (v6.2#6252)
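The callback idea requested above (notify the application when a background fetcher thread dies, instead of blocking silently on iterator.hasNext()) can be sketched generically. This is a hypothetical illustration, not a Kafka API: the names ThreadWatchdog and DeathCallback are invented for the example.

```java
// Hypothetical sketch: a watchdog that invokes an application callback when a
// background worker thread (e.g. a LeaderFinder/Fetcher-style thread)
// terminates, whether normally or by an uncaught exception.
// None of these names are Kafka APIs; they are illustrative only.
class ThreadWatchdog {

    interface DeathCallback {
        // cause is null when the worker exited normally
        void onThreadDeath(String threadName, Throwable cause);
    }

    // Wrap the worker so that the callback always fires when the thread ends.
    static Thread watch(Runnable worker, String name, DeathCallback cb) {
        Thread t = new Thread(() -> {
            Throwable failure = null;
            try {
                worker.run();
            } catch (Throwable e) {
                failure = e;              // remember why the worker died
            } finally {
                cb.onThreadDeath(name, failure);
            }
        }, name);
        t.start();
        return t;
    }
}
```

With such a hook, the application could invalidate its stream iterator or trigger a restart as soon as the fetcher dies, rather than waiting indefinitely on hasNext().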
[jira] [Commented] (KAFKA-1475) Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know
[ https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020533#comment-14020533 ] Hang Qi commented on KAFKA-1475: Not often. Mainly happens on some long full GC on the kafka consumer side, or some heavy load on ZK server side. Thanks for the reference, I will take a look at it. Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know --- Key: KAFKA-1475 URL: https://issues.apache.org/jira/browse/KAFKA-1475 -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 21588: Fix KAFKA-1430, round two
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21588/#review44998 --- Overall, the patch looks pretty good to me. Some comments below.
core/src/main/scala/kafka/api/FetchResponse.scala https://reviews.apache.org/r/21588/#comment79629 There is no need to specify val explicitly. A case class automatically makes every parameter in the constructor a val.
core/src/main/scala/kafka/api/ProducerResponse.scala https://reviews.apache.org/r/21588/#comment79630 Ditto as the above.
core/src/main/scala/kafka/log/FileMessageSet.scala https://reviews.apache.org/r/21588/#comment79633 Should this be a val instead of a function?
core/src/main/scala/kafka/log/Log.scala https://reviews.apache.org/r/21588/#comment79634 Does time need to be a val?
core/src/main/scala/kafka/log/Log.scala https://reviews.apache.org/r/21588/#comment79635 Hmm, when we get an empty ByteBufferMessage, it may be important to return the corresponding offsetMetadata.
core/src/main/scala/kafka/log/Log.scala https://reviews.apache.org/r/21588/#comment79636 This can only happen to regular consumer fetch requests. So, returning an UnknownOffset is ok since it's not going to be used for purgatory checking. We should describe this in the comment.
core/src/main/scala/kafka/log/Log.scala https://reviews.apache.org/r/21588/#comment79637 Perhaps this can be named convertToOffsetMetadata()?
core/src/main/scala/kafka/server/KafkaServer.scala https://reviews.apache.org/r/21588/#comment79640 Could this be done inside KafkaApis?
core/src/main/scala/kafka/server/LogOffsetMetadata.scala https://reviews.apache.org/r/21588/#comment79642 Could we make this a case class? Then equals() doesn't need to be overridden.
core/src/main/scala/kafka/server/LogOffsetMetadata.scala https://reviews.apache.org/r/21588/#comment79641 older() and peer() could be named better. How about offsetOnOlderSegment and offsetOnSameSegment?
core/src/main/scala/kafka/server/LogOffsetMetadata.scala https://reviews.apache.org/r/21588/#comment79643 Maybe positionDiff()?
core/src/main/scala/kafka/server/RequestPurgatory.scala https://reviews.apache.org/r/21588/#comment79644 Do we need to change to override val?
core/src/main/scala/kafka/server/RequestPurgatory.scala https://reviews.apache.org/r/21588/#comment79645 We will need to sync on t when doing the check (same as in line 187) to avoid the race condition that can cause two responses to be sent for the same request. - Jun Rao
On June 6, 2014, 12:41 a.m., Guozhang Wang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21588/ --- (Updated June 6, 2014, 12:41 a.m.) Review request for kafka. Bugs: KAFKA-1430 https://issues.apache.org/jira/browse/KAFKA-1430 Repository: kafka Description --- 1. Change the watch() API to checkAndMaybeWatch(). In that function, the purgatory will try to add the delayed request to each keyed watchers list. a) When the watcher is trying to add the delayed request, it first checks whether it is already satisfied, and only adds the request if it is not yet satisfied. b) If one of the watchers fails to add the request because it is already satisfied, checkAndMaybeWatch() returns immediately. c) The purgatory size gauge is now the watcher lists' size plus the delayed queue size. 2. Add a LogOffsetMetadata structure, which contains a) the message offset, b) the segment file base offset, and c) the relative physical position in the segment file. Each replica then maintains the log offset metadata for a) the current HW offset. On the leader replica, the metadata includes all three values; on a follower replica, the metadata only keeps the message offset (the others are just -1). When a partition becomes the leader, it will use its HW message offset to construct the other two values of the metadata by searching in its logs. The HW offset will be updated in the partition's maybeUpdateLeaderHW function. b) the current log end offset.
Each replica maintains its own log end offset, which gets updated upon log append. The leader replica also maintains the other replicas' log end offset metadata, which is updated from follower fetch requests. 3. Move the readMessageSet logic from KafkaApis to ReplicaManager as part of the server-side refactoring. The log.read function now returns the fetch offset metadata along with the message set read. 4. The delayed fetch request then maintains, for each of its fetching partitions, the fetch log offset metadata, which is retrieved from the readMessageSet() call. 5. The delayed fetch request's satisfaction criterion now is:
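The LogOffsetMetadata structure described above (message offset, segment file base offset, relative physical position in the segment) can be sketched as follows. This is a hypothetical illustration, not the actual Kafka class; the method names offsetOnOlderSegment and positionDiff follow the reviewer's naming suggestions.

```java
// Hypothetical sketch of the LogOffsetMetadata triple from the review:
// a) message offset, b) base offset of the containing segment,
// c) relative physical position within that segment.
// On followers, b) and c) would just be -1 (unknown), per the description.
class LogOffsetMetadata {
    static final long UNKNOWN = -1L;

    final long messageOffset;              // logical message offset
    final long segmentBaseOffset;          // base offset of the segment file
    final int relativePositionInSegment;   // byte position inside the segment

    LogOffsetMetadata(long messageOffset, long segmentBaseOffset,
                      int relativePositionInSegment) {
        this.messageOffset = messageOffset;
        this.segmentBaseOffset = segmentBaseOffset;
        this.relativePositionInSegment = relativePositionInSegment;
    }

    // True if this offset lives on an older log segment than that offset.
    boolean offsetOnOlderSegment(LogOffsetMetadata that) {
        return this.segmentBaseOffset < that.segmentBaseOffset;
    }

    // Byte distance to that offset; only meaningful on the same segment.
    int positionDiff(LogOffsetMetadata that) {
        if (this.segmentBaseOffset != that.segmentBaseOffset)
            throw new IllegalArgumentException("offsets are on different segments");
        return this.relativePositionInSegment - that.relativePositionInSegment;
    }
}
```

A byte-based positionDiff like this is what lets a delayed fetch request estimate accumulated bytes from the HW movement, instead of incrementally tracking them as the old purgatory did.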
[jira] [Commented] (KAFKA-1438) Migrate kafka client tools
[ https://issues.apache.org/jira/browse/KAFKA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020696#comment-14020696 ] Neha Narkhede commented on KAFKA-1438: -- [~sriharsha] To rephrase Jun's finding: the way the system test determines the # of unique messages is by turning on DEBUG in ProducerPerformance and then parsing the log for the unique IDs that it outputs. Maybe there is a minor change required to the default log4j.properties that got moved during the patch? Migrate kafka client tools -- Key: KAFKA-1438 URL: https://issues.apache.org/jira/browse/KAFKA-1438 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Sriharsha Chintalapani Labels: newbie, tools, usability Fix For: 0.8.2 Attachments: KAFKA-1438.patch, KAFKA-1438_2014-05-27_11:45:29.patch, KAFKA-1438_2014-05-27_12:16:00.patch, KAFKA-1438_2014-05-27_17:08:59.patch, KAFKA-1438_2014-05-28_08:32:46.patch, KAFKA-1438_2014-05-28_08:36:28.patch, KAFKA-1438_2014-05-28_08:40:22.patch, KAFKA-1438_2014-05-30_11:36:01.patch, KAFKA-1438_2014-05-30_11:38:46.patch, KAFKA-1438_2014-05-30_11:42:32.patch Currently the console/perf client tools are scattered across different packages; we'd better: 1. Move Consumer/ProducerPerformance and SimpleConsumerPerformance to tools and remove the perf sub-project. 2. Move ConsoleConsumer from kafka.consumer to kafka.tools. 3. Move other consumer related tools from kafka.consumer to kafka.tools. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1430) Purgatory redesign
[ https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1430: - Assignee: Guozhang Wang Purgatory redesign -- Key: KAFKA-1430 URL: https://issues.apache.org/jira/browse/KAFKA-1430 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: Guozhang Wang Attachments: KAFKA-1430.patch, KAFKA-1430_2014-06-05_17:07:37.patch, KAFKA-1430_2014-06-05_17:13:13.patch, KAFKA-1430_2014-06-05_17:40:53.patch We have seen 2 main issues with the Purgatory. 1. There is no atomic checkAndWatch functionality. So, a client typically first checks whether a request is satisfied or not and then registers the watcher. However, by the time the watcher is registered, the registered item could already be satisfied. This item won't be satisfied until the next update happens or the delayed time expires, which means the watched item could be delayed. 2. FetchRequestPurgatory doesn't quite work. This is because the current design tries to incrementally maintain the accumulated bytes ready for fetch. However, this is difficult since the right time to check whether a fetch (for a regular consumer) request is satisfied is when the high watermark moves. At that point, it's hard to figure out how many bytes we should incrementally add to each pending fetch request. The problem has been reported in KAFKA-1150 and KAFKA-703. -- This message was sent by Atlassian JIRA (v6.2#6252)
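The race in issue 1 above (an item becomes satisfied between the check and the watch registration) is exactly what an atomic check-and-watch closes: both steps run under one lock. The sketch below illustrates the idea in simplified form; the names Watchers, DelayedRequest, and checkAndMaybeWatch echo the review discussion but are not the actual Kafka implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of an atomic check-and-watch for one watcher list.
// Checking satisfaction and registering the watch happen under the same
// lock, so a request can no longer slip through between the two steps.
class Watchers {

    interface DelayedRequest {
        boolean isSatisfied();  // check against current state
        void complete();        // send the response to the client
    }

    private final List<DelayedRequest> watching = new ArrayList<>();

    // Returns true if the request was already satisfied (no watch added),
    // mirroring the checkAndMaybeWatch() behavior described in the review.
    synchronized boolean checkAndMaybeWatch(DelayedRequest req) {
        if (req.isSatisfied()) {
            req.complete();
            return true;
        }
        watching.add(req);
        return false;
    }

    // Called on every state change (e.g. HW movement): complete and remove
    // any requests that have become satisfied.
    synchronized void notifyStateChange() {
        watching.removeIf(r -> {
            if (r.isSatisfied()) { r.complete(); return true; }
            return false;
        });
    }

    synchronized int size() { return watching.size(); }
}
```

Because completion also runs under the same lock, this shape additionally avoids the double-response race the reviewer flags (two paths both completing the same request).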
[jira] [Commented] (KAFKA-1438) Migrate kafka client tools
[ https://issues.apache.org/jira/browse/KAFKA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020703#comment-14020703 ] Sriharsha Chintalapani commented on KAFKA-1438: --- [~junrao] [~nehanarkhede] I'll take a look and update the patch. I might have missed it in changing the system test. Thanks. Migrate kafka client tools -- Key: KAFKA-1438 URL: https://issues.apache.org/jira/browse/KAFKA-1438 -- This message was sent by Atlassian JIRA (v6.2#6252)