[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025457#comment-14025457 ]
Jun Rao commented on KAFKA-1298:
--------------------------------

Sriharsha,

That makes sense. Perhaps we could do the following.
1. Make the change to skip StopReplica if replication factor is 1.
2. Add an option in createBrokerConfig() to disable controlled shutdown.
3. For all unit tests that create more than 1 replica, disable controlled shutdown when calling createBrokerConfig(). In unit tests, we need to stop all brokers during tearDown(). So, controlled shutdown is going to time out.

We can then see if the time to complete all unit tests is comparable to what we had before.

> Controlled shutdown tool doesn't seem to work out of the box
> ------------------------------------------------------------
>
>                 Key: KAFKA-1298
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1298
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>            Assignee: Sriharsha Chintalapani
>              Labels: usability
>         Attachments: KAFKA-1298.patch, KAFKA-1298.patch
>
>
> Download Kafka and try to use our shutdown tool.
> Got this:
> bin/kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper localhost:2181 --broker 0
> [2014-03-06 16:58:23,636] ERROR Operation failed due to controller failure (kafka.admin.ShutdownBroker$)
> java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is:
>         java.net.ConnectException: Connection refused]
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340)
>         at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:249)
>         at kafka.admin.ShutdownBroker$.kafka$admin$ShutdownBroker$$invokeShutdown(ShutdownBroker.scala:56)
>         at kafka.admin.ShutdownBroker$.main(ShutdownBroker.scala:109)
>         at kafka.admin.ShutdownBroker.main(ShutdownBroker.scala)
> Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is:
>         java.net.ConnectException: Connection refused]
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
>         at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)
>         at javax.naming.InitialContext.lookup(InitialContext.java:392)
>         at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888)
>         at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858)
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)
>         ... 4 more
> Caused by: java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is:
>         java.net.ConnectException: Connection refused
>         at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
>         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
>         at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
>         at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
>         at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)
>         ... 9 more
> Caused by: java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
>         at java.net.Socket.connect(Socket.java:527)
>         at java.net.Socket.connect(Socket.java:476)
>         at java.net.Socket.<init>(Socket.java:373)
>         at java.net.Socket.<init>(Socket.java:187)
>         at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
>         at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
>         at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
>         ... 14 more
> Oh god, RMI?????!!!???
> Presumably this is because we stopped setting the JMX port by default. This is good because setting the JMX port breaks the quickstart which requires running multiple nodes on a single machine. The root cause imo is just using RMI here instead of our regular RPC.
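
For illustration only, step 2 of the comment (an option in createBrokerConfig() to disable controlled shutdown) might look roughly like the sketch below. It is written in plain Java and is not the actual test helper: the class name, the parameter list, and the `enableControlledShutdown` flag are assumptions. The property keys `broker.id`, `port`, and `controlled.shutdown.enable` are existing broker settings.

```java
import java.util.Properties;

public class BrokerConfigSketch {
    // Hypothetical stand-in for the test helper createBrokerConfig().
    // The signature is an assumption made for this sketch.
    public static Properties createBrokerConfig(int brokerId, int port,
                                                boolean enableControlledShutdown) {
        Properties props = new Properties();
        props.setProperty("broker.id", Integer.toString(brokerId));
        props.setProperty("port", Integer.toString(port));
        // Broker setting that switches controlled shutdown on or off.
        props.setProperty("controlled.shutdown.enable",
                          Boolean.toString(enableControlledShutdown));
        return props;
    }

    public static void main(String[] args) {
        // A test with more than one replica would pass false, so that
        // tearDown() does not wait on a controlled shutdown.
        Properties props = createBrokerConfig(0, 9092, false);
        System.out.println(props.getProperty("controlled.shutdown.enable"));
    }
}
```

Tests with a single replica could keep the flag at true, since step 1 would skip StopReplica for them anyway.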