Hi Jun, Thanks for the clarification - all makes sense now - we've only ever seen this happen just the one time.
Best, hmb. On Tue, Dec 6, 2011 at 11:25 AM, Jun Rao <[email protected]> wrote: > Hisham, > > The brokerid registered under /brokers/ids is an ephemeral node, which > means that even if the broker is force killed, ZK server should remove it > after the session timeout (by default, 6 secs). In your case, it seems that > the ephemeral node never went away? We know that a ZK bug > (ZOOKEEPER-1208<https://issues.apache.org/jira/browse/ZOOKEEPER-1208>) > can > cause this. However, it should happen rarely. The bug has been fixed in ZK > 3.3.4 and we will do an upgrade soon. > > Thanks, > > Jun > > On Tue, Dec 6, 2011 at 8:13 AM, Hisham Mardam-Bey <[email protected]>wrote: > >> Hi again, >> >> I've cleaned up /brokers/ids/1 from ZK and all is fine again now - did >> not get cleaned up on broker shutdown? >> >> hmb. >> >> On Tue, Dec 6, 2011 at 10:50 AM, Hisham Mardam-Bey <[email protected]> >> wrote: >> > Hi folks, >> > >> > One of our Kafka brokers, after restarting it today, gave us the >> > following fatal error, whats the recommended way to fixing? >> > >> > [2011-12-06 10:28:07,664] INFO The retention hour for ADD_FEED_LOG : >> > 72 (kafka.utils.Utils$) >> > [2011-12-06 10:28:07,665] INFO The retention hour for READ_FEED_LOG : >> > 72 (kafka.utils.Utils$) >> > [2011-12-06 10:28:07,669] INFO Starting Kafka server... >> > (kafka.server.KafkaServer) >> > [2011-12-06 10:28:07,683] INFO Loading log 'HOTLIST_REMOVE-0' >> > (kafka.log.LogManager) >> > [2011-12-06 10:28:07,699] INFO Loading the last segment >> > /usr/local/kafka/data/HOTLIST_REMOVE-0/00000000000000000000.kafka in >> > mutable mode, recovery false (kafka.log.Log >> > [2011-12-06 10:28:07,714] INFO Loading log 'PROFILE_UPDATE-0' >> > (kafka.log.LogManager) >> > [2011-12-06 10:28:07,715] INFO Loading the last segment >> > /usr/local/kafka/data/PROFILE_UPDATE-0/00000000000000000000.kafka in >> > mutable mode, recovery false (kafka.log.Log >> > <snip> >> > ... >> > </snip> >> > [2011-12-06 10:28:07,724] INFO starting log cleaner every 60000 ms >> > (kafka.log.LogManager) >> > [2011-12-06 10:28:07,729] INFO connecting to ZK: >> > kafka1.prod.mate1:2181,kafka2.prod.mate1:2181,kafka3.prod.mate1:2181 >> > (kafka.server.KafkaZooKeeper) >> > [2011-12-06 10:28:07,739] INFO Starting ZkClient event thread. >> > (org.I0Itec.zkclient.ZkEventThread) >> > [2011-12-06 10:28:07,745] INFO Client >> > environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 >> > GMT (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client >> > environment:host.name=kafka1.prod.mate1 >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client >> > environment:java.version=1.6.0_24 (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client environment:java.vendor=Sun >> > Microsystems Inc. (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client >> > environment:java.home=/usr/local/jre1.6.0_24 >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client >> > >> environment:java.class.path=:./bin/../project/boot/scala-2.9.0-1/lib/scala-compiler.jar:./bin/../project/boot/scala-2.9.0-1/lib/sc >> > >> /bin/../core/lib/zkclient-20110412.jar:./bin/../core/lib_managed/scala_2.9.0-1/compile/jopt-simple-3.2.jar:./bin/../core/lib_managed/scala_2.9.0-1/compile/log4j-1.2.15. >> > >> 4.1.jar:./bin/../core/lib_managed/scala_2.9.0-1/compile/zookeeper-3.3.3.jar >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client >> > >> environment:java.library.path=/usr/local/jre1.6.0_24/lib/amd64/server:/usr/local/jre1.6.0_24/lib/amd64:/usr/local/jre1.6.0_24/../l >> > b (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client environment:java.io.tmpdir=/tmp >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client environment:java.compiler=<NA> >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client environment:os.name=Linux >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client environment:os.arch=amd64 >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,745] INFO Client >> > environment:os.version=2.6.32-cgroup-memcap >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,746] INFO Client environment:user.name=nobody >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,746] INFO Client >> > environment:user.home=/nonexistent (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,746] INFO Client >> > environment:user.dir=/usr/local/kafka (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,746] INFO Initiating client connection, >> > >> connectString=kafka1.prod.mate1:2181,kafka2.prod.mate1:2181,kafka3.prod.mate1:2181 >> > sessionTimeout=6000 watc >> > eeper) >> > [2011-12-06 10:28:07,762] INFO Opening socket connection to server >> > kafka3.prod.mate1/10.10.7.14:2181 (org.apache.zookeeper.ClientCnxn) >> > [2011-12-06 10:28:07,766] INFO Socket connection established to >> > kafka3.prod.mate1/10.10.7.14:2181, initiating session >> > (org.apache.zookeeper.ClientCnxn) >> > [2011-12-06 10:28:07,773] INFO Session establishment complete on >> > server kafka3.prod.mate1/10.10.7.14:2181, sessionid = >> > 0x3340d5bd0b60391, negotiated timeout = 6000 (org >> > [2011-12-06 10:28:07,775] INFO zookeeper state changed (SyncConnected) >> > (org.I0Itec.zkclient.ZkClient) >> > [2011-12-06 10:28:07,857] INFO Awaiting connections on port 9092 >> > (kafka.network.Acceptor) >> > [2011-12-06 10:28:07,859] INFO Will not load MX4J, mx4j-tools.jar is >> > not in the classpath (kafka.utils.Mx4jLoader$) >> > [2011-12-06 10:28:07,859] INFO Registering broker /brokers/ids/1 >> > (kafka.server.KafkaZooKeeper) >> > [2011-12-06 10:28:07,881] INFO conflict in /brokers/ids/1 data: >> > 10.10.7.12-1323185287859:10.10.7.12:9092 stored data: >> > 10.10.7.12-1322928782935:10.10.7.12:9092 (kafka.ut >> > [2011-12-06 10:28:07,882] FATAL Fatal error during startup. >> > (kafka.server.KafkaServer) >> > java.lang.RuntimeException: A broker is already registered on the path >> > /brokers/ids/1. This probably indicates that you either have >> > configured a brokerid that is alread >> > ter than the zookeeper timeout so it appears to be re-registering. >> > at >> kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:60) >> > at kafka.log.LogManager.startup(LogManager.scala:124) >> > at kafka.server.KafkaServer.startup(KafkaServer.scala:80) >> > at >> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:47) >> > at kafka.Kafka$.main(Kafka.scala:60) >> > at kafka.Kafka.main(Kafka.scala) >> > [2011-12-06 10:28:07,883] INFO Shutting down... >> (kafka.server.KafkaServer) >> > [2011-12-06 10:28:07,883] INFO shutdown scheduler kafka-logcleaner- >> > (kafka.utils.KafkaScheduler) >> > [2011-12-06 10:28:07,890] INFO shutdown scheduler kafka-logflusher- >> > (kafka.utils.KafkaScheduler) >> > [2011-12-06 10:28:07,909] INFO Closing zookeeper client... >> > (kafka.server.KafkaZooKeeper) >> > [2011-12-06 10:28:07,909] INFO Terminate ZkClient event thread. >> > (org.I0Itec.zkclient.ZkEventThread) >> > [2011-12-06 10:28:07,909] INFO zkActor stopped (kafka.log.LogManager) >> > [2011-12-06 10:28:07,912] INFO Session: 0x3340d5bd0b60391 closed >> > (org.apache.zookeeper.ZooKeeper) >> > [2011-12-06 10:28:07,912] INFO EventThread shut down >> > (org.apache.zookeeper.ClientCnxn) >> > [2011-12-06 10:28:07,912] INFO shut down completed >> (kafka.server.KafkaServer) >> > >> > Thanks! >> > >> > hisham. >> > >> > -- >> > Hisham Mardam Bey >> > Director of Engineering | Mate1 Inc. >> > 4200 St. Laurent Boulevard | Suite 550 >> > Montreal, Quebec | H2W 2R2 >> > t. +1.514.393.1414 x264 >> > >> > A: Because it messes up the order in which people normally read text. >> > Q: Why is top-posting such a bad thing? >> > A: Top-posting. >> > Q: What is the most annoying thing in e-mail? >> > >> > -=[ Codito Ergo Sum ]=- >> >> >> >> -- >> Hisham Mardam Bey >> Director of Engineering | Mate1 Inc. >> 4200 St. Laurent Boulevard | Suite 550 >> Montreal, Quebec | H2W 2R2 >> t. +1.514.393.1414 x264 >> >> A: Because it messes up the order in which people normally read text. >> Q: Why is top-posting such a bad thing? >> A: Top-posting. >> Q: What is the most annoying thing in e-mail? >> >> -=[ Codito Ergo Sum ]=- >> -- Hisham Mardam Bey Director of Engineering | Mate1 Inc. 4200 St. Laurent Boulevard | Suite 550 Montreal, Quebec | H2W 2R2 t. +1.514.393.1414 x264 A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? -=[ Codito Ergo Sum ]=-
