[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093401#comment-15093401 ] Rakesh R commented on ZOOKEEPER-2347: - Thanks [~rgs] for the reviews. bq. One question though: why use NettyServerCnxnFactory for the test instead of the NIO one (which is much more widely used)? No specific reason; the test scenario is unrelated to either Netty or NIO. bq. Also, how can we validate whether the HBase tests now pass? Ted posted the HBase test status in this JIRA a while ago; please see the [comments|https://issues.apache.org/jira/browse/ZOOKEEPER-2347?focusedCommentId=15063086&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15063086]. Thanks [~yuzhih...@gmail.com] for the test results. > Deadlock shutting down zookeeper > > > Key: ZOOKEEPER-2347 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.7 >Reporter: Ted Yu >Assignee: Rakesh R >Priority: Blocker > Fix For: 3.4.8 > > Attachments: ZOOKEEPER-2347-br-3.4.patch, > ZOOKEEPER-2347-br-3.4.patch, ZOOKEEPER-2347-br-3.4.patch, > ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack > > > HBase recently upgraded to zookeeper 3.4.7 > In one of the tests, TestSplitLogManager, there is reproducible hang at the > end of the test.
> Below is snippet from stack trace related to zookeeper: > {code} > "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on > condition [0x00011834b000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007c5b8d3a0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 > nid=0x9513 waiting on condition [0x000118042000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) > "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor > entry [0x0001170ac000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512) > - waiting to lock <0x0007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) > "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on > condition [0x000117a3] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to 
wait for <0x0007c9b106b8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() > [0x000108aa1000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x0007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1281) > - locked <0x0007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1355) > at > org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213) > at > org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770) > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478) > - locked <0x0007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apac
[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093366#comment-15093366 ] Raul Gutierrez Segales commented on ZOOKEEPER-2347: --- LGTM, thanks [~rakeshr] and [~fpj]. One question though: why use NettyServerCnxnFactory for the test instead of the NIO one (which is much more widely used)? [~cnauroth]: mind taking a look as well? Also, how can we validate whether the HBase tests now pass? > Deadlock shutting down zookeeper > > > Key: ZOOKEEPER-2347 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.7 >Reporter: Ted Yu >Assignee: Rakesh R >Priority: Blocker > Fix For: 3.4.8 > > Attachments: ZOOKEEPER-2347-br-3.4.patch, > ZOOKEEPER-2347-br-3.4.patch, ZOOKEEPER-2347-br-3.4.patch, > ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack > > > HBase recently upgraded to zookeeper 3.4.7 > In one of the tests, TestSplitLogManager, there is reproducible hang at the > end of the test.
> Below is snippet from stack trace related to zookeeper: > {code} > "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on > condition [0x00011834b000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007c5b8d3a0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 > nid=0x9513 waiting on condition [0x000118042000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) > "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor > entry [0x0001170ac000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512) > - waiting to lock <0x0007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) > "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on > condition [0x000117a3] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to 
wait for <0x0007c9b106b8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() > [0x000108aa1000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x0007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1281) > - locked <0x0007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1355) > at > org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213) > at > org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770) > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478) > - locked <0x0007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266) > at > org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301) > {code} > Note the address (0x0007c5b66400) in the last hunk which seems to > indicate some form of deadlock. > Accord
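The hang in the stack trace is a lock-ordering cycle: `main` holds the `ZooKeeperServer` monitor inside `ZooKeeperServer.shutdown()` and waits in `Thread.join()` for `SyncThread:0`, while `SyncThread:0` is blocked trying to take that same monitor in `decInProcess()`. `Thread.join()` waits via `Object.wait()` on the thread object, which releases only that thread object's monitor, not the server monitor the caller already holds. The Java sketch below models that shape with two hypothetical classes (`Server`, `SyncWorker`) and shows one general way out: update state under the lock, but join the worker only after releasing it. It illustrates the pattern under those assumptions; it is not ZooKeeper's code and not necessarily the fix committed for this issue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified model of the lock ordering behind ZOOKEEPER-2347.
// Names are illustrative; this is not ZooKeeper's code or its committed fix.
public class ShutdownOrderingSketch {

    // Stand-in for ZooKeeperServer: the in-flight counter is guarded by this object's monitor.
    public static class Server {
        private int inProcess = 0;
        private boolean running = true;
        private final SyncWorker worker = new SyncWorker(this);

        public synchronized void incInProcess() { inProcess++; }
        public synchronized void decInProcess() { inProcess--; }
        public synchronized int getInProcess() { return inProcess; }
        public synchronized boolean isRunning() { return running; }
        public SyncWorker getWorker() { return worker; }

        public void start() { worker.start(); }

        // Deadlock-prone shape matching the stack trace: a synchronized shutdown()
        // that joins the worker while the worker needs this monitor in decInProcess().
        // Safer shape used here: change state under the lock, join after releasing it.
        public void shutdown() {
            synchronized (this) {
                running = false;
            }
            try {
                worker.shutdownAndJoin(); // no Server monitor held while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Stand-in for SyncRequestProcessor: drains a queue and calls back into the server.
    public static class SyncWorker extends Thread {
        private static final Object POISON = new Object();
        private final Server server;
        private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

        SyncWorker(Server server) {
            super("SyncThread:sketch");
            setDaemon(true); // a failed run cannot keep the JVM alive
            this.server = server;
        }

        public void submit(Object request) {
            server.incInProcess();
            queue.add(request);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    Object request = queue.take();
                    if (request == POISON) {
                        return;
                    }
                    server.decInProcess(); // takes the Server monitor, like decInProcess() in the trace
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Enqueue a sentinel behind any pending requests, then wait for the thread to exit.
        void shutdownAndJoin() throws InterruptedException {
            queue.add(POISON);
            join();
        }
    }
}
```

In this model a timed `join(millis)` would only bound the wait without removing the cycle, so releasing the monitor before joining is the more direct remedy.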
[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093353#comment-15093353 ] Raul Gutierrez Segales commented on ZOOKEEPER-2353: --- Changing the serialization mechanism should probably be decoupled from this issue. We'll likely have to support Jute forever, so we can start with that and then explore using protobuf for server-to-server messages. > QuorumCnxManager protocol needs to be upgradable with-in a specific Version > --- > > Key: ZOOKEEPER-2353 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.4.7, 3.5.1 >Reporter: Powell Molleti > > Currently 3.5.x sends its header as follows: > {code:title=QuorumCnxManager.java|borderStyle=solid} > dout.writeLong(PROTOCOL_VERSION); > dout.writeLong(self.getId()); > String addr = self.getElectionAddress().getHostString() + ":" + > self.getElectionAddress().getPort(); > byte[] addr_bytes = addr.getBytes(); > dout.writeInt(addr_bytes.length); > dout.write(addr_bytes); > dout.flush(); > {code} > Since it writes only the length of the host/port byte string, there is no simple way to > append new fields to this header anymore, i.e. the receiving side has to treat all > bytes after the sid as the host and port, which is what it does here: > [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW] > {code:title=QuorumCnxManager.java|borderStyle=solid} > sid = din.readLong(); > int remaining = din.readInt(); > if (remaining <= 0 || remaining > maxBuffer) { > throw new InitialMessageException( > "Unreasonable buffer length: %s", remaining); > } > byte[] b = new byte[remaining]; > int num_read = din.read(b); > if (num_read != remaining) { > throw new InitialMessageException( > "Read only %s bytes out of %s sent by server %s", > num_read, remaining, sid); > } > // FIXME: IPv6 is not supported. Using something like Guava's > HostAndPort > //parser would be good.
> String addr = new String(b); > String[] host_port = addr.split(":"); > {code} > This has been captured in the discussion here: ZOOKEEPER-2186. > Though it is possible to work around this problem in various ways, the request > here is to design messages with a header such that there is no need to bump the > version number or hack certain fields (i.e. figure out whether a field is the length of > the host/port string or the length of a different message, etc., in the above case). > This is the idea, as captured in ZOOKEEPER-2186: > {code:java} > dout.writeLong(PROTOCOL_VERSION); > String addr = self.getElectionAddress().getHostString() + ":" + > self.getElectionAddress().getPort(); > byte[] addr_bytes = addr.getBytes(); > // After the version, write the total length of the rest of the message: > // sid + host/port length prefix + host/port bytes. > dout.writeInt(Long.BYTES + Integer.BYTES + addr_bytes.length); > // Write sid afterwards > dout.writeLong(self.getId()); > // Write length of host/port string > dout.writeInt(addr_bytes.length); > // Write host/port string > dout.write(addr_bytes); > dout.flush(); > {code} > Since both the total length of the message and the length of each variable-length field are > present, it is easy to provide backward compatibility with respect to parsing > the message. > Older code will read the fields it knows about and skip the rest. > Newer revisions that want to stay compatible will only append to the > header and not change the meaning of existing fields. > I am guessing this was the original intent of introducing the protocol > version in ZOOKEEPER-1633. > Since 3.4.x code does not parse this and 3.5.x is still in alpha, perhaps > it is possible to consider this change now? > I would also like to propose carefully considering > protobufs for the next protocol version bump. This would prevent issues like > this in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
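The `FIXME` in the parsing code points out that `addr.split(":")` breaks on IPv6 literals such as `fe80::1`, whose text already contains colons. The Java sketch below is a JDK-only alternative to the Guava `HostAndPort` parser the comment suggests. The class `HostPortSketch` and its bracket convention (`[ipv6-literal]:port`, as used in URLs) are assumptions for illustration; for the receiver to parse IPv6 hosts unambiguously, the sender would also have to bracket them instead of writing `getHostString() + ":" + getPort()` directly.

```java
import java.net.InetSocketAddress;

// Hypothetical JDK-only helper, not part of ZooKeeper. Accepts "host:port" and
// "[ipv6-literal]:port"; rejects unbracketed IPv6 literals because they are ambiguous.
public final class HostPortSketch {
    private HostPortSketch() {}

    public static InetSocketAddress parse(String addr) {
        String host;
        String portText;
        if (addr.startsWith("[")) {
            int close = addr.indexOf(']');
            // Require "]:" immediately after the bracketed literal.
            if (close < 0 || close + 1 >= addr.length() || addr.charAt(close + 1) != ':') {
                throw new IllegalArgumentException("Malformed bracketed address: " + addr);
            }
            host = addr.substring(1, close);
            portText = addr.substring(close + 2);
        } else {
            int colon = addr.lastIndexOf(':');
            // No colon, empty host, or more than one colon (unbracketed IPv6) is rejected.
            if (colon <= 0 || addr.indexOf(':') != colon) {
                throw new IllegalArgumentException("Expected host:port or [ipv6]:port: " + addr);
            }
            host = addr.substring(0, colon);
            portText = addr.substring(colon + 1);
        }
        if (host.isEmpty()) {
            throw new IllegalArgumentException("Empty host in: " + addr);
        }
        int port;
        try {
            port = Integer.parseInt(portText);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Bad port in: " + addr, e);
        }
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("Port out of range in: " + addr);
        }
        // createUnresolved keeps the host string as given and performs no DNS lookup.
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Brackets hosts that contain ':' so that parse() can read them back unambiguously.
    public static String format(InetSocketAddress address) {
        String host = address.getHostString();
        String hostPart = host.indexOf(':') >= 0 ? "[" + host + "]" : host;
        return hostPart + ":" + address.getPort();
    }
}
```

Rejecting an unbracketed `fe80::1:3888` outright, rather than guessing that the last colon separates the port, keeps the format unambiguous at the cost of requiring senders to bracket.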
[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093302#comment-15093302 ] Alexander Shraer commented on ZOOKEEPER-2353: - You're absolutely right, it's a hack. My intention was to get something working for ZOOKEEPER-107 and revisit it in a separate JIRA, but I never got to it, sorry. I think using protobufs (or similar) here and elsewhere is a great idea. ZooKeeper currently uses Jute for client-server messages, but apparently the intention was also to replace it at some point; see ZOOKEEPER-102. One concern is the impact of such serialization libraries on ZooKeeper performance, which needs to be evaluated. Backward-compatibility concerns were also raised in ZOOKEEPER-102.
[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093050#comment-15093050 ] Akihiro Suda commented on ZOOKEEPER-2353: - ZOOKEEPER-1931 (https://github.com/zk1931/jzab/) uses protobuf, so I linked it to this issue.
[jira] [Commented] (ZOOKEEPER-1045) Quorum Peer mutual authentication
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092670#comment-15092670 ] Powell Molleti commented on ZOOKEEPER-1045: --- Is there much user demand for Kerberos support on inter-ZK channels? Will ZK always have to get a token from the KDC before authenticating a peer? I am not very familiar with the Java SASL API; can you shed some light on the system-level process? Does this provide encryption of the data traffic using the shared secret key? > Quorum Peer mutual authentication > - > > Key: ZOOKEEPER-1045 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1045 > Project: ZooKeeper > Issue Type: New Feature > Components: server >Reporter: Eugene Koontz >Assignee: Rakesh R > Attachments: ZOOKEEPER-1045-00.patch, ZOOKEEPER-1045-Rolling Upgrade > Design Proposal.pdf > > > ZOOKEEPER-938 addresses mutual authentication between clients and servers. > This bug, on the other hand, is for authentication among quorum peers. > Hopefully much of the work done on SASL integration with Zookeeper for > ZOOKEEPER-938 can be used as a foundation for this enhancement.
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092564#comment-15092564 ] Powell Molleti commented on ZOOKEEPER-2186: --- ZOOKEEPER-2353 > QuorumCnxManager#receiveConnection may crash with random input > -- > > Key: ZOOKEEPER-2186 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Raul Gutierrez Segales >Assignee: Raul Gutierrez Segales > Fix For: 3.4.7, 3.5.1, 3.6.0 > > Attachments: ZOOKEEPER-2186-v3.4.patch, ZOOKEEPER-2186.patch, > ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch > > > This will allocate an arbitrarily large byte buffer (and try to read it!): > {code} > public boolean receiveConnection(Socket sock) { > Long sid = null; > ... > sid = din.readLong(); > // next comes the #bytes in the remainder of the message > > int num_remaining_bytes = din.readInt(); > byte[] b = new byte[num_remaining_bytes]; > // remove the remainder of the message from din > > int num_read = din.read(b); > {code} > This will crash the QuorumCnxManager thread, so the cluster will keep going > but future elections might fail to converge (ditto for leaving/joining > members). > Patch coming up in a bit.
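The crash described above comes from trusting a length prefix read off the wire. A minimal defensive pattern, sketched below in Java, is to check the length against an upper bound before allocating, and to use `DataInputStream.readFully`, which keeps reading until the buffer is full (or throws `EOFException`), whereas a single `read(byte[])` may legitimately return fewer bytes on a socket. The class name `BoundedReadSketch` and the caller-supplied `maxLength` are assumptions for illustration; the `maxBuffer` check in `QuorumCnxManager.InitialMessage.parse()` quoted in ZOOKEEPER-2353 plays a similar role.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper, not ZooKeeper's code: a length-prefixed read that cannot be
// tricked into a huge allocation and does not mistake a short read for the full payload.
public final class BoundedReadSketch {
    private BoundedReadSketch() {}

    public static byte[] readLengthPrefixed(DataInputStream din, int maxLength) throws IOException {
        int length = din.readInt();
        // Validate before allocating: negative, zero, or oversized lengths are rejected.
        if (length <= 0 || length > maxLength) {
            throw new IOException("Unreasonable buffer length: " + length);
        }
        byte[] buf = new byte[length];
        // Blocks until all bytes arrive; throws EOFException if the stream ends early.
        din.readFully(buf);
        return buf;
    }
}
```

An `IOException` here lets the caller drop the offending connection instead of letting an `OutOfMemoryError` or `NegativeArraySizeException` take down the listener thread.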
[jira] [Created] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version
Powell Molleti created ZOOKEEPER-2353:
-

Summary: QuorumCnxManager protocol needs to be upgradable with-in a specific Version
Key: ZOOKEEPER-2353
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353
Project: ZooKeeper
Issue Type: Improvement
Affects Versions: 3.5.1, 3.4.7
Reporter: Powell Molleti

Currently 3.5.x sends its header as follows:

{code:title=QuorumCnxManager.java|borderStyle=solid}
dout.writeLong(PROTOCOL_VERSION);
dout.writeLong(self.getId());
String addr = self.getElectionAddress().getHostString() + ":" + self.getElectionAddress().getPort();
byte[] addr_bytes = addr.getBytes();
dout.writeInt(addr_bytes.length);
dout.write(addr_bytes);
dout.flush();
{code}

Since it writes only the length of the host/port byte string, there is no simple way to append new fields to this header anymore: the receiving side has to treat all bytes after the sid as the host and port, which is what it does here: [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW]

{code:title=QuorumCnxManager.java|borderStyle=solid}
sid = din.readLong();
int remaining = din.readInt();
if (remaining <= 0 || remaining > maxBuffer) {
    throw new InitialMessageException(
            "Unreasonable buffer length: %s", remaining);
}
byte[] b = new byte[remaining];
int num_read = din.read(b);
if (num_read != remaining) {
    throw new InitialMessageException(
            "Read only %s bytes out of %s sent by server %s",
            num_read, remaining, sid);
}
// FIXME: IPv6 is not supported. Using something like Guava's HostAndPort
//        parser would be good.
String addr = new String(b);
String[] host_port = addr.split(":");
{code}

This has been captured in the discussion here: ZOOKEEPER-2186.

Though it is possible to work around this problem in various ways, the request here is to design messages with a header such that there is no need to bump the version number or hack certain fields (i.e. figure out whether a field is the length of the host/port string or the length of a different message, etc., in the above case). This is the idea, as captured in ZOOKEEPER-2186:

{code:java}
dout.writeLong(PROTOCOL_VERSION);
String addr = self.getElectionAddress().getHostString() + ":" + self.getElectionAddress().getPort();
byte[] addr_bytes = addr.getBytes();
// After the version, write the total length of the rest of the message:
// sid (Long.BYTES) + host/port length prefix (Integer.BYTES) + host/port bytes.
dout.writeInt(Long.BYTES + Integer.BYTES + addr_bytes.length);
// Write sid afterwards
dout.writeLong(self.getId());
// Write length of host/port string
dout.writeInt(addr_bytes.length);
// Write host/port string
dout.write(addr_bytes);
dout.flush();
{code}

Since both the total length of the message and the length of each variable-length field are present, it is easy to provide backward compatibility with respect to parsing the message. Older code will read the fields it knows about and skip the rest of the declared length. Newer revisions that want to stay compatible will only append to the header and not change the meaning of existing fields.

I am guessing this was the original intent of introducing the protocol version in ZOOKEEPER-1633. Since 3.4.x code does not parse this and 3.5.x is still in alpha, perhaps it is possible to consider this change now?

I would also like to propose carefully considering protobufs for the next protocol version bump. This would prevent issues like this in the future.
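To make the compatibility argument concrete, the Java sketch below encodes the proposed layout with an optional appended field and decodes it with a reader that knows only the sid and the host/port string. For a reader to skip the whole body, the total length must cover the sid, the 4-byte host/port length prefix and the host/port bytes, and the sketch computes it that way. The class `InitialHeaderSketch`, the placeholder `PROTOCOL_VERSION` value, the `MAX_BODY` cap and the UTF-8 encoding are assumptions for illustration; the real `QuorumCnxManager` also has to recognize older peers whose first field is the sid rather than a version, which this sketch ignores.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the length-prefixed header proposed above; not ZooKeeper's wire format.
public final class InitialHeaderSketch {
    public static final long PROTOCOL_VERSION = -65536L; // placeholder value (assumption)
    public static final int MAX_BODY = 4096;             // assumed cap on the declared body length

    public static final class Header {
        public final long sid;
        public final String addr;

        Header(long sid, String addr) {
            this.sid = sid;
            this.addr = addr;
        }
    }

    // version | bodyLen | sid | addrLen | addr bytes | extra (fields a newer sender appends)
    public static byte[] encode(long sid, String addr, byte[] extra) throws IOException {
        byte[] addrBytes = addr.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dout = new DataOutputStream(bos);
        dout.writeLong(PROTOCOL_VERSION);
        dout.writeInt(Long.BYTES + Integer.BYTES + addrBytes.length + extra.length);
        dout.writeLong(sid);
        dout.writeInt(addrBytes.length);
        dout.write(addrBytes);
        dout.write(extra);
        dout.flush();
        return bos.toByteArray();
    }

    // Reads exactly the declared body, parses the known fields, and ignores anything after them.
    public static Header decode(DataInputStream din) throws IOException {
        long version = din.readLong();
        if (version != PROTOCOL_VERSION) {
            throw new IOException("Unexpected protocol version: " + version);
        }
        int bodyLen = din.readInt();
        if (bodyLen < Long.BYTES + Integer.BYTES || bodyLen > MAX_BODY) {
            throw new IOException("Unreasonable body length: " + bodyLen);
        }
        byte[] body = new byte[bodyLen];
        din.readFully(body); // consumes the whole body, including unknown trailing fields
        DataInputStream bin = new DataInputStream(new ByteArrayInputStream(body));
        long sid = bin.readLong();
        int addrLen = bin.readInt();
        if (addrLen < 0 || addrLen > bodyLen - Long.BYTES - Integer.BYTES) {
            throw new IOException("Bad address length: " + addrLen);
        }
        byte[] addrBytes = new byte[addrLen];
        bin.readFully(addrBytes);
        // Bytes left in `body` belong to fields added by newer versions and are skipped.
        return new Header(sid, new String(addrBytes, StandardCharsets.UTF_8));
    }
}
```

Because the reader always consumes exactly `bodyLen` bytes, the stream stays aligned for whatever follows the header even when a newer peer appends fields the reader does not understand.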
[GitHub] zookeeper pull request: fix typo of java docs in OutputArchive.jav...
GitHub user ThomasLau opened a pull request: https://github.com/apache/zookeeper/pull/51 fix typo of java docs in OutputArchive.java You can merge this pull request into a Git repository by running: $ git pull https://github.com/ThomasLau/zookeeper trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/51.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #51 commit c434332661aaff7cfb2d959457b18ba7ee4ddb2a Author: Thomas Date: 2016-01-11T09:59:48Z fix typo of java docs in OutputArchive.java