[
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474571#comment-13474571
]
Ted Yu commented on ZOOKEEPER-1560:
-----------------------------------
In doIO(), should we check the return value from:
{code}
sock.write(pbb);
{code}
Here is the jstack output from the run where testLargeNodeData hung:
{code}
"main" prio=5 tid=7f9bed000800 nid=0x10c382000 in Object.wait() [10c380000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <7dc1a15c0> (a org.apache.zookeeper.ClientCnxn$Packet)
at java.lang.Object.wait(Object.java:485)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
- locked <7dc1a15c0> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:781)
at org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:531)
{code}
I think we can send the data in chunks if pbb.remaining() exceeds a certain
threshold.
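A rough sketch of both ideas, in case it helps the discussion: check what
write() actually returned, and cap how much of the buffer is offered per call.
This is not the ZooKeeper code. PartialWriteSketch, ThrottledChannel and
writeChunk are hypothetical names, the 64 KB chunk size and 8 KB per-call
limit are arbitrary, and ThrottledChannel only imitates a non-blocking socket
that accepts part of the buffer on each call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

class PartialWriteSketch {

    /** Hypothetical stand-in for a non-blocking socket: takes at most maxPerCall bytes per write. */
    static final class ThrottledChannel implements WritableByteChannel {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        private final int maxPerCall;

        ThrottledChannel(int maxPerCall) {
            this.maxPerCall = maxPerCall;
        }

        @Override
        public int write(ByteBuffer src) {
            int n = Math.min(maxPerCall, src.remaining());
            byte[] chunk = new byte[n];
            src.get(chunk);              // advances src.position() by n
            sink.write(chunk, 0, n);
            return n;                    // may be fewer bytes than the caller offered
        }

        @Override
        public boolean isOpen() {
            return true;
        }

        @Override
        public void close() {
        }

        byte[] received() {
            return sink.toByteArray();
        }
    }

    /**
     * Offers at most chunkSize bytes of pbb to the channel and returns the
     * number of bytes the channel actually accepted. The buffer's limit is
     * restored afterwards, so pbb.hasRemaining() still reflects the whole packet.
     */
    static int writeChunk(WritableByteChannel ch, ByteBuffer pbb, int chunkSize)
            throws IOException {
        int savedLimit = pbb.limit();
        if (pbb.remaining() > chunkSize) {
            pbb.limit(pbb.position() + chunkSize);
        }
        try {
            return ch.write(pbb);
        } finally {
            pbb.limit(savedLimit);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[512 * 1024];       // roughly the size from the report
        ThrottledChannel ch = new ThrottledChannel(8 * 1024);
        ByteBuffer pbb = ByteBuffer.wrap(payload);   // created once, reused for every call

        int calls = 0;
        while (pbb.hasRemaining()) {                 // the packet is done only when this is false
            writeChunk(ch, pbb, 64 * 1024);
            calls++;
        }
        System.out.println("calls=" + calls
                + " intact=" + Arrays.equals(payload, ch.received()));
    }
}
```

In the real client the loop would presumably be spread over successive
OP_WRITE readiness events rather than run inline, but the completion check
would be the same: the packet counts as sent only once pbb.hasRemaining() is
false.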
> Zookeeper client hangs on creation of large nodes
> -------------------------------------------------
>
> Key: ZOOKEEPER-1560
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
> Project: ZooKeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.4.4, 3.5.0
> Reporter: Igor Motov
> Attachments: ZOOKEEPER-1560.patch
>
>
> To reproduce, try creating a node with 0.5M of data using the Java client. The
> test will hang waiting for a response from the server. See the attached patch
> for the test that reproduces the issue.
> It seems that ZOOKEEPER-1437 introduced a few issues to
> {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from
> sending large packets that require several invocations of
> {{SocketChannel.write}} to complete. The first issue is that the call to
> {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue
> even if the packet wasn't completely sent yet. It looks to me like this call
> should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that
> {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which
> confuses {{SocketChannel.write}}. And the third issue is caused by extra
> calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse
> the server.