Yip Ng created ZOOKEEPER-2425:
---------------------------------
Summary: Slow ZooKeeper operation on Linux PowerPC
Key: ZOOKEEPER-2425
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2425
Project: ZooKeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.4.8, 3.4.6
Environment: Linux 2.6.32-431.el6.ppc64
Reporter: Yip Ng
We ran into this performance issue in a production environment where
the ZooKeeper operations(create, set, etc.) was very slow.
After some analysis we discover that there appears to be a TCP ACK delay of
~40+ ms when sending a very small size message(length field) followed by its
payload in ZooKeeper C client(v3.4.6) on a Linux PowerPC machine. The Linux
version is 2.6.32-431.el6.ppc64.
Here is a sample of the strace that illustrates the delays:
39314 1459790083.730221 send(13, "\0\0\0e", 4, MSG_NOSIGNAL) = 4 <0.000010>
39314 1459790083.730259 send(13,
"W\2\234\215\0\0\0\6\0\0\0Y/nnn.nnn.nn.nnnnnnn/"..., 101, MSG_NOSIGNAL) = 101
<0.000009>
39314 1459790083.730297 poll([{fd=11, events=POLLIN}, {fd=13, events=POLLIN}],
2, 13333) = 1 ([{fd=13, revents=POLLIN}]) <0.000204>
39314 1459790083.730535 recv(13, "\0\0\0\216", 4, 0) = 4 <0.000008>
39314 1459790083.730571 recv(13,
"W\2\234\215\0\0\0\1\0\0\n7\0\0\0\0\0\0\0\1\0\0\0\37\0\0\0\6dige"..., 142, 0) =
142 <0.000008>
39314 1459790083.730615 futex(0x1003bd90a2c, FUTEX_CMP_REQUEUE_PRIVATE, 1,
2147483647, 0x1003bd90a58, 2) = 1 <0.000010>
39279 1459790083.730647 <... futex resumed> ) = 0 <0.000488>
39314 1459790083.730659 poll([{fd=11, events=POLLIN}, {fd=13, events=POLLIN}],
2, 13333 <unfinished ...>
39279 1459790083.730681 futex(0x1003bd90a58, FUTEX_WAKE_PRIVATE, 1) = 0
<0.000012>
39279 1459790083.730737 write(12, "\0", 1 <unfinished ...>
39314 1459790083.730762 <... poll resumed> ) = 1 ([{fd=11, revents=POLLIN}])
<0.000091>
39279 1459790083.730776 <... write resumed> ) = 1 <0.000030>
39314 1459790083.730791 read(11, <unfinished ...>
39279 1459790083.730809 futex(0x1003bd90a2c, FUTEX_WAIT_PRIVATE, 1, NULL
<unfinished ...>
39314 1459790083.730819 <... read resumed> "\0", 128) = 1 <0.000021>
39314 1459790083.730838 poll([{fd=11, events=POLLIN}, {fd=13,
events=POLLIN|POLLOUT}], 2, 13333) = 1 ([{fd=13, revents=POLLOUT}]) <0.000008>
39314 1459790083.730879 send(13, "\0\0\0f", 4, MSG_NOSIGNAL) = 4 <0.000010>
39314 1459790083.730917 send(13,
"W\2\234\216\0\0\0\f\0\0\0Y/nnn.nnn.nn.nnnnnnn/"..., 102, MSG_NOSIGNAL) = 102
<0.000010>
39314 1459790083.730961 poll([{fd=11, events=POLLIN}, {fd=13, events=POLLIN}],
2, 13333) = 1 ([{fd=13, revents=POLLIN}]) <0.000189>
39314 1459790083.731184 recv(13, "\0\0\0k", 4, 0) = 4 <0.000007>
39314 1459790083.731219 recv(13,
"W\2\234\216\0\0\0\1\0\0\n7\0\0\0\0\0\0\0\2\0\0\0\4stop\0\0\0\7"..., 107, 0) =
107 <0.000008>
39314 1459790083.731264 futex(0x1003bd90a2c, FUTEX_CMP_REQUEUE_PRIVATE, 1,
2147483647, 0x1003bd90a58, 2) = 1 <0.000010>
39279 1459790083.731296 <... futex resumed> ) = 0 <0.000480>
39314 1459790083.731307 poll([{fd=11, events=POLLIN}, {fd=13, events=POLLIN}],
2, 13333 <unfinished ...>
39279 1459790083.731329 futex(0x1003bd90a58, FUTEX_WAKE_PRIVATE, 1) = 0
<0.000013>
39279 1459790083.731443 write(12, "\0", 1) = 1 <0.000019>
39314 1459790083.731485 <... poll resumed> ) = 1 ([{fd=11, revents=POLLIN}])
<0.000165>
39279 1459790083.731508 futex(0x1003bd90a2c, FUTEX_WAIT_PRIVATE, 1, NULL
<unfinished ...>
39314 1459790083.731524 read(11, "\0", 128) = 1 <0.000009>
39314 1459790083.731557 poll([{fd=11, events=POLLIN}, {fd=13,
events=POLLIN|POLLOUT}], 2, 13333) = 1 ([{fd=13, revents=POLLOUT}]) <0.000008>
39314 1459790083.731598 send(13, "\0\0\0k", 4, MSG_NOSIGNAL) = 4 <0.000010>
39314 1459790083.731636 send(13,
"W\2\234\217\0\0\0\f\0\0\0^/nnn.nnn.nn.nnnnnnn/"..., 107, MSG_NOSIGNAL) = 107
<0.000010>
39314 1459790083.731677 poll([{fd=11, events=POLLIN}, {fd=13, events=POLLIN}],
2, 13333) = 1 ([{fd=13, revents=POLLIN}]) <0.000188>
39314 1459790083.731900 recv(13, "\0\0\0X", 4, 0) = 4 <0.000008>
39314 1459790083.731936 recv(13,
"W\2\234\217\0\0\0\1\0\0\n7\0\0\0\0\0\0\0\0\0\0\0\1\0\0\n&\0\0\0\1"..., 88, 0)
= 88 <0.000007>
The fix is to collapse the 2 sends (length + payload) into a single send.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)