Hi,

migrated to proton 0.4 now, so I can ask relevant questions again :)

I'm doing basic latency analysis in a really trivial setup:
     (publisher -> broker -> subscriber)

publisher is emitting a message every 10 milliseconds, message contains an array of timestamps.
broker is forwarding the messages
subscriber is looking at propagation delay

all participants are adding timestamps to the message as it traverses the chain.

general operating parameters:

I have incoming and outgoing window set to 10

after each pn_messenger_get() I do a pn_messenger_accept() for the last message after each pn_messenger_put() I do a pn_messenger_settle() for the last message

the only synchronous call used is pn_messenger_recv(100), nominally
for each get() and/or put()

running all three processes on the same VM, no real network, loopback only.

During the run, the number of unsent messages (pn_messenger_outgoing()) fluctuates between 1, 2, 3, and 0 typically in a seesaw fashion, but not really predictable. Because of that the percieved propagation delay oscillated between 2 and 40+ milliseconds.

[408] P00:00:07.611882 <message.cpp:67 trace dump_timeline>
+                   e2e:   41.858 TIMELINE CREATE: P00:00:07.569998
+               PUBLISH:    0.000
+             SERIALIZE:    0.197
+               RECEIVE:    1.113
+           DESERIALIZE:    0.079
+             REPUBLISH:    0.773
+             SERIALIZE:    1.164
+               RECEIVE:   38.010
+           DESERIALIZE:    0.070
+           QUEUE_EVENT:    0.126
+            NEXT_EVENT:    0.047

After adding TCP_NODELAY to pn_configure_sock() this is no longer the case, and
all messages are sent immediately.

With this change the end-to-end propagation delay over loopback
hovers around 2 milliseconds for most of the time, there are occasional hiccups, but nowhere near values observed previously, 7ms is now considered long delay :).

A sample trace for one event:
[408] P00:00:17.497270 <message.cpp:67 trace dump_timeline>
+                   e2e:    2.015 TIMELINE CREATE: P00:00:17.495230
+               PUBLISH:    0.000
+             SERIALIZE:    0.185
+               RECEIVE:    0.359
+           DESERIALIZE:    0.052
+             REPUBLISH:    0.209
+             SERIALIZE:    0.165
+               RECEIVE:    0.333
+           DESERIALIZE:    0.047
+           QUEUE_EVENT:    0.158
+            NEXT_EVENT:    0.011

time spent from sender not long before calling pn_messenger_put() to
to not long after receiver gets the message from pn_messenger_get()
is attributed to 'RECEIVE'. There are two, one for each hop.

With the change, our processing became a significant contributing factor to the delay, a good spot to be in :)


One of futher ideas on the topic of analysing performance is adding the
following method to the data api:

typedef pn_timestamp_t (*pn_timestamp_cb_t)();
int pn_data_put_lazy_timestamp(pn_data_t *, pn_timestamp_cb_t cb)

This would allow one to embed a pn_timestamp_t anywhere into the pn_message_t * and proton would evaluate it as close to the wire as at all possible, for example, remember the byte offset into the binary buffer, and then do a sneaky layering
violation :).

A similar mechanism would come handy on the receive side, although I suspect the latency there is not really an issue.


Here's the simple version of the patch, that I'm running with.
I can also extend it so it's optionally settable, if you guys feel that
disabling nagle is not always a good idea.

--
Bozzo

-- snip --

Author: Bozo Dragojevic <bozzo.dragoje...@gmail.com>
Date:   Wed Mar 13 16:53:44 2013 -0400

    Disable nagle algorithm

diff --git a/proton-c/src/posix/driver.c b/proton-c/src/posix/driver.c
index 680ce6c..805a6ac 100644
--- a/proton-c/src/posix/driver.c
+++ b/proton-c/src/posix/driver.c
@@ -26,6 +26,7 @@
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
+#include <netinet/tcp.h>
 #include <netdb.h>
 #include <unistd.h>
 #include <fcntl.h>
@@ -252,6 +253,14 @@ static void pn_configure_sock(int sock) {
   };
   */

+  // disable nagle
+  int flag = 1;
+  int result = setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
+                          (char *) &flag, sizeof(int));
+  if (result < 0) {
+    perror("setsockopt");
+  }
+
     int flags = fcntl(sock, F_GETFL);
     flags |= O_NONBLOCK;

Reply via email to