On Wed, May 15, 2024 at 09:09:47PM +0200, Erin Shepherd wrote:
> It seems absent from the BSDs, but on Linux you can pass the MSG_MORE
> flag to send() to override TCP_NODELAY for a specific write

Am I understanding correctly that this is a variant of TCP_NOPUSH/TCP_CORK?
"More data is coming, don't push the send button yet!"
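
To make the per-write hint concrete, here is a minimal sketch, assuming
Linux semantics for MSG_MORE. The helper name send_hint() is hypothetical
and is not taken from BIRD or OpenBGPD; on platforms without MSG_MORE it
falls back to a plain send().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from BIRD or OpenBGPD).
 * more != 0 tells the kernel that further data will follow shortly,
 * so it may hold back a partial segment; more == 0 is a plain send().
 */
ssize_t send_hint(int fd, const void *buf, size_t len, int more)
{
    int flags = 0;

    assert(buf != NULL || len == 0);
#ifdef MSG_MORE
    if (more)
        flags |= MSG_MORE;   /* Linux-specific per-call hint */
#else
    (void)more;              /* no MSG_MORE here: behave like send() */
#endif
    return send(fd, buf, len, flags);
}
```

A caller would pass more=1 for every message except the last one queued
for that peer, and more=0 for the final one.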

In OpenBGPD, TCP_NODELAY is set on the socket (a socket option available
on all platforms, I think?), and then all data is coalesced into
sendmsg(), so there is no need for 'corking'. From my limited testing it
seems a full routing table should fit in ~41,000 TCP packets.

BIRD has a code path sk_sendmsg()->sendmsg() called from
sk_maybe_write(); but based on my limited testing I'm not sure this path
is followed in all cases, because I see way more than 41K packets for a
full table feed (with TCP_NODELAY enabled).

Perhaps there are two separate questions here:

- are BGP messages (slightly) delayed because of TCP_NODELAY not being
  set? (I think yes)
- are BGP messages coalesced as efficiently as possible, into as few TCP
  packets as possible? (with TCP_NODELAY set, I am not sure)

Kind regards,

Job

ps. To clarify why I started this thread: last week I fell into the TCP
subsystem rabbit hole: why are things the way they are? I started
auditing various programs related to my $dayjob and thought it would be
good to open a conversation with the BIRD developer community. My goal
is not necessarily to get this patch merged 'as-is', but to learn from
and with friendly and respected BGP developers.
