I've run down a problem that causes a hang when very large networks are
executed.  It seems that the UI sends all the parameter assignments to the
exec before it starts reading replies back from the exec.  Each assignment
triggers a "Complete" message.  Eventually, these messages from the exec
to the UI fill the available buffer space, causing the exec to block trying
to write to the socket.  The UI keeps sending assignments, but the exec no
longer accepts them, since it's blocked.  Eventually, the UI also fills the
space available for buffering messages from the UI to the exec, so it
hangs too, and you have a deadlock.
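
To make the buffer-filling step concrete, here is a minimal sketch in C (not
the DX code).  It assumes a local AF_UNIX socketpair stands in for the UI/exec
connection, and the helper names bytes_until_full and demo_fill_socketpair are
made up for illustration.  It writes to one end while nothing reads the other
end, and reports how many bytes the kernel accepts before a blocking write
would have hung.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write to fd (whose peer never reads) until the
 * kernel refuses more data.  Returns the number of bytes accepted, or
 * -1 on error.  The fd is switched to non-blocking mode so that the
 * point where a blocking writer would hang shows up as EAGAIN. */
long bytes_until_full(int fd)
{
    char chunk[1024];
    long total = 0;
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;           /* a blocking write would hang here */
        return -1;
    }
}

/* Hypothetical driver: create a connected pair, fill one direction,
 * and return how much data fit before the writer would have blocked. */
long demo_fill_socketpair(void)
{
    int sv[2];
    long n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    n = bytes_until_full(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The number returned varies by system and by socket type.  The point is only
that it is finite: once both directions reach that limit with neither side
reading, both writers are stuck.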

The right fix is to cause the UI to interlace reading data from the exec
and writing data to the exec.  However, this involves a significant logic
change that I don't have time for now, so I've taken the quick fix to get
the needy user running - adding an environment variable allowing the user
to specify a larger socket buffer size (if the setsockopt call is
available), named DX_SOCKET_BUFSIZE.  The value assigned to it is passed as
the parameter to setsockopt for SOL_SNDBUF and SOL_RCVBUF.  The value used
is clamped to the max allowed by the system, and I've added a test calling
getsockopt to check that the value used is the value requested, and
produces an error if not.
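
For reference, a rough sketch of that kind of workaround follows.  It is not
the actual DX source: the function name apply_socket_bufsize and its return
codes are made up for illustration, and the check accepts any reported size
that is at least the requested one (an assumption, since some systems report
a larger value than was asked for).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read DX_SOCKET_BUFSIZE and apply it to both the
 * send and receive buffers of fd.
 * Returns  0 if both buffers are now at least the requested size,
 *          1 if the variable is unset or empty (nothing changed),
 *         -1 on a bad value, a failed call, or a too-small result. */
int apply_socket_bufsize(int fd)
{
    static const int opts[2] = { SO_SNDBUF, SO_RCVBUF };
    const char *s = getenv("DX_SOCKET_BUFSIZE");
    char *end;
    long req;
    int i;

    assert(fd >= 0);
    if (s == NULL || *s == '\0')
        return 1;

    errno = 0;
    req = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || req <= 0 || req > INT_MAX)
        return -1;

    for (i = 0; i < 2; i++) {
        int val = (int)req;
        int got = 0;
        socklen_t len = sizeof got;

        if (setsockopt(fd, SOL_SOCKET, opts[i], &val, sizeof val) < 0)
            return -1;
        /* Read the size back: the kernel may silently clamp the request. */
        if (getsockopt(fd, SOL_SOCKET, opts[i], &got, &len) < 0)
            return -1;
        if (got < val)
            return -1;
    }
    return 0;
}
```

Something like this would be called once on the socket right after it is
created and before any traffic flows, with a -1 result reported to the user.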

On Linux, the max buffer sizes can be seen in /proc/sys/net/core/rmem_max
and /proc/sys/net/core/wmem_max.  You can change them by adding

echo NNNNN > /proc/sys/net/core/rmem_max
echo NNNNN > /proc/sys/net/core/wmem_max

to /etc/rc.d/rc.local and rebooting, where NNNNN is the value you want
(Linux doubles the value passed to setsockopt to allow for bookkeeping
overhead, so it'll actually use twice NNNNN).  We use NNNNN=262144.

Greg
