Hi Daniel,

Ah ok. Before you sent your email I had already created a small patch for myself. It almost seems as if APR ignores the OS settings (e.g. net.core.rmem_default) and creates the socket with its own default receive buffer size.

Attached is a patch against 3.3.6 for lib/apr_net.c that stops the receive buffer errors for me.

The patch raises the receive buffer size to 1024000 bytes, although I'm not sure what a sensible size for gmond would be. I would expect a large cluster with lots of UDP traffic to need a bigger receive buffer than a smaller system.
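
For reference, the same idea with plain POSIX sockets instead of APR can be sketched as below. This is only an illustration, not the gmond or APR code: request_rcvbuf() is a hypothetical helper name. On Linux the kernel doubles the requested value for its own bookkeeping and silently caps it at net.core.rmem_max, so reading the value back is the only way to see what was actually granted.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Ask the kernel for a receive buffer of `requested` bytes on socket `fd`
 * and return the size the kernel reports afterwards, or -1 on error.
 * request_rcvbuf() is a hypothetical helper, not part of gmond or APR.
 */
int request_rcvbuf(int fd, int requested)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(requested > 0);

    /* The request may be capped without any error being returned. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) != 0)
        return -1;

    /* Read back what the kernel really assigned to the socket. */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) != 0)
        return -1;

    return granted;
}
```

Calling request_rcvbuf(fd, 1024000) on a fresh UDP socket and comparing the result with net.core.rmem_max would show whether a value like the one in the patch is honoured in full or capped.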

I will try out 3.3.7 and see what its debug output says about buffer sizes.


Kind regards,
- Ramon.


On 23-4-2012 14:40, Daniel Pocock wrote:


Hi Ramon,

Vladimir asked about similar errors on IRC recently.

I thought buffer sizes may be an issue, so the 3.3.7 release candidate
has logging of RX buffer sizes (it is logged at debug level when gmond
starts).  It may be interesting and helpful to compare those buffer
sizes, system defaults, etc, from your own systems and other people with
any similar problem.  Looking at the log output should also show you
whether or not gmond is using the values you tried to set at a system level.
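
A comparison like the one described above could be sketched as follows. This is not the logging code from the 3.3.7 release candidate, only an illustration with plain POSIX calls; current_rcvbuf(), read_proc_long() and log_rcvbuf() are hypothetical helper names, and the /proc/sys paths assume Linux.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return the receive buffer size the kernel reports for `fd`, or -1 on error. */
int current_rcvbuf(int fd)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) != 0)
        return -1;
    return size;
}

/* Read a single integer from a procfs file; return -1 if it cannot be read. */
long read_proc_long(const char *path)
{
    long value = -1;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the socket's buffer size next to the system-wide default and maximum. */
void log_rcvbuf(int fd)
{
    fprintf(stderr, "socket SO_RCVBUF=%d rmem_default=%ld rmem_max=%ld\n",
            current_rcvbuf(fd),
            read_proc_long("/proc/sys/net/core/rmem_default"),
            read_proc_long("/proc/sys/net/core/rmem_max"));
}
```

If the socket value matches rmem_default, the system setting is being used; if it differs, something set SO_RCVBUF explicitly on that socket.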

Regards,

Daniel

On 23/04/12 12:07, Ramon Bastiaans wrote:
This is with gmond version 3.3.1, with a simple udp_recv_channel set
like this:

udp_recv_channel {
   port = "8669"
}


- Ramon.

On 23-4-2012 12:03, Ramon Bastiaans wrote:
Hi,

While troubleshooting another network issue, I enabled the
netstats.py module to report "udp_rcvbuferrors".

Ironically, it seems to me as if gmond itself is experiencing udp
receive buffer errors.

When I check out /proc/net/udp for drops, amongst other things I see:

   sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
   51: 00000000:21DD 00000000:0000 07 00000000:00000000 00:00000000 00000000   103        0 72590718 2 ffff8803a1a5d140 6676

It shows a drop count of 6676 for a socket owned by uid 103.
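
The local port in that row is in hex: 21DD is 8669 decimal, the port of the udp_recv_channel. A minimal sketch for pulling port, uid and drops out of such a row (as it appears on a single line in /proc/net/udp) is below. The struct and function names are made up for illustration, and the field order is assumed to be the IPv4 layout shown in the header above.

```c
#include <assert.h>
#include <stdio.h>

/* The three fields of interest from one /proc/net/udp row (hypothetical type). */
struct udp_row {
    unsigned int port;    /* local port, converted from hex */
    unsigned int uid;     /* owner of the socket */
    unsigned long drops;  /* datagrams dropped on this socket */
};

/*
 * Parse one data row of /proc/net/udp into `out`.
 * Returns 0 on success, -1 if the line does not match (e.g. the header line).
 */
int parse_udp_row(const char *line, struct udp_row *out)
{
    int matched;

    assert(line != NULL && out != NULL);

    /* Fields marked with '*' are read and discarded:
     * sl, local ip, remote ip:port, st, tx:rx queue, tr:when, retrnsmt,
     * then uid is kept, timeout/inode/ref/pointer are skipped, drops is kept. */
    matched = sscanf(line,
                     " %*d: %*x:%x %*x:%*x %*x %*x:%*x %*x:%*x %*x"
                     " %u %*d %*u %*d %*s %lu",
                     &out->port, &out->uid, &out->drops);

    return matched == 3 ? 0 : -1;
}
```

Reading /proc/net/udp line by line with fgets() and printing every row with drops > 0 would give a quick overview of which sockets are losing datagrams.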

When I check out which process has this uid, it is gmond:

# ps -ef n | grep '103 '
      103  7800     1  0 10:32 ?        Ssl    0:04 /usr/sbin/gmond

I have tried tweaking some sysctl settings, increasing rmem for udp
and increasing the max_udp_message_len in gmond.conf but there seems
to be no effect.

Is this possibly a bug, or am I missing something and doing it wrong? ;)


Cheers,
- Ramon.



------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2



_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers

--
ing. R. Bastiaans, B.ICT
* Senior Systems Programmer
* Operations, Support and Development

SARA
Science Park 140     PO Box 94613
1098 XG Amsterdam NL 1090 GP Amsterdam NL
P.+31 (0)20 592 3000 F.+31 (0)20 668 3167

--- apr_net.c.old       2012-04-13 03:02:27.000000000 +0200
+++ apr_net.c   2012-04-23 15:00:57.839151626 +0200
@@ -202,6 +202,12 @@
       apr_socket_close(sock);
       return NULL;
     }
+  stat = apr_socket_opt_set(sock, APR_SO_RCVBUF, 1024000);
+  if (stat != APR_SUCCESS)
+    {
+      apr_socket_close(sock);
+      return NULL;
+    }
 
   if(!localsa)
     {

