Clear enough.
em(4) should be able to handle this amount of traffic without polling, unless all the syslog traffic arrives at the same time and congests some resource. That is why I want you to run the script to watch the CPU utilization when the drops happen. The average CPU use does NOT
reflect spikes.
If CPU utilization is lower than 60%, there is no need to worry about interrupts, recvspace, etc., because the CPU will have enough time to move data in and out. If you see CPU utilization over 60% and interrupt time is also over 60%-80%, then interrupt coalescing or polling needs to be considered.

At this moment, there is only one place where such drops can occur (6.0-RELEASE), and it has three conditions: see function sbappendaddr_locked() in kern/uipc_socket2.c, lines 934-942.
The three conditions involve recvspace, asa->sa_len, and the number of mbufs.
I doubt recvspace is the problem, since the sending size (maxdgram) is much smaller than
recvspace.
sa_len should not be the cause unless there is a bug in 6.0.
The last thing you can check is the mbufs: run "netstat -m" to see the mbuf statistics when the drops happen. Since you have a lot of spare CPU time, run the script I mentioned and have it also run "netstat -m" whenever the drop count increases. This should be a few minutes of programming work; then let it run for hours or a day.
If you can get that information, we may learn what is going on.
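
A rough sketch of that check, assuming a POSIX shell. The function names, the 5-second default interval, and the log path are illustrative choices only, and the awk pattern assumes the counter line reads "dropped due to full socket buffers" as in the netstat output quoted in this thread.

```sh
#!/bin/sh
# Sketch only: names, interval and log path are assumptions.

# Print the "dropped due to full socket buffers" counter from
# `netstat -s -p udp` output supplied on stdin.
full_buffer_drops() {
    awk '/dropped due to full socket buffers/ { print $1; exit }'
}

# Poll the counter; whenever it grows, append a timestamp, the old and
# new counts, and the mbuf statistics to the log file.
log_mbufs_on_drop() {
    interval=${1:-5}
    log=${2:-/var/tmp/udp-drops.log}
    prev=$(netstat -s -p udp | full_buffer_drops)
    while sleep "$interval"; do
        cur=$(netstat -s -p udp | full_buffer_drops)
        if [ "${cur:-0}" -gt "${prev:-0}" ]; then
            {
                date
                echo "full-socket-buffer drops: $prev -> $cur"
                netstat -m
            } >> "$log"
        fi
        prev=$cur
    done
}
```

Calling `log_mbufs_on_drop 5 /var/tmp/udp-drops.log` from the script would leave one timestamped "netstat -m" dump per interval in which the counter grew.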

We may have a bug, since I just rebooted my 6.0 box and already see a number of UDP drops; see below.

Belkin: netstat -p udp -s
udp:
       148 datagrams received
       0 with incomplete header
       0 with bad data length field
       0 with bad checksum
       0 with no checksum
       63 dropped due to no socket
       20 broadcast/multicast datagrams dropped due to no socket
       0 dropped due to full socket buffers
       0 not for hashed pcb
       65 delivered
       68 datagrams output
Belkin: netstat -p udp -s
udp:
       175 datagrams received
       0 with incomplete header
       0 with bad data length field
       0 with bad checksum
       0 with no checksum
       69 dropped due to no socket
       35 broadcast/multicast datagrams dropped due to no socket
       0 dropped due to full socket buffers
       0 not for hashed pcb
       71 delivered
       74 datagrams output
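
To compare two such snapshots without reading them line by line, something like the following could be used. It is a minimal sketch: the function name `counter_deltas` and the idea of saving each snapshot to a file are assumptions for illustration. For the two snapshots above it would report changes in the "no socket" counters but none for "full socket buffers", which stays at 0.

```sh
#!/bin/sh
# Sketch only: compare two saved `netstat -s -p udp` snapshots and
# print every counter that changed, with a signed difference.
counter_deltas() {
    awk '
        # Only lines of the form "  148 datagrams received" are counters.
        $1 ~ /^[0-9]+$/ {
            n = $1 + 0
            label = $0
            sub(/^[ \t]*[0-9]+[ \t]+/, "", label)
            # First file: remember the old value for this label.
            if (FNR == NR) { before[label] = n; next }
            # Second file: report the counter only if it changed.
            if ((label in before) && n != before[label])
                printf "%+d %s\n", n - before[label], label
        }
    ' "$1" "$2"
}
```

Usage would be along the lines of `netstat -s -p udp > before.txt`, waiting a while, `netstat -s -p udp > after.txt`, then `counter_deltas before.txt after.txt`.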

Imri Zvik wrote:

Hi,

1. The NIC being used is "Intel(R) PRO/1000" (the em(4) driver).
2. The CPU utilization on average is between 15% and 20%.
3. This machine is being used _only_ for syslogging - the database resides
on another server.

Meanwhile, I have added some more memory to the machine, and now it has 3GB of 
RAM, but I am still seeing packets being dropped due to full socket buffers.

Thanks,

--
Imri Zvik
PGP (2.6.3ia) Public Key: http://mariska.inter.net.il/~imriz/imriz.pgp

________________________________________
From: Jin Guojun [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 07, 2005 9:56 PM
To: Sean Chittenden
Cc: Imri Zvik; freebsd-performance@freebsd.org
Subject: Re: very busy syslog server

Sean Chittenden wrote:

I'm trying to set up a syslog server to serve a large group of
servers.  For the syslog daemon, I have chosen rsyslogd, and the
backend is mysql (on a different machine).

The machine has 2 Intel Xeon 2.80GHz CPUs, and 1GB of RAM, and it is
running FreeBSD 6 (6.0-STABLE).

The problem is that I see a lot of UDP packets being dropped:

udp:
       390202 datagrams received
       0 with incomplete header
       0 with bad data length field
       0 with bad checksum
       6 with no checksum
       0 dropped due to no socket
       0 broadcast/multicast datagrams dropped due to no socket
->>>    123677 dropped due to full socket buffers
       0 not for hashed pcb
       266525 delivered
       133260 datagrams output

I have tried to increase net.inet.udp.recvspace, but it didn't solve
the problem.

I would appreciate any hints or tips.

When you're doing a large number of packets per second, you may want
to look into enabling device polling(4).  Right now, every packet
results in an interrupt.  With device polling, you can handle more
than one packet per interrupt. See the man page for details.

Not quite: the interrupt interval depends on the device driver and on which NIC is used.
A number of NICs are able to do interrupt coalescing, which requires increasing the
buffer descriptor ring size (just for the receive buffer descriptors). Of course,
polling is a simple thing to try.

Before we can come up with a better solution for this case,
you also need to
monitor a few things:

What is the NIC on this machine?

What is the CPU utilization on average, and what is it when the packet drops occur? You can
simply write a
script to measure this instead of instrumenting the kernel (since this does not need to be
super accurate):

Run vmstat to record CPU utilization every 1 to 3 seconds, for use when the
following event happens:
use netstat to watch UDP and pipe it to awk, e.g. "netstat -s -p udp | awk '/dropped due to full socket buffers/ {print $1; exit}'",
every 3-5 seconds, and compare the result with the previous one to see if it has
changed. If so,
grab the last couple of lines from the recorded vmstat output.
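
For the vmstat half of this, a small sketch follows, assuming a POSIX shell. The function names and the log path are made up for illustration, and the parsing assumes the idle percentage is the last column of each vmstat data line (the "id" in the "us sy id" group).

```sh
#!/bin/sh
# Sketch only: names and log path are assumptions.

# For each vmstat data line on stdin, print CPU busy percent (100 - idle).
# Header lines are skipped because their last field is not a number.
cpu_busy() {
    awk '$NF ~ /^[0-9]+$/ { print 100 - $NF }'
}

# Show CPU busy percent for the last few samples in a vmstat log.
# $1 = log file, $2 = number of lines to look at.
recent_cpu_busy() {
    tail -n "${2:-5}" "${1:-/var/tmp/vmstat.log}" | cpu_busy
}
```

The log itself could be produced with `vmstat 2 > /var/tmp/vmstat.log &`, and `recent_cpu_busy /var/tmp/vmstat.log 5` called at the moment the drop counter changes.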

From your information, it seems that this machine has enough memory bandwidth
for syslog needs.
However, it is not clear whether this machine runs the syslog daemon, the SQL server, or
both.
If both are on the same machine, then you may run out of memory bandwidth. In that
case,
you need to obtain the packet rate and the average packet size in order to
determine the I/O
and memory bandwidth requirements.
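
As a back-of-the-envelope sketch of that calculation: the raw receive data rate is the packet rate times the average packet size. The function name and the example figures (5000 packets/s, 200 bytes) are hypothetical and not measurements from this machine. The result is only a lower bound on the memory traffic, since each packet's data is normally moved more than once on its way from the NIC to the application.

```sh
#!/bin/sh
# Sketch only: raw data rate in bits per second.
# $1 = packets per second, $2 = average packet size in bytes.
data_rate_bps() {
    echo $(( $1 * $2 * 8 ))
}

# Hypothetical example: 5000 packets/s at 200 bytes each.
data_rate_bps 5000 200
```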

   -Jin Guojun

_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
