Re: very busy syslog server

Jin Guojun [VFFS] Thu, 08 Dec 2005 11:01:36 -0800

Clear enough.

em(4) should be able to handle this amount of traffic without polling,
unless all the syslog traffic arrives at the same time, which could congest
resources. That is why I want you to run the script to watch the CPU
utilization when the drops happen. The average CPU use does NOT reflect
spikes.

If CPU utilization is lower than 60%, there is nothing to worry about with
interrupts, recvspace, etc., because the CPU will have enough time to move
data in and out. If you see CPU utilization over 60% and interrupt load is
also over 60%-80%, then interrupt coalescing or polling needs to be
considered.

At this moment, only one place, with three conditions, can cause such a drop
(6.0-Release); see function sbappendaddr_locked() in kern/uipc_socket2.c,
lines 934-942:

recvspace, asa->sa_len, and the number of mbufs.

I doubt recvspace is the problem, since the sending size (maxdgram) is much
smaller than recvspace.
sa_len should not be the cause unless we have a bug in 6.0.

The last thing you may check is the mbufs -- type "netstat -m" to see the
mbuf statistics when the drops happen. Since you have a lot of CPU time, try
to run the script I mentioned to you and add "netstat -m" to the condition
when the drop count increases. This should be a few minutes of programming
work; run it for hours or a day.

If you can get such info., we may know what is going on.
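
A minimal sketch of the "netstat -m" addition described above, assuming the drop counter is already being sampled elsewhere. The function name, the log file argument and the MBUF_CMD override are illustrative assumptions, not part of any existing script.

```sh
#!/bin/sh
# Sketch only: names and the MBUF_CMD override are illustrative assumptions.

# Command used to dump mbuf statistics; overridable for dry runs.
MBUF_CMD=${MBUF_CMD:-"netstat -m"}

# log_mbufs_if_dropped PREV CUR LOGFILE
# Appends a timestamped mbuf snapshot to LOGFILE when CUR > PREV.
# Returns 0 if a snapshot was written, 1 otherwise.
log_mbufs_if_dropped() {
    prev=$1
    cur=$2
    log=$3
    [ "$cur" -gt "$prev" ] || return 1
    {
        echo "=== $(date) drops: $prev -> $cur ==="
        $MBUF_CMD
    } >> "$log"
}
```

Called from a polling loop with the previous and current values of the "dropped due to full socket buffers" counter, this leaves one mbuf snapshot per observed increase in the log file.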

We may have a bug, since I just rebooted my 6.0 box and see a number of UDP
drops; see below.


Belkin: netstat -p udp -s
udp:
       148 datagrams received
       0 with incomplete header
       0 with bad data length field
       0 with bad checksum
       0 with no checksum
       63 dropped due to no socket
       20 broadcast/multicast datagrams dropped due to no socket
       0 dropped due to full socket buffers
       0 not for hashed pcb
       65 delivered
       68 datagrams output
Belkin: netstat -p udp -s
udp:
       175 datagrams received
       0 with incomplete header
       0 with bad data length field
       0 with bad checksum
       0 with no checksum
       69 dropped due to no socket
       35 broadcast/multicast datagrams dropped due to no socket
       0 dropped due to full socket buffers
       0 not for hashed pcb
       71 delivered
       74 datagrams output

Imri Zvik wrote:

Hi,

1. The NIC being used is "Intel(R) PRO/1000" (the em(4) driver).
2. The CPU utilization on average is between 15% and 20%.
3. This machine is being used _only_ for the syslogging - the database resides 
on another server.

Meanwhile, I have added some more memory to the machine, and now it has 3GB of 
RAM, but I am still seeing packets being dropped due to full socket buffers.

Thanks,

--
Imri Zvik
PGP (2.6.3ia) Public Key: http://mariska.inter.net.il/~imriz/imriz.pgp

________________________________________

From: Jin Guojun [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 07, 2005 9:56 PM

To: Sean Chittenden
Cc: Imri Zvik; freebsd-performance@freebsd.org
Subject: Re: very busy syslog server

Sean Chittenden wrote:

I'm trying to set up a syslog server to serve a large group of
servers.  For the syslog daemon, I have chosen rsyslogd, and the
backend is mysql (on a different machine).

The machine has 2 Intel Xeon 2.80GHz CPUs, and 1GB of RAM, and it is
running FreeBSD 6 (6.0-STABLE).

The problem is, that I see a lot of UDP packets being dropped:

udp:
       390202 datagrams received
       0 with incomplete header
       0 with bad data length field
       0 with bad checksum
       6 with no checksum
       0 dropped due to no socket
       0 broadcast/multicast datagrams dropped due to no socket
->>>    123677 dropped due to full socket buffers
       0 not for hashed pcb
       266525 delivered
       133260 datagrams output

I have tried to increase net.inet.udp.recvspace, but it didn't solve
the problem.

I would appreciate any hint or tips.

When you're doing a large number of packets per second, you may want
to look into enabling device polling(4).  Right now, every packet
results in an interrupt.  With device polling, you can handle more
than one packet per interrupt. See the man page for details.

Not quite; the interrupt interval depends on the device driver, or which NIC
is used.

A number of NICs are able to do interrupt coalescing, which requires
increasing the buffer descriptor ring size (just for the receive buffer
descriptors). Of course, polling is a simple thing to try.

Before we can come up with a better solution for this case, you also need to
monitor a few things:

What is the NIC on this machine?

What is the CPU utilization on average, and when the packet drops occur? You
can simply write a script to do this instead of instructing the kernel to do
so (since this does not need to be super accurate):

Run vmstat to record CPU utilization every 1 to 3 seconds, for use when the
following event happens:
use netstat to watch UDP and pipe it to awk,
"netstat -p udp -s | awk '/dropped due to full socket buffers/ {print $1; exit}'",
every 3-5 seconds, and compare the result with the previous one to see
whether it has changed. If so,
grab the last couple of lines from the recorded vmstat output.
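
The procedure above could be sketched roughly as follows. This is an illustration under assumptions rather than a tested tool: the function names, the default interval and the /tmp/vmstat.log path are made up for the example, and it presumes vmstat output is already being appended to that file.

```sh
#!/bin/sh
# Sketch only: function names, paths and intervals are illustrative assumptions.

# Print the "dropped due to full socket buffers" counter from
# `netstat -p udp -s` output supplied on stdin.
udp_full_drops() {
    awk '/dropped due to full socket buffers/ { print $1; exit }'
}

# watch_udp_drops [INTERVAL] [VMSTAT_LOG]
# Poll the counter every INTERVAL seconds (default 3). When it grows,
# print the change and the last few lines of the recorded vmstat output.
watch_udp_drops() {
    interval=${1:-3}
    vmlog=${2:-/tmp/vmstat.log}
    prev=$(netstat -p udp -s | udp_full_drops)
    while sleep "$interval"; do
        cur=$(netstat -p udp -s | udp_full_drops)
        if [ "${cur:-0}" -gt "${prev:-0}" ]; then
            echo "$(date): drops $prev -> $cur"
            tail -n 5 "$vmlog"
        fi
        prev=$cur
    done
}
```

One way to use it would be to start "vmstat -w 1 > /tmp/vmstat.log &" first and then call "watch_udp_drops 3 /tmp/vmstat.log"; whether vmstat flushes promptly when writing to a file is worth checking on the target system.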

From your information, it seems that this machine has enough memory bandwidth
for syslog needs, although it is not clear whether this machine is for the
syslog daemon, the SQL server, or both on the same machine.
If the third case is true, then you may run out of memory bandwidth. Under
this circumstance, you need to obtain the packet rate and the average packet
size in order to determine the I/O and memory bandwidth requirements.
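
As a rough illustration of that arithmetic, the sketch below turns two samples of a packet counter into a packet rate and then into payload throughput. The function names are hypothetical, and the average packet size has to be measured separately.

```sh
#!/bin/sh
# Sketch only: function names are illustrative, not from the thread.

# pkt_rate COUNT_BEFORE COUNT_AFTER SECONDS
# Packets per second between two samples of a packet counter.
pkt_rate() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN { printf "%.1f\n", (b - a) / t }'
}

# payload_mbits PPS AVG_BYTES
# Payload throughput in Mbit/s for a given packet rate and average size.
payload_mbits() {
    awk -v pps="$1" -v size="$2" 'BEGIN { printf "%.2f\n", pps * size * 8 / 1000000 }'
}
```

The result is only the network payload figure. Memory bandwidth demand would be some multiple of it, since each datagram is copied more than once on its way to the daemon; that multiplier is an assumption to be measured, not something stated here.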

   -Jin Guojun


_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
