I recently upgraded a couple of servers from postfix 2.2 to 2.5.
There were no configuration changes except those made by the upgrade scripts.

Now, during large mailings, the two new servers have frequent qmgr
crashes, while the ones running 2.2 do not.  The problem is qmgr runs
up against the per-process open filehandle limit of 1024:

  postfix/qmgr[21445]: fatal: fcntl F_DUPFD 128: Too many open files

What I'm trying to understand is *why* it's hitting the limit.

These servers are configured to have a maximum of 960 smtp processes.
On the 2.2 servers, under heavy load while a new mailing is being
submitted, qmgr generally has a little over 960 open filehandles, so
I assume (but do not know for sure) that during heavy activity it
keeps one socket open to each smtp process.

On the 2.5 servers, however, it tends to climb slowly towards 900-ish
and then suddenly spike up to 1024 and die, and I don't know what it
wants all those extra filehandles for.

I wrote a script to monitor how many open filehandles qmgr and scache
have, and to save lsof output to a file when it gets above 900.  Here
is typical output from a postfix 2.2 server:

2008-10-30 15:16:26 qmgr:   972  scache:   310
2008-10-30 15:16:36 qmgr:   970  scache:   479
2008-10-30 15:16:46 qmgr:   973  scache:   518
2008-10-30 15:16:50 qmgr:   974  scache:   538
2008-10-30 15:17:05 qmgr:   971  scache:   583
2008-10-30 15:17:09 qmgr:   970  scache:   593

... and here's a qmgr crash on one of the upgraded postfix 2.5 servers:

2008-10-30 14:41:05 qmgr:   840  scache:   907
2008-10-30 14:41:09 qmgr:   845  scache:   927
2008-10-30 14:41:18 qmgr:   860  scache:   898
2008-10-30 14:41:22 qmgr:   864  scache:   919
2008-10-30 14:41:57 qmgr:   904  scache:   851
2008-10-30 14:42:01 qmgr:   903  scache:   876
2008-10-30 14:42:06 qmgr:   909  scache:   885
2008-10-30 14:42:10 qmgr:    11  scache:   930
2008-10-30 14:43:14 qmgr:   632  scache:   845

The qmgr crash in this case was logged at 14:42:09,
so qmgr spiked from 909 to 1024 in about 3 seconds.
That's typical.
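
For anyone who wants to reproduce the measurements, a monitor of this kind
can be sketched roughly as follows.  This is an illustration only, not the
exact script used for the samples above.  It assumes Linux with /proc
mounted, enough privilege (root) to read another user's /proc/<pid>/fd,
and lsof on the PATH; the function names, the 5-second interval and the
output file naming are made up for the example.

```python
import os
import subprocess
import time

THRESHOLD = 900               # save lsof output above this many descriptors
NAMES = ("qmgr", "scache")    # process names to watch

def find_pids(name, proc_root="/proc"):
    """Return sorted PIDs whose command name in /proc/<pid>/stat equals name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # The second field of stat is the command name in parentheses.
        comm = stat[stat.find("(") + 1:stat.rfind(")")]
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)

def count_open_fds(pid, proc_root="/proc"):
    """Count entries in /proc/<pid>/fd; 0 if the process is gone or unreadable."""
    try:
        return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))
    except OSError:
        return 0

def format_sample(counts, when=None):
    """Format one sample line in the same layout as the output shown above."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    fields = "  ".join("%s: %5d" % (name, counts[name]) for name in NAMES)
    return "%s %s" % (stamp, fields)

def save_lsof(pid, directory="."):
    """Write 'lsof -p <pid>' output to a timestamped file (hypothetical naming)."""
    path = os.path.join(directory, "lsof-%d-%d.txt" % (pid, int(time.time())))
    with open(path, "w") as out:
        subprocess.run(["lsof", "-p", str(pid)], stdout=out, check=False)
    return path

if __name__ == "__main__":
    while True:
        counts = {}
        for name in NAMES:
            pids = find_pids(name)
            per_pid = [(p, count_open_fds(p)) for p in pids]
            counts[name] = sum(n for _, n in per_pid)
            for p, n in per_pid:
                if n > THRESHOLD:
                    save_lsof(p)
        print(format_sample(counts), flush=True)
        time.sleep(5)
```

Counting entries in /proc/<pid>/fd is much cheaper than running lsof on
every sample, so lsof is only invoked once a process is over the threshold.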

Saved lsof output for qmgr from the last sample looks like this:

qmgr    10477 postfix    0u   CHR                1,3               2176 /dev/null
qmgr    10477 postfix    1u   CHR                1,3               2176 /dev/null
qmgr    10477 postfix    2u   CHR                1,3               2176 /dev/null
qmgr    10477 postfix    3r  FIFO                0,7         1061266654 pipe
qmgr    10477 postfix    4w  FIFO                0,7         1061266654 pipe
qmgr    10477 postfix    5u  unix 0x0000010222b41980         1061266544 socket
qmgr    10477 postfix    6u  FIFO               0,18         1061266542 /var/spool/postfix/public/qmgr
qmgr    10477 postfix    7u  sock                0,4         1098866379 can't identify protocol
qmgr    10477 postfix    8r  0000                0,8       0 1098866385 eventpoll
qmgr    10477 postfix    9r   DIR               0,18 1286400      38353 /var/spool/postfix/incoming
qmgr    10477 postfix   10u  unix 0x000001006c70ac80         1100120907 socket
qmgr    10477 postfix   12r  0000                0,8       0 1061266532 eventpoll
qmgr    10477 postfix   14u  sock                0,4         1098866686 can't identify protocol
qmgr    10477 postfix  128u  unix 0x000001004c68f640         1100103289 socket
qmgr    10477 postfix  129u  unix 0x0000010023874380         1100117731 socket
 ...
qmgr    10477 postfix 1012u  unix 0x000001021833b080         1100120331 socket
qmgr    10477 postfix 1013u  unix 0x00000100916a1940         1100120998 socket
qmgr    10477 postfix 1014u  unix 0x0000010083283c40         1100116997 socket
qmgr    10477 postfix 1015u  unix 0x000001013d373c40         1100120955 socket
qmgr    10477 postfix 1016u  unix 0x00000100537f6640         1100121209 socket

This is just what it always looks like, except with a lot more unix sockets.
I don't know how to determine what each socket is talking to.

The number of smtp processes never exceeds 960, and there are usually
2 smtpd processes (this server does not handle incoming mail).

Any explanation for why, with postfix 2.5, qmgr occasionally tries to
use a bunch of extra filehandles?  Or something I can do to find out?
  -- Cos
