Wietse Venema:
> Wietse Venema:
> > Devdas Bhagat:
> > > The last error messages I get are these:
> > > Sep 8 13:54:37 jaundiced-outlook postfix/smtp[7998]: warning: problem
> > > talking to service private/scache: Connection timed out
> > > Sep 8 13:54:37 jaundiced-outlook postfix/smtp[20375]: warning: problem
> > > talking to service private/scache: Connection timed out
> > > Sep 8 13:54:37 jaundiced-outlook postfix/smtp[7960]: warning: problem
> > > talking to service private/scache: Connection timed out
> > > Sep 8 13:54:37 jaundiced-outlook postfix/smtp[17618]: warning: problem
> > > talking to service private/scache: Connection timed out
> > > <snip about 600 similar lines about this problem>
> > > Sep 8 14:10:56 jaundiced-outlook postfix/master[11125]: fatal: watchdog
> > > timeout
> > > Sep 8 14:10:56 jaundiced-outlook postfix/qmgr[13568]: fatal: watchdog
> > > timeout
> >
> > I think that the kernel is running out of steam.
> >
> > Try reducing the concurrency.
> >
> > The master daemon triggers qmgr and pickup regularly. That "trigger"
> > write is non-blocking with a timeout of 1 second, so it cannot block the
> > master daemon. Except of course when the kernel is messed up.
>
> Hmm, except that write_buf() will retry the write() after an EAGAIN
> error. So to be really smart, write_buf() should watch the clock and
> break the loop when the time expires.
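The clock-watching idea above could look roughly like the sketch below. It is not Postfix code: write_buf_deadline(), now_ms() and set_nonblocking() are hypothetical names, poll() stands in for Postfix's write_wait(), and the timeout is assumed to be in seconds and to cover the whole call rather than each individual wait. EAGAIN is still retried, but only while time remains.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in milliseconds; unaffected by changes to the wall clock. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: non-blocking mode is where write() can fail with EAGAIN. */
int set_nonblocking(int fd)
{
    int     flags = fcntl(fd, F_GETFL);

    return (flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK));
}

/*
 * Hypothetical write_buf_deadline(): like write_buf(), except that
 * `timeout` (seconds, must be > 0) bounds the whole call, so EAGAIN
 * retries stop once the clock runs out.
 */
ssize_t write_buf_deadline(int fd, const char *buf, ssize_t len, int timeout)
{
    const char *start = buf;
    long long deadline;
    ssize_t count;

    assert(timeout > 0);
    deadline = now_ms() + (long long) timeout * 1000;
    while (len > 0) {
        long long left = deadline - now_ms();
        struct pollfd pfd = {.fd = fd, .events = POLLOUT};
        int     ready;

        if (left <= 0) {
            errno = ETIMEDOUT;
            return (-1);
        }
        /* Wait for writability, but never beyond the overall deadline. */
        if ((ready = poll(&pfd, 1, (int) left)) < 0) {
            if (errno == EINTR)
                continue;
            return (-1);
        }
        if (ready == 0) {
            errno = ETIMEDOUT;
            return (-1);
        }
        if ((count = write(fd, buf, len)) < 0) {
            if (errno == EAGAIN || errno == EINTR)
                continue;               /* next pass re-checks the deadline */
            return (-1);
        }
        buf += count;
        len -= count;
    }
    return (buf - start);
}
```

Because the remaining time is recomputed on every pass, a receiver that stops reading makes the call fail with ETIMEDOUT after about `timeout` seconds instead of spinning on EAGAIN.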

If this is the problem, the workaround would be to break the
loop after EAGAIN. That would keep the master from timing out.
You'd still have a deadlocked qmgr for 1000 seconds, though.

Wietse

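A trigger write that follows the "break the loop after EAGAIN" workaround might be sketched as below. trigger_once() and the wakeup byte 'W' are hypothetical, not Postfix's actual trigger code; the sketch assumes the descriptor is the write end of a pipe or FIFO and that the timeout is in seconds. It makes a single attempt and reports failure, relying on the periodic sender to try again later.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical trigger_once(): send a one-byte wakeup on fd, giving up
 * after at most `timeout` seconds. There is deliberately no retry loop:
 * a full pipe (poll() timeout or EAGAIN) means the receiver is not
 * draining its triggers, and the periodic sender will try again later.
 */
int trigger_once(int fd, int timeout)
{
    struct pollfd pfd = {.fd = fd, .events = POLLOUT};
    char    wakeup = 'W';               /* arbitrary byte for this sketch */
    int     flags;

    /* Non-blocking mode: write() fails with EAGAIN instead of sleeping. */
    if ((flags = fcntl(fd, F_GETFL)) < 0
        || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return (-1);
    if (poll(&pfd, 1, timeout * 1000) <= 0)
        return (-1);                    /* timed out, or poll() failed */
    if (write(fd, &wakeup, 1) != 1)
        return (-1);                    /* EAGAIN included: fail fast */
    return (0);
}
```

With a stuck receiver whose pipe is full, the sender gets -1 back after about `timeout` seconds and can carry on; the receiver itself remains stuck, which is why a deadlocked qmgr would still be visible.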
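ssize_t write_buf(int fd, const char *buf, ssize_t len, int timeout)
{
    const char *start = buf;
    ssize_t count;

    while (len > 0) {
        /* Wait until fd is writable; gives up after `timeout` seconds. */
        if (timeout > 0 && write_wait(fd, timeout) < 0)
            return (-1);
        if ((count = write(fd, buf, len)) < 0) {
#if 0
            /*
             * Disabled: retrying after EAGAIN can loop without bound.
             * EAGAIN now falls through to the error return below.
             */
            if (errno == EAGAIN && timeout > 0)
                continue;
#endif
            if (errno == EINTR)
                continue;
            return (-1);
        }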