Although I am not an expert on it, would this not be fixed by using qmail's
maildir functionality as outlined in
http://www.lifewithqmail.org/lwq.html#maildir-delivery ?
Peter
At 20:31 24/09/01 -0700, you wrote:
>Hello. A few weeks ago I found out after the fact that an important email
>someone had sent me had never gotten to me. I dismissed it at first,
>thinking it must be their crappy Microsoft mail client or outgoing mail
>server. Surely it couldn't be our UNIX-based mail server. ;^>
>
>But then it happened again, and then yet again, where I failed to receive
>important work emails. The three losses were from disparate senders and
>domains.
>
>Clearly the problem was on our side. I asked around and couldn't find any
>other instances of people losing mail, so my mail client (nmh) was under
>suspicion, but since its POP code has been pretty stable since about the
>mid-1980s (!), I decided to investigate the server side first.
>
>With a lot of log-surfing on the server, which is running qpopper 3.0.2 on
>Solaris 2.5, I figured out that the two most recent mail losses coincided
>with a mail server crash (unfortunately not all that rare an occurrence due
>to an apparent hardware problem we have yet to figure out).
>
>The crash didn't occur around the time of the mail delivery, so it was not
>sendmail messing up here. Instead, the crashes occurred during or shortly
>after POP3 number-of-messages queries by my mail-checking scripts. The
>"Stats:" line in the log file before the crash showed I had messages
>waiting, but the next one after the crash showed they were gone (and I know
>by the timing that I was not doing any message-pulling during these
>crashes).
>
>It looks to me like what's happening is that my scripts do a POP3 connect
>(which I do more often than anyone else, explaining why only _I_ have
>noticed mail loss), my spool is emptied out of /var/mail/<user> into
>/var/mail/.<user>.pop, the machine crashes, and then after the machine's
>back up again, my spool is zero-length and the temp_drop is overwritten by
>the first check.
>
>I didn't pore through the code exhaustively, but I couldn't find any code
>that would prevent this. Shouldn't there be code that would check for the
>pre-existence of the temp_drop file and merge its messages back into the
>spool before doing anything else??
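
A rough sketch of what such a recovery step might look like, in C. This is
not qpopper code: recover_leftover_drop() and both path arguments are
hypothetical names made up for illustration, and the file locking a real
server would need is left out. It appends whatever is in the leftover drop
file to the spool and only empties the drop once the spool copy has been
synced, so the recovered messages end up after any newer ones.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not qpopper code: append any messages left in
 * drop_path back onto spool_path, then empty the drop file.
 * Returns 1 if something was recovered, 0 if there was nothing to do,
 * -1 on error. Locking is omitted for brevity. */
int recover_leftover_drop(const char *spool_path, const char *drop_path)
{
    struct stat st;
    char buf[8192];
    ssize_t n;
    int in, out;

    if (stat(drop_path, &st) != 0 || st.st_size == 0)
        return 0;                      /* no usable drop file, or it is empty */

    in = open(drop_path, O_RDWR);
    if (in < 0)
        return -1;
    out = open(spool_path, O_WRONLY | O_APPEND | O_CREAT, 0660);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* Copy the drop onto the end of the spool, handling short writes. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            off += w;
        }
    }

    /* Only discard the drop once the spool copy has been synced. */
    if (n < 0 || fsync(out) != 0 || ftruncate(in, 0) != 0) {
        close(in);
        close(out);
        return -1;
    }
    close(in);
    close(out);
    return 1;
}
```

The idea would be to call something like this at the start of every POP
session, before the spool is moved into the drop file.
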
>
>As I understand things, the only way to prevent any possibility of
>overwriting an existing temp_drop file would be to do it atomically, with
>O_EXCL specified along with O_CREAT on the open() call. This is not being
>done in 3.0.2, nor has this been fixed in subsequent versions. Here's line
>1487 of qpopper 4.0.3's pop_dropcopy.c:
>
> dfd = open ( p->temp_drop, O_RDWR | O_CREAT, 0660 );
>
>This should be:
>
> dfd = open ( p->temp_drop, O_RDWR | O_CREAT | O_EXCL, 0660 );
>
>That change alone would prevent the mail loss that I'm seeing. Ideally,
>though, the code should also check errno, and if it's EEXIST, merge
>temp_drop's contents back into the mail spool so that the stranded
>messages are delivered rather than left sitting in the old file.
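
For what it's worth, the errno check described above could be sketched
roughly as follows. The open() flags are the ones from the quoted line;
create_drop_exclusive() and its leftover flag are hypothetical names for
illustration, not anything in qpopper.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical wrapper, not qpopper code. Creates the temp drop only if
 * it does not exist yet. On success returns the descriptor. If the file
 * is already there, returns -1 and sets *leftover to 1 so the caller can
 * merge it back into the spool before retrying. */
int create_drop_exclusive(const char *temp_drop, int *leftover)
{
    int dfd;

    *leftover = 0;
    dfd = open(temp_drop, O_RDWR | O_CREAT | O_EXCL, 0660);
    if (dfd < 0 && errno == EEXIST)
        *leftover = 1;   /* an earlier session left a drop file behind */
    return dfd;
}
```

A caller would treat leftover == 1 as "recover first, then try again" and
any other failure as an ordinary open() error.
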
>
>--
>Dan Harkless
>SpeedGate Communications, Inc.