I've studied your problems with considerable concern, and I must admit to being completely stymied. I have never seen this problem, and you are the only site that has reported it. Another site has reported a different class of problem on Solaris, which may have a common cause (see below).

It's pretty clear that the damage is due to some kind of buffer corruption within stdio that overwrites part of the line in the buffer with a fragment from some other index line. There is only one routine, and one fprintf(), that writes index file lines, and that routine cannot possibly write those lines.

The first corruption is 18-21 bytes (most likely 18); the second is probably 15 bytes, and the third is 11 (maybe 12) bytes.
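
For scanning other .mixindex files for the same kind of damage, a minimal sketch in C follows. It is not taken from the c-client source. The line shape it checks is an assumption inferred only from the intact sample lines quoted below: a leading colon, 8 hex digits, a 14-digit timestamp with a signed 4-digit zone, and five more 8-hex-digit fields, each followed by a colon. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: consume n hex digits and the ':' that follows them.
 * Returns 1 on success, 0 as soon as anything else is seen. */
static int hex_field(const char **p, int n)
{
    for (int i = 0; i < n; i++, (*p)++)
        if (!isxdigit((unsigned char)**p)) return 0;
    if (**p != ':') return 0;
    (*p)++;
    return 1;
}

/* Returns 1 if the line has the shape of the intact sample lines
 * (an assumption, see above), 0 otherwise. */
int mixindex_line_looks_sane(const char *line)
{
    assert(line != NULL);
    const char *p = line;

    if (*p != ':') return 0;                 /* leading colon */
    p++;
    if (!hex_field(&p, 8)) return 0;         /* first 8-hex-digit field */

    for (int i = 0; i < 14; i++, p++)        /* yyyymmddhhmmss */
        if (!isdigit((unsigned char)*p)) return 0;
    if (*p != '+' && *p != '-') return 0;    /* zone sign */
    p++;
    for (int i = 0; i < 4; i++, p++)         /* zone hhmm */
        if (!isdigit((unsigned char)*p)) return 0;
    if (*p != ':') return 0;
    p++;

    for (int f = 0; f < 5; f++)              /* remaining five hex fields */
        if (!hex_field(&p, 8)) return 0;

    while (*p == '\r' || *p == '\n') p++;    /* tolerate a line ending */
    return *p == '\0';
}
```

Run against the samples in the report, the two lines above and below each damaged line pass and the three damaged lines fail. A check like this only flags lines whose shape is wrong; it cannot tell whether a well-formed line holds the right values.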

You do say that these are physical disk drives attached to the IMAP server, and no NFS or other network filesystem is involved, correct? If NFS is involved, then all bets are off; however, I would expect a different form of corruption with NFS lossage.

Since this is probably heap corruption, try rebuilding the software with whatever malloc() debug package you have and then wait to see if you get an assert from that package. If it is a heap issue, then different systems will crap out in different ways.
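
To illustrate the idea behind a malloc() debug package, here is a minimal sketch in C. It is not any particular package and not part of imapd; the names guarded_alloc and guard_intact and the guard pattern are hypothetical. It places a few known bytes after each block and later checks whether something has written past the end of the block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8      /* number of guard bytes after each block */
#define GUARD_BYTE 0xA5   /* arbitrary pattern, chosen for illustration */

/* Allocate n usable bytes followed by GUARD_LEN guard bytes. */
void *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_LEN);
    if (p) memset(p + n, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Return 1 if the guard bytes after the n usable bytes are intact,
 * 0 if any of them has been overwritten. */
int guard_intact(const void *block, size_t n)
{
    assert(block != NULL);
    const unsigned char *g = (const unsigned char *)block + n;
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (g[i] != GUARD_BYTE) return 0;
    return 1;
}
```

A real debug package does much more (guards in front of the block, checks on every free(), tracking of freed memory), but the principle is the same: an overrun that would otherwise silently damage a neighbouring buffer trips a check instead.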

There is one other thing to try.  Look for the following code in imapd:
  if (setjmp (jmpenv)) {        /* die if a signal handler say so */
                                /* in case we get borked now */
    if (setjmp (jmpenv)) _exit (1);
                                /* need to close stream gracefully? */
    if (stream && !stream->lock && (stream->dtb->flags & DR_XPOINT))
      stream = mail_close (stream);
    ret = 1;                    /* set exit status */
  }
and replace it with:
  if (setjmp (jmpenv)) _exit (1);

Does that make the problem go away? Note that this will prevent any logout message in the syslog if the session gets killed or hung up.

The "UID ran backwards" messages from mixrbld do not indicate any problem with the rebuild itself and you can proceed; "duplicate UID" and other messages are much more serious. However, the fact that UIDs ran backwards indicates some unexpected weirdness with the data files, which leads me to believe that something else is going wrong on your system, perhaps an incorrect setting of the system clock. If I were you, after running mixrbld I would run mixcvt on the mailbox to make a new one with UIDs that don't go backwards.
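
As an illustration of what "ran backwards" means, the following C sketch counts how often a value is lower than the one before it in a sequence of index-style lines. It assumes that the first hex field of each line is the UID; that is an inference from the sample lines quoted below, where the field increases by one per line. The function name is hypothetical, and this is not how mixrbld itself performs the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count how many lines carry a leading hex value lower than that of the
 * previous parsable line. Lines that do not start with ':' followed by
 * a hex number are skipped. */
size_t count_uid_regressions(const char **lines, size_t n)
{
    assert(lines != NULL);
    size_t regressions = 0;
    unsigned long prev = 0;
    int have_prev = 0;

    for (size_t i = 0; i < n; i++) {
        const char *s = lines[i];
        char *end;
        unsigned long uid;

        if (s == NULL || s[0] != ':') continue;
        uid = strtoul(s + 1, &end, 16);      /* parse the first field as hex */
        if (end == s + 1 || *end != ':') continue;

        if (have_prev && uid < prev) regressions++;
        prev = uid;                          /* compare against the last value seen */
        have_prev = 1;
    }
    return regressions;
}
```

Equal values are deliberately not counted here, since a repeated UID is the separate and more serious "duplicate UID" case.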

On Mon, 17 Mar 2008, Dennis R. Kolpanen wrote:
Our IMAP server supports about 200 users.  The operating system is
FreeBSD 5.3.  The users have home directories on this machine, but no
shell access.  There is no POP access.  All user interaction is by
means of IMAP and their email client (Thunderbird and Outlook with a
slight advantage to Thunderbird.  A small amount of Squirrelmail.) The
mail is delivered into each user's /var/mail mailbox using procmail.
From this point on, all movement is under the control of IMAP.

Up until about 6 months ago, the user mailboxes were traditional UNIX
flat file mailboxes.  Except for some severe performance issues as some
of the old mailboxes approached the multi-gigabyte size, there were no
problems of corruption with the old mailboxes.

In September of 2007, I started converting everything into mix format.
The performance of the mix mailboxes is fantastic.  There have been
occasional problems, however.  The majority have been with corrupted
.mixindex files.  In the first several months, the problems were
corrected as soon as they were found, usually when the user
called to complain.  Comprehensive documentation was not kept, so
these incidents will not be discussed.  Documentation has been kept,
however, on the last three problems.

On 20 February, a user's INBOX .mixindex was corrupted in the
following way (two lines above and below the bad line are being
shown):

:000005e4:20070405104958-0400:00001370:46e6e89f:000e3fe0:0000002d:00000631:
:000005e5:20070405113021-0400:000425e8:46e6e894:00000000:0000002d:000007a3:
:000005e6:20070405113519-0400:0004:00000000:0000002d:615:0000002d:000005b9:
:000005e7:20070405114940-0400:00003c12:46e6e894:00042f6b:0000002d:0000066a:
:000005e8:20070405120232-0400:000227cd:46e6e894:00046baa:0000002d:000004f5:

On 04 March, a user's "Sent Items" .mixindex was corrupted in the
following way:

:0000099a:20070529164149-0400:000189e3:46e813b5:00096fc4:0000002d:000001d2:
:0000099b:20070529172503-0400:000004e2:46e813b5:000af9d4:0000002d:000002d5:
:0000099c:23:460:000004e2:400:00003b3f:46e813b5:000afee3:0000002d:00000244:
:0000099d:20070530105047-0400:00017fd7:46e813b5:000b3a4f:0000002d:000001cd:
:0000099e:20070530105244-0400:0000032f:46e813b5:000cba53:0000002d:00000258:

On 15 March, a user's INBOX .mixindex was corrupted in the following
way:

:00006a73:20080204085837-0500:0000930a:47a75a39:0003713c:0000002d:0000060f:
:00006a74:20080204092942-0500:0005453c:47a75a39:00040473:0000002d:000006f4:
:00006a2942-0500:0093605-0500:00000604:47a75a39:000949dc:0000002d:00000542:
:00006a76:20080204093739-0500:0000a6d0:47a75a39:0009500d:0000002d:00000736:
:00006a77:20080204094407-0500:00001ff1:47a75a39:0009f70a:0000002d:0000076f:

No system or power problems.  Three different users.  Two physical disc
drives.  In at least two of the cases, the mangled index line was
pointing to fairly old email.  I strongly suspect that the email in
question was not being looked at or deleted.

The bad index line in each case was fixed by restoring a clean version
of the line from a backup.  Everything then worked normally.

The mixrbld was first attempted in the first two cases.  The very large
number of "UID ran backwards" messages in the first two cases, 333 and
351, eliminated confidence in the rebuild and prompted the restoring of
the bad line from a backup.  In the third, recent, case, a mixrbld was
tried as an experiment after restoring the bad line from a backup.  In
this case, no error messages at all.

Two different IMAP versions were in use during the above mentioned
problems.  The first two happened when the release version of 2006k was
being used.  The last problem happened about ten days after version
2007 was installed.

Any ideas?

Dennis R. Kolpanen
Senior Staff Engineer
Kearfott Guidance & Navigation
_______________________________________________
Imap-uw mailing list
Imap-uw@u.washington.edu
https://mailman1.u.washington.edu/mailman/listinfo/imap-uw


-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.