Two bugs in 2.3.10-rc2 that cause segfaults

2007-10-16 Thread Rob Mueller

Hi Ken

After some testing on our test server, we rolled out 2.3.10-rc2 onto one of 
our production IMAP stores and almost immediately starting noticing 
segfaults for some users.


Bron and I spent quite a bit of time this afternoon debugging, and we're 
pretty sure we found both the causes.


1. In index_parse_sequence(), your realloc() doesn't multiply by 
sizeof(struct seq_range), so if you have a sequence list with more than 
about 8 items, it will start corrupting memory


2. The signed -> unsigned changes in the prot stream stuff weren't as 
innocent as they looked and created a subtle and nasty bug. You might want 
to audit the other code where you changed signed -> unsigned as well...


Here's an example of the problem if you use IDLE and close the connection on 
it. In that case, what happens is:


In cmd_idle(), we wait for data from the client by calling:

   c = getword(imapd_in, &arg);

getword() calls prot_getc(), which looks like this:

#define prot_getc(s) ((s)->cnt-- > 0 ? (int)*(s)->ptr++ : prot_fill(s))

Say there's no data in the prot buffer, then cnt == 0. However when we call 
the above macro, it does cnt--, which makes cnt wrap to 4294967295, and then 
calls prot_fill. prot_fill() calls read(), and if read() returns <= 0 (eg 
other end closed the connection) it does.


   if (n <= 0) {
   if (n) s->error = xstrdup(strerror(errno));
   else s->eof = 1;
   return EOF;
   }

So it sets the eof flag and returns EOF.

Back in cmd_idle, we get the EOF, and exit out of cmd_idle and back into the 
main loop, which does.


   /* Parse tag */
   c = getword(imapd_in, &tag);

Now getword calls the prot_getc() macro again... but uh, oh. cnt is 
definitely > 0 now (it's 4294967295) and so we reading through unitialised 
memory trying to interpret it as IMAP commands until we hit a segfault 
somewhere.


Our fix is to change prot_getc() to only decrement cnt if it's non-zero. 
Because it's an expression macro, it uses the C comma operator which I've 
found often even experienced C people don't know about 
(http://www.eskimo.com/~scs/cclass/int/sx4db.html).


#define prot_getc(s) ((s)->cnt > 0 ? (--(s)->cnt, (int)*(s)->ptr++) : 
prot_fill(s))


Both patches are here:

http://cyrus.brong.fastmail.fm/patches/cyrus-fix-segfaults-2.3.10.diff

Rob



Re: delayed delete bug?

2007-10-16 Thread Ken Murchison

Bron Gondwana wrote:

On Mon, Oct 15, 2007 at 03:25:28PM +0200, Rudy Gevaert wrote:

Hi,

I'm busy looking at delayed delete.  I'm using unix hierarchy seperator.  I 
deleted a mailbox and see it like this:


kavula.ugent.be> lm
DELETED/user/rudy.gevaert/Foo/[EMAIL PROTECTED] (\HasNoChildren)
user/rudy.gevaert/[EMAIL PROTECTED] (\HasNoChildren)
user/rudy.gevaert/[EMAIL PROTECTED] (\HasNoChildren)
user/rudy.gevaert/[EMAIL PROTECTED] (\HasNoChildren)
user/[EMAIL PROTECTED] (\HasChildren)

However the file system location is:
/var/cyrus/imap/domain/u/ugent.be/u/DELETED/user/rudy^gevaert/Foo/471364C1/


Yeah, that's a "feature" - you see, hashing is done by whatever's after
the first dot.  In theory you don't have to care.

We wrote our userhash patch so the locations would be:

/var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/DELETED.user.rudy^gevaert.Foo.471364C1/
/var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/user.rudy^gevaert
/var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/user.rudy^gevaert.Inbox2
/var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/user.rudy^gevaert.Trash
/var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/user.rudy^gevaert.Trash2

Which makes much more sense to me and has the nice property that a
directory is either a "trunk" or a "leaf" but not both - it contains
either purely sub directories or purely mailbox contents.


I would have thought it would be under domain/u/ugent.be/DELETED/user/...

Is something going wrong here?


No.


I've run cyr_expire and it cleans it up nicely!

Also, is it correct to undelete a deleted folder I need to rename it? (This 
works! And gets replicated).


Yeah, it all works fine.  It's just an odd location.  I got over that
pretty quickly given that my Perl libraries know where it is and Cyrus
knows where it is so I just let them do the legwork.


So, is it your opinion that this doesn't need to be fixed, or shouldn't 
be fixed?


--
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: Two bugs in 2.3.10-rc2 that cause segfaults

2007-10-16 Thread Ken Murchison

Applied to CVS.  Thanks for hunting these down.


Rob Mueller wrote:

Hi Ken

After some testing on our test server, we rolled out 2.3.10-rc2 onto one 
of our production IMAP stores and almost immediately starting noticing 
segfaults for some users.


Bron and I spent quite a bit of time this afternoon debugging, and we're 
pretty sure we found both the causes.


1. In index_parse_sequence(), your realloc() doesn't multiply by 
sizeof(struct seq_range), so if you have a sequence list with more than 
about 8 items, it will start corrupting memory


2. The signed -> unsigned changes in the prot stream stuff weren't as 
innocent as they looked and created a subtle and nasty bug. You might 
want to audit the other code where you changed signed -> unsigned as 
well...


Here's an example of the problem if you use IDLE and close the 
connection on it. In that case, what happens is:


In cmd_idle(), we wait for data from the client by calling:

   c = getword(imapd_in, &arg);

getword() calls prot_getc(), which looks like this:

#define prot_getc(s) ((s)->cnt-- > 0 ? (int)*(s)->ptr++ : prot_fill(s))

Say there's no data in the prot buffer, then cnt == 0. However when we 
call the above macro, it does cnt--, which makes cnt wrap to 4294967295, 
and then calls prot_fill. prot_fill() calls read(), and if read() 
returns <= 0 (eg other end closed the connection) it does.


   if (n <= 0) {
   if (n) s->error = xstrdup(strerror(errno));
   else s->eof = 1;
   return EOF;
   }

So it sets the eof flag and returns EOF.

Back in cmd_idle, we get the EOF, and exit out of cmd_idle and back into 
the main loop, which does.


   /* Parse tag */
   c = getword(imapd_in, &tag);

Now getword calls the prot_getc() macro again... but uh, oh. cnt is 
definitely > 0 now (it's 4294967295) and so we reading through 
unitialised memory trying to interpret it as IMAP commands until we hit 
a segfault somewhere.


Our fix is to change prot_getc() to only decrement cnt if it's non-zero. 
Because it's an expression macro, it uses the C comma operator which 
I've found often even experienced C people don't know about 
(http://www.eskimo.com/~scs/cclass/int/sx4db.html).


#define prot_getc(s) ((s)->cnt > 0 ? (--(s)->cnt, (int)*(s)->ptr++) : 
prot_fill(s))


Both patches are here:

http://cyrus.brong.fastmail.fm/patches/cyrus-fix-segfaults-2.3.10.diff

Rob





--
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: delayed delete bug?

2007-10-16 Thread Rudy Gevaert

Ken Murchison wrote:

So, is it your opinion that this doesn't need to be fixed, or shouldn't 
be fixed?




Te be honest, I would like to see it fixed.

Thanks in advance,

Rudy


Re: delayed delete bug?

2007-10-16 Thread Bron Gondwana

On Tue, 16 Oct 2007 18:55:55 +0200, "Rudy Gevaert" <[EMAIL PROTECTED]> said:
> Ken Murchison wrote:
> 
> > So, is it your opinion that this doesn't need to be fixed, or shouldn't 
> > be fixed?
> > 
> 
> Te be honest, I would like to see it fixed.
> 
> Thanks in advance,
> 
> Rudy

Bringing back some context:

>> I'm busy looking at delayed delete.  I'm using unix hierarchy seperator.  I
>> deleted a mailbox and see it like this:
>>
>> kavula.ugent.be> lm
>> DELETED/user/rudy.gevaert/Foo/[EMAIL PROTECTED] (\HasNoChildren)
>> user/rudy.gevaert/[EMAIL PROTECTED] (\HasNoChildren)
>> user/rudy.gevaert/[EMAIL PROTECTED] (\HasNoChildren)
>> user/rudy.gevaert/[EMAIL PROTECTED] (\HasNoChildren)
>> user/[EMAIL PROTECTED] (\HasChildren)
>>
>> However the file system location is:
>> /var/cyrus/imap/domain/u/ugent.be/u/DELETED/user/rudy^gevaert/Foo/471364C1/
>>
>> I would have thought it would be under domain/u/ugent.be/DELETED/user/...

Just to clarify exactly what you want, do you:

a) think that the DELETED folders should not be hashed? (your example)
b) think they should be hashed based on the username rather than into 'u'?
c) think something entirely different should be done with them?

My (c) is as I posted in my followup:

> We wrote our userhash patch so the locations would be:
>
> /var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/DELETED.user.rudy^gevaert.Foo.471364C1/
> /var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/user.rudy^gevaert
> /var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/user.rudy^gevaert.Inbox2
> /var/cyrus/imap/domain/u/ugent.be/user/rudy^gevaert/user.rudy^gevaert.Trash

So that all the folders related to the same user are in the same directory
and have dots in their names as separators rather than being subdirectories
of each other.  This is obviously only one of many options, and has the
downside that it's a completely new filesystem layout that requires a rehash
run for every folder rather than just affecting a few deleted folders.

Bron.
-- 
  Bron Gondwana
  [EMAIL PROTECTED]



Re: Two bugs in 2.3.10-rc2 that cause segfaults

2007-10-16 Thread Bron Gondwana

On Tue, 16 Oct 2007 12:22:28 -0400, "Ken Murchison" <[EMAIL PROTECTED]> said:
> Applied to CVS.  Thanks for hunting these down.

I tell you what - we were about 1/4 grumpy at you and 3/4 grumpy at the C
language for being so sharp and unfriendly.  Watching our least loaded
"real store" segfaulting left, right and centre after we rolled this out
to real users, and knowing we either had to downgrade the indexes and
roll back, or figure this thing out.  Nothing like pressure to make you
read those gdb traces.

(downgrading the indexes isn't actually insanely painful, it would have been
5 minutes work to rewrite my little Perl utility to switch back to version 9
format complete with pattern matching the GUID and deciding if it needed zeroing
or not, and another 10 minutes or so to run it over all the mailboxes.  It's
like admitting defeat though!)

Bron.
-- 
  Bron Gondwana
  [EMAIL PROTECTED]