Re: locking problems with 2.1.9

2002-11-18 Thread +archive . info-cyrus
--On Friday, November 8, 2002 7:49 PM -0500 Peter Krotkov 
<[EMAIL PROTECTED]> wrote:

| Prior to a code fix to address the problems you observed, do you think it
| would be unreasonable to configure master so that imaps is not offered?
| We could revert to running stunnel for ssl support and then take our

Could this also be an entropy issue?  On this Solaris 8 box, what are you 
using for /dev/random, anyway?  That Solaris patch?

Amos




Re: locking problems with 2.1.9

2002-11-08 Thread Peter Krotkov
On Fri, 8 Nov 2002, Lawrence Greenfield wrote:

>Date: Fri, 8 Nov 2002 11:04:32 -0500 (EST)
>From: Peter Krotkov <[EMAIL PROTECTED]>
> [...]
>22335:  imapd -s
> ff09b3bc read (0, 1c4bc8, 6f5)
> 0008e8c8 sock_read (0, 1c4bc8, 6f5, 8e8a0, 18edf8, 1) + 28
> 0008d670 BIO_read (1bd090, 1c4bc8, 6f5, 1c32a8, 1bccf0, 0) + d0
> 0007dec8 ssl3_read_n (5, 2010, 2010, 191b, 0, 0) + 148
> 0007e140 ssl3_get_record (1bb7e0, 1bccf0, 0, 0, 23138, ff0941d8) + 1e0
> 0007e8d4 ssl3_read_bytes (1bb7e0, 17, 1aa610, 1000, 0, 1bccf0) + 1d4
> 0007c6e8 ssl3_read (1bb7e0, 1aa610, 1000, 7c6a0, 19a1ac, 0) + 48
> 0006e730 SSL_read (1bb7e0, 1aa610, 1000, 1, ff0bd194, ffbec4b8) + 70
> 00060524 prot_fill (, 0, 1000, 19cfb0, ffbec7b8, 1) + 340
> 00060d8c prot_read (0, ffbec7b8, 1000, 19cfb0, 1, ffbec7b8) + 6c
> 00050894 message_copy_strict (0, 19cfb0, 8008c, eff8, 1a17a8, ff09c648) + 64
> 00044584 append_fromstream (ffbed830, 1a17a8, 9408c, 3cbce214, 1d0520, 1) + 14c
>
> This one looks like the one that's actually having the problem. If you
> kill this process, everything will return to normal.
>
> What caused this? Well, prot_fill() isn't supposed to call SSL_read if
> SSL_read is going to block. Unfortunately, it doesn't succeed in this
> case.
>
> Really, we should put the SSL socket into non-blocking mode and have
> some additional logic to make sure this doesn't happen. Since the prot
> layer itself is (generally) blocking, it's not totally trivial and we
> haven't done the work.
>
> Finally, there's the larger issue that we lock the mailbox during an
> APPEND which is a Bad Idea, since a client can be arbitrarily slow
> uploading data and thus creates a DoS for other clients. Avoiding this
> isn't probably that hard (the staging code used by lmtpd can probably
> be adapted by imapd) but we haven't done it, either.
>
> At the very least, I'd appreciate it if you open a bug on the SSL
> issue and include the backtrace on bugzilla.andrew.cmu.edu.
>
> Larry
>

Larry,

Thank you for your time and energy in investigating the problem.  I will
open a bug for the SSL issue along with a backtrace.

Prior to a code fix to address the problems you observed, do you think it
would be unreasonable to configure master so that imaps is not offered?
We could revert to running stunnel for ssl support and then take our
chances with clients that initiate starttls. Our client base has become
quite accustomed to the overall reliability of cyrus and would go
ballistic with even an occasional imapd/lmtpd going bonkers :-}.

Many thanks,

Pete




Re: locking problems with 2.1.9

2002-11-08 Thread Lawrence Greenfield
   Date: Fri, 8 Nov 2002 11:04:32 -0500 (EST)
   From: Peter Krotkov <[EMAIL PROTECTED]>
[...]
   22335:  imapd -s
ff09b3bc read (0, 1c4bc8, 6f5)
0008e8c8 sock_read (0, 1c4bc8, 6f5, 8e8a0, 18edf8, 1) + 28
0008d670 BIO_read (1bd090, 1c4bc8, 6f5, 1c32a8, 1bccf0, 0) + d0
0007dec8 ssl3_read_n (5, 2010, 2010, 191b, 0, 0) + 148
0007e140 ssl3_get_record (1bb7e0, 1bccf0, 0, 0, 23138, ff0941d8) + 1e0
0007e8d4 ssl3_read_bytes (1bb7e0, 17, 1aa610, 1000, 0, 1bccf0) + 1d4
0007c6e8 ssl3_read (1bb7e0, 1aa610, 1000, 7c6a0, 19a1ac, 0) + 48
0006e730 SSL_read (1bb7e0, 1aa610, 1000, 1, ff0bd194, ffbec4b8) + 70
00060524 prot_fill (, 0, 1000, 19cfb0, ffbec7b8, 1) + 340
00060d8c prot_read (0, ffbec7b8, 1000, 19cfb0, 1, ffbec7b8) + 6c
00050894 message_copy_strict (0, 19cfb0, 8008c, eff8, 1a17a8, ff09c648) + 64
00044584 append_fromstream (ffbed830, 1a17a8, 9408c, 3cbce214, 1d0520, 1) + 14c

This one looks like the one that's actually having the problem. If you
kill this process, everything will return to normal.

What caused this? Well, prot_fill() isn't supposed to call SSL_read if
SSL_read is going to block. Unfortunately, it doesn't succeed in this
case.

Really, we should put the SSL socket into non-blocking mode and have
some additional logic to make sure this doesn't happen. Since the prot
layer itself is (generally) blocking, it's not totally trivial and we
haven't done the work.
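
A minimal sketch of the non-blocking idea described above, in Python rather
than the C of the prot layer. It shows a plain socket put into non-blocking
mode and read under a deadline, so a peer that stalls mid-record cannot hold
the reader forever. The function name read_exact and its parameters are
hypothetical; this is not Cyrus code, and a real TLS read would also have to
handle the library's "want read" condition.

```python
import select
import socket
import time


def read_exact(sock, nbytes, timeout):
    """Read exactly nbytes from sock, or raise TimeoutError after timeout seconds."""
    sock.setblocking(False)
    deadline = time.monotonic() + timeout
    buf = bytearray()
    while len(buf) < nbytes:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                "peer stalled after %d of %d bytes" % (len(buf), nbytes))
        # Wait until data is available, but never past the deadline.
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            continue  # the loop re-checks the deadline and raises
        try:
            chunk = sock.recv(nbytes - len(buf))
        except BlockingIOError:
            continue  # spurious wakeup; wait again
        if not chunk:
            raise EOFError("connection closed mid-record")
        buf += chunk
    return bytes(buf)
```

With this shape the caller decides what a timeout means (for example,
dropping the connection and releasing any lock it holds) instead of sitting
in read() indefinitely.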

Finally, there's the larger issue that we lock the mailbox during an
APPEND which is a Bad Idea, since a client can be arbitrarily slow
uploading data and thus creates a DoS for other clients. Avoiding this
isn't probably that hard (the staging code used by lmtpd can probably
be adapted by imapd) but we haven't done it, either.
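
A rough sketch of the staging approach mentioned above, assuming a
hypothetical mailbox directory with a "stage." subdirectory and a lock file
named "lock" (both names are illustrative, not the actual Cyrus layout). The
slow part, receiving the upload, runs with no lock held; the lock is taken
only for the rename into place.

```python
import fcntl
import os
import tempfile


def staged_append(mailbox_dir, name, chunks):
    """Spool an upload to a staging file, then lock only for the rename."""
    stage_dir = os.path.join(mailbox_dir, "stage.")
    os.makedirs(stage_dir, exist_ok=True)
    fd, stage_path = tempfile.mkstemp(dir=stage_dir)
    with os.fdopen(fd, "wb") as stage:
        for chunk in chunks:  # may be arbitrarily slow; no lock is held here
            stage.write(chunk)
    final_path = os.path.join(mailbox_dir, name)
    with open(os.path.join(mailbox_dir, "lock"), "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # held only for the rename
        try:
            os.rename(stage_path, final_path)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return final_path
```

Because the staging file lives on the same filesystem as the mailbox, the
rename is atomic and the lock is held for a very short time regardless of how
slowly the client uploads.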

At the very least, I'd appreciate it if you open a bug on the SSL
issue and include the backtrace on bugzilla.andrew.cmu.edu.

Larry




Re: locking problems with 2.1.9

2002-11-06 Thread Lawrence Greenfield
   Date: Wed, 6 Nov 2002 14:07:11 -0500 (EST)
   From: Peter Krotkov <[EMAIL PROTECTED]>

   > Do the lmtpd acquire or are they _attempting_ to acquire the lock on
   > the cyrus.seen file?
   >
   > Are you using the seen_local backend instead of seen_db? This hasn't
   > been tested by us in a long time; we've been assuming everyone is
   > using seen_db.

Weird. You aren't using seen_local, which means that lmtpd should
never even be trying to acquire a lock on cyrus.seen (this file is
used read only exclusively).

   --with-duplicate-db=skiplist

just a warning, you'll probably experience performance problems using
the skiplist backend for duplicate delivery suppression. (If you have
duplicate delivery suppression turned off it probably doesn't matter.)

   We noticed this adventure happening yesterday and, in the end, 'master'
   was stopped and then started.  Not a single hiccup since then (about
   110,000 imap logins and 50,000 messages handed to lmtpd since midnight).
   Should this happen again I'll be sure to record the details concerning what
   process has/wants which locks.

Ok, that would be helpful.

Larry




Re: locking problems with 2.1.9

2002-11-06 Thread Peter Krotkov
On Wed, 6 Nov 2002, Lawrence Greenfield wrote:

>Date: Wed, 6 Nov 2002 09:04:56 -0500 (EST)
>From: [EMAIL PROTECTED]
>
>We are experiencing locking problems with cyrus 2.1.9 on a Solaris 8
>system using fcntl and skiplist (except flat for subscriptions).
>We've seen the following issues:
>
>  * Lmtpd's acquire a lock on a cyrus.seen file and never get it;
>they stack up as mail comes in.
>
> Do the lmtpd acquire or are they _attempting_ to acquire the lock on
> the cyrus.seen file?
>
> Are you using the seen_local backend instead of seen_db? This hasn't
> been tested by us in a long time; we've been assuming everyone is
> using seen_db.

./configure \
  --with-com_err \
  --prefix=/var/cyrus/local \
  --with-cyrus-prefix=/var/cyrus/local/cyrus \
  --with-cyrus-group=mail \
  --with-sasl=/var/cyrus/local \
  --with-openssl=/usr/local/ssl2 \
  --without-ucdsnmp \
  --with-dbdir=/usr/local/BerkeleyDB.3.3 \
  --with-libwrap=/usr/local \
  --with-duplicate-db=skiplist \
  --with-mboxlist-db=skiplist \
  --with-seen-db=skiplist \
  --with-subs-db=flat \
  --with-tls-db=skiplist

> Who is holding the lock? lsof can tell you.
>
> Larry

We noticed this adventure happening yesterday and, in the end, 'master'
was stopped and then started.  Not a single hiccup since then (about
110,000 imap logins and 50,000 messages handed to lmtpd since midnight).
Should this happen again I'll be sure to record the details concerning what
process has/wants which locks.

Thank you for your time,

Pete




Re: locking problems with 2.1.9

2002-11-06 Thread Lawrence Greenfield
   Date: Wed, 6 Nov 2002 09:04:56 -0500 (EST)
   From: [EMAIL PROTECTED]

   We are experiencing locking problems with cyrus 2.1.9 on a Solaris 8
   system using fcntl and skiplist (except flat for subscriptions).
   We've seen the following issues:

 * Lmtpd's acquire a lock on a cyrus.seen file and never get it;
   they stack up as mail comes in.

Do the lmtpd acquire or are they _attempting_ to acquire the lock on
the cyrus.seen file?

Are you using the seen_local backend instead of seen_db? This hasn't
been tested by us in a long time; we've been assuming everyone is
using seen_db.

Who is holding the lock? lsof can tell you.

Larry




Re: locking problems with 2.1.9

2002-11-06 Thread Lawrence Greenfield
   Date: Wed, 6 Nov 2002 14:02:52 -0200
   From: Henrique de Moraes Holschuh <[EMAIL PROTECTED]>

   On Wed, 06 Nov 2002, John Wade wrote:
   > I assume you are using flat seen files.  If so, I ran into this
   > problem on 2.0.16 and came up with a workaround which others
   > ported to 2.1.3.  This was based on flock, but you might be able
   > to use the same basic technique.  see
   > http://servercc.oakton.edu/~jwade/cyrus/

   Yeah, Debian has that patch applied (and forward ported to 2.1.9,
   both fcntl and flock), files lib/lock*...  It works wonderfully.  I
   have never received any reports of seen file lock troubles in the
   Debian packages.

   I believe it is also in the CMU Bugzilla.

I haven't seen any evidence of this happening on non-Linux platforms,
nor have I heard that there is ever a time when the file gets replaced
even though it is locked.

If it is merely that a process which already holds a lock blocks while
trying to get the same lock, then it is a kernel problem.

Could we have more detail on any other failures?

Larry




Re: locking problems with 2.1.9

2002-11-06 Thread Henrique de Moraes Holschuh
On Wed, 06 Nov 2002, John Wade wrote:
> I assume you are using flat seen files.   If so, I ran into this problem on
> 2.0.16 and came up with a workaround which others ported to 2.1.3.   This
> was based on flock, but you might be able to use the same basic
> technique.   see http://servercc.oakton.edu/~jwade/cyrus/

Yeah, Debian has that patch applied (and forward ported to 2.1.9, both fcntl
and flock), files lib/lock*...  It works wonderfully.  I have never received
any reports of seen file lock troubles in the Debian packages.

I believe it is also in the CMU Bugzilla.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh



Re: locking problems with 2.1.9

2002-11-06 Thread John Wade
Hi Pete,

I assume you are using flat seen files.   If so, I ran into this problem on
2.0.16 and came up with a workaround which others ported to 2.1.3.   This
was based on flock, but you might be able to use the same basic
technique.   see http://servercc.oakton.edu/~jwade/cyrus/

The flat file locking code is very strangely broken. I attributed it to
linux kernel problems since I could never reproduce the exact scenario that
I saw in my gdb stack traces.  Others however reported this problem on
enough other platforms (including solaris) that I think the bug is in the
cyrus code.   It will take a far better C programmer than I to track it
down.  What I saw was that the initial process that held the lock that
everyone else was waiting on was invariably an imapd process and it was
trying to lock a file that it already had a lock on.  Meanwhile, even
though the file was locked, other processes had managed to replace it.
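
One common guard against the "file replaced while locked" situation
described above is to compare the locked descriptor with what the path
currently names, and reopen if they differ. The sketch below is an
illustration in Python under that assumption; lock_reopen here is a
hypothetical helper, not the code in the Cyrus lib/lock* files.

```python
import fcntl
import os


def lock_reopen(path):
    """Open and lock path, retrying if the file was replaced while we waited."""
    while True:
        f = open(path, "r+b")
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            on_disk = os.stat(path)
        except FileNotFoundError:
            f.close()
            raise
        ours = os.fstat(f.fileno())
        if (ours.st_dev, ours.st_ino) == (on_disk.st_dev, on_disk.st_ino):
            return f  # still locked; the caller closes f to release it
        f.close()  # we locked a stale copy that was renamed over; try again
```

The check only helps if every writer replaces the file by rename (as a
leftover cyrus.seen.NEW suggests) and every reader goes through the same
lock-then-verify step.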

The workaround I came up with is to have all attempts at file locks time
out rather than wait indefinitely.  This kills the initial imapd process
that has the problem, and the lmtpd's etc. are no longer blocked.   For us,
this happens between one and three times a day.  (The patch I created logs
it to syslog.)
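
The timeout idea can be sketched roughly as follows. This is a Python
illustration, not the patch itself: it polls a non-blocking flock until a
deadline, and the name lock_with_timeout and the poll interval are
assumptions made for the example.

```python
import errno
import fcntl
import time


def lock_with_timeout(fileobj, timeout, poll=0.05):
    """Try for an exclusive flock; return True if acquired, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # Anything other than "someone else holds it" is a real error.
            if exc.errno not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A caller that gets False back can log the failure and give up, which is what
keeps one stuck process from blocking every later delivery.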

Hope this helps,
John

[EMAIL PROTECTED] wrote:

> We are experiencing locking problems with cyrus 2.1.9 on a Solaris 8
> system using fcntl and skiplist (except flat for subscriptions).
> We've seen the following issues:
>
>   * Lmtpd's acquire a lock on a cyrus.seen file and never get it;
> they stack up as mail comes in.
>   * In syslog we see 'IOERROR: reading message: unexpected end of file'
>   * In various partition's 'stage.' directory we see hundreds of
> messages stacked up waiting for - surprise - users who seem to
> be having the locking issues.
>   * Some users have cyrus.seen.NEW lying around in their folders.
>
> The above problems exist for only a handful of users; the other 12k
> users seem to be user'ing along without difficulty.  But when the
> other 18k users move to this box it might get worse...
>
> All users were transferred from a different Solaris system (cyrus
> 1.5.27)  to this new one using rsync (mail/folder dirs, quota,
> subscriptions).
>
> Any pointers or suggestions would be helpful!