Re: suggestion need to design an email system.

2008-09-17 Thread Jens Hoffrichter
Hello,

2008/9/17 Simon Matter <[EMAIL PROTECTED]>:
>> Another thing which really intrigued me was the inherent
>> cluster-ability of dovecot, which is a huge PITA to get to run on
>> cyrus (as I have just implemented it a couple of months ago). Yet I
>> only have read about it in the documentation, and not actually seen it
>> in action. But at least they thought about running on a clustered file
>> system
> Cyrus works quite well on a cluster with clustered filesystem. Do you talk
> about murder/replication or a 'simple' cluster?
Sorry, I really didn't make myself clear here.

I'm not talking about an active/passive cluster, where one cyrus
instance is running and another server takes over when the main one
dies, but about an active/active configuration, where several nodes
access the same meta databases and mailboxes, which reside on a SAN
with a clustered filesystem (in my case GFS).

And cyrus had some issues running in that configuration, which took us
a week or two to figure out and which were related to the compiled-in
berkeley db support. Some things regarding mmap file io, GFS and
mutexes don't really work together. After I compiled cyrus without
bdb support, it works like a charm, and we now have 5k mailboxes
on the 3 machine cluster, testing it for a couple of weeks before
we migrate the rest of the mailboxes :)

Dovecot seems to have thought about and addressed those issues around
concurrent access, and it just seems to play more nicely in a
heterogeneous environment than cyrus, which needs exclusive management
access to the mailstore. At least if you follow best practices;
everything else is a bit of a guessing game. Been there, done that
;)

Jens

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: suggestion need to design an email system.

2008-09-17 Thread Jens Hoffrichter
Heya :)

2008/9/17 Adam Tauno Williams <[EMAIL PROTECTED]>:

>> I noticed that dovecot and cyrus don't differ that much in speed to
>> each other. Both seem to excel at certain points, while being weaker
>> at another. But overall the performance on a huge mailbox seemed to be
>> comparable. Dovecot seemed to be slightly better at searching in the
>> mailbox, esp.
> Do you have SQUAT enabled on Cyrus?  Are you sure that the IMAP client
> in question sends searches to the server (aka: not using Thunderbird)?
Nope, I hadn't enabled squat there. It was a plain Debian default
install for all three mailservers. And I don't see these values as
the absolute truth, as they weren't measured in a carefully engineered
test environment, but on a vmware I put together for that purpose. But
I noticed certain trends there, and I will only talk about them
instead of absolute numbers :)

And yes, I am certain that the search was sent to the server, as it
was a test script written on top of the python imap library, and I
measured times in there.
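
The kind of measurement described above could look roughly like the
following. This is only a minimal sketch, not the original test script:
the helper names timed and timed_search are made up for illustration, and
conn is assumed to behave like an imaplib.IMAP4 object on which a mailbox
has already been selected.

```python
import time


def timed(fn, *args):
    """Call fn(*args) and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result


def timed_search(conn, criteria):
    """Time a server-side SEARCH and return (elapsed_seconds, hit_count).

    conn is expected to behave like imaplib.IMAP4, whose
    search(charset, *criteria) returns (status, [b"1 2 3"]).
    """
    elapsed, (status, data) = timed(conn.search, None, criteria)
    hits = data[0].split() if status == "OK" and data and data[0] else []
    return elapsed, len(hits)


# Hypothetical usage against a real server (host and login are placeholders):
#   import imaplib
#   conn = imaplib.IMAP4("localhost")
#   conn.login("testuser", "secret")
#   conn.select("INBOX")
#   print(timed_search(conn, 'SUBJECT "test"'))
```

Because the SEARCH command is issued by the script itself, the search is
guaranteed to run on the server and not in a client-side cache.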

Again, I repeat, I don't take those numbers at face value, but
rather wanted to get an overall impression of whether it would be useful
to pursue dovecot instead of cyrus (and because a colleague of mine
always emphasized the point that cyrus is the fastest OSS IMAP server
out there, and I wanted to see that for myself ;) )

>> What really intrigued me about dovecot was the ability to run on
>> standard mailbox formats, which may not be much of an issue when
>> running in a pure cyrus environment, but is a huge plus when migrating
>> from another server.
> I don't see why it is an advantage for migration.  Tools like imapsync
> don't care.
Certainly it doesn't do much for a smaller system. But today I
discussed with the same colleague something I added to a
migration process which added 1s per mailbox to the migration
process... And that became a huge issue. ;)

I don't think that an imap sync is the fastest way of migrating
a complete server.

But that is totally unrelated to the point I was trying to make, it
just came to my mind when talking about migration.

I really liked the fact that I could just stop courier on the vmware,
start dovecot and use the same mailstore without any conversion. It
would have been the same for an uw-imapd, with just a simple change in
the config file, instead of a conversion for all of my mailboxes. I
see that as a plus, and don't see anything negative there.

>> Especially the "self-healing indexes" which were
>> built on first use of a mailbox, and not using a reconstruct. So
>> getting dovecot to run was very simple. And I like programs which take
>> a common format, and don't think they need to re-invent the wheel.
> I actually view this as a strong negative as it incentivises people to
> muck about in the mailstore rather than using the tool-chain.   All
> operations on a "real" server (IMO) should be performed via a tool and
> not by hacking about beneath the service.
Well, that is an issue about access rights management instead of imap
server things. If I allow my users to muck directly in their
mailstorage directories, I need to live with the consequences. For me,
everything done to a mailbox should be done through my imap server. So
on my servers, users don't have direct access to their mailbox (if
they have a unix user at all).

The point I was trying to make here: Why does cyrus need its own
structure for the mailboxes, which is similar to, but not wholly
compatible with, maildir? Maildir and cyrus both suffer from the same
disadvantages (huge needs in terms of inodes etc.), yet I see no
distinctive advantage of the cyrus mailbox format over maildir.
On the contrary, I could restore a maildir mailbox completely, including
seen state, even if my metadata crashes.
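
The reason the seen state survives in maildir is that the flags are part
of each message's file name, after the ":2," marker. A minimal sketch of
reading them back (the function names and the example file name are made
up for illustration):

```python
def maildir_flags(filename):
    """Return the set of flag letters stored in a maildir file name.

    Maildir keeps per-message flags in the name itself, after ":2,".
    "S" means seen, "R" replied, "F" flagged, "T" trashed, "D" draft.
    """
    _, sep, info = filename.rpartition(":2,")
    return set(info) if sep else set()


def is_seen(filename):
    """True if the message file name carries the seen flag."""
    return "S" in maildir_flags(filename)


# Hypothetical file name as it might appear in a cur/ directory:
print(is_seen("1221638400.M1P2.mail1:2,RS"))
```

So a restore of the plain message files is enough to get the flags back,
without any separate index or metadata database.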

>> And the last thing is SASL. Dovecot needs no SASL, it brings
>> authenticators for a variety of sources, and offers postfix and exim
>> auth mechs as well. This may be a very personal thing, but if I can
>> work around SASL, I'm very, very glad about that. SASL may offer
>> everything you may need in a century of running a mailserver, but
>> getting it to run is just painful, and debugging is non-existent (at
>> least the last time I tried to implement it, which is a couple of
>> years ago. Since then I worked around the issue wherever I could).
> Again, I just disagree.  Most SASL operations these days, "just work".
> And when using something like GSSAPI SASL is a nicely known quantity.
>
> Documentation of SASL is generally pretty bad, but hey, this is Open
> Source.  In general *ALL* the documentation is crap. :)
Well, as I said, I have given up on SASL like 4 years ago. I tried to
implement a simple authentication against a postgres database for
postfix, and I fiddled around for 3 days without coming anywhere near
a success.

I ended up emulating the saslauthd protocol from an external program,
which accesses the database directly. That worked like a charm, and
was conside

Re: suggestion need to design an email system.

2008-09-17 Thread Jens Hoffrichter
Hi all,

I know I will take some heat for this (as I know this is a cyrus
mailinglist), but I'm going to offer some points for dovecot. I'm all
for informed decisions, where one can weigh points against each other
and decide on that basis.

First of all, I have administered both courier and cyrus imap in a
production environment (with cyrus scaling up way beyond 100k
mailboxes), and I have recently run some tests with dovecot (I
haven't seen it in production yet), which I found quite interesting.

I noticed that dovecot and cyrus don't differ that much in speed to
each other. Both seem to excel at certain points, while being weaker
at another. But overall the performance on a huge mailbox seemed to be
comparable. Dovecot seemed to be slightly better at searching in the
mailbox (esp. searching for common terms you have searched for before),
selecting single mailboxes and downloading all headers of a
mailbox, while cyrus seemed to be slightly better at getting the
structure of a mailbox (the "(FLAGS INTERNALDATE RFC822.SIZE ENVELOPE
UID BODYSTRUCTURE)" fetch command, which seems to be commonly used by
mail clients).

What really intrigued me about dovecot was the ability to run on
standard mailbox formats, which may not be much of an issue when
running in a pure cyrus environment, but is a huge plus when migrating
from another server. Especially the "self-healing indexes" which were
built on first use of a mailbox, and not using a reconstruct. So
getting dovecot to run was very simple. And I like programs which take
a common format, and don't think they need to re-invent the wheel.

Another thing which really intrigued me was the inherent
cluster-ability of dovecot, which is a huge PITA to get to run on
cyrus (as I have just implemented it a couple of months ago). Yet I
have only read about it in the documentation, and not actually seen it
in action. But at least they thought about running on a clustered file
system.

And the last thing is SASL. Dovecot needs no SASL, it brings
authenticators for a variety of sources, and offers postfix and exim
auth mechs as well. This may be a very personal thing, but if I can
work around SASL, I'm very, very glad about that. SASL may offer
everything you may need in a century of running a mailserver, but
getting it to run is just painful, and debugging is non-existent (at
least the last time I tried to implement it, which is a couple of
years ago. Since then I worked around the issue wherever I could).

Cyrus undisputedly has the broadest implementation of the IMAP protocol
in the open source world, especially regarding shared folders. If you
need that, there is no way around cyrus. It has a very broad user
base, and has proven itself to be quite solid in terms of scalability
and stability. Dovecot has yet to prove that (at least to me).

If I personally had the chance, I would give dovecot a shot, at least
in a testing environment. But probably mostly out of curiosity, and
because "It's new" ;) But except for the missing support for shared
mail folders, I see no real reasons against dovecot, at least not
against giving it a try.

And please don't take this as a personal insult to all hardcore cyrus
evangelists. I tried to be fair and unbiased, and after all, it is MY
PERSONAL OPINION. On this mailinglist, you don't need one more person
voting for cyrus, there are enough of those ;)

Jens



Fwd: Docs for Cyrus/GFS?

2008-09-08 Thread Jens Hoffrichter
There I told him to write to the list, and I just clicked the wrong
reply button myself ;)

Sorry for that.


-- Forwarded message --
From: Jens Hoffrichter <[EMAIL PROTECTED]>
Date: 2008/9/8
Subject: Re: Docs for Cyrus/GFS?
To: "Chris St. Pierre" <[EMAIL PROTECTED]>


Hi Chris,

2008/9/8 Chris St. Pierre <[EMAIL PROTECTED]>:

> I'm looking for any good documentation on using Cyrus IMAP with shared
> storage (in our case, GFS).  Thus far, all I've been able to turn up
> is a few snippets on mailing lists and in the wiki, but nothing really
> comprehensive.  Any pointers?  Thanks!
There is no real documentation up to now, I think, but I'm in the
process of implementing a quite big mail system on top of a GFS, so if
you have any specific questions, just ask here on the list or me
directly, though on the list would probably be better, to share the
answers.

Just some hints for things I stumbled upon up to now:

I suppose your meta directory will be shared, so you need to separate
out some directories per node, where each node can put
its own files. Directories that come to my mind are "proc" and "backup".
There are probably some more, but I have no access to the machine
right now to check it. For the separation I used the named
symlinks from GFS, which worked quite fine.

The other thing is: Make ABSOLUTELY sure that you turn off anything
related to bdb during compile (you probably need to compile your own
cyrus anyway, we are using the Invoca sources here), as berkeley db
just doesn't work on top of GFS. Or, at least not in a cyrus context.
And it isn't sufficient to just use skiplist in the config file, you
definitely need to switch off compiling bdb support.

The reason for this is that even if you don't have any bdbs
configured, cyrus will still initialize some sort of bdb environment
in your meta directory, and if the same thing is happening on another
node at the same time, it will go to sleep on a mutex,
waiting for the release of the mutex. Yet the wake-up
never comes when using GFS, so your process will hang indefinitely.
That one took us at least 3 weeks to figure out ;)

Apart from those little things, cyrus works on top of GFS just
fine. You probably need to tweak some tuning parameters, but I don't
remember them right now. If you need them, I can look them up for you.

The performance is nothing to brag about, though, cyrus and GFS just
don't play that well together. But I got the performance I needed
(around 15 logins per second with three nodes using a loadbalancer),
which is sufficient for our needs. The added reliability is nice,
though, as you can turn off one node, and the whole thing still works.

I don't know if you have any experience using RedHat Cluster, but be
absolutely sure to understand fencing and quorum of the cluster
before you start any configuration. Cluster software is by no means
easy to use, IMO, and everything you do needs some extra
thinking if you are normally used to working with single nodes :)

Just try your luck, and if you have any questions, just ask.

Jens



Re: Cyrus - can't create user mailbox

2008-06-09 Thread Jens Hoffrichter
Hi Stephen,

2008/6/9 Stephen Liu <[EMAIL PROTECTED]>:

> Thanks for your advice.
No problem - we all struggled at some point and were glad for help :)

> $ cat /etc/postfix/master.cf | grep smtp
> smtp  inet  n   -   -   -   -   smtpd
>
> smtp  unix  -   -   -   -   -   smtp
> relay unix  -   -   -   -   -   smtp
> #   -o smtp_helo_timeout=5 -o smtp_connect_timeout=5
> bsmtp unix  -   n   n   -   -   pipe
>  flags=Fq. user=bsmtp argv=/usr/lib/bsmtp/bsmtp -t$nexthop -f$sender
> $recipient
> * end *
>
> There are only 2 lines there with smtp in the beginning.
From this snippet you can't see whether chroot is enabled by default - the
default is denoted by the -, and is documented in the line directly
before the beginning of the transports. But as I know Debian and
Ubuntu, they have probably activated chroot.
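
To make the column lookup concrete, here is a small sketch that pulls the
chroot column out of a master.cf service line. The function name is made
up for illustration; the column order (service, type, private, unpriv,
chroot, wakeup, maxproc, command) is the one documented in the comment
header of master.cf. A "-" is returned as-is, because the built-in default
depends on the Postfix version and packaging and is not resolved here.

```python
def chroot_flag(master_cf_line):
    """Return the chroot column ('y', 'n' or '-') of a master.cf service line.

    Columns: service type private unpriv chroot wakeup maxproc command.
    Returns None for comments, blank lines and continuation lines
    (which start with whitespace).
    """
    if not master_cf_line.strip() or master_cf_line[0] in "# \t":
        return None
    fields = master_cf_line.split()
    if len(fields) < 8:
        return None
    return fields[4]


# The two kinds of lines from the snippet quoted above:
print(chroot_flag("smtp  inet  n   -   -   -   -   smtpd"))  # default
print(chroot_flag("bsmtp unix  -   n   n   -   -   pipe"))   # explicitly off
```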

> $ sudo nano /etc/cyrus.conf
>
> change both lines.
>
> changing;
> lmtpunix cmd="lmtpd" listen="/var/run/cyrus/socket/lmtp"
> prefork=0 maxchild=20
>
> as;
> lmtpunix cmd="lmtpd"
> listen="/var/spool/postfix/var/run/cyrus/socket/lmtp" prefork=0
> maxchild=20
>
>
> changing;
> notify  cmd="notifyd" listen="/var/run/cyrus/socket/notify"
> proto="udp" prefork=1
>
> as;
> notify  cmd="notifyd"
> listen="/var/spool/postfix/var/run/cyrus/socket/notify" proto="udp"
> prefork=1
I guess you won't need to change this line, as it is independent of postfix.

> Jun 10 00:36:30 lampserver postfix/smtpd[4955]: D21EA87820E:
> client=ti-out-0910.google.com[209.85.142.187]
> Jun 10 00:36:30 lampserver postfix/cleanup[4956]: D21EA87820E:
> message-id=<[EMAIL PROTECTED]>
> Jun 10 00:36:30 lampserver postfix/qmgr[4188]: D21EA87820E:
> from=<[EMAIL PROTECTED]>, size=1842, nrcpt=1 (queue active)
> Jun 10 00:36:30 lampserver postfix/lmtp[4958]: D21EA87820E:
> to=<[EMAIL PROTECTED]>, relay=none, delay=0, status=deferred
> (connect to /var/run/lmtp[/var/run/lmtp]: No such file or directory)
As you see here, it tries to connect to the socket "/var/run/lmtp",
but the cyrus default was /var/run/cyrus/socket/lmtp, or the postfix
chroot equivalent.

So EITHER you change the delivery socket in postfix (which I currently
don't know how to do, as I use a different delivery approach on my
postfix server - and a different IMAP server ;) ), OR you try either
/var/spool/postfix/var/run/lmtp or /var/run/lmtp in the lmtpunix line in
cyrus.conf.

I hope that helps :)

Regards,
Jens



Cyrus - can't create user mailbox

2008-06-09 Thread Jens Hoffrichter
This mail accidentally went off-list, so here is a resend.


-- Forwarded message --
From: Jens Hoffrichter <[EMAIL PROTECTED]>
Date: 2008/6/9
Subject: Re: Cyrus - can't create user mailbox
To: Stephen Liu <[EMAIL PROTECTED]>


Hi Stephen,

2008/6/9 Stephen Liu <[EMAIL PROTECTED]>:
>> Try to find out where your cyrus creates its lmtp socket and point
>> your
>> postfix config to it.
>>
>> I don't know Debian but I think it should come with some docs to get
>> things to work. (Ubuntu is mainly a copy of Debian so the same
>> should
>> apply there as well). Maybe some Debian/Ubuntu user can point you to
>> the
>> right docs.
>
>
> $ cat /etc/cyrus.conf | grep socket
> # UNIX sockets start with a slash and are absolute paths
> # (you must keep the Unix socket name in sync with imap.conf)
> lmtpunix cmd="lmtpd" listen="/var/run/cyrus/socket/lmtp"
> prefork=0 maxchild=20
> notify  cmd="notifyd"
> listen="/var/run/cyrus/socket/notify" proto="udp" prefork=1
> * end *
>
>
> Would it be /var/run/cyrus/socket/lmtp ?
Yep, it would be it.

But please note that the postfix smtpd under Debian (and probably
Ubuntu as well) runs in a chroot environment. You can see this by
looking in your /etc/postfix/master.cf file, look at the line with
smtp in the beginning and look in the right column for the chroot.
Which one that is should be documented in the top of the file.

If the smtpd runs in a chroot environment, it will expect the lmtp
socket relative to the chroot path, normally /var/spool/postfix, so
the correct complete path for the socket would be
/var/spool/postfix/var/run/cyrus/socket/lmtp ;) This caused me a lot
of headaches when configuring sasl, until I figured it out.
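
The path mapping for the lmtp socket can be written down as a one-liner. A
minimal sketch (the function name is made up for illustration; the chroot
directory is the usual /var/spool/postfix mentioned above, which may differ
on other installations):

```python
import posixpath


def path_inside_chroot(chroot_dir, path_seen_by_service):
    """Map a path as seen by a chrooted service to the real path on the host."""
    return posixpath.join(chroot_dir, path_seen_by_service.lstrip("/"))


# Where cyrus has to create the socket so the chrooted postfix can find it
# under /var/run/cyrus/socket/lmtp:
print(path_inside_chroot("/var/spool/postfix", "/var/run/cyrus/socket/lmtp"))
```

The leading slash is stripped before joining, because joining an absolute
path would otherwise discard the chroot prefix.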

Regards,
Jens



Re: Problems with load balancing cluster on GFS

2008-06-06 Thread Jens Hoffrichter
Hallo Klaus,

2008/6/6 Klaus Steinberger <[EMAIL PROTECTED]>:

>> I'm seeing some weird behaviour with the pop3 daemon on a GFS HA
>> cluster with load balancing.
>
> I would not advise running cyrus-imapd on top of GFS. GFS is even with the
> best tuning possible very slow regarding small files (the typical load type
> of a cyrus-imapd). GFS runs into heavy locking with that type of load. So
> don't do it.
Thanks for the advice, but currently I am tied to that setup, because
we are operating on a schedule and are close to going live with it.
And I just can't afford to redo everything at the moment. But I will
monitor performance very closely, will have a fallback plan if it just
doesn't do what I expect it to do, and I will start with a low load on
it. If you guys are interested in that setup, I will keep you updated
on how things progress :)

>  Current size of my Imap Server is 2500 users and currently 250 GByte of
> Mailboxes used (growing and growing).
Well, we will be talking about more than 50k mailboxes, so a single
machine is just out of the question. And some sort
of standby will be needed. I didn't do the concept for this system,
though, I'm just the one who has to implement it ;)

> I don't see how to avoid a murder setup if you need more than one machine
> running cyrus-imapd in parallel.
Well, there are other possibilities I have seen, especially together
with perdition and an LDAP server (which we have here anyways). But
that is more in the region of an active-passive setup instead of an
active-active setup. And I must admit that I don't know murder that
well, only that it logs very little into the logfiles when delivering
a mail ;)

I don't think I can easily go away from the current setup I'm working
on, but I will monitor it very closely. As I said in the other mail, I
have solved the problem I had, but the performance is behind my
expectations. So I will need to do some more testing to confirm if I
can go live with that cluster.

Regards,
Jens



Re: Problems with load balancing cluster on GFS

2008-06-06 Thread Jens Hoffrichter
Hello,

2008/6/6 Jorey Bump <[EMAIL PROTECTED]>:

> Yeah, it shouldn't lock with urandom. You might want to play around with
> poptimeout and popminpoll, to see if that has any effect on your load
> balancing test. Is jakarta-jmeter distributing these logins among enough
> different users to simulate real-world conditions? What do your imap/debug
> logs say when the lockup occurs?
Yes, I have configured jmeter to use all those 100 mailbox users in a
round robin fashion, so this should be close to a real world setup.

The log simply stops saying anything, especially about pop3 connections.

But I think I have solved the current problem:

The problem appears to be related to the Berkeley DB environment in
/var/lib/imap/db. Although I don't use that format, as all of the
databases are configured to use skiplist, cyrus still initializes the
environment on every connection. And if some other process has locked
the database, it does a futex call on the mmap region and goes to
sleep. The problem seems to be that with GFS it doesn't get a
signal that the database is unlocked, and stays asleep forever.

I discovered this today when I systematically strace'd (with strace
-p, which apparently sends some kind of signal to the process) all
pop3d processes on one of the hanging machines, and suddenly
everything started to work again, including the hanging node. A closer
examination told me that the process then does the futex call again,
unlocks that and just continues.

My solution for this is that I disabled bdb while compiling, and
everything works like a charm now, though the performance is not yet
where I expected it to be. But I'm not sure if that is down to my
loadbalancing test or the cluster config :)

> While I support POP3, I encourage all of my users to use IMAP, so I don't
> have many problems with pop3d (except for brute force attacks, which I
> solved by increasing sasl_minimum_layer, but that won't help you here).
Not an option here, the customer I'm building the cluster for supports
only POP3 to the outside, and IMAP only for the internal webmail app.
So POP3 HAS to run ;)

Regards,
Jens



Re: Problems with load balancing cluster on GFS

2008-06-06 Thread Jens Hoffrichter
Hello Jorey

2008/6/5 Jorey Bump <[EMAIL PROTECTED]>:

>> At first I thought that this was a problem related to entropy, but it
>> even persisted after I turned off "allowapop", and unconfigured
>> everything relating to TLS (as SSL/TLS will be handled completely by
>> the perdition, we don't need it)
>
> To rule it out completely, watch it during your test:
>
>  watch -n 0 'cat /proc/sys/kernel/random/entropy_avail'
>
> It might start blocking when it gets as low as 100 (healthy seems to be
> above 1000). If you're at the console (not a remote terminal), type on the
> keyboard to add entropy and see if this helps. If it does, you may have a
> cyrus-sasl that uses /dev/random (the default). Check the source RPM to
> verify, and adjust it to use /dev/urandom to stop the blocking.
Thanks for that hint, I didn't know that you could monitor available
entropy that way, that is very useful to know :)

But it doesn't seem to be related to entropy. Though on one of the
nodes entropy is usually quite low (between 100 and 300), it never
drops below the 100 mark. And when running a load test, that node and
another one failed, and one of the failing nodes had more than 3000
entropy available.
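
For logging these readings during a load test instead of watching them
interactively, something like the following could be used. It is only a
sketch: the function names are made up, and the thresholds are just the
rough numbers mentioned in this thread (below 100 may block, above 1000
seems healthy), not official limits.

```python
def read_entropy(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's available entropy estimate as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())


def entropy_status(value, low=100, healthy=1000):
    """Classify an entropy reading using the rough thresholds from this thread."""
    if value < low:
        return "low"
    if value > healthy:
        return "healthy"
    return "borderline"


# Hypothetical usage on a Linux node:
#   value = read_entropy()
#   print(value, entropy_status(value))
```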

To rule it out completely I started rngd on all the nodes, feeding
from /dev/urandom (I know, not perfect, but better than nothing ;) ),
but that didn't change anything. And I checked the compilation
settings for my cyrus-sasl package, it already takes /dev/urandom as
entropy source. So I think I can mostly rule it out.

But thanks for the input.

Regards,
Jens



Problems with load balancing cluster on GFS

2008-06-05 Thread Jens Hoffrichter
Hello everyone,

I hope this is the correct mailing list to post this problem on.

I'm seeing some weird behaviour with the pop3 daemon on a GFS HA
cluster with load balancing.

The general situation is as follows:

I have 3 servers here, each installed with CentOS 5.1 and the
latest RedHat cluster. On every server is a cyrus 2.3.12p2 from the
Invoca distribution.

The servers share two common partitions for data storage on a SAN,
one 1 GB partition mounted on /var/lib/imap, and one 1.2TB partition
mounted on /var/spool/imap. On the /var/lib/imap partition I have set
up the following directories so they point to individual directories
for each node: backup, proc and socket. The backup directory was
separated out because some cron.daily entries locked each other up
during the night, rendering the cluster useless.

In front of the three backend servers is a load balancer, which
balances pop3, imap, lmtp and timsieved on a round robin basis to each
node.

The load balancer is used (or will be used ;) ) by two perdition
servers which connect to the pop or imap port on the LB, which
distributes them to a running node.

The idea behind this is that we can shut down any node without a
noticeable service interruption, and we only have one backend system
instead of several. We want to migrate away from a murder based
setup, so any comments in that direction won't be very useful for me
at this stage ;)

The problematic behaviour I see at the moment:

I have migrated ~100 test mailboxes from the old backend system, and
I'm in the process of performing load tests on the new system to get
an impression of how the performance will be, and whether we are on the
right track. Of the mailboxes, around 80 are empty, 10 are medium filled
and 10 are filled to the maximum storage, which is about the
distribution we will be talking about after putting the system live.

The load test is performed with jakarta-jmeter from apache.org, which
chooses one of the mailboxes, and performs either a pop3 or imap
login to the backend, using the load balancer. The distribution is
roughly 5 pop3 logins for 1 imap login, at a rate of
about 5 logins/sec.
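
As an illustration of that mix (not the actual jmeter configuration), a
login schedule with 5 pop3 logins per imap login could be generated like
this. The function names and user names are made up, and cycling through
the users in order is an assumption about how the mailboxes are chosen.

```python
from itertools import cycle, islice


def login_plan(users, pop3_per_imap=5):
    """Return an endless iterator of (user, protocol) pairs.

    Users are taken in order and repeated; the protocol follows a fixed
    pattern of pop3_per_imap POP3 logins followed by one IMAP login.
    """
    protocols = cycle(["pop3"] * pop3_per_imap + ["imap"])
    return zip(cycle(users), protocols)


def first_logins(users, count, pop3_per_imap=5):
    """Return the first count entries of the login plan as a list."""
    return list(islice(login_plan(users, pop3_per_imap), count))


# Hypothetical test users:
for user, protocol in first_logins(["user001", "user002", "user003"], 6):
    print(user, protocol)
```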

After 30 to 60 seconds into the test, the pop3d on a random one of the
backend servers will stop working. It is still accepting connections,
but doesn't send a banner anymore. This is recognized by the load
balancer as "working" (as the port is still open), but one after
another all my connections will hit the malfunctioning server and the
test basically stalls.
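
A check that only opens the port cannot see this failure mode. A
banner-aware check would have to wait for the POP3 greeting, which per the
POP3 protocol starts with "+OK". The following is a minimal sketch of such
a check, not something the load balancer in this setup offers; the function
name, port default and timeout are assumptions.

```python
import socket


def pop3_banner_ok(host, port=110, timeout=5.0):
    """Return True only if the server sends a POP3 greeting starting with +OK.

    A plain TCP connect is not enough: a hung pop3d can still accept
    connections without ever sending its banner.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(512)
    except OSError:
        # Covers refused connections as well as the receive timeout.
        return False
    return banner.startswith(b"+OK")


# Hypothetical usage against one backend node:
#   print(pop3_banner_ok("backend1.example.com"))
```

With a check like this, a node whose pop3d accepts connections but stays
silent would be reported as down instead of "working".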

A restart of the cyrus service stops the problem for another 30 - 60
seconds. If I just stop the one offending server, so it won't be used
by the LB anymore, the test usually finishes without a problem.

At first I thought that this was a problem related to entropy, but it
even persisted after I turned off "allowapop", and unconfigured
everything relating to TLS (as SSL/TLS will be handled completely by
perdition, we don't need it).

My personal guess is that it is somehow related to the port tests by
the load balancer, as normally a connection from the load balancer is
the last thing I see in the log of the offending backend server. The
port tests are easily distinguishable, as the LB just opens a TCP
connection and instantly resets it before it reads any data from the
pop3d, not even waiting for a banner. After this happens, there are no
more log entries regarding pop3d, or log entries from the master that
it spawns new pop3 processes.

My second guess was that it is related to locking, but the IMAP server
just continues to run fine, and doesn't have a problem.

At the moment, I'm running out of ideas where to look, and my
knowledge about cyrus debugging is quite limited (never had such a
problem before ;) ), so any ideas or points how to debug the problem
would be appreciated.

Oh yes, I tried to strace the pop3d, and from the pop3d which
generates the last log entry normally comes a SIGPIPE, as the end
point isn't connected anymore to the pop3d.

It looks a bit like master doesn't recognize that there is a problem
regarding spawning off new children, and assigns new connections to a
dysfunctional pop3d.

Any ideas, hints, questions will be greatly appreciated, if
information is missing I will provide what I can :)

Thanks in advance!

Regards,
Jens
