Re: queue processing problem

2001-06-13 Thread Charles Cazabon

David Gartner <[EMAIL PROTECTED]> wrote:
> 
> 4 machines (3 nodes, running qmail, mounting /home from NFS server.  1 NFS
> server--running IDE RAID 5)  All four machines: have 64M of ram and a 633
> MHz proc, have qmail installed, accept smtp and pop3 connections and all
> have the same /home (again, mounted over NFS).  Vpopmail is installed on the
> NFS server, in the home directory (local mail is put in
> /home/vpopmail/domains/whatever.com/).  /var on each machine is separate, so
> they each have a separate queue. No special concurrency settings.  tcpserver
> is accepting 150 connections on pop/smtp at a time.  Load balancers in the
> front of these four machines send traffic to the least congested (the nfs
> server gets less traffic than the other three).
> 
> Now, my question is: do you think this can support a small ISP (10,000 users)
> efficiently or should we go with special settings and/or think about
> faster/better hardware?  Do you think this leaves room for expansion?

If I was setting it up, I'd probably make the NFS server a separate box (not
accepting any SMTP or POP3 connections), probably running on SCSI RAID instead
of IDE.  The faster the SCSI setup, the better, of course.  Additional memory
in the NFS server would also be a benefit, and at a cost of USD$50 for 256MB
of ECC PC133 SDRAM, it's hard to justify the business case of _not_ purchasing
one or two extra sticks.

The only other concern I would have would be that if one of your SMTP/POP
toasters dies, you lose the contents of the queue on that machine, since
they're running a single IDE disk for the queue.  If this concerns you,
perhaps upgrade each of those machines to IDE RAID.

Can three toasters and one NFS server handle 10,000 users?  Probably, but it
depends a lot on what those users are doing.  If they're mailing 20MB
attachments to the net at large on a regular basis (or even worse, to each
other), and they're each connected 24/7 and POP-checking their mail every
minute, your systems might fall over rather quickly.  If they're mostly dialup
users connected an hour or two a day, sending a few 5k messages each, and only
POP-ing their mail every 15 minutes, maybe your current setup is already
overkill.
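
To put rough numbers on those two extremes, here is a back-of-the-envelope
sketch in Python.  It assumes users are spread evenly across the day and takes
1.5 hours as the midpoint of "an hour or two"; the function name and both
assumptions are illustrative only, not measurements from any real system.

```python
def pop_checks_per_second(users, hours_online_per_day, poll_interval_s):
    """Average POP3 logins per second.

    Assumes (hypothetically) that online time is spread evenly over the day,
    so the fraction of users online at any moment is hours_online / 24.
    """
    online_fraction = hours_online_per_day / 24.0
    return users * online_fraction / poll_interval_s


# Heavy case: 10,000 users connected 24/7, polling every minute.
heavy = pop_checks_per_second(10_000, 24, 60)
# Light case: dialup users online about 1.5 hours a day, polling every 15 minutes.
light = pop_checks_per_second(10_000, 1.5, 15 * 60)

print(f"always-on, polling every minute:  {heavy:.1f} logins/s")  # 166.7
print(f"dialup, polling every 15 minutes: {light:.2f} logins/s")  # 0.69
```

Under these assumptions the two user populations differ by a factor of about
240 in POP3 login rate alone, which is why the answer depends so heavily on
what the users actually do.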

You said you were worried -- I wouldn't be.  Is the current setup working for
you?  Are the toasters frequently hitting their concurrency limits?  Do you
have the headroom to raise those limits?  Is the NFS server coping with the
current load?

Remember, with a modular architecture like you're using, you can always add
additional toasters in the future, feeding off the same central NFS server.
If you grow to the point that you can't handle it with a single PC-based NFS
server, a NetApp or similar might be within your reach at that point.

Charles
-- 
---
Charles Cazabon<[EMAIL PROTECTED]>
GPL'ed software available at:  http://www.qcc.sk.ca/~charlesc/software/
Any opinions expressed are just that -- my opinions.
---



Re: queue processing problem

2001-06-13 Thread Russell Nelson

David Gartner writes:
 > 4 machines (3 nodes, running qmail, mounting /home from NFS server.  1 NFS
 > server--running IDE RAID 5)

Switch to SCSI drives and you can do it with one machine.

-- 
-russ nelson <[EMAIL PROTECTED]>  http://russnelson.com
Crynwr sells support for free software  | PGPok | 
521 Pleasant Valley Rd. | +1 315 268 1925 voice | #exclude 
Potsdam, NY 13676-3213  | +1 315 268 9201 FAX   | 



Re: queue processing problem

2001-06-13 Thread David Gartner

Charles (and everyone else),

I saw this thread and it made me kinda worried about a cluster we're fixing to
send out.  Here's a brief description of how it works:

4 machines (3 nodes, running qmail, mounting /home from NFS server.  1 NFS
server--running IDE RAID 5)  All four machines: have 64M of ram and a 633 MHz
proc, have qmail installed, accept smtp and pop3 connections and all have the
same /home (again, mounted over NFS).  Vpopmail is installed on the NFS server,
in the home directory (local mail is put in
/home/vpopmail/domains/whatever.com/).  /var on each machine is separate, so they
each have a separate queue. No special concurrency settings.  tcpserver is
accepting 150 connections on pop/smtp at a time.  Load balancers in the front of
these four machines send traffic to the least congested (the nfs server gets less
traffic than the other three).

Now, my question is: do you think this can support a small ISP (10,000 users)
efficiently or should we go with special settings and/or think about
faster/better hardware?  Do you think this leaves room for expansion?

Kinda distressed,

David




Re: queue processing problem

2001-06-13 Thread Charles Cazabon

Shawn Estes <[EMAIL PROTECTED]> wrote:

Dave Sill had some good debugging/pinpointing advice for you in a separate
message.  I'll add a few things here.

> First off, I'm using concurrency patch and big-todo patch (from qmail.org)
> with qmail-1.03. I've configured the conf-spawn to 400. We are an ISP so we
> are not doing any kind of mailing lists, all messages coming through our
> system are separate messages sent by different customers. We process about
> 15,000 different messages an hour. We have a server running FreeBSD 4.3,
> with 256MB RAM, 9GB Seagate Barracuda 7200 (this is the disk holding the
> queue), Quantum Fireball is holding the homedirs of the users.

You're running a significant load -- disk I/O bandwidth and latency are
probably at least part of the problem you're experiencing.  Switching to a
15kRPM disk on a U160 controller would almost certainly help -- it will at
least double available queue disk I/O bandwidth, while halving rotational
latency.  qmail does fsyncs at critical times to ensure reliability, and those
each involve a disk seek; halving the rotational latency will reduce the
access time significantly.
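
The "halving" figure can be checked from the spindle speeds alone.  The sketch
below assumes the "7200" in the quoted description is the Barracuda's RPM, and
uses the usual approximation that average rotational latency is the time for
half a revolution.  It ignores seek time and transfer rate, so it says nothing
about the bandwidth claim.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds (half of one revolution)."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


current = avg_rotational_latency_ms(7_200)    # assumed speed of the existing disk
proposed = avg_rotational_latency_ms(15_000)  # the suggested 15kRPM disk

print(f"7200 RPM:  {current:.2f} ms")   # 4.17 ms
print(f"15000 RPM: {proposed:.2f} ms")  # 2.00 ms
print(f"ratio:     {current / proposed:.2f}x")  # 2.08x
```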

> 1) qmail-qstat is showing that the "not yet preprocessed" messages are
> growing, and very seldom is that number decreasing.

qmail-send is having a hard time keeping up with the rate at which you are
injecting messages.

> 3) Ran the qmail-send run file by itself and the messages in the queue went
> through very quickly. (5000 messages in about 15 minutes or so) A lot better
> than they are with everything running.

So when no messages are being injected, your system can deliver at reasonable
speed, but as soon as you turn on qmail-smtpd, it can no longer keep up.
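
For scale, the two figures quoted in this thread can be turned into rates.
This is only a rough comparison: "about 15 minutes or so" is approximate, and
the sketch assumes the 15,000 messages an hour arrive evenly, which real mail
traffic does not.

```python
def per_second(messages, seconds):
    """Convert a message count over a time window into messages per second."""
    return messages / seconds


inject_rate = per_second(15_000, 60 * 60)  # normal load: 15,000 messages/hour
drain_rate = per_second(5_000, 15 * 60)    # 5,000 messages in ~15 minutes, SMTP off

print(f"injection: {inject_rate:.1f} msgs/s")  # 4.2
print(f"drain:     {drain_rate:.1f} msgs/s")   # 5.6
print(f"headroom:  {drain_rate / inject_rate:.2f}x")  # 1.33
```

Taken at face value, even the uncontended drain rate is only about a third
higher than the average injection rate, so there is little slack once the
queue disk also has to absorb incoming messages.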

> I appreciate any help that anyone can give me. I'm hoping that this is an
> easy problem that I am just overlooking. If any more information is needed,
> please let me know.

At a few hundred dollars for a 9GB 15kRPM disk, I'd say it's certainly a
simple way to improve your system performance.

> subdirectory split: 23.

This is something else you might want to change.  With 8000 messages in the
queue and a subdir split of 23, you're averaging around 350 files per
directory -- I've not had good luck with FFS-based systems when my directories
have more than about 200 files each.  Perhaps try something higher (remember,
it should be prime).  If you can't just vaporize the current contents of the
queue or take the system down for a few hours, the way to switch over will be
to temporarily run two instances of qmail in parallel.
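
As a rough guide to "something higher", the sketch below finds the smallest
prime split that keeps the average under a chosen number of entries per
directory.  The 200-entry target is just the rule of thumb mentioned above,
the helper names are made up for illustration, and the model assumes messages
hash evenly across the subdirectories.  In stock qmail the split is a
compile-time setting (the conf-split file), so changing it means rebuilding.

```python
def is_prime(n):
    """Trial-division primality test; fine for small values like a queue split."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def smallest_prime_split(queued_messages, max_per_dir):
    """Smallest prime split giving an average of at most max_per_dir per directory."""
    split = -(-queued_messages // max_per_dir)  # ceiling division
    while not is_prime(split):
        split += 1
    return split


queued = 8_000
print(f"split 23: {queued / 23:.0f} per directory")   # 348
new_split = smallest_prime_split(queued, 200)
print(f"suggested split: {new_split}")                # 41
print(f"split {new_split}: {queued / new_split:.0f} per directory")  # 195
```

A larger prime would leave more room for the queue to grow before directories
get crowded again.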

Charles



Re: queue processing problem

2001-06-13 Thread Dave Sill

Shawn Estes <[EMAIL PROTECTED]> wrote:

>First off, I'm using concurrency patch and big-todo patch (from
>qmail.org) with qmail-1.03. I've configured the conf-spawn to 400. We
>are an ISP so we are not doing any kind of mailing lists, all
>messages coming through our system are separate messages sent by
>different customers. We process about 15,000 different messages an
>hour. We have a server running FreeBSD 4.3, with 256MB RAM, 9GB
>Seagate Barracuda 7200 (this is the disk holding the queue), Quantum
>Fireball is holding the homedirs of the users. 
>
>This is kind of broken up into a few different problems.
>
>1) qmail-qstat is showing that the "not yet preprocessed" messages
>   are growing, and very seldom is that number decreasing. 
>
>
>2) qmail-remote is being spawned way under the current remote
>   concurrency limit (175) I have very seldom seen this number reach
>   above 30.

Both suggest that qmail-send is having trouble keeping up. qmail-send
is responsible for processing messages placed in the queue and for
scheduling remote deliveries through qmail-rspawn.

The question to answer is why qmail-send isn't keeping up. Perhaps
disk I/O is the bottleneck. Or maybe the CPU is maxed out--though
that's unlikely. What else is the system doing? Is there any idle CPU?

Another possibility is that it's just too busy. You could split the
load somewhat by installing another instance of qmail,
e.g. in /var/qmail2, and let one instance handle locally injected
messages while the other handles SMTP injected messages. Since
qmail-send is single-threaded, it might not be able to keep
qmail-rspawn busy if it keeps seeing new messages that need
processing. Splitting the load like this would mean fewer
interruptions for the qmail-send handling locally injected messages.

>su-2.05# ps -ax | grep qmail-remote | wc -l
>  30
>su-2.05# ps -ax | grep qmail-smtpd | wc -l
> 111

That's a fairly high number of incoming SMTP connections.
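
One caveat on counts taken with ps piped through grep: the grep process can
show up in its own output, so the totals may be off by one.  A minimal Python
sketch that matches on the command name only is shown below.  It assumes a ps
that accepts the POSIX options -A and -o comm=; the function names are
illustrative.

```python
import subprocess


def count_procs(ps_output, name):
    """Count lines of `ps -o comm=` output whose command basename equals name."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Some systems print a full path in the comm column; compare basenames.
        if fields and fields[0].rsplit("/", 1)[-1] == name:
            count += 1
    return count


def running(name):
    """Count currently running processes called name (assumes POSIX ps options)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return count_procs(result.stdout, name)
```

Calling running("qmail-remote") and running("qmail-smtpd") would then give the
two counts without the grep process being included.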

>Excerpt from /var/log/qmail/current:

Too small to be useful, and lacking timestamps.

>3) Messages are staying in the queue and are not being delivered the
>   way they should be. Note: Messages are going out, just very
>   slowly. The logs are showing deliveries local and remote. There
>   are no error messages in the log. (A test message sent to a local
>   user takes approximately 30-45 minutes, roughly the same amount of
>   time for a remote user)

Same problem as 1 and 2.

>Here's what I've done so far:
>
>1) Checked the Trigger file to make sure it has the correct permissions:

Good.

>2) Checked ulimit and kern max files. 

OK.

>3) Ran the qmail-send run file by itself and the messages in the
>   queue went through very quickly. (5000 messages in about 15
>   minutes or so) A lot better than they are with everything
>   running.

Confirms my "qmail-send is being interrupted" hypothesis, I think.

>4) Verified my run scripts with LWQ. The run scripts have softlimits
>   that are increased from LWQ, could this be my problem?

No, but I wonder why you want such high limits. They're for your own
protection.

-Dave