Hi Dirk et al,

Hmm, after reading this thread, I have a thought on how to
help with this that doesn't seem to have been stated here
exactly...  (though some similar things have been)

I venture this suggestion with some trepidation, given that a)
I'm a newbie to qmail, and b) I'm a newbie to this list, and
also because I see some reference to "the great fork/exec
wars".  But I suspect there's still at least _some_ validity to
my observation (which is based on my experience with non-qmail
things), so I'll present it anyway...

Note:  The core of my observation is based on a suspicion that
it's not really qmail's fault at all (well, not entirely,
anyway) that this is slow, but rather the enclosing loop's.

If you're of the belief that the above premise is false, and
not interested in experimenting with this, then stop reading
now.  :-)


[EMAIL PROTECTED] wrote (a few words more than) the following:
> 
> A client of ours is delivering a newsletter to 230,000
> people which we are feeding into the queue like this:
> 
> #! /bin/sh
> 
> for address in `cat list`
> do
> 
> echo -ne "[EMAIL PROTECTED]\000T$address\000\000" >/tmp/address
> sed s/xxxx/$address/g /tmp/message | /var/qmail/bin/qmail-queue 1< /tmp/address
> 
> echo $address  >>log
> done
> 
> Anybody know a better/faster way? 


Well, it seems to me that you're doing a lot of fork/exec pairs
(for sed) and open/close pairs (your /tmp/address file) that
aren't quite necessary...

In my experience, doing a shell loop with fork/exec pairs in the
middle of it is _hugely_ less efficient than, say, a perl loop
with no fork/exec that achieves the same effect.  The solution I
present, though, doesn't get rid of _all_ fork/exec's, so the
improvement will certainly be less dramatic...  But I suspect it
will still be present.
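To make that per-iteration cost concrete, here's a toy comparison
you can run yourself.  Everything in it is made up for illustration
(the /tmp/toylist and /tmp/toymsg files, the "user1"-style
addresses, and awk standing in for "a single long-lived process") --
it just contrasts one sed fork/exec per recipient against one
process doing all the substitutions:

```shell
# Toy setup: 200 fake recipients and a one-line template message.
seq 200 | sed 's/^/user/' > /tmp/toylist
printf 'Hello xxxx, here is your newsletter.\n' > /tmp/toymsg

# Variant A: one fork/exec of sed per recipient (like the original loop).
while read addr; do
    sed "s/xxxx/$addr/g" /tmp/toymsg
done < /tmp/toylist > /tmp/out.a

# Variant B: a single awk process handles the whole list.
awk 'NR==FNR { msg = msg $0; next }
     { m = msg; gsub(/xxxx/, $0, m); print m }' /tmp/toymsg /tmp/toylist > /tmp/out.b

# Same output, ~200 fewer fork/execs:
cmp /tmp/out.a /tmp/out.b && echo "outputs identical"
```

Wrap each variant in time(1) and the difference should show up
clearly, and it only grows with the length of the list.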

Note also that my solution assumes you have RAM to burn on
pre-reading your recipient list.  If that's not the case, you'll
particularly want to try changing that portion of the below.

I consider the following to be pseudo-code (and it's
pseudo-perl).  Most of it is probably genuine perl code
(i.e. it might actually work), but I'm not terribly practiced
at writing to multiple file descriptors in a pipeline (see the
perlipc man page for the ">&"/"<&" forms of open() that that
involves), I don't have error checking everywhere I should, and
I have not actually run any of it...  Most of the places where
I know something is shaky I have flagged with 'XXX'.  You'll
want to check those pieces.

I've not tested any of this, etc. etc. etc. (use StdDisclaimer; :-)

...  but, here goes:

#### begin
$parallelism    = 10;   # you'll want to experiment with different
                        # values here.  Don't set it to less than 1,
                        # but any positive integer should be doable.
                        # (within reason, of course.  ;-)
                        # Note: a value of 1 most closely resembles
                        #       your stated solution.

open(RECIPS, "list") ||
    die("Couldn't open recipient list: $!\n");
@recipients     = <RECIPS>;     # slurp the whole file.  Note that this
                                # assumes one recip per line.
close(RECIPS);
chomp(@recipients);             # strip newlines

undef($/);      # go into file-at-a-time read mode

open(MESSAGE, "/tmp/message") ||
    die("Couldn't open /tmp/message: $!\n");
$message_master = <MESSAGE>;    # slurp the message into memory
close(MESSAGE);

$plid   = 0;    # parallelism id for the main parent

# note that the below loop doesn't execute if $parallelism is 1.
# That is the correct behavior.  We want $parallelism - 1 children.

for($i = 1; $i < $parallelism; $i++)
{
    $pid = fork();
    die("fork failed: $!\n") unless defined($pid);
                # (note: perl's fork() returns undef on failure,
                # not a negative number)

    if($pid == 0)       # child
    {
        $plid   = $i;   # set the plid for this child
        last;           # important: children must leave this loop,
                        # or they'd fork children of their own
    }

    # parent doesn't need to do anything yet.

# we no longer know, or care, if we are the master or one of the
# children from the loop above...  We have our plid, which is what
# matters to us now.

# now loop through all the recipients that this plid owns (this is the
# main loop that actually does the work):
for($i = $plid; $i < scalar(@recipients); $i += $parallelism)
{
    # copy the data for the message.  Need to copy it so we don't
    # break the master copy, which we'll need in the next iteration.
    $message_copy       = $message_master;

    # change all instances of 'xxxx' to the current recipient
    # (note: s/// modifies its target in place via =~, so don't
    # assign its return value, which is just the match count):
    $message_copy       =~ s/xxxx/$recipients[$i]/g;

    # set up one pipe for the message (qmail-queue's fd 0) and one
    # for the envelope (its fd 1):
    pipe(MSG_READ, MSG_WRITE) || die("Couldn't pipe: $!\n");
    pipe(ENV_READ, ENV_WRITE) || die("Couldn't pipe: $!\n");

    $pid = fork();
    die("fork failed for recipient '$recipients[$i]': $!\n")
        unless defined($pid);

    if($pid == 0)       # child
    {
        # make the pipe read ends actually be fd's 0 and 1 for the
        # child.  This is perl's spelling of dup2() -- open() with
        # "<&" type args.  See the perlipc man page for more.
        close(MSG_WRITE);
        close(ENV_WRITE);
        open(STDIN,  "<&MSG_READ") || die("Couldn't dup fd 0: $!\n");
        open(STDOUT, "<&ENV_READ") || die("Couldn't dup fd 1: $!\n");
        close(MSG_READ);
        close(ENV_READ);

        exec("/var/qmail/bin/qmail-queue") ||
            die("Oops, exec of qmail-queue failed for recipient " .
                    "'$recipients[$i]'.\n");
    }
    else        # parent
    {
        close(MSG_READ);
        close(ENV_READ);

        # qmail-queue reads the whole message from fd 0 first, then
        # the envelope from fd 1; close each write end when done so
        # it sees EOF:
        print(MSG_WRITE $message_copy) ||
            die("message write failed: $!\n");
        close(MSG_WRITE);
        print(ENV_WRITE "[EMAIL PROTECTED]\000T$recipients[$i]\000\000") ||
            die("envelope write failed: $!\n");
        close(ENV_WRITE);

        waitpid($pid, 0);       # wait for this qmail-queue to finish
        # XXX check $? here if you want per-recipient queue status.
    }
#### end
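In case the \000-delimited envelope string looks opaque: qmail-queue
expects 'F' + sender + NUL on fd 1, then 'T' + recipient + NUL for
each recipient, then one final NUL.  Here's a quick way to build and
inspect a sample one (the example.com addresses are just
placeholders, not anything from the script above):

```shell
# Build a sample qmail-queue envelope: F<sender>\0 T<recip>\0 \0
printf 'Fsender@example.com\0Trecip@example.com\0\0' > /tmp/envelope.sample

# od -c shows the NUL separators explicitly:
od -c /tmp/envelope.sample
```

Seeing the raw bytes with od is a handy sanity check before pointing
the real script at 230,000 recipients.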

So, it's about 8 times as many lines of code, but it cuts out
the whole sed thing, and gives you customizable parallelism,
which I suspect _might_ be quite helpful.  (Though that would
depend a lot on the system...  Having 10 (or whatever) of these
things might slow the system down enough to negate the benefits
of parallelism, I dunno.)  However, if my initial premise is
correct, this could make a world of difference (for the better).


Best of luck!

Cheers,

        David

-- 
David Lindes, KF6HFQ            DaveLtd[tm] Enterprises
[EMAIL PROTECTED]              http://www.daveltd.com/
