Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread manav


Hi,

I have been using qmail for the last year and a half and have been closely
following the mailing list at securepoint, and didn't find anything related
to my query, hence I took the liberty of posting it.

The objective is to build a high-volume server capable of doing mail-merged
email blasts to several lists with 10,000 to 1,000,000 users, provide
detailed reports about the status of emails (sent, bounced, bad email
addresses, opened, forwarded), list management (across multiple lists for
each user) and of course, stability.

Over the last 12 months, we explored several options - and finally
settled on qmail (what else?). I am using a Pentium III with Linux Redhat
6.2 installed on it, with 512 MB of RAM, a 20 GB HDD and JDK 1.2.2, connected
to a 128 Kbps line.

Following are the topics on which I need your comments/suggestions:-

1. Earlier we used Runtime.exec() to run qmail-inject and fed it the
messages ourselves. qmail would then do the delivery, and we would parse
the log files to find the status of each message.
1.1 We used a unique "from" address for each blast and each user, so that
every email sent could be identified in the maillog. Sometimes, instead of
logging the "From" address, the maillog would show the "Reply-To" address.
Any ideas why? Is there anything else that can be used to uniquely identify
a message?
1.2 For each blast we want to handle the bounced emails individually (we
would need to update the appropriate table). What do we do for that? We
cannot just "set" environment variables since there will be multiple
mail-merges and blasts happening simultaneously.
1.3 Usually, after about 5,000 deliveries, the messages would get stuck in
the queue. We then added the CNAME lookup patch, and the limit rose to about
10,000. Currently, we "prune" the lists uploaded by the users and send
messages in chunks of 2000, with fewer than 30 concurrent messages. Any
suggestions as to what could be the culprit? What can we do to circumvent
this problem?
1.4 What would be the best possible way to handle unsubscribe requests?
Currently we invoke a java program from the .qmail file that updates the
database. Any suggestions how this can be improved upon?

2. We then decided to switch over to using qmail-remote, to circumvent the
queue and the logging problem. This effectively means we will have to do our
own logging. Is there any way to hand over several messages to one
qmail-remote invocation rather than invoking it for each message? We have now
decided to change the implementation so that at any point in time, there will
be as many threads sending messages as the qmail concurrency (say around 100),
and the messages themselves will be broken into chunks of 300 to 500 each. How
can we improve this?

3. Currently, we have our own implementation for checking bad e-mail
addresses, list management, handling bounces and mail-merge. Are there any
guidelines/sample code available (any language), that we can look at?

4. What other things should we keep in mind to provide stability to the
system? What patches to qmail are advisable to be installed? What should be
the typical server configuration for such a system?

5. On a parallel note, what would be the best algorithm to track forwarded
messages? We make use of cookies right now (but that provides 50% accuracy).
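
The design in item 2 (as many sending threads as the qmail concurrency, recipients broken into chunks of a few hundred) maps naturally onto a fixed-size thread pool. What follows is a hedged sketch in modern Java (not JDK 1.2.2, which lacks java.util.concurrent). The class and method names are hypothetical, and sendChunk stands in for whatever actually hands a chunk to qmail, so it must be thread-safe and do its own error logging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of item 2: fixed-size recipient chunks processed by a
// pool with as many threads as the desired delivery concurrency.
final class ChunkedSender {

    /** Splits items into consecutive chunks of at most 'size' elements. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            // Copy, so a chunk does not depend on the caller's list staying unchanged.
            chunks.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return chunks;
    }

    /**
     * Hands every chunk to sendChunk using 'concurrency' worker threads and
     * waits up to maxWaitMinutes. Returns true if all workers finished in time;
     * failures inside sendChunk are not reported here and must be logged by it.
     */
    static boolean sendAll(List<String> recipients, int chunkSize, int concurrency,
                           Consumer<List<String>> sendChunk, long maxWaitMinutes) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            for (List<String> c : chunk(recipients, chunkSize)) {
                pool.execute(() -> sendChunk.accept(c));
            }
        } finally {
            pool.shutdown(); // accept no new work; queued chunks still run
        }
        try {
            return pool.awaitTermination(maxWaitMinutes, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

For example, sendAll(recipients, 500, 100, chunk -> deliver(chunk), 60) would keep up to 100 chunks of 500 in progress at once, where deliver is your own (hypothetical) delivery routine.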

I apologize if I broke some protocol and asked some questions that do not
pertain to this list.

Regards,
manav.




Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread Russell Nelson

manav writes:
 > I have been using qmail for the last year and a half and have been closely
 > following the mailing list at securepoint, and didn't find anything related
 > to my query, hence I took the liberty of posting it.
 > 
 > The objective is to build a high-volume server capable of doing mail-merged
 > email blasts to several lists with 10,000 to 1,000,000 users, provide
 > detailed reports about the status of emails (sent, bounced, bad email
 > addresses, opened, forwarded), list management (across multiple lists for
 > each user) and of course, stability.
 > 
 > Over the period of last 12 months, we explored several options - and finally
 > settled on qmail (what else?). I am using a Pentium III with Linux Redhat
 > 6.2 installed on it, with 512 MB of RAM, 20 GB HDD and JDK 1.2.2 connected
 > to a 128 Kbps line.

128Kbps?  Surely you mean Mbps.  If that's all the bandwidth you can
afford at your location, you should rent a server at a colocation site
in the US.  Use your server to create and distribute batches of
recipients to a server running qmail-qmqpd configured with the
qmail-verh and big-concurrency patches.

Let's say that you're sending a 2K message.  Sent to 1,000,000 users,
that's 2,000,000,000 bytes.  Assuming that you're using qmail-verh (to
merge on the fly), that your system doesn't limit your sending (and if
you've got an IDE disk, it will), and assuming 20% overhead (tcp/ip
packet headers, smtp dialogue, message retries), this blast will take
150,000 seconds to clear your server over a 128Kbps line.  That's 42 hours, minimum.
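
The arithmetic can be checked with a few lines of Java. This sketch only restates Russ's assumptions (a 2,000-byte message, 1,000,000 recipients, 20% overhead, a 128,000 bit/s line) and ignores disk and CPU limits; the class and method names are illustrative.

```java
// Transfer-time estimate from message size, recipient count, fractional
// overhead (TCP/IP headers, SMTP dialogue, retries) and link speed in
// bits per second. Disk and CPU limits are ignored.
final class BlastTime {

    static double seconds(long messageBytes, long recipients,
                          double overhead, long linkBitsPerSecond) {
        double totalBytes = (double) messageBytes * recipients * (1.0 + overhead);
        return totalBytes * 8.0 / linkBitsPerSecond;
    }

    public static void main(String[] args) {
        double s = seconds(2_000, 1_000_000, 0.20, 128_000);
        System.out.printf("%.0f seconds, about %.1f hours%n", s, s / 3600.0);
    }
}
```

With these inputs it prints "150000 seconds, about 41.7 hours", i.e. the 42-hour minimum; the same blast over a 128 Mbps line would take about 150 seconds.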

-- 
-russ nelson <[EMAIL PROTECTED]>  http://russnelson.com
Crynwr sells support for free software  | PGPok | 
521 Pleasant Valley Rd. | +1 315 268 1925 voice | #exclude 
Potsdam, NY 13676-3213  | +1 315 268 9201 FAX   | 



Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread Brett Randall

Hi Manav. For most of this, one word: ezmlm (www.ezmlm.org). For the
rest...

> "manav" == manav  <[EMAIL PROTECTED]> writes:

> 1.2 For each blast we want to handle the bounced emails individually (we
> would need to update the appropriate table). What do we do for that? We
> cannot just "set" environment variables since there will be multiple
> mail-merges and blasts happening simultaneously.

Mailing list is the word I think you are after. See above...

> 1.3 Usually after about 5,000 deliveries, the messages would be stuck in
> the queue. We then added the CNAME lookup patch, and this increased to about
> 10,000. Currently, we "prune" the lists uploaded by the users and send
> messages in chunks of 2000, with less than 30 concurrent messages. Any
> suggestions what could be the culprit? What can we do to circumvent this
> problem?

The only reason I can see why you would want to do this would be if
you are customising the message for each individual user. If you
are... you will probably want a bit more processing power (ie: more
servers) than this. It is well known that qmail doesn't really enjoy
having 10,000+ e-mails in the queue...

> 1.4 What would be the best possible way to handle unsubscribe requests.
> Currently we invoke a java program from the .qmail file that updates the
> database. Any suggestions how this can be improved upon?

Ezmlm

> 2. We then decided to switch over to using qmail-remote, to circumvent the
> queue and the logging problem. This effectively means we will have to do our
> own logging. Is there anyway to hand over different messages to qmail-remote
> rather than invoking it for each message? We have now decided to change the
> implementation so that at any point of time, there will be as many threads
> sending messages as the qmail concurrency (say around 100), and the messages
> themselves will be broken into chunks of 300 to 500 each. How can we improve
> this?

Ezmlm looks after all of this for you. It is probably easier to hack
up ezmlm-idx to customise messages, than to make your own do
everything that ezmlm does.

> 3. Currently, we have our own implementation for checking bad e-mail
> addresses, list management, handling bounces and mail-merge. Are
> there any guidelines/sample code available (any language), that we
> can look at?

Ezmlm...

> 4 . What other things should we keep in mind to provide stability to
> the system? What patches to qmail are advisable to be installed?
> What should be the typical server configuration for such a system?

If you are customising messages, you definitely need parallel
processing or clustering. Also, that 128Kbps line is a MAJOR
bottleneck...

Oh, and RedHat 6.2 is not the best server distribution. I use it on a
number of my servers, but am moving them to Mandrake (for now) until I
find the time to investigate other alternatives such as Turbo Linux
and Debian. Mandrake can be made to work a lot better for you than
RedHat, and so far Mandrake 8.0 has MUCH fewer bugs in the components than most
RedHat versions...

> 5. On a parallel note, what would be the best algorithm to track
> forwarded messages? We make use of cookies right now (but that
> provides 50% accuracy).

We use a blank 1x1 pixel GIF in our e-mails, something like:
<img src="http://my.server.com/cgi-bin/emailcount.pl?2001-06-22-Email-1" width=1 height=1>

That perl script then does whatever it has to (it logs the relevant
data to a file, and increases the count in another file) and then
returns a 1x1 pixel GIF, using the GD library, from
memory... Obviously this requires an HTML e-mail to be going out, but
if you're using cookies then you are obviously already there!

By the way, the parameter on the perl script (?2001-06-blah) is so
that we can use the same script for each e-mail that goes out, and
just change the parameter so that we can count for different
mailouts. On that note, Hotmail doesn't allow the forwarding of HTML
e-mail. I don't know about the other major free e-mail providers.
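
Brett's counter is a Perl CGI script using GD. Since manav's stack is Java, the same idea is sketched below in plain Java, leaving out the servlet or HTTP plumbing. The class name, the in-memory counters (Brett logs to files) and the "unknown" fallback are hypothetical. Instead of drawing the image at run time, the sketch returns a fixed 43-byte transparent GIF.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical Java counterpart of the emailcount.pl idea: the query string
// names the mailout, each request bumps that mailout's counter, and the
// response body is a transparent 1x1 GIF held in memory.
final class OpenCounter {

    // Minimal 1x1 GIF89a with colour index 0 marked transparent (43 bytes).
    static final byte[] PIXEL_GIF = {
        'G', 'I', 'F', '8', '9', 'a',
        0x01, 0x00, 0x01, 0x00,                                      // width 1, height 1
        (byte) 0x80, 0x00, 0x00,                                     // 2-entry global colour table
        0x00, 0x00, 0x00, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,     // black, white
        0x21, (byte) 0xF9, 0x04, 0x01, 0x00, 0x00, 0x00, 0x00,       // graphic control: transparent
        0x2C, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00,  // image descriptor
        0x02, 0x02, 0x44, 0x01, 0x00,                                // LZW data for one pixel
        0x3B                                                         // trailer
    };

    private final Map<String, AtomicLong> hits = new ConcurrentHashMap<>();

    /** Counts one open for the mailout in the query string and returns the image bytes. */
    byte[] hit(String queryString) {
        String mailout = (queryString == null || queryString.isEmpty()) ? "unknown" : queryString;
        hits.computeIfAbsent(mailout, k -> new AtomicLong()).incrementAndGet();
        return PIXEL_GIF.clone();
    }

    long count(String mailout) {
        AtomicLong n = hits.get(mailout);
        return n == null ? 0 : n.get();
    }
}
```

In a servlet, the response would carry Content-Type image/gif and, ideally, headers that discourage caching, so that repeat opens still reach the server.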

HTH

Brett.
-- 
Smash forehead on keyboard to continue



Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread Mike Jackson

manav wrote:

> The objective is to build a high-volume server capable of doing mail-merged
> email blasts to several lists with 10,000 to 1,000,000 users, provide
> detailed reports about the status of emails (sent, bounced, bad email
> addresses, opened, forwarded), list management (across multiple lists for
> each user) and of course, stability.
> 
> Over the period of last 12 months, we explored several options - and finally
> settled on qmail (what else?). I am using a Pentium III with Linux Redhat
> 6.2 installed on it, with 512 MB of RAM, 20 GB HDD and JDK 1.2.2 connected
> to a 128 Kbps line.
> 

Before you go any further, get a real pipe. Why do people insist that
their Volkswagen Beetle is capable of keeping up with a Ferrari on the
autobahn? The volume of messages that you are trying to send is nothing
short of ridiculous with a 128Kbps line.

--
Mike



Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread manav

Hi Mike, Russ,

I really appreciate that you took the time to reply. Thanks.

Yes, I do have three of my production servers co-located with an ISP in the
US that promises unlimited bandwidth, with a 99.9% uptime. All these
production boxes have a SCSI disk with hardware alarms to indicate any
malfunction, and 1 GB of RAM. I have a crude "load balancing" algorithm that
ensures the load is shared across these boxes.

We are running the alpha phase right now (with whatever current
implementations we have), and I have serious doubts about the stability and
scalability of the system. The maximum load that I've put on my production
boxes so far is 250,000 emails, and I've seen the same issues there that I
mentioned for my development boxes (the ones that resemble a Beetle, to
quote Mike :-) ).

Before I move anything to production, I test it on the local (Indian)
servers. These issues appear in both places.

Thanks once again for your responses.

Manav.
- Original Message -
From: "Mike Jackson" <[EMAIL PROTECTED]>
To: "manav" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, June 22, 2001 8:53 PM
Subject: Re: Java and Qmail - building a large mailmerge server - plain text version






Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread Russell Nelson

manav writes:
 > I really appreciate you took some time out to reply. Thanks.

And not flame you?  :-) Not everybody on the list is a flamer, and
besides you supplied us with all the necessary information.  You *did*
confuse us by mentioning 10 lakh recipients and 128kbps in the same
paragraph, but that's really no matter.  The real problem is injecting 
bulk email using separate messages.

 > We are running the alpha phase right now (with whatever current
 > implementations we have), and I have serious doubts about the stability and
 > scalability of the system. The maximum load that I've put on my production
 > boxes is 250,000 emails so far and I've had similar issues that I mentioned
 > on my development boxes (the ones that resemble a Beetle, to quote Mike
 > :-) ).

The problem, simply enough, is that you should try very, very hard not 
to have a separate copy of the email on the disk.  If you're running
qmail-inject on each message, then yes, three machines aren't going to 
be enough.  On the other hand, three machines of the type you describe 
below will be sufficient to deliver one million emails in about eight
hours, IF you're doing the mail merge function at delivery time.
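
Russ's figure works out to roughly 35 messages per second in total, or about 12 per machine. Little's law (deliveries in flight = rate x average time per delivery) then gives a feel for the remote concurrency that rate needs. In the Java sketch below, the 5-second average SMTP session time is an assumed, illustrative number, not something measured in this thread.

```java
// Rough rate check: messages per second needed for a blast, and Little's law
// (deliveries in flight = rate x average delivery time) for the concurrency
// that rate implies. The average delivery time is an input, not a measurement.
final class RateCheck {

    static double messagesPerSecond(long messages, double hours) {
        return messages / (hours * 3600.0);
    }

    static double concurrencyNeeded(double messagesPerSecond, double avgDeliverySeconds) {
        return messagesPerSecond * avgDeliverySeconds;
    }

    public static void main(String[] args) {
        double total = messagesPerSecond(1_000_000, 8.0);
        double perMachine = total / 3;
        double assumedDeliverySeconds = 5.0;   // hypothetical average SMTP session time
        System.out.printf("total %.1f msg/s, per machine %.1f msg/s, in flight per machine %.0f%n",
                total, perMachine, concurrencyNeeded(perMachine, assumedDeliverySeconds));
    }
}
```

Under that assumption each machine needs about 58 deliveries in flight; slower remote servers raise the number proportionally.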

You can do that using the qmail-verh patch, you could call
qmail-remote directly (in theory; I don't know that anyone is doing
that), or you could purchase my qmail-merge system.  It lets you
substitute multiple fields into each message.  So you could substitute
in a first name, a last name, a database ID number, or whatever else
you want.  Handles bounces, and runs everything through the database.
Details upon request.
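
Russ's point is that the merge should happen as each message leaves, rather than by queuing a separately customised copy for every recipient. The substitution step itself is small. The Java sketch below shows only that step; the {{name}} placeholder syntax and the blank-for-missing-field rule are assumptions, not the format used by qmail-verh or qmail-merge.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Substitutes per-recipient fields into a stored message body at send time.
// The {{name}} placeholder syntax is hypothetical.
final class MailMerge {
    private static final Pattern FIELD = Pattern.compile("\\{\\{([A-Za-z0-9_]+)\\}\\}");

    /** Replaces each {{name}} with fields.get(name); unknown names become empty strings. */
    static String render(String template, Map<String, String> fields) {
        Matcher m = FIELD.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = fields.getOrDefault(m.group(1), "");
            // quoteReplacement stops '$' or '\' in the data being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "Dear {{first}} {{last}},\nYour member ID is {{id}}.\n";
        System.out.print(render(body, Map.of("first", "Asha", "last", "Rao", "id", "1042")));
    }
}
```

Leaving unknown fields blank is a design choice; failing loudly may be safer, since a blank name is hard to take back once a large blast has gone out.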

Dealing with bounces is a whole 'nother headache.  You see, there are
three types of email bounces: 4XX bounces, which are known to be
temporary.  A retry is definitely called for, and qmail will handle
that on its own.  You also get a 5XX bounce, where the smtp server has
told your smtp client that the email will never be deliverable.  These
get handled by parsing the QSBMF message.  And you can also get a
delivered but returned message.  VERP is your friend here, because
parsing bounce messages is a task only attempted by lunatics.
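
VERP is also the usual answer to questions 1.1 and 1.2 above. If the envelope sender encodes both the blast and the recipient, any bounce sent back to that address identifies both without parsing the bounce text. In qmail, the envelope sender is the address logged as from <...> in the maillog, and qmail-inject's -f option sets it. Below is a minimal Java sketch of building and decoding such addresses; the bounce-<blastId>-<local>=<domain>@<host> layout and the class name are hypothetical choices for illustration.

```java
// Sketch of VERP-style bounce addresses, one per blast per recipient.
// Assumed (hypothetical) layout:  <prefix><blastId>-<local>=<domain>@<bounceHost>
final class Verp {

    static String encode(String prefix, long blastId, String recipient, String bounceHost) {
        int at = recipient.lastIndexOf('@');
        if (at <= 0 || at == recipient.length() - 1) {
            throw new IllegalArgumentException("not an address: " + recipient);
        }
        return prefix + blastId + "-" + recipient.substring(0, at)
                + "=" + recipient.substring(at + 1) + "@" + bounceHost;
    }

    /** Returns {blastId, recipient}, or null if the address is not one of ours. */
    static String[] decode(String prefix, String bounceAddress) {
        int at = bounceAddress.lastIndexOf('@');
        if (at < 0) return null;
        String local = bounceAddress.substring(0, at);
        if (!local.startsWith(prefix)) return null;
        String rest = local.substring(prefix.length());   // "<blastId>-<local>=<domain>"
        int dash = rest.indexOf('-');
        int eq = rest.lastIndexOf('=');                    // domain names never contain '='
        if (dash <= 0 || eq < dash + 2 || eq == rest.length() - 1) return null;
        String blastId = rest.substring(0, dash);
        if (!blastId.chars().allMatch(Character::isDigit)) return null;
        return new String[] { blastId, rest.substring(dash + 1, eq) + "@" + rest.substring(eq + 1) };
    }
}
```

A Java program run from the bounce address's .qmail file (as manav already does for unsubscribes) could call decode and update the table for that blast.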

Even then, you can't treat a 5XX or returned message as a permanent
failure.  You have to have a system for retrying these messages at a
later time.
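
One way to implement such a policy is to count permanent failures per address and disable the address only after several of them spread over a minimum period, so later mailings act as the retries and a successful delivery resets the count. This is a hypothetical sketch: the thresholds (three strikes over at least ten days in the test usage) are arbitrary examples, not the rules ezmlm or qmail-merge apply. 4XX replies are left to qmail's own retries.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bounce bookkeeping: 4XX is temporary (qmail retries by itself);
// a 5XX or returned message is one "strike", and an address is disabled only
// after maxStrikes strikes spanning at least minSpan.
final class BouncePolicy {
    enum Kind { TEMPORARY, PERMANENT }

    static Kind classify(int smtpReplyCode) {
        if (smtpReplyCode >= 400 && smtpReplyCode <= 499) return Kind.TEMPORARY;
        if (smtpReplyCode >= 500 && smtpReplyCode <= 599) return Kind.PERMANENT;
        throw new IllegalArgumentException("not a failure reply: " + smtpReplyCode);
    }

    private final int maxStrikes;
    private final Duration minSpan;
    private final Map<String, Instant> firstStrike = new HashMap<>();
    private final Map<String, Integer> strikes = new HashMap<>();

    BouncePolicy(int maxStrikes, Duration minSpan) {
        this.maxStrikes = maxStrikes;
        this.minSpan = minSpan;
    }

    /** Records a 5XX or returned message; true means the address should now be disabled. */
    synchronized boolean recordPermanentFailure(String address, Instant when) {
        firstStrike.putIfAbsent(address, when);
        int n = strikes.merge(address, 1, Integer::sum);
        Duration span = Duration.between(firstStrike.get(address), when);
        return n >= maxStrikes && span.compareTo(minSpan) >= 0;
    }

    /** A later successful delivery clears the address's failure history. */
    synchronized void recordSuccess(String address) {
        firstStrike.remove(address);
        strikes.remove(address);
    }
}
```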

As someone else pointed out, ezmlm handles this nicely.
Unfortunately, ezmlm doesn't work well when you've got users
subscribed to more than one type of mailing, because it doesn't share
bounce information between lists.




Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread Mike Jackson

manav wrote:
> 
> Hi Mike, Russ,

Hi !

> 
> We are running the alpha phase right now (with whatever current
> implementations we have), and I have serious doubts about the stability and
> scalability of the system. The maximum load that I've put on my production
> boxes is 250,000 emails so far and I've had similar issues that I mentioned
> on my development boxes (the ones that resemble a Beetle, to quote Mike
> :-) ).

Just as an example of the speed of qmail and ezmlm:

Machine: 1U rackmount cheapo 600MHz Celeron, 128MB RAM, 18GB hard disk
OS: NetBSD 1.5
MTA: Qmail 1.03 with only the verh patch
List Manager: Ezmlm 0.53 with idx 0.40
remoteconcurrency: 120

Here are some stats from the first large mailing with this server. As
you can see, within 15 minutes most of the deliveries were completed.
The only kernel tuning I did was to raise the max processes to 256 and
max open files per process to 512. The numbers look a little off since
there are a few old messages still going through, mostly mail servers
that were previously unreachable.

12.45.21  message sent to 4773 addresses

Time      Deliveries  Attempts  Successes  Failures
12.50.00        1738      1924       1761       187
12.55.00        1775      1937       1779       166
13.00.00         423       455        433        32
13.05.00          13        14
13.10.00           2         2
---
Total           3951      4332
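
To check the "within 15 minutes" remark against the table, the cumulative share of the 3951 deliveries can be computed per interval, taking the Deliveries column at face value. By the 13.00.00 sample, just under 15 minutes after the message was sent, 3936 of 3951 deliveries (about 99.6%) had completed. A small Java sketch of the calculation (class name illustrative):

```java
// Cumulative share of all deliveries completed by the end of each interval,
// using the Deliveries column of the stats above.
final class DeliveryStats {

    static double[] cumulativeShare(int[] deliveries) {
        long total = 0;
        for (int d : deliveries) total += d;
        double[] share = new double[deliveries.length];
        if (total == 0) return share;          // nothing delivered: all zeros
        long running = 0;
        for (int i = 0; i < deliveries.length; i++) {
            running += deliveries[i];
            share[i] = (double) running / total;
        }
        return share;
    }

    public static void main(String[] args) {
        String[] times = {"12.50.00", "12.55.00", "13.00.00", "13.05.00", "13.10.00"};
        int[] deliveries = {1738, 1775, 423, 13, 2};
        double[] share = cumulativeShare(deliveries);
        for (int i = 0; i < times.length; i++) {
            System.out.printf("%s  %5.1f%%%n", times[i], 100.0 * share[i]);
        }
    }
}
```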

With the large concurrency patch, this throughput could be increased
significantly. I will put it into use if I get a requirement to send to
at least 10,000 addresses.

Using qmail-ldap and qmqp with a frontend master server and several
slave servers, you can distribute the load among several servers very
easily. For example, if you have 4 slave servers then use a unique
mailhost attribute for each quarter of your subscriber base. The
scalability of qmail-ldap is almost limitless, I think. The master
server will transfer the qmqp messages to the slave servers via qmqp
faster than you can even dream of. For more info, see the qmail-ldap
homepage at www.nrg4u.com.
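
Mike's suggestion is to give each quarter of the subscriber base a different mailhost attribute. To decide which quarter a subscriber falls in, one simple option (an assumption here, not something qmail-ldap prescribes) is a stable hash of the address modulo the number of slave servers. A Java sketch; the host names used with it are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Assigns each subscriber to one of N slave delivery hosts. A stable hash
// keeps a given subscriber on the same host for every mailing.
final class Partitioner {

    static String hostFor(String subscriber, List<String> hosts) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("no hosts");
        // Lower-casing only affects which host is picked, not the address itself.
        String key = subscriber.toLowerCase(Locale.ROOT);
        // String.hashCode is fully specified in the String documentation, so it
        // is the same on every JVM; floorMod keeps the index non-negative.
        return hosts.get(Math.floorMod(key.hashCode(), hosts.size()));
    }
}
```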

Regards,
Mike



Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-22 Thread Karsten W. Rohrbach

manav([EMAIL PROTECTED])@2001.06.22 21:17:26 +:
> Yes, I do have three of my production servers co-located with an ISP in the
> US that promises unlimited bandwidth, with a 99.9% uptime. All these

wow, daring. my contract with my isp ensures 100mbit/fdx ethernet with
99.87something% availability -- unlimited bandwidth seems a little bit
high to me ;-)

/k

-- 
> MCSE: Management Can't Send E-mail
KR433/KR11-RIPE -- WebMonster Community Founder -- nGENn GmbH Senior Techie
http://www.webmonster.de/ -- ftp://ftp.webmonster.de/ -- http://www.ngenn.net/
karsten&rohrbach.de -- alpha&ngenn.net -- alpha&scene.org -- [EMAIL PROTECTED]
GnuPG 0x2964BF46 2001-03-15 42F9 9FFF 50D4 2F38 DBEE  DF22 3340 4F4E 2964 BF46
Please do not remove my address from To: and Cc: fields in mailing lists. 10x



Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-23 Thread manav

Hi Brett,

Thanks for the reply.

I am exploring ezmlm right now, so I believe I'd have to trouble the people
on the ezmlm mailing list for queries on that :-)

For tracking forwarded emails, I have a hidden IMG tag which then calls a
servlet. When the user opens the email for the first time, the "hit" is
registered and a cookie is written. Subsequent "email reads" by the same
user can now be tracked. When the servlet finds the cookie is not there,
either the cookies were deleted or the user forwarded the email. I don't
think I can use any combination of HTTP headers to establish the
uniqueness of the recipient (if there is one, please let me know).
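
The cookie heuristic described above comes down to a small decision per image request. The Java sketch below assumes, hypothetically, that each recipient's copy carries its own token in the image URL (a mail-merge field) and that the servlet can tell whether the cookie for that token came back; the names are illustrative. The third outcome shows why accuracy is limited: a missing cookie cannot distinguish a forward from deleted cookies or the same reader on a second machine.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the cookie heuristic for spotting forwarded copies.
final class ForwardHeuristic {
    enum Verdict { FIRST_OPEN, REPEAT_OPEN, FORWARDED_OR_COOKIE_LOST }

    private final Set<String> seenTokens = new HashSet<>();

    /**
     * token: per-recipient value merged into the image URL.
     * cookiePresent: whether the browser sent back the cookie set on the first open.
     */
    synchronized Verdict onImageRequest(String token, boolean cookiePresent) {
        if (seenTokens.add(token)) {
            return Verdict.FIRST_OPEN;          // the response should set the cookie here
        }
        return cookiePresent ? Verdict.REPEAT_OPEN : Verdict.FORWARDED_OR_COOKIE_LOST;
    }
}
```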

Once again, if this discussion offends anyone on the list, I apologize (and
would be glad to carry the same offlist).

Thanks,
Manav.
- Original Message -
From: "Brett Randall" <[EMAIL PROTECTED]>
To: "manav" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, June 23, 2001 6:36 AM
Subject: Re: Java and Qmail - building a large mailmerge server - plain text version



Re: Java and Qmail - building a large mailmerge server - plain text version

2001-06-27 Thread Greg Cope

Russell Nelson wrote:
> 
> 
> The problem, simply enough, is that you should try very, very hard not
> to have a separate copy of the email on the disk.  If you're running
> qmail-inject on each message, then yes, three machines aren't going to
> be enough.  On the other hand, three machines of the type you describe
> below will be sufficient to deliver one million emails in about eight
> hours, IF you're doing the mail merge function at delivery time.
> 
> You can do that using the qmail-verh patch, you could call
> qmail-remote directly (in theory; I don't know that anyone is doing
> that), or you could purchase my qmail-merge system.  It lets you
> substitute multiple fields into each message.  So you could substitute
> in a first name, a last name, a database ID number, or whatever else
> you want.  Handles bounces, and runs everything through the database.
> Details upon request.
> 

Russ,

I emailed you off list a few days ago about your qmail-merge system, but
have had no reply as yet. Did you get it? Could you please contact me off
list?



Thanks

Greg
