Charlie Brady wrote:
On Sun, 26 Jul 2009, Junyi-HUANG wrote:

Robin Bowes Wrote:
 Take a look at the queue/smtp-forward plugin.

 I'm pretty sure it would be trivial to modify that to deliver to
 specific servers based on header content.
Thanks very much, Robin. Yes, I agree; I thought of queue/smtp-forward too.
But based on the header, I want to route to a set of servers, not just to a single server, for the sake of HA and LB.
...
Any clue will be appreciated.

As I said earlier, I think you should do that with DNS, which already does RR load balancing. Alternatively, you could do it with iptables (assuming you are using Linux).

For HA you need failover. _That_ is what you need to implement in the forwarder.

My qpsmtpd-async forwarder has a config list of IPs (and ports). During register, it randomizes the order. (You could also do _that_ bit with DNS RR tricks.) Associated with each IP is a "score".

Then, on each queue invocation, it tries (via Net::SMTP) to send to the lowest score IP. If that fails, it tries the next lowest, retrying until it works, or runs out of IPs. If it runs out of IPs, it returns a tempfail. The duration of the Net::SMTP transaction is computed (via Time::HiRes) and added to the score. If a server session fails, its score is incremented by a fairly large value.
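
The scoring loop described above can be sketched roughly as follows. This is not the actual plugin (which is Perl, using Net::SMTP and Time::HiRes); it is a minimal Python illustration of the same idea. The names ScoredForwarder, smtp_send, FAILURE_PENALTY and SMTP_TIMEOUT are hypothetical, and the penalty value is an arbitrary assumption, since the post only says "a fairly large value".

```python
import random
import smtplib
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Backend = Tuple[str, int]             # (ip, port)
Message = Tuple[str, List[str], str]  # (sender, recipients, raw message)

FAILURE_PENALTY = 30.0  # assumed value; the post only says "fairly large"
SMTP_TIMEOUT = 10.0     # assumed connection timeout, in seconds


def smtp_send(backend: Backend, message: Message) -> None:
    """Deliver one message to one backend; raises OSError on failure."""
    host, port = backend
    sender, recipients, data = message
    with smtplib.SMTP(host, port, timeout=SMTP_TIMEOUT) as smtp:
        smtp.sendmail(sender, recipients, data)


class ScoredForwarder:
    def __init__(self, backends: Sequence[Backend],
                 send: Callable[[Backend, Message], None] = smtp_send,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None) -> None:
        # "During register": randomize the configured order once.
        self.backends: List[Backend] = list(backends)
        (rng or random).shuffle(self.backends)
        self.scores: Dict[Backend, float] = {b: 0.0 for b in self.backends}
        self.send = send
        self.clock = clock

    def forward(self, message: Message) -> Optional[Backend]:
        """Try backends from lowest score upwards.

        Returns the backend that accepted the message, or None when every
        backend failed (the caller would then return a tempfail).
        """
        # sorted() is stable, so equal scores keep the shuffled order.
        for backend in sorted(self.backends, key=self.scores.__getitem__):
            start = self.clock()
            try:
                self.send(backend, message)
            except OSError:  # smtplib.SMTPException is a subclass of OSError
                self.scores[backend] += FAILURE_PENALTY
                continue
            # Success: add the transaction duration to the score.
            self.scores[backend] += self.clock() - start
            return backend
        return None
```

The scores only have an effect if the ScoredForwarder instance outlives a single message, which is the role the long-lived qpsmtpd-async process plays in the description above.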

Since qpsmtpd-async has persistence amongst threads, it means that the quickest SMTP server tends to win, broken ones are avoided (for a time), and things balance out.

I've even noticed that the queuing (correctly) notices that some SMTP servers are slower than others (much farther away), and apportions the load quite reasonably.

Without thread persistence, it's probably enough to just randomize the order of servers to try during registration (no scoring necessary, could be done by DNS). But you _still_ need failover.
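
For the non-persistent case, a sketch of "randomize at registration, then fail over" might look like the following. Again this is illustrative Python rather than plugin code; register and forward_with_failover are hypothetical names, and send stands for whatever function actually speaks SMTP to one backend and raises OSError when delivery fails.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Backend = Tuple[str, int]  # (ip, port)


def register(backends: Sequence[Backend],
             rng: Optional[random.Random] = None) -> List[Backend]:
    """Shuffle the configured backends once, at registration time."""
    order = list(backends)
    (rng or random).shuffle(order)
    return order


def forward_with_failover(order: Sequence[Backend],
                          send: Callable[[Backend, object], None],
                          message: object) -> Optional[Backend]:
    """Try each backend in the registered order until one accepts.

    Returns the backend that accepted the message, or None if all of
    them failed (the caller would then return a tempfail).
    """
    for backend in order:
        try:
            send(backend, message)
        except OSError:
            continue  # failover: move on to the next backend
        return backend
    return None
```

Each process that registers separately gets its own order, so the load spreads across backends without any shared state, while the loop still provides the failover.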

I don't know if it would be easy to modify my plugin to do multiple "chains" of outbound server IPs. But conceptually it's not hard.

Early on, we had back end mail servers down for a week or more, and nobody noticed because it still just worked fine.
