Short version -- all good tips; many thanks. We're investigating them all, but as you suggested, we're changing one thing at a time. Checking them thoroughly takes a little while, so I probably won't be able to report anything intelligent back on our results for a week or so (gotta wait for peak usage, for one thing).
A few more detailed remarks below.

On Jul 16, 2005, at 9:51 AM, Brad Knowles wrote:

>> Ah -- thanks for clarifying. This is not really what is implied in
>> the FAQ
>> (http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.012.htp)
>> -- I read it to imply that mailman is grouping messages to the MTA
>> by domain in order to boost minimum number of deliveries to a given
>> target domain.
>
> No. He's talking about the network bandwidth that will be used
> by the MTA, once it has accepted all the messages and recipients from
> Mailman. The MTA is forced to send no more than X recipients to a
> given target site, because no more than X recipients exist at that
> site.

Gotcha -- I understand that. But the inference that I made was that if mailman did not group domains together in the 1000 member list shown in the example, then his numbers are not necessarily accurate. Specifically, mails to @example.com may not be conveniently lumped into 1 or 2 transfers to the MTA -- in a worst case, they may be spread out across 1000/SMTP_MAX_RCPTS transfers to the MTA.

In this case, the MTA has no way of knowing that they are, in fact, the same message, and therefore may have to initiate 1000/SMTP_MAX_RCPTS connections to the same remote MTA at example.com. Or, even if the MTA can tell that they are the same message, due to race conditions and/or CPU load, the MTA may be eagerly delivering messages to the remote MTA, and therefore still have to initiate 1000/SMTP_MAX_RCPTS connections to the example.com MTA (e.g., if one message is fully sent to example.com's MTA before the next [identical] one arrives at the local MTA from mailman).
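
The worst case described above can be made concrete with a toy sketch. This is not Mailman's code: the helper names (`domain_of`, `chunk`, `transfers_touching`, `spread_out_list`), the 20 example.com subscribers, the filler addresses, and the `MAX_RCPTS = 10` value are all made up for illustration; only the 1000-member list size comes from the FAQ example under discussion. It assumes the recipient list is simply cut into consecutive groups of at most SMTP_MAX_RCPTS addresses, and compares an unlucky ordering with one sorted by domain.

```python
def domain_of(addr):
    """Return the lower-cased domain part of an e-mail address."""
    return addr.rsplit("@", 1)[1].lower()


def chunk(recipients, max_rcpts):
    """Cut the list into consecutive groups of at most max_rcpts addresses."""
    return [recipients[i:i + max_rcpts]
            for i in range(0, len(recipients), max_rcpts)]


def transfers_touching(chunks, domain):
    """Count the groups that contain at least one address at `domain`."""
    return sum(1 for c in chunks if any(domain_of(a) == domain for a in c))


def spread_out_list(total=1000, at_example=20):
    """Hypothetical worst-case ordering: example.com members evenly spread out."""
    step = total // at_example
    recipients = []
    for i in range(total):
        if i % step == 0:
            recipients.append(f"member{i}@example.com")
        else:
            # Filler subscribers, each at a different made-up domain.
            recipients.append(f"member{i}@site{i}.example.net")
    return recipients


MAX_RCPTS = 10  # stand-in for SMTP_MAX_RCPTS; value chosen for illustration
members = spread_out_list()

# Ordering as-is: each example.com address lands in a different group.
ungrouped = chunk(members, MAX_RCPTS)
# Sorted by domain first: the example.com addresses sit next to each other.
grouped = chunk(sorted(members, key=domain_of), MAX_RCPTS)

print(transfers_touching(ungrouped, "example.com"))
print(transfers_touching(grouped, "example.com"))
```

With these made-up numbers, the unsorted ordering puts example.com addresses into 20 separate transfers to the local MTA, while the domain-sorted ordering needs only 2. Whether the local MTA can then coalesce those 20 transfers into fewer connections to example.com is exactly the open question here.
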
Hence, I assumed that he was implicitly saying that mailman was grouping domains when it transferred to the MTA (i.e., packed as many @example.com's into a single MTA transfer as possible [until exhausted] -- repeating for all like domains in the recipient list), and therefore could guarantee that there would only be 1 or 2 transfers to the remote MTA (based on the numbers in his example). However, it's quite possible that my logic is incorrect here... :-)

>> We have pumped our SMTP_MAX_RCPTS down to 5 (we had never changed it
>> [snipped]
>
> One thing I would encourage you to do is to change just one thing
> at a time, and see what the effects are. With regards to reducing
> SMTP_MAX_RCPTS, I would encourage you to reduce the value by roughly
> half at each stage. So, go from 500 to 250, 250 to 125, 125 to 62,
> 62 to 32, etc.... This way, you should get a better idea of what the
> real threshold is.

Will do.

>> Right -- sorry, I didn't mean to imply otherwise. We were not
>> surprised by this, either. I was trying to say that we've seen this
>> behavior for a long time and didn't have any performance issues
>> with it.
>
> This sort of thing happens all the time with all sorts of
> systems. People will notice that their tires seem a little low, and
> there is some smoking coming out of the tailpipe, but they won't do
> anything about it until the car blows up or the tires come off the
> rims, etc.... That's when they take the car to the mechanic.
>
> With computers, people may notice that queues get really long,
> but they'll think that this is perfectly normal and acceptable, until
> something bad happens. That's when they go looking for help.
> They've been seeing all the signs that something bad was likely to
> happen soon, but they didn't recognize them for what they were.

Indeed. Delivery on our big lists had *always* been [relatively] slow; as you said, we always assumed that that was the way it was supposed to work.
But delivery for our small lists had always been fairly quick -- when it changed to be fairly slow, that was an indication that something was wrong.

>> The thought occurs to me that perhaps it wasn't our sendmail guys who
>> changed something, but perhaps the guys in the
>> anti-spam/virus-checking crew changed something (I believe they also
>> check outgoing mails for some insundry list of things that they
>> believe indicates spam/viruses).
>> Hmm. Need to go ping them, too...
>
> Yeah, gotta talk to them, too. The recommended practice for
> mailing lists is to check messages on input, but don't try to check
> them on output -- after all, the messages were already demonstrated
> to be clean on input.
>
> You may or may not be able to do this at your site, but you
> should at least check with them.

Yes, that's exactly what I was thinking. I'm not sure what our anti-spam/anti-virus stuff is doing, but I won't be able to talk to the guys who run that stuff until Monday.

> The problem could also be DNS or reverse DNS. Those kinds of
> things can really slow down MTAs, as they check their incoming
> connections. If a DNS server is flaking out, the MTAs could be
> taking much longer than they used to in order to do all the same
> sorts of checks that they've always been doing.

We checked into that, and seem to have a pretty reliable DNS connection (and it's cached locally). I don't think we're a victim of tarpit kinds of remote MTAs, but even if we are, lowering SMTP_MAX_RCPTS should help with that, right? That is, if a recipient has a slow MTA, then *essentially* only the other (SMTP_MAX_RCPTS-1) recipients in the same transfer will be penalized (because the remaining transfers will be proceeding more-or-less in parallel). Is that right?

Thanks again!

-- 
{+} Jeff Squyres
{+} [EMAIL PROTECTED]
{+} Post Doctoral Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
------------------------------------------------------
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users