Hi,

Andreas Metzler <ametz...@bebt.de> (Sat 31 Dec 2016 17:55:30 CET):
> On 2016-11-24 "Heiko Schlittermann (HS12-RIPE)" <h...@schlittermann.de> wrote:
> > Package: exim4-daemon-heavy
> > Version: 4.84.2-2+deb8u1
> > Severity: important
> > Tags: upstream patch
> >
> > Dear Maintainer,
> >
> > Current Exim versions have a memory leak when doing callouts via TLS
> > connections. I can reproduce the problem and I've fixed it.
> >
> > The fix is already pushed to the upstream repository of Exim (as I'm
> > one of the Exim developers).
> >
> > Commit ed62aae3051c9a713d35c8ae516fbd193d1401ba contains the fix.
> [...]
>
> Hello Heiko,
>
> thanks for the report with fix (in the branch).
>
> Would you mind explaining why this is an important bug? Afaiu most exim
> processes are short lived and I also would think that the respective
> structure would not be huge. So at a glance I would have expected a
> normal or even minor severity (... which would not be eligible for a
> stable update.)

You're right. Most Exim processes are short lived. But the callout is
done by the (just forked) receiving process itself, *not* by another
subprocess. Thus, if a huge number of addresses has to be checked by
callouts within one connection, the memory leak hurts.

I discovered the problem on a central mailhub. One of the satellites is
a mailing list server (mailman), sending via its local Exim instance to
the central mailhub. The default configuration of Mailman and Exim
resulted in a single message with a batch of about 4k recipient
addresses. The receiving Exim on the mailhub tried to verify these 4k
addresses via TLS callouts. After about 1k addresses, approximately 4G¹
of RAM was exhausted and the receiving process crashed.

Fortunately the callout results were stored in the callout cache, so on
the next connection the first 1k addresses were verified from the cache
entries, but the second 1k addresses caused the receiver to crash during
callouts again… After about the 4th attempt all addresses were verified
and the mail went through.

In the above setup the delay was a major disaster. In other cases you
might have far fewer addresses to check, or much looser constraints on
delivery time … But the leak is clearly a bug and the fix is easy.

(There are also possible work-arounds, on the sender's side and on the
receiver's side. Because of the callout cache the problem was kind of
self-healing, but with shorter cache times and longer retry intervals
this wouldn't work anymore.)

As one of the Exim developers I'd really like to see this bug fixed in
Exim releases that are distributed as "stable". If you need help with
backporting, I can assist you.

¹) I'm not sure about the real numbers, maybe it was 1.5k addresses and
   8G RAM, but I think you get the idea. (It was reproducible.)

    Best regards from Dresden/Germany
    Heiko Schlittermann
-- 
 SCHLITTERMANN.de ---------------------------- internet & unix support -
 Heiko Schlittermann, Dipl.-Ing. (TU) - {fon,fax}: +49.351.802998{1,3} -
 gnupg encrypted messages are welcome --------------- key ID: F69376CE -
 ! key id 7CBF764A and 972EAC9F are revoked since 2015-01 ------------ -