Re: Increasing logging on queue messages
MySQL Student: > Hi, > > >> > Ah, found it. ?The problem is you've got the Postfix host relaying mail > >> > to itself. ?It's right in your log entries: > >> > > >> > to=, relay=127.0.0.1[127.0.0.1] > >> > >> Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is > >> no longer spam and virus scanning happening on this host. > >> > >> The message is just getting requeued back to the second queue. > > > > Have you already looked at a tcpdump recording as requested? > > I will try and investigate that today. Can anyone recommend a > front-end to make browsing > through the output easier? Is it ethereal/ettercap/wireshark that I > should be using? > > Can you recommend how I can isolate the output to a specific message? Perhaps > by > process ID or the ability to somehow track the queue ID? Postfix logs the approximate time of failed transactions. Use that information to narrow down the search. You need to look at the TCP-level features (stuff negotiated during the initial handshake, and the behavior of acks and windows during message delivery). Broken implementations of SACK and WSCALE have resulted in timeouts during message delivery, but we have also seen "security" appliances that simply drop packets with flags that they did not understand. Wietse
Re: Increasing logging on queue messages
MySQL Student wrote: Hi, Ah, found it. ?The problem is you've got the Postfix host relaying mail to itself. ?It's right in your log entries: to=, relay=127.0.0.1[127.0.0.1] Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is no longer spam and virus scanning happening on this host. The message is just getting requeued back to the second queue. Have you already looked at a tcpdump recording as requested? I will try and investigate that today. Can anyone recommend a front-end to make browsing through the output easier? Is it ethereal/ettercap/wireshark that I should be using? Yes, wireshark is fine, and can read pcap files. Can you recommend how I can isolate the output to a specific message? Perhaps by process ID or the ability to somehow track the queue ID? The "Follow TCP stream" feature is valuable; past that you'll have to either grep through the output, or correlate timestamps accordingly. -- Corey
Re: Increasing logging on queue messages
Hi, >> > Ah, found it. ?The problem is you've got the Postfix host relaying mail >> > to itself. ?It's right in your log entries: >> > >> > to=, relay=127.0.0.1[127.0.0.1] >> >> Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is >> no longer spam and virus scanning happening on this host. >> >> The message is just getting requeued back to the second queue. > > Have you already looked at a tcpdump recording as requested? I will try and investigate that today. Can anyone recommend a front-end to make browsing through the output easier? Is it ethereal/ettercap/wireshark that I should be using? Can you recommend how I can isolate the output to a specific message? Perhaps by process ID or the ability to somehow track the queue ID? Thanks, Alex
Re: Increasing logging on queue messages
MySQL Student: [ Charset ISO-8859-1 unsupported, converting... ] > Hi, > > > Ah, found it. ?The problem is you've got the Postfix host relaying mail > > to itself. ?It's right in your log entries: > > > > to=, relay=127.0.0.1[127.0.0.1] > > Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is > no longer spam and virus scanning happening on this host. > > The message is just getting requeued back to the second queue. Have you already looked at a tcpdump recording as requested? Wietse
Re: Increasing logging on queue messages
Hi, > Ah, found it. The problem is you've got the Postfix host relaying mail > to itself. It's right in your log entries: > > to=, relay=127.0.0.1[127.0.0.1] Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is no longer spam and virus scanning happening on this host. The message is just getting requeued back to the second queue. Thanks, Alex
Re: Increasing logging on queue messages
On Oct 12, 2009, at 1:14 AM, Stan Hoeppner wrote: MySQL Student put forth on 10/11/2009 11:53 PM: I appreciate your comments about obfuscation. Hopefully I haven't mangled the info too much. My log is more than just a few lines. Perhaps I have the logging level turned up too high. Isn't there a way to include more detailed logging directly in the deferred/ queue file itself? Ah, found it. The problem is you've got the Postfix host relaying mail to itself. It's right in your log entries: to=, relay=127.0.0.1[127.0.0.1] That is because the OP is using amavisd-new running on localhost, so Postfix (presumably because it has so been configured) is passing mail to amavisd-new on 127.0.0.1. Not so problematic in and of itself.
Increasing logging on queue messages
MySQL Student put forth on 10/11/2009 11:53 PM: > I appreciate your comments about obfuscation. Hopefully I haven't > mangled the info too much. My log is more than just a few lines. > Perhaps I have the logging level turned up too high. Isn't there a way > to include more detailed logging directly in the deferred/ queue file > itself? Ah, found it. The problem is you've got the Postfix host relaying mail to itself. It's right in your log entries: to=, relay=127.0.0.1[127.0.0.1] -- Stan
Re: Increasing logging on queue messages
Hi, >>> Unfortunately, you deleted lots of useful information from the >>> logging, including the break-down of handshake delays and of >>> transmission delays. >> >> I wasn't sure that I should post the whole queued message here. > > The logs for a single message are usually about half a dozen lines or > less. Ingress, cleanup, queue manager, egress, removal; not big at > all, and very useful. Mask "private" information if you must, but keep > in mind that it can make it harder for others to help diagnose > problems. I appreciate your comments about obfuscation. Hopefully I haven't mangled the info too much. My log is more than just a few lines. Perhaps I have the logging level turned up too high. Isn't there a way to include more detailed logging directly in the deferred/ queue file itself? There are currently 19 messages in the second queue, waiting to be delivered. How can I be sure that when I manually flush the queue to collect the log information, that I'm following the log information for a specific message? I've done my best to include the relevant information, and have pasted it at the end of this message. >> Why would it only be broken handling some messages and not others? Or >> is that what only tcpdump would be able to tell us? > > My first thought is, "we need to know more about these messages". You > say they're mostly for one domain, but you don't really specify > whether it's in or out, etc. Are the problematic messages perhaps: All the messages are outbound. It receives mail from only one server, and that's the domain primary which does content scanning and accepts it for the domain from the Internet. This server basically forwards it from the DMZ to an internal Exchange server. > - all from the same domain, to other domains? The mail is all destined for the same internal domain on the same internal server, but may come from any Internet email address. > - all to the same domain, from other domains? The mail could originate from any domain. > - all to/from one user/address? The mail could be destined for any user in the same domain. >>> debug_peer_list can turn on logging for specific sites, not messages. >> >> It's really more of a mail router, so only handles mail for a single domain >> :-( > > You mentioned a relayhost (smarthost) earlier. Do all mails go through > it? Do the messages going via the relayhost have problems? Maybe the > ones that don't go via the relayhost have problems? If you can narrow > that down a little, you could use debug_peer_list on that host > specifically, which may help. All the email is forwarded to just one server. Below is the log output. Oct 11 17:06:06 mailrelay postfix/smtpd[4531]: A4B16118006: client=smtp01.example.com[192.168.100.10] Oct 11 17:06:06 mailrelay postfix/cleanup[4643]: A4B16118006: message-id=<4665460.1255305961508.javamail@atcac007> Oct 11 17:06:06 mailrelay postfix/smtpd[4531]: disconnect from smtp01.example.com[192.168.100.10] Oct 11 17:06:06 mailrelay postfix/qmgr[13436]: A4B16118006: from=, size=3924, nrcpt=1 (queue active) Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: > localhost.localdomain[127.0.0.1]: 220 localhost ESMTP Postfix Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: connect from localhost.localdomain[127.0.0.1] Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: watchdog_pat: 0x808eff8 Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: < localhost.localdomain[127.0.0.1]: EHLO localhost Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: > localhost.localdomain[127.0.0.1]: 250-localhost Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: > localhost.localdomain[127.0.0.1]: 250-PIPELINING Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: > localhost.localdomain[127.0.0.1]: 250-SIZE 13844000 Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: > localhost.localdomain[127.0.0.1]: 250-ETRN Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: > localhost.localdomain[127.0.0.1]: 250-XVERP Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: > localhost.localdomain[127.0.0.1]: 250 8BITMIME Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: watchdog_pat: 0x808eff8 Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: < localhost.localdomain[127.0.0.1]: MAIL FROM: Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: extract_addr: input: Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: extract_addr: result: katie3...@gmail.com Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: connect to subsystem public/cleanup2 Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: public/cleanup2 socket: wanted attribute: queue_id Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: input attribute name: queue_id Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: input attribute value: D98301A8005 Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: public/cleanup2 socket: wanted attribute: (list terminator) Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: input attribute name: (end) Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: send attr flags = 2 Oct 11 17:06:06 ma
Re: Increasing logging on queue messages
MySQL Student: > Hi, > > >> ?said: 421 smtp01.example.com Error: timeout exceeded (in reply to end > >> of DATA command)) > > > > Unfortunately, you deleted lots of useful information from the vvv > > logging, including the break-down of handshake delays and of ^^^ > > transmission delays. > > I wasn't sure that I should post the whole queued message here. I think I was asking for LOGGING, in my reply above. Please make a tcpdump recording. It will show why your KERNEL TCP stack stops receiving ackowledgments from the remote side, or why your KERNEL TCP stack stops sending to the remote side. The TCP window scale factor is chosen by your KERNEL. Your KERNEL has no clue about how much mail will be sent over the connection. Finally, it is possible that hardware mis-behaves with certain bit patterns. One should never exclude the unlikely. Wietse
Re: Increasing logging on queue messages
Barney Desmond put forth on 10/10/2009 12:19 AM: > Mask "private" information if you must, but keep > in mind that it can make it harder for others to help diagnose > problems. Absolutely correct. Obfuscate left-hand-side only to prevent valid addresses being scraped from this public mailing list by spammers. Please do _not_ obfuscate domain names, IP addresses of servers, and host FQDNs. Many problems reported here are DNS or network related. Without real domain names, real IPs, and real host names we can't help people until prying that info from them in successive emails. It's better to post the real info up front. There's not much a criminal can do with just a domain, host name, or IP address. And no one is going to sue you for listing this information in an attempt to solve a problem. You're not really protecting anything by obfuscating those. -- Stan
Re: Increasing logging on queue messages
2009/10/10 MySQL Student : >> Unfortunately, you deleted lots of useful information from the >> logging, including the break-down of handshake delays and of >> transmission delays. > > I wasn't sure that I should post the whole queued message here. The logs for a single message are usually about half a dozen lines or less. Ingress, cleanup, queue manager, egress, removal; not big at all, and very useful. Mask "private" information if you must, but keep in mind that it can make it harder for others to help diagnose problems. >> - Broken TCP window scaling support in firewall or end-host. Postfix >> 2.6 and later have a main.cf:tcp_windowsize parameter. Setting >> this to some value under 65536 will work around this, otherwise >> you'd have to poke the kernel with sysctl or whatever. > > Many of the messages themselves are less than 65k. Does this make a > difference? I can't comment with authority, but I'd say "not directly". The TCP window generally counts bytes on the wire, which means there's other overheads in play, not just the message data. > Why would it only be broken handling some messages and not others? Or > is that what only tcpdump would be able to tell us? My first thought is, "we need to know more about these messages". You say they're mostly for one domain, but you don't really specify whether it's in or out, etc. Are the problematic messages perhaps: - all from the same domain, to other domains? - all to the same domain, from other domains? - all to/from one user/address? >> debug_peer_list can turn on logging for specific sites, not messages. > > It's really more of a mail router, so only handles mail for a single domain > :-( You mentioned a relayhost (smarthost) earlier. Do all mails go through it? Do the messages going via the relayhost have problems? Maybe the ones that don't go via the relayhost have problems? If you can narrow that down a little, you could use debug_peer_list on that host specifically, which may help.
Re: Increasing logging on queue messages
Hi, >> said: 421 smtp01.example.com Error: timeout exceeded (in reply to end >> of DATA command)) > > Unfortunately, you deleted lots of useful information from the > logging, including the break-down of handshake delays and of > transmission delays. I wasn't sure that I should post the whole queued message here. > When the receiver times out receiving data, the problem is more > likely to be at the TCP level, so no amount of verbose Postfix > logging is going to help. Ah, makes sense. > - Broken TCP window scaling support in firewall or end-host. Postfix > 2.6 and later have a main.cf:tcp_windowsize parameter. Setting > this to some value under 65536 will work around this, otherwise > you'd have to poke the kernel with sysctl or whatever. Many of the messages themselves are less than 65k. Does this make a difference? Why would it only be broken handling some messages and not others? Or is that what only tcpdump would be able to tell us? > debug_peer_list can turn on logging for specific sites, not messages. It's really more of a mail router, so only handles mail for a single domain :-( Thanks again, Alex
Re: Increasing logging on queue messages
MySQL Student: > Hi, > > I have a postfix mail server that successfully delivers thousands of > messages per day, primarily to the same smarthost connected to it. > However, I have a number of messages that are stuck in the queue with > timeout errors: > > said: 421 smtp01.example.com Error: timeout exceeded (in reply to end > of DATA command)) Unfortunately, you deleted lots of useful information from the logging, including the break-down of handshake delays and of transmission delays. When the receiver times out receiving data, the problem is more likely to be at the TCP level, so no amount of verbose Postfix logging is going to help. Instead, consider diagnosing this with tcpdump. Common TCP-level problems are: - Broken TCP window scaling support in firewall or end-host. Postfix 2.6 and later have a main.cf:tcp_windowsize parameter. Setting this to some value under 65536 will work around this, otherwise you'd have to poke the kernel with sysctl or whatever. - Broken IP path MTU discovery. The symptom is that TCP packets larger than some size never get acknowledged. - Other TCP-level breakage that only shows up in packet-level recordings. > I've read through the debugging and logging pages at postfix.org, and > have increased the logging, but once you increase it to a certain > point, it becomes impossible to separate the signal from the noise. > How can I increase (or capture) the logging for either specific > messages, or increase it only for messages stuck in the queue? debug_peer_list can turn on logging for specific sites, not messages. Wietse
Increasing logging on queue messages
Hi, I have a postfix mail server that successfully delivers thousands of messages per day, primarily to the same smarthost connected to it. However, I have a number of messages that are stuck in the queue with timeout errors: said: 421 smtp01.example.com Error: timeout exceeded (in reply to end of DATA command)) I've read through the debugging and logging pages at postfix.org, and have increased the logging, but once you increase it to a certain point, it becomes impossible to separate the signal from the noise. How can I increase (or capture) the logging for either specific messages, or increase it only for messages stuck in the queue? Can anyone give me an idea why this problem would occur in the first place, only on specific messages? It doesn't seem to be related to size, load on the server or receiving system, or network issue, because it delivers tens of thousands of other messages without incident. There are no errors on the ethernet interfaces. There are also errors that indicate the message may be delivered more than once: timed out while sending end of data -- message may be sent more than once) It's difficult to track these down, though, so I can't tell if they actually have been delivered. Thanks, Alex