Re: Increasing logging on queue messages

2009-10-12 Thread Wietse Venema
MySQL Student:
> Hi,
> 
> >> > Ah, found it. ?The problem is you've got the Postfix host relaying mail
> >> > to itself. ?It's right in your log entries:
> >> >
> >> > to=, relay=127.0.0.1[127.0.0.1]
> >>
> >> Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is
> >> no longer spam and virus scanning happening on this host.
> >>
> >> The message is just getting requeued back to the second queue.
> >
> > Have you already looked at a tcpdump recording as requested?
> 
> I will try and investigate that today. Can anyone recommend a
> front-end to make browsing
> through the output easier? Is it ethereal/ettercap/wireshark that I
> should be using?
> 
> Can you recommend how I can isolate the output to a specific message? Perhaps 
> by
> process ID or the ability to somehow track the queue ID?

Postfix logs the approximate time of failed transactions. Use
that information to narrow down the search.

You need to look at the TCP-level features (stuff negotiated during
the initial handshake, and the behavior of acks and windows during
message delivery).

Broken implementations of SACK and WSCALE have resulted in timeouts
during message delivery, but we have also seen "security" appliances
that simply drop packets with flags that they did not understand.

Wietse


Re: Increasing logging on queue messages

2009-10-12 Thread Corey Chandler

MySQL Student wrote:

Hi,

  

Ah, found it. ?The problem is you've got the Postfix host relaying mail
to itself. ?It's right in your log entries:

to=, relay=127.0.0.1[127.0.0.1]


Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is
no longer spam and virus scanning happening on this host.

The message is just getting requeued back to the second queue.
  

Have you already looked at a tcpdump recording as requested?



I will try and investigate that today. Can anyone recommend a
front-end to make browsing
through the output easier? Is it ethereal/ettercap/wireshark that I
should be using?

  

Yes, wireshark is fine, and can read pcap files.

Can you recommend how I can isolate the output to a specific message? Perhaps by
process ID or the ability to somehow track the queue ID?
  


The "Follow TCP stream" feature is valuable; past that you'll have to 
either grep through the output, or correlate timestamps accordingly.


-- Corey


Re: Increasing logging on queue messages

2009-10-12 Thread MySQL Student
Hi,

>> > Ah, found it. ?The problem is you've got the Postfix host relaying mail
>> > to itself. ?It's right in your log entries:
>> >
>> > to=, relay=127.0.0.1[127.0.0.1]
>>
>> Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is
>> no longer spam and virus scanning happening on this host.
>>
>> The message is just getting requeued back to the second queue.
>
> Have you already looked at a tcpdump recording as requested?

I will try and investigate that today. Can anyone recommend a
front-end to make browsing
through the output easier? Is it ethereal/ettercap/wireshark that I
should be using?

Can you recommend how I can isolate the output to a specific message? Perhaps by
process ID or the ability to somehow track the queue ID?

Thanks,
Alex


Re: Increasing logging on queue messages

2009-10-12 Thread Wietse Venema
MySQL Student:
[ Charset ISO-8859-1 unsupported, converting... ]
> Hi,
> 
> > Ah, found it. ?The problem is you've got the Postfix host relaying mail
> > to itself. ?It's right in your log entries:
> >
> > to=, relay=127.0.0.1[127.0.0.1]
> 
> Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is
> no longer spam and virus scanning happening on this host.
> 
> The message is just getting requeued back to the second queue.

Have you already looked at a tcpdump recording as requested?

Wietse


Re: Increasing logging on queue messages

2009-10-12 Thread MySQL Student
Hi,

> Ah, found it.  The problem is you've got the Postfix host relaying mail
> to itself.  It's right in your log entries:
>
> to=, relay=127.0.0.1[127.0.0.1]

Yes, my apologies; as Sahil said, I'm using amavisd-new, although there is
no longer spam and virus scanning happening on this host.

The message is just getting requeued back to the second queue.

Thanks,
Alex


Re: Increasing logging on queue messages

2009-10-12 Thread Sahil Tandon
On Oct 12, 2009, at 1:14 AM, Stan Hoeppner   
wrote:



MySQL Student put forth on 10/11/2009 11:53 PM:


I appreciate your comments about obfuscation. Hopefully I haven't
mangled the info too much. My log is more than just a few lines.
Perhaps I have the logging level turned up too high. Isn't there a  
way

to include more detailed logging directly in the deferred/ queue file
itself?


Ah, found it.  The problem is you've got the Postfix host relaying  
mail

to itself.  It's right in your log entries:

to=, relay=127.0.0.1[127.0.0.1]


That is because the OP is using amavisd-new running on localhost, so  
Postfix (presumably because it has so been configured) is passing mail  
to amavisd-new on 127.0.0.1.  Not so problematic in and of itself.


Increasing logging on queue messages

2009-10-11 Thread Stan Hoeppner
MySQL Student put forth on 10/11/2009 11:53 PM:

> I appreciate your comments about obfuscation. Hopefully I haven't
> mangled the info too much. My log is more than just a few lines.
> Perhaps I have the logging level turned up too high. Isn't there a way
> to include more detailed logging directly in the deferred/ queue file
> itself?

Ah, found it.  The problem is you've got the Postfix host relaying mail
to itself.  It's right in your log entries:

to=, relay=127.0.0.1[127.0.0.1]

--
Stan


Re: Increasing logging on queue messages

2009-10-11 Thread MySQL Student
Hi,

>>> Unfortunately, you deleted lots of useful information from the
>>> logging, including the break-down of handshake delays and of
>>> transmission delays.
>>
>> I wasn't sure that I should post the whole queued message here.
>
> The logs for a single message are usually about half a dozen lines or
> less. Ingress, cleanup, queue manager, egress, removal; not big at
> all, and very useful. Mask "private" information if you must, but keep
> in mind that it can make it harder for others to help diagnose
> problems.

I appreciate your comments about obfuscation. Hopefully I haven't
mangled the info too much. My log is more than just a few lines.
Perhaps I have the logging level turned up too high. Isn't there a way
to include more detailed logging directly in the deferred/ queue file
itself?

There are currently 19 messages in the second queue, waiting to be
delivered. How can I be sure that when I manually flush the queue to
collect the log information, that I'm following the log information
for a specific message?

I've done my best to include the relevant information, and have pasted
it at the end of this message.

>> Why would it only be broken handling some messages and not others? Or
>> is that what only tcpdump would be able to tell us?
>
> My first thought is, "we need to know more about these messages". You
> say they're mostly for one domain, but you don't really specify
> whether it's in or out, etc. Are the problematic messages perhaps:

All the messages are outbound. It receives mail from only one server,
and that's the domain primary which does content scanning and accepts
it for the domain from the Internet. This server basically forwards it
from the DMZ to an internal Exchange server.

> - all from the same domain, to other domains?

The mail is all destined for the same internal domain on the same
internal server, but may come from any Internet email address.

> - all to the same domain, from other domains?

The mail could originate from any domain.

> - all to/from one user/address?

The mail could be destined for any user in the same domain.

>>> debug_peer_list can turn on logging for specific sites, not messages.
>>
>> It's really more of a mail router, so only handles mail for a single domain 
>> :-(
>
> You mentioned a relayhost (smarthost) earlier. Do all mails go through
> it? Do the messages going via the relayhost have problems? Maybe the
> ones that don't go via the relayhost have problems? If you can narrow
> that down a little, you could use debug_peer_list on that host
> specifically, which may help.

All the email is forwarded to just one server.

Below is the log output.

Oct 11 17:06:06 mailrelay postfix/smtpd[4531]: A4B16118006:
client=smtp01.example.com[192.168.100.10]
Oct 11 17:06:06 mailrelay postfix/cleanup[4643]: A4B16118006:
message-id=<4665460.1255305961508.javamail@atcac007>
Oct 11 17:06:06 mailrelay postfix/smtpd[4531]: disconnect from
smtp01.example.com[192.168.100.10]
Oct 11 17:06:06 mailrelay postfix/qmgr[13436]: A4B16118006:
from=, size=3924, nrcpt=1 (queue active)
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: >
localhost.localdomain[127.0.0.1]: 220 localhost ESMTP Postfix
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: connect from
localhost.localdomain[127.0.0.1]
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: watchdog_pat: 0x808eff8
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: <
localhost.localdomain[127.0.0.1]: EHLO localhost
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: >
localhost.localdomain[127.0.0.1]: 250-localhost
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: >
localhost.localdomain[127.0.0.1]: 250-PIPELINING
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: >
localhost.localdomain[127.0.0.1]: 250-SIZE 13844000
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: >
localhost.localdomain[127.0.0.1]: 250-ETRN
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: >
localhost.localdomain[127.0.0.1]: 250-XVERP
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: >
localhost.localdomain[127.0.0.1]: 250 8BITMIME
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: watchdog_pat: 0x808eff8
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: <
localhost.localdomain[127.0.0.1]: MAIL FROM:
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: extract_addr: input:

Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: extract_addr: result:
katie3...@gmail.com
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: connect to subsystem
public/cleanup2
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: public/cleanup2
socket: wanted attribute: queue_id
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: input attribute name: queue_id
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: input attribute
value: D98301A8005
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: public/cleanup2
socket: wanted attribute: (list terminator)
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: input attribute name: (end)
Oct 11 17:06:06 mailrelay postfix_2/smtpd[4449]: send attr flags = 2
Oct 11 17:06:06 ma

Re: Increasing logging on queue messages

2009-10-10 Thread Wietse Venema
MySQL Student:
> Hi,
> 
> >> ?said: 421 smtp01.example.com Error: timeout exceeded (in reply to end
> >> of DATA command))
> >
> > Unfortunately, you deleted lots of useful information from the
vvv
> > logging, including the break-down of handshake delays and of
^^^
> > transmission delays.
> 
> I wasn't sure that I should post the whole queued message here.

I think I was asking for LOGGING, in my reply above.

Please make a tcpdump recording. It will show why your KERNEL TCP
stack stops receiving ackowledgments from the remote side, or why
your KERNEL TCP stack stops sending to the remote side.

The TCP window scale factor is chosen by your KERNEL. Your KERNEL
has no clue about how much mail will be sent over the connection.

Finally, it is possible that hardware mis-behaves with certain 
bit patterns. One should never exclude the unlikely.

Wietse


Re: Increasing logging on queue messages

2009-10-10 Thread Stan Hoeppner
Barney Desmond put forth on 10/10/2009 12:19 AM:

> Mask "private" information if you must, but keep
> in mind that it can make it harder for others to help diagnose
> problems.

Absolutely correct.  Obfuscate left-hand-side only to prevent valid
addresses being scraped from this public mailing list by spammers.
Please do _not_ obfuscate domain names, IP addresses of servers, and
host FQDNs.  Many problems reported here are DNS or network related.
Without real domain names, real IPs, and real host names we can't help
people until prying that info from them in successive emails.  It's
better to post the real info up front.  There's not much a criminal can
do with just a domain, host name, or IP address.  And no one is going to
sue you for listing this information in an attempt to solve a problem.
You're not really protecting anything by obfuscating those.

--
Stan


Re: Increasing logging on queue messages

2009-10-09 Thread Barney Desmond
2009/10/10 MySQL Student :
>> Unfortunately, you deleted lots of useful information from the
>> logging, including the break-down of handshake delays and of
>> transmission delays.
>
> I wasn't sure that I should post the whole queued message here.

The logs for a single message are usually about half a dozen lines or
less. Ingress, cleanup, queue manager, egress, removal; not big at
all, and very useful. Mask "private" information if you must, but keep
in mind that it can make it harder for others to help diagnose
problems.

>> - Broken TCP window scaling support in firewall or end-host. Postfix
>>  2.6 and later have a main.cf:tcp_windowsize parameter. Setting
>>  this to some value under 65536 will work around this, otherwise
>>  you'd have to poke the kernel with sysctl or whatever.
>
> Many of the messages themselves are less than 65k. Does this make a 
> difference?

I can't comment with authority, but I'd say "not directly". The TCP
window generally counts bytes on the wire, which means there's other
overheads in play, not just the message data.

> Why would it only be broken handling some messages and not others? Or
> is that what only tcpdump would be able to tell us?

My first thought is, "we need to know more about these messages". You
say they're mostly for one domain, but you don't really specify
whether it's in or out, etc. Are the problematic messages perhaps:
- all from the same domain, to other domains?
- all to the same domain, from other domains?
- all to/from one user/address?

>> debug_peer_list can turn on logging for specific sites, not messages.
>
> It's really more of a mail router, so only handles mail for a single domain 
> :-(

You mentioned a relayhost (smarthost) earlier. Do all mails go through
it? Do the messages going via the relayhost have problems? Maybe the
ones that don't go via the relayhost have problems? If you can narrow
that down a little, you could use debug_peer_list on that host
specifically, which may help.


Re: Increasing logging on queue messages

2009-10-09 Thread MySQL Student
Hi,

>>  said: 421 smtp01.example.com Error: timeout exceeded (in reply to end
>> of DATA command))
>
> Unfortunately, you deleted lots of useful information from the
> logging, including the break-down of handshake delays and of
> transmission delays.

I wasn't sure that I should post the whole queued message here.

> When the receiver times out receiving data, the problem is more
> likely to be at the TCP level, so no amount of verbose Postfix
> logging is going to help.

Ah, makes sense.

> - Broken TCP window scaling support in firewall or end-host. Postfix
>  2.6 and later have a main.cf:tcp_windowsize parameter. Setting
>  this to some value under 65536 will work around this, otherwise
>  you'd have to poke the kernel with sysctl or whatever.

Many of the messages themselves are less than 65k. Does this make a difference?

Why would it only be broken handling some messages and not others? Or
is that what only tcpdump would be able to tell us?

> debug_peer_list can turn on logging for specific sites, not messages.

It's really more of a mail router, so only handles mail for a single domain :-(

Thanks again,
Alex


Re: Increasing logging on queue messages

2009-10-09 Thread Wietse Venema
MySQL Student:
> Hi,
> 
> I have a postfix mail server that successfully delivers thousands of
> messages per day, primarily to the same smarthost connected to it.
> However, I have a number of messages that are stuck in the queue with
> timeout errors:
> 
>  said: 421 smtp01.example.com Error: timeout exceeded (in reply to end
> of DATA command))

Unfortunately, you deleted lots of useful information from the
logging, including the break-down of handshake delays and of
transmission delays.

When the receiver times out receiving data, the problem is more
likely to be at the TCP level, so no amount of verbose Postfix
logging is going to help.

Instead, consider diagnosing this with tcpdump. Common TCP-level
problems are:

- Broken TCP window scaling support in firewall or end-host. Postfix
  2.6 and later have a main.cf:tcp_windowsize parameter. Setting
  this to some value under 65536 will work around this, otherwise
  you'd have to poke the kernel with sysctl or whatever.

- Broken IP path MTU discovery. The symptom is that TCP packets
  larger than some size never get acknowledged.

- Other TCP-level breakage that only shows up in packet-level
  recordings.

> I've read through the debugging and logging pages at postfix.org, and
> have increased the logging, but once you increase it to a certain
> point, it becomes impossible to separate the signal from the noise.
> How can I increase (or capture) the logging for either specific
> messages, or increase it only for messages stuck in the queue?

debug_peer_list can turn on logging for specific sites, not messages.

Wietse


Increasing logging on queue messages

2009-10-09 Thread MySQL Student
Hi,

I have a postfix mail server that successfully delivers thousands of
messages per day, primarily to the same smarthost connected to it.
However, I have a number of messages that are stuck in the queue with
timeout errors:

 said: 421 smtp01.example.com Error: timeout exceeded (in reply to end
of DATA command))

I've read through the debugging and logging pages at postfix.org, and
have increased the logging, but once you increase it to a certain
point, it becomes impossible to separate the signal from the noise.
How can I increase (or capture) the logging for either specific
messages, or increase it only for messages stuck in the queue?

Can anyone give me an idea why this problem would occur in the first
place, only on specific messages? It doesn't seem to be related to
size, load on the server or receiving system, or network issue,
because it delivers tens of thousands of other messages without
incident. There are no errors on the ethernet interfaces.

There are also errors that indicate the message may be delivered more than once:

timed out while sending end of data -- message may be sent more than once)

It's difficult to track these down, though, so I can't tell if they
actually have been delivered.

Thanks,
Alex