On 12/10/2009 11:37 AM, Victor Duchovni wrote: > On Thu, Dec 10, 2009 at 11:12:36AM -0800, Emmett Culley wrote: > >>>> It seems like understanding where the "delay=86457" >>>> and "delays=86457/0/0.36/0.18" come from would probably help me to >>>> understand the 24 delay. >>> >>> Not really. The message took 1 day to enter the active queue, not >>> surprising, since pickup seems to have it a day late. Did your system >>> clock get changed (by a day or so) while Postfix was running? >> >> So, where does the delay=86457 come from? It is clear to me that the >> cron job's email is getting stuck on the originating machine for 24 hours. > > It comes from the message sitting in the maildrop queue for 24 hours, > or from a 24 hour delay between the invocation of sendmail(1) and EOF on > the message body. If, for example, the process that has the write-side of > the pipe to sendmail(1) keeps it open for 24 hours, the message will be > delayed by 24 hours. > >> I cannot let go of the fact that the log shows a delay=86457 (which is >> a few second more that 24 hours) isn't related to the late delivery. > > It is of course related, but does not answer your underlying question. > >> Where does the delay=86457 come from? Who sets it? What does it >> represent? Where is the messages stored for the missing 24 hours? > > The pre-active delay is measured from the time message is created until > it enters the active queue (in this case for the first time). So the > message was either: > > - Not yet fully formed (delayed EOF in the > mail(1)->sendmail(1)->postdrop(1)) process chain) > > - Fully formed, but unprocessed in the maildrop queue for 24 hours. > > - Fully formed, but unprocessed in the incoming queue for 24 hours. > > The last can be ruled out, because the pickup(8) and cleanup(8) > log entries do not predate the qmgr(8) log entries by 24 hours. > > So, it is one of the first two. None of this requires in-depth > understanding of the "delay" or "delays" fields, because the same evidence > is clear from the logs and headers. The "delays=1day" value just corroborates > the fact that the message went nowhere for 24 hours. > > Do try to rule out the possibility that the code creating the message > is directly or via a child-process keeping the write-side pipe open > for 24 hours or so. > > If this happens every day, just list the maildrop queue a few hours > before the expected departure time of the message: > > # ls -l /var/spool/postfix/maildrop > > and report your findings. If you find any files there with a timestamp > nearly a day old, also run: > > # fuser /var/spool/postfix/maildrop/<filename> > > and report the "ps -p<pid>" output for that process id (any parent > processes also if relevant). > Thanks to the hints above I looked into the code that I was calling from the cron job. I found that one of the scripts that was being called never returned until the next night when it was executed again. Hence the 24 hour delay.
I fixed that script and now the email from cron arrives when it should. Thanks for your comments.