First, let me thank everyone for their helpful suggestions.  After another
day of relentless investigation we uncovered a "Titanic"-like chain of
events that caused the failure.  I am reviewing it here in the hope that
others may benefit.

For a description of the problem, see the thread "Mysterious Imail Server
Lockup".

Solution:

We discovered that our ISP had 'fat fingered' when sending us the IP of one
of their caching nameservers.  This incorrect IP was entered in Imail's DNS
server box in the SMTP service tab.  The secondary one was correct so mail
did flow without problem. But......
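
A quick check along these lines would probably have caught the bad address
much sooner.  It is only a sketch: it sends one plain UDP query for an A
record to each server and reports whether anything answers.  The function
names, the test hostname and the 3 second timeout are my own assumptions,
nothing here comes from Imail or Declude.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS A-record query packet (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is a length byte followed by the label text
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in hostname.rstrip(".").split(".")
    )
    # Zero byte ends the name, then QTYPE=A (1) and QCLASS=IN (1)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def server_answers(server_ip, hostname="example.com", timeout=3.0):
    """Return True if server_ip replies to a DNS query within the timeout."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-host errors
            return False
    # A real reply echoes our query ID and has the response (QR) bit set
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def check_servers(server_ips):
    """Map each configured DNS server IP to True (answers) or False."""
    return {ip: server_answers(ip) for ip in server_ips}
```

Calling check_servers() with the primary and secondary addresses from the
SMTP service tab returns a True/False entry for each one.  Any server that
comes back False deserves a second look, even if mail appears to be flowing.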

Declude would attempt to use the primary DNS server from Imail.  This would
fail, and our 1.46 version of Declude would wait approximately 30 seconds
before continuing with its scan and terminating the process.  This resulted
in a large number (in relation to total available Imail related processes)
of declude.exe processes in the task manager that were consuming no
processor time and very little memory.

When the volume of new mail coming in started getting heavier, Declude Queue
would look at the total number of allowed Imail related processes, see all
the declude.exe processes waiting for answers they were never going to get,
and think the machine was under full load and unable to handle any more
volume.  Thus it would put the new Q files in the "overflow" folder.  All
the while, processor utilization was running around 5% and memory
utilization around 10%.
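
To show why the box looked idle yet refused work, here is a rough
back-of-the-envelope model.  The 30 second wait is the figure we observed;
the process limit, the healthy scan time and the arrival rate are made-up
numbers for illustration only, so plug in your own.

```python
def max_throughput(process_slots, seconds_per_message):
    """Upper bound on messages per second when each message holds one slot."""
    return process_slots / seconds_per_message


def slots_busy(arrival_rate, seconds_per_message):
    """Average slots occupied at a given arrival rate (Little's law)."""
    return arrival_rate * seconds_per_message


SLOTS = 30        # hypothetical process limit, not our real setting
DNS_WAIT = 30.0   # approximate stall we saw per message with the bad DNS IP
HEALTHY = 1.0     # hypothetical scan time when DNS answers promptly

# Ceiling on throughput with and without the DNS stall
print(max_throughput(SLOTS, HEALTHY + DNS_WAIT))
print(max_throughput(SLOTS, HEALTHY))

# Slots needed to keep up with 2 messages per second while stalled
print(slots_busy(2.0, HEALTHY + DNS_WAIT))
```

With these assumed numbers the stalled setup tops out at under one message
per second, and anything faster than that ties up every slot with processes
that are just waiting.  That fits what we saw: all slots taken, new mail
going to overflow, and the CPU barely working.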

As we added new users the load got heavier and heavier and the overflow
folder started exploding.  We finally clued in that something was wrong when
we disabled Junkmail.  When we did this we saw Imail go in and blow through
the messages, pegging the CPU and clearing things out quickly.  It was in
working with Scott @ Declude that we figured out the DNS problem.  There
were never any errors in the Imail logs because Imail would use the
secondary DNS server (Ipswitch even helped us review the logs).

What exacerbated this whole problem was that we had turned off Diskeeper on
the spool drive, trying to eliminate anything that might be affecting disk
I/O performance, and the spool folder became massively fragmented.
So....under heavy load all these factors working together just "gummed up
the works" even though the typical utilization tools showed low load
factors.

So....thanks for the help and suggestions.  Please let me know if you see
any holes in our final solution (theory)     ;-)

David

