[Ntop-dev] Deadlocks ... hanging ntop

Burton M. Strauss III Fri, 13 Dec 2002 11:21:38 -0800

There are a couple of reports of ntop deadlocking (and both Luca and I have
seen the problem across a range of OSes).  The symptom is that the web
server is up, but no packets are being processed, so the counts don't
change.  It happens on the 5 minute mark, i.e. it is related to idle purge.


Clearly, part of the problem was my performance change in hash.c for the
idle purge of large networks.  I've finally found the bug (an operator
precedence error) and so I believe I've closed that problem.  The fix is now
in the cvs (if you track my stuff, it's refs 178 and 179).  This code will
be in the snapshot of 14Dec2002.

It should be VASTLY better.
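
For readers unfamiliar with this class of bug, here is a generic C
illustration of an operator precedence slip.  It is NOT the actual hash.c
expression, which isn't quoted in this message; the flag name and both
functions are made up for the example.

```c
#include <stdbool.h>

#define FLAG_IDLE 0x04u   /* hypothetical flag bit, not an ntop constant */

/* Writing "flags & FLAG_IDLE == 0" without parentheses is parsed as shown
 * here, because == binds tighter than &.  FLAG_IDLE == 0 is 0, so the test
 * is "flags & 0" and is always false. */
static bool isActiveBuggy(unsigned flags) {
    return flags & (FLAG_IDLE == 0);
}

/* What was meant: mask first, then compare. */
static bool isActiveFixed(unsigned flags) {
    return (flags & FLAG_IDLE) == 0;
}
```

With no flags set, the buggy form reports "not active" while the fixed form
reports "active", which is the kind of silent wrong answer that can leave a
purge loop doing the wrong thing.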

However, I've still seen ntop deadlock with that fix in.  Just much, MUCH
less frequently.


Over the past couple of days, I've put a couple of enhancements in to try
and trap these things.

1) info.html (and textinfo.html) now report "blocked" mutexes.

A single instance of a "blocked" mutex is NOT a problem (mutexes do block
now and then).

However, if it doesn't clear after a few seconds, that IS a problem.  The
best way to check is to refresh the page and see whether the # of
locks/unlocks grows.
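
The refresh-and-compare check can be written down as a small sketch.  The
struct and field names below are assumptions made for illustration; they are
not the names ntop uses on info.html.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what the page shows for one mutex. */
typedef struct {
    unsigned long numLocks;     /* lock count at the time of the refresh   */
    unsigned long numReleases;  /* unlock count at the time of the refresh */
    bool          isBlocked;    /* page reports the mutex as "blocked"     */
} MutexSnapshot;

/* Suspicious only if the mutex is blocked in BOTH snapshots and neither
 * counter moved in between.  A single blocked reading is normal. */
static bool looksDeadlocked(const MutexSnapshot *before,
                            const MutexSnapshot *after) {
    return before->isBlocked && after->isBlocked
        && after->numLocks == before->numLocks
        && after->numReleases == before->numReleases;
}
```

Take the two snapshots a few seconds apart; growing counters mean the mutex
is still being used and the "blocked" reading was just momentary.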


2) Releases of unlocked mutexes are reported:

WARNING: releaseMutex() call with an UN-LOCKED mutex [%s:%d]


3) self-LOCKING is reported:

WARNING: accessMutex() call with a self-LOCKED mutex [from pbuf.c:1719
processPacket, locked by processPacket]

This is where a thread locks a mutex that it has already locked.  Note that
this warning is imperfect - the fields ntop uses are added onto the POSIX
Pthread mutex structure.  There's no way to make the added code atomic w/o
single threading ntop, so the simple minded wrapper I've put in place may
give spurious results.

It does not impact the functioning of ntop, it is just a log message.

             What I'm saying is, "DON'T PANIC".

If ntop does deadlock (i.e. no packets being processed and the # of
locks/unlocks doesn't grow), then the WARNING may explain why.

But, if you see the warning and ntop doesn't deadlock, well, then it was
just the way things got run.
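
A rough sketch of the kind of wrapper behind warnings 2) and 3), to show why
they can be spurious.  This is not ntop's code: TrackedMutex, trackedInit,
trackedAccess and trackedRelease are hypothetical names, and refusing the
lock after a self-lock warning is a choice made for the example only.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper: bookkeeping fields added next to a POSIX mutex. */
typedef struct {
    pthread_mutex_t mutex;
    bool            isLocked;
    pthread_t       owner;      /* meaningful only while isLocked is true */
    const char     *lockFile;   /* where the current holder locked it     */
    int             lockLine;
} TrackedMutex;

static int trackedInit(TrackedMutex *m) {
    m->isLocked = false;
    m->lockFile = NULL;
    m->lockLine = 0;
    return pthread_mutex_init(&m->mutex, NULL);
}

/* Returns 0 on success, -1 if the caller already appears to hold the lock. */
static int trackedAccess(TrackedMutex *m, const char *file, int line) {
    /* The bookkeeping is read WITHOUT holding the mutex, so with several
     * threads this check is racy.  That is the imperfection described
     * above: the extra fields are not updated atomically with the lock. */
    if (m->isLocked && pthread_equal(m->owner, pthread_self())) {
        fprintf(stderr, "WARNING: self-LOCKED mutex [from %s:%d, locked at %s:%d]\n",
                file, line, m->lockFile, m->lockLine);
        return -1;
    }
    int rc = pthread_mutex_lock(&m->mutex);
    if (rc == 0) {
        m->owner    = pthread_self();
        m->lockFile = file;
        m->lockLine = line;
        m->isLocked = true;
    }
    return rc;
}

/* Returns 0 on success, -1 if the mutex did not appear to be locked. */
static int trackedRelease(TrackedMutex *m, const char *file, int line) {
    if (!m->isLocked) {
        fprintf(stderr, "WARNING: release of an UN-LOCKED mutex [%s:%d]\n",
                file, line);
        return -1;
    }
    m->isLocked = false;
    return pthread_mutex_unlock(&m->mutex);
}
```

Callers would pass __FILE__ and __LINE__.  Because the flag and owner are
set a moment after the real lock is taken (and cleared a moment before it is
dropped), another thread can observe a stale combination, so a warning is a
hint and not proof.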

If you're running ntop capturing from more than one network card, you're
more likely to see a spurious message, especially on a busy network, because
there are two packet capture threads and (right now) the lock "keys" are the
same.  Adding the pid to the lock stuff would fix that, but the focus is on
the deadlock, not the cosmetics.
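
To illustrate why identical "keys" produce false self-lock reports, here is
a minimal sketch.  It assumes the key is just the locking location and uses
the POSIX thread id as the extra field where the text above says pid;
LockKey and both comparison functions are hypothetical, not ntop code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock key: where the lock was taken, plus who took it. */
typedef struct {
    const char *where;    /* e.g. the function that locked the mutex */
    pthread_t   thread;   /* thread that locked it                   */
} LockKey;

/* Location only: two capture threads both sitting in processPacket look
 * like the same holder, so the second one is reported as self-locking. */
static bool sameHolderByLocation(const LockKey *held, const LockKey *caller) {
    return strcmp(held->where, caller->where) == 0;
}

/* Location plus thread: only a genuine re-lock by the same thread matches. */
static bool sameHolderByLocationAndThread(const LockKey *held,
                                          const LockKey *caller) {
    return strcmp(held->where, caller->where) == 0
        && pthread_equal(held->thread, caller->thread);
}

/* Stand-in for a second capture thread; it does nothing. */
static void *captureThreadStub(void *arg) {
    (void)arg;
    return NULL;
}
```

With two capture threads, the location-only comparison matches even though
the holders differ; adding the thread to the key removes that false match.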

If your ntop locks up and you have pulled hash.c revision 2.108 from the cvs
(that is, after 13Dec2002 18:50:48 in Pisa - 12:50:48pm US Central Time),
please find the WARNING lines in the log and report them on ntop-dev.


-----Burton




