Burton M. Strauss III wrote:

As a really rough guess, that sounds like some kind of conflict with one of the mutexes holding things up. I wonder, if you enable some of the debugging messages, what shows up when it finally begins to respond? You could even try the #define PARM_SHOW_NTOP_HEARTBEAT 1 option - it should then begin to show messages for each of the thread loops, etc.


See the full report at the end.


but you are STRONGLY advised to run `sysctl -w kern.maxproc=1024`, as otherwise ntop exhausts the proc table with zombies (something like a self-DoS; I experienced it) ...

I've always thought there should be a pthread_kill() call too. But we only kill threads at the end of the run, so the OS reap should fix this.

Try running with -K (debug mode) - it skips the fork() calls (although this means that the http creation is done in the same thread, so response time suffers - don't create the BIG pages showing 1000s of hosts...). See if -K makes ntop stop creating zombies. If so, that gives us a place to look.
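
The zombie symptom discussed here is what a forking server shows when the parent never waits for its exited children. A minimal sketch of non-blocking reaping, in C, follows. The names spawn_worker() and reap_children() are hypothetical, and nothing here is taken from the ntop source; it only illustrates the general POSIX pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of forked children not yet reaped (illustrative bookkeeping). */
static int outstanding_children = 0;

/* Hypothetical stand-in for a per-request fork(): the child exits at
 * once, where a real child would build and send a page first. */
static pid_t spawn_worker(void) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);               /* child */
    if (pid > 0)
        outstanding_children++; /* parent: one more child to reap */
    return pid;                 /* -1 if fork() failed */
}

/* Reap every child that has already exited, without blocking.
 * Returns how many were reaped; until this runs, exited children
 * stay in the process table as zombies. */
static int reap_children(void) {
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        reaped++;
        outstanding_children--;
    }
    assert(outstanding_children >= 0);
    return reaped;
}
```

Calling reap_children() periodically from the parent's main loop keeps the process table clean without ever blocking the parent, because WNOHANG makes waitpid() return immediately when no child has exited yet.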

Bingo: no zombies with -K. It slows down with pcap 0.5; there is little speed difference with 0.8.3.


* Somehow, I find it strange that ntop takes 90% CPU of a PII/350 for home
 server traffic (even if it is high for this category).


Userland threads - so the compile converts many of the interrupt driven calls to polling calls - this is what we saw in FreeBSD. That's why the pcap set nonblocking option exists - it converts the key call to a poll()/nanosleep() cycle.

Give it a try - now that you have libpcap 0.8.x, the set_pcap_nonblocking() call should be available to you.
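
The poll()/nanosleep() cycle described above can be sketched in C as follows. This is only an illustration of the pattern on an ordinary file descriptor, under the assumption that the capture loop behaves similarly; it does not call libpcap (where the corresponding switch is pcap_setnonblock()), and set_nonblocking() and poll_read() are hypothetical names. The 1 ms pause is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read; if nothing is ready, sleep briefly and retry, at most
 * max_tries times. Returns the byte count (0 at EOF), or -1 with errno
 * set; errno is EAGAIN when every attempt found nothing to read. */
static ssize_t poll_read(int fd, void *buf, size_t len, int max_tries) {
    const struct timespec pause = { 0, 1000000L }; /* 1 ms, arbitrary */
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;          /* a real error, not "no data yet" */
        nanosleep(&pause, NULL);
    }
    errno = EAGAIN;
    return -1;
}
```

The point of the sleep is that the thread gives up the CPU between attempts instead of spinning, which matters most with userland threads where a busy loop starves every other thread in the process.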

I will discuss it on misc@ and will report thereafter on the why, and maybe the when.


Tests:

* -K
# ntop -i xl0 -t 5 -M -m 192.168.2.0/24 --ipv4 --skip-version-check -u _ntop -w 3000 -W 0 -p /etc/ntop/protocol.list -K
The slowness seems to last longer and persist (but, as said, debug mode plays a part ...). Telnet gets an answer, and there are no zombies.
Otherwise, the only debug output I see is some "CMPFCTN_DEBUG: setResolvedName(0x3cd40000)" messages.
Note also that CPU stays low:
16181 _ntop      4    0   15M   25M sleep bpf      1:00  0.05% ntop

Tested for 1h30.

* pcap_setnonblock (with pcap 0.8.3)
# ntop -i xl0 -t 5 -M -m 192.168.2.0/24 --ipv4 --skip-version-check -u _ntop -w 3000 -W 0 -p /etc/ntop/protocol.list --set-pcap-nonblocking
Fast, for real (it responds within the first 10 seconds; weird, previously there was no change in speed ...),
but zombies appear within two minutes:
$ sudo lsof|grep ntop|grep IPv
ntop      17612    _ntop   11u  IPv4 0xd109ce38        0t0      TCP *:3000 (LISTEN)
$ ps ax|grep [n]top
13966 ??  ZW      0:00.00 (ntop)
23426 ??  ZW      0:00.00 (ntop)
14575 ??  ZW      0:00.00 (ntop)
 1182 ??  ZW      0:00.00 (ntop)
 6982 ??  ZW      0:00.00 (ntop)
27065 ??  ZW      0:00.00 (ntop)
23495 ??  ZW      0:00.00 (ntop)
13042 ??  ZW      0:00.00 (ntop)
 1652 ??  ZW      0:00.00 (ntop)
10075 ??  ZW      0:00.00 (ntop)
17612 p4  I+      0:06.39 ntop -i xl0 -t 5 -M -m 192.168.2.0/24 --ipv4 --skip-version-check -u _ntop -w 3000 -W 0 -p /e

# ntop -i xl0 -t 5 -M -m 192.168.2.0/24 --ipv4 --skip-version-check -u _ntop -w 3000 -W 0 -p /etc/ntop/protocol.list --set-pcap-nonblocking -K
Speed OK too; no zombies; CPU OK:
17345 _ntop      4    0   14M   25M sleep bpf      0:08  3.61% ntop

Mutexes:

Mutex gdbmMutex, is unlocked.
     locked: 161 times, last was at Fri Apr 16 21:24:52 2004
util.c:4084(17345)
     unlocked: 161 times, last was util.c:4091(17345)
     longest: 0 sec from util.c:4091
Mutex packetProcessMutex, is unlocked.
     locked: 8518 times, last was at Fri Apr 16 21:24:55 2004
pbuf.c:2089(17345)
     unlocked: 8518 times, last was pbuf.c:2109(17345)
     longest: 0 sec from pbuf.c:2109
Mutex purgeMutex, is locked.    <= this one always seems to be locked, but it's http code, so maybe that is normal?
     locked: 54 times, last was at Fri Apr 16 21:24:55 2004
http.c:3056(17345)
     unlocked: 53 times, last was http.c:3092(17345)
     longest: 1 sec from http.c:3092
Mutex hostsHashMutex, is unlocked.
     locked: 24903 times, last was at Fri Apr 16 21:24:55 2004
pbuf.c:2508(17345)
     unlocked: 24903 times, last was pbuf.c:3315(17345)
     longest: 0 sec from pbuf.c:3315
Mutex tcpSessionsMutex, is unlocked.
     locked: 69593 times, last was at Fri Apr 16 21:24:55 2004
sessions.c:634(17345)
     unlocked: 69593 times, last was sessions.c:2033(17345)
     longest: 1 sec from sessions.c:551
Mutex purgePortsMutex, is unlocked.
     locked: 4144 times, last was at Fri Apr 16 21:24:55 2004
pbuf.c:698(17345)
     unlocked: 4144 times, last was pbuf.c:719(17345)
     longest: 0 sec from pbuf.c:719
Mutex securityItemsMutex, is unlocked.
     locked: 54 times, last was at Fri Apr 16 21:24:55 2004
http.c:2631(17345)
     unlocked: 54 times, last was http.c:2644(17345)
     longest: 0 sec from http.c:2626

Report created on Sat Apr 17 08:39:20 2004 [ntop uptime: 11:16:26]
Generated by ntop v.3.0 SourceForge .tgz MT (SSL) [i386-unknown-openbsd3.5]
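
A note on reading the counts above: purgeMutex shows 54 locks against 53 unlocks, which is consistent with it simply being held at the instant the report was generated. A small C sketch of how such lock/unlock counters can be kept is given here. CountedMutex and the cm_* functions are hypothetical names for illustration, not ntop's actual mutex wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* A mutex that counts how often it was locked and unlocked. */
typedef struct {
    pthread_mutex_t mutex;
    unsigned long lock_count;
    unsigned long unlock_count;
} CountedMutex;

/* Initialise the mutex and zero both counters. Returns 0 on success. */
static int cm_init(CountedMutex *m) {
    m->lock_count = 0;
    m->unlock_count = 0;
    return pthread_mutex_init(&m->mutex, NULL);
}

/* Lock, then count: the counter is only touched while the mutex is held. */
static void cm_lock(CountedMutex *m) {
    pthread_mutex_lock(&m->mutex);
    m->lock_count++;
}

/* Count, then unlock: again the counter changes only under the mutex. */
static void cm_unlock(CountedMutex *m) {
    assert(m->lock_count > m->unlock_count); /* must currently be held */
    m->unlock_count++;
    pthread_mutex_unlock(&m->mutex);
}

/* Snapshot check: more locks than unlocks means someone holds it now.
 * Read without locking, so the answer can be stale in a threaded program. */
static int cm_is_held(const CountedMutex *m) {
    return m->lock_count != m->unlock_count;
}
```

With counters like these, a difference of exactly one between the two totals says only that the mutex was held when the snapshot was taken; it takes several snapshots, or the "longest" hold time, to tell a stuck lock from a busy one.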

* PARM_SHOW_NTOP_HEARTBEAT 1 with pcap 0.8.3

$ telnet localhost 3000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0
HTTP/1.0 408 Request Time-out
Date: Sat, 17 Apr 2004 06:46:33 GMT
Cache-Control: no-cache
Expires: 0
Connection: close
Server: ntop/3.0 SourceForge .tgz (i386-unknown-openbsd3.5)
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Error 408</TITLE>
<META HTTP-EQUIV=Pragma CONTENT=no-cache>
<META HTTP-EQUIV=Cache-Control CONTENT=no-cache>
<LINK REL=stylesheet HREF="/style.css" type="text/css">
<SCRIPT SRC="/functions.js" TYPE="text/javascript"
LANGUAGE="javascript"></SCRIPT>
</HEAD>
<BODY BACKGROUND="/white_bg.gif" BGCOLOR="#FFFFFF" LINK=blue VLINK=blue>
<H1>Error 408</H1>
The request was timed-out.
<P>Received request:<BR><BLOCKQUOTE><TT>&quot;GET /
HTTP/1.0&quot;</TT></BLOCKQUOTE>Connection closed by foreign host.
$

Sat Apr 17 08:49:33 2004 [MSGID9360773] INITWEB: Initializing web server
Sat Apr 17 08:49:33 2004 [MSGID8922501] INITWEB: Initializing tcp/ip
socket connections for web server
Sat Apr 17 08:49:33 2004 [MSGID0349927] Initializing socket, port 3000,
address (any)
Sat Apr 17 08:49:33 2004 [MSGID0218735] INITWEB: Created a new socket (11)
Sat Apr 17 08:49:33 2004 [MSGID0349927] INITWEB: Initialized socket,
port 3000, address (any)
Sat Apr 17 08:49:33 2004 [MSGID0818081] INITWEB: Waiting for HTTP
connections on port 3000
Sat Apr 17 08:49:33 2004 [MSGID0841093] INITWEB: Starting web server
Sat Apr 17 08:49:33 2004 [MSGID8791429] THREADMGMT: Started thread
(1009242112) for web server
Sat Apr 17 08:49:33 2004 [MSGID8437197] INITWEB: Server started...
continuing with initialization
Sat Apr 17 08:49:33 2004 [MSGID0037760] THREADMGMT: Started thread
(1009243136) for network packet sniffing on xl0
Sat Apr 17 08:49:33 2004 [MSGID8940166]
HEARTBEAT(000000003)[main.c:1235]: main(), sleep()...
Sat Apr 17 08:49:33 2004 [MSGID0548089] THREADMGMT: web connections
thread (22223) started...
Sat Apr 17 08:49:33 2004 [MSGID0316203] Note: SIGPIPE handler set (ignore)
Sat Apr 17 08:49:33 2004 [MSGID0986275] WEB: ntop's web server is now
processing requests
Sat Apr 17 08:49:33 2004 [MSGID0393976] THREADMGMT: pcap dispatch thread
running...
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000005262)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8807438] SECURITY: Loading items table
Sat Apr 17 08:50:32 2004 [MSGID8778652] NOTE: -L | --use-syslog=facility
not specified, child processes will log to the default (24).
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000005263)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000005264)[main.c:1235]: main(), sleep()...
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000008685)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000008686)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000014888)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000014889)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000018789)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000030100)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000032983)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:32 2004 [MSGID8940166]
HEARTBEAT(000046906)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:33 2004 [MSGID8940166]
HEARTBEAT(000085467)[ntop.c:625]: scanIdleLoop(), sleep(60)...woke
Sat Apr 17 08:50:33 2004 [MSGID0825709] IDLE_PURGE: FINISHED selection,
0 [out of 37] hosts selected
Sat Apr 17 08:50:36 2004 [MSGID9399277] IDLE_PURGE: Device 0: no hosts
deleted
Sat Apr 17 08:50:36 2004 [MSGID8940166]
HEARTBEAT(000253488)[ntop.c:621]: scanIdleLoop(), sleep(60)...
Sat Apr 17 08:50:42 2004 [MSGID8940166]
HEARTBEAT(000904085)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 08:50:42 2004 [MSGID8940166]
HEARTBEAT(000904086)[main.c:1235]: main(), sleep()...
Sat Apr 17 08:50:45 2004 [MSGID8940166]
HEARTBEAT(001225987)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:46 2004 [MSGID8940166]
HEARTBEAT(001335741)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:46 2004 [MSGID8940166]
HEARTBEAT(001351714)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:46 2004 [MSGID8940166]
HEARTBEAT(001394158)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:46 2004 [MSGID8940166]
HEARTBEAT(001401031)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:49 2004 [MSGID8940166]
HEARTBEAT(001665417)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:49 2004 [MSGID8940166]
HEARTBEAT(001676122)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:49 2004 [MSGID8940166]
HEARTBEAT(001690447)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:49 2004 [MSGID8940166]
HEARTBEAT(001693646)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:51 2004 [MSGID8940166]
HEARTBEAT(001885737)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:51 2004 [MSGID8940166]
HEARTBEAT(001921166)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:51 2004 [MSGID8940166]
HEARTBEAT(001935612)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:51 2004 [MSGID8940166]
HEARTBEAT(001941311)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:51 2004 [MSGID8940166]
HEARTBEAT(001941312)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:52 2004 [MSGID8940166]
HEARTBEAT(001942702)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:52 2004 [MSGID8940166]
HEARTBEAT(001942703)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:52 2004 [MSGID8940166]
HEARTBEAT(001942704)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:52 2004 [MSGID8940166]
HEARTBEAT(001957113)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 08:50:52 2004 [MSGID8940166]
HEARTBEAT(001970567)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 08:50:52 2004 [MSGID8940166]
HEARTBEAT(001970568)[main.c:1235]: main(), sleep()...

# ps ax|grep ntop|grep ZW| wc -l
      22
  PID USERNAME PRI NICE  SIZE   RES STATE WAIT     TIME    CPU COMMAND
22223 _ntop     63    0   12M   22M run   -        1:16 90.43% ntop

Normal response time; speed is correct.




* PARM_SHOW_NTOP_HEARTBEAT 1 with pcap 0.5
# ntop -i xl0 -t 5 -M -m 192.168.2.0/24 --ipv4 --skip-version-check -u _ntop -w 3000 -W 0 -p /etc/ntop/protocol.list

Sat Apr 17 09:06:31 2004 [MSGID9360773] INITWEB: Initializing web server
Sat Apr 17 09:06:31 2004 [MSGID8922501] INITWEB: Initializing tcp/ip
socket connections for web server
Sat Apr 17 09:06:31 2004 [MSGID0349927] Initializing socket, port 3000,
address (any)
Sat Apr 17 09:06:31 2004 [MSGID0218735] INITWEB: Created a new socket (11)
Sat Apr 17 09:06:31 2004 [MSGID0349927] INITWEB: Initialized socket,
port 3000, address (any)
Sat Apr 17 09:06:31 2004 [MSGID0818081] INITWEB: Waiting for HTTP
connections on port 3000
Sat Apr 17 09:06:31 2004 [MSGID0841093] INITWEB: Starting web server
Sat Apr 17 09:06:31 2004 [MSGID8791429] THREADMGMT: Started thread
(1009180672) for web server
Sat Apr 17 09:06:31 2004 [MSGID8437197] INITWEB: Server started...
continuing with initialization
Sat Apr 17 09:06:31 2004 [MSGID0037760] THREADMGMT: Started thread
(1009181696) for network packet sniffing on xl0
Sat Apr 17 09:06:31 2004 [MSGID8940166]
HEARTBEAT(000000003)[main.c:1235]: main(), sleep()...
Sat Apr 17 09:06:31 2004 [MSGID0548089] THREADMGMT: web connections
thread (27988) started...
Sat Apr 17 09:06:31 2004 [MSGID0316203] Note: SIGPIPE handler set (ignore)
Sat Apr 17 09:06:31 2004 [MSGID0986275] WEB: ntop's web server is now
processing requests
Sat Apr 17 09:06:31 2004 [MSGID0393976] THREADMGMT: pcap dispatch thread
running...
=> waiting for response
Sat Apr 17 09:07:27 2004 [MSGID8940166]
HEARTBEAT(000005086)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:07:27 2004 [MSGID8807438] SECURITY: Loading items table
Sat Apr 17 09:07:27 2004 [MSGID8940166]
HEARTBEAT(000005087)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:07:27 2004 [MSGID8940166]
HEARTBEAT(000005088)[main.c:1235]: main(), sleep()...
=> still waiting (got the title and, a little later, the first menu bar); CPU low, no
zombies
Sat Apr 17 09:08:38 2004 [MSGID8940166]
HEARTBEAT(000010232)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:08:38 2004 [MSGID8940166]
HEARTBEAT(000010233)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:08:38 2004 [MSGID8940166]
HEARTBEAT(000010234)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:08:38 2004 [MSGID8940166]
HEARTBEAT(000010235)[ntop.c:625]: scanIdleLoop(), sleep(60)...woke
Sat Apr 17 09:08:38 2004 [MSGID8940166]
HEARTBEAT(000010236)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:08:38 2004 [MSGID8940166]
HEARTBEAT(000010237)[main.c:1235]: main(), sleep()...
Sat Apr 17 09:08:38 2004 [MSGID8940166]
HEARTBEAT(000010269)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:10:16 2004 [MSGID8940166]
HEARTBEAT(000017240)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:10:16 2004 [MSGID8940166]
HEARTBEAT(000017241)[main.c:1235]: main(), sleep()...
Sat Apr 17 09:10:16 2004 [MSGID8940166]
HEARTBEAT(000017242)[ntop.c:673]: scanFingerprintLoop(), sleep()...woke
Sat Apr 17 09:10:16 2004 [MSGID8757584] OSFP: scanFingerprintLoop()
checked 6, resolved 6
Sat Apr 17 09:10:16 2004 [MSGID8940166]
HEARTBEAT(000017243)[ntop.c:669]: scanFingerprintLoop(), sleep()...
Sat Apr 17 09:10:35 2004 [MSGID8940166]
HEARTBEAT(000019061)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:10:35 2004 [MSGID8940166]
HEARTBEAT(000019062)[main.c:1235]: main(), sleep()...
=> same
Sat Apr 17 09:10:45 2004 [MSGID8940166]
HEARTBEAT(000019834)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:10:45 2004 [MSGID8940166]
HEARTBEAT(000019835)[main.c:1235]: main(), sleep()...
Sat Apr 17 09:11:09 2004 [MSGID8940166]
HEARTBEAT(000022174)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:11:09 2004 [MSGID8940166]
HEARTBEAT(000022175)[main.c:1235]: main(), sleep()...
Sat Apr 17 09:11:57 2004 [MSGID0825709] IDLE_PURGE: FINISHED selection,
0 [out of 35] hosts selected
Sat Apr 17 09:12:38 2004 [MSGID8940166]
HEARTBEAT(000029083)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:12:38 2004 [MSGID8778652] NOTE: -L | --use-syslog=facility
not specified, child processes will log to the default (24).
Sat Apr 17 09:12:38 2004 [MSGID8940166]
HEARTBEAT(000029084)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:12:38 2004 [MSGID8940166]
HEARTBEAT(000029085)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:12:38 2004 [MSGID8940166]
HEARTBEAT(000029086)[main.c:1235]: main(), sleep()...
Sat Apr 17 09:12:39 2004 [MSGID8940166]
HEARTBEAT(000029174)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:12:39 2004 [MSGID8940166]
HEARTBEAT(000032557)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:12:39 2004 [MSGID8940166]
HEARTBEAT(000032731)[webInterface.c:8154]: handleWebConnections()
Sat Apr 17 09:12:40 2004 [MSGID9399277] IDLE_PURGE: Device 0: no hosts
deleted
Sat Apr 17 09:12:41 2004 [MSGID8940166]
HEARTBEAT(000150940)[ntop.c:621]: scanIdleLoop(), sleep(60)...
Sat Apr 17 09:12:46 2004 [MSGID8940166]
HEARTBEAT(000772530)[ntop.c:673]: scanFingerprintLoop(), sleep()...woke
Sat Apr 17 09:12:46 2004 [MSGID8940166]
HEARTBEAT(000775980)[ntop.c:669]: scanFingerprintLoop(), sleep()...
=> got everything
Sat Apr 17 09:12:48 2004 [MSGID8940166]
HEARTBEAT(001094595)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:12:48 2004 [MSGID8940166]
HEARTBEAT(001094596)[main.c:1235]: main(), sleep()...
Sat Apr 17 09:12:58 2004 [MSGID8940166]
HEARTBEAT(002259902)[main.c:1243]: main(), sleep()...woke
Sat Apr 17 09:12:58 2004 [MSGID8940166]
HEARTBEAT(002259903)[main.c:1235]: main(), sleep()...
=> asked for the configuration page: normal speed, CPU high, first zombies

Report created on Sat Apr 17 09:13:13 2004 [ntop uptime: 6:42]
Generated by ntop v.3.0 SourceForge .tgz MT (SSL) [i386-unknown-openbsd3.5]

It seems that setting nonblocking really helps speed only for the first ~five minutes,
but it helps CPU usage a lot.

Report created on Sat Apr 17 10:02:25 2004 [ntop uptime: 55:54]
Generated by ntop v.3.0 SourceForge .tgz MT (SSL) [i386-unknown-openbsd3.5]
Speed OK, 155 zombies, CPU high.


Regards


Julien

