Right, you won't see the idle purge at the default trace level. IIRC they are _INFO (-t 4) or _NOISY (-t 5) messages.
-----Burton -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Tuesday, September 13, 2005 9:06 AM To: [email protected] Subject: Fw: [Ntop-dev] Core dump in 3.2rc1 netflow handling I poked around some more after sending the message. I went to see what was going on with the free() thread. I guess I found the idle purge: (gdb) thread 23 [Switching to thread 23 (process 20456)]#0 0xb5bebf50 in free () from /lib/tls/libc.so.6 (gdb) bt #0 0xb5bebf50 in free () from /lib/tls/libc.so.6 #1 0xb5f62215 in ntop_safefree (ptr=0xb2e00010, file=0xb5f8d5c5 "hash.c", line=504) at leaks.c:188 #2 0xb5f5a36a in freeHostInfo (host=0xa0e71170, actualDeviceId=7) at hash.c:504 #3 0xb5f5aef0 in purgeIdleHosts (actDevice=7) at hash.c:693 #4 0xb5f63b21 in scanIdleLoop (notUsed=0x0) at ntop.c:647 #5 0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0 #6 0xb5c569ea in clone () from /lib/tls/libc.so.6 ----- Forwarded by Beirne Konarski/CORP/FXG on 09/13/2005 10:02 AM ----- Beirne Konarski/CORP/FXG To 09/13/2005 09:47 [email protected] AM cc Subject RE: [Ntop-dev] Core dump in 3.2rc1 netflow handling(Document link: Beirne Konarski) The idea of something concurrent affecting the variables makes sense. I ran it again with the trace level set to 5. It crashed calling the same function with a NULL pointer from a different place, right after a line where the same dereference succeeded. There isn't anything about the idle purge in the trace output. I've included the end of the output below but I did search for the words idle and purge in the rest of the log. gdb run on core dump: (gdb) bt #0 0xb5bf4857 in memset () from /lib/tls/libc.so.6 #1 0x89a5c648 in ?? () #2 0xb5f839b1 in resetUsageCounter (counter=0x170c) at util.c:3690 #3 0xb5f83ca1 in resetSecurityHostTraffic (el=0x89a5c648) at util.c:3747 #4 0xb5f65802 in allocateSecurityHostPkts (srcHost=0x89a5c648) at pbuf.c:123 #5 0xb23e8d68 in handleGenericFlow (recordActTime=-1862720189, recordSysUpTime=-1775288765, record=0xadd2aab0, deviceId=7) at netflowPlugin.c:612 #6 0xb23e9c62 in dissectFlow (buffer=0xadd2b200 "", bufferLen=1464, deviceId=7) at netflowPlugin.c:1276 #7 0xb23eab7f in netflowMainLoop (_deviceId=0x7) at netflowPlugin.c:1469 #8 0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0 #9 0xb5c569ea in clone () from /lib/tls/libc.so.6 (gdb) frame 3 #3 0xb5f83ca1 in resetSecurityHostTraffic (el=0x89a5c648) at util.c:3747 3747 resetUsageCounter(&el->secHostPkts->tcpToDiagnosticPortRcvd); (gdb) list 3742 resetUsageCounter(&el->secHostPkts->ackXmasFinSynNullScanRcvd); 3743 resetUsageCounter(&el->secHostPkts->rejectedTCPConnRcvd); 3744 resetUsageCounter(&el->secHostPkts->establishedTCPConnRcvd); 3745 resetUsageCounter(&el->secHostPkts->udpToClosedPortRcvd); 3746 resetUsageCounter(&el->secHostPkts->udpToDiagnosticPortRcvd); 3747 resetUsageCounter(&el->secHostPkts->tcpToDiagnosticPortRcvd); 3748 resetUsageCounter(&el->secHostPkts->tinyFragmentRcvd); 3749 resetUsageCounter(&el->secHostPkts->icmpFragmentRcvd); 3750 resetUsageCounter(&el->secHostPkts->overlappingFragmentRcvd); 3751 resetUsageCounter(&el->secHostPkts->closedEmptyTCPConnRcvd); (gdb) print el->secHostPkts $1 = (SecurityHostProbes *) 0x0 (gdb) info thread 26 process 20452 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 25 process 20454 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 24 process 20455 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 23 process 20456 0xb5bebf50 in free () from /lib/tls/libc.so.6 22 process 20457 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 21 process 20459 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 20 process 20460 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 19 process 20461 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 18 process 20462 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 17 process 20463 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 16 process 20464 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 15 process 20465 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 14 process 20467 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 13 process 20468 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 12 process 20469 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 11 process 20470 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 10 process 20471 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 9 process 20472 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 8 process 20473 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 7 process 20474 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 6 process 20475 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 5 process 20476 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 4 process 20477 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 3 process 20478 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 2 process 20479 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 * 1 process 20466 0xb5bf4857 in memset () from /lib/tls/libc.so.6 End of trace output: Mon Sep 12 16:12:18 2005 THREADMGMT[t2875988912]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2847357872]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2817854384]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t3033750448]: SFP: Fingerprint scan thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2903215024]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2978061232]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2966432688]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2929036208]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t3023260592]: SIH: Idle host scan thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2802809776]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2771864496]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2861816752]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:18 2005 THREADMGMT[t2787478448]: NETFLOW: thread running [p20452] Mon Sep 12 16:12:28 2005 THREADMGMT[t2736782256]: RRD: Throughput data collection: Thread starting [p20452] Mon Sep 12 16:12:28 2005 THREADMGMT[t2736782256]: RRD: Throughput data collection: Thread running [p20452] Mon Sep 12 16:12:28 2005 THREADMGMT[t2736782256]: RRD: Started thread for throughput data collection Mon Sep 12 16:12:28 2005 THREADMGMT[t2761374640]: RRD: Data collection thread running [p20452] Syslog entries: Sep 12 16:01:41 localhost kernel: device eth0 entered promiscuous mode Sep 12 16:01:41 localhost kernel: application bug: ntop(20313) has SIGCHLD set to SIG_IGN but calls wait(). Sep 12 16:01:41 localhost kernel: (see the NOTES section of 'man 2 wait'). Workaround activated. Sep 12 16:01:47 localhost kernel: device eth0 left promiscuous mode Sep 12 16:06:12 localhost ntop[20417]: THREADMGMT[t3045949568]: ntop RUNSTATE: PREINIT(1) Sep 12 16:06:12 localhost ntop[20417]: THREADMGMT[t3045949568]: ntop RUNSTATE: INIT(2) Sep 12 16:06:12 localhost kernel: device eth0 entered promiscuous mode Sep 12 16:06:13 localhost kernel: application bug: ntop(20417) has SIGCHLD set to SIG_IGN but calls wait(). Sep 12 16:06:13 localhost kernel: (see the NOTES section of 'man 2 wait'). Workaround activated. Sep 12 16:09:14 localhost kernel: device eth0 left promiscuous mode Sep 12 16:12:17 localhost ntop[20452]: THREADMGMT[t3045949568]: ntop RUNSTATE: PREINIT(1) Sep 12 16:12:17 localhost ntop[20452]: THREADMGMT[t3045949568]: ntop RUNSTATE: INIT(2) Sep 12 16:12:17 localhost kernel: device eth0 entered promiscuous mode Sep 12 16:12:17 localhost kernel: application bug: ntop(20452) has SIGCHLD set to SIG_IGN but calls wait(). Sep 12 16:12:17 localhost kernel: (see the NOTES section of 'man 2 wait'). Workaround activated. Sep 12 16:59:42 localhost kernel: device eth0 left promiscuous mode "Burton Strauss" <[EMAIL PROTECTED] rt.com> To Sent by: <[email protected]> ntop-dev-bounces@ cc unipi.it Subject RE: [Ntop-dev] Core dump in 3.2rc1 09/12/2005 04:54 netflow handling PM Please respond to [email protected] _dl_sysinfo_int80 usually means a deadlock in malloc() or similar call (it's the user-kernel interface call). SIGSEGV in the call to malloc() in leaks.c, line 75 is also consistent w/ a malloc() chain corruption. Problem is these are all symptoms - the actual problem could be many seconds or minutes before. "The NULL pointer looks like trouble.": Right - here's the code block: void resetSecurityHostTraffic(HostTraffic *el) { if(el->secHostPkts == NULL) return; resetUsageCounter(&el->secHostPkts->synPktsSent); ... What this appears to mean is that it was NOT null at the start of this block of code and yet is now is. So... Pls check in the log and see if idle purge was kicking around (you may need to go to a higher trace level to see the IDLE_PURGE messages) -----Burton -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, September 12, 2005 3:25 PM To: [email protected] Subject: [Ntop-dev] Core dump in 3.2rc1 netflow handling I'm having trouble with ntop dying. I ran it through gdb and come up with the information listed below. I ran it once from within gdb and once from outside, then pulling the core into gdb. I'm running it on RedHat AS 3 and built it myself. Ntop crashes pretty easily so if there is something I should try let me know and I can try it. Run inside gdb: Error type: Segmentation fault (gdb) info thread ...... 9 Thread -1328637008 (LWP 19784) 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 8 Thread -1315648592 (LWP 19783) 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 * 7 Thread -1304302672 (LWP 19782) 0xb5becaaa in _int_malloc () from /lib/tls/libc.so.6 6 Thread -1292686416 (LWP 19781) 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 5 Thread -1282196560 (LWP 19780) 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 .... (gdb) thread 7 [Switching to thread 7 (Thread -1304302672 (LWP 19782))]#2 0xb5f62062 in ntop_safemalloc (sz=1, file=0xb5f8d5c5 "hash.c", line=954) at leaks.c:75 75 thePtr = malloc(sz); (gdb) list 70 */ 71 } 72 #endif 73 74 #ifndef USE_GC 75 thePtr = malloc(sz); 76 #else 77 thePtr = GC_malloc_atomic(sz); 78 #endif 79 (gdb) print sz $4 = 1 (gdb) bt #0 0xb5becaaa in _int_malloc () from /lib/tls/libc.so.6 #1 0xb5bebdfd in malloc () from /lib/tls/libc.so.6 #2 0xb5f62062 in ntop_safemalloc (sz=1, file=0xb5f8d5c5 "hash.c", line=954) at leaks.c:75 #3 0xb5f5b4dc in _lookupHost (hostIpAddress=0xb241cd10, ether_addr=0x0, vlanId=0, checkForMultihoming=0 '\0', forceUsingIPaddress=1 '\001', actualDeviceId=1, file=0xb250b639 "netflowPlugin.c", line=497) at hash.c:954 #4 0xb25051ad in handleGenericFlow (recordActTime=869082435, recordSysUpTime=-1783133275, record=0xb241dab0, deviceId=1) at netflowPlugin.c:497 #5 0xb2505c62 in dissectFlow (buffer=0xb241e200 "", bufferLen=1464, deviceId=1) at netflowPlugin.c:1276 #6 0xb2506b7f in netflowMainLoop (_deviceId=0x1) at netflowPlugin.c:1469 #7 0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0 #8 0xb5c569ea in clone () from /lib/tls/libc.so.6 Run from the command line: Mon Sep 12 14:21:15 2005 THREADMGMT[t2733636528]: RRD: Throughput data collection: Thread running [p19952] Mon Sep 12 14:21:15 2005 THREADMGMT[t2733636528]: RRD: Started thread for throughput data collection Mon Sep 12 14:21:15 2005 THREADMGMT[t2762632112]: RRD: Data collection thread running [p19952] Segmentation fault (core dumped) # ls -lt | head total 306628 -rw------- 1 root root 565616640 Sep 12 14:30 core.19952 .... # gdb /home/hc05/ntop-mine/bin/ntop core.19952 GNU gdb Red Hat Linux (5.3.90-0.20030710.40rh) (gdb) bt #0 0xb5bf4857 in memset () from /lib/tls/libc.so.6 #1 0x104eec48 in ?? () #2 0xb5f839b1 in resetUsageCounter (counter=0x49c) at util.c:3690 #3 0xb5f83a6c in resetSecurityHostTraffic (el=0x104eec48) at util.c:3713 #4 0xb5f65802 in allocateSecurityHostPkts (srcHost=0x104eec48) at pbuf.c:123 #5 0xb2504d5c in handleGenericFlow (recordActTime=685188419, recordSysUpTime=-1795279914, record=0xaea8aab0, deviceId=6) at netflowPlugin.c:612 #6 0xb2505c62 in dissectFlow (buffer=0xaea8b200 "", bufferLen=1464, deviceId=6) at netflowPlugin.c:1276 #7 0xb2506b7f in netflowMainLoop (_deviceId=0x6) at netflowPlugin.c:1469 #8 0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0 #9 0xb5c569ea in clone () from /lib/tls/libc.so.6 (gdb) frame 2 #2 0xb5f839b1 in resetUsageCounter (counter=0x49c) at util.c:3690 3690 memset(counter, 0, sizeof(UsageCounter)); (gdb) list 3685 /* ******************************* */ 3686 3687 void resetUsageCounter(UsageCounter *counter) { 3688 int i; 3689 3690 memset(counter, 0, sizeof(UsageCounter)); 3691 3692 for(i=0; i<MAX_NUM_CONTACTED_PEERS; i++) 3693 setEmptySerial(&counter->peersSerials[i]); 3694 } (gdb) print counter $1 = (UsageCounter *) 0x49c (gdb) frame 3 #3 0xb5f83a6c in resetSecurityHostTraffic (el=0x104eec48) at util.c:3713 3713 resetUsageCounter(&el->secHostPkts->nullPktsSent); (gdb) print el $2 = (HostTraffic *) 0x104eec48 (gdb) print el->secHostPkts->nullPktsSent Cannot access memory at address 0x49c (gdb) print el->secHostPkts $3 = (SecurityHostProbes *) 0x0 (gdb) The NULL pointer looks like trouble. (gdb) frame 4 #4 0xb5f65802 in allocateSecurityHostPkts (srcHost=0x104eec48) at pbuf.c:123 123 resetSecurityHostTraffic(srcHost); (gdb) list 118 /* ******************************* */ 119 120 void allocateSecurityHostPkts(HostTraffic *srcHost) { 121 if(srcHost->secHostPkts == NULL) { 122 if((srcHost->secHostPkts = (SecurityHostProbes*)malloc(sizeof(SecurityHostProbes))) == NULL) return; 123 resetSecurityHostTraffic(srcHost); 124 } 125 } 126 127 /* ************************************ */ (gdb) print srcHost->secHostPkts $4 = (SecurityHostProbes *) 0x0 I don't know why the test on line 122 didn't catch the NULL. Also, I don't know why the malloc failed unless it is because the core usage is really big. _______________________________________________ Ntop-dev mailing list [email protected] http://listgateway.unipi.it/mailman/listinfo/ntop-dev _______________________________________________ Ntop-dev mailing list [email protected] http://listgateway.unipi.it/mailman/listinfo/ntop-dev _______________________________________________ Ntop-dev mailing list [email protected] http://listgateway.unipi.it/mailman/listinfo/ntop-dev _______________________________________________ Ntop-dev mailing list [email protected] http://listgateway.unipi.it/mailman/listinfo/ntop-dev
