RE: [Ntop-dev] Core dump in 3.2rc1 netflow handling

Burton Strauss Tue, 13 Sep 2005 07:46:51 -0700

Is this an SMP or HT box?
-----Burton 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of [EMAIL PROTECTED]
Sent: Tuesday, September 13, 2005 8:48 AM
To: [email protected]
Subject: RE: [Ntop-dev] Core dump in 3.2rc1 netflow handling


The idea of something concurrent affecting the variables makes sense.  I ran
it again with the trace level set to 5.  It crashed calling the same
function with a NULL pointer from a different place, right after a line
where the same dereference succeeded.

There isn't anything about the idle purge in the trace output.  I've
included the end of the output below but I did search for the words idle and
purge in the rest of the log.

<snip />

-----Original Message-----
From: "Burton Strauss" <[EMAIL PROTECTED]rt.com>
Sent by: ntop-dev-bounces@unipi.it
Sent: 09/12/2005 04:54 PM
To: <[email protected]>
cc:
Subject: RE: [Ntop-dev] Core dump in 3.2rc1 netflow handling
Please respond to: [email protected]

_dl_sysinfo_int80 usually means a deadlock in malloc() or similar call (it's
the user-kernel interface call).

SIGSEGV in the call to malloc() in leaks.c, line 75 is also consistent w/ a
malloc() chain corruption.

Problem is these are all symptoms - the actual problem could be many seconds
or minutes before.

"The NULL pointer looks like trouble.":  Right - here's the code block:

void resetSecurityHostTraffic(HostTraffic *el) {
  if(el->secHostPkts == NULL) return;
  resetUsageCounter(&el->secHostPkts->synPktsSent);
...

What this appears to mean is that it was NOT NULL at the start of this block
of code and yet now it is.
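
If another thread can free or clear the pointer between the NULL test and the
dereference, one way to close that window is to do both under the same lock.
The following is only a minimal sketch of that idea, not ntop's code: the
types Counter, Probes and Host, the per-host mutex, and both function names
are hypothetical stand-ins, and nothing in this thread says what locking ntop
actually uses around secHostPkts.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for ntop's types; not the real definitions. */
typedef struct { unsigned long value; } Counter;
typedef struct { Counter synPktsSent; Counter nullPktsSent; } Probes;
typedef struct {
  pthread_mutex_t lock;   /* assumed per-host lock, for illustration only */
  Probes *secHostPkts;
} Host;

/* The NULL test and the use happen under one lock, so a purge thread that
   takes the same lock cannot clear secHostPkts in between.
   Returns 1 if the counters were reset, 0 if there was nothing to reset. */
int resetProbesLocked(Host *el) {
  int didReset = 0;

  pthread_mutex_lock(&el->lock);
  if(el->secHostPkts != NULL) {
    memset(el->secHostPkts, 0, sizeof(Probes));
    didReset = 1;
  }
  pthread_mutex_unlock(&el->lock);
  return didReset;
}

/* What a purge thread would do, under the same lock. */
void purgeProbesLocked(Host *el) {
  pthread_mutex_lock(&el->lock);
  free(el->secHostPkts);
  el->secHostPkts = NULL;
  pthread_mutex_unlock(&el->lock);
}
```

The point of the sketch is only that the check and the dereference must be
covered by the same critical section as whatever frees the block.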

So... Pls check in the log and see if idle purge was kicking around (you may
need to go to a higher trace level to see the IDLE_PURGE messages)

-----Burton

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of [EMAIL PROTECTED]
Sent: Monday, September 12, 2005 3:25 PM
To: [email protected]
Subject: [Ntop-dev] Core dump in 3.2rc1 netflow handling

I'm having trouble with ntop dying.  I ran it through gdb and came up with
the information listed below.  I ran it once from within gdb and once from
outside, then pulled the core into gdb.  I'm running it on RedHat AS 3 and
built it myself.  Ntop crashes pretty easily, so if there is something I
should try, let me know and I can try it.


Run inside gdb:


Error type: Segmentation fault

(gdb) info thread
......
  9 Thread -1328637008 (LWP 19784)  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  8 Thread -1315648592 (LWP 19783)  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
* 7 Thread -1304302672 (LWP 19782)  0xb5becaaa in _int_malloc () from /lib/tls/libc.so.6
  6 Thread -1292686416 (LWP 19781)  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  5 Thread -1282196560 (LWP 19780)  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
....

(gdb) thread 7
[Switching to thread 7 (Thread -1304302672 (LWP 19782))]#2  0xb5f62062 in
ntop_safemalloc (sz=1,
    file=0xb5f8d5c5 "hash.c", line=954) at leaks.c:75
75        thePtr = malloc(sz);

(gdb) list
70                               */
71        }
72      #endif
73
74      #ifndef USE_GC
75        thePtr = malloc(sz);
76      #else
77        thePtr = GC_malloc_atomic(sz);
78      #endif
79

(gdb) print sz
$4 = 1

(gdb) bt
#0  0xb5becaaa in _int_malloc () from /lib/tls/libc.so.6
#1  0xb5bebdfd in malloc () from /lib/tls/libc.so.6
#2  0xb5f62062 in ntop_safemalloc (sz=1, file=0xb5f8d5c5 "hash.c", line=954) at leaks.c:75
#3  0xb5f5b4dc in _lookupHost (hostIpAddress=0xb241cd10, ether_addr=0x0, vlanId=0,
    checkForMultihoming=0 '\0', forceUsingIPaddress=1 '\001', actualDeviceId=1,
    file=0xb250b639 "netflowPlugin.c", line=497) at hash.c:954
#4  0xb25051ad in handleGenericFlow (recordActTime=869082435, recordSysUpTime=-1783133275,
    record=0xb241dab0, deviceId=1) at netflowPlugin.c:497
#5  0xb2505c62 in dissectFlow (buffer=0xb241e200 "", bufferLen=1464, deviceId=1) at netflowPlugin.c:1276
#6  0xb2506b7f in netflowMainLoop (_deviceId=0x1) at netflowPlugin.c:1469
#7  0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0
#8  0xb5c569ea in clone () from /lib/tls/libc.so.6


Run from the command line:

Mon Sep 12 14:21:15 2005  THREADMGMT[t2733636528]: RRD: Throughput data collection: Thread running [p19952]
Mon Sep 12 14:21:15 2005  THREADMGMT[t2733636528]: RRD: Started thread for throughput data collection
Mon Sep 12 14:21:15 2005  THREADMGMT[t2762632112]: RRD: Data collection thread running [p19952]
Segmentation fault (core dumped)
# ls -lt | head
total 306628
-rw-------    1 root     root     565616640 Sep 12 14:30 core.19952
....

# gdb /home/hc05/ntop-mine/bin/ntop core.19952
GNU gdb Red Hat Linux (5.3.90-0.20030710.40rh)

(gdb) bt
#0  0xb5bf4857 in memset () from /lib/tls/libc.so.6
#1  0x104eec48 in ?? ()
#2  0xb5f839b1 in resetUsageCounter (counter=0x49c) at util.c:3690
#3  0xb5f83a6c in resetSecurityHostTraffic (el=0x104eec48) at util.c:3713
#4  0xb5f65802 in allocateSecurityHostPkts (srcHost=0x104eec48) at pbuf.c:123
#5  0xb2504d5c in handleGenericFlow (recordActTime=685188419, recordSysUpTime=-1795279914,
    record=0xaea8aab0, deviceId=6) at netflowPlugin.c:612
#6  0xb2505c62 in dissectFlow (buffer=0xaea8b200 "", bufferLen=1464, deviceId=6) at netflowPlugin.c:1276
#7  0xb2506b7f in netflowMainLoop (_deviceId=0x6) at netflowPlugin.c:1469
#8  0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0
#9  0xb5c569ea in clone () from /lib/tls/libc.so.6


(gdb) frame 2
#2  0xb5f839b1 in resetUsageCounter (counter=0x49c) at util.c:3690
3690      memset(counter, 0, sizeof(UsageCounter));
(gdb) list
3685    /* ******************************* */
3686
3687    void resetUsageCounter(UsageCounter *counter) {
3688      int i;
3689
3690      memset(counter, 0, sizeof(UsageCounter));
3691
3692      for(i=0; i<MAX_NUM_CONTACTED_PEERS; i++)
3693        setEmptySerial(&counter->peersSerials[i]);
3694    }
(gdb) print counter
$1 = (UsageCounter *) 0x49c

(gdb) frame 3
#3  0xb5f83a6c in resetSecurityHostTraffic (el=0x104eec48) at util.c:3713
3713      resetUsageCounter(&el->secHostPkts->nullPktsSent);
(gdb) print el
$2 = (HostTraffic *) 0x104eec48
(gdb) print el->secHostPkts->nullPktsSent
Cannot access memory at address 0x49c
(gdb) print el->secHostPkts
$3 = (SecurityHostProbes *) 0x0
(gdb)

The NULL pointer looks like trouble.
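
The small address 0x49c passed to resetUsageCounter is consistent with taking
the address of a member through a NULL struct pointer: the result is just 0
plus the member's offset.  A minimal sketch of that arithmetic follows.  The
DemoCounter and DemoProbes types and the memberAddress helper are made up for
illustration, and the padding is chosen only so the offset comes out at 0x49c;
the real SecurityHostProbes layout is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, padded so the member lands at offset 0x49c. */
typedef struct { unsigned char bytes[4]; } DemoCounter;
typedef struct {
  unsigned char earlierFields[0x49c]; /* stands in for whatever precedes it */
  DemoCounter nullPktsSent;
} DemoProbes;

/* Address of the nullPktsSent member for a struct located at 'base'.
   With base == 0 (a NULL struct pointer) this is the bare member offset. */
uintptr_t memberAddress(uintptr_t base) {
  return base + offsetof(DemoProbes, nullPktsSent);
}
```

With this layout, memberAddress(0) is 0x49c, which is the kind of value gdb
shows for counter in frame 2.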


(gdb) frame 4
#4  0xb5f65802 in allocateSecurityHostPkts (srcHost=0x104eec48) at
pbuf.c:123
123         resetSecurityHostTraffic(srcHost);
(gdb) list
118     /* ******************************* */
119
120     void allocateSecurityHostPkts(HostTraffic *srcHost) {
121       if(srcHost->secHostPkts == NULL) {
122         if((srcHost->secHostPkts = (SecurityHostProbes*)malloc(sizeof(SecurityHostProbes))) == NULL) return;
123         resetSecurityHostTraffic(srcHost);
124       }
125     }
126
127     /* ************************************ */
(gdb) print srcHost->secHostPkts
$4 = (SecurityHostProbes *) 0x0

I don't know why the test on line 122 didn't catch the NULL.  Also, I don't
know why the malloc failed unless it is because the core usage is really
big.
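
For what it's worth, the test on line 122 only checks what malloc returned at
that instant; it says nothing about what srcHost->secHostPkts holds by the
time line 123 reads it again.  One way to avoid re-reading the shared field is
to build the block through a local pointer and store it in the host only once
it is ready.  This is just a sketch under assumed names (SketchProbes,
SketchHost and allocateProbes are hypothetical), and on its own it does not
make the code thread-safe; it only narrows the window.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; not ntop's real structures. */
typedef struct {
  unsigned long synPktsSent;
  unsigned long nullPktsSent;
} SketchProbes;
typedef struct { SketchProbes *secHostPkts; } SketchHost;

/* Returns 1 if the host has a probes block on exit, 0 if allocation failed. */
int allocateProbes(SketchHost *host) {
  SketchProbes *fresh;

  if(host->secHostPkts != NULL) return 1;  /* already allocated */

  fresh = calloc(1, sizeof(SketchProbes)); /* zeroed, so no separate reset */
  if(fresh == NULL) return 0;

  host->secHostPkts = fresh;               /* store only after it is ready */
  return 1;
}
```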

_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev
