From everything, it sounds like it is probably a stray
pointer (one that isn't pointing at a valid HostTraffic entry - what you see
is whatever happens to be in those bytes).
At this point, I don't have anything nice to offer.
You probably need to run under Valgrind to detect the bad pointer reference
before it kills ntop (after it's there it's too late to figure out why).
Now the normal script (packages/RedHat/ntop.init) has a valgrind option,
but
(1) It will slow ntop down
substantially
(2) It runs in the foreground
(3) You probably will need to build up a spec file to teach
Valgrind to ignore the glibc et al issues (lots of the standard C library
code has memory reference problems). Here is what I'm using:
## ntop specific suppress commands for
valgrind
##
##
# Skip over standard library errors
##GDBM:
{
write_nocancel_in_gdbm_open
Memcheck:Param
write(buf)
fun:__write_nocancel
fun:gdbm_open
}
{
write_nocancel_in_inlined_gdbm_open
Memcheck:Param
write(buf)
fun:__write_nocancel
fun:initSingleGdbm
}
{
libc_in_gdbm_open
Memcheck:Param
write(buf)
obj:/lib/tls/libc-2.3.3.so
fun:gdbm_open
}
{
write_nocancel_in_gdbm_store
Memcheck:Param
write(buf)
fun:__write_nocancel
fun:_gdbm_write_bucket
fun:_gdbm_end_update
fun:gdbm_store
}
{
write_nocancel_in_gdbm_open
Memcheck:Param
write(buf)
fun:__write_nocancel
fun:gdbm_open
}
{
write_nocancel_in_inlined_gdbm_open
Memcheck:Param
write(buf)
fun:__write_nocancel
fun:initSingleGdbm
}
{
libc_in_gdbm_open
Memcheck:Param
write(buf)
obj:/lib/tls/libc-2.3.3.so
fun:gdbm_open
}
{
write_nocancel_in_gdbm_store
Memcheck:Param
write(buf)
fun:__write_nocancel
fun:_gdbm_write_bucket
fun:_gdbm_end_update
fun:gdbm_store
}
##libpng:
{
png_write_deflate
Memcheck:Cond
obj:/usr/lib/libz.so.1.2.1.2
obj:/usr/lib/libz.so.1.2.1.2
fun:deflate
fun:png_write_filtered_row
}
{
png_write_deflate
Memcheck:Cond
obj:/usr/lib/libz.so.1.2.1.2
obj:/usr/lib/libz.so.1.2.1.2
fun:deflate
fun:png_write_filtered_row
}
##libdl:
{
dl_relocate_object_in_dl_open
Memcheck:Cond
fun:_dl_relocate_object
fun:dl_open_worker
fun:_dl_catch_error
fun:__GI__dl_open
}
{
dl_relocate_object_in_dl_open
Memcheck:Cond
fun:_dl_relocate_object
fun:dl_open_worker
fun:_dl_catch_error
fun:__GI__dl_open
}
##libpcap:
{
sendto_in_pcap_findalldevs
Memcheck:Param
socketcall.sendto(msg)
fun:sendto
fun:getifaddrs
fun:pcap_findalldevs
}
{
sendto_in_pcap_findalldevs
Memcheck:Param
socketcall.sendto(msg)
fun:sendto
fun:getifaddrs
fun:pcap_findalldevs
}
##glibc:
{
localtime_tz_getenv2
Memcheck:Addr2
fun:getenv
fun:tzset_internal
fun:__tz_convert
fun:localtime
}
{
localtime_tz_getenv4
Memcheck:Addr4
fun:getenv
fun:tzset_internal
fun:__tz_convert
fun:localtime
}
{
strftime_tz_getenv2
Memcheck:Addr2
fun:getenv
fun:tzset_internal
fun:tzset
fun:strftime_l
fun:strftime
}
{
strftime_tz_getenv4
Memcheck:Addr4
fun:getenv
fun:tzset_internal
fun:tzset
fun:strftime_l
fun:strftime
}
{
sem_wait_futex
Memcheck:Param
futex(uaddr2)
fun:sem_wait@@GLIBC_2.1
}
{
localtime_tz_getenv2
Memcheck:Addr2
fun:getenv
fun:tzset_internal
fun:__tz_convert
fun:localtime
}
{
localtime_tz_getenv4
Memcheck:Addr4
fun:getenv
fun:tzset_internal
fun:__tz_convert
fun:localtime
}
{
strftime_tz_getenv2
Memcheck:Addr2
fun:getenv
fun:tzset_internal
fun:tzset
fun:strftime_l
fun:strftime
}
{
strftime_tz_getenv4
Memcheck:Addr4
fun:getenv
fun:tzset_internal
fun:tzset
fun:strftime_l
fun:strftime
}
{
sem_wait_futex
Memcheck:Param
futex(uaddr2)
fun:sem_wait@@GLIBC_2.1
}
##zlib:
{
gzflush_deflate
Memcheck:Cond
obj:/usr/lib/libz.so.1.2.1.2
obj:/usr/lib/libz.so.1.2.1.2
fun:deflate
obj:/usr/lib/libz.so.1.2.1.2
fun:gzflush
}
{
gzflush_deflate
Memcheck:Cond
obj:/usr/lib/libz.so.1.2.1.2
obj:/usr/lib/libz.so.1.2.1.2
fun:deflate
obj:/usr/lib/libz.so.1.2.1.2
fun:gzflush
}
# Cleanup crud...
(4)
Use valgrind 2.4.x - I have not tried ntop under valgrind 3.0 (it was released
this month).
When you start running under valgrind,
you'll see this:
$ sudo /etc/init.d/ntopdevel valgrind
Stopping...
Stopping ntopdevel: Running under valgrind, parameters are
Starting ntopdevel:
/usr/bin/valgrind --verbose --suppressions=/devel/ntop/valgrind.supp --logfile=/tmp/valgrind.log --leak-check=yes --leak-resolution=high --track-fds=yes --trace-children=yes --gen-suppressions=yes --sanity-level=0 /devel/bin/ntop -i eth1,eth2 @/etc/ntopdevel.conf
Processing file /etc/ntopdevel.conf for parameters...
Thu Aug 25 09:00:05 2005 NOTE: Interface merge enabled by default
Thu Aug 25 09:00:07 2005 Initializing gdbm databases
Thu Aug 25 09:00:08 2005 **WARNING** Unknown preference: ntop.filterExpressionInExtraFrame, value = 1
Thu Aug 25 09:00:08 2005 NOTE: Interface merge disabled from prefs file
Thu Aug 25 09:00:08 2005 ntop v.3.1.051 traceEventTemp
Thu Aug 25 09:00:08 2005 Configured on Aug 18 2005 17:26:42, built on Aug 18 2005 17:34:24.
Thu Aug 25 09:00:08 2005 Copyright 1998-2005 by Luca Deri <[EMAIL PROTECTED]>
Thu Aug 25 09:00:08 2005 Get the freshest ntop from http://www.ntop.org/
Thu Aug 25 09:00:08 2005 NOTE: ntop is running from '/devel/bin'
Thu Aug 25 09:00:08 2005 NOTE: (but see warning on man page for the --instance parameter)
Thu Aug 25 09:00:08 2005 NOTE: ntop libraries are in '/devel/lib'
Thu Aug 25 09:00:08 2005 Initializing ntop
Thu Aug 25 09:00:08 2005 Checking eth1 for additional devices
Thu Aug 25 09:00:08 2005 Resetting traffic statistics for device eth1
Thu Aug 25 09:00:08 2005 DLT: Device 0 [eth1] is 1, mtu 1514, header 14
Stopping...
Stopping ntopdevel: Running under valgrind, parameters are
Starting ntopdevel:
/usr/bin/valgrind --verbose --suppressions=/devel/ntop/valgrind.supp --logfile=/tmp/valgrind.log --leak-check=yes --leak-resolution=high --track-fds=yes --trace-children=yes --gen-suppressions=yes --sanity-level=0 /devel/bin/ntop -i eth1,eth2 @/etc/ntopdevel.conf
Processing file /etc/ntopdevel.conf for parameters...
Thu Aug 25 09:00:05 2005 NOTE: Interface merge enabled by default
Thu Aug 25 09:00:07 2005 Initializing gdbm databases
Thu Aug 25 09:00:08 2005 **WARNING** Unknown preference: ntop.filterExpressionInExtraFrame, value = 1
Thu Aug 25 09:00:08 2005 NOTE: Interface merge disabled from prefs file
Thu Aug 25 09:00:08 2005 ntop v.3.1.051 traceEventTemp
Thu Aug 25 09:00:08 2005 Configured on Aug 18 2005 17:26:42, built on Aug 18 2005 17:34:24.
Thu Aug 25 09:00:08 2005 Copyright 1998-2005 by Luca Deri <[EMAIL PROTECTED]>
Thu Aug 25 09:00:08 2005 Get the freshest ntop from http://www.ntop.org/
Thu Aug 25 09:00:08 2005 NOTE: ntop is running from '/devel/bin'
Thu Aug 25 09:00:08 2005 NOTE: (but see warning on man page for the --instance parameter)
Thu Aug 25 09:00:08 2005 NOTE: ntop libraries are in '/devel/lib'
Thu Aug 25 09:00:08 2005 Initializing ntop
Thu Aug 25 09:00:08 2005 Checking eth1 for additional devices
Thu Aug 25 09:00:08 2005 Resetting traffic statistics for device eth1
Thu Aug 25 09:00:08 2005 DLT: Device 0 [eth1] is 1, mtu 1514, header 14
...
You
also need another host session to monitor the log file, which is in /tmp
from my script:
$ ls -lt /tmp
...
-rw------- 1 root
root 6775 Aug 25 09:00
valgrind.log.pid7937
...
I
usually use watch:
# watch -n1 tail -n25 /tmp/valgrind.log.pid7937
If
valgrind finds a problem, it will show you the lines causing the error and (with
the parameters in the script) stop waiting for you to tell it to generate the
suppression message:
==7937== ---- Print suppression ? --- [Return/N/n/Y/y/C/c]
----
But
remember, you have to press the letter key back on the 1st
session (the running ntop). Since we're less interested in specific
problems, C (continue w/o stopping again) is probably the best choice. (Or
you could edit the script and comment out the line which adds --gen-suppressions=yes).
This
will be a huge file if ntop runs a long time (every thread destroyed will cause
a search for memory leaks, etc.) (and every page you pull will spin
threads). But it's the only way I know to find the problem (of
course, since Valgrind changes the malloc routines, it's also possible that the
problem will NEVER surface under valgrind. That's what makes these
problems so much fun).
-----Burton
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Mazuhelli
Sent: Thursday, August 25, 2005 7:37 AM
To: [email protected]
Subject: [Ntop-dev] The "Bad magic number" message is back... (expected=1968/real=10248)
On 08/19/05 10:03, you wrote :
Secondly, there is now an 'unmagic' number, 1290, which is set in HostTraffic during the start of the delete process. Because of the timing of the messages (between selection and completion of the purge), I'm suspicious that we are reusing deleted entries and/or have a pointer to them during the delete process. So if your messages become:
Aug 17 06:46:34 gbp ntop[23566]: [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=1290][deviceId=0] getFirstHost()[address.c/54]
That will tell us the problem is reusing deleted entries. If it's still zero, then the problem is something else.
Guess what: after a few (probably unrelated) crashes, the bad magic number message came back:
Aug 24 22:53:07 gbp ntop[1808]: [MSGID8931710] RRD: Cycle 153 ended, 59 RRDs updated
Aug 24 22:53:20 gbp ntop[1808]: [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1747 [out of 10213] hosts selected
Aug 24 22:53:21 gbp ntop[1808]: [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1747/10212 hosts deleted, elapsed time is 0.051907 seconds (0.000030 per host)
Aug 24 22:53:28 gbp ntop[1808]: [MSGID8757584] SFP: Ending fingerprint scan cycle 304 - checked 131, resolved 0
Aug 24 22:55:21 gbp ntop[1808]: [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1409 [out of 9895] hosts selected
Aug 24 22:55:21 gbp ntop[1808]: [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1409/9894 hosts deleted, elapsed time is 0.051273 seconds (0.000036 per host)
Aug 24 22:55:59 gbp ntop[1808]: [MSGID8757584] SFP: Ending fingerprint scan cycle 305 - checked 126, resolved 0
Aug 24 22:56:53 gbp ntop[1808]: [MSGID8781395] **ERROR** Bad magic number (expected=1968/real=10248) [deviceId=0] lookupHost()[pbuf.c/1088]
I greped for the unmagic number (1920) and found it in globals-defines.h but didnt find any reference to 10248. Where does that number come from?
This time all ntop processes have died so there is no process left to run gdb on.
Regards,
Marc.
--
Marc Mazuhelli
Computer security analyst
Service des technologies de l'information
Université de Sherbrooke
_______________________________________________ Ntop-dev mailing list [email protected] http://listgateway.unipi.it/mailman/listinfo/ntop-dev
