RE: [Ntop-dev] GDBM data corruption causing ntop crash [was: Data corruption [was: Ntop segfaulting] (fwd)]

Robbert Kouprie Tue, 26 Apr 2005 01:12:01 -0700

Hey,

On Mon, 25 Apr 2005, Burton Strauss wrote:

I think that's a marginal idea Batman ... It sounds - from the ENOMEM - as
though a record is in the database w/ a huge record size.  This means that
any attempt to read it - by any program invoking gdbm_fetch - will crash and
burn - and not in a way we can recover from.


So we segfault??

Some sort of stand-alone utility would be better, but since ntop will
recreate the file if necessary, it's better to just nuke it.  And don't
think about a periodic purge + reorg (you would have to lock for the
duration of the reorg).  That's just asking for trouble :-(

If you used a standalone utility that nukes corrupted dbms when it finds
them, you would still have to restart ntop to recreate them. So we might
as well add the check to ntop initialization.

The only benefit of a standalone utility would be an "if (corrupt) {
purge }" style of thing where ntop could stay alive during the purge.
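
A rough sketch of that "if (corrupt) { purge }" idea, in Python rather than ntop's C to keep it short. Since gdbm's fatal error path ends the whole process (the strace shows "gdbm fatal: malloc error" followed by exit_group(1)), the read-everything pass runs in a forked child so the caller survives it. The names probe_in_child and purge_if_corrupt are hypothetical, and the probe callable is a stand-in: a real one would walk the file with gdbm_firstkey / gdbm_nextkey / gdbm_fetch. POSIX only, because it relies on fork.

```python
import os


def probe_in_child(probe, path):
    """Run probe(path) in a forked child; return True only on a clean exit."""
    pid = os.fork()
    if pid == 0:
        # Child process: an exception, a fatal exit or a crash in the
        # probe stays contained here and never reaches the parent.
        try:
            probe(path)
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent process: wait for the child and inspect how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def purge_if_corrupt(probe, path):
    """Delete path if the probe fails on it; return True if it was purged."""
    if probe_in_child(probe, path):
        return False
    if os.path.exists(path):
        os.remove(path)
    return True
```

Whether this runs at ntop startup or from a separate tool, the caller only learns "readable" or "not readable"; anything smarter (salvaging the good records, say) would need more than this sketch.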

Regards,
Robbert

-----Burton

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Robbert Kouprie
Sent: Monday, April 25, 2005 2:17 PM
To: [email protected]
Subject: [Ntop-dev] GDBM data corruption causing ntop crash [was: Data
corruption [was: Ntop segfaulting] (fwd)]

Hi Burton,

Thanks for the info!

Burton Strauss wrote:

There are two possibilities - either (1) the gdbm db has been corrupted
(leading ntop to put bad stuff into memory) or (2) something else
corrupted the HostTraffic chains.


It looks like (1), see below.

You can try using dumpgdbm (I've posted this before) and/or
dnscachePurge (SourceForge).


Ah, neat tools. I wrote a simple Perl script myself (called "dumpgdbm.pl"
hehe) but your tool is a little bit more verbose.

Anyway, both tools crash on a certain entry in the dbm file:

Below is the strace of your dumpgdbm when failing. First I included two
correctly processed entries. The last one is failing. Notice the integer
'4096' (which should indicate the length of the field?).

lseek(3, 36898923, SEEK_SET)            = 36898923
read(3, "1150659434\0s0106000c6e23cade.ed."..., 83) = 83
write(1, "  \'1150659434\'      : ( 72) 7330"..., 83  '1150659434'
: ( 72) 73303130 36303030 63366532 33636164   s0106000c6e23cad
) = 83
write(1, "                            652e"..., 83
      652e6564 2e736861 77636162 6c652e6e   e.ed.shawcable.n
) = 83
write(1, "                            6574"..., 83
      65740000 00000000 00000000 00000000   et..............
) = 83
write(1, "                            0000"..., 83
      00000000 00000000 00000000 00000000   ................
) = 83
write(1, "                            1f7e"..., 83
      1f7e4742 1d000000                     .~GB............
) = 83
lseek(3, 4242015, SEEK_SET)             = 4242015
read(3, "1143704778\0pcp05033799pcs.plyntv"..., 83) = 83
write(1, "  \'1143704778\'      : ( 72) 7063"..., 83  '1143704778'
: ( 72) 70637030 35303333 37393970 63732e70   pcp05033799pcs.p
) = 83
write(1, "                            6c79"..., 83
      6c796e74 7630312e 6d692e63 6f6d6361   lyntv01.mi.comca
) = 83
write(1, "                            7374"..., 83
      73742e6e 65740000 00000000 00000000   st.net..........
) = 83
write(1, "                            0000"..., 83
      00000000 00000000 00000000 00000000   ................
) = 83
write(1, "                            580e"..., 83
      580e4442 1d000000                     X.DB............
) = 83
lseek(3, 35229696, SEEK_SET)            = 35229696
read(3, "2191640561\0dct9241.dct.tudelft.n"..., 4096) = 4096
mmap2(NULL, 2000834560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0)                                  = 0x816f000
brk(0x7f5a7000)                         = 0x816f000
mmap2(NULL, 2000969728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x40862000
munmap(0x40862000, 647168)              = 0
munmap(0x40a00000, 401408)              = 0
mprotect(0x40900000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 2000834560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "gdbm fatal: ", 12gdbm fatal: )            = 12
write(2, "malloc error", 12malloc error)            = 12
write(2, "\n", 1
)                       = 1
munmap(0xb7fe8000, 4096)                = 0
exit_group(1)                           = ?
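
To put the failing allocation in perspective: the two good records show 72 in parentheses in the dumpgdbm output (presumably the data length), while the failing fetch ends in an mmap2 request of 2000834560 bytes. A minimal sketch of a plausibility check on a record length follows. The function name, the 1 MiB cap and the file size in the example are assumptions made for illustration, not something gdbm or ntop provides.

```python
def plausible_record_size(size, file_size, cap=1 << 20):
    """Return True if a record length read from disk could be genuine."""
    # A record cannot be negative, larger than the file that holds it,
    # or larger than the (assumed) sanity cap.
    return 0 <= size <= min(file_size, cap)


GOOD = 72                # length printed for the two good records
BAD = 2000834560         # size of the mmap2 request that hit ENOMEM
FILE_SIZE = 40_000_000   # hypothetical; the trace only shows offsets up to about 36.9 MB

print(plausible_record_size(GOOD, FILE_SIZE))
print(plausible_record_size(BAD, FILE_SIZE))
print(round(BAD / 2**30, 2), "GiB requested")
```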

Anyway, the database is corrupt for some reason. Let's blame it on the
hardware for now (I'm still testing). But would it be wise to add some kind
of check at ntop start, that tries to read in all gdbms that we use?

It helps take the blame off ntop when gdbms are corrupt, and it helps
with bug hunting.

- Robbert
_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

