This one was nasty to find; a colleague helped me solve it. The problem occurs only when the snmp agent runs on a switch (a device with little memory).

Here’s the scoop.

The problem happens very early, at init time, before the agent goes into the receive loop.

 

The main problem is in real_init_master(): we free agentx_sockets, which is netsnmp_ds_strings[1][1], but we never reset netsnmp_ds_strings[1][1] to 0. Later on, we calloc an slp that gets the same pointer as netsnmp_ds_strings[1][1] and add the slp to the linked list. Since netsnmp_ds_strings[1][1] is not 0, we tamper with it, which corrupts the linked list.

This problem is nasty and difficult to find because by the time we catch the corrupted linked list, it's already way too late.
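The sequence above can be modelled in a few lines. This is only a sketch under stated assumptions: the real malloc/free are replaced by a hypothetical one-block toy heap (toy_alloc, toy_free) so the aliasing can be shown without real undefined behaviour, TOY_FREE is a free-and-clear macro of the same general shape as SNMP_FREE, and toy_store_slot stands in for netsnmp_ds_strings[1][1]. None of these names exist in net-snmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy one-block heap (hypothetical): every allocation returns the same
 * block, which models the allocator handing a freed address out again. */
static char toy_block[64];
static int  toy_block_live = 0;

static void *toy_alloc(void)
{
    assert(!toy_block_live);                 /* the model has one block only */
    toy_block_live = 1;
    memset(toy_block, 0, sizeof toy_block);  /* zeroed, like calloc */
    return toy_block;
}

static void toy_free(void *p)
{
    if (p == toy_block)
        toy_block_live = 0;
}

/* Free-and-clear macro: it clears only the variable it is handed,
 * not any other copy of the same pointer. */
#define TOY_FREE(p) do { if (p) { toy_free(p); (p) = NULL; } } while (0)

/* Stands in for netsnmp_ds_strings[1][1]. */
static char *toy_store_slot = NULL;

static void toy_store_set_string(const char *value)
{
    if (toy_store_slot != NULL) {
        toy_free(toy_store_slot);            /* frees whatever the slot points at */
        toy_store_slot = NULL;
    }
    toy_store_slot = toy_alloc();
    strcpy(toy_store_slot, value);
}

/* Returns 1 if the model reproduces all three symptoms. */
int demo_stale_slot(void)
{
    /* 1. Config parsing stores the socket string in the slot. */
    toy_store_set_string("localhost:705");

    /* 2. Init code frees it through a local alias; only the alias is cleared. */
    char *agentx_sockets = toy_store_slot;
    TOY_FREE(agentx_sockets);
    int slot_stale = (agentx_sockets == NULL && toy_store_slot != NULL);

    /* 3. The session entry is allocated and gets the same address. */
    void *slp = toy_alloc();
    int aliased = (slp == (void *)toy_store_slot);

    /* 4. The config is parsed again: the store frees the block the session
     *    still uses and writes the string over it. */
    toy_store_set_string("localhost:705");
    int clobbered = (strcmp((const char *)slp, "localhost:705") == 0);

    return slot_stale && aliased && clobbered;
}
```

Calling demo_stale_slot() from a test harness shows that the slot is stale after step 2, that the session pointer aliases it after step 3, and that the session memory is overwritten after step 4.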

 

Here’s the best way to see the problem with gdb:

 

main   

  init_snmp

    read_configs

      read_config_with_type    

        read_config

          run_config_handler

            agentx_parse_agentx_socket

              netsnmp_ds_set_string

                netsnmp_ds_strings[1][1] = strdup() = 0x100ab208 "localhost:705"

 

  init_master_agent

    real_init_master

      SNMP_FREE(agentx_sockets);    <= agentx_sockets = netsnmp_ds_strings[1][1]

      Here's the problem, we free but never reset netsnmp_ds_strings[1][1] to 0

   ...

   (We are still in init_master_agent)

   netsnmp_register_agent_nsap

     snmp_add

       snmp_sess_add_ex

         snmp_sess_copy

           _sess_copy

             slp = calloc() = 0x100ab208

                    <= Now both slp & netsnmp_ds_strings[1][1] have the same ptr

 

    Note that from this point on, netsnmp_ds_get_string(1, 1) will return the

    same pointer as slp, and we are putting junk into the slp

       

Now when the configuration is read again (for example on SIGHUP), the problem gets worse and the agent crashes:

run_config_handler(token="agentxSocket", cptr="localhost:705")

  agentx_parse_agentx_socket(token="agentxSocket", cptr="localhost:705")

    netsnmp_ds_set_string(storeid=1, which=1, cptr="localhost:705")

      if (netsnmp_ds_strings[storeid][which] != NULL) {

        free(netsnmp_ds_strings[storeid][which]);  <= PROBLEM; freeing the slp to compound the problem [X]

        netsnmp_ds_strings[storeid][which] = NULL;

      }

      netsnmp_ds_strings[storeid][which] = strdup(value);

 

The fix is as follows:

            Define a function in default_store.c that sets netsnmp_ds_strings[storeid][which] to NULL, and call it from real_init_master() after SNMP_FREE(agentx_sockets). Freeing the same pointer twice was causing the core dump.
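A minimal sketch of that idea, assuming a toy store rather than the real default_store.c: toy_ds_strings, toy_ds_set_string, toy_ds_get_string and toy_ds_reset_string are hypothetical names made up for illustration, and the array sizes are arbitrary. The reset helper only forgets the pointer; it does not free it, because the caller has already done so.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_IDS    3    /* arbitrary sizes for the toy store */
#define TOY_MAX_SUBIDS 32

static char *toy_ds_strings[TOY_MAX_IDS][TOY_MAX_SUBIDS];

static int toy_in_range(int storeid, int which)
{
    return storeid >= 0 && storeid < TOY_MAX_IDS &&
           which >= 0 && which < TOY_MAX_SUBIDS;
}

/* Portable stand-in for strdup(). */
static char *toy_dup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Replace the stored string; the store owns the copy it makes. */
int toy_ds_set_string(int storeid, int which, const char *value)
{
    if (!toy_in_range(storeid, which))
        return -1;
    free(toy_ds_strings[storeid][which]);    /* free(NULL) is a no-op */
    toy_ds_strings[storeid][which] = value ? toy_dup(value) : NULL;
    return 0;
}

char *toy_ds_get_string(int storeid, int which)
{
    return toy_in_range(storeid, which) ? toy_ds_strings[storeid][which] : NULL;
}

/* Hypothetical helper in the spirit of the fix: set the slot to NULL
 * WITHOUT freeing, for use right after the caller freed the pointer. */
int toy_ds_reset_string(int storeid, int which)
{
    if (!toy_in_range(storeid, which))
        return -1;
    toy_ds_strings[storeid][which] = NULL;
    return 0;
}

/* Returns 1 if the slot is clean after the reset and a reload works. */
int demo_fix(void)
{
    toy_ds_set_string(1, 1, "localhost:705");

    /* Init code frees the string through its own alias ... */
    char *agentx_sockets = toy_ds_get_string(1, 1);
    free(agentx_sockets);
    agentx_sockets = NULL;

    /* ... and then tells the store to forget the stale pointer. */
    toy_ds_reset_string(1, 1);
    if (toy_ds_get_string(1, 1) != NULL)
        return 0;

    /* A later config reload sets the string again; nothing stale is freed. */
    toy_ds_set_string(1, 1, "localhost:705");
    char *s = toy_ds_get_string(1, 1);
    int ok = (s != NULL && strcmp(s, "localhost:705") == 0);

    toy_ds_set_string(1, 1, NULL);           /* clean up */
    return ok;
}
```

An alternative design, not the one described above, would be for the init code to work on its own copy of the string and never free the store's pointer at all; either way the point is that only one owner frees a given allocation.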

 


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Guddenahalli Naganna, Jayaprakasha
Sent: Wednesday, May 31, 2006 1:06 PM
To: [email protected]
Subject: SNMP Daemon crashed on sending SIGHUP signal by deleting trap2sink entry from configuration file.

 

Net-snmp version 5.2.1

 

Platform details.

The SNMP master agent is running on a switch with the “MontaVista 6.0-8.0.7.0300532 2003-12-24” operating system.

 

Steps to reproduce the crash:

 

1. The configuration file should contain the following entries (in this order):

master agentx

agentxSocket localhost:705

agentaddress 161

trap2sink 192.168.33.2 public

2. Start master agent

3. Remove the trap2sink entry from configuration file.

            The modified configuration file looks like this:

master agentx

agentxSocket localhost:705

agentaddress 161

 

4. Send SIGHUP to master agent

            The snmp daemon now crashes. The crash is reproducible only with the above procedure. Here is the gdb trace:

 

 

[EMAIL PROTECTED]:/nh/bin# gdb -d .. master_agent

GNU gdb 6.0 (MontaVista 6.0-8.0.7.0300532 2003-12-24)

Copyright 2003 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as "powerpc-hardhat-linux"...at

(gdb) att 351

Attaching to program: /nh/bin/master_agent, process 351

Reading symbols from /lib/libm.so.6...done.

Loaded symbols for /lib/libm.so.6

Reading symbols from /lib/libresolv.so.2...done.

Loaded symbols for /lib/libresolv.so.2

Reading symbols from /lib/libcrypt.so.1...done.

Loaded symbols for /lib/libcrypt.so.1

Reading symbols from /usr/lib/libelf.so.0...done.

Loaded symbols for /usr/lib/libelf.so.0

Reading symbols from /lib/librt.so.1...done.

Loaded symbols for /lib/librt.so.1

Reading symbols from /lib/libc.so.6...done.

Loaded symbols for /lib/libc.so.6

Reading symbols from /lib/libpthread.so.0...done.

Loaded symbols for /lib/libpthread.so.0

Reading symbols from /lib/ld.so.1...done.

Loaded symbols for /lib/ld.so.1

Reading symbols from /lib/libnss_files.so.2...done.

Loaded symbols for /lib/libnss_files.so.2

0x0fe02fac in select () from /lib/libc.so.6

(gdb) c

Continuing.

 

Program received signal SIGHUP, Hangup.

0x0fe02fac in select () from /lib/libc.so.6

(gdb) c

Continuing.

 

Program received signal SIGSEGV, Segmentation fault.

snmp_sess_select_info (sessp=0x10083490, numfds=0x7ffffc40, fdset=0x7ffffab8,

    timeout=0x7ffffc38, block=0x7ffffc44)

    at gated/src/snmp/libs/snmplib/snmp_api.c:5714

5714            if (slp->transport->sock == -1) {

(gdb)

 

 

           

_______________________________________________
Net-snmp-coders mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
