Hi Josef,

I attach two patches to fix the memory inconsistency if the request is
resend and timed out. 
Could you try them?

- 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch

  This patch was posted by Anders, and I tried to add the description.
  This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback.

- 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch

  This patch fixes the race between NETSNMP_CALLBACK_OP_SEND_FAILED 
  and NETSNMP_CALLBACK_OP_TIMED_OUT callback. If the request is failed,
  then remove the request from the internal session.

Thanks,
Masa

On 4/3/19 9:34 AM, Anders Wallin wrote:
> The introduction of that code fixes another issue;
> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
> Author: Bill Fenner <fen...@gmail.com>
> Date:   Wed Dec 20 21:52:10 2017 +0000
> 
>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3
> agent
> 
>     With the patch in 1214, the snmp_api code assumed that if magic was
>     set, it was the "struct synch-state" from snmp_client.  Of course,
>     magic belongs to the caller, and the perl library uses it differently,
>     so reaching into it is verboten.  Introduce a new callback (that
>     was already introduced in 5.8) to report this "retries exceeded"
>     state, and use it in snmp_client."
> 
> I think the problem is really about shutting down the agentx connection
> when one(1) response is to late. I have
> done 2 patches (one that only write a better log message and one that
> removes the "bad" code.
> With these patches I don't get any crash. I think that 5.7.3 has this issue
> as well, but it can not be crashed with the agentofdead code
> 
> Can you please try this?
> 
> Regards
> Anders Wallin
> 
> 
> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
> 
>> Hi,
>>
>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found, that
>> following callbacks in snmplib/snmp_api.c causes the core dump issue:
>>
>> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
>> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>>          sp->s_errno = errno;
>>          snmp_set_detail(strerror(errno));
>> -        if (rp->callback)
>> +/*        if (rp->callback)
>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>          return -1;
>>      } else {
>>          netsnmp_get_monotonic_clock(&now);
>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>>          tv.tv_sec += tv.tv_usec / 1000000L;
>>          tv.tv_usec %= 1000000L;
>>          rp->expireM = tv;
>> -        if (rp->callback)
>> +/*        if (rp->callback)
>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>      }
>>      return 0;
>>  }
>>
>> Without them, all works as expected.
>>
>> Josef Ridky
>> Software Engineer
>> Core Services Team
>> Red Hat Czech, s.r.o.
>>
>> ----- Original Message -----
>> | From: "Anders Wallin" <walli...@gmail.com>
>> | To: "Josef Ridky" <jri...@redhat.com>
>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
>> | Subject: Re: Core dump with net-snmp-5.8
>> |
>> | Hi Josef,
>> | I can reproduce the issue using the master branch, I will take a look at
>> it
>> | later tonight or tomorrow
>> |
>> | Regards
>> | Anders Wallin
>> |
>> |
>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
>> |
>> | > Hi,
>> | >
>> | > thanks for your patch. Unfortunately, even when I have applied it, it
>> | > still ends with core dump due of 'double free or corruption (fasttop)'
>> | >
>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
>> | >
>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
>> | > snmp_agent: delegate session == 0x56207e165240
>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
>> | > agentx/master: callback resend
>> | > agentx/master: callback resend
>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
>> | > agentx/master: close 0x56207dfd5400, -1
>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
>> | > snmp_agent: canceling next walk for asp 0x56207e165240
>> | > snmp_agent: REMOVE session == 0x56207e165240
>> | > snmp_agent: agent_session 0x56207e165240 released
>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
>> | > snmp_agent: REMOVE session == 0x56207e1041a0
>> | > snmp_agent: agent_session 0x56207e1041a0 released
>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
>> | > snmp_agent: REMOVE session == 0x56207e1656c0
>> | > snmp_agent: agent_session 0x56207e1656c0 released
>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
>> | > snmp_agent: REMOVE session == 0x56207e11af40
>> | > snmp_agent: agent_session 0x56207e11af40 released
>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
>> | > snmp_agent: REMOVE session == 0x56207e118f00
>> | > snmp_agent: agent_session 0x56207e118f00 released
>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
>> | > snmp_agent: REMOVE session == 0x56207e11b540
>> | > snmp_agent: agent_session 0x56207e11b540 released
>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
>> | > snmp_agent: REMOVE session == 0x56207e11bd00
>> | > snmp_agent: agent_session 0x56207e11bd00 released
>> | > agentx/master: Continue removing delegated subsession reqests
>> | > agentx/master: close transport
>> | > snmp_agent: REMOVE session == 0x56207dfd5400
>> | > agentx/master: response too late on session 0x56207dfd5400
>> | > agentx/master: response too late on session 0x56207dfd5400
>> | > double free or corruption (fasttop)
>> | > Aborted (core dumped)
>> | >
>> | >
>> | > What's interesting, when I run it with -DALL it pass (at least for
>> several
>> | > rounds).
>> | > It looks like some strange race condition.
>> | >
>> | > Regards
>> | >
>> | > Josef Ridky
>> | > Software Engineer
>> | > Core Services Team
>> | > Red Hat Czech, s.r.o.
>> | >
>> | > ----- Original Message -----
>> | > | From: "Anders Wallin" <walli...@gmail.com>
>> | > | To: "Josef Ridky" <jri...@redhat.com>
>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
>> | > | Subject: Re: Core dump with net-snmp-5.8
>> | > |
>> | > | Hi Josef,
>> | > |
>> | > | I think it's the same issue as
>> | > https://sourceforge.net/p/net-snmp/bugs/2914/
>> | > | (where I also posted the solution)
>> | > | Regards
>> | > | Anders Wallin
>> | > |
>> | > |
>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com>
>> wrote:
>> | > |
>> | > | > Hi,
>> | > | >
>> | > | > recently, I have hit to an issue in net-snmp-5.8, that is
>> connected to
>> | > the
>> | > | > bug report [1].
>> | > | >
>> | > | > When I tried to run agentofdeath test from [1], snmpd daemon will
>> crash
>> | > | > with malloc(): smallbin double linked list corrupted or double
>> free()
>> | > issue
>> | > | > and dumps core (see bellow).
>> | > | > From log file, I can identified one issue with "Unknown operation".
>> | > | >
>> | > | > This issue is in the agentx_got_response function
>> | > | > (agent/mibgroup/agentx/master.c). There isn't implemented action
>> for
>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
>> | > | > include/net-snmp/library/snmp_api.h).
>> | > | > As result "Unknown operation 6 in agentx_got_response" is shown in
>> log
>> | > | > file.
>> | > | >
>> | > | > /var/log/messages
>> | > | > -------------------------------
>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in
>> | > | > agentx_got_response
>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in
>> | > | > agentx_got_response
>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double
>> | > linked
>> | > | > list corrupted
>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump
>> (PID
>> | > | > 13652/UID 0).
>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process
>> | > exited,
>> | > | > code=dumped, status=6/ABRT
>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with
>> result
>> | > | > 'core-dump'.
>> | > | > -------------------------------
>> | > | >
>> | > | > The "Unknown operation" callback is caused by newly added piece of
>> | > code in
>> | > | > snmplib/snmp_api.c:
>> | > | >
>> | > | >  static int
>> | > | >  snmp_resend_request(struct session_list *slp, netsnmp_request_list
>> | > *rp,
>> | > | >  int incr_retries)
>> | > | >  {
>> | > | >
>> | > | > ...
>> | > | >
>> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
>> | > | >          tv.tv_usec %= 1000000L;
>> | > | >          rp->expireM = tv;
>> | > | > +        if (rp->callback)
>> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> | > | >      }
>> | > | >      return 0;
>> | > | >  }
>> | > | >
>> | > | >
>> | > | > When I tried to remove it, it just stop complaining about
>> operation 6,
>> | > but
>> | > | > the core dump is still present.
>> | > | >
>> | > | > May I ask you for help with this issue? Do you have any idea, what
>> | > causing
>> | > | > this issue in 5.8 and how to fix it?
>> | > | > I know, that Jan Safranek has fixed this for 5.7 by commit [2],
>> but it
>> | > | > looks like something other has changed and this issue is current
>> again.
>> | > | >
>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
>> | > | > [2]
>> | > | >
>> | >
>> https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
>> | > | >
>> | > | > Regards
>> | > | >
>> | > | > Josef Ridky
>> | > | > Software Engineer
>> | > | > Core Services Team
>> | > | > Red Hat Czech, s.r.o.
>> | > | >
>> | > | >
>> | > | >
>> | > | > _______________________________________________
>> | > | > Net-snmp-coders mailing list
>> | > | > Net-snmp-coders@lists.sourceforge.net
>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>> | > | >
>> | > |
>> | >
>> |
>>
> 
> 
> 
> _______________________________________________
> Net-snmp-coders mailing list
> Net-snmp-coders@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
> 
From: Anders Wallin <walli...@gmail.com>
Date: Sun, 7 Apr 2019 18:31:16 -0400
Subject: [PATCH 1/2] agentx/master: Return when NETSNMP_CALLBACK_OP_RESEND is
 set to the callback

snmpd is terminated abnormally due to the double free for the
request cache after the request is resend.

That is because the callback for NETSNMP_CALLBACK_OP_RESEND isn't
cared and the cache is freed wrongly.

Let's just return if NETSNMP_CALLBACK_OP_RESEND is set on the
callback.

Fixes: b7b50bbac ("snmp_send callback updates")

Signed-off-by: Anders Wallin <walli...@gmail.com>
---
 agent/mibgroup/agentx/master.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/agent/mibgroup/agentx/master.c b/agent/mibgroup/agentx/master.c
index 99c4123ed..b7a51dfef 100644
--- a/agent/mibgroup/agentx/master.c
+++ b/agent/mibgroup/agentx/master.c
@@ -280,6 +280,11 @@ agentx_got_response(int operation,
         netsnmp_free_delegated_cache(cache);
         return 0;
 
+    case NETSNMP_CALLBACK_OP_RESEND:
+        DEBUGMSGTL(("agentx/master", "resend on session %8p req=0x%x\n",
+                    session, (unsigned)reqid));
+        return 0;
+
     case NETSNMP_CALLBACK_OP_RECEIVED_MESSAGE:
         /*
          * This session is alive 
-- 
2.20.1

From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
Date: Sun, 7 Apr 2019 19:22:11 -0400
Subject: [PATCH 2/2] snmplib/snmp_api: Remove the request on the session when
 the sending is failed

snmpd is terminated abnormally due to an invalid memory access after
the sending of a request is failed.

The time out callback for the failed request is executed when the
session is closing because the request remains in the internal session.
The cleanup for the request is executed on the
callback(NETSNMP_CALLBACK_OP_SEND_FAILED,) and also on the time out
callback(NETSNMP_CALLBACK_OP_TIMED_OUT,), so the wrong memory access
happens.

Remove the failed request from the internal session after the callback
for the failed request is done.

Signed-off-by: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
Reported-by: Shogo Matsumoto <shogo.matsum...@jp.fujitsu.com>
---
 snmplib/snmp_api.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/snmplib/snmp_api.c b/snmplib/snmp_api.c
index 554767a83..683cec681 100644
--- a/snmplib/snmp_api.c
+++ b/snmplib/snmp_api.c
@@ -6751,6 +6751,18 @@ snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
 }
 
 
+static void
+remove_request(struct snmp_internal_session *isp,
+               netsnmp_request_list *orp, netsnmp_request_list *rp)
+{
+    if (orp)
+        orp->next_request = rp->next_request;
+    else
+        isp->requests = rp->next_request;
+    if (isp->requestsEnd == rp)
+        isp->requestsEnd = orp;
+    snmp_free_pdu(rp->pdu);
+}
 
 void
 snmp_sess_timeout(void *sessp)
@@ -6813,17 +6825,13 @@ snmp_sess_timeout(void *sessp)
                     callback(NETSNMP_CALLBACK_OP_TIMED_OUT, sp,
                              rp->pdu->reqid, rp->pdu, magic);
                 }
-                if (orp)
-                    orp->next_request = rp->next_request;
-                else
-                    isp->requests = rp->next_request;
-                if (isp->requestsEnd == rp)
-                    isp->requestsEnd = orp;
-                snmp_free_pdu(rp->pdu);
+                remove_request(isp, orp, rp);
                 freeme = rp;
                 continue;       /* don't update orp below */
             } else {
                 if (snmp_resend_request(slp, rp, TRUE)) {
+                    remove_request(isp, orp, rp);
+                    freeme = rp;
                     break;
                 }
             }
-- 
2.20.1

_______________________________________________
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders

Reply via email to