Hi Josef, I attach two patches to fix the memory inconsistency if the request is resend and timed out. Could you try them?
- 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch This patch was posted by Anders, and I tried to add the description. This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback. - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch This patch fixes the race between NETSNMP_CALLBACK_OP_SEND_FAILED and NETSNMP_CALLBACK_OP_TIMED_OUT callback. If the request is failed, then remove the request from the internal session. Thanks, Masa On 4/3/19 9:34 AM, Anders Wallin wrote: > The introduction of that code fixes another issue; > "commit 56c30b11f3616ea4f0c38a21e08e78f050096020 > Author: Bill Fenner <fen...@gmail.com> > Date: Wed Dec 20 21:52:10 2017 +0000 > > NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 > agent > > With the patch in 1214, the snmp_api code assumed that if magic was > set, it was the "struct synch-state" from snmp_client. Of course, > magic belongs to the caller, and the perl library uses it differently, > so reaching into it is verboten. Introduce a new callback (that > was already introduced in 5.8) to report this "retries exceeded" > state, and use it in snmp_client." > > I think the problem is really about shutting down the agentx connection > when one(1) response is to late. I have > done 2 patches (one that only write a better log message and one that > removes the "bad" code. > With these patches I don't get any crash. I think that 5.7.3 has this issue > as well, but it can not be crashed with the agentofdead code > > Can you please try this? > > Regards > Anders Wallin > > > On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote: > >> Hi, >> >> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found, that >> following callbacks in snmplib/snmp_api.c causes the core dump issue: >> >> --- old/snmplib/snmp_api.c 2019-04-03 12:13:55.126769866 +0200 >> +++ new/snmplib/snmp_api.c 2019-04-03 12:15:18.353420790 +0200 >> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list >> sp->s_snmp_errno = SNMPERR_BAD_SENDTO; >> sp->s_errno = errno; >> snmp_set_detail(strerror(errno)); >> - if (rp->callback) >> +/* if (rp->callback) >> rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp, >> - rp->pdu->reqid, rp->pdu, rp->cb_data); >> + rp->pdu->reqid, rp->pdu, rp->cb_data);*/ >> return -1; >> } else { >> netsnmp_get_monotonic_clock(&now); >> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list >> tv.tv_sec += tv.tv_usec / 1000000L; >> tv.tv_usec %= 1000000L; >> rp->expireM = tv; >> - if (rp->callback) >> +/* if (rp->callback) >> rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp, >> - rp->pdu->reqid, rp->pdu, rp->cb_data); >> + rp->pdu->reqid, rp->pdu, rp->cb_data);*/ >> } >> return 0; >> } >> >> Without them, all works as expected. >> >> Josef Ridky >> Software Engineer >> Core Services Team >> Red Hat Czech, s.r.o. >> >> ----- Original Message ----- >> | From: "Anders Wallin" <walli...@gmail.com> >> | To: "Josef Ridky" <jri...@redhat.com> >> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net> >> | Sent: Tuesday, April 2, 2019 6:27:54 PM >> | Subject: Re: Core dump with net-snmp-5.8 >> | >> | Hi Josef, >> | I can reproduce the issue using the master branch, I will take a look at >> it >> | later tonight or tomorrow >> | >> | Regards >> | Anders Wallin >> | >> | >> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote: >> | >> | > Hi, >> | > >> | > thanks for your patch. Unfortunately, even when I have applied it, it >> | > still ends with core dump due of 'double free or corruption (fasttop)' >> | > >> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with: >> | > >> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5) >> | > snmp_agent: delegate session == 0x56207e165240 >> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240 >> | > agentx/master: callback resend >> | > agentx/master: callback resend >> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9 >> | > agentx/master: close 0x56207dfd5400, -1 >> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490 >> | > snmp_agent: processing delegated request, asp = 0x56207e165240 >> | > snmp_agent: canceling next walk for asp 0x56207e165240 >> | > snmp_agent: REMOVE session == 0x56207e165240 >> | > snmp_agent: agent_session 0x56207e165240 released >> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0 >> | > snmp_agent: canceling next walk for asp 0x56207e1041a0 >> | > snmp_agent: REMOVE session == 0x56207e1041a0 >> | > snmp_agent: agent_session 0x56207e1041a0 released >> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0 >> | > snmp_agent: canceling next walk for asp 0x56207e1656c0 >> | > snmp_agent: REMOVE session == 0x56207e1656c0 >> | > snmp_agent: agent_session 0x56207e1656c0 released >> | > snmp_agent: processing delegated request, asp = 0x56207e11af40 >> | > snmp_agent: canceling next walk for asp 0x56207e11af40 >> | > snmp_agent: REMOVE session == 0x56207e11af40 >> | > snmp_agent: agent_session 0x56207e11af40 released >> | > snmp_agent: processing delegated request, asp = 0x56207e118f00 >> | > snmp_agent: canceling next walk for asp 0x56207e118f00 >> | > snmp_agent: REMOVE session == 0x56207e118f00 >> | > snmp_agent: agent_session 0x56207e118f00 released >> | > snmp_agent: processing delegated request, asp = 0x56207e11b540 >> | > snmp_agent: canceling next walk for asp 0x56207e11b540 >> | > snmp_agent: REMOVE session == 0x56207e11b540 >> | > snmp_agent: agent_session 0x56207e11b540 released >> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00 >> | > snmp_agent: canceling next walk for asp 0x56207e11bd00 >> | > snmp_agent: REMOVE session == 0x56207e11bd00 >> | > snmp_agent: agent_session 0x56207e11bd00 released >> | > agentx/master: Continue removing delegated subsession reqests >> | > agentx/master: close transport >> | > snmp_agent: REMOVE session == 0x56207dfd5400 >> | > agentx/master: response too late on session 0x56207dfd5400 >> | > agentx/master: response too late on session 0x56207dfd5400 >> | > double free or corruption (fasttop) >> | > Aborted (core dumped) >> | > >> | > >> | > What's interesting, when I run it with -DALL it pass (at least for >> several >> | > rounds). >> | > It looks like some strange race condition. >> | > >> | > Regards >> | > >> | > Josef Ridky >> | > Software Engineer >> | > Core Services Team >> | > Red Hat Czech, s.r.o. >> | > >> | > ----- Original Message ----- >> | > | From: "Anders Wallin" <walli...@gmail.com> >> | > | To: "Josef Ridky" <jri...@redhat.com> >> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net> >> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM >> | > | Subject: Re: Core dump with net-snmp-5.8 >> | > | >> | > | Hi Josef, >> | > | >> | > | I think it's the same issue as >> | > https://sourceforge.net/p/net-snmp/bugs/2914/ >> | > | (where I also posted the solution) >> | > | Regards >> | > | Anders Wallin >> | > | >> | > | >> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> >> wrote: >> | > | >> | > | > Hi, >> | > | > >> | > | > recently, I have hit to an issue in net-snmp-5.8, that is >> connected to >> | > the >> | > | > bug report [1]. >> | > | > >> | > | > When I tried to run agentofdeath test from [1], snmpd daemon will >> crash >> | > | > with malloc(): smallbin double linked list corrupted or double >> free() >> | > issue >> | > | > and dumps core (see bellow). >> | > | > From log file, I can identified one issue with "Unknown operation". >> | > | > >> | > | > This issue is in the agentx_got_response function >> | > | > (agent/mibgroup/agentx/master.c). There isn't implemented action >> for >> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in >> | > | > include/net-snmp/library/snmp_api.h). >> | > | > As result "Unknown operation 6 in agentx_got_response" is shown in >> log >> | > | > file. >> | > | > >> | > | > /var/log/messages >> | > | > ------------------------------- >> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in >> | > | > agentx_got_response >> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in >> | > | > agentx_got_response >> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double >> | > linked >> | > | > list corrupted >> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump >> (PID >> | > | > 13652/UID 0). >> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process >> | > exited, >> | > | > code=dumped, status=6/ABRT >> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with >> result >> | > | > 'core-dump'. >> | > | > ------------------------------- >> | > | > >> | > | > The "Unknown operation" callback is caused by newly added piece of >> | > code in >> | > | > snmplib/snmp_api.c: >> | > | > >> | > | > static int >> | > | > snmp_resend_request(struct session_list *slp, netsnmp_request_list >> | > *rp, >> | > | > int incr_retries) >> | > | > { >> | > | > >> | > | > ... >> | > | > >> | > | > tv.tv_sec += tv.tv_usec / 1000000L; >> | > | > tv.tv_usec %= 1000000L; >> | > | > rp->expireM = tv; >> | > | > + if (rp->callback) >> | > | > + rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp, >> | > | > + rp->pdu->reqid, rp->pdu, rp->cb_data); >> | > | > } >> | > | > return 0; >> | > | > } >> | > | > >> | > | > >> | > | > When I tried to remove it, it just stop complaining about >> operation 6, >> | > but >> | > | > the core dump is still present. >> | > | > >> | > | > May I ask you for help with this issue? Do you have any idea, what >> | > causing >> | > | > this issue in 5.8 and how to fix it? >> | > | > I know, that Jan Safranek has fixed this for 5.7 by commit [2], >> but it >> | > | > looks like something other has changed and this issue is current >> again. >> | > | > >> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/ >> | > | > [2] >> | > | > >> | > >> https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df >> | > | > >> | > | > Regards >> | > | > >> | > | > Josef Ridky >> | > | > Software Engineer >> | > | > Core Services Team >> | > | > Red Hat Czech, s.r.o. >> | > | > >> | > | > >> | > | > >> | > | > _______________________________________________ >> | > | > Net-snmp-coders mailing list >> | > | > Net-snmp-coders@lists.sourceforge.net >> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders >> | > | > >> | > | >> | > >> | >> > > > > _______________________________________________ > Net-snmp-coders mailing list > Net-snmp-coders@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders >
From: Anders Wallin <walli...@gmail.com> Date: Sun, 7 Apr 2019 18:31:16 -0400 Subject: [PATCH 1/2] agentx/master: Return when NETSNMP_CALLBACK_OP_RESEND is set to the callback snmpd is terminated abnormally due to the double free for the request cache after the request is resend. That is because the callback for NETSNMP_CALLBACK_OP_RESEND isn't cared and the cache is freed wrongly. Let's just return if NETSNMP_CALLBACK_OP_RESEND is set on the callback. Fixes: b7b50bbac ("snmp_send callback updates") Signed-off-by: Anders Wallin <walli...@gmail.com> --- agent/mibgroup/agentx/master.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/agent/mibgroup/agentx/master.c b/agent/mibgroup/agentx/master.c index 99c4123ed..b7a51dfef 100644 --- a/agent/mibgroup/agentx/master.c +++ b/agent/mibgroup/agentx/master.c @@ -280,6 +280,11 @@ agentx_got_response(int operation, netsnmp_free_delegated_cache(cache); return 0; + case NETSNMP_CALLBACK_OP_RESEND: + DEBUGMSGTL(("agentx/master", "resend on session %8p req=0x%x\n", + session, (unsigned)reqid)); + return 0; + case NETSNMP_CALLBACK_OP_RECEIVED_MESSAGE: /* * This session is alive -- 2.20.1
From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com> Date: Sun, 7 Apr 2019 19:22:11 -0400 Subject: [PATCH 2/2] snmplib/snmp_api: Remove the request on the session when the sending is failed snmpd is terminated abnormally due to an invalid memory access after the sending of a request is failed. The time out callback for the failed request is executed when the session is closing because the request remains in the internal session. The cleanup for the request is executed on the callback(NETSNMP_CALLBACK_OP_SEND_FAILED,) and also on the time out callback(NETSNMP_CALLBACK_OP_TIMED_OUT,), so the wrong memory access happens. Remove the failed request from the internal session after the callback for the failed request is done. Signed-off-by: Masayoshi Mizuma <m.miz...@jp.fujitsu.com> Reported-by: Shogo Matsumoto <shogo.matsum...@jp.fujitsu.com> --- snmplib/snmp_api.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/snmplib/snmp_api.c b/snmplib/snmp_api.c index 554767a83..683cec681 100644 --- a/snmplib/snmp_api.c +++ b/snmplib/snmp_api.c @@ -6751,6 +6751,18 @@ snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp, } +static void +remove_request(struct snmp_internal_session *isp, + netsnmp_request_list *orp, netsnmp_request_list *rp) +{ + if (orp) + orp->next_request = rp->next_request; + else + isp->requests = rp->next_request; + if (isp->requestsEnd == rp) + isp->requestsEnd = orp; + snmp_free_pdu(rp->pdu); +} void snmp_sess_timeout(void *sessp) @@ -6813,17 +6825,13 @@ snmp_sess_timeout(void *sessp) callback(NETSNMP_CALLBACK_OP_TIMED_OUT, sp, rp->pdu->reqid, rp->pdu, magic); } - if (orp) - orp->next_request = rp->next_request; - else - isp->requests = rp->next_request; - if (isp->requestsEnd == rp) - isp->requestsEnd = orp; - snmp_free_pdu(rp->pdu); + remove_request(isp, orp, rp); freeme = rp; continue; /* don't update orp below */ } else { if (snmp_resend_request(slp, rp, TRUE)) { + remove_request(isp, orp, rp); + freeme = rp; break; } } -- 2.20.1
_______________________________________________ Net-snmp-coders mailing list Net-snmp-coders@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/net-snmp-coders