Hi Dave,

Can you please let us know if there is a bug raised for the race
condition in SNMPD while receiving/processing traps and incoming
requests.?

This problem was seen 3 times till now. We could capture the strace when
the problem happened for the 3rd time, which I sent to the list in one
of my previous mails.

As you suggested, we tried to reproduce by doing the following changes. 

A) Increased agentxPingInterval to 10 seconds
B) Enabled the debug tokens sess_process_packet, sess_read,
handler:calling and handler:returned.
C) Attached strace to SNMPD.

But unfortunately, we encountered another issue. Our Subagent has sent
huge number of traps generated due to changes in the system. From strace
output of SNMPD, we could see that "recv" call returned 65536 bytes,
which are all traps. Both subagent and SNMPD were busy processing
snmpwalk requests before the traps were generated and received. SNMPD
processed all these AGENTX traps and sent them as SNMP traps. But there
was a partial packet of size 624 left in this and logs indicate "pkt not
complete (need 840 got 624)". There was another recv followed, which was
also of 65536. The left out 624 bytes were added to this and loop
packet_len was logged as 66160. But instead of processing the earlier
expected 840 bytes, SNMPD processed 500 bytes and sent them as a SNMP
trap. Later, it tried to read and process 840 bytes first and then 70
bytes, but logs indicate that parse failed for both. After sometime,
when ping request of 20 bytes was received from subagent, this was added
to loop packet len, again SNMPD processed 500 bytes, sent them as a SNMP
trap and tried to read/process 840 bytes followed by 70 bytes, but parse
failed for both. Following strace output indicate the same.

------------------------------------------------------------------------
---------------------------
7632  05:22:20.843585 write(3, "  loop packet_len 624, PDU lengt"...,
38) = 38 <0.000011>
7632  05:22:20.843694 write(3, "sess_read: ", 11) = 11 <0.000011>
7632  05:22:20.843796 write(3, "pkt not complete (need 840 got 6"...,
43) = 43 <0.000011>

7632  05:22:20.854204 brk(0x1010c000)   = 0x1010c000 <0.000022>
7632  05:22:20.856386 recv(11,
"\1\0\0\0\210H\0\0\1\0\0\0\2\0\0\0\2\0\0\0\1\0\0\0\2\0\0"..., 65536, 0)
= 65536 <0.000337>
7632  05:22:20.857082 write(3, "sess_read: ", 11) = 11 <0.000018>
7632  05:22:20.857243 write(3, "  loop packet_len 66160, PDU len"...,
40) = 40 <0.000075>
7632  05:22:20.857435 write(3, "sess_process_packet: ", 21) = 21
<0.000012>
7632  05:22:20.857552 write(3, "session 0x1008a4c0 fd 11 pkt 0x1"...,
51) = 51 <0.000012>
7632  05:22:20.857758 write(3, "\nReceived 500 bytes from TCP: [1"...,
48) = 48 <0.000013>

7632  05:22:20.872068 sendto(9,
"0\201\323\2\1\1\4\6public\247\201\305\2\4d\r?7\2\1\0\2"..., 214, 0,
{sa_family=AF_INET, sin_port=htons(162),
sin_addr=inet_addr("172.6.1.1")}, 16) = 214 <0.000054>

7632  05:22:20.878832 send(11,
"[EMAIL PROTECTED]"..., 508, 0) =
508 <0.000023>
7632  05:22:20.879100 write(3, "sess_read: ", 11) = 11 <0.000027>
7632  05:22:20.880297 write(3, "  loop packet_len 65660, PDU len"...,
40) = 40 <0.000013>
7632  05:22:20.880458 write(3, "sess_process_packet: ", 21) = 21
<0.000012>
7632  05:22:20.880571 write(3, "session 0x1008a4c0 fd 11 pkt 0x1"...,
51) = 51 <0.000011>
7632  05:22:20.880690 write(3, "\nReceived 840 bytes from TCP: [1"...,
48) = 48 <0.000012>

7632  05:22:20.891358 write(3, "sess_process_packet: ", 21) = 21
<0.000025>
7632  05:22:20.891610 write(3, "parse fail\n", 11) = 11 <0.000012>
7632  05:22:20.891764 write(3, "sess_read: ", 11) = 11 <0.000011>
7632  05:22:20.891872 write(3, "  loop packet_len 64820, PDU len"...,
39) = 39 <0.000011>
7632  05:22:20.892002 write(3, "sess_process_packet: ", 21) = 21
<0.000011>
7632  05:22:20.892111 write(3, "session 0x1008a4c0 fd 11 pkt 0x1"...,
50) = 50 <0.000011>
7632  05:22:20.892232 write(3, "\nReceived 70 bytes from TCP: [12"...,
47) = 47 <0.000012>

7632  05:22:20.893169 write(3, "sess_process_packet: ", 21) = 21
<0.000011>
7632  05:22:20.893277 write(3, "parse fail\n", 11) = 11 <0.000011>
7632  05:22:20.893393 write(3, "sess_read: ", 11) = 11 <0.000014>
7632  05:22:20.893548 write(3, "  loop packet_len 64750, PDU len"...,
46) = 46 <0.000012>
7632  05:22:20.893663 write(3, "sess_read: ", 11) = 11 <0.000012>
7632  05:22:20.893764 write(3, "pkt not complete (need 101056532"...,
51) = 51 <0.000011>
------------------------------------------------------------------------
--------------------------------------------

SNMPD received transport disconnect indication soon after (recv returned
0 bytes), SNMPD was sending responses for outstanding requests and it
crashed due to Segmentation fault. 

Can you please help us if this kind of problem is seen earlier when
subagent sends 65536 bytes at once which has a last partial packet.? In
which cases the parse fails.? 
It will be really helpful if you can provide any ideas on this. 


Thanks & Regards,
Suresh.





-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Dave Shield
Sent: Tuesday, April 24, 2007 6:43 PM
To: Pamidipati Suresh-G20238
Cc: net-snmp-users@lists.sourceforge.net
Subject: Re: SNMPD not responding sometimes even though it is up and
running


On 24/04/07, Pamidipati Suresh-G20238 <[EMAIL PROTECTED]>
wrote:
> Subagent was busy responding to snmpwalk (which comes as getnext
> requests) till timestamp of Apr 5 18.36.56.098. SNMPD stopped 
> responding from timestamp Apr 5 18:36:57 for all the mib objects 
> (owned by itself and subagent).

The subagent isn't generating traps, then?
Most of the AgentX hangup problems we've had reported seem to have been
related to heavy trap generation - presumably triggering a race
condition with traps vs incoming requests.

>           However, SNMPD was responding to subagent ping requests. The

> ping interval was configured as 1 second.

I'm wondering whether this might not be effectively the same problem in
a new guise - with ping requests taking the place of traps.

Was this a one-off problem, or can you reproduce it fairly reliably? If
it's reproducible, then try reducing the frequency of your ping
requests (or turn them off altogether).   Does that make any difference?

Dave

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Net-snmp-users mailing list
Net-snmp-users@lists.sourceforge.net
Please see the following page to unsubscribe or change other options:
https://lists.sourceforge.net/lists/listinfo/net-snmp-users

Reply via email to