Hi Dave, Can you please let us know if there is a bug raised for the race condition in SNMPD while receiving/processing traps and incoming requests.?
This problem was seen 3 times till now. We could capture the strace when the problem happened for the 3rd time, which I sent to the list in one of my previous mails. As you suggested, we tried to reproduce by doing the following changes. A) Increased agentxPingInterval to 10 seconds B) Enabled the debug tokens sess_process_packet, sess_read, handler:calling and handler:returned. C) Attached strace to SNMPD. But unfortunately, we encountered another issue. Our Subagent has sent huge number of traps generated due to changes in the system. From strace output of SNMPD, we could see that "recv" call returned 65536 bytes, which are all traps. Both subagent and SNMPD were busy processing snmpwalk requests before the traps were generated and received. SNMPD processed all these AGENTX traps and sent them as SNMP traps. But there was a partial packet of size 624 left in this and logs indicate "pkt not complete (need 840 got 624)". There was another recv followed, which was also of 65536. The left out 624 bytes were added to this and loop packet_len was logged as 66160. But instead of processing the earlier expected 840 bytes, SNMPD processed 500 bytes and sent them as a SNMP trap. Later, it tried to read and process 840 bytes first and then 70 bytes, but logs indicate that parse failed for both. After sometime, when ping request of 20 bytes was received from subagent, this was added to loop packet len, again SNMPD processed 500 bytes, sent them as a SNMP trap and tried to read/process 840 bytes followed by 70 bytes, but parse failed for both. Following strace output indicate the same. ------------------------------------------------------------------------ --------------------------- 7632 05:22:20.843585 write(3, " loop packet_len 624, PDU lengt"..., 38) = 38 <0.000011> 7632 05:22:20.843694 write(3, "sess_read: ", 11) = 11 <0.000011> 7632 05:22:20.843796 write(3, "pkt not complete (need 840 got 6"..., 43) = 43 <0.000011> 7632 05:22:20.854204 brk(0x1010c000) = 0x1010c000 <0.000022> 7632 05:22:20.856386 recv(11, "\1\0\0\0\210H\0\0\1\0\0\0\2\0\0\0\2\0\0\0\1\0\0\0\2\0\0"..., 65536, 0) = 65536 <0.000337> 7632 05:22:20.857082 write(3, "sess_read: ", 11) = 11 <0.000018> 7632 05:22:20.857243 write(3, " loop packet_len 66160, PDU len"..., 40) = 40 <0.000075> 7632 05:22:20.857435 write(3, "sess_process_packet: ", 21) = 21 <0.000012> 7632 05:22:20.857552 write(3, "session 0x1008a4c0 fd 11 pkt 0x1"..., 51) = 51 <0.000012> 7632 05:22:20.857758 write(3, "\nReceived 500 bytes from TCP: [1"..., 48) = 48 <0.000013> 7632 05:22:20.872068 sendto(9, "0\201\323\2\1\1\4\6public\247\201\305\2\4d\r?7\2\1\0\2"..., 214, 0, {sa_family=AF_INET, sin_port=htons(162), sin_addr=inet_addr("172.6.1.1")}, 16) = 214 <0.000054> 7632 05:22:20.878832 send(11, "[EMAIL PROTECTED]"..., 508, 0) = 508 <0.000023> 7632 05:22:20.879100 write(3, "sess_read: ", 11) = 11 <0.000027> 7632 05:22:20.880297 write(3, " loop packet_len 65660, PDU len"..., 40) = 40 <0.000013> 7632 05:22:20.880458 write(3, "sess_process_packet: ", 21) = 21 <0.000012> 7632 05:22:20.880571 write(3, "session 0x1008a4c0 fd 11 pkt 0x1"..., 51) = 51 <0.000011> 7632 05:22:20.880690 write(3, "\nReceived 840 bytes from TCP: [1"..., 48) = 48 <0.000012> 7632 05:22:20.891358 write(3, "sess_process_packet: ", 21) = 21 <0.000025> 7632 05:22:20.891610 write(3, "parse fail\n", 11) = 11 <0.000012> 7632 05:22:20.891764 write(3, "sess_read: ", 11) = 11 <0.000011> 7632 05:22:20.891872 write(3, " loop packet_len 64820, PDU len"..., 39) = 39 <0.000011> 7632 05:22:20.892002 write(3, "sess_process_packet: ", 21) = 21 <0.000011> 7632 05:22:20.892111 write(3, "session 0x1008a4c0 fd 11 pkt 0x1"..., 50) = 50 <0.000011> 7632 05:22:20.892232 write(3, "\nReceived 70 bytes from TCP: [12"..., 47) = 47 <0.000012> 7632 05:22:20.893169 write(3, "sess_process_packet: ", 21) = 21 <0.000011> 7632 05:22:20.893277 write(3, "parse fail\n", 11) = 11 <0.000011> 7632 05:22:20.893393 write(3, "sess_read: ", 11) = 11 <0.000014> 7632 05:22:20.893548 write(3, " loop packet_len 64750, PDU len"..., 46) = 46 <0.000012> 7632 05:22:20.893663 write(3, "sess_read: ", 11) = 11 <0.000012> 7632 05:22:20.893764 write(3, "pkt not complete (need 101056532"..., 51) = 51 <0.000011> ------------------------------------------------------------------------ -------------------------------------------- SNMPD received transport disconnect indication soon after (recv returned 0 bytes), SNMPD was sending responses for outstanding requests and it crashed due to Segmentation fault. Can you please help us if this kind of problem is seen earlier when subagent sends 65536 bytes at once which has a last partial packet.? In which cases the parse fails.? It will be really helpful if you can provide any ideas on this. Thanks & Regards, Suresh. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Shield Sent: Tuesday, April 24, 2007 6:43 PM To: Pamidipati Suresh-G20238 Cc: net-snmp-users@lists.sourceforge.net Subject: Re: SNMPD not responding sometimes even though it is up and running On 24/04/07, Pamidipati Suresh-G20238 <[EMAIL PROTECTED]> wrote: > Subagent was busy responding to snmpwalk (which comes as getnext > requests) till timestamp of Apr 5 18.36.56.098. SNMPD stopped > responding from timestamp Apr 5 18:36:57 for all the mib objects > (owned by itself and subagent). The subagent isn't generating traps, then? Most of the AgentX hangup problems we've had reported seem to have been related to heavy trap generation - presumably triggering a race condition with traps vs incoming requests. > However, SNMPD was responding to subagent ping requests. The > ping interval was configured as 1 second. I'm wondering whether this might not be effectively the same problem in a new guise - with ping requests taking the place of traps. Was this a one-off problem, or can you reproduce it fairly reliably? If it's reproducible, then try reducing the frequency of your ping requests (or turn them off altogether). Does that make any difference? Dave ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ Net-snmp-users mailing list Net-snmp-users@lists.sourceforge.net Please see the following page to unsubscribe or change other options: https://lists.sourceforge.net/lists/listinfo/net-snmp-users