Hi Oliver,

Rather than a big reassembly list, I suspect that you are hitting this bug:

6637163 ip_rput_fragment[_v6]() spuriously prunes valid frags due to
        unbounded inaccuracy of ill_frag_count

The fix is currently in T-patch form, 137111-05 (AFAIK, as I can't look it
up right now).

HTH

-George

*>Date: Sun, 03 Aug 2008 16:59:05 +0800
*>From: "Oliver.Yang" <[EMAIL PROTECTED]>
*>Subject: [networking-discuss] A question about IP reassembly
*>To: [email protected]
*>
*>Hi All,
*>
*>What could cause a big reassembly list on a specific ILL?
*>
*>I ran into a NIC driver UDP performance issue recently. After
*>debugging, I found it was caused by packets being dropped in the upper layer.
*>
*>The netstat output showed that ipReasmFails was extremely high while this
*>issue was happening:
*>
*># netstat -I igb1 -s -P ip
*>Name  Mtu  Net/Dest  Address    Ipkts    Ierrs  Opkts  Oerrs  Collis  Queue
*>igb1  1500 11.0.3.0  11.0.3.11  7716786  0      26538  0      0       0
*>
*>IPv4    ipForwarding        =     2     ipDefaultTTL        =   255
*>        ipInReceives        =7718245    ipInHdrErrors       =     0
*>        ipInAddrErrors      =     0     ipInCksumErrs       =     0
*>        ipForwDatagrams     =     0     ipForwProhibits     =     0
*>        ipInUnknownProtos   =    70     ipInDiscards        =     0
*>        ipInDelivers        = 48737     ipOutRequests       = 32338
*>        ipOutDiscards       =     1     ipOutNoRoutes       =     0
*>        ipReasmTimeout      =    60     ipReasmReqds        =690804
*>        ipReasmOKs          =    83     ipReasmFails        =695798
*>        ipReasmDuplicates   =     0     ipReasmPartDups     =     0
*>        ipFragOKs           =     0     ipFragFails         =     0
*>        ipFragCreates       =     0     ipRoutingDiscards   =     0
*>        tcpInErrs           =     0     udpNoPorts          =   118
*>        udpInCksumErrs      =     0     udpInOverflows      =     0
*>        rawipInOverflows    =     0     ipsecInSucceeded    =     0
*>        ipsecInFailed       =     0     ipInIPv6            =     0
*>        ipOutIPv6           =     0     ipOutSwitchIPv6     =   106
*>
*>
*>My dtrace command showed that a massive number of packets were dropped by
*>ill_frag_prune in the IP layer:
*>
*>bash-3.2# dtrace -n mib:::ipIfStatsReasmFails'[EMAIL PROTECTED]()]=count();}'
*>dtrace: description 'mib:::ipIfStatsReasmFails' matched 3 probes
*>^C
*>
*>              ip`ill_frag_free_pkts+0x84
*>              ip`ill_frag_prune+0x1d0
*>              ip`ip_rput_fragment+0x39c
*>              ip`ip_udp_input+0x80c
*>              ip`ip_input+0xb94
*>              dls`soft_ring_drain+0x68
*>              dls`soft_ring_worker+0x5c
*>              unix`thread_start+0x4
*>                1
*>
*>              ip`ip_mib2_add_ip_stats+0x1ac
*>              ip`ip_snmp_get_mib2_ip_traffic_stats+0x130
*>              ip`ip_snmp_get+0xd8
*>              ip`snmpcom_req+0x280
*>              ip`ip_wput_nondata+0x674
*>              unix`putnext+0x208
*>              genunix`strput+0x1a0
*>              genunix`strputmsg+0x26c
*>              genunix`msgio32+0x398
*>              genunix`putmsg32+0x98
*>              unix`syscall_trap32+0xcc
*>                6
*>
*>              ip`ip_mib2_add_ip_stats+0x1ac
*>              ip`ip_snmp_get_mib2_ip_traffic_stats+0x130
*>              ip`ip_snmp_get+0xd8
*>              ip`snmpcom_req+0x280
*>              ip`ip_wput_nondata+0x674
*>              unix`putnext+0x208
*>              unix`putnext+0x208
*>              unix`putnext+0x208
*>              unix`putnext+0x208
*>              unix`putnext+0x208
*>              genunix`strput+0x1a0
*>              genunix`strputmsg+0x26c
*>              genunix`msgio32+0x398
*>              genunix`putmsg32+0x98
*>              unix`syscall_trap32+0xcc
*>               66
*>
*>              ip`ill_frag_free_pkts+0x84
*>              ip`ill_frag_prune+0x100
*>              ip`ip_rput_fragment+0x39c
*>              ip`ip_udp_input+0x80c
*>              ip`ip_input+0xb94
*>              dls`soft_ring_drain+0x68
*>              dls`soft_ring_worker+0x5c
*>              unix`thread_start+0x4
*>           232825
*>
*>
*>It seems to be caused by a big reassembly list:
*>
*>        /* If the reassembly list for this ILL will get too big, prune it */
*>        if ((msg_len + sizeof (*ipf) + ill->ill_frag_count) >=
*>            ipst->ips_ip_reass_queue_bytes) {
*>                ill_frag_prune(ill,
*>                    (ipst->ips_ip_reass_queue_bytes < msg_len) ? 0 :
*>                    (ipst->ips_ip_reass_queue_bytes - msg_len));
*>                pruned = B_TRUE;
*>        }
*>
*>But I want to know the main reasons that could cause a big
*>reassembly list on a specific ILL.
*>
*>Could you shed some light on this issue? Thanks!
*>
*>--
*>Cheers,
*>
*>------------------------------------------------------------
*>Oliver Yang | http://blog.csdn.net/yayong
*>
*>_______________________________________________
*>networking-discuss mailing list
*>[email protected]

George Shepherd
http://clem.uk/~georges/
==============================================================================
Solaris Revenue Product Engineering:    | SUN Microsystems
Core team -Internet                     | Guillemont Park
Email: [EMAIL PROTECTED]                | Camberley GU17 9QG
Disclaimer: Less is more, more or less  | United Kingdom
==============================================================================
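
For anyone reading this thread later: the prune check quoted above can be modelled in user space to see why an inaccurate ill_frag_count alone is enough to trigger pruning, even when few bytes are really queued. This is only a rough sketch, not the kernel code. IPF_OVERHEAD stands in for sizeof (*ipf) and its value is made up, would_prune() and prune_target() are hypothetical helper names, and the queue limit passed in is whatever ips_ip_reass_queue_bytes happens to be on your system.

```c
#include <stddef.h>

/* Assumed stand-in for sizeof (*ipf); the real size is not given in the thread. */
#define IPF_OVERHEAD 64u

/*
 * Mirrors the quoted condition: prune when the incoming fragment plus the
 * bookkeeping header plus the current per-ILL byte count reaches the limit.
 * Returns 1 if the reassembly list would be pruned, 0 otherwise.
 */
static int
would_prune(size_t msg_len, size_t frag_count, size_t queue_bytes)
{
	return ((msg_len + IPF_OVERHEAD + frag_count) >= queue_bytes);
}

/*
 * Mirrors the second argument passed to ill_frag_prune() in the quoted
 * code: the byte count the list is pruned down to.
 */
static size_t
prune_target(size_t msg_len, size_t queue_bytes)
{
	return ((queue_bytes < msg_len) ? 0 : (queue_bytes - msg_len));
}
```

With an assumed limit of 1000000 bytes, would_prune(1480, 10000, 1000000) is false, but if the counter has drifted upward by 990000 bytes the same 1480-byte fragment makes would_prune() true. The check only sees the counter, so it cannot tell a genuinely large list from a counter that has become inaccurate.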
