Gael,

There are multiple potential reasons for ipReasmFails incrementing. Only one of them is the bug fixed in 137111-05, and not all of the others are bugs.
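ipReasmFails counts failures of the IP reassembly algorithm. A UDP datagram cannot be reassembled if any one of its fragments goes missing or arrives after the reassembly timeout, so the number of fragments per datagram matters for several of the causes listed next and for the jumbo-frame cure. The Python sketch below works through that arithmetic. It assumes IPv4 with 20-byte headers (no options), an 8 KB UDP payload (for example one 8 KB database block; the size is an assumption, not taken from this thread), and independent fragment losses at a hypothetical 0.1% rate. The helper names are illustrative only.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def fragments_needed(udp_payload: int, mtu: int) -> int:
    """IPv4 fragments needed to carry one UDP datagram over a link with this MTU."""
    ip_payload = udp_payload + UDP_HEADER
    if ip_payload <= mtu - IPV4_HEADER:
        return 1  # fits in a single packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def datagram_loss(fragment_loss: float, fragments: int) -> float:
    """Chance a datagram cannot be reassembled, assuming independent fragment losses."""
    return 1 - (1 - fragment_loss) ** fragments


# Hypothetical 8 KB payload and 0.1% per-fragment loss rate (assumptions).
for mtu in (1500, 9000):
    n = fragments_needed(8192, mtu)
    print(f"MTU {mtu}: {n} fragment(s), "
          f"{datagram_loss(0.001, n):.3%} datagram loss at 0.1% fragment loss")
```

Under these assumptions a 1500-byte MTU splits each datagram into six fragments, while a 9000-byte MTU carries it in a single packet, so there is nothing left to reassemble. The loss figure is only a rough model: losses on a faulty interconnect are rarely independent.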
If you read CR 6637163, you will see that the bug is related to ill_frag_prune() being called erroneously. If you use the DTrace script supplied in the comments of CR 6637163, together with the method I supplied with it for checking whether ill_frag_prune() is being called on a negative signed integer, you will be able to determine whether CR 6637163 is involved at all.

Other causes of this type of issue are:
1/ problems with the cluster interconnect hardware: switches, cables, NICs, etc.
2/ data loss due to a NIC being in a 33 MHz slot instead of a 66 MHz slot
3/ an excessive number of Oracle LMS processes on the cluster nodes
4/ excessive load on a node, resulting in delayed fragments
5/ CR 6244400

A cure often used at Oracle RAC sites is to use 9 KB jumbo frames for the cluster interconnect on all nodes and on all of its switch ports.

-George

Gael wrote:
> Good morning,
>
> Here is the result of the dtrace after running all night long. Was
> something missed in patch 137111-05, or was the release delayed?
>
> dbss8115:/root #dtrace -n mib:::ipIfStatsReasmFails'{@[stack()]=count();}'
> dtrace: description 'mib:::ipIfStatsReasmFails' matched 3 probes
>
>               ip`ill_frag_free_pkts+0x90
>               ip`ip_rput_fragment+0x634
>               ip`ip_udp_input+0x7c8
>               ip`ip_input+0xb20
>               dls`soft_ring_drain+0x7c
>               dls`soft_ring_worker+0x64
>               unix`thread_start+0x4
>                 2
>
>               ip`ill_frag_free_pkts+0x90
>               ip`ill_frag_prune+0x1f0
>               ip`ip_rput_fragment+0x3b4
>               ip`ip_udp_input+0x7c8
>               ip`ip_input+0xb20
>               dls`soft_ring_drain+0x7c
>               dls`soft_ring_worker+0x64
>               unix`thread_start+0x4
>                27
>
>               ip`ill_frag_timeout+0x16c
>               ip`ill_frag_timer+0x5c
>               genunix`callout_execute+0xb8
>               genunix`taskq_thread+0x1a4
>               unix`thread_start+0x4
>               214
>
>               ip`ill_frag_free_pkts+0x90
>               ip`ill_frag_prune+0x110
>               ip`ip_rput_fragment+0x3b4
>               ip`ip_udp_input+0x7c8
>               ip`ip_input+0xb20
>               dls`soft_ring_drain+0x7c
>               dls`soft_ring_worker+0x64
>               unix`thread_start+0x4
>              8011
>
> Regards
>
>
> On 8/25/08, *Gael* wrote:
>
> Hello George,
>
> Actually even after applying 137111-05, we still are seeing that
> issue ... Oracle RAC crashes were blamed on that one.
>
> I have started that same dtrace script to try to catch the culprit.
>
> dbss8115:/root #uname -a
> SunOS dbss8115 5.10 Generic_137111-05 sun4v sparc SUNW,SPARC-Enterprise-T5220
> dbss8115:/root #uptime
>   3:10pm  up 4 day(s), 23:16,  2 users,  load average: 1.28, 0.95, 0.83
> dbss8115:/root #netstat -I igb1 -s -P ip
>
> IPv4    ipForwarding      =        2    ipDefaultTTL      =      255
>         ipInReceives      =105901432    ipInHdrErrors     =        0
>         ipInAddrErrors    =        0    ipInCksumErrs     =        0
>         ipForwDatagrams   =        0    ipForwProhibits   =     3396
>         ipInUnknownProtos =        4    ipInDiscards      =        1
>         ipInDelivers      = 61849468    ipOutRequests     = 49924968
>         ipOutDiscards     =        0    ipOutNoRoutes     =        0
>         ipReasmTimeout    =       60    ipReasmReqds      = 10337023
>         ipReasmOKs        = 10310108    ipReasmFails      =    37808
>         ipReasmDuplicates =     2116    ipReasmPartDups   =        0
>         ipFragOKs         =  6033192    ipFragFails       =        0
>         ipFragCreates     = 30277839    ipRoutingDiscards =        0
>         tcpInErrs         =        0    udpNoPorts        =   170241
>         udpInCksumErrs    =     1299    udpInOverflows    =        0
>         rawipInOverflows  =        0    ipsecInSucceeded  =        0
>         ipsecInFailed     =        0    ipInIPv6          =        0
>         ipOutIPv6         =        0    ipOutSwitchIPv6   =        0
>
>
> On 8/3/08, *George Shepherd* wrote:
>
> Hi Oliver.
>
> Rather than a big reassembly list I suspect that you are
> hitting this bug.
>
> 6637163 ip_rput_fragment[_v6]() spuriously prunes valid frags
> due to unbounded inaccuracy of ill_frag_count
>
> The fix is currently in T-patch form (afaik as I can't look it
> up right now) 137111-05.
>
> HTH
> -George
>
> --
> Gael Martinez
> ------------------------------------------------------------------------
>
> _______________________________________________
> networking-discuss mailing list
> [email protected]
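Gael's overnight run aggregates the kernel stack at every firing of the ipIfStatsReasmFails probe. The Python sketch below tallies those counts by call path. The numbers are copied from the quoted output; the dictionary keys are hand-written labels for each stack (hypothetical, not DTrace output).

```python
# Per-stack counts copied from Gael's overnight dtrace run on
# mib:::ipIfStatsReasmFails. Keys are hand-written labels, not DTrace output.
stack_counts = {
    "ill_frag_prune+0x110 <- ip_rput_fragment+0x3b4": 8011,
    "ill_frag_prune+0x1f0 <- ip_rput_fragment+0x3b4": 27,
    "ill_frag_timeout <- ill_frag_timer (callout)": 214,
    "ill_frag_free_pkts <- ip_rput_fragment+0x634": 2,
}

total = sum(stack_counts.values())
# Firings whose stack passes through ill_frag_prune().
via_prune = sum(n for label, n in stack_counts.items()
                if label.startswith("ill_frag_prune"))
# Firings from the reassembly timer expiring.
via_timeout = sum(n for label, n in stack_counts.items()
                  if label.startswith("ill_frag_timeout"))

print(f"probe firings:      {total}")
print(f"via ill_frag_prune: {via_prune} ({via_prune / total:.1%})")
print(f"via timeout:        {via_timeout} ({via_timeout / total:.1%})")
```

On this run 8038 of the 8254 firings (about 97%) come through ill_frag_prune(), and 214 (about 2.6%) through the reassembly timeout. That points at the pruning path, but the count alone does not show whether those prunes are spurious; that is what the check described in CR 6637163 is for.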
