Gael,

There are multiple potential reasons for ReasmFails incrementing.
Only one of them is the bug fixed in 137111-05, and the others are
not all bugs.

If you read CR 6637163 you will notice that the bug is related to
ill_frag_prune() being called erroneously. If you use the actual
dtrace script supplied in the comments of CR 6637163, together with
the method I supply with it for checking whether the calls to
ill_frag_prune() are made with a negative signed integer, you will be
able to determine whether CR 6637163 is involved in any way.

Other causes of this type of issue are:

1/ Problems with the cluster interconnect hardware: switches, cables,
NICs, etc.

2/ Data loss due to a NIC being in a 33 MHz slot instead of a 66 MHz
slot.

3/ An excessive number of Oracle LMS processes on the cluster nodes.

4/ Excessive load on a node, resulting in delayed fragments.

5/ CR 6244400

The cure often used at Oracle RAC sites is to use 9 KB jumbo frames
on all nodes for the cluster interconnect and on all its switch ports.

-George



Gael wrote:
> Good morning,
>  
> Here the result of the dtrace after running all night long, was 
> something missed with the patch 137111-05 or was the release delayed ?
>  
>
> dbss8115:/root #dtrace -n mib:::ipIfStatsReasmFails'{@[stack()]=count();}'
> dtrace: description 'mib:::ipIfStatsReasmFails' matched 3 probes 
>
>               ip`ill_frag_free_pkts+0x90
>               ip`ip_rput_fragment+0x634
>               ip`ip_udp_input+0x7c8
>               ip`ip_input+0xb20
>               dls`soft_ring_drain+0x7c
>               dls`soft_ring_worker+0x64
>               unix`thread_start+0x4
>                 2
>
>               ip`ill_frag_free_pkts+0x90
>               ip`ill_frag_prune+0x1f0
>               ip`ip_rput_fragment+0x3b4
>               ip`ip_udp_input+0x7c8
>               ip`ip_input+0xb20
>               dls`soft_ring_drain+0x7c
>               dls`soft_ring_worker+0x64
>               unix`thread_start+0x4
>                27
>
>               ip`ill_frag_timeout+0x16c
>               ip`ill_frag_timer+0x5c
>               genunix`callout_execute+0xb8
>               genunix`taskq_thread+0x1a4
>               unix`thread_start+0x4
>               214
>
>               ip`ill_frag_free_pkts+0x90
>               ip`ill_frag_prune+0x110
>               ip`ip_rput_fragment+0x3b4
>               ip`ip_udp_input+0x7c8
>               ip`ip_input+0xb20
>               dls`soft_ring_drain+0x7c
>               dls`soft_ring_worker+0x64
>               unix`thread_start+0x4
>              8011
>
> Regards
>
>  
> On 8/25/08, *Gael* <[EMAIL PROTECTED] 
> <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Hello George,
>      
>     Actually even after applying 137111-05, we still are seeing that
>     issue ... Oracle RAC crashes were blamed on that one.
>      
>     I have started that same dtrace script to try to catch the culprit.
>
>     dbss8115:/root #uname -a
>     SunOS dbss8115 5.10 Generic_137111-05 sun4v sparc
>     SUNW,SPARC-Enterprise-T5220
>     dbss8115:/root #uptime
>       3:10pm  up 4 day(s), 23:16,  2 users,  load average: 1.28, 0.95,
>     0.83
>     dbss8115:/root #netstat -I igb1 -s -P ip
>
>
>     IPv4    ipForwarding        =     2     ipDefaultTTL        =   255
>             ipInReceives        =105901432  ipInHdrErrors       =     0
>             ipInAddrErrors      =     0     ipInCksumErrs       =     0
>             ipForwDatagrams     =     0     ipForwProhibits     =  3396
>             ipInUnknownProtos   =     4     ipInDiscards        =     1
>             ipInDelivers        =61849468   ipOutRequests       =49924968
>             ipOutDiscards       =     0     ipOutNoRoutes       =     0
>             ipReasmTimeout      =    60     ipReasmReqds        =10337023
>             ipReasmOKs          =10310108   ipReasmFails        = 37808
>             ipReasmDuplicates   =  2116     ipReasmPartDups     =     0
>             ipFragOKs           =6033192    ipFragFails         =     0
>             ipFragCreates       =30277839   ipRoutingDiscards   =     0
>             tcpInErrs           =     0     udpNoPorts          =170241
>             udpInCksumErrs      =  1299     udpInOverflows      =     0
>             rawipInOverflows    =     0     ipsecInSucceeded    =     0
>             ipsecInFailed       =     0     ipInIPv6            =     0
>             ipOutIPv6           =     0     ipOutSwitchIPv6     =     0
>
>
>
>      
>     On 8/3/08, *George Shepherd* <[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>> wrote:
>
>         Hi Oliver.
>
>         Rather than a big reassembly list I suspect that you are
>         hitting this bug.
>
>         6637163 ip_rput_fragment[_v6]() spuriously prunes valid frags
>         due to unbounded
>         inaccuracy of ill_frag_count
>
>         The fix is currently in T-patch form (afaik as I can't look it
>         up right now)
>         137111-05.
>
>         HTH
>         -George
>
>
>
>
> -- 
> Gael Martinez
> ------------------------------------------------------------------------
>
> _______________________________________________
> networking-discuss mailing list
> [email protected]
>   

_______________________________________________
networking-discuss mailing list
[email protected]
