Dan McDonald wrote:
You should run lockstat for a minute or two during
this time:
lockstat sleep <#-of-secs>
The output while wedged is:
http://66.235.160.6/lockstat.txt
Lotsa stuff in or around the routing tables, as well as the ipif
(instantiation of an IP address) structures.
netstat -rna | grep UHA | wc -l
This value grows from 0 to ~40k while running. Once it gets to ~40k it will
stay steady for a few minutes then reset back to 0 and grow again.
That's consistent. Every UHA entry is a remote IP address. This makes sense
given you're opening many connections to many peers.
HOWEVER, once the routing problem starts, it goes steady state at only 38
(not 38 thousand, just 38). It seems to stay fixed at that value.
You said the way to unwedge it was to invoke network/initial or
network/services, IIRC. Let's see if we can use a scalpel instead of a meat
cleaver.
How do you install your default route? Do you use in.routed? Or do you have
an /etc/defaultrouter? Or do you use the new "route -p" which populates
/etc/inet/static_routes?
Depending on how you install your default route, you should first try:
- route flush (or "route delete default <defrtr>" if you have other
routes).
- re-install your default route (either "pkill -HUP in.routed" or
"route add default <defrtr>").
and see if you get unwedged.
Next thing to do is to unplumb/replumb your network interface:
ifconfig <intf> unplumb
ifconfig <intf> plumb <addr>/<prefix> up
If that doesn't unwedge it, try both (flush routes/unplumb, THEN plumb/add
route).
arp -an
While wedged, it still shows the router IP, and I can ping the router. The
output of arp -an doesn't change before or after symptoms.
That answers Sowmini's questions. Beyond the stuff I suggested, Sowmini will
have better ideas about what's going on.
I am assuming the problem machine is running as a host. Are you running
in.routed on the machine or have you set default route?
In our lab, stress testing on a Sun box set up as a DNS proxy and
responding to DNS queries under *heavy* load has indicated that it
may be worthwhile to experiment with more randomized hashing algorithms.
Our current IRE hash algorithm is this (see ip_ire.h):
---------
/*
* We use the common modulo hash function. In ip_ire_init(), we make
* sure that the cache table size is always a power of 2. That's why
* we can use & instead of %. Also note that we try hard to make sure
* the lower bits of an address capture most info from the
* whole address. The reason being that since our hash table is
* probably a lot smaller than 2^32 buckets so the lower bits
* are the most important.
*/
#define IRE_ADDR_HASH(addr, table_size) \
(((addr) ^ ((addr) >> 8) ^ ((addr) >> 16) ^ ((addr) >> 24)) & \
((table_size) - 1))
---------------
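To make the concern concrete, here is a minimal sketch of how the macro above behaves on structured input. The macro is copied from the excerpt; the helper longest_chain() and the 256-bucket table size are assumptions for illustration only and are not part of the IP source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Macro as quoted from ip_ire.h above. */
#define IRE_ADDR_HASH(addr, table_size) \
	(((addr) ^ ((addr) >> 8) ^ ((addr) >> 16) ^ ((addr) >> 24)) & \
	((table_size) - 1))

/*
 * Hypothetical helper (not in the IP source): hash n addresses into
 * counts[0 .. table_size - 1] and return the length of the longest
 * chain. table_size must be a power of 2, as the macro requires.
 */
static unsigned int
longest_chain(const uint32_t *addrs, size_t n, unsigned int *counts,
    uint32_t table_size)
{
	unsigned int max = 0;
	size_t i;

	/* The & trick only equals % when table_size is a power of 2. */
	assert(table_size != 0 && (table_size & (table_size - 1)) == 0);

	for (i = 0; i < table_size; i++)
		counts[i] = 0;

	for (i = 0; i < n; i++) {
		uint32_t bucket = IRE_ADDR_HASH(addrs[i], table_size);

		if (++counts[bucket] > max)
			max = counts[bucket];
	}
	return (max);
}
```

With a 256-bucket table, the mask keeps only the XOR of the four address bytes, so addresses such as 0x0a000001, 0x0a000100, 0x0a010000 and 0x0b000000 all land in bucket 0x0b. A workload that touches many peers whose bytes XOR to the same value would pile onto a few chains, which is the kind of behavior a more randomized hash might avoid.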
We are looking into this.
Sangeeta
_______________________________________________
networking-discuss mailing list
[email protected]