Erik Nordmark wrote:
Rao Shoaib wrote:
Thanks for taking the time to do the code review.
My concern is based on the experience of the SolarMax project where
we had 2K tunnels plumbed (This was on specific HW - on a bigger
system there may be more) and every time snmpd ran the system hung
because the code was holding a lock and going through individual
counters and adding them up. We learned that it was best to amortize
the time by updating the aggregate counters and individual counters
at the same time and then just copy them out.
Why would a lock be needed? The loop over ills need to ensure that an
ill doesn't disappear while it is being accessed, but that doesn't
require preventing other concerns. Thus I don't understand what you
think is hard here.
To walk the ill's one needs to hold the global ill_g_lock.
Was the case just an attempt at a poor implementation for SolarisMax?
No SolarMax actually fixed most of these issues.
There is obviously a concern that this global structure will be hot
and would result in cache misses. On current non SMP CMT systems this
is a non issue. On SNMP systems we have already seen that this is a
better approach. I believe that would also be the case for NUMA and
SMP CMT systems. If not then we will have to come up with a more
creative scheme such as not holding the lock or having a per cpu
global ip statistics structure. We could also announce that we are
deprecating returning aggregate statistics for V4 and the application
requesting the statistics would have to add them up just like it does
for V6.
If you feel we should address this issue now then we will or we could
file an RFE to visit this later. Current testing shows that there is
no degradation with our changes probably because current Solaris code
already does this.
Yes, you should address this now. Postponing things to RFEs means they
are likely to never get done.
But what you are suggesting is really an RFE as we are not introducing
this behavior. This is what Solaris does today. If we were to make this
change than we will have to go and test all cases to make sure there is
no degradation.
And the result should be uniform across the IPv4 and IPv6 code paths.
Can you elaborate on this please. If V4 is to return aggregate stats and
V6 does not have to than how can the result be uniform or are you
suggesting that we deprecate returning aggregates for V4.
Rao.
P.S. Sorry for the late reply. I was out last week.
Erik
_______________________________________________
networking-discuss mailing list
[email protected]