Erik Nordmark wrote:
Rao Shoaib wrote:

Thanks for taking the time to do the code review.

My concern is based on the experience of the SolarMax project where we had 2K tunnels plumbed (This was on specific HW - on a bigger system there may be more) and every time snmpd ran the system hung because the code was holding a lock and going through individual counters and adding them up. We learned that it was best to amortize the time by updating the aggregate counters and individual counters at the same time and then just copy them out.

Why would a lock be needed? The loop over ills need to ensure that an ill doesn't disappear while it is being accessed, but that doesn't require preventing other concerns. Thus I don't understand what you think is hard here.
To walk the ill's one needs to hold the global ill_g_lock.
Was the case just an attempt at a poor implementation for SolarisMax?
No SolarMax actually fixed most of these issues.

There is obviously a concern that this global structure will be hot and would result in cache misses. On current non SMP CMT systems this is a non issue. On SNMP systems we have already seen that this is a better approach. I believe that would also be the case for NUMA and SMP CMT systems. If not then we will have to come up with a more creative scheme such as not holding the lock or having a per cpu global ip statistics structure. We could also announce that we are deprecating returning aggregate statistics for V4 and the application requesting the statistics would have to add them up just like it does for V6.

If you feel we should address this issue now then we will or we could file an RFE to visit this later. Current testing shows that there is no degradation with our changes probably because current Solaris code already does this.

Yes, you should address this now. Postponing things to RFEs means they are likely to never get done.
But what you are suggesting is really an RFE as we are not introducing this behavior. This is what Solaris does today. If we were to make this change than we will have to go and test all cases to make sure there is no degradation.

And the result should be uniform across the IPv4 and IPv6 code paths.
Can you elaborate on this please. If V4 is to return aggregate stats and V6 does not have to than how can the result be uniform or are you suggesting that we deprecate returning aggregates for V4.

Rao.

P.S. Sorry for the late reply. I was out last week.

   Erik

_______________________________________________
networking-discuss mailing list
[email protected]

Reply via email to