On 02.12.2016 00:14, Ido Schimmel wrote: [...]
>> Basically, if you delete a node right now the kernel might simply do a >> RCU_INIT_POINTER(ptr_location, NULL), which has absolutely no barriers >> or synchronization with the reader side. Thus you might get a callback >> from the notifier for a delete event on the one CPU and you end up >> queueing this fib entry after the delete queue, because the RCU walk >> isn't protected by any means. >> >> Looking closer at this series again, I overlooked the fact that you >> fetch fib_seq using a rtnl_lock and rtnl_unlock pair, which first of all >> orders fetching of fib_seq and thus the RCU dumping after any concurrent >> executing fib table update, also the mutex_lock and unlock provide >> proper acquire and release fences, so the CPU indeed sees the effect of >> a RCU_INIT_POINTER update done on another CPU, because they pair with >> the rtnl_unlock which might happen on the other CPU. > > Yep, Exactly. I had a feeling this is the issue you were referring to, > but then you were the one to suggest the use of RTNL, so I was quite > confused. At that time I actually had in mind that the fib_register would happen under the sequence lock, so I didn't look closely to the memory barrier pairings. I kinda still consider this to be a happy accident. ;) >> My question is if this is a bit of luck and if we should make this >> explicit by putting the registration itself under the protection of the >> sequence counter. I favor the additional protection, e.g. if we some day >> actually we optimize the fib_seq code? Otherwise we might probably >> document this fact. :) > > Well, some listeners don't require a dump, but only registration > (rocker) and in the future we might only need a dump (e.g., port being > moved to a different net namespace). So I'm not sure if bundling both > together is a good idea. > > Maybe we can keep register_fib_notifier() as-is and add 'bool register' > to fib_notifier_dump() so that when set, 'nb' is also registered after > RCU walk, but before we check if the dump is consistent (unregistered if > inconsistent)? I really like that. Would you mind adding this? [...] >> Quick follow-up question: How can I quickly find out the hw limitations >> via the kernel api? > > That's a good question. Currently, you can't. However, we already have a > mechanism in place to read device's capabilities from the firmware and > we can (and should) expose some of them to the user. The best API for > that would be devlink, as it can represent the entire device as opposed > to only a port netdev like other tools. > > We're also working on making the pipeline more visible to the user, so > that it would be easier for users to understand and debug their > networks. I believe a colleague of mine (Matty) presented this during > the last netdev conference. Thanks, I will look it up! Bye, Hannes