On 15. nov. 2017 15:08, Andrew Lunn wrote:
On Wed, Nov 15, 2017 at 01:08:22PM +0100, Egil Hjelmeland wrote:
Hi experts
Hi, thanks for response, both Andrew and Vivien!
I am hoping for some guidance.
Does DSA offer any protection against concurrent calls of dsa_switch_ops?
Hi Egil
DSA itself does not.
There are various upper locks, which protect some calls, in some ways.
e.g. phy ops are protected by the mdio lock. stats calls are protected
by the rtnl lock, as well as some other calls. And other locks protect
other things.
But nothing gives you protection across them all.
For the mv88e6xxx driver, we took the simple approach. We generally
take a lock at the beginning of each of the dsa_swtich_ops functions,
and release it at the end. Since all accesses to the chip go through
two read/write functions, we also have code in them to detect when
they are called without holding the lock.
Some driver writers worry about performance in some situations, and
want finer grain locking. So they have multiple locks. When reviewing
drivers i will look for obvious locking issues, but don't look too
deeply. Without knowing the chip, it has hard for me to know if
something is safe or not. So i would not be surprised if there are
locking issues in some drivers.
The most "interesting" part of the lan9303 driver that has no locking is the
ALR (=fdb/mdb). ALR access is a sequence of register operations. Anyway it
is very unlikely that mdb related calls are reentered. But if it can happen,
it would mean that IGMP snooping can go wrong. (Which is actually very bad
in our applications.)
Is this something to worry about?
I would suggest looking a bit higher in the stack. fdb/mdb operations
come via switchdev, and have a notification mechanism between slave.c
and port.c. Check if that notification mechanism enforces
serialisation. Also, check that everything actually does go though
this notification mechanism. Maybe the dump operations do not?
OK, for my education I took a look in upper layers. Bridge layer specify
SWITCHDEV_F_DEFER option to switchdev operations. Which means switchdev
hand the work over to a workqueue. Which is executed by a kworker kernel
thread. In DSA, operations go through raw_notifier_call_chain.
raw_notifier_call_chain has no locking, and I assume it executes in same
context. A dump_stack() in the driver confirm my theory.
So the (most?) dsa operations execute in switchdev_deferred_process_work
queue. If a operation sleep, other dsa operations will run in the mean
time. So there is no serialization. Just as indicated by Vivien.
So if I still have time at hands when net-next opens again, I will do
something about it for lan9303.
And then check the lower levels of the driver. If say statistics
operations are performed at the same time as fdb/mdb, can the register
accesses get interleaved? If they can, is that actually a problem for
the hardware?
I have not seen anything in the datasheet about simultaneous access to
different registers. Until proven otherwise, I assume protecting
functions that require a sequence of related read/write operations will do.
At the moment I have changed my mind, I think it is better to add a new
alr_mutex to protect the ALR (fdb/mdb) operations. And not touch the
existing mutex. alr_mutex need to be locked in lan9303_alr_add_port,
lan9303_alr_del_port and lan9303_alr_loop, all of them simple functions.
Andrew
Egil