Hi, On 13/06/19 1:22 PM, Peter Rosin wrote: > Hi, > > On 2019-06-13 06:57, Kishon Vijay Abraham I wrote: >> Hi Peter, >> >> On 13/06/19 4:20 AM, Peter Rosin wrote: >>> Hi! >>> >>> [I know this has already been merged upstream, but I only just >>> now noticed the code and went to the archives to find the >>> originating mail. I hope I managed to set in-reply-to correctly...] >>> >>> The mux handling is problematic and does not follow the rules. >>> It needs to be fixed, or you may face deadlocks. See below. >>> >>> On 2019-04-05 11:08, Kishon Vijay Abraham I wrote: >>>> Add a new SERDES driver for TI's AM654x SoC which configures >>>> the SERDES only for PCIe. Support fo USB3 will be added later. >>>> >>>> SERDES in am654x has three input clocks (left input, externel reference >>>> clock and right input) and two output clocks (left output and right >>>> output) in addition to a PLL mux clock which the SERDES uses for Clock >>>> Multiplier Unit (CMU refclock). >>>> >>>> The PLL mux clock can select from one of the three input clocks. >>>> The right output can select between left input and external reference >>>> clock while the left output can select between the right input and >>>> external reference clock. >>>> >>>> The driver has support to select PLL mux and left/right output mux as >>>> specified in device tree. >>>> >>>> [rog...@ti.com: Fix boot lockup caused by accessing a structure member >>>> (hw->init) allocated in stack of probe() and accessed in get_parent] >>>> [rog...@ti.com: Fix "Failed to find the parent" warnings] >>>> Signed-off-by: Roger Quadros <rog...@ti.com> >>>> Signed-off-by: Kishon Vijay Abraham I <kis...@ti.com> > > *snip* > >>>> +static void serdes_am654_release(struct phy *x) >>>> +{ >>>> + struct serdes_am654 *phy = phy_get_drvdata(x); >>>> + >>>> + phy->type = PHY_NONE; >>>> + phy->busy = false; >>>> + mux_control_deselect(phy->control); >>> >>> Here you unconditionally deselect the mux, and that seems >>> dangerous. Are you *sure* that ->release may not be called >>> without a successful xlate call? >> >> Yeah, without a successful xlate(), the consumer will never get a reference >> to >> the PHY and the ->release() is invoked only from phy_put() which needs a >> reference to the PHY. > > Yes, I thought it might be ok, but good that you can confirm it. > >>> I'm not 100% sure of that, but I have not looked at the phy >>> code before today, so it may very well be the case that this >>> is safe... >>> >>>> +} >>>> + >>>> +struct phy *serdes_am654_xlate(struct device *dev, struct of_phandle_args >>>> + *args) >>>> +{ >>>> + struct serdes_am654 *am654_phy; >>>> + struct phy *phy; >>>> + int ret; >>>> + >>>> + phy = of_phy_simple_xlate(dev, args); >>>> + if (IS_ERR(phy)) >>>> + return phy; >>>> + >>>> + am654_phy = phy_get_drvdata(phy); >>>> + if (am654_phy->busy) >>>> + return ERR_PTR(-EBUSY); >>>> + >>>> + ret = mux_control_select(am654_phy->control, args->args[1]); >>>> + if (ret) { >>>> + dev_err(dev, "Failed to select SERDES Lane Function\n"); >>>> + return ERR_PTR(ret); >>>> + } >>> >>> *However* >>> >>> Here you select the mux as the last action, good, but, a mux must >>> be handled with that same care as a locking primitive, i.e. >>> successful selects must be perfectly balanced with deselects. I >>> see no guarantee of that here, since there are other failures >>> possible after the xlate call. So, being last in the function >>> does not really help. If I read the code correctly, the >>> phy core may fail if try_module_get fails in phy_get(). If that >>> ever happens, a successful call to mux_control_select is simply >>> forgotten, and the mux will be locked indefinitely. >> >> Good catch. While adding ->release() ops which is only invoked from phy_put, >> perhaps this was missed. Ideally it should be invoked from other places where >> there is a failure after phy_get. >>> >>> am654_phy->busy will also be set indefinitely, so you will get >>> -EBUSY and not a hard deadlock. At least here, but if the now >>> locked mux control happens to also control some other muxes >>> (probably unlikely, but if), then their consumers will potentially >>> deadlock hard. But that's just after a cursory reading, so I may >>> completely miss something... >> >> The ->busy here should prevent two consumers trying to control the same mux. > > Aha, you do not seem to be aware that one mux controller can > have multiple independent consumers (a mux is not like a gpio > or a pwm in that aspect). What I'm talking about is a single > mux control that controls several parallel muxes, each with its > own consumer. In the specific case you are targeting, that may > not be possible due to some hardware reason or something, but > looking at just this driver *it* cannot know that the mux will > be available just because it has a local ->busy flag.
I think this should be okay atleast for my case. My dt node will looks something like this serdes_mux: mux-controller { compatible = "mmio-mux"; #mux-control-cells = <1>; mux-reg-masks = <0x4080 0x3>, /* SERDES0 lane select */ <0x4090 0x3>; /* SERDES1 lane select */ } Here SERDES lane0 is muxed between PCIe0 Lane0, ICSS2 SGMII Lane0. The consumers of SERDES0 (in this case PCIe and ICSS), will have to invoke ->xlate of AM654 SERDES driver to select the mux and ->busy should be able to tell whether it is available. Only when multiple drivers try to get devm_mux_control_get and use mux_control_select, the problem might occur. For my case, only SERDES is the consumer of mux. > > For this case, I get the feeling that the mux may be selected for > a very long time, right? It is never about selecting the mux, > doing something for x milli/microseconds or so and then deselecting > the mux. Perhaps the mux will typically sit in the same state for > the entire uptime of the machine? Yes right. It'll be for the entire up time. Well, with dt overlay this could change. > > If you have these very long access patterns, the sharing capability > of the mux controls are pretty much useless, and I have > contemplating a mux mode to support this case. I.e. where you > lock/unlock the mux control once at probe/release (or similar), > and then basically instead of a shared mux get an exclusive mux > where the consumer is responsible for not making parallel accesses. > In other words, just like a gpio or a pwm. Yes, having an exclusive mux will make sure no one can accidentally modify the mux. > > The problem is that I then need a definition of "long" and "short" > accesses, and I split the mux universe in two... Are you trying to make sure there is no deadlock when two different consumers try to control a shared mux? Thanks Kishon