On Wed, Sep 07, 2016 at 11:55:06AM -0700, Hoan Tran wrote: > Hi Guenter, > > On Tue, Sep 6, 2016 at 11:39 PM, Guenter Roeck <li...@roeck-us.net> wrote: > > On 09/06/2016 11:07 PM, Hoan Tran wrote: > >> > >> Hi Guenter, > >> > >> On Tue, Sep 6, 2016 at 10:50 PM, Guenter Roeck <li...@roeck-us.net> wrote: > >>> > >>> On 09/06/2016 10:21 PM, Hoan Tran wrote: > >>>> > >>>> > >>>> Hi Guenter, > >>>> > >>>> Thank for your quick review ! > >>>> > >>>> On Tue, Sep 6, 2016 at 9:35 PM, Guenter Roeck <li...@roeck-us.net> > >>>> wrote: > >>>>> > >>>>> > >>>>> On 09/06/2016 08:46 PM, Hoan Tran wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> The system crashes during probing xgene-hwmon driver when temperature > >>>>>> alarm interrupt occurs before. > >>>>>> It's because > >>>>>> - xgene_hwmon_probe() requests PCC mailbox channel which also enables > >>>>>> the mailbox interrupt. > >>>>>> - As temperature alarm interrupt is pending, ISR runs and crashes > >>>>>> when > >>>>>> accesses > >>>>>> into invalid resource as unmapped PCC shared memory. > >>>>>> > >>>>>> This patch fixes this issue by saving this alarm message and > >>>>>> scheduling > >>>>>> a > >>>>>> bottom handler after xgene_hwmon_probe() finish. > >>>>>> > >>>>> > >>>>> I am not completely happy with this fix. Main problem I have is that > >>>>> the > >>>>> processing associated with resp_pending doesn't happen until init_flag > >>>>> is > >>>>> set. > >>>>> Since the hwmon functions can be called right after > >>>>> hwmon_device_register_with_groups(), > >>>>> there is now a new race condition between that call and setting > >>>>> init_flag. > >>>> > >>>> > >>>> > >>>> I think it's still good if hwmon functions are called right after > >>>> hwmon_device_register_with_groups(). > >>>> The response message will be queued into FIFO and be processed later. > >>>> > >>> Yes, but the call to complete() won't happen in this case, or am I > >>> missing > >>> something ? > >> > >> > >> Yes, I think xgene_hwmon_rd() and xgene_hwmon_pcc_rd() functions have > >> to check "init_flag == true" before issue the read command. > >> > > > > This is getting more and more messy :-( > > > >> Thanks > >> Hoan > >> > >>> > >>> Guenter > >>> > >>> > >>>>> > >>>>> I am also a bit concerned with init_flag and rx_pending not being > >>>>> atomic > >>>>> and > >>>>> protected. > >>>>> What happens if a second interrupt occurs right after init_flag is set > >>>>> but > >>>>> before > >>>>> (or while) rx_pending is evaluated ? > >>>> > >>>> > >>>> > >>>> Yah, that's a good catch. I can re-use the "kfifo_lock" spinlock for > >>>> this atomic protection. > >>>> > >>>>> > >>>>> On top of that, init_flag and thus the added complexity is (unless I am > >>>>> missing > >>>>> something) only needed if acpi is enabled. Penaltizing non-acpi code > >>>>> doesn't > >>>>> seem > >>>>> to be optimal. > >>>> > >>>> > >>>> > >>>> I think, with DT, we still need this flag. In a case of temperature > >>>> alarm, the driver needs to set "temp1_critical_alarm" sysfs. > >>>> This "temp1_critical_alarm" should be created before "init_flag" = true. > >>>> > > > > I don't know. After all, any such alarm would be lost if it occurred > > before the driver was loaded, no ? Those mailboxes should really have > > a means to be informed that the initiator is ready to handle > > interrupts/callbacks. > > We don't want to miss an alarm message. User should be warned. > > As > ctx->hwmon_dev = hwmon_device_register_with_groups(ctx->dev, > > "apm_xgene", > ctx, > > xgene_hwmon_groups); > How about, callback functions check the ctx->hwmon_dev validation. If > not, they just save msg into FIFO. > Beside of that, as hwmon functions can be called before ctx->hwmon_dev > is assigned. Callback functions check if there is a mailbox response > pending before saving msg into FIFO as below > > static int xgene_hwmon_rx_ready(struct xgene_hwmon_dev *ctx, void *msg) > { > if (IS_ERR(ctx->hwmon_dev) && !ctx->resp_pending) {
Probably IS_ERR_OR_ZERO() ? Guenter > /* Enqueue to the FIFO */ > kfifo_in_spinlocked(&ctx->async_msg_fifo, msg, > sizeof(struct slimpro_resp_msg), > &ctx->kfifo_lock); > > return -EBUSY; Need to find something else. EBUSY isn't correct. > } > > return 0; > } > > > At the end of probe function, driver always schedules a bottom handler > to handle FIFO msg. > > Then we can remove the init_flag and rx_pending. > At least better than before, though I think it is still racy. It might be worthwhile checking by adding a large msleep() after hwmon registration and before ctx->hwmon_dev is written, and have a user space program access sysfs attributes immediately after they have been created. Thanks, Guenter