Hi Marek,
On Sat, 3 Jan 2026 at 19:25, Marek Vasut <[email protected]> wrote:
> On 10/29/25 3:08 PM, Geert Uytterhoeven wrote:
> > default_suspend_ok+0xb4/0x20c (P)
> > genpd_runtime_suspend+0x11c/0x4e0
> > __rpm_callback+0x44/0x1cc
> > rpm_callback+0x6c/0x78
> > rpm_suspend+0x108/0x564
> > pm_runtime_work+0xb8/0xbc
> > process_one_work+0x144/0x280
> > worker_thread+0x2c8/0x3d0
> > kthread+0x128/0x1e0
> > ret_from_fork+0x10/0x20
> > Code: aa1303e0 52800863 b0005661 912dc021 (f9402095)
> > ---[ end trace 0000000000000000 ]---
> >
> > This driver uses manual PM Domain handling for multiple PM Domains. In
> > my case, pvr_power_domains_fini() calls dev_pm_domain_detach() (twice),
> > which calls dev_pm_put_subsys_data(), and sets dev->power.subsys_data to
> > NULL when psd->refcount reaches zero.
> >
> > Later/in parallel, default_suspend_ok() calls dev_gpd_data():
> >
> > static inline struct generic_pm_domain_data *dev_gpd_data(struct
> > device *dev)
> > {
> > return to_gpd_data(dev->power.subsys_data->domain_data);
> > }
> >
> > triggering the NULL pointer dereference. Depending on timing, it may
> > crash earlier or later in genpd_runtime_suspend(), or not crash at all
> > (initially, I saw it only with extra debug prints in the genpd subsystem).
>
> I came to the same conclusion when revisiting it yesterday and today.
>
> The power 3dg-{a,b} domains are in RPM_SUSPENDING state, the
> __rpm_callback() is running and it unlocks dev->power.lock spinlock for
> just long enough, that the pvr_power_domains_fini() can issue
> dev_pm_domain_detach() and then dev_pm_put_subsys_data() , which unsets
> subsys_data, which are later still used by the __rpm_callback() (really
> the genpd_runtime_suspend() -> suspend_ok() it calls for this domain).
>
> But, I wonder if the problem is actually in the CPG MSSR clock domain
> driver. The pvr_power_domains_fini() dev_pm_domain_detach() really calls
> cpg_mssr_detach_dev() which calls pm_clk_destroy() and that invokes the
> dev_pm_domain_detach() which unsets the subsys_data . The
> pm_clk_destroy() documentation is explicit about it unsetting the
> subsys_data .
>
> I wonder if what we need to do instead, is patch the CPG MSSR clock
> domain driver such, that it would surely NOT call pm_clk_destroy()
> before the domain transitioned from RPM_SUSPENDING -> RPM_SUSPENDED
> state and surely is done with all its __rpm_callback() invocations ?
>
> Can you please test this change and see if it fixes the problem ?
>
> The barrier should guarantee that the domain is settled and no more
> callbacks are still running.
Thank you, that indeed fixes the issue!
However, I am not so sure this barrier belongs in the .detach_dev()
callback. The documentation for almost all dev_pm_domain_{at,de}tach*()
functions states:
* Callers must ensure proper synchronization of this function with power
* management callbacks.
However, I couldn't find any user that calls pm_runtime_barrier() first.
In case of multiple PM domains, it is even more complicated, as
dev_pm_domain_attach_list() (and pvr_power_domains_init(), which is
basically an open-coded variant of the former) creates a list of virtual
devices, which all need synchronization. For the devres-enabled version
(devm_pm_domain_attach_list()), the caller cannot take care of calling
pm_runtime_barrier() anyway, so it has to be handled by the PM core?
> --- a/drivers/clk/renesas/renesas-cpg-mssr.c
> +++ b/drivers/clk/renesas/renesas-cpg-mssr.c
> @@ -24,6 +24,7 @@
> #include <linux/platform_device.h>
> #include <linux/pm_clock.h>
> #include <linux/pm_domain.h>
> +#include <linux/pm_runtime.h>
> #include <linux/psci.h>
> #include <linux/reset-controller.h>
> #include <linux/slab.h>
> @@ -656,8 +657,10 @@ int cpg_mssr_attach_dev(struct generic_pm_domain
> *unused, struct device *dev)
>
> void cpg_mssr_detach_dev(struct generic_pm_domain *unused, struct
> device *dev)
> {
> - if (!pm_clk_no_clocks(dev))
> + if (!pm_clk_no_clocks(dev)) {
> + pm_runtime_barrier(dev);
> pm_clk_destroy(dev);
> + }
> }
>
> static void cpg_mssr_genpd_remove(void *data)
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds