On Fri, Feb 02, 2024 at 12:15:40PM -0400, Jason Gunthorpe wrote:

> > Yes looks like a race of some sort. Adding a bit of debug also makes the
> > issue go away so difficult to see what is happening.
> 
> I'm wondering if it is racing with iommu driver probing? I looked but
> didn't notice anything obviously wrong there that would cause this
> though.

Oh there is at least one racy thing here..

The of_xlate hackjob is out of order and is racy if you have multiple instances:

struct tegra_smmu *tegra_smmu_probe(struct device *dev,
                                    const struct tegra_smmu_soc *soc,
                                    struct tegra_mc *mc)
{
        /*
         * This is a bit of a hack. Ideally we'd want to simply return this
         * value. However iommu_device_register() will attempt to add
         * all devices to the IOMMU before we get that far. In order
         * not to rely on global variables to track the IOMMU instance, we
         * set it here so that it can be looked up from the .probe_device()
         * callback via the IOMMU device's .drvdata field.
         */
        mc->smmu = smmu;
         ^^^^^^^^^^^^^
           After this of_xlate and probe_device will succeed

        [...]
        err = iommu_device_sysfs_add(&smmu->iommu, dev, NULL, dev_name(dev));   
            
        err = iommu_device_register(&smmu->iommu, &tegra_smmu_ops, dev);
        ^^^^^^^^^
        But the iommu instance is not fully initialized yet.

So:

static int tegra_smmu_of_xlate(struct device *dev,
                               struct of_phandle_args *args)
{
        struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
        struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);

        dev_iommu_priv_set(dev, mc->smmu);
                            ^^^^^^^^^^^^
                             Gets the partially initialized iommu
                             instance


static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
{
        smmu = dev_iommu_priv_get(dev);
        if (!smmu)
                return ERR_PTR(-ENODEV);
               ^^^^^^^^^^^^^
                  Allows the driver to bind to a partially setup instance

ie if you have multiple instances of tegra-smmu and you manage to do
concurrent probe where the iommu instance is probing concurrently with
the failing device_add flow then you can a situation like you have
described where the sysfs is not fully setup.

Is this making sense to you? Add a sleep after the mc->smmu store and
confirm with printing?

I think this is all an insane design. I fixed this race and removed
all this hackery in my fwspec removal series. There the iommu instance
only ever comes out of the locked list that iommu_device_register()
populates and drivers have a safe and simple flow.

Maybe just moving the store to mc->smmu later would improve it? I
didn't look closely..

Jason

Reply via email to