Re: 3.6-rc7 boot crash + bisection

2012-09-28 Thread Roedel, Joerg
On Wed, Sep 26, 2012 at 04:04:03PM -0600, Alex Williamson wrote:

> Here's a lockdep clean version of it:
> 
> amd_iommu: Handle aliases not backed by devices
> 
> Aliases sometimes don't have a struct pci_dev backing them.  This breaks
> our attempt to figure out the topology and device quirks that may effect
> IOMMU grouping.  When this happens, allocate an IOMMU group on the
> dev_data for the alias and make use of it for all devices referencing
> this alias.

Yes, this is the real fix. But it is too big for v3.6 at this time, so
I'll would take this for 3.7 and use my small fix for 3.6.

> Signed-off-by: Alex Williamson 
> ---
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index b64502d..4eacb17 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -71,6 +71,7 @@ static DEFINE_SPINLOCK(iommu_pd_list_lock);
>  /* List of all available dev_data structures */
>  static LIST_HEAD(dev_data_list);
>  static DEFINE_SPINLOCK(dev_data_list_lock);
> +static DEFINE_MUTEX(dev_data_iommu_group_lock);

I think this lock is not necessary. The iommu_init_device routine does
not run multiple times in parallel for the same device. So we should be
safe on that side.


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.6-rc7 boot crash + bisection

2012-09-26 Thread Roedel, Joerg
On Wed, Sep 26, 2012 at 08:35:59AM -0600, Alex Williamson wrote:
> Hmm, that throws a kink in iommu groups.  So perhaps we need to make an
> alias interface to iommu groups.  Seems like this could just be an extra
> parameter to iommu_group_get and iommu_group_add_device (empty in the
> typical case).  Then we have the problem of what's the type for an
> alias?  For AMI-Vi, it's a u16, but we need to be more generic than
> that.  Maybe iommu groups should just treat it as a void* so iommus can
> use a pointer to some structure or a fixed value like a u16 bus:slot.
> Thoughts?

Good question. The iommu-groups are part of the IOMMU-API, with an
interface to the IOMMU drivers and one to the users of IOMMU-API. So the
alias handling itself should be a function of the interface to the IOMMU
driver. In general the interface should not be bus specific.

So a void pointer seems the only logical choice then. But I would not
limit its scope to alias handling. How about making it a bus-private
pointer where IOMMU driver store bus-specific information. That way we
make sure that there is one struct per bus-type for this pointer, and
not one structure per IOMMU driver.


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.6-rc7 boot crash + bisection

2012-09-26 Thread Roedel, Joerg
On Wed, Sep 26, 2012 at 08:52:01AM -0600, Alex Williamson wrote:
> Assuming this works, it may be ok as a 3.7 fix, but if there was
> actually more than one device behind the alias we'd expose them as
> separate iommu groups.  I don't think that's what we want.  Maybe it
> should at least get a pr_warn.  Thanks,

True, we need something more generic as the real fix. When Florian
reports success I'll try to get this still into 3.6, otherwise to
-stable.


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.6-rc7 boot crash + bisection

2012-09-26 Thread Roedel, Joerg
Florian,

On Wed, Sep 26, 2012 at 01:01:54AM +0200, Florian Dazinger wrote:
> you're right, either "amd_iommu=off" or removing the audio card makes
> the failure disappear. I will test the new BIOS rev. tomorrow.

Can you please test this diff and report if it fixes the problem for
you?

Thanks.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b64502d..e89daf1 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -266,7 +266,7 @@ static void swap_pci_ref(struct pci_dev **from, struct 
pci_dev *to)
 
 static int iommu_init_device(struct device *dev)
 {
-   struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
+   struct pci_dev *dma_pdev = NULL, *pdev = to_pci_dev(dev);
struct iommu_dev_data *dev_data;
struct iommu_group *group;
u16 alias;
@@ -293,7 +293,9 @@ static int iommu_init_device(struct device *dev)
dev_data->alias_data = alias_data;
 
dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
-   } else
+   }
+
+   if (dma_pdev == NULL)
dma_pdev = pci_dev_get(pdev);
 
/* Account for quirked devices */

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.6-rc7 boot crash + bisection

2012-09-26 Thread Roedel, Joerg
On Tue, Sep 25, 2012 at 01:43:46PM -0600, Alex Williamson wrote:
> Joerg, any thoughts on a quirk for this?  Unfortunately we can't just
> skip IOMMU groups when an alias is broken because it puts the other
> IOMMU groups at risk that might not actually be isolated from this
> device.  It looks like we parse the alias info before PCI is probed, so
> maybe we'd need to call the quirk from iommu_init_device itself.

I fear that the BIOS does everything right and device 08:04.0 is indeed
using 08:00.0 as request-id. There are a couple of devices where this
happens, usually when the vendor just took the old 32bit PCI chip, added
a transparent PCIe-to-PCI bridge to the device and sell it a PCIe.

So the assumption that every request-id has a corresponding pci_dev
structure does not hold. I also had made that assumption in the
AMD IOMMU driver but had to add code which removes that assumption. We
should look for a way to remove that assumption from the group-code too.


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: Tree for Sept 24 (iommu)

2012-09-25 Thread Roedel, Joerg
On Mon, Sep 24, 2012 at 09:23:25PM -0600, Alex Williamson wrote:
> On Mon, 2012-09-24 at 15:02 -0700, Randy Dunlap wrote:
> > On 09/24/2012 07:53 AM, Stephen Rothwell wrote:
> > 
> > > Hi all,
> > > 
> > > Today was a train wreck, with lots of new conflicts across several trees
> > > and a few build failures as well.
> > > 
> > > Changes since 201209021:
> > > 
> > 
> > 
> > 
> > on i386:
> > 
> > drivers/built-in.o: In function `iommu_group_remove_device':
> > (.text+0x74cb10): multiple definition of `iommu_group_remove_device'
> > arch/x86/built-in.o:(.text+0x140d0): first defined here
> ...
> 
> 
> Here's a patch to get it past this.  It still doesn't fully build, but
> the rest isn't iommu related.  Thanks,

Applied, thanks.

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/