On Fri, Jun 14, 2013 at 10:11 AM, Takao Indoh <indou.ta...@jp.fujitsu.com> wrote:

> (2013/06/13 12:41), Bjorn Helgaas wrote:
> > On Wed, Jun 12, 2013 at 8:44 PM, Takao Indoh <indou.ta...@jp.fujitsu.com> wrote:
> >> (2013/06/12 13:45), Bjorn Helgaas wrote:
> >>> [+cc Vivek, Haren; sorry I didn't think to add you earlier]
> >>>
> >>> On Tue, Jun 11, 2013 at 12:08 AM, Takao Indoh
> >>> <indou.ta...@jp.fujitsu.com> wrote:
> >>>> (2013/06/11 11:20), Bjorn Helgaas wrote:
> >>>
> >>>>> I'm not sure you need to reset legacy devices (or non-PCI devices)
> >>>>> yet, but the current hook isn't anchored anywhere -- it's just an
> >>>>> fs_initcall() that doesn't give the reader any clue about the
> >>>>> connection between the reset and the problem it's solving.
> >>>>>
> >>>>> If we do something like this patch, I think it needs to be done at the
> >>>>> point where we enable or disable the IOMMU.  That way, it's connected
> >>>>> to the important event, and there's a clue about how to make
> >>>>> corresponding fixes for other IOMMUs.
> >>>>
> >>>> OK. Is pci_iommu_init() an appropriate place to add this hook?
> >>>
> >>> I looked at various IOMMU init places today, and it's far more
> >>> complicated and varied than I had hoped.
> >>>
> >>> This reset scheme depends on enumerating PCI devices before we
> >>> initialize the IOMMU used by those devices.  x86 works that way today,
> >>> but not all architectures do (see the sparc pci_fire_pbm_init(), for
> >>
> >> Sorry, could you tell me which part depends on architecture?
> >
> > Your patch works if PCIe devices are reset before the kdump kernel
> > enables the IOMMU.  On x86, this is possible because PCI enumeration
> > happens before the IOMMU initialization.  On sparc, the IOMMU is
> > initialized before PCI devices are enumerated, so there would still be
> > a window where ongoing DMA could cause an IOMMU error.
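To make the ordering issue concrete, here is a minimal userspace sketch in C. It is not kernel code: the step names and has_dma_fault_window() are made up for illustration, and the model assumes that the reset happens as part of PCI enumeration. It only checks whether the IOMMU gets enabled before the devices have been reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boot steps; these are illustrative names, not kernel symbols. */
enum boot_step {
    STEP_PCI_ENUMERATE_AND_RESET,  /* enumerate PCI devices and reset them */
    STEP_IOMMU_INIT                /* enable the IOMMU in the kdump kernel */
};

/*
 * Returns true if the given boot order enables the IOMMU while devices
 * may still be running DMA set up by the crashed kernel, i.e. if
 * STEP_IOMMU_INIT appears before STEP_PCI_ENUMERATE_AND_RESET.
 */
bool has_dma_fault_window(const enum boot_step *order, size_t n)
{
    bool devices_reset = false;

    assert(order != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (order[i] == STEP_PCI_ENUMERATE_AND_RESET)
            devices_reset = true;
        else if (order[i] == STEP_IOMMU_INIT && !devices_reset)
            return true;  /* IOMMU is live before devices are quiet */
    }
    return false;
}
```

With an x86-like order (enumerate and reset, then IOMMU init) the function reports no window; with a sparc-like order (IOMMU init, then enumeration) it reports one, which is the case described above.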
>
> Ok, understood, thanks.
>
> Hmmm, it seems difficult to find a method that is common to all
> architectures. So, what I can do for now is introduce a reset scheme
> that is only for x86.
>
> 1) Change this patch so that it works only on the x86 platform. For
>    example, call this reset code from x86_init.iommu.iommu_init()
>    instead of from an fs_initcall.
>
> Or another idea is:
>
> 2) Enumerate PCI devices in the IOMMU layer. That is:
>    PCI layer
>      Just provide an interface to reset a given struct pci_dev. Maybe
>      pci_reset_function() is suitable for this purpose.
>    IOMMU layer
>      Determine which devices should be reset. On kernel boot, check
>      whether the IOMMU is already active, and if it is, check the IOMMU
>      page table and reset the devices that have an entry there.
>
> > Of course, it might be possible to reorganize the sparc code to do the
> > IOMMU init *after* it enumerates PCI devices.  But I think that change
> > would be hard to justify.
> >
> > And I think even on x86, it would be better if we did the IOMMU init
> > before PCI enumeration -- the PCI devices depend on the IOMMU, so
> > logically the IOMMU should be initialized first so the PCI devices can
> > be associated with it as they are enumerated.
>
> So third idea is:
>
> 3) Do the reset before PCI enumeration (arch_initcall_sync or
>    somewhere). We would need to implement new code to enumerate PCI
>    devices and reset them for this purpose.
>
> Idea 2 is not difficult to implement, but one problem is that this
> method may be dangerous. We need to scan the IOMMU page table that was
> used by the previous kernel, but it may be broken. Idea 3 seems to be
> difficult to implement...
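For reference, idea 2 could be modeled roughly as below. This is a userspace sketch in C under stated assumptions, not the patch and not real IOMMU code: struct ctx_entry, reset_mapped_devices() and record_reset() are hypothetical names, and the table is a flat array instead of a real IOMMU page table. The callback stands in for a per-device reset such as pci_reset_function().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one IOMMU table entry. */
struct ctx_entry {
    bool present;   /* the previous kernel had a mapping for this device */
    uint8_t bus;    /* PCI bus number */
    uint8_t devfn;  /* device in bits 7..3, function in bits 2..0 */
};

/* Hypothetical reset hook; returns 0 on success, nonzero on failure. */
typedef int (*reset_fn)(uint8_t bus, uint8_t devfn, void *cookie);

/*
 * Walk the table left behind by the previous kernel and reset every
 * device that has an entry. Returns the number of successful resets.
 * Note: this sketch trusts the table contents; it does not address the
 * concern that the table itself may be corrupted.
 */
size_t reset_mapped_devices(const struct ctx_entry *tbl, size_t n,
                            reset_fn reset, void *cookie)
{
    size_t done = 0;

    assert(reset != NULL);
    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].present)
            continue;  /* no mapping, so no DMA to stop */
        if (reset(tbl[i].bus, tbl[i].devfn, cookie) == 0)
            done++;
    }
    return done;
}

/* Example hook: only counts how many resets were requested. */
int record_reset(uint8_t bus, uint8_t devfn, void *cookie)
{
    (void)bus;
    (void)devfn;
    (*(size_t *)cookie)++;
    return 0;
}
```

Calling reset_mapped_devices() with record_reset and a pointer to a size_t counter resets only the entries marked present. A real version would also need to sanity-check each entry before acting on it, since the table comes from the crashed kernel.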
>
>
> >
> >>> example).  And I think conceptually, the IOMMU should be enumerated
> >>> and initialized *before* the devices that use it.
> >>>
> >>> So I'm uncomfortable with that aspect of this scheme.
> >>>
> >>> It would be at least conceivable to reset the devices in the system
> >>> kernel, before the kexec.  I know we want to do as little as possible
> >>> in the crashing kernel, but it's at least a possibility, and it might
> >>> be cleaner.
> >>
> >> I bet this will not be accepted by the kdump maintainer. Everything in
> >> the panic kernel is unreliable.
> >
> > kdump is inherently unreliable.  The kdump kernel doesn't start from
> > an arbitrary machine state.  We don't expect it to tolerate all CPUs
> > running, for example.  Maybe it shouldn't be expected to tolerate PCI
> > devices running, either.
>
> What I wanted to say is that any resources of the first kernel are
> unreliable. In a panic situation, the struct pci_dev tree may be broken,
> or pci_lock may already be held by someone, etc. So, if we do this in
> the first kernel, maybe kdump needs its own code to enumerate PCI
> devices and reset them.
>
> Vivek?
>

Ping Vivek


>
> Thanks,
> Takao Indoh
>



-- 
Regards
Dave