On Fri, Jun 14, 2013 at 10:11 AM, Takao Indoh <indou.ta...@jp.fujitsu.com>wrote:
> (2013/06/13 12:41), Bjorn Helgaas wrote: > > On Wed, Jun 12, 2013 at 8:44 PM, Takao Indoh <indou.ta...@jp.fujitsu.com> > wrote: > >> (2013/06/12 13:45), Bjorn Helgaas wrote: > >>> [+cc Vivek, Haren; sorry I didn't think to add you earlier] > >>> > >>> On Tue, Jun 11, 2013 at 12:08 AM, Takao Indoh > >>> <indou.ta...@jp.fujitsu.com> wrote: > >>>> (2013/06/11 11:20), Bjorn Helgaas wrote: > >>> > >>>>> I'm not sure you need to reset legacy devices (or non-PCI devices) > >>>>> yet, but the current hook isn't anchored anywhere -- it's just an > >>>>> fs_initcall() that doesn't give the reader any clue about the > >>>>> connection between the reset and the problem it's solving. > >>>>> > >>>>> If we do something like this patch, I think it needs to be done at > the > >>>>> point where we enable or disable the IOMMU. That way, it's connected > >>>>> to the important event, and there's a clue about how to make > >>>>> corresponding fixes for other IOMMUs. > >>>> > >>>> Ok. pci_iommu_init() is appropriate place to add this hook? > >>> > >>> I looked at various IOMMU init places today, and it's far more > >>> complicated and varied than I had hoped. > >>> > >>> This reset scheme depends on enumerating PCI devices before we > >>> initialize the IOMMU used by those devices. x86 works that way today, > >>> but not all architectures do (see the sparc pci_fire_pbm_init(), for > >> > >> Sorry, could you tell me which part depends on architecture? > > > > Your patch works if PCIe devices are reset before the kdump kernel > > enables the IOMMU. On x86, this is possible because PCI enumeration > > happens before the IOMMU initialization. On sparc, the IOMMU is > > initialized before PCI devices are enumerated, so there would still be > > a window where ongoing DMA could cause an IOMMU error. > > Ok, understood, thanks. > > Hmmm, it seems to be difficult to find out method which is common to > all architectures. So, what I can do for now is introducing reset scheme > which is only for x86. > > 1) Change this patch so that it work only on x86 platform. For example > call this reset code from x86_init.iommu.iommu_init() instead of > fs_initcall. > > Or another idea is: > > 2) Enumerate PCI devices in IOMMU layer. That is: > PCI layer > Just provide interface to reset given strcut pci_dev. Maybe > pci_reset_function() looks good for this purpose. > IOMMU layer > Determine which devices should be reset. On kernel boot, check if > IOMMU is already active or not, and if active, check IOMMU page > table and reset devices whose entry exists there. > > > Of course, it might be possible to reorganize the sparc code to to the > > IOMMU init *after* it enumerates PCI devices. But I think that change > > would be hard to justify. > > > > And I think even on x86, it would be better if we did the IOMMU init > > before PCI enumeration -- the PCI devices depend on the IOMMU, so > > logically the IOMMU should be initialized first so the PCI devices can > > be associated with it as they are enumerated. > > So third idea is: > > 3) Do reset before PCI enumeration(arch_initcall_sync or somewhere). We > need to implement new code to enumerate PCI devices and reset them > for this purpose. > > Idea 2 is not difficult to implement, but one problem is that this > method may be dangerous. We need to scan IOMMU page table which is used > in previous kernel, but it may be broken. Idea 3 seems to be difficult > to implement... > > > > > >>> example). And I think conceptually, the IOMMU should be enumerated > >>> and initialized *before* the devices that use it. > >>> > >>> So I'm uncomfortable with that aspect of this scheme. > >>> > >>> It would be at least conceivable to reset the devices in the system > >>> kernel, before the kexec. I know we want to do as little as possible > >>> in the crashing kernel, but it's at least a possibility, and it might > >>> be cleaner. > >> > >> I bet this will be not accepted by kdump maintainer. Everything in panic > >> kernel is unreliable. > > > > kdump is inherently unreliable. The kdump kernel doesn't start from > > an arbitrary machine state. We don't expect it to tolerate all CPUs > > running, for example. Maybe it should be expected to tolerate PCI > > devices running, either. > > What I wanted to say is that any resources of first kernel are > unreliable. Under panic situation, struct pci_dev tree may be broken, or > pci_lock may be already hold by someone, etc. So, if we do this in first > kernel, maybe kdump needs its own code to enumerate PCI devices and > reset them. > > Vivek? > Ping Vivek > > Thanks, > Takao Indoh > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Regards Dave
_______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu