Re: [PATCH 8/8] powernv/eeh: Do hotplug on devices without EEH aware driver
On Thu, 2013-06-27 at 13:46 +0800, Gavin Shan wrote: > During recovery for EEH errors, the device driver requires reset > explicitly (most of cases). The EEH core doesn't do hotplug during > reset. However, there might have some device drivers that can't > support EEH. So the deivce can't be put into quite state during > the reset and possibly requesting PCI config or MMIO access. That > would lead to the failure of the reset, and we don't expect that. > > The patch intends to fix that by removing those devices whose drivers > can't support EEH before reset and added into the system after reset. > In the result, it would avoid the race condition mentioned as above. > The idea was proposed by Ben. Ok so, while the basic idea is sane, I have some problems with the implementation. First, you add a lot of code conditional to eeh_probe_mode_dev() and I don't see why. There is no reason not to do the exact same remove/add operations under pHyp. This is actually a problem I have with the "new" EEH code overall, there are too much gratuitous differences between the two "modes" and it's very tricky to figure out exactly what and why, meaning this is going to be a source of bugs. I think it's worthwhile trying to reconcile some of that. But then, there is *already* code to do a similar unplug/replug in EEH, except that it's done inside eeh_reset_device() in the "other" case (and indeed not necessarily in the best way). Let's look a bit more: > +extern void pcibios_setup_device(struct pci_dev *dev); Unrelated: The above is gone as an exported symbol from upstream as of today. I'll merge upstream in before applying your patches so don't add that. You shouldn't need anyway, see below. > extern void pcibios_setup_bus_devices(struct pci_bus *bus); > extern void pcibios_setup_bus_self(struct pci_bus *bus); > extern void pcibios_setup_phb_io_space(struct pci_controller *hose); > diff --git a/arch/powerpc/kernel/eeh_driver.c > b/arch/powerpc/kernel/eeh_driver.c > index 2b1ce17..cb3baab 100644 > --- a/arch/powerpc/kernel/eeh_driver.c > +++ b/arch/powerpc/kernel/eeh_driver.c > @@ -244,6 +244,7 @@ static void *eeh_report_reset(void *data, void *userdata) > struct pci_dev *dev = eeh_dev_to_pci_dev(edev); > enum pci_ers_result rc, *res = userdata; > struct pci_driver *driver; > + bool enable_irq = true; > > if (!dev) return NULL; > dev->error_state = pci_channel_io_normal; > @@ -251,7 +252,21 @@ static void *eeh_report_reset(void *data, void *userdata) > driver = eeh_pcid_get(dev); > if (!driver) return NULL; > > - eeh_enable_irq(dev); So I would very much like to understand a bit more that irq business, for example, what does it do when MSIs are enabled, etc... but that is also probably something to look at separately (you have a TODO list ?) > + /* > + * For those PCI devices just added, we reloaded its driver > + * and needn't to enable the interrupt. The driver should > + * take care of that. Otherwise, complaint raised from IRQ > + * subsystem. > + */ > + if (eeh_probe_mode_dev() && (edev->mode & EEH_DEV_REMOVED)) { > + edev->mode &= ~(EEH_DEV_REMOVED | EEH_DEV_IRQ_DISABLED); > + edev->bus = NULL; > + enable_irq = false; > + > + } Ok, yet another eeh_probe_mode_dev() ... they irk me :-) There should be no functional differences between EEH operations on powernv and pseries at a high level and this is high level... > + if (enable_irq) > + eeh_enable_irq(dev); > > if (!driver->err_handler || > !driver->err_handler->slot_reset) { > @@ -338,6 +353,115 @@ static void *eeh_report_failure(void *data, void > *userdata) > return NULL; > } > > +static void *eeh_rmv_device(void *data, void *userdata) > +{ > + struct pci_driver *driver; > + struct eeh_dev *edev = (struct eeh_dev *)data; > + struct pci_dev *dev = eeh_dev_to_pci_dev(edev); > + int *removed = (int *)userdata; > + > + /* > + * Actually, we should remove the PCI bridges as well. > + * However, that's lots of complexity to do that, > + * particularly some of devices under the bridge might > + * support EEH. So we just care about PCI devices for > + * simplicity here. > + */ > + if (!dev || (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE)) > + return NULL; > + driver = eeh_pcid_get(dev); > + if (driver && driver->err_handler) > + return NULL; So here we end up basically re-inventing PCI hotplug: > + /* If the driver doesn't support EEH, remove it */ > + pr_info("EEH: Removing device %s without EEH support\n", > + pci_name(dev)); > + > + /* Detach EEH device from PCI device */ > + edev->pdev = NULL; > + dev->dev.archdata.edev = NULL; > + pci_dev_put(dev); > + > + /* Remove and address cache */ > + eeh_addr_cache_rmv_dev(dev); > + eeh_sysfs_remove_device(dev); > + > +
[git pull] Please pull powerpc.git merge branch
Hi Linus ! Earlier today I mentioned that while we had fixed the kernel crashes, EEH error recovery didn't always recover... It appears that I had a fix for that already in powerpc-next (with a stable CC). I cherry-picked it today and did a few tests and it seems that things now work quite well. The patch is also pretty simple, so I see no reason to wait before merging it. Please pull. Cheers, Ben. The following changes since commit 6c355beafdbd0a62add3a3d89825ca87cf8ecec0: Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (2013-06-29 17:02:48 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge for you to fetch changes up to ea461abf61753b4b79e625a7c20650105b990f21: powerpc/eeh: Fix fetching bus for single-dev-PE (2013-06-30 14:08:34 +1000) Gavin Shan (1): powerpc/eeh: Fix fetching bus for single-dev-PE arch/powerpc/platforms/pseries/eeh_pe.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
On Sun, 2013-06-30 at 09:51 +0800, Gavin Shan wrote: > Ben, I think one patch was lost from mainline and that fixes the problem. > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d > > I had the patch applied to /home/benh/linux-test and have following commands > to inject errors, everything looks good :-) > > errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0 > errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0 Ok, thanks, it's in -next and not upstream yet. I'll ask Linus to pull that one. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote: >On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote: .../... >I'm running some tests, so far it looks good. However, Gavin, when you >have a chance on vpl3, try injecting errors to other adapters, for >example the VGA adapter (you need to do lspci to trigger the EEH >detection after that since there's no driver and use the "loc code" >variant off errinjct) or eth2 (the cxgb3). > >All I get from EEH with these is: > >[ 362.962564] EEH: Detected PCI bus error on PHB#7-PE#1 >[ 362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#1 > >and > >[ 424.381083] EEH: Detected PCI bus error on PHB#6-PE#1 >[ 424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#1 > >Followed by ... nothing. > Ben, I think one patch was lost from mainline and that fixes the problem. https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d I had the patch applied to /home/benh/linux-test and have following commands to inject errors, everything looks good :-) errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0 errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0 Thanks, Gavin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote: > On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote: > > The issue was introduced by commit 37f02195 ("powerpc/pci: fix > > PCI-e devices rescan issue on powerpc platform"). The field > > (struct pci_dev::irq) is reused by PCI core to trace the base > > MSI interrupt number if the MSI stuff is enabled on the corresponding > > device. When running to pcibios_setup_device(), we possibly still > > have enabled MSI interrupt on the device. That means "pci_dev->irq" > > still have the base MSI interrupt number and it will be overwritten > > if we're going fix "pci_dev->irq" again by pci_read_irq_line(). > > Eventually, when we enable the device, it runs to kernel crash caused > > by fetching the the MSI interrupt descriptor (struct msi_desc) from > > non-MSI interrupt and using the NULL descriptor. > > So finally I decided instead to apply Guenter patch > > [PATCH v2] powerpc/pci: Improve device hotplug initialization > > Which fixes the underlying problem instead. > Guess I am not hitting above bug because I have my own patch applied ;). Thanks a lot! Guenter ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[git pull] Please pull powerpc.git merge branch
Hi Linus ! We discovered some breakage in our "EEH" (PCI Error Handling) code while doing error injection, due to a couple of regressions. One of them is due to a patch (37f02195b) that, in hindsight, I shouldn't have merged considering that it caused more problems than it solved. Please pull those two fixes. One for a simple EEH address cache initialization issue. The other one is a patch from Guenter that I had originally planned to put in 3.11 but which happens to also fix that other regression (a kernel oops during EEH error handling and possibly hotplug). With those two, the couple of test machines I've hammered with error injection are remaining up now. EEH appears to still fail to recover on some devices, so there is another problem that Gavin is looking into but at least it's no longer crashing the kernel. Cheers, Ben. The following changes since commit b37e161388ac3980d5dfb73050e85874b84253eb: powerpc/pci: Fix boot panic on mpc83xx (regression) (2013-06-24 16:54:09 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge for you to fetch changes up to 7846de406f43df98ac9864212dcfe3f2816bdb04: powerpc/pci: Improve device hotplug initialization (2013-06-30 08:46:46 +1000) Guenter Roeck (1): powerpc/pci: Improve device hotplug initialization Thadeu Lima de Souza Cascardo (1): powerpc/eeh: Add eeh_dev to the cache during boot arch/powerpc/kernel/pci-common.c | 17 - arch/powerpc/platforms/pseries/eeh_cache.c |4 ++-- 2 files changed, 14 insertions(+), 7 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote: > The issue was introduced by commit 37f02195 ("powerpc/pci: fix > PCI-e devices rescan issue on powerpc platform"). The field > (struct pci_dev::irq) is reused by PCI core to trace the base > MSI interrupt number if the MSI stuff is enabled on the corresponding > device. When running to pcibios_setup_device(), we possibly still > have enabled MSI interrupt on the device. That means "pci_dev->irq" > still have the base MSI interrupt number and it will be overwritten > if we're going fix "pci_dev->irq" again by pci_read_irq_line(). > Eventually, when we enable the device, it runs to kernel crash caused > by fetching the the MSI interrupt descriptor (struct msi_desc) from > non-MSI interrupt and using the NULL descriptor. So finally I decided instead to apply Guenter patch [PATCH v2] powerpc/pci: Improve device hotplug initialization Which fixes the underlying problem instead. I'm running some tests, so far it looks good. However, Gavin, when you have a chance on vpl3, try injecting errors to other adapters, for example the VGA adapter (you need to do lspci to trigger the EEH detection after that since there's no driver and use the "loc code" variant off errinjct) or eth2 (the cxgb3). All I get from EEH with these is: [ 362.962564] EEH: Detected PCI bus error on PHB#7-PE#1 [ 362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#1 and [ 424.381083] EEH: Detected PCI bus error on PHB#6-PE#1 [ 424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#1 Followed by ... nothing. This is a tree which has Cascardo patch and Gunther patch (usual location on vpl3). Can you have a look ? Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] of: Specify initrd location using 64-bit
On Sat, Jun 29, 2013 at 1:43 AM, Santosh Shilimkar wrote: > Rob, > Are you ok with phys_addr_t since your concern was about rest > of the memory specific bits of the device-tree code use u64 ? No. I still think it should be u64 for same reasons I said originally. >>> >>> +1 >>> >> +1 >> >> fix type >> > Apart from waste of 32bit, what is the other concern you > have ? I really want to converge on this patch because it > has been a open ended discussion for quite some time. Does > that really break any thing on x86 or your concern is more > from semantics of the physical address. As the "original reasons" were not in this thread, I had to search a bit. I suppose you mean this one: https://lkml.org/lkml/2012/9/13/544 ? Summarized: | The address to load the initrd is decided by the bootloader/user and set | at that point later in time. | The dtb should not be tied to the kernel you are booting. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 2/4] powerpc: Contiguous memory allocator based hash page allocation
On Fri, Jun 28, 2013 at 02:41:00PM +0530, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Use CMA for allocation of guest hash page. "page table" not just "page". This patch description seems a bit brief for a patch of this length. Please describe a little more of the motivation and the design decisions. > + if (selected_size) { > + pr_debug("%s: reserving %ld MiB for global area\n", __func__, > + (unsigned long)selected_size / SZ_1M); > + align_size = hpt_align_pages << PAGE_SHIFT; > + kvm_cma_declare_contiguous(selected_size, align_size); The alignment you declare here has to be at least as large as the largest alignment that we will be requesting for any block later on. This alignment is fine for POWER7, but PPC970 requires the HPT to be aligned on a multiple of its size. For PPC970 we should make sure align_size is at least as large as any block that we could allocate. Thus align_size should be at least __rounddown_pow_of_two(selected_size) for PPC970. Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 3/4] powerpc: Contiguous memory allocator based RMA allocation
On Fri, Jun 28, 2013 at 02:41:01PM +0530, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Use CMA for allocation of RMA region for guest. Also remove linear allocator > now that it is not used > > Signed-off-by: Aneesh Kumar K.V Acked-by: Paul Mackerras ... though it could use a more extensive patch description. Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 4/4] powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation.
On Fri, Jun 28, 2013 at 02:41:02PM +0530, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Both RMA and hash page table request will be a multiple of 256K. We can use > a chunk size of 256K to track the free/used 256K chunk in the bitmap. This > should help to reduce the bitmap size. > > Signed-off-by: Aneesh Kumar K.V Looks good overall, just some minor comments below: > + int chunk_count, nr_chunk; I get a little nervous when I see "int" used for variables storing a number of pages or related things such as chunks. Yes, int is enough today but one day it won't be, and there is no time or space penalty to using "long" instead, and in fact the code generated "long" variables can be slightly shorter. So please make variables like this "long". (That will require changes to earlier patches in this series.) > + * aling mask with chunk size. The bit tracks pages in chunk size Should be "align". Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH -V2 1/4] mm/cma: Move dma contiguous changes into a seperate config
On Fri, Jun 28, 2013 at 02:40:59PM +0530, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > We want to use CMA for allocating hash page table and real mode area for > PPC64. Hence move DMA contiguous related changes into a seperate config > so that ppc64 can enable CMA without requiring DMA contiguous. > > Signed-off-by: Aneesh Kumar K.V Acked-by: Paul Mackerras When you send out the next version, please cc kvm-...@vger.kernel.org, k...@vger.kernel.org and Alexander Graf . Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt
On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote: > The issue was introduced by commit 37f02195 ("powerpc/pci: fix > PCI-e devices rescan issue on powerpc platform"). The field That "fix" caused more problems than it solved. There's a better approach floating around that I will merge eventually next week. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev