Re: [PATCH 8/8] powernv/eeh: Do hotplug on devices without EEH aware driver

2013-06-29 Thread Benjamin Herrenschmidt
On Thu, 2013-06-27 at 13:46 +0800, Gavin Shan wrote:
> During recovery for EEH errors, the device driver requires reset
> explicitly (most of cases). The EEH core doesn't do hotplug during
> reset. However, there might have some device drivers that can't
> support EEH. So the deivce can't be put into quite state during
> the reset and possibly requesting PCI config or MMIO access. That
> would lead to the failure of the reset, and we don't expect that.
> 
> The patch intends to fix that by removing those devices whose drivers
> can't support EEH before reset and added into the system after reset.
> In the result, it would avoid the race condition mentioned as above.
> The idea was proposed by Ben.

Ok so, while the basic idea is sane, I have some problems with the
implementation.

First, you add a lot of code conditional to eeh_probe_mode_dev() and
I don't see why. There is no reason not to do the exact same remove/add
operations under pHyp.

This is actually a problem I have with the "new" EEH code overall, there
are too much gratuitous differences between the two "modes" and it's
very tricky to figure out exactly what and why, meaning this is going to
be a source of bugs. I think it's worthwhile trying to reconcile some of
that.

But then, there is *already* code to do a similar unplug/replug in EEH,
except that it's done inside eeh_reset_device() in the "other" case
(and indeed not necessarily in the best way).

Let's look a bit more:

> +extern void pcibios_setup_device(struct pci_dev *dev);

Unrelated: The above is gone as an exported symbol from upstream
as of today. I'll merge upstream in before applying your patches
so don't add that. You shouldn't need anyway, see below.

>  extern void pcibios_setup_bus_devices(struct pci_bus *bus);
>  extern void pcibios_setup_bus_self(struct pci_bus *bus);
>  extern void pcibios_setup_phb_io_space(struct pci_controller *hose);
> diff --git a/arch/powerpc/kernel/eeh_driver.c 
> b/arch/powerpc/kernel/eeh_driver.c
> index 2b1ce17..cb3baab 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -244,6 +244,7 @@ static void *eeh_report_reset(void *data, void *userdata)
>   struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>   enum pci_ers_result rc, *res = userdata;
>   struct pci_driver *driver;
> + bool enable_irq = true;
>  
>   if (!dev) return NULL;
>   dev->error_state = pci_channel_io_normal;
> @@ -251,7 +252,21 @@ static void *eeh_report_reset(void *data, void *userdata)
>   driver = eeh_pcid_get(dev);
>   if (!driver) return NULL;
>  
> - eeh_enable_irq(dev);

So I would very much like to understand a bit more that irq business,
for example, what does it do when MSIs are enabled, etc... but that
is also probably something to look at separately (you have a TODO
list ?)

> + /*
> +  * For those PCI devices just added, we reloaded its driver
> +  * and needn't to enable the interrupt. The driver should
> +  * take care of that. Otherwise, complaint raised from IRQ
> +  * subsystem.
> +  */
> + if (eeh_probe_mode_dev() && (edev->mode & EEH_DEV_REMOVED)) {
> + edev->mode &= ~(EEH_DEV_REMOVED | EEH_DEV_IRQ_DISABLED);
> + edev->bus = NULL;
> + enable_irq = false;
> +
> + }

Ok, yet another eeh_probe_mode_dev() ... they irk me :-) There should be
no functional differences between EEH operations on powernv and pseries
at a high level and this is high level...

> + if (enable_irq)
> + eeh_enable_irq(dev);
>  
>   if (!driver->err_handler ||
>   !driver->err_handler->slot_reset) {
> @@ -338,6 +353,115 @@ static void *eeh_report_failure(void *data, void 
> *userdata)
>   return NULL;
>  }
>  
> +static void *eeh_rmv_device(void *data, void *userdata)
> +{
> + struct pci_driver *driver;
> + struct eeh_dev *edev = (struct eeh_dev *)data;
> + struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> + int *removed = (int *)userdata;
> +
> + /*
> +  * Actually, we should remove the PCI bridges as well.
> +  * However, that's lots of complexity to do that,
> +  * particularly some of devices under the bridge might
> +  * support EEH. So we just care about PCI devices for
> +  * simplicity here.
> +  */
> + if (!dev || (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> + return NULL;
> + driver = eeh_pcid_get(dev);
> + if (driver && driver->err_handler)
> + return NULL;

So here we end up basically re-inventing PCI hotplug:

> + /* If the driver doesn't support EEH, remove it */
> + pr_info("EEH: Removing device %s without EEH support\n",
> + pci_name(dev));
> +
> + /* Detach EEH device from PCI device */
> + edev->pdev = NULL;
> + dev->dev.archdata.edev = NULL;
> + pci_dev_put(dev);
> +
> + /* Remove and address cache */
> + eeh_addr_cache_rmv_dev(dev);
> + eeh_sysfs_remove_device(dev);
> +
> +

[git pull] Please pull powerpc.git merge branch

2013-06-29 Thread Benjamin Herrenschmidt
Hi Linus !

Earlier today I mentioned that while we had fixed the kernel crashes,
EEH error recovery didn't always recover... It appears that I had
a fix for that already in powerpc-next (with a stable CC).

I cherry-picked it today and did a few tests and it seems that things
now work quite well. The patch is also pretty simple, so I see no reason
to wait before merging it.

Please pull.

Cheers,
Ben.

The following changes since commit 6c355beafdbd0a62add3a3d89825ca87cf8ecec0:

  Merge branch 'merge' of 
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (2013-06-29 17:02:48 
-0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to ea461abf61753b4b79e625a7c20650105b990f21:

  powerpc/eeh: Fix fetching bus for single-dev-PE (2013-06-30 14:08:34 +1000)


Gavin Shan (1):
  powerpc/eeh: Fix fetching bus for single-dev-PE

 arch/powerpc/platforms/pseries/eeh_pe.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt

2013-06-29 Thread Benjamin Herrenschmidt
On Sun, 2013-06-30 at 09:51 +0800, Gavin Shan wrote:

> Ben, I think one patch was lost from mainline and that fixes the problem.
> 
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d
> 
> I had the patch applied to /home/benh/linux-test and have following commands
> to inject errors, everything looks good :-)
> 
> errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0
> errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0

Ok, thanks, it's in -next and not upstream yet. I'll ask Linus to pull
that one.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt

2013-06-29 Thread Gavin Shan
On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote:
>On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:

.../...

>I'm running some tests, so far it looks good. However, Gavin, when you
>have a chance on vpl3, try injecting errors to other adapters, for
>example the VGA adapter (you need to do lspci to trigger the EEH
>detection after that since there's no driver and use the "loc code"
>variant off errinjct) or eth2 (the cxgb3).
>
>All I get from EEH with these is:
>
>[  362.962564] EEH: Detected PCI bus error on PHB#7-PE#1
>[  362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#1
>
>and
>
>[  424.381083] EEH: Detected PCI bus error on PHB#6-PE#1
>[  424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#1
>
>Followed by ... nothing.
>

Ben, I think one patch was lost from mainline and that fixes the problem.

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5fb621698e94e3af8b413d9439041fde48e2784d

I had the patch applied to /home/benh/linux-test and have following commands
to inject errors, everything looks good :-)

errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C5-T1 -a 0x0 -m 0x0
errinjct eeh -v -f 0 -p U78AB.001.WZSGBJ6-P1-C6-T1 -a 0x0 -m 0x0

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt

2013-06-29 Thread Guenter Roeck
On Sun, Jun 30, 2013 at 09:09:20AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> > The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> > PCI-e devices rescan issue on powerpc platform"). The field
> > (struct pci_dev::irq) is reused by PCI core to trace the base
> > MSI interrupt number if the MSI stuff is enabled on the corresponding
> > device. When running to pcibios_setup_device(), we possibly still
> > have enabled MSI interrupt on the device. That means "pci_dev->irq"
> > still have the base MSI interrupt number and it will be overwritten
> > if we're going fix "pci_dev->irq" again by pci_read_irq_line().
> > Eventually, when we enable the device, it runs to kernel crash caused
> > by fetching the the MSI interrupt descriptor (struct msi_desc) from
> > non-MSI interrupt and using the NULL descriptor.
> 
> So finally I decided instead to apply Guenter patch
> 
> [PATCH v2] powerpc/pci: Improve device hotplug initialization
> 
> Which fixes the underlying problem instead.
> 

Guess I am not hitting above bug because I have my own patch applied ;).

Thanks a lot!

Guenter
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[git pull] Please pull powerpc.git merge branch

2013-06-29 Thread Benjamin Herrenschmidt
Hi Linus !

We discovered some breakage in our "EEH" (PCI Error Handling) code while
doing error injection, due to a couple of regressions. One of them is
due to a patch (37f02195b) that, in hindsight, I shouldn't have merged
considering that it caused more problems than it solved.

Please pull those two fixes. One for a simple EEH address cache
initialization issue. The other one is a patch from Guenter that I had
originally planned to put in 3.11 but which happens to also fix that
other regression (a kernel oops during EEH error handling and possibly
hotplug).

With those two, the couple of test machines I've hammered with error
injection are remaining up now. EEH appears to still fail to recover on
some devices, so there is another problem that Gavin is looking into but
at least it's no longer crashing the kernel.

Cheers,
Ben.

The following changes since commit b37e161388ac3980d5dfb73050e85874b84253eb:

  powerpc/pci: Fix boot panic on mpc83xx (regression) (2013-06-24 16:54:09 
-0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to 7846de406f43df98ac9864212dcfe3f2816bdb04:

  powerpc/pci: Improve device hotplug initialization (2013-06-30 08:46:46 +1000)


Guenter Roeck (1):
  powerpc/pci: Improve device hotplug initialization

Thadeu Lima de Souza Cascardo (1):
  powerpc/eeh: Add eeh_dev to the cache during boot

 arch/powerpc/kernel/pci-common.c   |   17 -
 arch/powerpc/platforms/pseries/eeh_cache.c |4 ++--
 2 files changed, 14 insertions(+), 7 deletions(-)


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt

2013-06-29 Thread Benjamin Herrenschmidt
On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> PCI-e devices rescan issue on powerpc platform"). The field
> (struct pci_dev::irq) is reused by PCI core to trace the base
> MSI interrupt number if the MSI stuff is enabled on the corresponding
> device. When running to pcibios_setup_device(), we possibly still
> have enabled MSI interrupt on the device. That means "pci_dev->irq"
> still have the base MSI interrupt number and it will be overwritten
> if we're going fix "pci_dev->irq" again by pci_read_irq_line().
> Eventually, when we enable the device, it runs to kernel crash caused
> by fetching the the MSI interrupt descriptor (struct msi_desc) from
> non-MSI interrupt and using the NULL descriptor.

So finally I decided instead to apply Guenter patch

[PATCH v2] powerpc/pci: Improve device hotplug initialization

Which fixes the underlying problem instead.

I'm running some tests, so far it looks good. However, Gavin, when you
have a chance on vpl3, try injecting errors to other adapters, for
example the VGA adapter (you need to do lspci to trigger the EEH
detection after that since there's no driver and use the "loc code"
variant off errinjct) or eth2 (the cxgb3).

All I get from EEH with these is:

[  362.962564] EEH: Detected PCI bus error on PHB#7-PE#1
[  362.962570] eeh_handle_event: Cannot find PCI bus for PHB#7-PE#1

and

[  424.381083] EEH: Detected PCI bus error on PHB#6-PE#1
[  424.381089] eeh_handle_event: Cannot find PCI bus for PHB#6-PE#1

Followed by ... nothing.

This is a tree which has Cascardo patch and Gunther patch (usual
location on vpl3).

Can you have a look ?

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] of: Specify initrd location using 64-bit

2013-06-29 Thread Geert Uytterhoeven
On Sat, Jun 29, 2013 at 1:43 AM, Santosh Shilimkar
 wrote:
> Rob,
> Are you ok with phys_addr_t since your concern was about rest
> of the memory specific bits of the device-tree code use u64 ?

 No. I still think it should be u64 for same reasons I said originally.
>>>
>>> +1
>>>
>> +1
>>
>> fix type
>>
> Apart from waste of 32bit, what is the other concern you
> have ? I really want to converge on this patch because it
> has been a open ended discussion for quite some time. Does
> that really break any thing on x86 or your concern is more
> from semantics of the physical address.

As the "original reasons" were not in this thread, I had to search a bit.
I suppose you mean this one: https://lkml.org/lkml/2012/9/13/544 ?

Summarized:
| The address to load the initrd is decided by the bootloader/user and set
| at that point later in time.
| The dtb should not be tied to the kernel you are booting.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 2/4] powerpc: Contiguous memory allocator based hash page allocation

2013-06-29 Thread Paul Mackerras
On Fri, Jun 28, 2013 at 02:41:00PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> Use CMA for allocation of guest hash page.

"page table" not just "page".  This patch description seems a bit
brief for a patch of this length.  Please describe a little more of
the motivation and the design decisions.

> + if (selected_size) {
> + pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> +  (unsigned long)selected_size / SZ_1M);
> + align_size = hpt_align_pages << PAGE_SHIFT;
> + kvm_cma_declare_contiguous(selected_size, align_size);

The alignment you declare here has to be at least as large as the
largest alignment that we will be requesting for any block later on.
This alignment is fine for POWER7, but PPC970 requires the HPT to be
aligned on a multiple of its size.  For PPC970 we should make sure
align_size is at least as large as any block that we could allocate.
Thus align_size should be at least __rounddown_pow_of_two(selected_size)
for PPC970.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 3/4] powerpc: Contiguous memory allocator based RMA allocation

2013-06-29 Thread Paul Mackerras
On Fri, Jun 28, 2013 at 02:41:01PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> Use CMA for allocation of RMA region for guest. Also remove linear allocator
> now that it is not used
> 
> Signed-off-by: Aneesh Kumar K.V 

Acked-by: Paul Mackerras 

... though it could use a more extensive patch description.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 4/4] powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation.

2013-06-29 Thread Paul Mackerras
On Fri, Jun 28, 2013 at 02:41:02PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> Both RMA and hash page table request will be a multiple of 256K. We can use
> a chunk size of 256K to track the free/used 256K chunk in the bitmap. This
> should help to reduce the bitmap size.
> 
> Signed-off-by: Aneesh Kumar K.V 

Looks good overall, just some minor comments below:

> + int chunk_count, nr_chunk;

I get a little nervous when I see "int" used for variables storing a
number of pages or related things such as chunks.  Yes, int is enough
today but one day it won't be, and there is no time or space penalty
to using "long" instead, and in fact the code generated "long"
variables can be slightly shorter.  So please make variables like this
"long".  (That will require changes to earlier patches in this
series.)

> +  * aling mask with chunk size. The bit tracks pages in chunk size

Should be "align".

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V2 1/4] mm/cma: Move dma contiguous changes into a seperate config

2013-06-29 Thread Paul Mackerras
On Fri, Jun 28, 2013 at 02:40:59PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> We want to use CMA for allocating hash page table and real mode area for
> PPC64. Hence move DMA contiguous related changes into a seperate config
> so that ppc64 can enable CMA without requiring DMA contiguous.
> 
> Signed-off-by: Aneesh Kumar K.V 

Acked-by: Paul Mackerras 

When you send out the next version, please cc kvm-...@vger.kernel.org,
k...@vger.kernel.org and Alexander Graf .

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc/pci: Avoid overriding MSI interrupt

2013-06-29 Thread Benjamin Herrenschmidt
On Fri, 2013-06-28 at 21:10 +0800, Gavin Shan wrote:
> The issue was introduced by commit 37f02195 ("powerpc/pci: fix
> PCI-e devices rescan issue on powerpc platform"). The field

That "fix" caused more problems than it solved. There's a better
approach floating around that I will merge eventually next week.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev