[PATCH] cxl: Fix issues when unmapping contexts

2015-02-24 Thread Ian Munsie
From: Ian Munsie 

commit 0712dc7e73e59d79bcead5d5520acf4e9e917e87 upstream.
for the 3.18 stable series

An issue was introduced with "cxl: Unmap MMIO regions when detaching a
context" (b123429e6a9e8d03aacf888d23262835f0081448) where closing a
context normally could also unmap the problem state area of other
contexts currently using the AFU.

It was also discovered that after a context's MMIO space had been
unmapped it would read 0s when accessing it, whereas the expected
behaviour was for the access to fail altogether.

In order to address these issues, this patch does two things:

- Forced mmap unmapping is only done when we are forcefully detaching
  all contexts, and not in the normal detach path. Since the normal
  context close path is tied to the file release any mmaps must have
  already been released so we don't need to worry in that case.

- The mmap path now uses a vm_operations_struct with a fault handler.
  The fault handler ensures that the context is in started state,
  otherwise it fails the access attempt with a SIGBUS.
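
Seen from user space, the second change means that touching a problem state
mapping whose context is not started raises SIGBUS rather than reading back
zeros. A minimal sketch of how a caller might guard such an access is shown
here. `guarded_read32` is a hypothetical helper, not part of the cxl API, and
it assumes a single-threaded caller:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Jump target used by the SIGBUS handler; single-threaded use assumed. */
static sigjmp_buf bus_jmp;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(bus_jmp, 1);
}

/*
 * Hypothetical helper: read one 32-bit word from a mapping.
 * Returns 0 and stores the value on success, -1 if the access raised
 * SIGBUS or the handler could not be installed.
 */
static int guarded_read32(const volatile uint32_t *addr, uint32_t *out)
{
	struct sigaction sa, old;
	volatile int ret = -1;

	assert(addr != NULL && out != NULL);

	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return -1;

	/* Save the signal mask so SIGBUS is unblocked again after the jump. */
	if (sigsetjmp(bus_jmp, 1) == 0) {
		*out = *addr;
		ret = 0;
	}

	/* Put the previous SIGBUS disposition back. */
	sigaction(SIGBUS, &old, NULL);
	return ret;
}
```

A caller would pass a pointer into the mmap()ed region and treat a return of
-1 as "context detached or not started".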

Fixes: b123429e6a9e ("cxl: Unmap MMIO regions when detaching a context")
Signed-off-by: Ian Munsie 
Signed-off-by: Michael Ellerman 
---
 drivers/misc/cxl/context.c | 82 +++---
 drivers/misc/cxl/file.c| 14 
 2 files changed, 71 insertions(+), 25 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 51fd6b5..d1b55fe 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -100,6 +100,46 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
return 0;
 }
 
+static int cxl_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   struct cxl_context *ctx = vma->vm_file->private_data;
+   unsigned long address = (unsigned long)vmf->virtual_address;
+   u64 area, offset;
+
+   offset = vmf->pgoff << PAGE_SHIFT;
+
+   pr_devel("%s: pe: %i address: 0x%lx offset: 0x%llx\n",
+   __func__, ctx->pe, address, offset);
+
+   if (ctx->afu->current_mode == CXL_MODE_DEDICATED) {
+   area = ctx->afu->psn_phys;
+   if (offset > ctx->afu->adapter->ps_size)
+   return VM_FAULT_SIGBUS;
+   } else {
+   area = ctx->psn_phys;
+   if (offset > ctx->psn_size)
+   return VM_FAULT_SIGBUS;
+   }
+
+   mutex_lock(&ctx->status_mutex);
+
+   if (ctx->status != STARTED) {
+   mutex_unlock(&ctx->status_mutex);
+   pr_devel("%s: Context not started, failing problem state access\n", __func__);
+   return VM_FAULT_SIGBUS;
+   }
+
+   vm_insert_pfn(vma, address, (area + offset) >> PAGE_SHIFT);
+
+   mutex_unlock(&ctx->status_mutex);
+
+   return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct cxl_mmap_vmops = {
+   .fault = cxl_mmap_fault,
+};
+
 /*
  * Map a per-context mmio space into the given vma.
  */
@@ -108,26 +148,25 @@ int cxl_context_iomap(struct cxl_context *ctx, struct vm_area_struct *vma)
u64 len = vma->vm_end - vma->vm_start;
len = min(len, ctx->psn_size);
 
-   if (ctx->afu->current_mode == CXL_MODE_DEDICATED) {
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-   return vm_iomap_memory(vma, ctx->afu->psn_phys, ctx->afu->adapter->ps_size);
-   }
+   if (ctx->afu->current_mode != CXL_MODE_DEDICATED) {
+   /* make sure there is a valid per process space for this AFU */
+   if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) {
+   pr_devel("AFU doesn't support mmio space\n");
+   return -EINVAL;
+   }
 
-   /* make sure there is a valid per process space for this AFU */
-   if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) {
-   pr_devel("AFU doesn't support mmio space\n");
-   return -EINVAL;
+   /* Can't mmap until the AFU is enabled */
+   if (!ctx->afu->enabled)
+   return -EBUSY;
}
 
-   /* Can't mmap until the AFU is enabled */
-   if (!ctx->afu->enabled)
-   return -EBUSY;
-
pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__,
 ctx->psn_phys, ctx->pe , ctx->master);
 
+   vma->vm_flags |= VM_IO | VM_PFNMAP;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-   return vm_iomap_memory(vma, ctx->psn_phys, len);
+   vma->vm_ops = &cxl_mmap_vmops;
+   return 0;
 }
 
 /*
@@ -150,12 +189,6 @@ static void __detach_context(struct cxl_context *ctx)
afu_release_irqs(ctx);
flush_work(&ctx->fault_work); /* Only needed for dedicated process */
wake_up_all(&ctx->wq);
-
-   /* Release Problem State Area mapping */
-   mutex_lock(&ctx->mapping_lock);
-   if (ctx->mapping)
-

Re: [PATCH 1/3] cxl: Use image state defaults for reloading FPGA

2015-02-24 Thread Ian Munsie
Excerpts from Greg KH's message of 2015-02-25 11:32:29 +1100:
> What stable kernel(s) are you wanting this series to go into?

Hi Greg,

These three patches are for 3.18 and 3.19.

Cheers,
-Ian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Build regressions/improvements in v4.0-rc1

2015-02-24 Thread Michael Ellerman
On Tue, 2015-02-24 at 10:38 +0100, Geert Uytterhoeven wrote:
> Hi Michael,
> 
> On Tue, Feb 24, 2015 at 5:52 AM, Michael Ellerman  wrote:
> >> >   + error: book3s_64_vio_hv.c: undefined reference to `power7_wakeup_loss':  => .text+0x408)
> >>
> >> pseries_defconfig
> >
> > This one is actually from pseries_defconfig+POWERNV=n, so I think I
> 
> Thanks!
> 
> > broke your script with the + notation in the config name :)
> 
> Nope, my brain used the wrong separator.

I can't help with that :)

> However, my scripts do have a problem with the subdirectories
> in arch/powerpc/configs/ (4xx/currituck_defconfig)...

Yeah sorry, they are a bit of a pain. I'm sure some horrible regexp can deal
with it ;)

cheers



Re: [PATCH] powerpc/fsl: add power_off support for fsl platform

2015-02-24 Thread Scott Wood
On Wed, 2015-02-04 at 14:47 +0800, Dongsheng Wang wrote:
> +void ppc_md_fixup(void)
> +{

This name is way too generic (though it's moot since you shouldn't use
ppc_md for this).

> + struct device_node *np;
> +
> + np = of_find_compatible_node(NULL, NULL, "fsl,fpga-qixis");
> + if (!np)
> + return;
> +
> + of_node_put(np);
> +
> + pm_power_off = fsl_power_off;
> + ppc_md.halt = fsl_power_off;
> +}

Please implement this as a drivers/power/reset driver, and consider
basing on top of
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/293089.html

-Scott



Re: [PATCH v6 1/2] powerpc/mpc85xx: Add FSL QorIQ DPAA BMan support to device tree(s)

2015-02-24 Thread Scott Wood
On Mon, 2015-02-02 at 00:53 -0600, Emil Medve wrote:
> From: Kumar Gala 
> 
> Change-Id: If643fa5ba0a903aef8f5056a2c90ebecc995b760

Remove these.

-Scott



Re: [PATCH v4 1/2] powerpc/corenet: Enable muxing MDIO buses via GPIO

2015-02-24 Thread Scott Wood
On Sun, 2015-02-01 at 15:48 -0600, Emil Medve wrote:
> From: Andy Fleming 
> 
> Change-Id: I4489db79957ad533f4ba3f04fe7d5bcb3288e981

Again, remove these.

-Scott



Re: [PATCH 6/8] clk: ppc-corenet: Replace kzalloc() with kmalloc()

2015-02-24 Thread Scott Wood
On Tue, 2015-01-20 at 04:09 -0600, Emil Medve wrote:
> Where the memset() is not necessary
> 
> Signed-off-by: Emil Medve 
> ---
>  drivers/clk/clk-ppc-corenet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/clk-ppc-corenet.c b/drivers/clk/clk-ppc-corenet.c
> index d84a7f0..91816b1 100644
> --- a/drivers/clk/clk-ppc-corenet.c
> +++ b/drivers/clk/clk-ppc-corenet.c
> @@ -185,7 +185,7 @@ static void __init core_pll_init(struct device_node *np)
>   if (!subclks)
>   goto err_map;
>  
> - onecell_data = kzalloc(sizeof(*onecell_data), GFP_KERNEL);
> + onecell_data = kmalloc(sizeof(*onecell_data), GFP_KERNEL);
>   if (!onecell_data)
>   goto err_clks;
>  

I think it's better to use kzalloc always, outside of
performance-sensitive allocations.  E.g. what if a new field is added to
the struct later?
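
The concern can be illustrated in user space, where `calloc()` plays the role
of `kzalloc()` and `malloc()` that of `kmalloc()`. This is only a sketch: the
structure and its fields are hypothetical and merely resemble a onecell clock
table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure; 'flags' stands for a field added later. */
struct onecell_like {
	void **clks;
	unsigned int clk_num;
	unsigned int flags;	/* new field nobody remembered to initialise */
};

/*
 * Zeroing allocation: every field, including ones added after this
 * function was written, starts out as all-bits-zero.
 */
static struct onecell_like *onecell_alloc(unsigned int clk_num)
{
	struct onecell_like *d;

	assert(clk_num > 0);

	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;

	d->clk_num = clk_num;	/* only the fields we know about are set */
	return d;
}
```

With a non-zeroing allocation, `flags` would hold whatever the allocator
returned until someone updated this function.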

-Scott



Re: [PATCH 7/8] powerpc/corenet: Enable CLK_PPC_CORENET

2015-02-24 Thread Scott Wood
On Tue, 2015-01-20 at 04:09 -0600, Emil Medve wrote:
> Change-Id: I1a80ad7b9f6854791bd270b746f93a91439155a6
> Signed-off-by: Emil Medve 

No Change-Id, and don't bundle patches meant for my tree in the same
patchset as patches meant for other trees.  There's no dependency
between them.

-Scott



Re: [PATCH 1/3] cxl: Use image state defaults for reloading FPGA

2015-02-24 Thread Greg KH
On Mon, Feb 23, 2015 at 03:21:19PM +1100, Michael Ellerman wrote:
> From: Ryan Grimm 
> 
> Commit 4beb5421babee1204757b877622830c6aa31be6d upstream.
> 
> Select defaults such that a PERST causes flash image reload.  Select which
> image based on what the card is set up to load.
> 
> CXL_VSEC_PERST_LOADS_IMAGE selects whether PERST assertion causes flash image
> load.
> 
> CXL_VSEC_PERST_SELECT_USER selects which image is loaded on the next PERST.
> 
> cxl_update_image_control writes these bits into the VSEC.
> 
> Signed-off-by: Ryan Grimm 
> Acked-by: Ian Munsie 
> Signed-off-by: Michael Ellerman 
> ---
>  drivers/misc/cxl/cxl.h |  1 +
>  drivers/misc/cxl/pci.c | 42 --
>  2 files changed, 41 insertions(+), 2 deletions(-)

What stable kernel(s) are you wanting this series to go into?

thanks,

greg k-h

Re: [PATCH 2/3] powerpc/dma: Support 32-bit coherent mask with 64-bit dma_mask

2015-02-24 Thread Scott Wood
On Wed, 2015-02-25 at 07:40 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2015-02-24 at 14:34 -0600, Scott Wood wrote:
> > On Fri, 2015-02-20 at 19:35 +1100, Benjamin Herrenschmidt wrote:
> > >  static u64 dma_direct_get_required_mask(struct device *dev)
> > > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > > index f146ef0..a7f15e2 100644
> > > --- a/arch/powerpc/mm/mem.c
> > > +++ b/arch/powerpc/mm/mem.c
> > > @@ -277,6 +277,11 @@ int dma_pfn_limit_to_zone(u64 pfn_limit)
> > >   return -EPERM;
> > >  }
> > >  
> > > +u64 dma_get_zone_limit(int zone)
> > > +{
> > > + return max_zone_pfns[zone] << PAGE_SHIFT;
> > > +}
> > 
> > If you must do this in terms of bytes rather than pfn, cast to u64
> > before shifting -- and even then the result will be PAGE_SIZE - 1 too
> > small.
> 
> Do we have RAM above what an unsigned long can hold? I think I'll just
> make it a pfn and respin...

Yes, we can have over 4 GiB RAM on 32-bit.

-Scott



Re: [PATCH] powerpc: Export __spin_yield

2015-02-24 Thread Benjamin Herrenschmidt
On Tue, 2015-02-24 at 10:37 -0600, Suresh E. Warrier wrote:
> On 02/23/2015 09:38 PM, Benjamin Herrenschmidt wrote:
> > On Mon, 2015-02-23 at 18:10 -0600, Suresh E. Warrier wrote:
> >> Export __spin_yield so that the arch_spin_unlock() function
> >> can be invoked from a module.
> > 
> > Make it EXPORT_SYMBOL_GPL. Also explain why a module might need it
> > 
> 
> Sure, I will change that to EXPORT_SYMBOL_GPL. Just curious, though,
> there is another symbol arch_spin_unlock_wait that is exported from
> the file without the _GPL suffix. Any idea why?

Nope. Not sure how come we did that.

> I have mentioned that this needs to be exported to call the 
> arch_spin_unlock() function from a module. What additional information
> do you think will be useful here ? Are you looking at something
> that explains why a module might need to call arch_spin_unlock()?

What kind of module might need it...

Cheers,
Ben.



Re: [PATCH 2/3] powerpc/dma: Support 32-bit coherent mask with 64-bit dma_mask

2015-02-24 Thread Scott Wood
On Fri, 2015-02-20 at 19:35 +1100, Benjamin Herrenschmidt wrote:
> @@ -149,14 +141,13 @@ static void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sg,
>  
>  static int dma_direct_dma_supported(struct device *dev, u64 mask)
>  {
> -#ifdef CONFIG_PPC64
> - /* Could be improved so platforms can set the limit in case
> -  * they have limited DMA windows
> -  */
> - return mask >= get_dma_offset(dev) + (memblock_end_of_DRAM() - 1);
> -#else
> - return 1;
> + u64 offset = get_dma_offset(dev);
> + u64 limit = offset + memblock_end_of_DRAM() - 1;
> +
> +#if defined(CONFIG_ZONE_DMA32)
> + limit = offset + dma_get_zone_limit(ZONE_DMA32);
>  #endif
> + return mask >= limit;
>  }

I'm confused as to whether dma_supported() is supposed to be testing a
coherent mask or regular mask...  The above suggests coherent, as does
the call to dma_supported() in dma_set_coherent_mask(), but if swiotlb
is used, swiotlb_dma_supported() will only check for a mask that can
accommodate io_tlb_end, without regard for coherent allocations.

>  static u64 dma_direct_get_required_mask(struct device *dev)
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index f146ef0..a7f15e2 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -277,6 +277,11 @@ int dma_pfn_limit_to_zone(u64 pfn_limit)
>   return -EPERM;
>  }
>  
> +u64 dma_get_zone_limit(int zone)
> +{
> + return max_zone_pfns[zone] << PAGE_SHIFT;
> +}

If you must do this in terms of bytes rather than pfn, cast to u64
before shifting -- and even then the result will be PAGE_SIZE - 1 too
small.
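
A small user-space model of both problems, as a sketch only: `uint32_t`
stands in for a 32-bit `unsigned long`, 4 KiB pages are assumed, and the
function names are made up for illustration. The last helper assumes the pfn
names the last usable page, which is the reading under which the shifted
value comes out PAGE_SIZE - 1 too small.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Shift done in 32 bits, as on a 32-bit kernel: the high bits are lost. */
static uint32_t zone_limit_32(uint32_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Widen first, then shift: byte address of the start of page 'pfn'. */
static uint64_t zone_limit_64(uint32_t pfn)
{
	return (uint64_t)pfn << PAGE_SHIFT;
}

/* Address of the last byte of page 'pfn'. */
static uint64_t zone_last_byte(uint32_t pfn)
{
	return zone_limit_64(pfn) + (PAGE_SIZE - 1);
}
```

For a pfn of 0x100000 (the 4 GiB boundary) the 32-bit variant wraps to 0,
while the widened variant yields 0x100000000.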

-Scott



Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board

2015-02-24 Thread Julian Margetson

Thanks

After skipping several times:

git bisect skip

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
b486e0e6d599b9ca8667fb9a7d49b7383ee963c7
eab3bbeffd152125ae0f90863b8e9bc8eef49423
960cd9d4fef6dd9e235c0e5c0d4ed027f8a48025
f02ad907cd9e7fe3a6405d2d005840912f1ed258
6a425c2a9b37ca3d2c37e3c1cdf973dba53eaa79
ee0a89cf3c2c550e6d877dda21dd2947afb90cb6
92890583627ee2a0518e55b063fcff86826fef96
95d6eb3b134e1826ed04cc92b224d93de13e281f
9469244d869623e8b54d9f3d4d00737e377af273
We cannot bisect more!




On 2/24/2015 3:14 PM, Gerhard Pircher wrote:

Am 2015-02-24 um 12:08 schrieb Julian Margetson:

Problems with the Git bisect:
the kernel won't compile after the 10th bisect step.

You can try "git bisect skip" to select another commit for testing.
Hopefully that one compiles fine then.

Gerhard


drivers/built-in.o: In function `drm_mode_atomic_ioctl':
(.text+0x865dc): undefined reference to `__get_user_bad'
make: *** [vmlinux] Error 1
root@julian-VirtualBox:/usr/src/linux# git bisect log
git bisect start
# bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1
git bisect bad c517d838eb7d07bbe9507871fab3931deccff539
# good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient __FUNCTION__ hack
git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82
# bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f
# good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce
# good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good a9724125ad014decf008d782e60447c811391326
# good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 'drm-amdkfd-next-fixes-2015-01-25' of git://people.freedesktop.org/~gabbayo/linux into drm-next
git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590
# bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio infoframe
git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084
# good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag 'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b
# bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize udl_compress_hline16 (v2)
git bisect bad 86588ce80ccd714793e9ba4140d7ae214229
# bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC TM070DDH03 panel support
git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7
# bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential use of state after free
git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273
root@julian-VirtualBox:/usr/src/linux#



On 02/24/2015, you wrote:


On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote:

On 2/18/2015 11:25 PM, Julian Margetson wrote:
  re PPC4XX PCI(E) MSI support.
https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html

Hmm, I think all those comments were addressed before it was merged.

I tried to get a 4xx board going here last week, but it doesn't seem happy. I
can get a bit of uboot but then it hangs, might be overheating.

cheers





Re: [PATCH 2/3] powerpc/dma: Support 32-bit coherent mask with 64-bit dma_mask

2015-02-24 Thread Benjamin Herrenschmidt
On Tue, 2015-02-24 at 14:34 -0600, Scott Wood wrote:
> On Fri, 2015-02-20 at 19:35 +1100, Benjamin Herrenschmidt wrote:
> > @@ -149,14 +141,13 @@ static void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sg,
> >  
> >  static int dma_direct_dma_supported(struct device *dev, u64 mask)
> >  {
> > -#ifdef CONFIG_PPC64
> > -   /* Could be improved so platforms can set the limit in case
> > -* they have limited DMA windows
> > -*/
> > -   return mask >= get_dma_offset(dev) + (memblock_end_of_DRAM() - 1);
> > -#else
> > -   return 1;
> > +   u64 offset = get_dma_offset(dev);
> > +   u64 limit = offset + memblock_end_of_DRAM() - 1;
> > +
> > +#if defined(CONFIG_ZONE_DMA32)
> > +   limit = offset + dma_get_zone_limit(ZONE_DMA32);
> >  #endif
> > +   return mask >= limit;
> >  }
> 
> I'm confused as to whether dma_supported() is supposed to be testing a
> coherent mask or regular mask...  The above suggests coherent, as does
> the call to dma_supported() in dma_set_coherent_mask(), but if swiotlb
> is used, swiotlb_dma_supported() will only check for a mask that can
> accommodate io_tlb_end, without regard for coherent allocations.

This is confusing indeed, but without the above, dma_set_coherent_mask()
won't work ... so I'm assuming the above. Notice that x86 doesn't even
bother and basically returns 1 for anything above a 24-bit mask (apart
from the force_sac case but we can ignore it).

So we probably should fix our swiotlb implementation as well... but
that's orthogonal.

> >  static u64 dma_direct_get_required_mask(struct device *dev)
> > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > index f146ef0..a7f15e2 100644
> > --- a/arch/powerpc/mm/mem.c
> > +++ b/arch/powerpc/mm/mem.c
> > @@ -277,6 +277,11 @@ int dma_pfn_limit_to_zone(u64 pfn_limit)
> > return -EPERM;
> >  }
> >  
> > +u64 dma_get_zone_limit(int zone)
> > +{
> > +   return max_zone_pfns[zone] << PAGE_SHIFT;
> > +}
> 
> If you must do this in terms of bytes rather than pfn, cast to u64
> before shifting -- and even then the result will be PAGE_SIZE - 1 too
> small.

Do we have RAM above what an unsigned long can hold? I think I'll just
make it a pfn and respin...

Cheers,
Ben.

> -Scott
> 



Re: [PATCH 5/5] crypto: talitos: Add software backlog queue handling

2015-02-24 Thread Horia Geantă
On 2/20/2015 6:21 PM, Martin Hicks wrote:
> I was running into situations where the hardware FIFO was filling up, and
> the code was returning EAGAIN to dm-crypt and just dropping the submitted
> crypto request.
> 
> This adds support in talitos for a software backlog queue.  When requests
> can't be queued to the hardware immediately EBUSY is returned.  The queued
> requests are dispatched to the hardware in received order as hardware FIFO
> slots become available.
> 
> Signed-off-by: Martin Hicks 

Hi Martin,

Thanks for the effort!
Indeed we noticed that talitos (and caam) don't play nicely with
dm-crypt, lacking a backlog mechanism.

Please run checkpatch --strict and fix the errors, warnings.

> ---
>  drivers/crypto/talitos.c |   92 +++---
>  drivers/crypto/talitos.h |3 ++
>  2 files changed, 74 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
> index d3472be..226654c 100644
> --- a/drivers/crypto/talitos.c
> +++ b/drivers/crypto/talitos.c
> @@ -183,43 +183,72 @@ static int init_device(struct device *dev)
>  }
>  
>  /**
> - * talitos_submit - submits a descriptor to the device for processing
> + * talitos_handle_queue - performs submissions either of new descriptors
> + *or ones waiting in the queue backlog.
>   * @dev: the SEC device to be used
>   * @ch:  the SEC device channel to be used
> - * @edesc:   the descriptor to be processed by the device
> - * @context: a handle for use by caller (optional)

The "context" kernel-doc should have been removed in patch 4/5.

> + * @edesc:   the descriptor to be processed by the device (optional)
>   *
>   * desc must contain valid dma-mapped (bus physical) address pointers.
>   * callback must check err and feedback in descriptor header
> - * for device processing status.
> + * for device processing status upon completion.
>   */
> -int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
> +int talitos_handle_queue(struct device *dev, int ch, struct talitos_edesc *edesc)
>  {
>   struct talitos_private *priv = dev_get_drvdata(dev);
> - struct talitos_request *request = &edesc->req;
> + struct talitos_request *request, *orig_request = NULL;
> + struct crypto_async_request *async_req;
>   unsigned long flags;
>   int head;
> + int ret = -EINPROGRESS; 
>  
>   spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
>  
> + if (edesc) {
> + orig_request = &edesc->req;
> + crypto_enqueue_request(&priv->chan[ch].queue, &orig_request->base);
> + }

The request goes through the SW queue even if there are empty slots in
the HW queue, doing unnecessary crypto_queue_encrypt() and
crypto_queue_decrypt(). Trying to use the HW queue first would be better.
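
To make the suggestion concrete, here is a toy user-space model of a
"hardware first, backlog second" submit path. It is a sketch only: the
structure, the slot counts and the function names are hypothetical, requests
are reduced to counters, and returning -EAGAIN for a dropped request is an
assumption borrowed from the pre-patch behaviour described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define HW_SLOTS 4	/* assumption: tiny FIFO to keep the model small */
#define SW_SLOTS 8	/* assumption: bounded software backlog */

/* Toy channel: counters only, no descriptors. */
struct chan_model {
	unsigned int hw_used;	/* requests currently in the hardware FIFO */
	unsigned int sw_used;	/* requests waiting in the software backlog */
};

/*
 * Submit one request, preferring the hardware FIFO.
 * Returns -EINPROGRESS if it went straight to hardware, -EBUSY if it was
 * backlogged, and -EAGAIN if it had to be dropped.
 */
static int chan_submit(struct chan_model *ch, bool may_backlog)
{
	assert(ch != NULL);

	/* Fast path: only taken while nothing older is waiting. */
	if (ch->sw_used == 0 && ch->hw_used < HW_SLOTS) {
		ch->hw_used++;
		return -EINPROGRESS;
	}

	if (may_backlog && ch->sw_used < SW_SLOTS) {
		ch->sw_used++;
		return -EBUSY;
	}

	return -EAGAIN;
}

/* Hardware finished one request: free its slot and refill from the backlog. */
static void chan_complete(struct chan_model *ch)
{
	assert(ch != NULL && ch->hw_used > 0);

	ch->hw_used--;
	if (ch->sw_used > 0) {
		ch->sw_used--;
		ch->hw_used++;
	}
}
```

Checking `sw_used == 0` before taking the fast path keeps requests in
submission order, since a new request never overtakes one already waiting.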

> +
> +flush_another:
> + if (priv->chan[ch].queue.qlen == 0) {
> + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
> + return 0;
> + }
> +
>   if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) {
>   /* h/w fifo is full */
>   spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
> - return -EAGAIN;
> + return -EBUSY;
>   }
>  
> - head = priv->chan[ch].head;
> + /* Dequeue the oldest request */
> + async_req = crypto_dequeue_request(&priv->chan[ch].queue);
> +
> + request = container_of(async_req, struct talitos_request, base);
>   request->dma_desc = dma_map_single(dev, request->desc,
>  sizeof(*request->desc),
>  DMA_BIDIRECTIONAL);
>  
>   /* increment fifo head */
> + head = priv->chan[ch].head;
>   priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1);
>  
> - smp_wmb();
> - priv->chan[ch].fifo[head] = request;
> + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags);
> +
> + /*
> +  * Mark a backlogged request as in-progress, return EBUSY because
> +  * the original request that was submitted is backlogged.

s/is backlogged/is backlogged or dropped
Original request will not be enqueued by crypto_queue_enqueue() if the
CRYPTO_TFM_REQ_MAY_BACKLOG flag is not set (since SW queue is for
backlog only) - that's the case for IPsec requests.

> +  */
> + if (request != orig_request) {
> + struct crypto_async_request *areq = request->context;
> + areq->complete(areq, -EINPROGRESS);
> + ret = -EBUSY;
> + }
> +
> + spin_lock_irqsave(&priv->chan[ch].head_lock, flags);
>  
>   /* GO! */
> + priv->chan[ch].fifo[head] = request;
>   wmb();
>   out_be32(priv->chan[ch].reg + TALITOS_FF,
>upper_32_bits(request->dma_desc));
> @@ -228,9 +257,18 @@ int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc)
>  
>   spin_unlock_irqrestore(&priv->

[PATCH v1 3/3] SHA1 for PPC/SPE - kernel config

2015-02-24 Thread Markus Stockhausen

Integrate the module into the kernel config tree.

Signed-off-by: Markus Stockhausen 

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index 1698fb9..d400bf9 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -6,8 +6,10 @@
 
 obj-$(CONFIG_CRYPTO_AES_PPC_SPE) += aes-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o
+obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes_spe_glue.o
 sha1-powerpc-y := sha1-powerpc-asm.o sha1.o
+sha1-ppc-spe-y := sha1-spe-asm.o sha1_spe_glue.o
 sha256-ppc-spe-y := sha256-spe-asm.o sha256_spe_glue.o
diff --git a/crypto/Kconfig b/crypto/Kconfig
index f34d136..7fc084f 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -573,6 +573,13 @@ config CRYPTO_SHA1_PPC
  This is the powerpc hardware accelerated implementation of the
  SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2).
 
+config CRYPTO_SHA1_PPC_SPE
+   tristate "SHA1 digest algorithm (PPC SPE)"
+   depends on PPC && SPE
+   help
+ SHA-1 secure hash standard (DFIPS 180-4) implemented
+ using powerpc SPE SIMD instruction set.
+
 config CRYPTO_SHA1_MB
tristate "SHA1 digest algorithm (x86_64 Multi-Buffer, Experimental)"
depends on X86 && 64BIT

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497



[PATCH v1 2/3] SHA1 for PPC/SPE - glue

2015-02-24 Thread Markus Stockhausen

Glue code for crypto infrastructure. Call the assembler
code where required. Disable preemption during calculation
and enable SPE instructions in the kernel prior to the
call. Avoid disabling preemption for too long.

Take a little care with small input data. Kick out early
for input chunks < 64 bytes and replace memset for context
cleanup with a simple loop.
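
The slicing idea can be sketched outside the kernel as follows. The helper
names are hypothetical; only the 64-byte block size and the 2048-byte bound
come from the patch, and the comment marks where the preemption-disabled
section would sit.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64	/* SHA-1 block size in bytes */
#define MAX_BYTES  2048	/* upper bound per non-preemptible section */

/*
 * Number of bytes to hand to the transform in the next slice:
 * at most MAX_BYTES, whole blocks only, 0 if less than one block is left.
 */
static size_t next_slice(size_t len)
{
	size_t bytes = len > MAX_BYTES ? MAX_BYTES : len;

	return bytes - bytes % BLOCK_SIZE;
}

/*
 * Walk an input of 'len' bytes slice by slice. Returns the number of
 * slices and stores the trailing bytes that must be buffered.
 */
static size_t count_slices(size_t len, size_t *remainder)
{
	size_t slices = 0;
	size_t bytes;

	assert(remainder != NULL);

	while ((bytes = next_slice(len)) != 0) {
		/* begin SPE section; transform bytes / BLOCK_SIZE blocks; end */
		len -= bytes;
		slices++;
	}

	*remainder = len;
	return slices;
}
```

For a 5000-byte input this yields slices of 2048, 2048 and 896 bytes, with 8
bytes left over for the context buffer.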

Signed-off-by: Markus Stockhausen 

diff --git a/arch/powerpc/sha1_spe_glue.c b/arch/powerpc/sha1_spe_glue.c
new file mode 100644
index 000..3e1d222
--- /dev/null
+++ b/arch/powerpc/sha1_spe_glue.c
@@ -0,0 +1,210 @@
+/*
+ * Glue code for SHA-1 implementation for SPE instructions (PPC)
+ *
+ * Based on generic implementation.
+ *
+ * Copyright (c) 2015 Markus Stockhausen 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * MAX_BYTES defines the number of bytes that are allowed to be processed
+ * between preempt_disable() and preempt_enable(). SHA1 takes ~1000
+ * operations per 64 bytes. e500 cores can issue two arithmetic instructions
+ * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
+ * Thus 2KB of input data will need an estimated maximum of 18,000 cycles.
+ * Headroom for cache misses included. Even with the low end model clocked
+ * at 667 MHz this equals to a critical time window of less than 27us.
+ *
+ */
+#define MAX_BYTES 2048
+
+extern void ppc_spe_sha1_transform(u32 *state, const u8 *src, u32 blocks);
+
+static void spe_begin(void)
+{
+   /* We just start SPE operations and will save SPE registers later. */
+   preempt_disable();
+   enable_kernel_spe();
+}
+
+static void spe_end(void)
+{
+   /* reenable preemption */
+   preempt_enable();
+}
+
+static inline void ppc_sha1_clear_context(struct sha1_state *sctx)
+{
+   int count = sizeof(struct sha1_state) >> 2;
+   u32 *ptr = (u32 *)sctx;
+
+   /* make sure we can clear the fast way */
+   BUILD_BUG_ON(sizeof(struct sha1_state) % 4);
+   do { *ptr++ = 0; } while (--count);
+}
+
+static int ppc_spe_sha1_init(struct shash_desc *desc)
+{
+   struct sha1_state *sctx = shash_desc_ctx(desc);
+
+   sctx->state[0] = SHA1_H0;
+   sctx->state[1] = SHA1_H1;
+   sctx->state[2] = SHA1_H2;
+   sctx->state[3] = SHA1_H3;
+   sctx->state[4] = SHA1_H4;
+   sctx->count = 0;
+
+   return 0;
+}
+
+static int ppc_spe_sha1_update(struct shash_desc *desc, const u8 *data,
+   unsigned int len)
+{
+   struct sha1_state *sctx = shash_desc_ctx(desc);
+   const unsigned int offset = sctx->count & 0x3f;
+   const unsigned int avail = 64 - offset;
+   unsigned int bytes;
+   const u8 *src = data;
+
+   if (avail > len) {
+   sctx->count += len;
+   memcpy((char *)sctx->buffer + offset, src, len);
+   return 0;
+   }
+
+   sctx->count += len;
+
+   if (offset) {
+   memcpy((char *)sctx->buffer + offset, src, avail);
+
+   spe_begin();
+   ppc_spe_sha1_transform(sctx->state, (const u8 *)sctx->buffer, 1);
+   spe_end();
+
+   len -= avail;
+   src += avail;
+   }
+
+   while (len > 63) {
+   bytes = (len > MAX_BYTES) ? MAX_BYTES : len;
+   bytes = bytes & ~0x3f;
+
+   spe_begin();
+   ppc_spe_sha1_transform(sctx->state, src, bytes >> 6);
+   spe_end();
+
+   src += bytes;
+   len -= bytes;
+   };
+
+   memcpy((char *)sctx->buffer, src, len);
+   return 0;
+}
+
+static int ppc_spe_sha1_final(struct shash_desc *desc, u8 *out)
+{
+   struct sha1_state *sctx = shash_desc_ctx(desc);
+   const unsigned int offset = sctx->count & 0x3f;
+   char *p = (char *)sctx->buffer + offset;
+   int padlen;
+   __be64 *pbits = (__be64 *)(((char *)&sctx->buffer) + 56);
+   __be32 *dst = (__be32 *)out;
+
+   padlen = 55 - offset;
+   *p++ = 0x80;
+
+   spe_begin();
+
+   if (padlen < 0) {
+   memset(p, 0x00, padlen + sizeof (u64));
+   ppc_spe_sha1_transform(sctx->state, sctx->buffer, 1);
+   p = (char *)sctx->buffer;
+   padlen = 56;
+   }
+
+   memset(p, 0, padlen);
+   *pbits = cpu_to_be64(sctx->count << 3);
+   ppc_spe_sha1_transform(sctx->state, sctx->buffer, 1);
+
+   spe_end();
+
+   dst[0] = cpu_to_be32(sctx->state[0]);
+   dst[1] = cpu_to_be32(sctx->state[1]);
+   dst[2] = cpu_to_be32(sctx->state[2]);
+   dst[3] = cpu_to_be32(sctx->state[3]);
+   dst[4] = cpu_to_be32(s

[PATCH v1 0/3] SHA1 for PPC/SPE

2015-02-24 Thread Markus Stockhausen

The following patches add support for SIMD accelerated SHA1
calculation on PPC processors with SPE instruction set. The 
implementation takes care of the following constraints:

- independent of processor endianness
- save SPE registers for interrupt context compatibility
- disable preemption only for short intervals
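
The third constraint can be pictured with a small userspace sketch. This is not the kernel code: `process_in_chunks`, `count_chunk`, `struct chunk_stats` and the 2048-byte value of `MAX_CHUNK_BYTES` are hypothetical and chosen only for illustration. The idea is that the input is consumed in whole 64-byte blocks, and no more than a bounded amount of data is handled inside one preemption-disabled section.

```c
#include <stddef.h>

#define BLOCK_SIZE      64u
#define MAX_CHUNK_BYTES 2048u /* assumed cap per critical section */

struct chunk_stats {
    size_t sections; /* number of bounded sections entered */
    size_t blocks;   /* total 64-byte blocks processed */
};

/* Stand-in for the block transform: it only records what it was asked to do. */
static void count_chunk(const unsigned char *src, size_t nblocks,
                        struct chunk_stats *st)
{
    (void)src;
    st->sections++;
    st->blocks += nblocks;
}

/* Consume all whole blocks in bounded chunks; return the leftover byte count. */
static size_t process_in_chunks(const unsigned char *src, size_t len,
                                struct chunk_stats *st)
{
    while (len >= BLOCK_SIZE) {
        size_t bytes = len > MAX_CHUNK_BYTES ? MAX_CHUNK_BYTES : len;

        bytes &= ~(size_t)(BLOCK_SIZE - 1); /* whole blocks only */

        /* a kernel version would disable preemption here ... */
        count_chunk(src, bytes / BLOCK_SIZE, st);
        /* ... and re-enable it here */

        src += bytes;
        len -= bytes;
    }
    return len; /* fewer than 64 bytes remain, to be buffered by the caller */
}
```

With 5000 input bytes this sketch enters three bounded sections (32, 32 and 14 blocks) and leaves 8 bytes for buffering.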

Performance numbers from insmod tcrypt sec=3 mode=303 taken
on e500v2 800 MHz (TP Link WDR4900)

data    per     sha1-ppc     this patch   speedup  cycles
length  update  bytes/sec    bytes/sec    factor   per byte
------  ------  -----------  -----------  -------  --------
    16      16    9,686,688   13,195,285    x1.36     60.63
    64      16   18,769,344   21,886,122    x1.17     36.55
    64      64   26,187,712   33,181,184    x1.27     24.11
   256      16   27,461,120   29,614,080    x1.08     27.01
   256      64   45,257,898   52,748,373    x1.17     15.17
   256     256   56,050,773   68,863,061    x1.23     11.62
  1024      16   30,863,360   32,438,272    x1.05     24.66
  1024     256   72,531,626   85,434,709    x1.18      9.36
  1024    1024   78,640,469   94,731,605    x1.20      8.44
  2048      16   31,771,989   32,970,752    x1.04     24.26
  2048     256   76,478,464   89,234,090    x1.17      8.97
  2048    1024   83,010,218   98,902,698    x1.19      8.09
  2048    2048   84,336,640  101,038,762    x1.19      7.92
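
The "cycles per byte" column appears to be the 800 MHz clock divided by the throughput of the patched code. That reading is an assumption, but it reproduces the listed values. A minimal sketch, with `cycles_per_byte` as a hypothetical helper name:

```c
/* Cycles spent per hashed byte, assuming a fixed core clock. */
static double cycles_per_byte(double clock_hz, double bytes_per_sec)
{
    return clock_hz / bytes_per_sec;
}
```

For example, `cycles_per_byte(800e6, 13195285.0)` comes out at roughly 60.63, matching the first row.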


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v1 1/3] SHA1 for PPC/SPE - assembler

2015-02-24 Thread Markus Stockhausen

This is the assembler code for the SHA1 implementation with
the SIMD SPE instruction set. With the enhanced instruction
set we can operate on two 32-bit words in parallel. That helps
reduce the time needed to calculate W16-W79. To increase
performance even more, the assembler function can compute
hashes for more than one 64-byte input block per call.

The state of the used SPE registers is preserved via the
stack so we can run from interrupt context. It can happen
that we interrupt ourselves and push sensitive data from
another context onto our stack. Clear this area of the
stack afterwards to avoid information leakage.

The code is endian independent.
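
For readers unfamiliar with the W16-W79 step, the following is a plain scalar C sketch of the standard SHA-1 message schedule expansion. It is not the SPE assembler from this patch; the names `rol32` and `sha1_expand_schedule` are illustrative only. Each new word depends on four earlier words, which is the work that the patch spreads over paired 32-bit lanes.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t v, unsigned int n)
{
    return (v << n) | (v >> (32 - n));
}

/* Expand the 16 input words w[0..15] into the full schedule w[0..79]. */
static void sha1_expand_schedule(uint32_t w[80])
{
    for (int t = 16; t < 80; t++)
        w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}
```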

Signed-off-by: Markus Stockhausen 

diff --git a/arch/powerpc/sha1-spe-asm.S b/arch/powerpc/sha1-spe-asm.S
new file mode 100644
index 000..fcb6cf0
--- /dev/null
+++ b/arch/powerpc/sha1-spe-asm.S
@@ -0,0 +1,299 @@
+/*
+ * Fast SHA-1 implementation for SPE instruction set (PPC)
+ *
+ * This code makes use of the SPE SIMD instruction set as defined in
+ * http://cache.freescale.com/files/32bit/doc/ref_manual/SPEPIM.pdf
+ * Implementation is based on optimization guide notes from
+ * http://cache.freescale.com/files/32bit/doc/app_note/AN2665.pdf
+ *
+ * Copyright (c) 2015 Markus Stockhausen 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include 
+#include 
+
+#define rHP    r3  /* pointer to hash value */
+#define rWP    r4  /* pointer to input */
+#define rKP    r5  /* pointer to constants */
+
+#define rW0    r14 /* 64 bit round words */
+#define rW1    r15
+#define rW2    r16
+#define rW3    r17
+#define rW4    r18
+#define rW5    r19
+#define rW6    r20
+#define rW7    r21
+
+#define rH0    r6  /* 32 bit hash values */
+#define rH1    r7
+#define rH2    r8
+#define rH3    r9
+#define rH4    r10
+
+#define rT0    r22 /* 64 bit temporary */
+#define rT1    r0  /* 32 bit temporaries */
+#define rT2    r11
+#define rT3    r12
+
+#define rK     r23 /* 64 bit constant in volatile register */
+
+#define LOAD_K01
+
+#define LOAD_K11 \
+   evlwwsplat  rK,0(rKP);
+
+#define LOAD_K21 \
+   evlwwsplat  rK,4(rKP);
+
+#define LOAD_K31 \
+   evlwwsplat  rK,8(rKP);
+
+#define LOAD_K41 \
+   evlwwsplat  rK,12(rKP);
+
+#define INITIALIZE \
+   stwu    r1,-128(r1);    /* create stack frame           */ \
+   evstdw  r14,8(r1);      /* We must save non volatile    */ \
+   evstdw  r15,16(r1);     /* registers. Take the chance   */ \
+   evstdw  r16,24(r1);     /* and save the SPE part too    */ \
+   evstdw  r17,32(r1); \
+   evstdw  r18,40(r1); \
+   evstdw  r19,48(r1); \
+   evstdw  r20,56(r1); \
+   evstdw  r21,64(r1); \
+   evstdw  r22,72(r1); \
+   evstdw  r23,80(r1);
+
+
+#define FINALIZE \
+   evldw   r14,8(r1);      /* restore SPE registers        */ \
+   evldw   r15,16(r1); \
+   evldw   r16,24(r1); \
+   evldw   r17,32(r1); \
+   evldw   r18,40(r1); \
+   evldw   r19,48(r1); \
+   evldw   r20,56(r1); \
+   evldw   r21,64(r1); \
+   evldw   r22,72(r1); \
+   evldw   r23,80(r1); \
+   xor     r0,r0,r0; \
+   stw     r0,8(r1);       /* Delete sensitive data        */ \
+   stw     r0,16(r1);      /* that we might have pushed    */ \
+   stw     r0,24(r1);      /* from other context that runs */ \
+   stw     r0,32(r1);      /* the same code. Assume that   */ \
+   stw     r0,40(r1);      /* the lower part of the GPRs   */ \
+   stw     r0,48(r1);      /* were already overwritten on  */ \
+   stw     r0,56(r1);      /* the way down to here         */ \
+   stw     r0,64(r1);

Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board

2015-02-24 Thread Gerhard Pircher
On 2015-02-24 at 12:08, Julian Margetson wrote:
> Problems with the git bisect.
> Kernel won't compile after the 10th bisect.
You can try "git bisect skip" to select another commit for testing.
Hopefully that one compiles fine then.

Gerhard

> drivers/built-in.o: In function `drm_mode_atomic_ioctl':
> (.text+0x865dc): undefined reference to `__get_user_bad'
> make: *** [vmlinux] Error 1
> root@julian-VirtualBox:/usr/src/linux# git bisect log
> git bisect start
> # bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1
> git bisect bad c517d838eb7d07bbe9507871fab3931deccff539
> # good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
> git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735
> # good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient 
> __FUNCTION__ hack
> git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82
> # bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of 
> git://people.freedesktop.org/~airlied/linux
> git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f
> # good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 
> 'driver-core-3.20-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce
> # good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' 
> of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> git bisect good a9724125ad014decf008d782e60447c811391326
> # good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 
> 'drm-amdkfd-next-fixes-2015-01-25' of 
> git://people.freedesktop.org/~gabbayo/linux into drm-next
> git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590
> # bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio 
> infoframe
> git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084
> # good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag 
> 'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into 
> drm-next
> git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b
> # bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize 
> udl_compress_hline16 (v2)
> git bisect bad 86588ce80ccd714793e9ba4140d7ae214229
> # bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC 
> TM070DDH03 panel support
> git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7
> # bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential 
> use of state after free
> git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273
> root@julian-VirtualBox:/usr/src/linux# 
> 
> 
> 
> On 02/24/2015, you wrote:
> 
>> On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote:
>>> On 2/18/2015 11:25 PM, Julian Margetson wrote:
>>
>>>  re PPC4XX PCI(E) MSI support.
>>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html
>>
>> Hmm, I think all those comments were addressed before it was merged.
>>
>> I tried to get a 4xx board going here last week, but it doesn't seem happy. I
>> can get a bit of uboot but then it hangs, might be overheating.
>>
>> cheers
>>
>>
> 
> 
> 


Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board

2015-02-24 Thread Julian Margetson

On 2/24/2015 7:10 AM, Julian Margetson wrote:

Problems with the git bisect.
Kernel won't compile after the 10th bisect.

drivers/built-in.o: In function `drm_mode_atomic_ioctl':
(.text+0x865dc): undefined reference to `__get_user_bad'
make: *** [vmlinux] Error 1
root@julian-VirtualBox:/usr/src/linux# git bisect log
git bisect start
# bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1
git bisect bad c517d838eb7d07bbe9507871fab3931deccff539
# good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient 
__FUNCTION__ hack
git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82
# bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of 
git://people.freedesktop.org/~airlied/linux
git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f
# good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 
'driver-core-3.20-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce
# good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good a9724125ad014decf008d782e60447c811391326
# good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 
'drm-amdkfd-next-fixes-2015-01-25' of 
git://people.freedesktop.org/~gabbayo/linux into drm-next
git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590
# bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio 
infoframe
git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084
# good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag 
'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into 
drm-next
git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b
# bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize 
udl_compress_hline16 (v2)
git bisect bad 86588ce80ccd714793e9ba4140d7ae214229
# bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC 
TM070DDH03 panel support
git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7
# bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential use 
of state after free
git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273
root@julian-VirtualBox:/usr/src/linux#



On 02/24/2015, you wrote:


On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote:

On 2/18/2015 11:25 PM, Julian Margetson wrote:
  re PPC4XX PCI(E) MSI support.
https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html

Hmm, I think all those comments were addressed before it was merged.

I tried to get a 4xx board going here last week, but it doesn't seem happy. I
can get a bit of uboot but then it hangs, might be overheating.

cheers



Kernel 4.0.0-rc1 boots OK when the DVI output is used, but not when the HDMI
output is used.


Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 3:00 AM, Bjorn Helgaas  wrote:
> On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote:
>> From: Wei Yang 
>>
>> On PowerNV platform, resource position in M64 implies the PE# the resource
>> belongs to.  In some cases, adjustment of a resource is necessary to locate
>> it to a correct position in M64.
>>
>> Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
>> according to an offset.
>>
>> [bhelgaas: rework loops, rework overlap check, index resource[]
>> conventionally, remove pci_regs.h include, squashed with next patch]
>> Signed-off-by: Wei Yang 
>> Signed-off-by: Bjorn Helgaas 
>
> ...
>
>> +#ifdef CONFIG_PCI_IOV
>> +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>> +{
>> + struct pci_dn *pdn = pci_get_pdn(dev);
>> + int i;
>> + struct resource *res, res2;
>> + resource_size_t size;
>> + u16 vf_num;
>> +
>> + if (!dev->is_physfn)
>> + return -EINVAL;
>> +
>> + /*
>> +  * "offset" is in VFs.  The M64 windows are sized so that when they
>> +  * are segmented, each segment is the same size as the IOV BAR.
>> +  * Each segment is in a separate PE, and the high order bits of the
>> +  * address are the PE number.  Therefore, each VF's BAR is in a
>> +  * separate PE, and changing the IOV BAR start address changes the
>> +  * range of PEs the VFs are in.
>> +  */
>> + vf_num = pdn->vf_pes;
>> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> + res = &dev->resource[i + PCI_IOV_RESOURCES];
>> + if (!res->flags || !res->parent)
>> + continue;
>> +
>> + if (!pnv_pci_is_mem_pref_64(res->flags))
>> + continue;
>> +
>> + /*
>> +  * The actual IOV BAR range is determined by the start address
>> +  * and the actual size for vf_num VFs BAR.  This check is to
>> +  * make sure that after shifting, the range will not overlap
>> +  * with another device.
>> +  */
>> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>> + res2.flags = res->flags;
>> + res2.start = res->start + (size * offset);
>> + res2.end = res2.start + (size * vf_num) - 1;
>> +
>> + if (res2.end > res->end) {
>> + dev_err(&dev->dev, "VF BAR%d: %pR would extend past 
>> %pR (trying to enable %d VFs shifted by %d)\n",
>> + i, &res2, res, vf_num, offset);
>> + return -EBUSY;
>> + }
>> + }
>> +
>> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> + res = &dev->resource[i + PCI_IOV_RESOURCES];
>> + if (!res->flags || !res->parent)
>> + continue;
>> +
>> + if (!pnv_pci_is_mem_pref_64(res->flags))
>> + continue;
>> +
>> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>> + res2 = *res;
>> + res->start += size * offset;
>
> I'm still not happy about this fiddling with res->start.
>
> Increasing res->start means that in principle, the "size * offset" bytes
> that we just removed from res are now available for allocation to somebody
> else.  I don't think we *will* give that space to anything else because of
> the alignment restrictions you're enforcing, but "res" now doesn't
> correctly describe the real resource map.
>
> Would you be able to just update the BAR here while leaving the struct
> resource alone?  In that case, it would look a little funny that lspci
> would show a BAR value in the middle of the region in /proc/iomem, but
> the /proc/iomem region would be more correct.

I guess this would also require a tweak where we compute the addresses
of each of the VF resources.  Today it's probably just "base + VF_num
* size", where "base" is res->start.  We'd have to account for the
offset there if we don't adjust it here.

>> +
>> + dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d 
>> VFs shifted by %d)\n",
>> +  i, &res2, res, vf_num, offset);
>> + pci_update_resource(dev, i + PCI_IOV_RESOURCES);
>> + }
>> + pdn->max_vfs -= offset;
>> + return 0;
>> +}
>> +#endif /* CONFIG_PCI_IOV */
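
The address computation discussed in the reply above can be sketched in plain C. This is only an illustration of the arithmetic, not kernel code: `struct iov_bar`, `iov_bar_hw_base` and `vf_bar_address` are hypothetical names. It assumes the resource start is left untouched and the shift (in VFs) is tracked separately, so every per-VF address has to add the shift explicitly.

```c
#include <stdint.h>

struct iov_bar {
    uint64_t res_start;   /* region start as recorded in the resource map */
    uint64_t vf_bar_size; /* size of a single VF BAR */
    unsigned int shift;   /* offset, counted in VFs, applied to the BAR */
};

/* Value that would be programmed into the IOV BAR after shifting. */
static uint64_t iov_bar_hw_base(const struct iov_bar *b)
{
    return b->res_start + b->vf_bar_size * b->shift;
}

/* Address of one VF's BAR: shifted base plus vf_num * size. */
static uint64_t vf_bar_address(const struct iov_bar *b, unsigned int vf_num)
{
    return iov_bar_hw_base(b) + b->vf_bar_size * vf_num;
}
```

With a zero shift this reduces to the "base + VF_num * size" form mentioned above.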

Re: [PATCH] powerpc: Export __spin_yield

2015-02-24 Thread Suresh E. Warrier
On 02/23/2015 09:38 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2015-02-23 at 18:10 -0600, Suresh E. Warrier wrote:
>> Export __spin_yield so that the arch_spin_unlock() function
>> can be invoked from a module.
> 
> Make it EXPORT_SYMBOL_GPL. Also explain why a module might need it
> 

Sure, I will change that to EXPORT_SYMBOL_GPL. Just curious, though:
there is another symbol, arch_spin_unlock_wait, that is exported from
the file without the _GPL suffix. Any idea why?

I have mentioned that this needs to be exported so that the
arch_spin_unlock() function can be called from a module. What additional
information do you think would be useful here? Are you looking for something
that explains why a module might need to call arch_spin_unlock()?

Thanks.
-suresh


Re: [RFC 01/11] i2c: add quirk structure to describe adapter flaws

2015-02-24 Thread Wolfram Sang
On Mon, Jan 19, 2015 at 04:05:15PM +0100, Wolfram Sang wrote:
> 
> > > + struct i2c_adapter_quirks *quirks;
> > >  };
> > >  #define to_i2c_adapter(d) container_of(d, struct i2c_adapter, dev)
> > >  
> > 
> > I suggest to add const.
> > const struct i2c_adapter_quirks *quirks;
> > 
> > also, in i2c-core.c, should modify:
> > const struct i2c_adapter_quirks *q = adap->quirks;
> 
> Thanks, I'll think about it.

And added it...




Re: [RFC 02/11] i2c: add quirk checks to core

2015-02-24 Thread Wolfram Sang
On Mon, Jan 12, 2015 at 12:08:14PM +, Russell King - ARM Linux wrote:
> On Fri, Jan 09, 2015 at 06:21:32PM +0100, Wolfram Sang wrote:
> > +static int i2c_quirk_error(struct i2c_adapter *adap, struct i2c_msg *msg, 
> > char *err_msg)
> > +{
> > +   dev_err(&adap->dev, "quirk: %s (addr 0x%04x, size %u)\n", err_msg, 
> > msg->addr, msg->len);
> > +   return -EOPNOTSUPP;
> > +}
> 
> So, what happens if I open an I2C adapter, find a message which causes
> i2c_quirk_error() to be called, and then spin repeatedly calling that...
> Shouldn't there be some rate limiting to this?

Can be argued. Changed to dev_err_ratelimited(). Thanks!
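
For illustration, here is a userspace sketch of interval/burst rate limiting, the general idea behind rate-limited logging. It is not the kernel's implementation of dev_err_ratelimited(); `struct ratelimit` and `ratelimit_allow` are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```c
#include <stdbool.h>
#include <stdint.h>

struct ratelimit {
    uint64_t interval;     /* window length, in caller-defined time units */
    unsigned int burst;    /* messages allowed per window */
    uint64_t window_start; /* time at which the current window began */
    unsigned int printed;  /* messages let through in this window */
    unsigned int missed;   /* messages suppressed in this window */
};

/* Return true if a message may be emitted at time 'now'. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now)
{
    if (now - rl->window_start >= rl->interval) {
        /* a new window starts: reset the counters */
        rl->window_start = now;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}
```

A caller spinning on a failing transfer would then produce at most `burst` messages per `interval` instead of one per call.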




Re: [RFC 02/11] i2c: add quirk checks to core

2015-02-24 Thread Wolfram Sang

> > +   if (msgs[i].flags & I2C_M_RD) {
> > +   if (i2c_quirk_exceeded(len, max_read))
> > +   return i2c_quirk_error(adap, &msgs[i], "msg 
> > too long");
> > +   } else {
> > +   if (i2c_quirk_exceeded(len, max_write))
> > +   return i2c_quirk_error(adap, &msgs[i], "msg 
> > too long");
> > +   }
> 
> What about being more verbose in the error message, specifying if it
> was a read or a write message that failed?

Yes, done now. Thanks!
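
A possible shape of the more verbose check, as a self-contained userspace sketch. The names `struct msg`, `MSG_FLAG_READ`, `quirk_exceeded` and `check_msg_len`, as well as the exact message strings, are hypothetical stand-ins and not the code that was merged. The sketch also assumes that a limit of 0 means "no limit".

```c
#include <stdbool.h>
#include <stddef.h>

#define MSG_FLAG_READ 0x0001 /* stand-in for the read flag of a message */

struct msg {
    unsigned short flags;
    unsigned short len;
};

/* A limit of 0 is treated as "no limit" (assumption of this sketch). */
static bool quirk_exceeded(unsigned int val, unsigned int quirk)
{
    return quirk && val > quirk;
}

/* Return a direction-specific error text, or NULL if the length is fine. */
static const char *check_msg_len(const struct msg *m, unsigned int max_read,
                                 unsigned int max_write)
{
    if (m->flags & MSG_FLAG_READ) {
        if (quirk_exceeded(m->len, max_read))
            return "read msg too long";
    } else {
        if (quirk_exceeded(m->len, max_write))
            return "write msg too long";
    }
    return NULL;
}
```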




Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board

2015-02-24 Thread Julian Margetson
Problems with the git bisect.
Kernel won't compile after the 10th bisect.

drivers/built-in.o: In function `drm_mode_atomic_ioctl':
(.text+0x865dc): undefined reference to `__get_user_bad'
make: *** [vmlinux] Error 1
root@julian-VirtualBox:/usr/src/linux# git bisect log
git bisect start
# bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1
git bisect bad c517d838eb7d07bbe9507871fab3931deccff539
# good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient 
__FUNCTION__ hack
git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82
# bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of 
git://people.freedesktop.org/~airlied/linux
git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f
# good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 
'driver-core-3.20-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce
# good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good a9724125ad014decf008d782e60447c811391326
# good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 
'drm-amdkfd-next-fixes-2015-01-25' of 
git://people.freedesktop.org/~gabbayo/linux into drm-next
git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590
# bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio 
infoframe
git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084
# good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag 
'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into 
drm-next
git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b
# bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize 
udl_compress_hline16 (v2)
git bisect bad 86588ce80ccd714793e9ba4140d7ae214229
# bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC 
TM070DDH03 panel support
git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7
# bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential use 
of state after free
git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273
root@julian-VirtualBox:/usr/src/linux# 



On 02/24/2015, you wrote:

> On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote:
>> On 2/18/2015 11:25 PM, Julian Margetson wrote:
>
>>  re PPC4XX PCI(E) MSI support.
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html
>
> Hmm, I think all those comments were addressed before it was merged.
>
> I tried to get a 4xx board going here last week, but it doesn't seem happy. I
> can get a bit of uboot but then it hangs, might be overheating.
>
> cheers
>
>



Re: [PATCH 0/7] Serialise oopses, BUGs, WARNs, dump_stack, soft lockups and hard lockups

2015-02-24 Thread Russell King - ARM Linux
On Tue, Feb 24, 2015 at 01:39:46AM -0800, Arjan van de Ven wrote:
> one of the question is if you want to serialize, or if you just want
> to label.  If you take a cookie (could just be a monotonic increasing
> number) at the start of the oops and then prefix/postfix the stack
> printing with that number, you don't serialize (risk of locking up),
> but you can pretty trivially see which line came from where..
> if you do the monotonic increasing number approach, you even get an
> ordering out of it. it does mean changing the dump_stack() and co
> function fingerprint to take an extra argument, but that is not TOO
> insane.

I like that idea, but it relies on ensuring that each line is printed
by one printk() statement - which in itself is a good idea.

I'd actually like a version of print_hex_dump() which we could use for
stack and code dumping - the existing print_hex_dump() assumes that it's
fine to dereference the pointer, whereas for stack and code dumping,
we can't always make that assumption.  That's a separate issue though.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board

2015-02-24 Thread Julian Margetson
I had a hanging U-Boot problem with a Sam440ep board. I never figured the
problem out, but it worked for another two years after the problems began.
It died for good last September, with the hanging becoming a daily issue.
I don't think that it was overheating. I thought that it could have been a
problem with the on-board Ethernet.

Anyway, I am still not giving up hope of DRI and future kernels working. I am
only into my second year of trying, so it is too soon to give up.
Doing a git bisect on 4.0-rc1 now.

On 02/24/2015, you wrote:

> On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote:
>> On 2/18/2015 11:25 PM, Julian Margetson wrote:
>
>>  re PPC4XX PCI(E) MSI support.
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html
>
> Hmm, I think all those comments were addressed before it was merged.
>
> I tried to get a 4xx board going here last week, but it doesn't seem happy. I
> can get a bit of uboot but then it hangs, might be overheating.
>
> cheers
>
>



RE: [PATCH 1/7] Add die_spin_lock_{irqsave,irqrestore}

2015-02-24 Thread David Laight
From: Ingo Molnar
...
> So why not trylock and time out here after a few seconds,
> instead of indefinitely suppressing some potentially vital
> output due to some other CPU crashing/locking with the lock
> held?

I've used that for status requests that usually lock a table
to get a consistent view.
If trylock times out assume that the data is actually stable
and access it anyway. Remember the pid doing the access and
next time it tries to acquire the same lock do a trylock with
no timeout.
That way if (when) there is a locking fubar (or a driver oops
with a lock held) at least some of the relevant status commands
will work.
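
The scheme described above could look roughly like this in userspace C. It is a sketch under stated assumptions, not code from any driver: `struct status_lock` and its functions are hypothetical, the "timeout" is a bounded number of attempts rather than wall-clock time, and the remembered caller id is updated without synchronisation, which a real implementation would have to handle.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct status_lock {
    atomic_flag held;
    int timed_out_owner; /* id of the caller that last timed out, or -1 */
};

static void status_lock_init(struct status_lock *l)
{
    atomic_flag_clear(&l->held);
    l->timed_out_owner = -1;
}

/*
 * Try to take the lock, retrying up to max_spins times. A caller that
 * already timed out once makes a single attempt only. Returns true if the
 * lock was taken, false if the caller should go ahead without it.
 */
static bool status_lock_acquire(struct status_lock *l, int id,
                                unsigned long max_spins)
{
    unsigned long spins = (l->timed_out_owner == id) ? 0 : max_spins;

    for (;;) {
        if (!atomic_flag_test_and_set_explicit(&l->held,
                                               memory_order_acquire)) {
            if (l->timed_out_owner == id)
                l->timed_out_owner = -1;
            return true;
        }
        if (spins == 0)
            break;
        spins--;
    }
    l->timed_out_owner = id; /* remember who gave up waiting */
    return false;
}

/* Release only if the matching acquire actually took the lock. */
static void status_lock_release(struct status_lock *l, bool acquired)
{
    if (acquired)
        atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The caller passes the result of the acquire to the release, so a status command that proceeded without the lock never clears a lock it does not hold.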

David


Re: [PATCH V4] powerpc, powernv: Add OPAL platform event driver

2015-02-24 Thread Vipin K Parashar

Hi Stewart,

I looked into ACPI and found details about it. But before we go into
discussing more details of it, I would like to share a brief overview of
the OPAL platform events (EPOW/DPO) work and the original design proposed.

As of now the OPAL platform events work supports two events:
EPOW (Early Power Off Warning) and DPO (Delayed Power Off).

On FSP based systems the FSP notifies OPAL about EPOW and DPO events via the
mbox mechanism. Subsequently OPAL sends notifications for these events to the
pkvm kernel. The original design is to have a kernel driver maintain a queue
and add these events to the queue upon arrival. The pkvm driver also provides
a character device for the host to consume these events. A daemon is proposed
for the pkvm host to poll/read these events from the char device. This daemon
would process these events and take action to log them and shut down the host.
Apart from this it would also send the event info to the VMs, where it is
handled by the OSes running on the VMs. Linux on VMs already has code in place
to handle these events, as it expects this info to reach it in PAPR format
under the EPOW (Environmental and Power Warnings) category.

EPOW mbox msgs are received for the events below:
1. UPS events - UPS Battery Low, UPS Bypassed, UPS Utility Failure, UPS On
2. SPCN events - Configuration Change, Log SPCN Fault, Impending Power
   Failure, Power Incomplete
3. Temperature events - Over ambient temperature, Over internal temperature

Now ACPI:

I looked into ACPI and tried to figure out how the ACPI userspace/kernel
framework can be helpful for our work.

ACPI user space consists of the components below.

acpid - ACPI daemon to receive events from the kernel.
acpid provides events and actions files in the /etc/acpi dir to configure
actions for various events.

acpi, acpi_listen, acpitool - Commands to query and set various ACPI
supported parameters. These tools work with various sysfs files to show/set
various parameter values.

As of today acpid and the other tools don't exist for POWER, so they would
need to be ported. acpid is useful for our work, but the other tools might not
be helpful as they look into various sysfs files created by various ACPI
kernel drivers which we won't have.

Also we would need to map our EPOW/DPO events to acpid supported events.
A few events like the SPCN ones won't map straight away and might need to be
added to acpid as new events.

ACPI in the kernel has various drivers for fan, battery, laptop buttons etc.
They handle events and use the netlink mechanism to send these events out to
userspace. Looking into the ACPI code, it seems that we would be reusing only
a small chunk of it while adding unnecessary complexity, because it supports a
lot more than we need. Here too, mapping our EPOW/DPO events to ACPI defined
structures is needed, and we would need to add new member variables to the
ACPI event structures for unmapped events like the SPCN ones.

In a nutshell, it seems that by using ACPI we would end up adding a lot more
complexity for a small gain in code reuse.

Netlink:

On the technology side netlink seems to be a faster method compared to a
character driver, so it could be a good alternative for communication between
our pkvm driver and userspace. But EPOW/DPO events occur at a very low rate,
unlike the network subsystem, which receives data packets at a very high rate.
So netlink would probably be faster, but given the low EPOW/DPO event traffic
a character driver might be sufficient.

We already have the ppc64-diag package, which is part of various distros, so
it would be used for hosting the daemon code. That removes the overhead of
convincing distros to add something extra.


These were my findings and opinions on the alternatives. Apologies for the
somewhat lengthy text :-)

Let me know if I missed anything and whether you have any suggestions.


Regards,
Vipin

On 02/11/2015 10:32 AM, Stewart Smith wrote:

Vipin K Parashar  writes:

(1) Environmental and Power Warning (EPOW)
(2) Delayed Power Off (DPO)
The user interface for this driver is /dev/opal_event character
device file where the user space clients can poll and read for
new opal platform events. The expected sequence of events driven
from user space should be like the following.

(1) Open the character device file
(2) Poll on the file for POLLIN event
(3) When unblocked, must attempt to read OPAL_PLAT_EVENT_MAX_SIZE size
(4) Kernel driver will pass at most one opal_plat_event structure
(5) Poll again for more new events
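
The sequence above can be sketched as a small helper that works on any readable file descriptor. `wait_and_read_event` is a hypothetical name, and the buffer size is a parameter because the value of OPAL_PLAT_EVENT_MAX_SIZE is not given here. A real client would presumably obtain the descriptor by opening /dev/opal_event read-only and would call the helper in a loop.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the descriptor to become readable, then read at
 * most max_size bytes (one event). Returns the number of bytes read, 0 if
 * nothing was readable before the timeout, or -1 on error.
 */
static ssize_t wait_and_read_event(int fd, void *buf, size_t max_size,
                                   int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0)
        return -1;
    if (ready == 0 || !(pfd.revents & POLLIN))
        return 0;
    return read(fd, buf, max_size);
}
```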

A few thoughts from discussing with Michael and Joel:
- not convinced that a chardev is the most ideal way to notify
   userspace. It seems like yet-another powerpc specific notification
   mechanism, which isn't ideal.
- netlink probably isn't right either (although maybe *slightly*
   better?)
- it seems that the "standard" way is ACPI, so I wonder if we could emit
   an ACPI event and essentially fake having ACPI... that would make all
   existing userspace "just work", righ

Re: [PATCH 0/7] Serialise oopses, BUGs, WARNs, dump_stack, soft lockups and hard lockups

2015-02-24 Thread Arjan van de Ven
>> Some architectures already have their own recursive
>> locking for oopses and we have another version for
>> serialising dump_stack.
>>
>> Create a common version and use it everywhere (oopses,
>> BUGs, WARNs, dump_stack, soft lockups and hard lockups).
>
> Dunno. I've had cases where the simultaneity of the oopses
> (i.e. their garbled nature) gave me the clue about the type
> of race to expect.
>


One question is whether you want to serialize, or just want to label.
If you take a cookie (it could just be a monotonically increasing number) at
the start of the oops and then prefix/postfix the stack printing with that
number, you don't serialize (which risks locking up), but you can pretty
trivially see which line came from where.
If you do the monotonically increasing number approach, you even get an
ordering out of it.
It does mean changing the signature of dump_stack() and co. to take an extra
argument, but that is not TOO insane.
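
To make the labelling idea concrete, here is a small userspace sketch in C11.
It is not kernel code and not taken from the series: the names
report_cookie_take() and report_format_line(), and the "[oops#N]" prefix, are
made up for illustration. Each report takes one value from an atomic counter
and tags every line with it, so interleaved output can be untangled without
taking a lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical global counter; zero-initialized as a static object. */
static atomic_ulong report_counter;

/* Take a cookie at the start of a report. Values start at 1 and only grow. */
static unsigned long report_cookie_take(void)
{
    return atomic_fetch_add(&report_counter, 1) + 1;
}

/*
 * Format one line of a report, prefixed with the report's cookie.
 * Returns whatever snprintf() returns.
 */
static int report_format_line(char *buf, size_t len, unsigned long cookie,
                              const char *text)
{
    return snprintf(buf, len, "[oops#%lu] %s", cookie, text);
}
```

Because the counter only increases, sorting lines by cookie also recovers the
order in which the reports started, which is the ordering mentioned above.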
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Build regressions/improvements in v4.0-rc1

2015-02-24 Thread Geert Uytterhoeven
Hi Michael,

On Tue, Feb 24, 2015 at 5:52 AM, Michael Ellerman  wrote:
>> >   + error: book3s_64_vio_hv.c: undefined reference to 
>> > `power7_wakeup_loss':  => .text+0x408)
>>
>> pseries_defconfig
>
> This one is actually from pseries_defconfig+POWERNV=n, so I think I

Thanks!

> broke your script with the + notation in the config name :)

Nope, my brain used the wrong separator.

However, my scripts do have a problem with the subdirectories
in arch/powerpc/configs/ (4xx/currituck_defconfig)...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v12 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:35:04AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> M64 aperture size is limited on PHB3.  When the IOV BAR is too big, this
> will exceed the limitation and failed to be assigned.
> 
> Introduce a different mechanism based on the IOV BAR size:
> 
>   - if IOV BAR size is smaller than 64MB, expand to total_pe
>   - if IOV BAR size is bigger than 64MB, roundup power2
> 
> [bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 
> ---
>  arch/powerpc/include/asm/pci-bridge.h |2 ++
>  arch/powerpc/platforms/powernv/pci-ioda.c |   33 
> ++---
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h 
> b/arch/powerpc/include/asm/pci-bridge.h
> index 011340df8583..d824bb184ab8 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -179,6 +179,8 @@ struct pci_dn {
>   u16 max_vfs;/* number of VFs IOV BAR expended */
>   u16 vf_pes; /* VF PE# under this PF */
>   int offset; /* PE# for the first VF PE */
> +#define M64_PER_IOV 4
> + int m64_per_iov;
>  #define IODA_INVALID_M64 (-1)
>   int m64_wins[PCI_SRIOV_NUM_BARS];
>  #endif /* CONFIG_PCI_IOV */
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index a3c2fbe35fc8..30b7c3909746 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2242,6 +2242,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> pci_dev *pdev)
>   int i;
>   resource_size_t size;
>   struct pci_dn *pdn;
> + int mul, total_vfs;
>  
>   if (!pdev->is_physfn || pdev->is_added)
>   return;
> @@ -2252,6 +2253,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> pci_dev *pdev)
>   pdn = pci_get_pdn(pdev);
>   pdn->max_vfs = 0;
>  
> + total_vfs = pci_sriov_get_totalvfs(pdev);
> + pdn->m64_per_iov = 1;
> + mul = phb->ioda.total_pe;
> +
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> + res = &pdev->resource[i + PCI_IOV_RESOURCES];
> + if (!res->flags || res->parent)
> + continue;
> + if (!pnv_pci_is_mem_pref_64(res->flags)) {
> + dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
> +  i, res);

Why is this a dev_warn()?  Can the user do anything about it?  Do you want
bug reports if users see this message?  There are several other instances
of this in the other patches, too.

> + continue;
> + }
> +
> + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> +
> + /* bigger than 64M */
> + if (size > (1 << 26)) {
> + dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size 
> is bigger than 64M, roundup power2\n",
> +  i, res);
> + pdn->m64_per_iov = M64_PER_IOV;
> + mul = __roundup_pow_of_two(total_vfs);

Why is this __roundup_pow_of_two() instead of roundup_pow_of_two()?
I *think* __roundup_pow_of_two() is basically a helper function for
implementing roundup_pow_of_two() and not intended to be used by itself.

I think there are other patches that use __roundup_pow_of_two(), too.
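
For reference, the selection logic under discussion can be modelled in plain C
like this. It is a userspace sketch under stated assumptions, not the kernel
helpers: roundup_pow2() below is a simple loop standing in for the kernel's
rounding macro, and iov_bar_multiplier() is a hypothetical name that models the
decision for a single BAR only (the quoted loop stops at the first BAR above
64M).

```c
#include <stdint.h>

/* Round n up to the next power of two; assumes n >= 1 and no overflow. */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

#define IOV_SIZE_LIMIT (1ULL << 26)   /* 64M, the threshold used in the patch */

/*
 * Hypothetical model of the patch's choice of "mul": expand to total_pe
 * for small sizes, or to a power-of-two number of VFs for sizes above 64M.
 */
static unsigned long iov_bar_multiplier(uint64_t size, unsigned long total_pe,
                                        unsigned long total_vfs)
{
    if (size > IOV_SIZE_LIMIT)
        return roundup_pow2(total_vfs);
    return total_pe;
}
```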

> + break;
> + }
> + }
> +
>   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>   res = &pdev->resource[i + PCI_IOV_RESOURCES];
>   if (!res->flags || res->parent)
> @@ -2264,12 +2291,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> pci_dev *pdev)
>  
>   dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> - res->end = res->start + size * phb->ioda.total_pe - 1;
> + res->end = res->start + size * mul - 1;
>   dev_dbg(&pdev->dev, "   %pR\n", res);
>   dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE 
> alignment)",
> - i, res, phb->ioda.total_pe);
> +  i, res, mul);
>   }
> - pdn->max_vfs = phb->ioda.total_pe;
> + pdn->max_vfs = mul;
>  }
>  
>  static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
> 

[PATCH 4/4] powerpc/mpic: remove unused functions

2015-02-24 Thread Arseny Solokha
Drop unused fsl_mpic_primary_get_version(), mpic_set_clk_ratio(),
mpic_set_serial_int().

  + fsl_mpic_primary_get_version() is just a safe wrapper around
fsl_mpic_get_version() for SMP configurations. While the latter is
called explicitly for handling PIC initialization and setting up error
interrupt vector depending on PIC hardware version, the former isn't
used for anything.

  + As for mpic_set_clk_ratio() and mpic_set_serial_int(), they both are
almost nine years old[1] but still have no chance to be called even from
out-of-tree modules because they both are __init and of course aren't
exported.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2006-June/023867.html

Signed-off-by: Arseny Solokha 
Cc: hongtao@freescale.com
---
 arch/powerpc/include/asm/mpic.h | 20 
 arch/powerpc/sysdev/mpic.c  | 35 ---
 2 files changed, 55 deletions(-)

diff --git a/arch/powerpc/include/asm/mpic.h b/arch/powerpc/include/asm/mpic.h
index 754f93d..6ce63a7 100644
--- a/arch/powerpc/include/asm/mpic.h
+++ b/arch/powerpc/include/asm/mpic.h
@@ -34,10 +34,6 @@
 #define MPIC_GREG_GCONF_BASE_MASK   0x000f
 #define MPIC_GREG_GCONF_MCK 0x0800
 #define MPIC_GREG_GLOBAL_CONF_1 0x00030
-#define MPIC_GREG_GLOBAL_CONF_1_SIE 0x0800
-#define MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO_MASK  0x7000
-#define MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO(r) \
-   (((r) << 28) & MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO_MASK)
 #define MPIC_GREG_VENDOR_0 0x00040
 #define MPIC_GREG_VENDOR_1 0x00050
 #define MPIC_GREG_VENDOR_2 0x00060
@@ -395,16 +391,6 @@ extern struct bus_type mpic_subsys;
 #define MPIC_REGSET_STANDARD    MPIC_REGSET(0)  /* Original MPIC */
 #define MPIC_REGSET_TSI108  MPIC_REGSET(1)  /* Tsi108/109 PIC */
 
-/* Get the version of primary MPIC */
-#ifdef CONFIG_MPIC
-extern u32 fsl_mpic_primary_get_version(void);
-#else
-static inline u32 fsl_mpic_primary_get_version(void)
-{
-   return 0;
-}
-#endif
-
 /* Allocate the controller structure and setup the linux irq descs
  * for the range if interrupts passed in. No HW initialization is
  * actually performed.
@@ -496,11 +482,5 @@ extern unsigned int mpic_get_coreint_irq(void);
 /* Fetch Machine Check interrupt from primary mpic */
 extern unsigned int mpic_get_mcirq(void);
 
-/* Set the EPIC clock ratio */
-void mpic_set_clk_ratio(struct mpic *mpic, u32 clock_ratio);
-
-/* Enable/Disable EPIC serial interrupt mode */
-void mpic_set_serial_int(struct mpic *mpic, int enable);
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_MPIC_H */
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index bbfbbf2..f72b592 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -1219,16 +1219,6 @@ static u32 fsl_mpic_get_version(struct mpic *mpic)
  * Exported functions
  */
 
-u32 fsl_mpic_primary_get_version(void)
-{
-   struct mpic *mpic = mpic_primary;
-
-   if (mpic)
-   return fsl_mpic_get_version(mpic);
-
-   return 0;
-}
-
 struct mpic * __init mpic_alloc(struct device_node *node,
phys_addr_t phys_addr,
unsigned int flags,
@@ -1676,31 +1666,6 @@ void __init mpic_init(struct mpic *mpic)
mpic_err_int_init(mpic, MPIC_FSL_ERR_INT);
 }
 
-void __init mpic_set_clk_ratio(struct mpic *mpic, u32 clock_ratio)
-{
-   u32 v;
-
-   v = mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1);
-   v &= ~MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO_MASK;
-   v |= MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO(clock_ratio);
-   mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1, v);
-}
-
-void __init mpic_set_serial_int(struct mpic *mpic, int enable)
-{
-   unsigned long flags;
-   u32 v;
-
-   raw_spin_lock_irqsave(&mpic_lock, flags);
-   v = mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1);
-   if (enable)
-   v |= MPIC_GREG_GLOBAL_CONF_1_SIE;
-   else
-   v &= ~MPIC_GREG_GLOBAL_CONF_1_SIE;
-   mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1, v);
-   raw_spin_unlock_irqrestore(&mpic_lock, flags);
-}
-
 void mpic_irq_set_priority(unsigned int irq, unsigned int pri)
 {
struct mpic *mpic = mpic_find(irq);
-- 
2.3.0


[PATCH 3/4] powrepc/qe: drop unused ucc_slow_poll_transmitter_now

2015-02-24 Thread Arseny Solokha
Drop ucc_slow_poll_transmitter_now() which has no users since its
inception in 2007 in commit 986585385131 ("[POWERPC] Add QUICC
Engine (QE) infrastructure").

Signed-off-by: Arseny Solokha 
---
 arch/powerpc/include/asm/ucc_slow.h   | 13 -
 arch/powerpc/sysdev/qe_lib/ucc_slow.c |  5 -
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/include/asm/ucc_slow.h 
b/arch/powerpc/include/asm/ucc_slow.h
index c44131e..233ef5f 100644
--- a/arch/powerpc/include/asm/ucc_slow.h
+++ b/arch/powerpc/include/asm/ucc_slow.h
@@ -251,19 +251,6 @@ void ucc_slow_enable(struct ucc_slow_private * uccs, enum 
comm_dir mode);
  */
 void ucc_slow_disable(struct ucc_slow_private * uccs, enum comm_dir mode);
 
-/* ucc_slow_poll_transmitter_now
- * Immediately forces a poll of the transmitter for data to be sent.
- * Typically, the hardware performs a periodic poll for data that the
- * transmit routine has set up to be transmitted. In cases where
- * this polling cycle is not soon enough, this optional routine can
- * be invoked to force a poll right away, instead. Proper use for
- * each transmission for which this functionality is desired is to
- * call the transmit routine and then this routine right after.
- *
- * uccs - (In) pointer to the slow UCC structure.
- */
-void ucc_slow_poll_transmitter_now(struct ucc_slow_private * uccs);
-
 /* ucc_slow_graceful_stop_tx
  * Smoothly stops transmission on a specified slow UCC.
  *
diff --git a/arch/powerpc/sysdev/qe_lib/ucc_slow.c 
b/arch/powerpc/sysdev/qe_lib/ucc_slow.c
index befaf11..5f91628 100644
--- a/arch/powerpc/sysdev/qe_lib/ucc_slow.c
+++ b/arch/powerpc/sysdev/qe_lib/ucc_slow.c
@@ -43,11 +43,6 @@ u32 ucc_slow_get_qe_cr_subblock(int uccs_num)
 }
 EXPORT_SYMBOL(ucc_slow_get_qe_cr_subblock);
 
-void ucc_slow_poll_transmitter_now(struct ucc_slow_private * uccs)
-{
-   out_be16(&uccs->us_regs->utodr, UCC_SLOW_TOD);
-}
-
 void ucc_slow_graceful_stop_tx(struct ucc_slow_private * uccs)
 {
struct ucc_slow_info *us_info = uccs->us_info;
-- 
2.3.0


[PATCH 2/4] kvm/ppc/mpic: drop unused IRQ_testbit

2015-02-24 Thread Arseny Solokha
Drop unused static procedure which doesn't have callers within its
translation unit. It had been already removed independently in QEMU[1]
from the OpenPIC implementation borrowed from the kernel.

[1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.html

Signed-off-by: Arseny Solokha 
Cc: Alexander Graf 
Cc: Gleb Natapov 
Cc: Paolo Bonzini 
---
 arch/powerpc/kvm/mpic.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 39b3a8f..a480d99 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -289,11 +289,6 @@ static inline void IRQ_resetbit(struct irq_queue *q, int 
n_IRQ)
clear_bit(n_IRQ, q->queue);
 }
 
-static inline int IRQ_testbit(struct irq_queue *q, int n_IRQ)
-{
-   return test_bit(n_IRQ, q->queue);
-}
-
 static void IRQ_check(struct openpic *opp, struct irq_queue *q)
 {
int irq = -1;
-- 
2.3.0


[PATCH V2 0/4] powerpc: trivial unused functions cleanup

2015-02-24 Thread Arseny Solokha
This series removes unused functions from the powerpc tree that I've been
able to discover.

Two machines at hand, e300- and e500-based, boot and run without
regressions on my workload with this series applied. The removed code
also seems to have been rarely touched, so the series seems safe, at least
in general. But I obviously can't make any strong argument in support of
the series, so it's completely OK to leave things as is.

Arseny Solokha (4):
  powerpc/boot: drop planetcore_set_serial_speed
  kvm/ppc/mpic: drop unused IRQ_testbit
  powrepc/qe: drop unused ucc_slow_poll_transmitter_now
  powerpc/mpic: remove unused functions

 arch/powerpc/boot/planetcore.c| 33 -
 arch/powerpc/boot/planetcore.h|  3 ---
 arch/powerpc/include/asm/mpic.h   | 20 
 arch/powerpc/include/asm/ucc_slow.h   | 13 -
 arch/powerpc/kvm/mpic.c   |  5 -
 arch/powerpc/sysdev/mpic.c| 35 ---
 arch/powerpc/sysdev/qe_lib/ucc_slow.c |  5 -
 7 files changed, 114 deletions(-)

-- 
2.3.0


[PATCH 1/4] powerpc/boot: drop planetcore_set_serial_speed

2015-02-24 Thread Arseny Solokha
Drop planetcore_set_serial_speed() which had no users since its
inception in commit fec6047047fd ("[POWERPC] bootwrapper: Add PlanetCore
firmware support") in 2007.

Signed-off-by: Arseny Solokha 
---
 arch/powerpc/boot/planetcore.c | 33 -
 arch/powerpc/boot/planetcore.h |  3 ---
 2 files changed, 36 deletions(-)

diff --git a/arch/powerpc/boot/planetcore.c b/arch/powerpc/boot/planetcore.c
index 0d8558a..75117e6 100644
--- a/arch/powerpc/boot/planetcore.c
+++ b/arch/powerpc/boot/planetcore.c
@@ -131,36 +131,3 @@ void planetcore_set_stdout_path(const char *table)
 
setprop_str(chosen, "linux,stdout-path", path);
 }
-
-void planetcore_set_serial_speed(const char *table)
-{
-   void *chosen, *stdout;
-   u64 baud;
-   u32 baud32;
-   int len;
-
-   chosen = finddevice("/chosen");
-   if (!chosen)
-   return;
-
-   len = getprop(chosen, "linux,stdout-path", prop_buf, MAX_PROP_LEN);
-   if (len <= 0)
-   return;
-
-   stdout = finddevice(prop_buf);
-   if (!stdout) {
-   printf("planetcore_set_serial_speed: "
-  "Bad /chosen/linux,stdout-path.\r\n");
-
-   return;
-   }
-
-   if (!planetcore_get_decimal(table, PLANETCORE_KEY_SERIAL_BAUD,
-   &baud)) {
-   printf("planetcore_set_serial_speed: No SB tag.\r\n");
-   return;
-   }
-
-   baud32 = baud;
-   setprop(stdout, "current-speed", &baud32, 4);
-}
diff --git a/arch/powerpc/boot/planetcore.h b/arch/powerpc/boot/planetcore.h
index 0d4094f..d53c733 100644
--- a/arch/powerpc/boot/planetcore.h
+++ b/arch/powerpc/boot/planetcore.h
@@ -43,7 +43,4 @@ void planetcore_set_mac_addrs(const char *table);
  */
 void planetcore_set_stdout_path(const char *table);
 
-/* Sets the current-speed property in the serial node. */
-void planetcore_set_serial_speed(const char *table);
-
 #endif
-- 
2.3.0


Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> On PowerNV platform, resource position in M64 implies the PE# the resource
> belongs to.  In some cases, adjustment of a resource is necessary to locate
> it to a correct position in M64.
> 
> Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
> according to an offset.

I think I squashed the "powerpc/powernv: Allocate VF PE" into this one, but
I didn't merge the changelog into this one.  Those two patches don't seem
super related to each other, but I think there really was some dependency.

> [bhelgaas: rework loops, rework overlap check, index resource[]
> conventionally, remove pci_regs.h include, squashed with next patch]
> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 
> ---
>  arch/powerpc/include/asm/pci-bridge.h |4 
>  arch/powerpc/kernel/pci_dn.c  |   11 +
>  arch/powerpc/platforms/powernv/pci-ioda.c |  520 
> -
>  arch/powerpc/platforms/powernv/pci.c  |   18 +
>  arch/powerpc/platforms/powernv/pci.h  |7 
>  5 files changed, 543 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h 
> b/arch/powerpc/include/asm/pci-bridge.h
> index de11de7d4547..011340df8583 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -177,6 +177,10 @@ struct pci_dn {
>   int pe_number;
>  #ifdef CONFIG_PCI_IOV
>   u16 max_vfs;/* number of VFs IOV BAR expended */
> + u16 vf_pes; /* VF PE# under this PF */
> + int offset; /* PE# for the first VF PE */
> +#define IODA_INVALID_M64 (-1)
> + int m64_wins[PCI_SRIOV_NUM_BARS];
>  #endif /* CONFIG_PCI_IOV */
>  #endif
>   struct list_head child_list;
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index f3a1a81d112f..5faf7ca45434 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -217,6 +217,17 @@ void remove_dev_pci_info(struct pci_dev *pdev)
>   struct pci_dn *pdn, *tmp;
>   int i;
>  
> + /*
> +  * VF and VF PE are created/released dynamically, so we need to
> +  * bind/unbind them.  Otherwise the VF and VF PE would be mismatched
> +  * when re-enabling SR-IOV.
> +  */
> + if (pdev->is_virtfn) {
> + pdn = pci_get_pdn(pdev);
> + pdn->pe_number = IODA_INVALID_PE;
> + return;
> + }
> +
>   /* Only support IOV PF for now */
>   if (!pdev->is_physfn)
>   return;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 6a86690bb8de..a3c2fbe35fc8 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -44,6 +44,9 @@
>  #include "powernv.h"
>  #include "pci.h"
>  
> +/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
> +#define TCE32_TABLE_SIZE ((0x10000000 / 0x1000) * 8)
> +
>  static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>   const char *fmt, ...)
>  {
> @@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, 
> const char *level,
>   vaf.fmt = fmt;
>   vaf.va = &args;
>  
> - if (pe->pdev)
> + if (pe->flags & PNV_IODA_PE_DEV)
>   strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix));
> - else
> + else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
>   sprintf(pfix, "%04x:%02x ",
>   pci_domain_nr(pe->pbus), pe->pbus->number);
> +#ifdef CONFIG_PCI_IOV
> + else if (pe->flags & PNV_IODA_PE_VF)
> + sprintf(pfix, "%04x:%02x:%2x.%d",
> + pci_domain_nr(pe->parent_dev->bus),
> + (pe->rid & 0xff00) >> 8,
> + PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
> +#endif /* CONFIG_PCI_IOV*/
>  
>   printk("%spci %s: [PE# %.3d] %pV",
>  level, pfix, pe->pe_number, &vaf);
> @@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
> bool is_add)
>  {
>   struct pnv_ioda_pe *slave;
> - struct pci_dev *pdev;
> + struct pci_dev *pdev = NULL;
>   int ret;
>  
>   /*
> @@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
>  
>   if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
>   pdev = pe->pbus->self;
> - else
> + else if (pe->flags & PNV_IODA_PE_DEV)
>   pdev = pe->pdev->bus->self;
> +#ifdef CONFIG_PCI_IOV
> + else if (pe->flags & PNV_IODA_PE_VF)
> + pdev = pe->parent_dev->bus->self;
> +#endif /* CONFIG_PCI_IOV */
>   while (pdev) {
>   struct pci_dn *pdn = pci_get_pdn(pdev);
>   struct pnv_ioda_pe *parent;
> @@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
> 

Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> On PowerNV platform, resource position in M64 implies the PE# the resource
> belongs to.  In some cases, adjustment of a resource is necessary to locate
> it to a correct position in M64.
> 
> Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
> according to an offset.
> 
> [bhelgaas: rework loops, rework overlap check, index resource[]
> conventionally, remove pci_regs.h include, squashed with next patch]
> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 

...

> +#ifdef CONFIG_PCI_IOV
> +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
> +{
> + struct pci_dn *pdn = pci_get_pdn(dev);
> + int i;
> + struct resource *res, res2;
> + resource_size_t size;
> + u16 vf_num;
> +
> + if (!dev->is_physfn)
> + return -EINVAL;
> +
> + /*
> +  * "offset" is in VFs.  The M64 windows are sized so that when they
> +  * are segmented, each segment is the same size as the IOV BAR.
> +  * Each segment is in a separate PE, and the high order bits of the
> +  * address are the PE number.  Therefore, each VF's BAR is in a
> +  * separate PE, and changing the IOV BAR start address changes the
> +  * range of PEs the VFs are in.
> +  */
> + vf_num = pdn->vf_pes;
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> + res = &dev->resource[i + PCI_IOV_RESOURCES];
> + if (!res->flags || !res->parent)
> + continue;
> +
> + if (!pnv_pci_is_mem_pref_64(res->flags))
> + continue;
> +
> + /*
> +  * The actual IOV BAR range is determined by the start address
> +  * and the actual size for vf_num VFs BAR.  This check is to
> +  * make sure that after shifting, the range will not overlap
> +  * with another device.
> +  */
> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
> + res2.flags = res->flags;
> + res2.start = res->start + (size * offset);
> + res2.end = res2.start + (size * vf_num) - 1;
> +
> + if (res2.end > res->end) {
> + dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR 
> (trying to enable %d VFs shifted by %d)\n",
> + i, &res2, res, vf_num, offset);
> + return -EBUSY;
> + }
> + }
> +
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> + res = &dev->resource[i + PCI_IOV_RESOURCES];
> + if (!res->flags || !res->parent)
> + continue;
> +
> + if (!pnv_pci_is_mem_pref_64(res->flags))
> + continue;
> +
> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
> + res2 = *res;
> + res->start += size * offset;

I'm still not happy about this fiddling with res->start.

Increasing res->start means that in principle, the "size * offset" bytes
that we just removed from res are now available for allocation to somebody
else.  I don't think we *will* give that space to anything else because of
the alignment restrictions you're enforcing, but "res" now doesn't
correctly describe the real resource map.

Would you be able to just update the BAR here while leaving the struct
resource alone?  In that case, it would look a little funny that lspci
would show a BAR value in the middle of the region in /proc/iomem, but
the /proc/iomem region would be more correct.
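
The arithmetic behind the shift and its bounds check can be written out as a
small standalone model. This is a sketch only: struct range and
shifted_range_fits() are hypothetical stand-ins for struct resource and the
first loop of pnv_pci_vf_resource_shift(), kept just close enough to show which
bytes the shifted range covers and which bytes at the front of the expanded
BAR it leaves behind.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource; end is inclusive. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Compute where vf_num VF BARs of `size` bytes each would land after
 * shifting by `offset` VFs, and report whether that range still ends
 * inside the expanded IOV BAR described by res.
 */
static bool shifted_range_fits(const struct range *res, uint64_t size,
                               unsigned int offset, unsigned int vf_num,
                               struct range *shifted)
{
    shifted->start = res->start + size * offset;
    shifted->end = shifted->start + size * vf_num - 1;
    return shifted->end <= res->end;
}
```

In this model the bytes from res->start up to shifted->start - 1 are exactly
the "size * offset" bytes that stop being described by res once res->start is
moved.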

> +
> + dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d 
> VFs shifted by %d)\n",
> +  i, &res2, res, vf_num, offset);
> + pci_update_resource(dev, i + PCI_IOV_RESOURCES);
> + }
> + pdn->max_vfs -= offset;
> + return 0;
> +}
> +#endif /* CONFIG_PCI_IOV */

Re: [PATCH v12 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:34:42AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> On PHB3, PF IOV BAR will be covered by M64 window to have better PE
> isolation.  The total_pe number is usually different from total_VFs, which
> can lead to a conflict between MMIO space and the PE number.
> 
> For example, if total_VFs is 128 and total_pe is 256, the second half of
> M64 window will be part of other PCI device, which may already belong
> to other PEs.

I'm still trying to wrap my mind around the explanation here.

I *think* what's going on is that the M64 window must be a power-of-two
size.  If the VF(n) BAR space doesn't completely fill it, we might allocate
the leftover space to another device.  Then the M64 window for *this*
device may cause the other device to be associated with a PE it didn't
expect.

But I don't understand this well enough to describe it clearly.

More serious code question below...

> Prevent the conflict by reserving additional space for the PF IOV BAR,
> which is total_pe number of VF's BAR size.
> 
> [bhelgaas: make dev_printk() output more consistent, index resource[]
> conventionally]
> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 
> ---
>  arch/powerpc/include/asm/machdep.h|4 ++
>  arch/powerpc/include/asm/pci-bridge.h |3 ++
>  arch/powerpc/kernel/pci-common.c  |5 +++
>  arch/powerpc/platforms/powernv/pci-ioda.c |   58 
> +
>  4 files changed, 70 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/machdep.h 
> b/arch/powerpc/include/asm/machdep.h
> index c8175a3fe560..965547c58497 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -250,6 +250,10 @@ struct machdep_calls {
>   /* Reset the secondary bus of bridge */
>   void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
>  
> +#ifdef CONFIG_PCI_IOV
> + void (*pcibios_fixup_sriov)(struct pci_bus *bus);
> +#endif /* CONFIG_PCI_IOV */
> +
>   /* Called to shutdown machine specific hardware not already controlled
>* by other drivers.
>*/
> diff --git a/arch/powerpc/include/asm/pci-bridge.h 
> b/arch/powerpc/include/asm/pci-bridge.h
> index 513f8f27060d..de11de7d4547 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -175,6 +175,9 @@ struct pci_dn {
>  #define IODA_INVALID_PE  (-1)
>  #ifdef CONFIG_PPC_POWERNV
>   int pe_number;
> +#ifdef CONFIG_PCI_IOV
> + u16 max_vfs;/* number of VFs IOV BAR expended */
> +#endif /* CONFIG_PCI_IOV */
>  #endif
>   struct list_head child_list;
>   struct list_head list;
> diff --git a/arch/powerpc/kernel/pci-common.c 
> b/arch/powerpc/kernel/pci-common.c
> index 82031011522f..022e9feeb1f2 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1646,6 +1646,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
>   if (ppc_md.pcibios_fixup_phb)
>   ppc_md.pcibios_fixup_phb(hose);
>  
> +#ifdef CONFIG_PCI_IOV
> + if (ppc_md.pcibios_fixup_sriov)
> + ppc_md.pcibios_fixup_sriov(bus);
> +#endif /* CONFIG_PCI_IOV */
> +
>   /* Configure PCI Express settings */
>   if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
>   struct pci_bus *child;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index cd1a56160ded..36c533da5ccb 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1749,6 +1749,61 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
>  static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
>  #endif /* CONFIG_PCI_MSI */
>  
> +#ifdef CONFIG_PCI_IOV
> +static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> +{
> + struct pci_controller *hose;
> + struct pnv_phb *phb;
> + struct resource *res;
> + int i;
> + resource_size_t size;
> + struct pci_dn *pdn;
> +
> + if (!pdev->is_physfn || pdev->is_added)
> + return;
> +
> + hose = pci_bus_to_host(pdev->bus);
> + phb = hose->private_data;
> +
> + pdn = pci_get_pdn(pdev);
> + pdn->max_vfs = 0;
> +
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> + res = &pdev->resource[i + PCI_IOV_RESOURCES];
> + if (!res->flags || res->parent)
> + continue;
> + if (!pnv_pci_is_mem_pref_64(res->flags)) {
> + dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: 
> %pR\n",
> +  i, res);
> + continue;
> + }
> +
> + dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
> + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
> + res->end = res->start + size * phb->ioda.total_pe - 1;
> + dev_dbg(&pdev->dev, "   %pR\n", r

Re: [PATCH v12 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:34:35AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> Current iommu_table of a PE is a static field.  This will have a problem
> when iommu_free_table() is called.
> 
> Allocate iommu_table dynamically.

I'd like a little more explanation about why we're calling
iommu_free_table() now when we didn't call it before.  Maybe this happens
when we disable SR-IOV and the VFs go away?

Is there a hotplug remove path where we should also be calling
iommu_free_table()?
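
As background for the container_of() to tbl->data change in the quoted hunks,
here is a minimal userspace sketch of the pattern. The struct and function
names are hypothetical and it uses calloc()/free() rather than the kernel
allocators. Once the table is a separately allocated object it is no longer
embedded in its owner, so the owner has to be found through a stored
back-pointer instead of pointer arithmetic.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the table: carries a back-pointer to its owner. */
struct table {
    void *data;
};

/* Hypothetical stand-in for the PE: points at a separately allocated table. */
struct owner {
    int id;
    struct table *tbl;
};

/* Allocate an owner and its table, and link the table back to the owner. */
static struct owner *owner_create(int id)
{
    struct owner *o = calloc(1, sizeof(*o));

    if (!o)
        return NULL;
    o->tbl = calloc(1, sizeof(*o->tbl));
    if (!o->tbl) {              /* fail cleanly if the table allocation fails */
        free(o);
        return NULL;
    }
    o->id = id;
    o->tbl->data = o;
    return o;
}

/* Recover the owner from a table pointer via the back-pointer. */
static struct owner *table_to_owner(struct table *t)
{
    return t->data;
}

/* Free the table and then the owner. */
static void owner_destroy(struct owner *o)
{
    if (!o)
        return;
    free(o->tbl);
    free(o);
}
```

The sketch checks the allocation result before storing the back-pointer; the
quoted hunk does not show such a check after kzalloc_node().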

> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 
> ---
>  arch/powerpc/include/asm/iommu.h  |3 +++
>  arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++
>  arch/powerpc/platforms/powernv/pci.h  |2 +-
>  3 files changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index 9cfa3706a1b8..5574eeb97634 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -78,6 +78,9 @@ struct iommu_table {
>   struct iommu_group *it_group;
>  #endif
>   void (*set_bypass)(struct iommu_table *tbl, bool enable);
> +#ifdef CONFIG_PPC_POWERNV
> + void   *data;
> +#endif
>  };
>  
>  /* Pure 2^n version of get_order */
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 58c4fc4ab63c..cd1a56160ded 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, 
> int all)
>   return;
>   }
>  
> + pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
> + GFP_KERNEL, hose->node);
> + pe->tce32_table->data = pe;
> +
>   /* Associate it with all child devices */
>   pnv_ioda_setup_same_PE(bus, pe);
>  
> @@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb 
> *phb, struct pci_dev *pdev
>  
>   pe = &phb->ioda.pe_array[pdn->pe_number];
>   WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
> - set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
> + set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
>  }
>  
>  static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
> @@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb 
> *phb,
>   } else {
>   dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
>   set_dma_ops(&pdev->dev, &dma_iommu_ops);
> - set_iommu_table_base(&pdev->dev, &pe->tce32_table);
> + set_iommu_table_base(&pdev->dev, pe->tce32_table);
>   }
>   *pdev->dev.dma_mask = dma_mask;
>   return 0;
> @@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe 
> *pe,
>   list_for_each_entry(dev, &bus->devices, bus_list) {
>   if (add_to_iommu_group)
>   set_iommu_table_base_and_group(&dev->dev,
> -&pe->tce32_table);
> +pe->tce32_table);
>   else
> - set_iommu_table_base(&dev->dev, &pe->tce32_table);
> + set_iommu_table_base(&dev->dev, pe->tce32_table);
>  
>   if (dev->subordinate)
>   pnv_ioda_setup_bus_dma(pe, dev->subordinate,
> @@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
> pnv_ioda_pe *pe,
>  void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
>__be64 *startp, __be64 *endp, bool rm)
>  {
> - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
> -   tce32_table);
> + struct pnv_ioda_pe *pe = tbl->data;
>   struct pnv_phb *phb = pe->phb;
>  
>   if (phb->type == PNV_PHB_IODA1)
> @@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
> *phb,
>   }
>  
>   /* Setup linux iommu table */
> - tbl = &pe->tce32_table;
> + tbl = pe->tce32_table;
>   pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
> base << 28, IOMMU_PAGE_SHIFT_4K);
>  
> @@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
> *phb,
>  
>  static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
>  {
> - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
> -   tce32_table);
> + struct pnv_ioda_pe *pe = tbl->data;
>   uint16_t window_id = (pe->pe_number << 1 ) + 1;
>   int64_t rc;
>  
> @@ -1312,10 +1314,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct 
> pnv_phb *phb,
>   pe->tce_bypass_base = 1ull << 59;
>  
>   /* Install set_bypass callback for VFIO */
> - pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
> + pe->tce32_table->se
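
The move from container_of() to a tbl->data back-pointer follows from the table no longer being embedded in the PE. A minimal user-space sketch of that pattern is below. The names demo_pe, demo_table and the three helpers are hypothetical, calloc() stands in for kzalloc_node(), and the NULL check on the allocation is an addition that the quoted hunk does not contain.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct iommu_table and struct pnv_ioda_pe. */
struct demo_table {
	void *data;               /* back-pointer to the owning PE */
};

struct demo_pe {
	int pe_number;
	struct demo_table *table; /* a pointer now, no longer an embedded struct */
};

/*
 * Allocate the table and wire up the back-pointer.  Returns 0 on success,
 * -1 if the allocation fails (the table pointer is then NULL).
 */
static int demo_pe_alloc_table(struct demo_pe *pe)
{
	pe->table = calloc(1, sizeof(*pe->table));
	if (!pe->table)
		return -1;
	pe->table->data = pe;
	return 0;
}

/* With a pointer member, container_of() cannot recover the PE; use data. */
static struct demo_pe *demo_table_to_pe(struct demo_table *tbl)
{
	return tbl->data;
}

/* Release the table and clear the pointer so it cannot be reused. */
static void demo_pe_free_table(struct demo_pe *pe)
{
	free(pe->table);
	pe->table = NULL;
}
```

Because the table now has its own lifetime, it can be freed without touching the PE that owns it.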

Re: [PATCH v12 11/21] powerpc/pci: Don't unset PCI resources for VFs

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:34:13AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> If we're going to reassign resources with flag PCI_REASSIGN_ALL_RSRC, all
> resources will be cleaned out during device header fixup time and then get
> reassigned by PCI core.  However, the VF resources won't be reassigned and
> thus, we shouldn't clean them out.
> 
> If the pci_dev is a VF, skip the resource unset process.

I think this patch is correct, but we should include a little more detail
in the changelog to answer questions like mine and Ben's
(http://lkml.kernel.org/r/1423528584.4924.70.ca...@au1.ibm.com).

> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 
> ---
>  arch/powerpc/kernel/pci-common.c |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c 
> b/arch/powerpc/kernel/pci-common.c
> index 2a525c938158..82031011522f 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
>  pci_name(dev));
>   return;
>   }
> +
> + if (dev->is_virtfn)
> + return;
> +
>   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
>   struct resource *res = dev->resource + i;
>   struct pci_bus_region reg;
> 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v12 08/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:33:52AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> VFs are dynamically created when a driver enables them.  On some platforms,
> like PowerNV, special resources are necessary to enable VFs.
> 
> Add platform hooks for enabling and disabling VFs.
> 
> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 
> ---
>  drivers/pci/iov.c |   19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 5643a1011e23..cc6fedf4a1b9 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, 
> int reset)
>   pci_dev_put(dev);
>  }
>  
> +int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)

I think this "vf_num" parameter should be renamed to something like
"num_vfs" instead.  It's subtle, but "vf_num" suggests that we're talking
about one of several VFs, e.g., VF 1 or VF 2.  But here we really mean the
total number of VFs that we're enabling.

There's similar code in the powerpc implementation that should be
renamed the same way.

> +{
> +   return 0;
> +}
> +
>  static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  {
>   int rc;
> @@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int 
> nr_virtfn)
>   struct pci_sriov *iov = dev->sriov;
>   int bars = 0;
>   int bus;
> + int retval;
>  
>   if (!nr_virtfn)
>   return 0;
> @@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int 
> nr_virtfn)
>   if (nr_virtfn < initial)
>   initial = nr_virtfn;
>  
> + if ((retval = pcibios_sriov_enable(dev, initial))) {
> + dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n",
> + retval);
> + return retval;
> + }
> +
>   for (i = 0; i < initial; i++) {
>   rc = virtfn_add(dev, i, 0);
>   if (rc)
> @@ -335,6 +347,11 @@ failed:
>   return rc;
>  }
>  
> +int __weak pcibios_sriov_disable(struct pci_dev *pdev)
> +{
> +   return 0;
> +}
> +
>  static void sriov_disable(struct pci_dev *dev)
>  {
>   int i;
> @@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev)
>   for (i = 0; i < iov->num_VFs; i++)
>   virtfn_remove(dev, i, 0);
>  
> + pcibios_sriov_disable(dev);
> +
>   iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>   pci_cfg_access_lock(dev);
>   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
> 
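
The weak-hook pattern discussed in this patch can be sketched in user space as follows. This is only an illustration under stated assumptions: demo_dev and the demo_* functions are hypothetical, __attribute__((weak)) (GCC/Clang) stands in for the kernel's __weak, and the parameter uses the "num_vfs" name suggested in the review.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct pci_dev. */
struct demo_dev {
	int vfs_created;
};

/*
 * Default hooks.  A platform overrides them by providing a non-weak
 * definition with the same name; otherwise these no-ops are linked in.
 */
__attribute__((weak)) int demo_sriov_enable(struct demo_dev *pdev,
					    unsigned short num_vfs)
{
	(void)pdev;
	(void)num_vfs;	/* total number of VFs being enabled, not an index */
	return 0;
}

__attribute__((weak)) int demo_sriov_disable(struct demo_dev *pdev)
{
	(void)pdev;
	return 0;
}

/* Call the platform hook first and create VFs only if it succeeds. */
static int demo_enable_vfs(struct demo_dev *pdev, unsigned short num_vfs)
{
	int rc = demo_sriov_enable(pdev, num_vfs);

	if (rc) {
		fprintf(stderr, "failure %d from demo_sriov_enable()\n", rc);
		return rc;
	}
	pdev->vfs_created = num_vfs;
	return 0;
}

/* Tear down the VFs, then let the platform release its resources. */
static int demo_disable_vfs(struct demo_dev *pdev)
{
	pdev->vfs_created = 0;
	return demo_sriov_disable(pdev);
}
```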

Re: [PATCH v12 10/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning

2015-02-24 Thread Bjorn Helgaas
On Tue, Feb 24, 2015 at 02:34:06AM -0600, Bjorn Helgaas wrote:
> From: Wei Yang 
> 
> When sizing and assigning resources, we divide the resources into two
> lists: the requested list and the additional list.  We don't consider the
> alignment of additional VF(n) BAR space.
> 
> This is reasonable because the alignment required for the VF(n) BAR space
> is the size of an individual VF BAR, not the size of the space for *all*
> VFs.  But some platforms, e.g., PowerNV, require additional alignment.
> 
> Consider the additional IOV BAR alignment when sizing and assigning
> resources.  When there is not enough system MMIO space, the PF's IOV BAR
> alignment will not contribute to the bridge.  When there is enough system
> MMIO space, the additional alignment will contribute to the bridge.

I don't understand the "when there is not enough system MMIO space" part.
How do we tell if there's enough MMIO space?

> Also, take advantage of pci_dev_resource::min_align to store this
> additional alignment.

This comment doesn't seem to make sense; this patch doesn't save anything
in min_align.

Another question below...

> [bhelgaas: changelog, printk cast]
> Signed-off-by: Wei Yang 
> Signed-off-by: Bjorn Helgaas 
> ---
>  drivers/pci/setup-bus.c |   83 
> ---
>  1 file changed, 70 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index e3e17f3c0f0f..affbceae560f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head,
>   }
>  }
>  
> -static resource_size_t get_res_add_size(struct list_head *head,
> - struct resource *res)
> +static struct pci_dev_resource *res_to_dev_res(struct list_head *head,
> +struct resource *res)
>  {
>   struct pci_dev_resource *dev_res;
>  
> @@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct 
> list_head *head,
>   int idx = res - &dev_res->dev->resource[0];
>  
>   dev_printk(KERN_DEBUG, &dev_res->dev->dev,
> -  "res[%d]=%pR get_res_add_size add_size %llx\n",
> +  "res[%d]=%pR res_to_dev_res add_size %llx 
> min_align %llx\n",
>idx, dev_res->res,
> -  (unsigned long long)dev_res->add_size);
> +  (unsigned long long)dev_res->add_size,
> +  (unsigned long long)dev_res->min_align);
>  
> - return dev_res->add_size;
> + return dev_res;
>   }
>   }
>  
> - return 0;
> + return NULL;
> +}
> +
> +static resource_size_t get_res_add_size(struct list_head *head,
> + struct resource *res)
> +{
> + struct pci_dev_resource *dev_res;
> +
> + dev_res = res_to_dev_res(head, res);
> + return dev_res ? dev_res->add_size : 0;
> +}
> +
> +static resource_size_t get_res_add_align(struct list_head *head,
> +  struct resource *res)
> +{
> + struct pci_dev_resource *dev_res;
> +
> + dev_res = res_to_dev_res(head, res);
> + return dev_res ? dev_res->min_align : 0;
>  }
>  
> +
>  /* Sort resources by alignment */
>  static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
>  {
> @@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head 
> *head,
>   LIST_HEAD(save_head);
>   LIST_HEAD(local_fail_head);
>   struct pci_dev_resource *save_res;
> - struct pci_dev_resource *dev_res, *tmp_res;
> + struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
>   unsigned long fail_type;
> + resource_size_t add_align, align;
>  
>   /* Check if optional add_size is there */
>   if (!realloc_head || list_empty(realloc_head))
> @@ -384,10 +405,38 @@ static void __assign_resources_sorted(struct list_head 
> *head,
>   }
>  
>   /* Update res in head list with add_size in realloc_head list */
> - list_for_each_entry(dev_res, head, list)
> + list_for_each_entry_safe(dev_res, tmp_res, head, list) {
>   dev_res->res->end += get_res_add_size(realloc_head,
>   dev_res->res);
>  
> + /*
> +  * There are two kinds of additional resources in the list:
> +  * 1. bridge resource  -- IORESOURCE_STARTALIGN
> +  * 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
> +  * Here just fix the additional alignment for bridge
> +  */
> + if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
> + continue;
> +
> + add_align = get_res_add_align(realloc_head, dev_res->res);
> +
> + /* Reorder the list by their alignment */

Why do we nee

[PATCH v12 21/21] powerpc/pci: Add PCI resource alignment documentation

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

In order to enable SRIOV on PowerNV platform, the PF's IOV BAR needs to be
adjusted:

1. size expanded
2. aligned to M64BT size

This patch documents the reason for this change and how it is done.

[bhelgaas: reformat, clarify, expand]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 .../powerpc/pci_iov_resource_on_powernv.txt|  305 
 1 file changed, 305 insertions(+)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt 
b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
new file mode 100644
index 000000000000..4e9bb2812238
--- /dev/null
+++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
@@ -0,0 +1,305 @@
+Wei Yang 
+Benjamin Herrenschmidt 
+26 Aug 2014
+
+This document describes the requirement from hardware for PCI MMIO resource
+sizing and assignment on PowerNV platform and how generic PCI code handles
+this requirement.  The first two sections describe the concepts of
+Partitionable Endpoints and the implementation on P8 (IODA2).
+
+1. Introduction to Partitionable Endpoints
+
+A Partitionable Endpoint (PE) is a way to group the various resources
+associated with a device or a set of devices to provide isolation between
+partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism
+to freeze a device that is causing errors in order to limit the possibility
+of propagation of bad data.
+
+There is thus, in HW, a table of PE states that contains a pair of "frozen"
+state bits (one for MMIO and one for DMA, they get set together but can be
+cleared independently) for each PE.
+
+When a PE is frozen, all stores in any direction are dropped and all loads
+return all 1's value.  MSIs are also blocked.  There's a bit more state
+that captures things like the details of the error that caused the freeze
+etc., but that's not critical.
+
+The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
+are matched to their corresponding PEs.
+
+The following section provides a rough description of what we have on P8
+(IODA2).  Keep in mind that this is all per PHB (PCI host bridge).  Each
+PHB is a completely separate HW entity that replicates the entire logic,
+so has its own set of PEs, etc.
+
+2. Implementation of Partitionable Endpoints on P8 (IODA2)
+
+P8 supports up to 256 Partitionable Endpoints per PHB.
+
+  * Inbound
+
+For DMA, MSIs and inbound PCIe error messages, we have a table (in
+memory but accessed in HW by the chip) that provides a direct
+correspondence between a PCIe RID (bus/dev/fn) with a PE number.
+We call this the RTT.
+
+- For DMA we then provide an entire address space for each PE that can
+  contain two "windows", depending on the value of PCI address bit 59.
+  Each window can be configured to be remapped via a "TCE table" (IOMMU
+  translation table), which has various configurable characteristics
+  not described here.
+
+- For MSIs, we have two windows in the address space (one at the top of
+  the 32-bit space and one much higher) which, via a combination of the
+  address and MSI value, will result in one of the 2048 interrupts per
+  bridge being triggered.  There's a PE# in the interrupt controller
+  descriptor table as well which is compared with the PE# obtained from
+  the RTT to "authorize" the device to emit that specific interrupt.
+
+- Error messages just use the RTT.
+
+  * Outbound.  That's where the tricky part is.
+
+Like other PCI host bridges, the Power8 IODA2 PHB supports "windows"
+from the CPU address space to the PCI address space.  There is one M32
+window and sixteen M64 windows.  They have different characteristics.
+First what they have in common: they forward a configurable portion of
+the CPU address space to the PCIe bus and must be naturally aligned
+power of two in size.  The rest is different:
+
+- The M32 window:
+
+  * Is limited to 4GB in size.
+
+  * Drops the top bits of the address (above the size) and replaces
+   them with a configurable value.  This is typically used to generate
+   32-bit PCIe accesses.  We configure that window at boot from FW and
+   don't touch it from Linux; it's usually set to forward a 2GB
+   portion of address space from the CPU to PCIe
+   0x8000_0000..0xffff_ffff.  (Note: The top 64KB are actually
+   reserved for MSIs but this is not a problem at this point; we just
+   need to ensure Linux doesn't assign anything there, the M32 logic
+   ignores that however and will forward in that space if we try).
+
+  * It is divided into 256 segments of equal size.  A table in the chip
+   maps each segment to a PE#.  That allows portions of the MMIO space
+   to be assigned to PEs on a segment granularity.  For a 2GB window,
+   the segment granularity is 2GB/256 = 8MB.
+
+Now, this is the "main" 
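
The M32 segment arithmetic described above (256 equal segments, so 8MB each for a 2GB window) can be worked through with a small sketch. The function names are hypothetical, and the window base is supplied by the caller; a 2GB window starting at 0x8000_0000 is assumed only for the usage example.

```c
#include <stdint.h>

#define DEMO_M32_SEGMENTS 256u

/* Size of one segment for a window of the given (power-of-two) size. */
static uint64_t demo_m32_segment_size(uint64_t window_size)
{
	return window_size / DEMO_M32_SEGMENTS;
}

/*
 * Index of the segment that contains addr, or -1 if addr lies outside
 * the window.  The segment index is what the chip maps to a PE#.
 */
static int demo_m32_segment_index(uint64_t window_base, uint64_t window_size,
				  uint64_t addr)
{
	if (addr < window_base || addr - window_base >= window_size)
		return -1;
	return (int)((addr - window_base) / demo_m32_segment_size(window_size));
}
```

With a 2GB window assumed at 0x8000_0000, demo_m32_segment_size(0x80000000) is 8MB, address 0x8080_0000 falls in segment 1, and the last byte of the 32-bit space falls in segment 255.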

[PATCH v12 20/21] powerpc/pci: Remove unused struct pci_dn.pcidev field

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h |1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 958ea8675691..109efbaf384d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -168,7 +168,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index b265d5da601b..58d4ca01bfd9 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1024,7 +1024,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, 
struct pnv_ioda_pe *pe)
pci_name(dev));
continue;
}
-   pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
pe->dma_weight += pnv_ioda_dma_weight(dev);
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)

[PATCH v12 19/21] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

When the IOV BAR is big, each one is covered by 4 M64 windows.  This leads to
several VF PEs sitting in one PE in terms of M64.

Group VF PEs according to the M64 allocation.

[bhelgaas: use dev_printk() when possible]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/include/asm/pci-bridge.h |2 
 arch/powerpc/platforms/powernv/pci-ioda.c |  197 +++--
 2 files changed, 154 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index d824bb184ab8..958ea8675691 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -182,7 +182,7 @@ struct pci_dn {
 #define M64_PER_IOV 4
int m64_per_iov;
 #define IODA_INVALID_M64(-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS];
+   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 30b7c3909746..b265d5da601b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1152,26 +1152,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
struct pci_controller *hose;
struct pnv_phb*phb;
struct pci_dn *pdn;
-   inti;
+   inti, j;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
 
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   if (pdn->m64_wins[i] == IODA_INVALID_M64)
-   continue;
-   opal_pci_phb_mmio_enable(phb->opal_id,
-   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0);
-   clear_bit(pdn->m64_wins[i], &phb->ioda.m64_bar_alloc);
-   pdn->m64_wins[i] = IODA_INVALID_M64;
-   }
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+   for (j = 0; j < M64_PER_IOV; j++) {
+   if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+   continue;
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
+   clear_bit(pdn->m64_wins[i][j], 
&phb->ioda.m64_bar_alloc);
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   }
 
return 0;
 }
 
-static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 vf_num)
 {
struct pci_bus*bus;
struct pci_controller *hose;
@@ -1179,17 +1180,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
struct pci_dn *pdn;
unsigned int   win;
struct resource   *res;
-   inti;
+   inti, j;
int64_trc;
+   inttotal_vfs;
+   resource_size_tsize, start;
+   intpe_num;
+   intvf_groups;
+   intvf_per_group;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
+   total_vfs = pci_sriov_get_totalvfs(pdev);
 
/* Initialize the m64_wins to IODA_INVALID_M64 */
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   pdn->m64_wins[i] = IODA_INVALID_M64;
+   for (j = 0; j < M64_PER_IOV; j++)
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+
+   if (pdn->m64_per_iov == M64_PER_IOV) {
+   vf_groups = (vf_num <= M64_PER_IOV) ? vf_num: M64_PER_IOV;
+   vf_per_group = (vf_num <= M64_PER_IOV)? 1:
+   __roundup_pow_of_two(vf_num) / pdn->m64_per_iov;
+   } else {
+   vf_groups = 1;
+   vf_per_group = 1;
+   }
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -1199,35 +1216,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
if (!pnv_pci_is_mem_pref_64(res->flags))
continue;
 
-   do {
-   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-   phb->ioda.m64_bar_idx + 1, 0);
-
-   if (win >= phb->ioda.m64_bar_idx + 1)
-   goto m64_failed;
-   } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+   for (j = 0; j < vf_groups; j++) {
+   do {
+   win = 
find_next_zero_bit(&phb->ioda.m64_bar_alloc,
+   phb->ioda.m64_bar_idx + 1, 0);
+
+
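
The grouping computed at the top of pnv_pci_vf_assign_m64() can be sketched in isolation. This is a user-space illustration, not the kernel code: demo_vf_layout, demo_roundup_pow2() and demo_group_vfs() are hypothetical names, and the loop-based round-up stands in for __roundup_pow_of_two().

```c
#define DEMO_M64_PER_IOV 4

struct demo_vf_layout {
	unsigned int groups;     /* M64 windows used per IOV BAR */
	unsigned int per_group;  /* VFs sharing one window */
};

/* Smallest power of two >= n, for n >= 1. */
static unsigned int demo_roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static struct demo_vf_layout demo_group_vfs(unsigned int m64_per_iov,
					    unsigned int num_vfs)
{
	struct demo_vf_layout l = { 1, 1 };

	if (m64_per_iov != DEMO_M64_PER_IOV)
		return l;	/* not the big-BAR case: a single group */

	if (num_vfs <= DEMO_M64_PER_IOV) {
		l.groups = num_vfs;	/* one VF per window */
	} else {
		l.groups = DEMO_M64_PER_IOV;
		l.per_group = demo_roundup_pow2(num_vfs) / m64_per_iov;
	}
	return l;
}
```

For example, 10 VFs with m64_per_iov == 4 round up to 16 and give 4 groups of 4 VFs each, so some slots in the last group stay unused.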

[PATCH v12 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

M64 aperture size is limited on PHB3.  When the IOV BAR is too big, it
exceeds this limit and fails to be assigned.

Introduce a different mechanism based on the IOV BAR size:

  - if IOV BAR size is not bigger than 64MB, expand to total_pe VFs
  - if IOV BAR size is bigger than 64MB, round total_VFs up to a power of 2

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/include/asm/pci-bridge.h |2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   33 ++---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 011340df8583..d824bb184ab8 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -179,6 +179,8 @@ struct pci_dn {
u16 max_vfs;/* number of VFs IOV BAR expended */
u16 vf_pes; /* VF PE# under this PF */
int offset; /* PE# for the first VF PE */
+#define M64_PER_IOV 4
+   int m64_per_iov;
 #define IODA_INVALID_M64(-1)
int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index a3c2fbe35fc8..30b7c3909746 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2242,6 +2242,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
int i;
resource_size_t size;
struct pci_dn *pdn;
+   int mul, total_vfs;
 
if (!pdev->is_physfn || pdev->is_added)
return;
@@ -2252,6 +2253,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
pdn = pci_get_pdn(pdev);
pdn->max_vfs = 0;
 
+   total_vfs = pci_sriov_get_totalvfs(pdev);
+   pdn->m64_per_iov = 1;
+   mul = phb->ioda.total_pe;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+
+   /* bigger than 64M */
+   if (size > (1 << 26)) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size 
is bigger than 64M, roundup power2\n",
+i, res);
+   pdn->m64_per_iov = M64_PER_IOV;
+   mul = __roundup_pow_of_two(total_vfs);
+   break;
+   }
+   }
+
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
if (!res->flags || res->parent)
@@ -2264,12 +2291,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
 
dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-   res->end = res->start + size * phb->ioda.total_pe - 1;
+   res->end = res->start + size * mul - 1;
dev_dbg(&pdev->dev, "   %pR\n", res);
dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE 
alignment)",
-   i, res, phb->ioda.total_pe);
+i, res, mul);
}
-   pdn->max_vfs = phb->ioda.total_pe;
+   pdn->max_vfs = mul;
 }
 
 static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
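
The choice of expansion multiplier in this patch can be summarized in a short sketch. It is an illustration under assumptions: the demo_* names are hypothetical, the sizes are passed in as a plain array of per-VF BAR sizes, and a loop-based round-up replaces __roundup_pow_of_two().

```c
#include <stdint.h>

#define DEMO_BIG_BAR_LIMIT (1ull << 26)	/* 64MB, the threshold in the patch */

/* Smallest power of two >= n, for n >= 1. */
static unsigned int demo_roundup_pow2_u(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Number of VF-sized slots the IOV BAR is expanded to: total_pe normally,
 * or total_vfs rounded up to a power of two once any BAR exceeds 64MB.
 */
static unsigned int demo_iov_multiplier(const uint64_t *vf_bar_sizes, int nbars,
					unsigned int total_pe,
					unsigned int total_vfs)
{
	int i;

	for (i = 0; i < nbars; i++)
		if (vf_bar_sizes[i] > DEMO_BIG_BAR_LIMIT)
			return demo_roundup_pow2_u(total_vfs);
	return total_pe;
}

/* Expanded IOV BAR size for one BAR, mirroring "size * mul" in the hunk. */
static uint64_t demo_expanded_size(uint64_t vf_bar_size, unsigned int mul)
{
	return vf_bar_size * mul;
}
```

With 1MB BARs, 256 PEs and 128 VFs the multiplier stays at 256; with a 128MB BAR and 100 VFs it becomes 128.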

[PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

On PowerNV platform, resource position in M64 implies the PE# the resource
belongs to.  In some cases, adjustment of a resource is necessary to locate
it to a correct position in M64.

Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
according to an offset.

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/include/asm/pci-bridge.h |4 
 arch/powerpc/kernel/pci_dn.c  |   11 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  520 -
 arch/powerpc/platforms/powernv/pci.c  |   18 +
 arch/powerpc/platforms/powernv/pci.h  |7 
 5 files changed, 543 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index de11de7d4547..011340df8583 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -177,6 +177,10 @@ struct pci_dn {
int pe_number;
 #ifdef CONFIG_PCI_IOV
u16 max_vfs;/* number of VFs IOV BAR expended */
+   u16 vf_pes; /* VF PE# under this PF */
+   int offset; /* PE# for the first VF PE */
+#define IODA_INVALID_M64(-1)
+   int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f3a1a81d112f..5faf7ca45434 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -217,6 +217,17 @@ void remove_dev_pci_info(struct pci_dev *pdev)
struct pci_dn *pdn, *tmp;
int i;
 
+   /*
+* VF and VF PE are created/released dynamically, so we need to
+* bind/unbind them.  Otherwise the VF and VF PE would be mismatched
+* when re-enabling SR-IOV.
+*/
+   if (pdev->is_virtfn) {
+   pdn = pci_get_pdn(pdev);
+   pdn->pe_number = IODA_INVALID_PE;
+   return;
+   }
+
/* Only support IOV PF for now */
if (!pdev->is_physfn)
return;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6a86690bb8de..a3c2fbe35fc8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -44,6 +44,9 @@
 #include "powernv.h"
 #include "pci.h"
 
+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
+#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
const char *fmt, ...)
 {
@@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, 
const char *level,
vaf.fmt = fmt;
vaf.va = &args;
 
-   if (pe->pdev)
+   if (pe->flags & PNV_IODA_PE_DEV)
strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix));
-   else
+   else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
sprintf(pfix, "%04x:%02x ",
pci_domain_nr(pe->pbus), pe->pbus->number);
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   sprintf(pfix, "%04x:%02x:%2x.%d",
+   pci_domain_nr(pe->parent_dev->bus),
+   (pe->rid & 0xff00) >> 8,
+   PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
+#endif /* CONFIG_PCI_IOV*/
 
printk("%spci %s: [PE# %.3d] %pV",
   level, pfix, pe->pe_number, &vaf);
@@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
  bool is_add)
 {
struct pnv_ioda_pe *slave;
-   struct pci_dev *pdev;
+   struct pci_dev *pdev = NULL;
int ret;
 
/*
@@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 
if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS))
pdev = pe->pbus->self;
-   else
+   else if (pe->flags & PNV_IODA_PE_DEV)
pdev = pe->pdev->bus->self;
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   pdev = pe->parent_dev->bus->self;
+#endif /* CONFIG_PCI_IOV */
while (pdev) {
struct pci_dn *pdn = pci_get_pdn(pdev);
struct pnv_ioda_pe *parent;
@@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
return 0;
 }
 
+#ifdef CONFIG_PCI_IOV
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+   struct pci_dev *parent;
+   uint8_t bcomp, dcomp, fcomp;
+   int64_t rc;
+   long rid_end, rid;
+
+   /* Currently, we just deconfigure VF PE. Bus PE will always be there. */
+   if (pe->pbus) {
+   int count;
+
+   dcomp = OPAL_IGNORE_RID_DEVI

[PATCH v12 16/21] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state: the IOV BAR size is a multiple of the VF BAR size
2. after expansion: the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage: the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handles these three cases respectively.

[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/include/asm/machdep.h|1 +
 arch/powerpc/kernel/pci-common.c  |   10 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   20 
 3 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 965547c58497..045448f9e8b2 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -252,6 +252,7 @@ struct machdep_calls {
 
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_bus *bus);
+   resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
resno);
 #endif /* CONFIG_PCI_IOV */
 
/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 022e9feeb1f2..2f1ad9ef4402 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
pci_reset_secondary_bus(dev);
 }
 
+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
+{
+   if (ppc_md.pcibios_iov_resource_alignment)
+   return ppc_md.pcibios_iov_resource_alignment(pdev, resno);
+
+   return pci_iov_resource_size(pdev, resno);
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 36c533da5ccb..6a86690bb8de 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1980,6 +1980,25 @@ static resource_size_t pnv_pci_window_alignment(struct 
pci_bus *bus,
return phb->ioda.io_segsize;
 }
 
+#ifdef CONFIG_PCI_IOV
+static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
+ int resno)
+{
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   resource_size_t align, iov_align;
+
+   iov_align = resource_size(&pdev->resource[resno]);
+   if (iov_align)
+   return iov_align;
+
+   align = pci_iov_resource_size(pdev, resno);
+   if (pdn->max_vfs)
+   return pdn->max_vfs * align;
+
+   return align;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -2182,6 +2201,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov;
+   ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v12 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

On PHB3, the PF IOV BAR is covered by an M64 window to get better PE
isolation.  The total_pe number is usually different from total_VFs, which
can lead to a conflict between MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of
the M64 window will fall on another PCI device, which may already belong
to other PEs.

Prevent the conflict by reserving additional space for the PF IOV BAR:
total_pe times the size of a single VF BAR.
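
As a rough illustration of the sizing described above (not code from this
patch): the sketch below compares the space the SR-IOV spec alone would
need, total_VFs VF BARs, with the space reserved here, total_pe VF BARs.
The helper names and the 64 KiB VF BAR size used in the note afterwards
are assumptions chosen only for the example.

```c
#include <stdint.h>

/* Hypothetical helper: space the spec needs, one VF BAR per possible VF. */
static uint64_t iov_bar_spec_size(uint64_t vf_bar_size, unsigned int total_vfs)
{
	return vf_bar_size * total_vfs;
}

/* Hypothetical helper: space reserved so that every PE gets its own slice. */
static uint64_t iov_bar_reserved_size(uint64_t vf_bar_size, unsigned int total_pe)
{
	return vf_bar_size * total_pe;
}
```

With an assumed 64 KiB VF BAR, total_VFs = 128 and total_pe = 256, the
spec size is 8 MiB and the reserved size is 16 MiB, so the whole M64
window stays inside the PF's own IOV BAR.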

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/include/asm/machdep.h|4 ++
 arch/powerpc/include/asm/pci-bridge.h |3 ++
 arch/powerpc/kernel/pci-common.c  |5 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   58 +
 4 files changed, 70 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index c8175a3fe560..965547c58497 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -250,6 +250,10 @@ struct machdep_calls {
/* Reset the secondary bus of bridge */
void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
 
+#ifdef CONFIG_PCI_IOV
+   void (*pcibios_fixup_sriov)(struct pci_bus *bus);
+#endif /* CONFIG_PCI_IOV */
+
/* Called to shutdown machine specific hardware not already controlled
 * by other drivers.
 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 513f8f27060d..de11de7d4547 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -175,6 +175,9 @@ struct pci_dn {
 #define IODA_INVALID_PE    (-1)
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
+#ifdef CONFIG_PCI_IOV
+   u16 max_vfs;    /* number of VFs IOV BAR expanded */
+#endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
struct list_head list;
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 82031011522f..022e9feeb1f2 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1646,6 +1646,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
if (ppc_md.pcibios_fixup_phb)
ppc_md.pcibios_fixup_phb(hose);
 
+#ifdef CONFIG_PCI_IOV
+   if (ppc_md.pcibios_fixup_sriov)
+   ppc_md.pcibios_fixup_sriov(bus);
+#endif /* CONFIG_PCI_IOV */
+
/* Configure PCI Express settings */
if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
struct pci_bus *child;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cd1a56160ded..36c533da5ccb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1749,6 +1749,61 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct resource *res;
+   int i;
+   resource_size_t size;
+   struct pci_dn *pdn;
+
+   if (!pdev->is_physfn || pdev->is_added)
+   return;
+
+   hose = pci_bus_to_host(pdev->bus);
+   phb = hose->private_data;
+
+   pdn = pci_get_pdn(pdev);
+   pdn->max_vfs = 0;
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
+i, res);
+   continue;
+   }
+
+   dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
+   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+   res->end = res->start + size * phb->ioda.total_pe - 1;
+   dev_dbg(&pdev->dev, "   %pR\n", res);
+   dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
+   i, res, phb->ioda.total_pe);
+   }
+   pdn->max_vfs = phb->ioda.total_pe;
+}
+
+static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
+{
+   struct pci_dev *pdev;
+   struct pci_bus *b;
+
+   list_for_each_entry(pdev, &bus->devices, bus_list) {
+   b = pdev->subordinate;
+
+   if (b)
+   pnv_pci_ioda_fixup_sriov(b);
+
+   pnv_pci_ioda_fixup_iov_resources(pdev);
+   }
+}
+#endif /* CONFIG_PCI_IOV */
+
 /*
  * This function is supposed to be called on basi

[PATCH v12 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

The iommu_table of a PE is currently a static field embedded in struct
pnv_ioda_pe, which causes a problem when iommu_free_table() is called.

Allocate iommu_table dynamically.
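
The change in lookup style can be sketched in plain C.  This is a
user-space illustration, not the kernel code: once the table is allocated
separately, container_of() can no longer recover the owner, so the table
carries a back-pointer instead.  struct table, struct owner and both
helpers are hypothetical stand-ins for iommu_table, pnv_ioda_pe and the
code in the diff below; unlike the hunk below, the sketch checks the
allocation result.

```c
#include <stddef.h>
#include <stdlib.h>

struct table {
	void *data;		/* back-pointer to the owner, like iommu_table.data */
};

struct owner {
	int id;
	struct table *tbl;	/* separately allocated, like pe->tce32_table */
};

/* Allocate the table and link it back to its owner; 0 on success. */
static int owner_alloc_table(struct owner *o)
{
	o->tbl = calloc(1, sizeof(*o->tbl));
	if (!o->tbl)
		return -1;
	o->tbl->data = o;
	return 0;
}

/* Replaces the container_of() lookup that only works for embedded fields. */
static struct owner *table_to_owner(struct table *t)
{
	return t->data;
}
```

Because the table is its own allocation, it can be freed independently of
the structure that points to it.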

Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/include/asm/iommu.h  |3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++
 arch/powerpc/platforms/powernv/pci.h  |2 +-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9cfa3706a1b8..5574eeb97634 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
struct iommu_group *it_group;
 #endif
void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+   void   *data;
+#endif
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 58c4fc4ab63c..cd1a56160ded 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
return;
}
 
+   pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+   GFP_KERNEL, hose->node);
+   pe->tce32_table->data = pe;
+
/* Associate it with all child devices */
pnv_ioda_setup_same_PE(bus, pe);
 
@@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
 
pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-   set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
-   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base(&pdev->dev, pe->tce32_table);
}
*pdev->dev.dma_mask = dma_mask;
return 0;
@@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
list_for_each_entry(dev, &bus->devices, bus_list) {
if (add_to_iommu_group)
set_iommu_table_base_and_group(&dev->dev,
-  &pe->tce32_table);
+  pe->tce32_table);
else
-   set_iommu_table_base(&dev->dev, &pe->tce32_table);
+   set_iommu_table_base(&dev->dev, pe->tce32_table);
 
if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 __be64 *startp, __be64 *endp, bool rm)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
struct pnv_phb *phb = pe->phb;
 
if (phb->type == PNV_PHB_IODA1)
@@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
  base << 28, IOMMU_PAGE_SHIFT_4K);
 
@@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;
 
@@ -1312,10 +1314,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
pe->tce_bypass_base = 1ull << 59;
 
/* Install set_bypass callback for VFIO */
-   pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
+   pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass;
 
/* Enable bypass by default */
-   pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
+   pnv_pci_ioda2_set_bypass(pe->tce32_table, true);
 }
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -1363,7 +1365,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
 

[PATCH v12 13/21] powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

The PCI config accessors previously relied on device_node.  Unfortunately,
VFs don't have a corresponding device_node, so change the accessors to use
pci_dn instead.

[bhelgaas: changelog]
Signed-off-by: Gavin Shan 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |   14 +
 arch/powerpc/platforms/powernv/pci.c |   69 ++
 arch/powerpc/platforms/powernv/pci.h |4 +-
 3 files changed, 40 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e261869adc86..7a5021b95a14 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -430,21 +430,31 @@ static inline bool powernv_eeh_cfg_blocked(struct device_node *dn)
 static int powernv_eeh_read_config(struct device_node *dn,
   int where, int size, u32 *val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn)) {
*val = 0xFFFFFFFF;
return PCIBIOS_SET_FAILED;
}
 
-   return pnv_pci_cfg_read(dn, where, size, val);
+   return pnv_pci_cfg_read(pdn, where, size, val);
 }
 
 static int powernv_eeh_write_config(struct device_node *dn,
int where, int size, u32 val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn))
return PCIBIOS_SET_FAILED;
 
-   return pnv_pci_cfg_write(dn, where, size, val);
+   return pnv_pci_cfg_write(pdn, where, size, val);
 }
 
 /**
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index e69142f4af08..6c20d6e70383 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -366,9 +366,9 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
spin_unlock_irqrestore(&phb->lock, flags);
 }
 
-static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
-struct device_node *dn)
+static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 {
+   struct pnv_phb *phb = pdn->phb->private_data;
u8  fstate;
__be16  pcierr;
int pe_no;
@@ -379,7 +379,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
 * setup that yet. So all ER errors should be mapped to
 * reserved PE.
 */
-   pe_no = PCI_DN(dn)->pe_number;
+   pe_no = pdn->pe_number;
if (pe_no == IODA_INVALID_PE) {
if (phb->type == PNV_PHB_P5IOC2)
pe_no = 0;
@@ -407,8 +407,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 
cfg_dbg(" -> EEH check, bdfn=%04x PE#%d fstate=%x\n",
-   (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn),
-   pe_no, fstate);
+   (pdn->busno << 8) | (pdn->devfn), pe_no, fstate);
 
/* Clear the frozen state if applicable */
if (fstate == OPAL_EEH_STOPPED_MMIO_FREEZE ||
@@ -425,10 +424,9 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 }
 
-int pnv_pci_cfg_read(struct device_node *dn,
+int pnv_pci_cfg_read(struct pci_dn *pdn,
 int where, int size, u32 *val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
s64 rc;
@@ -462,10 +460,9 @@ int pnv_pci_cfg_read(struct device_node *dn,
return PCIBIOS_SUCCESSFUL;
 }
 
-int pnv_pci_cfg_write(struct device_node *dn,
+int pnv_pci_cfg_write(struct pci_dn *pdn,
  int where, int size, u32 val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
 
@@ -489,18 +486,17 @@ int pnv_pci_cfg_write(struct device_node *dn,
 }
 
 #if CONFIG_EEH
-static bool pnv_pci_cfg_check(struct pci_controller *hose,
- struct device_node *dn)
+static bool pnv_pci_cfg_check(struct pci_dn *pdn)
 {
struct eeh_dev *edev = NULL;
-   struct pnv_phb *phb = hose->private_data;
+   struct pnv_phb *phb = pdn->phb->private_data;
 
/* EEH not enabled ? */
if (!(phb->flags & PNV_PHB_FLAG_EEH))
return true;
 
/* PE reset or device removed ? */
-   edev = of_node_to_eeh_dev(dn);
+   edev = pdn->edev;
if (edev) {
if (edev->pe &&
(edev->pe->state & EEH_PE_CFG_BLOCKED))
@@ -513,8 +509,7 @@ static bool pnv_pci_cfg_check(struct pci_controller *hose,
return true;
 }
 #else
-static inline pnv_pci_cfg_check(struct pci_controller *hose,
-   struct devi

[PATCH v12 12/21] powerpc/pci: Refactor pci_dn

2015-02-24 Thread Bjorn Helgaas
From: Gavin Shan 

pci_dn is an extension of the PCI device node and is created from the
device node.  Unfortunately, VFs are enabled dynamically by the PF's
driver, so they have neither a corresponding device node nor a pci_dn.
Refactor pci_dn to support VFs:

   * pci_dn is organized as a hierarchy tree.  A VF's pci_dn is put on
     the child list of the pci_dn of the PF's bridge.  The pci_dn of any
     other device is put on the child list of the pci_dn of its upstream
     bridge.

   * A VF's pci_dn is expected to be created dynamically when the PF
     enables VFs and destroyed when the PF disables them.  The pci_dn of
     any other device is still created from its device node, as before.

   * For one particular PCI device (VF or not), its pci_dn can be
     found from pdev->dev.archdata.firmware_data, PCI_DN(devnode), or the
     parent's list.  The fast path (fetching pci_dn through the PCI device
     instance) is populated during early fixup time.
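
The "parent's list" lookup from the last bullet can be pictured with a
small user-space sketch.  It assumes a simplified node with a singly
linked child list in place of the kernel's list_head, and struct node,
node_add_child() and node_find() are hypothetical names, not functions
from this patch.

```c
#include <stddef.h>

struct node {
	int busno;
	int devfn;
	struct node *parent;		/* upstream bridge (PF's bridge for a VF) */
	struct node *first_child;
	struct node *next_sibling;
};

/* Link a child under its bridge, as done for PFs and VFs alike. */
static void node_add_child(struct node *parent, struct node *child)
{
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
}

/* Slow path: walk the bridge's child list and match on bus/devfn. */
static struct node *node_find(struct node *parent, int busno, int devfn)
{
	struct node *n;

	for (n = parent->first_child; n; n = n->next_sibling)
		if (n->busno == busno && n->devfn == devfn)
			return n;
	return NULL;
}
```

A VF added at enable time is linked next to its PF under the same bridge
node, so the same walk finds both.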

[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/include/asm/device.h |3 
 arch/powerpc/include/asm/pci-bridge.h |   14 +-
 arch/powerpc/kernel/pci_dn.c  |  245 +
 arch/powerpc/platforms/powernv/pci-ioda.c |   16 ++
 4 files changed, 272 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 38faeded7d59..29992cd020bb 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -34,6 +34,9 @@ struct dev_archdata {
 #ifdef CONFIG_SWIOTLB
dma_addr_t  max_direct_dma_addr;
 #endif
+#ifdef CONFIG_PPC64
+   void*firmware_data;
+#endif
 #ifdef CONFIG_EEH
struct eeh_dev  *edev;
 #endif
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 546d036fe925..513f8f27060d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -89,6 +89,7 @@ struct pci_controller {
 
 #ifdef CONFIG_PPC64
unsigned long buid;
+   void *firmware_data;
 #endif /* CONFIG_PPC64 */
 
void *private_data;
@@ -154,9 +155,13 @@ static inline int isa_vaddr_is_ioport(void __iomem *address)
 struct iommu_table;
 
 struct pci_dn {
+   int flags;
+#define PCI_DN_FLAG_IOV_VF 0x01
+
int busno;  /* pci bus number */
int devfn;  /* pci device and function number */
 
+   struct  pci_dn *parent;
struct  pci_controller *phb;/* for pci devices */
struct  iommu_table *iommu_table;   /* for phb's or bridges */
struct  device_node *node;  /* back-pointer to the device_node */
@@ -171,14 +176,19 @@ struct pci_dn {
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
 #endif
+   struct list_head child_list;
+   struct list_head list;
 };
 
 /* Get the pointer to a device_node's pci_dn */
 #define PCI_DN(dn) ((struct pci_dn *) (dn)->data)
 
+extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
+  int devfn);
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
-
-extern void * update_dn_pci_info(struct device_node *dn, void *data);
+extern struct pci_dn *add_dev_pci_info(struct pci_dev *pdev);
+extern void remove_dev_pci_info(struct pci_dev *pdev);
+extern void *update_dn_pci_info(struct device_node *dn, void *data);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 83df3075d3df..f3a1a81d112f 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -32,12 +32,223 @@
 #include 
 #include 
 
+/*
+ * The function is used to find the firmware data of one
+ * specific PCI device, which is attached to the indicated
+ * PCI bus. For VFs, their firmware data is linked to that
+ * one of PF's bridge. For other devices, their firmware
+ * data is linked to that of their bridge.
+ */
+static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
+{
+   struct pci_bus *pbus;
+   struct device_node *dn;
+   struct pci_dn *pdn;
+
+   /*
+* We probably have virtual bus which doesn't
+* have associated bridge.
+*/
+   pbus = bus;
+   while (pbus) {
+   if (pci_is_root_bus(pbus) || pbus->self)
+   break;
+
+   pbus = pbus->parent;
+   }
+
+   /*
+* Except virtual bus, all PCI buses should
+* have device nodes.
+*/
+   dn = pci_bus_to_OF_node(pbus);
+   pdn = dn ? PCI_DN(dn) : NULL;
+
+   return pdn;
+}
+
+struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
+   int devfn)
+{
+   struct device_node *dn = NULL;
+   struct pci_dn *parent, *pdn;
+   

[PATCH v12 11/21] powerpc/pci: Don't unset PCI resources for VFs

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

If we're going to reassign resources with flag PCI_REASSIGN_ALL_RSRC, all
resources will be cleaned out during device header fixup time and then get
reassigned by PCI core.  However, the VF resources won't be reassigned and
thus, we shouldn't clean them out.

If the pci_dev is a VF, skip the resource unset process.

Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/kernel/pci-common.c |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 2a525c938158..82031011522f 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
   pci_name(dev));
return;
}
+
+   if (dev->is_virtfn)
+   return;
+
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
struct resource *res = dev->resource + i;
struct pci_bus_region reg;

[PATCH v12 10/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

When sizing and assigning resources, we divide the resources into two
lists: the requested list and the additional list.  We don't consider the
alignment of additional VF(n) BAR space.

This is reasonable because the alignment required for the VF(n) BAR space
is the size of an individual VF BAR, not the size of the space for *all*
VFs.  But some platforms, e.g., PowerNV, require additional alignment.

Consider the additional IOV BAR alignment when sizing and assigning
resources.  When there is not enough system MMIO space, the PF's IOV BAR
alignment will not contribute to the bridge.  When there is enough system
MMIO space, the additional alignment will contribute to the bridge.

Also, take advantage of pci_dev_resource::min_align to store this
additional alignment.

[bhelgaas: changelog, printk cast]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/setup-bus.c |   83 ---
 1 file changed, 70 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index e3e17f3c0f0f..affbceae560f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head,
}
 }
 
-static resource_size_t get_res_add_size(struct list_head *head,
-   struct resource *res)
+static struct pci_dev_resource *res_to_dev_res(struct list_head *head,
+  struct resource *res)
 {
struct pci_dev_resource *dev_res;
 
@@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head,
int idx = res - &dev_res->dev->resource[0];
 
dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-"res[%d]=%pR get_res_add_size add_size %llx\n",
+"res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n",
 idx, dev_res->res,
-(unsigned long long)dev_res->add_size);
+(unsigned long long)dev_res->add_size,
+(unsigned long long)dev_res->min_align);
 
-   return dev_res->add_size;
+   return dev_res;
}
}
 
-   return 0;
+   return NULL;
+}
+
+static resource_size_t get_res_add_size(struct list_head *head,
+   struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->add_size : 0;
+}
+
+static resource_size_t get_res_add_align(struct list_head *head,
+struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->min_align : 0;
 }
 
+
 /* Sort resources by alignment */
 static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
@@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head,
LIST_HEAD(save_head);
LIST_HEAD(local_fail_head);
struct pci_dev_resource *save_res;
-   struct pci_dev_resource *dev_res, *tmp_res;
+   struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
unsigned long fail_type;
+   resource_size_t add_align, align;
 
/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -384,10 +405,38 @@ static void __assign_resources_sorted(struct list_head *head,
}
 
/* Update res in head list with add_size in realloc_head list */
-   list_for_each_entry(dev_res, head, list)
+   list_for_each_entry_safe(dev_res, tmp_res, head, list) {
dev_res->res->end += get_res_add_size(realloc_head,
dev_res->res);
 
+   /*
+* There are two kinds of additional resources in the list:
+* 1. bridge resource  -- IORESOURCE_STARTALIGN
+* 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
+* Here just fix the additional alignment for bridge
+*/
+   if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+   continue;
+
+   add_align = get_res_add_align(realloc_head, dev_res->res);
+
+   /* Reorder the list by their alignment */
+   if (add_align > dev_res->res->start) {
+   dev_res->res->start = add_align;
+   dev_res->res->end = add_align +
+   resource_size(dev_res->res);
+
+   list_for_each_entry(dev_res2, head, list) {
+   align = pci_resource_alignment(dev_res2->dev,
+ 

[PATCH v12 09/21] PCI: Add pcibios_iov_resource_alignment() interface

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV
BAR is the size of an individual VF BAR, and the size consumed is the
individual VF BAR size times NumVFs.

The PowerNV platform has additional alignment requirements to help support
its Partitionable Endpoint device isolation feature (see
Documentation/powerpc/pci_iov_resource_on_powernv.txt).

Add a pcibios_iov_resource_alignment() interface to allow platforms to
request additional alignment.
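
The weak-default pattern used for this interface can be sketched outside
the kernel.  The sketch assumes GCC or Clang, where
__attribute__((weak)) plays the role of the kernel's __weak; the function
names and the 64 KiB VF BAR size are made up for the example.

```c
#include <stdint.h>

/* Stand-in for pci_iov_resource_size(): size of one VF BAR. */
static uint64_t vf_bar_size(int resno)
{
	(void)resno;
	return 0x10000;	/* assumed 64 KiB, for illustration only */
}

/*
 * Weak default: alignment equals the size of one VF BAR.  A platform
 * that needs more supplies a strong definition of the same symbol in
 * another object file, and the linker picks that one instead.
 */
__attribute__((weak)) uint64_t platform_iov_alignment(int resno)
{
	return vf_bar_size(resno);
}

/* Generic code only ever calls the hook, never the default directly. */
uint64_t iov_resource_alignment(int resno)
{
	return platform_iov_alignment(resno);
}
```

With no platform override linked in, the generic path returns the spec
alignment, which matches the behavior before this patch.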

[bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(),
drop "align" parameter]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c   |8 +++-
 include/linux/pci.h |1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index cc6fedf4a1b9..bde0f02cae32 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
+ int resno)
+{
+   return pci_iov_resource_size(dev, resno);
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-   return pci_iov_resource_size(dev, resno);
+   return pcibios_iov_resource_alignment(dev, resno);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 99ea94835fb6..4e1f17db1a81 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 unsigned long type);
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)

[PATCH v12 08/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

VFs are dynamically created when a driver enables them.  On some platforms,
like PowerNV, special resources are necessary to enable VFs.

Add platform hooks for enabling and disabling VFs.
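
The ordering these hooks imply can be sketched as follows.  Note the
substitution: the patch uses __weak functions, while this user-space
sketch uses a table of function pointers so that it stays self-contained.
All names here are hypothetical, and vfs_added merely stands in for the
VFs that virtfn_add() would create.

```c
#include <stddef.h>

struct sriov_hooks {
	int (*enable)(int num_vfs);	/* platform setup, may fail */
	int (*disable)(void);		/* platform teardown */
};

static int vfs_added;	/* stand-in for the VF devices that exist */

/* The enable hook runs first; on failure no VF is created. */
static int sketch_sriov_enable(const struct sriov_hooks *h, int num_vfs)
{
	int rc;

	if (h && h->enable) {
		rc = h->enable(num_vfs);
		if (rc)
			return rc;
	}
	vfs_added = num_vfs;
	return 0;
}

/* VFs are removed first, then the platform releases its resources. */
static void sketch_sriov_disable(const struct sriov_hooks *h)
{
	vfs_added = 0;
	if (h && h->disable)
		h->disable();
}

/* Example platform hook that always refuses. */
static int refuse_enable(int num_vfs)
{
	(void)num_vfs;
	return -1;
}
```

A platform without special requirements passes no hooks and sees the
previous behavior.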

Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5643a1011e23..cc6fedf4a1b9 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
pci_dev_put(dev);
 }
 
+int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num)
+{
+   return 0;
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
int rc;
@@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_sriov *iov = dev->sriov;
int bars = 0;
int bus;
+   int retval;
 
if (!nr_virtfn)
return 0;
@@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
if (nr_virtfn < initial)
initial = nr_virtfn;
 
+   if ((retval = pcibios_sriov_enable(dev, initial))) {
+   dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n",
+   retval);
+   return retval;
+   }
+
for (i = 0; i < initial; i++) {
rc = virtfn_add(dev, i, 0);
if (rc)
@@ -335,6 +347,11 @@ failed:
return rc;
 }
 
+int __weak pcibios_sriov_disable(struct pci_dev *pdev)
+{
+   return 0;
+}
+
 static void sriov_disable(struct pci_dev *dev)
 {
int i;
@@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev)
for (i = 0; i < iov->num_VFs; i++)
virtfn_remove(dev, i, 0);
 
+   pcibios_sriov_disable(dev);
+
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);

[PATCH v12 07/21] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

On PowerNV, some resource reservation is needed for SR-IOV VFs that don't
exist at the bootup stage.  To match resources with VFs, the code needs to
get the VF's BDF in advance.

Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() and export them.
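
The BDF arithmetic performed by these two functions can be tried in user
space.  The sketch mirrors the expressions in the diff below; struct
pf_info is a hypothetical stand-in for the fields read from struct
pci_dev and struct pci_sriov, and the is_physfn check is left out.

```c
struct pf_info {
	int bus;	/* PF bus number */
	int devfn;	/* PF device/function number */
	int offset;	/* First VF Offset */
	int stride;	/* VF Stride */
};

/* Bus number of VF vf_id: carry out of the 8-bit devfn moves to the next bus. */
static int vf_bus(const struct pf_info *pf, int vf_id)
{
	return pf->bus + ((pf->devfn + pf->offset + pf->stride * vf_id) >> 8);
}

/* devfn of VF vf_id: the low 8 bits of the same sum. */
static int vf_devfn(const struct pf_info *pf, int vf_id)
{
	return (pf->devfn + pf->offset + pf->stride * vf_id) & 0xff;
}
```

For a PF at bus 1, devfn 0 with an assumed offset of 0x80 and stride of 2,
VF 63 lands on bus 1 at devfn 0xfe and VF 64 spills over to bus 2 at
devfn 0.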

[bhelgaas: changelog, make "busnr" int]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c   |   28 
 include/linux/pci.h |   11 +++
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2ae921f84bd3..5643a1011e23 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,16 +19,20 @@
 
 #define VIRTFN_ID_LEN  16
 
-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
+int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return dev->bus->number + ((dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) >> 8);
+   dev->sriov->stride * vf_id) >> 8);
 }
 
-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return (dev->devfn + dev->sriov->offset +
-   dev->sriov->stride * id) & 0xff;
+   dev->sriov->stride * vf_id) & 0xff;
 }
 
 /*
@@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev)
struct pci_sriov *iov = dev->sriov;
int nr_virtfn;
u8 max = 0;
-   u8 busnr;
+   int busnr;
 
for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
pci_iov_set_numvfs(dev, nr_virtfn);
-   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (busnr > max)
max = busnr;
}
@@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
struct pci_bus *bus;
 
mutex_lock(&iov->dev->sriov->lock);
-   bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+   bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
if (!bus)
goto failed;
 
@@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
if (!virtfn)
goto failed0;
 
-   virtfn->devfn = virtfn_devfn(dev, id);
+   virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
virtfn->vendor = dev->vendor;
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
pci_setup_device(virtfn);
@@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
struct pci_sriov *iov = dev->sriov;
 
virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-virtfn_bus(dev, id),
-virtfn_devfn(dev, id));
+pci_iov_virtfn_bus(dev, id),
+pci_iov_virtfn_devfn(dev, id));
if (!virtfn)
return;
 
@@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_dev *pdev;
struct pci_sriov *iov = dev->sriov;
int bars = 0;
-   u8 bus;
+   int bus;
 
if (!nr_virtfn)
return 0;
@@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov->offset = offset;
iov->stride = stride;
 
-   bus = virtfn_bus(dev, nr_virtfn - 1);
+   bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1);
if (bus > dev->bus->busn_res.end) {
dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
nr_virtfn, bus, &dev->bus->busn_res);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 15596582e575..99ea94835fb6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
+
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virt

[PATCH v12 06/21] PCI: Calculate maximum number of buses required for VFs

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

An SR-IOV device can change its First VF Offset and VF Stride based on the
values of ARI Capable Hierarchy and NumVFs.  The number of buses required
for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see
SR-IOV spec r1.1, sec 2.1.2).

Previously pci_iov_bus_range() computed how many buses would be required by
TotalVFs, but this was based on a single NumVFs value and may not have been
the maximum for all NumVFs configurations.

Iterate over all valid NumVFs and calculate the maximum number of bus
numbers that could ever be required for VFs of this device.
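The point about TotalVFs not necessarily being the worst case can be seen in a small user-space sketch. Everything here is hypothetical: the `pf_model` and `vf_layout` types and the offset/stride table are invented for illustration, and the bus formula is assumed to follow the usual routing-ID arithmetic (PF bus plus the carry out of devfn + offset + stride * id). It is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: First VF Offset and VF Stride for one NumVFs value. */
struct vf_layout {
	uint16_t offset;
	uint16_t stride;
};

/* Hypothetical PF description; layout[n - 1] applies when NumVFs == n. */
struct pf_model {
	uint8_t bus;
	uint8_t devfn;
	int total_vfs;
	const struct vf_layout *layout;
};

/* Bus number of VF "id" under a given layout (assumed routing-ID arithmetic). */
static int vf_bus(const struct pf_model *pf, const struct vf_layout *l, int id)
{
	assert(id >= 0);
	return pf->bus + ((pf->devfn + l->offset + l->stride * id) >> 8);
}

/* Try every valid NumVFs and keep the highest bus number the last VF needs. */
static int max_vf_bus(const struct pf_model *pf)
{
	int max = 0;

	for (int n = 1; n <= pf->total_vfs; n++) {
		int busnr = vf_bus(pf, &pf->layout[n - 1], n - 1);

		if (busnr > max)
			max = busnr;
	}
	return max;
}

/* Invented example: NumVFs == 2 needs a higher bus than NumVFs == TotalVFs. */
static const struct vf_layout example_layout[] = {
	{ 0x100, 1 }, { 0x300, 1 }, { 0x100, 1 }, { 0x080, 1 },
};
static const struct pf_model example_pf = { 1, 0x00, 4, example_layout };
```

With this made-up table, looking only at NumVFs == TotalVFs would report bus 1, while the loop finds that NumVFs == 2 reaches bus 4.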

[bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove
kernel-doc comment marker]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |   31 +++
 drivers/pci/pci.h |1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index a8752c2c2b53..2ae921f84bd3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
 
+/*
+ * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
+ * determine how many additional bus numbers will be consumed by VFs.
+ *
+ * Iterate over all valid NumVFs and calculate the maximum number of bus
+ * numbers that could ever be required.
+ */
+static inline u8 virtfn_max_buses(struct pci_dev *dev)
+{
+   struct pci_sriov *iov = dev->sriov;
+   int nr_virtfn;
+   u8 max = 0;
+   u8 busnr;
+
+   for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) {
+   pci_iov_set_numvfs(dev, nr_virtfn);
+   busnr = virtfn_bus(dev, nr_virtfn - 1);
+   if (busnr > max)
+   max = busnr;
+   }
+
+   return max;
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -427,6 +451,7 @@ found:
 
dev->sriov = iov;
dev->is_physfn = 1;
+   iov->max_VF_buses = virtfn_max_buses(dev);
 
return 0;
 
@@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev)
 int pci_iov_bus_range(struct pci_bus *bus)
 {
int max = 0;
-   u8 busnr;
struct pci_dev *dev;
 
list_for_each_entry(dev, &bus->devices, bus_list) {
if (!dev->is_physfn)
continue;
-   busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
-   if (busnr > max)
-   max = busnr;
+   if (dev->sriov->max_VF_buses > max)
+   max = dev->sriov->max_VF_buses;
}
 
return max ? max - bus->number : 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 57329645dd01..bae593c04541 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -243,6 +243,7 @@ struct pci_sriov {
u16 stride; /* following VF stride */
u32 pgsz;   /* page size for BAR alignment */
u8 link;/* Function Dependency Link */
+   u8 max_VF_buses;/* max buses consumed by VFs */
u16 driver_max_VFs; /* max num VFs driver supports */
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v12 05/21] PCI: Refresh First VF Offset and VF Stride when updating NumVFs

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

The First VF Offset and VF Stride fields depend on the NumVFs setting, so
refresh the cached fields in struct pci_sriov when updating NumVFs.  See
the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10.
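A rough user-space illustration of why the cached values need refreshing follows. The register block, the cache struct, and the rule for how the fake device changes its layout are all assumptions made up for this sketch; real hardware decides its own offset and stride.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SR-IOV capability registers of one PF. */
struct fake_sriov_regs {
	uint16_t num_vfs;
	uint16_t vf_offset;
	uint16_t vf_stride;
};

/* Software copy of offset/stride, loosely modelled on the cached fields. */
struct iov_cache {
	uint16_t offset;
	uint16_t stride;
};

/* Invented device behaviour: the layout changes once NumVFs exceeds 8. */
static void fake_write_numvfs(struct fake_sriov_regs *r, uint16_t n)
{
	r->num_vfs = n;
	r->vf_offset = (n > 8) ? 0x100 : 0x10;
	r->vf_stride = (n > 8) ? 2 : 1;
}

/* Write NumVFs, then immediately re-read offset and stride into the cache. */
static void set_numvfs(struct fake_sriov_regs *r, struct iov_cache *c,
		       uint16_t n)
{
	assert(r && c);
	fake_write_numvfs(r, n);
	c->offset = r->vf_offset;
	c->stride = r->vf_stride;
}
```

Funnelling every NumVFs write through one helper means no caller can leave the cache stale, which appears to be the motivation for replacing the open-coded config writes.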

[bhelgaas: changelog, remove kernel-doc comment marker]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |   23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 27b98c361823..a8752c2c2b53 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
dev->sriov->stride * id) & 0xff;
 }
 
+/*
+ * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
+ * change when NumVFs changes.
+ *
+ * Update iov->offset and iov->stride when NumVFs is written.
+ */
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+{
+   struct pci_sriov *iov = dev->sriov;
+
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
+}
+
 static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
 {
struct pci_bus *child;
@@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
return rc;
}
 
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
+   pci_iov_set_numvfs(dev, nr_virtfn);
iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
@@ -282,7 +297,7 @@ failed:
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
ssleep(1);
pci_cfg_access_unlock(dev);
 
@@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev)
sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
iov->num_VFs = 0;
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0);
+   pci_iov_set_numvfs(dev, 0);
 }
 
 static int sriov_init(struct pci_dev *dev, int pos)
@@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev)
pci_update_resource(dev, i);
 
pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
+   pci_iov_set_numvfs(dev, iov->num_VFs);
pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
msleep(100);


[PATCH v12 04/21] PCI: Index IOV resources in the conventional style

2015-02-24 Thread Bjorn Helgaas
Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i".
Use that style in iov.c also.

No functional change.
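As a quick sanity check of the "no functional change" claim, the two spellings address the same array element. The sketch below is a user-space illustration only; `struct res` and `IOV_BASE` are hypothetical stand-ins for the resource type and PCI_IOV_RESOURCES, and the value 7 is arbitrary.

```c
#include <assert.h>

/* Hypothetical stand-in for a resource entry. */
struct res {
	int flags;
};

/* Hypothetical stand-in for PCI_IOV_RESOURCES; the value is arbitrary. */
#define IOV_BASE 7

/* Returns 1 when both indexing styles yield the same element address. */
static int same_element(struct res *table, int i)
{
	assert(table && i >= 0);
	return &table[i + IOV_BASE] == table + IOV_BASE + i;
}
```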

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5bca0e1a2799..27b98c361823 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
virtfn->multifunction = 0;
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
if (!res->parent)
continue;
virtfn->resource[i].name = pci_name(virtfn);
@@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
bars |= (1 << (i + PCI_IOV_RESOURCES));
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
if (res->parent)
nres++;
}
@@ -373,7 +373,7 @@ found:
 
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
bar64 = __pci_read_base(dev, pci_bar_unknown, res,
pos + PCI_SRIOV_BAR + i * 4);
if (!res->flags)
@@ -417,7 +417,7 @@ found:
 
 failed:
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   res = dev->resource + PCI_IOV_RESOURCES + i;
+   res = &dev->resource[i + PCI_IOV_RESOURCES];
res->flags = 0;
}
 


[PATCH v12 03/21] PCI: Keep individual VF BAR size in struct pci_sriov

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

Currently we don't store the individual VF BAR size.  We calculate it when
needed by dividing the PF's IOV resource size (which contains space for
*all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability
again.

Keep the individual VF BAR size in struct pci_sriov.barsz[], add
pci_iov_resource_size() to retrieve it, and use that instead of doing the
division or reading the SR-IOV capability BAR.
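The arithmetic that the stored per-VF size feeds into can be sketched in user space as below. The `range` type and both helper names are hypothetical; the formulas mirror the start/end computation visible in virtfn_add() and the PF resource sizing in sriov_init(), under the assumption that all VF BARs of one index are laid out back to back.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range, a hypothetical stand-in for struct resource. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Slice of the PF's IOV resource that belongs to VF "id". */
static struct range vf_bar_range(uint64_t pf_iov_start, uint64_t vf_bar_size,
				 int id)
{
	struct range r;

	assert(id >= 0);
	r.start = pf_iov_start + vf_bar_size * (uint64_t)id;
	r.end = r.start + vf_bar_size - 1;
	return r;
}

/* Last address of the PF's IOV resource, which holds total_vfs VF BARs. */
static uint64_t pf_iov_end(uint64_t pf_iov_start, uint64_t vf_bar_size,
			   int total_vfs)
{
	assert(total_vfs > 0);
	return pf_iov_start + vf_bar_size * (uint64_t)total_vfs - 1;
}
```

For example, with a 64 KiB VF BAR starting at 0x80000000, VF 3 would occupy 0x80030000 through 0x8003ffff, and eight VFs would end at 0x8007ffff.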

[bhelgaas: rename to "barsz[]", simplify barsz[] index computation, remove
SR-IOV capability BAR sizing]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c   |   39 ---
 drivers/pci/pci.h   |1 +
 include/linux/pci.h |3 +++
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 05f9d97e4175..5bca0e1a2799 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
pci_remove_bus(virtbus);
 }
 
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
+{
+   if (!dev->is_physfn)
+   return 0;
+
+   return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
+}
+
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
int i;
@@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
continue;
virtfn->resource[i].name = pci_name(virtfn);
virtfn->resource[i].flags = res->flags;
-   size = resource_size(res);
-   do_div(size, iov->total_VFs);
+   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
virtfn->resource[i].start = res->start + size * id;
virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
rc = request_resource(res, &virtfn->resource[i]);
@@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev)
 
 static int sriov_init(struct pci_dev *dev, int pos)
 {
-   int i;
+   int i, bar64;
int rc;
int nres;
u32 pgsz;
@@ -360,29 +367,29 @@ found:
pgsz &= ~(pgsz - 1);
pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
 
+   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+   if (!iov)
+   return -ENOMEM;
+
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = dev->resource + PCI_IOV_RESOURCES + i;
-   i += __pci_read_base(dev, pci_bar_unknown, res,
-pos + PCI_SRIOV_BAR + i * 4);
+   bar64 = __pci_read_base(dev, pci_bar_unknown, res,
+   pos + PCI_SRIOV_BAR + i * 4);
if (!res->flags)
continue;
if (resource_size(res) & (PAGE_SIZE - 1)) {
rc = -EIO;
goto failed;
}
+   iov->barsz[i] = resource_size(res);
res->end = res->start + resource_size(res) * total - 1;
dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
 i, res, i, total);
+   i += bar64;
nres++;
}
 
-   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
-   if (!iov) {
-   rc = -ENOMEM;
-   goto failed;
-   }
-
iov->pos = pos;
iov->nres = nres;
iov->ctrl = ctrl;
@@ -414,6 +421,7 @@ failed:
res->flags = 0;
}
 
+   kfree(iov);
return rc;
 }
 
@@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno)
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
-   struct resource tmp;
-   int reg = pci_iov_resource_bar(dev, resno);
-
-   if (!reg)
-   return 0;
-
-__pci_read_base(dev, pci_bar_unknown, &tmp, reg);
-   return resource_alignment(&tmp);
+   return pci_iov_resource_size(dev, resno);
 }
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4091f82239cd..57329645dd01 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -247,6 +247,7 @@ struct pci_sriov {
struct pci_dev *dev;/* lowest numbered PF */
struct pci_dev *self;   /* this PF */
struct mutex lock;  /* lock for VF bus */
+   resource_size_t barsz[PCI_SRIOV_NUM_BARS];  /* VF BAR size */
 };
 
 #ifdef CONFIG_PCI_ATS
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 211e9da8a7d7..15596582e575 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #e

[PATCH v12 02/21] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space

2015-02-24 Thread Bjorn Helgaas
From: Wei Yang 

When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we
learn the alignment requirement and amount of space consumed by a single
VF.  But when VFs are enabled, *each* of the NumVFs consumes that amount of
space, so the total size of the PF resource is "VF BAR size * NumVFs".

Add a printk of the total space consumed by the VFs corresponding to what
we already do for normal non-IOV BARs.

No functional change; new message only.

[bhelgaas: split out into its own patch]
Signed-off-by: Wei Yang 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c4c33ead03bc..05f9d97e4175 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -372,6 +372,8 @@ found:
goto failed;
}
res->end = res->start + resource_size(res) * total - 1;
+   dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n",
+i, res, i, total);
nres++;
}
 


[PATCH v12 01/21] PCI: Print more info in sriov_enable() error message

2015-02-24 Thread Bjorn Helgaas
If we don't have space for all the bus numbers required to enable VFs,
print the largest bus number required and the range available.

No functional change; improved error message only.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iov.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4b3a4eaad996..c4c33ead03bc 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
struct pci_dev *pdev;
struct pci_sriov *iov = dev->sriov;
int bars = 0;
+   u8 bus;
 
if (!nr_virtfn)
return 0;
@@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov->offset = offset;
iov->stride = stride;
 
-   if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
-   dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
+   bus = virtfn_bus(dev, nr_virtfn - 1);
+   if (bus > dev->bus->busn_res.end) {
+   dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n",
+   nr_virtfn, bus, &dev->bus->busn_res);
return -ENOMEM;
}
 


[PATCH v12 00/21] Enable SRIOV on Power8

2015-02-24 Thread Bjorn Helgaas
Wei Yang's most recent POWER8 SR-IOV patchset was v11, posted on Jan 15,
2015.

I'm having a hard time keeping everything straight between the tweaks I've
made on my branch and incremental updates.  I think it's easier to repost
the whole series so one can easily collect everything that goes together.
So here's a v12 with the changes I've made.

Wei, please follow up with a v13 to fix anything I broke here.  Here's how
I would do that using stgit:

  git checkout -b pci/virtualization-v13 pci/virtualization-v12
  stg init
  stg uncommit -n 21
  
  stg mail -v v13 ... pci-print-more-info-in..powerpc-pci-add-pci-resource

I put v10, v11, and v12 on branches based on v4.0-rc1:

  pci/virtualization-v10 (posted 12/22/2014)
  pci/virtualization-v11 (posted 01/15/2015)
  pci/virtualization-v12 (this posting)

This makes it relatively easy to diff the versions, e.g.,

  git diff pci/virtualization-v11 pci/virtualization-v12

These branches are at

  https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/

v12:
   * remove "align" parameter from pcibios_iov_resource_alignment()
 default version returns pci_iov_resource_size() instead of the
 "align" parameter
   * in powerpc pcibios_iov_resource_alignment(), return
 pci_iov_resource_size() if there's no ppc_md function pointer
   * in pci_sriov_resource_alignment(), don't re-read base, since we
 saved the required alignment when reading it the first time
   * remove "vf_num" parameter from add_dev_pci_info() and
 remove_dev_pci_info(); use pci_sriov_get_totalvfs() instead
   * use dev_warn() instead of pr_warn() when possible
   * check to be sure IOV BAR is still in range after shifting, change
 pnv_pci_vf_resource_shift() from void to int
   * improve sriov_enable() error message
   * improve SR-IOV BAR sizing message
   * index IOV resources in conventional style
   * include preamble patches (refresh offset/stride when updating numVFs,
 calculate max buses required)
   * restructure pci_iov_max_bus_range() to return value instead of updating
 internally, rename to virtfn_max_buses()
   * fix typos & formatting
   * expand documentation

Bjorn

---

Bjorn Helgaas (2):
  PCI: Print more info in sriov_enable() error message
  PCI: Index IOV resources in the conventional style

Gavin Shan (1):
  powerpc/pci: Refactor pci_dn

Wei Yang (18):
  PCI: Print PF SR-IOV resource that contains all VF(n) BAR space
  PCI: Keep individual VF BAR size in struct pci_sriov
  PCI: Refresh First VF Offset and VF Stride when updating NumVFs
  PCI: Calculate maximum number of buses required for VFs
  PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()
  PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()
  PCI: Add pcibios_iov_resource_alignment() interface
  PCI: Consider additional PF's IOV BAR alignment in sizing and assigning
  powerpc/pci: Don't unset PCI resources for VFs
  powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
  powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
  powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
  powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
  powerpc/powernv: Shift VF resource with an offset
  powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
  powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
  powerpc/pci: Remove unused struct pci_dn.pcidev field
  powerpc/pci: Add PCI resource alignment documentation


 .../powerpc/pci_iov_resource_on_powernv.txt|  305 
 arch/powerpc/include/asm/device.h  |3 
 arch/powerpc/include/asm/iommu.h   |3 
 arch/powerpc/include/asm/machdep.h |5 
 arch/powerpc/include/asm/pci-bridge.h  |   24 +
 arch/powerpc/kernel/pci-common.c   |   19 
 arch/powerpc/kernel/pci_dn.c   |  256 ++-
 arch/powerpc/platforms/powernv/eeh-powernv.c   |   14 
 arch/powerpc/platforms/powernv/pci-ioda.c  |  777 +++-
 arch/powerpc/platforms/powernv/pci.c   |   87 +-
 arch/powerpc/platforms/powernv/pci.h   |   13 
 drivers/pci/iov.c  |  155 +++-
 drivers/pci/pci.h  |2 
 drivers/pci/setup-bus.c|   83 ++
 include/linux/pci.h|   15 
 15 files changed, 1622 insertions(+), 139 deletions(-)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

Re: [PATCH V11 08/17] powerpc/pci: Refactor pci_dn

2015-02-24 Thread Benjamin Herrenschmidt
On Tue, 2015-02-24 at 02:13 -0600, Bjorn Helgaas wrote:
> 
> Ah, yes, now I see the problem.  I don't really like having to export
> pci_iov_virtfn_bus() and pci_iov_virtfn_devfn(), but it's probably not
> worth the hassle of changing it, and I think adding more pcibios interfaces
> would be even worse.

Aren't we going to eventually turn them all into host bridge ops ? :-)

Cheers,
Ben.



Re: [PATCH V11 08/17] powerpc/pci: Refactor pci_dn

2015-02-24 Thread Bjorn Helgaas
On Mon, Feb 23, 2015 at 11:13:49AM +1100, Gavin Shan wrote:
> On Fri, Feb 20, 2015 at 05:19:17PM -0600, Bjorn Helgaas wrote:
> >On Thu, Jan 15, 2015 at 10:27:58AM +0800, Wei Yang wrote:
> >> From: Gavin Shan 
> >> 
> >> pci_dn is the extension of PCI device node and it's created from
> >> device node. Unfortunately, VFs that are enabled dynamically by
> >> PF's driver and they don't have corresponding device nodes, and
> >> pci_dn. The patch refactors pci_dn to support VFs:
> >> 
> >>* pci_dn is organized as a hierarchy tree. VF's pci_dn is put
> >>  to the child list of pci_dn of PF's bridge. pci_dn of other
> >>  device put to the child list of pci_dn of its upstream bridge.
> >> 
> >>* VF's pci_dn is expected to be created dynamically when PF
> >>  enabling VFs. VF's pci_dn will be destroyed when PF disabling
> >>  VFs. pci_dn of other device is still created from device node
> >>  as before.
> >> 
> >>* For one particular PCI device (VF or not), its pci_dn can be
> >>  found from pdev->dev.archdata.firmware_data, PCI_DN(devnode),
> >>  or parent's list. The fast path (fetching pci_dn through PCI
> >>  device instance) is populated during early fixup time.
> >> 
> >> Signed-off-by: Gavin Shan 
> >> ---
> >>  arch/powerpc/include/asm/device.h |3 +
> >>  arch/powerpc/include/asm/pci-bridge.h |   14 +-
> >>  arch/powerpc/kernel/pci_dn.c  |  242 -
> >>  arch/powerpc/platforms/powernv/pci-ioda.c |   16 ++
> >>  4 files changed, 270 insertions(+), 5 deletions(-)
> >> ...
> >
> >> +#ifdef CONFIG_PCI_IOV
> >> +static struct pci_dn *add_one_dev_pci_info(struct pci_dn *parent,
> >> + struct pci_dev *pdev,
> >> + int busno, int devfn)
> >> +{
> >> +  struct pci_dn *pdn;
> >> +
> >> +  /* Except PHB, we always have parent firmware data */
> >> +  if (!parent)
> >> +  return NULL;
> >> +
> >> +  pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
> >> +  if (!pdn) {
> >> +  pr_warn("%s: Out of memory !\n", __func__);
> >> +  return NULL;
> >> +  }
> >> +
> >> +  pdn->phb = parent->phb;
> >> +  pdn->parent = parent;
> >> +  pdn->busno = busno;
> >> +  pdn->devfn = devfn;
> >> +#ifdef CONFIG_PPC_POWERNV
> >> +  pdn->pe_number = IODA_INVALID_PE;
> >> +#endif
> >> +  INIT_LIST_HEAD(&pdn->child_list);
> >> +  INIT_LIST_HEAD(&pdn->list);
> >> +  list_add_tail(&pdn->list, &parent->child_list);
> >> +
> >> +  /*
> >> +   * If we already have PCI device instance, lets
> >> +   * bind them.
> >> +   */
> >> +  if (pdev)
> >> +  pdev->dev.archdata.firmware_data = pdn;
> >> +
> >> +  return pdn;
> >
> >I'd like to see this done in pcibios_add_device(), as I mentioned in
> >response to "[PATCH V11 01/17] PCI/IOV: Export interface for retrieve VF's
> >BDF".  Maybe that's not feasible for some reason, but it would be a nicer
> >design if it's possible.
> >
> >The remove_dev_pci_info() work would be done in pcibios_release_device()
> >then, of course.
> >
> 
> Yes, it's not feasible. PCI config accessors rely on VF's pci_dn. Before
> calling pcibios_add_device(), we need access VF's config space.  That means
> we need VF's pci_dn before pci_setup_device() as follows:
> 
> sriov_enable()
>     pcibios_sriov_enable();     /* Currently, VF's pci_dn is created at this point */
>     virtfn_add();
>         virtfn_add_bus();       /* Create virtual bus if necessary */
>                                 /* ---> A */
>         pci_alloc_dev();        /* ---> B */
>         pci_setup_device(vf);   /* Access VF's config space */
>             pci_read_config_byte(vf, PCI_HEADER_TYPE);
>             pci_read_config_dword(vf, PCI_CLASS_REVISION);
>             pci_fixup_device(pci_fixup_early, vf);
>             pci_read_irq();
>             pci_read_bases();
>         pci_device_add(vf);
>             device_initialize(&vf->dev);
>             pci_fixup_device(pci_fixup_header, vf);
>             pci_init_capabilities(vf);
>             pcibios_add_device(vf);
> 
> We have couple of options here:
> 
> 1) Keep current code. VF's pci_dn is going to be destroyed in
>pcibios_sriov_disable() as we're doing currently.
> 2) Introduce pcibios_iov_virtfn_add() (at A) for platform to override.
>VF's pci_dn is going to be destroyed in pcibios_release_device().
> 3) Introduce pcibios_alloc_dev() (at B) for platform to override. The
>VF's pci_dn is going to be destroyed in pcibios_release_device().

Ah, yes, now I see the problem.  I don't really like having to export
pci_iov_virtfn_bus() and pci_iov_virtfn_devfn(), but it's probably not
worth the hassle of changing it, and I think adding more pcibios interfaces
would be even worse.

So let's leave it as-is for now.

> >> +}
> >> +#endif // CONFIG_PCI_IOV
> >> +
> >> +struct pci_dn *add_dev_pci_info(struct pci_dev *pdev, u16 vf_