[PATCH] cxl: Fix issues when unmapping contexts
From: Ian Munsie commit 0712dc7e73e59d79bcead5d5520acf4e9e917e87 upstream. for the 3.18 stable series An issue was introduced with "cxl: Unmap MMIO regions when detaching a context" (b123429e6a9e8d03aacf888d23262835f0081448) where closing a context normally could also unmap the problem state area of other contexts currently using the AFU. It was also discovered that after a context's MMIO space had been unmapped it would read 0s when accessing it, whereas the expected behaviour was for the access to fail altogether. In order to address these issues, this patch does two things: - Forced mmap unmapping is only done when we are forcefully detaching all contexts, and not in the normal detach path. Since the normal context close path is tied to the file release any mmaps must have already been released so we don't need to worry in that case. - The mmap path now uses a vm_operations_struct with a fault handler. The fault handler ensures that the context is in started state, otherwise it fails the access attempt with a SIGBUS. Fixes: b123429e6a9e ("cxl: Unmap MMIO regions when detaching a context") Signed-off-by: Ian Munsie Signed-off-by: Michael Ellerman --- drivers/misc/cxl/context.c | 82 +++--- drivers/misc/cxl/file.c| 14 2 files changed, 71 insertions(+), 25 deletions(-) diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c index 51fd6b5..d1b55fe 100644 --- a/drivers/misc/cxl/context.c +++ b/drivers/misc/cxl/context.c @@ -100,6 +100,46 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master, return 0; } +static int cxl_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) +{ + struct cxl_context *ctx = vma->vm_file->private_data; + unsigned long address = (unsigned long)vmf->virtual_address; + u64 area, offset; + + offset = vmf->pgoff << PAGE_SHIFT; + + pr_devel("%s: pe: %i address: 0x%lx offset: 0x%llx\n", + __func__, ctx->pe, address, offset); + + if (ctx->afu->current_mode == CXL_MODE_DEDICATED) { + area = ctx->afu->psn_phys; + if (offset > ctx->afu->adapter->ps_size) + return VM_FAULT_SIGBUS; + } else { + area = ctx->psn_phys; + if (offset > ctx->psn_size) + return VM_FAULT_SIGBUS; + } + + mutex_lock(&ctx->status_mutex); + + if (ctx->status != STARTED) { + mutex_unlock(&ctx->status_mutex); + pr_devel("%s: Context not started, failing problem state access\n", __func__); + return VM_FAULT_SIGBUS; + } + + vm_insert_pfn(vma, address, (area + offset) >> PAGE_SHIFT); + + mutex_unlock(&ctx->status_mutex); + + return VM_FAULT_NOPAGE; +} + +static const struct vm_operations_struct cxl_mmap_vmops = { + .fault = cxl_mmap_fault, +}; + /* * Map a per-context mmio space into the given vma. */ @@ -108,26 +148,25 @@ int cxl_context_iomap(struct cxl_context *ctx, struct vm_area_struct *vma) u64 len = vma->vm_end - vma->vm_start; len = min(len, ctx->psn_size); - if (ctx->afu->current_mode == CXL_MODE_DEDICATED) { - vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); - return vm_iomap_memory(vma, ctx->afu->psn_phys, ctx->afu->adapter->ps_size); - } + if (ctx->afu->current_mode != CXL_MODE_DEDICATED) { + /* make sure there is a valid per process space for this AFU */ + if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) { + pr_devel("AFU doesn't support mmio space\n"); + return -EINVAL; + } - /* make sure there is a valid per process space for this AFU */ - if ((ctx->master && !ctx->afu->psa) || (!ctx->afu->pp_psa)) { - pr_devel("AFU doesn't support mmio space\n"); - return -EINVAL; + /* Can't mmap until the AFU is enabled */ + if (!ctx->afu->enabled) + return -EBUSY; } - /* Can't mmap until the AFU is enabled */ - if (!ctx->afu->enabled) - return -EBUSY; - pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__, ctx->psn_phys, ctx->pe , ctx->master); + vma->vm_flags |= VM_IO | VM_PFNMAP; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); - return vm_iomap_memory(vma, ctx->psn_phys, len); + vma->vm_ops = &cxl_mmap_vmops; + return 0; } /* @@ -150,12 +189,6 @@ static void __detach_context(struct cxl_context *ctx) afu_release_irqs(ctx); flush_work(&ctx->fault_work); /* Only needed for dedicated process */ wake_up_all(&ctx->wq); - - /* Release Problem State Area mapping */ - mutex_lock(&ctx->mapping_lock); - if (ctx->mapping) -
Re: [PATCH 1/3] cxl: Use image state defaults for reloading FPGA
Excerpts from Greg KH's message of 2015-02-25 11:32:29 +1100: > What stable kernel(s) are you wanting this series to go into? Hi Greg, These three patches are for 3.18 and 3.19. Cheers, -Ian ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Build regressions/improvements in v4.0-rc1
On Tue, 2015-02-24 at 10:38 +0100, Geert Uytterhoeven wrote: > Hi Michael, > > On Tue, Feb 24, 2015 at 5:52 AM, Michael Ellerman wrote: > >> > + error: book3s_64_vio_hv.c: undefined reference to > >> > `power7_wakeup_loss': => .text+0x408) > >> > >> pseries_defconfig > > > > This one is actually from pseries_defconfig+POWERNV=n, so I think I > > Thanks! > > > broke your script with the + notation in the config name :) > > Nope, my brain used the wrong separator. I can't help with that :) > However, my scripts do have a problem with the subdirectories > in arch/powerpc/configs/ (4xx/currituck_defconfig)... Yeah sorry, they are a bit of a pain. I'm sure some horrible regexp can deal with it ;) cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/fsl: add power_off support for fsl platform
On Wed, 2015-02-04 at 14:47 +0800, Dongsheng Wang wrote: > +void ppc_md_fixup(void) > +{ This name is way too generic (though it's moot since you shouldn't use ppc_md for this). > + struct device_node *np; > + > + np = of_find_compatible_node(NULL, NULL, "fsl,fpga-qixis"); > + if (!np) > + return; > + > + of_node_put(np); > + > + pm_power_off = fsl_power_off; > + ppc_md.halt = fsl_power_off; > +} Please implement this as a drivers/power/reset driver, and consider basing on top of http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/293089.html -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v6 1/2] powerpc/mpc85xx: Add FSL QorIQ DPAA BMan support to device tree(s)
On Mon, 2015-02-02 at 00:53 -0600, Emil Medve wrote: > From: Kumar Gala > > Change-Id: If643fa5ba0a903aef8f5056a2c90ebecc995b760 Remove these. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v4 1/2] powerpc/corenet: Enable muxing MDIO buses via GPIO
On Sun, 2015-02-01 at 15:48 -0600, Emil Medve wrote: > From: Andy Fleming > > Change-Id: I4489db79957ad533f4ba3f04fe7d5bcb3288e981 Again, remove these. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 6/8] clk: ppc-corenet: Replace kzalloc() with kmalloc()
On Tue, 2015-01-20 at 04:09 -0600, Emil Medve wrote: > Where the memset() is not necessary > > Signed-off-by: Emil Medve > --- > drivers/clk/clk-ppc-corenet.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/clk/clk-ppc-corenet.c b/drivers/clk/clk-ppc-corenet.c > index d84a7f0..91816b1 100644 > --- a/drivers/clk/clk-ppc-corenet.c > +++ b/drivers/clk/clk-ppc-corenet.c > @@ -185,7 +185,7 @@ static void __init core_pll_init(struct device_node *np) > if (!subclks) > goto err_map; > > - onecell_data = kzalloc(sizeof(*onecell_data), GFP_KERNEL); > + onecell_data = kmalloc(sizeof(*onecell_data), GFP_KERNEL); > if (!onecell_data) > goto err_clks; > I think it's better to use kzalloc always, outside of performance-sensitive allocations. E.g. what if a new field is added to the struct later? -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 7/8] powerpc/corenet: Enable CLK_PPC_CORENET
On Tue, 2015-01-20 at 04:09 -0600, Emil Medve wrote: > Change-Id: I1a80ad7b9f6854791bd270b746f93a91439155a6 > Signed-off-by: Emil Medve No Change-Id, and don't bundle patches meant for my tree in the same patchset as patches meant for other trees. There's no dependency between them. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] cxl: Use image state defaults for reloading FPGA
On Mon, Feb 23, 2015 at 03:21:19PM +1100, Michael Ellerman wrote: > From: Ryan Grimm > > Commit 4beb5421babee1204757b877622830c6aa31be6d upstream. > > Select defaults such that a PERST causes flash image reload. Select which > image based on what the card is set up to load. > > CXL_VSEC_PERST_LOADS_IMAGE selects whether PERST assertion causes flash image > load. > > CXL_VSEC_PERST_SELECT_USER selects which image is loaded on the next PERST. > > cxl_update_image_control writes these bits into the VSEC. > > Signed-off-by: Ryan Grimm > Acked-by: Ian Munsie > Signed-off-by: Michael Ellerman > --- > drivers/misc/cxl/cxl.h | 1 + > drivers/misc/cxl/pci.c | 42 -- > 2 files changed, 41 insertions(+), 2 deletions(-) What stable kernel(s) are you wanting this series to go into? thanks, greg k-h ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/3] powerpc/dma: Support 32-bit coherent mask with 64-bit dma_mask
On Wed, 2015-02-25 at 07:40 +1100, Benjamin Herrenschmidt wrote: > On Tue, 2015-02-24 at 14:34 -0600, Scott Wood wrote: > > On Fri, 2015-02-20 at 19:35 +1100, Benjamin Herrenschmidt wrote: > > > static u64 dma_direct_get_required_mask(struct device *dev) > > > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c > > > index f146ef0..a7f15e2 100644 > > > --- a/arch/powerpc/mm/mem.c > > > +++ b/arch/powerpc/mm/mem.c > > > @@ -277,6 +277,11 @@ int dma_pfn_limit_to_zone(u64 pfn_limit) > > > return -EPERM; > > > } > > > > > > +u64 dma_get_zone_limit(int zone) > > > +{ > > > + return max_zone_pfns[zone] << PAGE_SHIFT; > > > +} > > > > If you must do this in terms of bytes rather than pfn, cast to u64 > > before shifting -- and even then the result will be PAGE_SIZE - 1 too > > small. > > Do we have RAM above what a unsigned long can hold ? I think I'll just > make it a pfn and respin... Yes, we can have over 4 GiB RAM on 32-bit. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Export __spin_yield
On Tue, 2015-02-24 at 10:37 -0600, Suresh E. Warrier wrote: > On 02/23/2015 09:38 PM, Benjamin Herrenschmidt wrote: > > On Mon, 2015-02-23 at 18:10 -0600, Suresh E. Warrier wrote: > >> Export __spin_yield so that the arch_spin_unlock() function > >> can be invoked from a module. > > > > Make it EXPORT_SYMBOL_GPL. Also explain why a module might need it > > > > Sure, I will change that to EXPORT_SYMBOL_GPL. Just curious, though, > there is another symbol arch_spin_unlock_wait that is exported from > the file without the _GPL prefix. Any idea why? Nope. Not sure how come we did that. > I have mentioned that this needs to be exported to call the > arch_spin_unlock() function from a module. What additional information > do you think will be useful here ? Are you looking at something > that explains why a module might need to call arch_spin_unlock()? What kind of module might need it... Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/3] powerpc/dma: Support 32-bit coherent mask with 64-bit dma_mask
On Fri, 2015-02-20 at 19:35 +1100, Benjamin Herrenschmidt wrote: > @@ -149,14 +141,13 @@ static void dma_direct_unmap_sg(struct device *dev, > struct scatterlist *sg, > > static int dma_direct_dma_supported(struct device *dev, u64 mask) > { > -#ifdef CONFIG_PPC64 > - /* Could be improved so platforms can set the limit in case > - * they have limited DMA windows > - */ > - return mask >= get_dma_offset(dev) + (memblock_end_of_DRAM() - 1); > -#else > - return 1; > + u64 offset = get_dma_offset(dev); > + u64 limit = offset + memblock_end_of_DRAM() - 1; > + > +#if defined(CONFIG_ZONE_DMA32) > + limit = offset + dma_get_zone_limit(ZONE_DMA32); > #endif > + return mask >= limit; > } I'm confused as to whether dma_supported() is supposed to be testing a coherent mask or regular mask... The above suggests coherent, as does the call to dma_supported() in dma_set_coherent_mask(), but if swiotlb is used, swiotlb_dma_supported() will only check for a mask that can accommodate io_tlb_end, without regard for coherent allocations. > static u64 dma_direct_get_required_mask(struct device *dev) > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c > index f146ef0..a7f15e2 100644 > --- a/arch/powerpc/mm/mem.c > +++ b/arch/powerpc/mm/mem.c > @@ -277,6 +277,11 @@ int dma_pfn_limit_to_zone(u64 pfn_limit) > return -EPERM; > } > > +u64 dma_get_zone_limit(int zone) > +{ > + return max_zone_pfns[zone] << PAGE_SHIFT; > +} If you must do this in terms of bytes rather than pfn, cast to u64 before shifting -- and even then the result will be PAGE_SIZE - 1 too small. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board
Thanks after skipping several times : git bisect skip There are only 'skip'ped commits left to test. The first bad commit could be any of: b486e0e6d599b9ca8667fb9a7d49b7383ee963c7 eab3bbeffd152125ae0f90863b8e9bc8eef49423 960cd9d4fef6dd9e235c0e5c0d4ed027f8a48025 f02ad907cd9e7fe3a6405d2d005840912f1ed258 6a425c2a9b37ca3d2c37e3c1cdf973dba53eaa79 ee0a89cf3c2c550e6d877dda21dd2947afb90cb6 92890583627ee2a0518e55b063fcff86826fef96 95d6eb3b134e1826ed04cc92b224d93de13e281f 9469244d869623e8b54d9f3d4d00737e377af273 We cannot bisect more! On 2/24/2015 3:14 PM, Gerhard Pircher wrote: Am 2015-02-24 um 12:08 schrieb Julian Margetson: Problems with the Gib bisect Kernel wont compile after 10th bisect . You can try "git bisect skip" to select another commit for testing. Hopefully that one compiles fine then. Gerhard drivers/built-in.o: In function `drm_mode_atomic_ioctl': (.text+0x865dc): undefined reference to `__get_user_bad' make: *** [vmlinux] Error 1 root@julian-VirtualBox:/usr/src/linux# git bisect log git bisect start # bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1 git bisect bad c517d838eb7d07bbe9507871fab3931deccff539 # good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19 git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735 # good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient __FUNCTION__ hack git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82 # bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f # good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce # good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty git bisect good a9724125ad014decf008d782e60447c811391326 # good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 'drm-amdkfd-next-fixes-2015-01-25' of git://people.freedesktop.org/~gabbayo/linux into drm-next git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590 # bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio infoframe git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084 # good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag 'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b # bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize udl_compress_hline16 (v2) git bisect bad 86588ce80ccd714793e9ba4140d7ae214229 # bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC TM070DDH03 panel support git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7 # bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential use of state after free git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273 root@julian-VirtualBox:/usr/src/linux# On 02/24/2015, you wrote: On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote: On 2/18/2015 11:25 PM, Julian Margetson wrote: re PPC4XX PCI(E) MSI support. https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html Hmm, I think all those comments were addressed before it was merged. I tried to get a 4xx board going here last week, but it doesn't seem happy. I can get a bit of uboot but then it hangs, might be overheating. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/3] powerpc/dma: Support 32-bit coherent mask with 64-bit dma_mask
On Tue, 2015-02-24 at 14:34 -0600, Scott Wood wrote: > On Fri, 2015-02-20 at 19:35 +1100, Benjamin Herrenschmidt wrote: > > @@ -149,14 +141,13 @@ static void dma_direct_unmap_sg(struct device *dev, > > struct scatterlist *sg, > > > > static int dma_direct_dma_supported(struct device *dev, u64 mask) > > { > > -#ifdef CONFIG_PPC64 > > - /* Could be improved so platforms can set the limit in case > > -* they have limited DMA windows > > -*/ > > - return mask >= get_dma_offset(dev) + (memblock_end_of_DRAM() - 1); > > -#else > > - return 1; > > + u64 offset = get_dma_offset(dev); > > + u64 limit = offset + memblock_end_of_DRAM() - 1; > > + > > +#if defined(CONFIG_ZONE_DMA32) > > + limit = offset + dma_get_zone_limit(ZONE_DMA32); > > #endif > > + return mask >= limit; > > } > > I'm confused as to whether dma_supported() is supposed to be testing a > coherent mask or regular mask... The above suggests coherent, as does > the call to dma_supported() in dma_set_coherent_mask(), but if swiotlb > is used, swiotlb_dma_supported() will only check for a mask that can > accommodate io_tlb_end, without regard for coherent allocations. This is confusing indeed, but without the above, dma_set_coherent_mask() won't work ... so I'm assuming the above. Notice that x86 doesn't even bother and basically return 1 for anything above a 24 bit mask (appart from the force_sac case but we can ignore it). So we probably should fix our swiotlb implementation as well... but that's orthogonal. > > static u64 dma_direct_get_required_mask(struct device *dev) > > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c > > index f146ef0..a7f15e2 100644 > > --- a/arch/powerpc/mm/mem.c > > +++ b/arch/powerpc/mm/mem.c > > @@ -277,6 +277,11 @@ int dma_pfn_limit_to_zone(u64 pfn_limit) > > return -EPERM; > > } > > > > +u64 dma_get_zone_limit(int zone) > > +{ > > + return max_zone_pfns[zone] << PAGE_SHIFT; > > +} > > If you must do this in terms of bytes rather than pfn, cast to u64 > before shifting -- and even then the result will be PAGE_SIZE - 1 too > small. Do we have RAM above what a unsigned long can hold ? I think I'll just make it a pfn and respin... Cheers, Ben. > -Scott > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 5/5] crypto: talitos: Add software backlog queue handling
On 2/20/2015 6:21 PM, Martin Hicks wrote: > I was running into situations where the hardware FIFO was filling up, and > the code was returning EAGAIN to dm-crypt and just dropping the submitted > crypto request. > > This adds support in talitos for a software backlog queue. When requests > can't be queued to the hardware immediately EBUSY is returned. The queued > requests are dispatched to the hardware in received order as hardware FIFO > slots become available. > > Signed-off-by: Martin Hicks Hi Martin, Thanks for the effort! Indeed we noticed that talitos (and caam) don't play nicely with dm-crypt, lacking a backlog mechanism. Please run checkpatch --strict and fix the errors, warnings. > --- > drivers/crypto/talitos.c | 92 > +++--- > drivers/crypto/talitos.h |3 ++ > 2 files changed, 74 insertions(+), 21 deletions(-) > > diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c > index d3472be..226654c 100644 > --- a/drivers/crypto/talitos.c > +++ b/drivers/crypto/talitos.c > @@ -183,43 +183,72 @@ static int init_device(struct device *dev) > } > > /** > - * talitos_submit - submits a descriptor to the device for processing > + * talitos_handle_queue - performs submissions either of new descriptors > + *or ones waiting in the queue backlog. > * @dev: the SEC device to be used > * @ch: the SEC device channel to be used > - * @edesc: the descriptor to be processed by the device > - * @context: a handle for use by caller (optional) The "context" kernel-doc should have been removed in patch 4/5. > + * @edesc: the descriptor to be processed by the device (optional) > * > * desc must contain valid dma-mapped (bus physical) address pointers. > * callback must check err and feedback in descriptor header > - * for device processing status. > + * for device processing status upon completion. > */ > -int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) > +int talitos_handle_queue(struct device *dev, int ch, struct talitos_edesc > *edesc) > { > struct talitos_private *priv = dev_get_drvdata(dev); > - struct talitos_request *request = &edesc->req; > + struct talitos_request *request, *orig_request = NULL; > + struct crypto_async_request *async_req; > unsigned long flags; > int head; > + int ret = -EINPROGRESS; > > spin_lock_irqsave(&priv->chan[ch].head_lock, flags); > > + if (edesc) { > + orig_request = &edesc->req; > + crypto_enqueue_request(&priv->chan[ch].queue, > &orig_request->base); > + } The request goes through the SW queue even if there are empty slots in the HW queue, doing unnecessary crypto_queue_encrypt() and crypto_queue_decrypt(). Trying to use the HW queue first would be better. > + > +flush_another: > + if (priv->chan[ch].queue.qlen == 0) { > + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags); > + return 0; > + } > + > if (!atomic_inc_not_zero(&priv->chan[ch].submit_count)) { > /* h/w fifo is full */ > spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags); > - return -EAGAIN; > + return -EBUSY; > } > > - head = priv->chan[ch].head; > + /* Dequeue the oldest request */ > + async_req = crypto_dequeue_request(&priv->chan[ch].queue); > + > + request = container_of(async_req, struct talitos_request, base); > request->dma_desc = dma_map_single(dev, request->desc, > sizeof(*request->desc), > DMA_BIDIRECTIONAL); > > /* increment fifo head */ > + head = priv->chan[ch].head; > priv->chan[ch].head = (priv->chan[ch].head + 1) & (priv->fifo_len - 1); > > - smp_wmb(); > - priv->chan[ch].fifo[head] = request; > + spin_unlock_irqrestore(&priv->chan[ch].head_lock, flags); > + > + /* > + * Mark a backlogged request as in-progress, return EBUSY because > + * the original request that was submitted is backlogged. s/is backlogged/is backlogged or dropped Original request will not be enqueued by crypto_queue_enqueue() if the CRYPTO_TFM_REQ_MAY_BACKLOG flag is not set (since SW queue is for backlog only) - that's the case for IPsec requests. > + */ > + if (request != orig_request) { > + struct crypto_async_request *areq = request->context; > + areq->complete(areq, -EINPROGRESS); > + ret = -EBUSY; > + } > + > + spin_lock_irqsave(&priv->chan[ch].head_lock, flags); > > /* GO! */ > + priv->chan[ch].fifo[head] = request; > wmb(); > out_be32(priv->chan[ch].reg + TALITOS_FF, >upper_32_bits(request->dma_desc)); > @@ -228,9 +257,18 @@ int talitos_submit(struct device *dev, int ch, struct > talitos_edesc *edesc) > > spin_unlock_irqrestore(&priv->
[PATCH v1 3/3] SHA1 for PPC/SPE - kernel config
[PATCH v1 3/3] SHA1 for PPC/SPE - kernel config Integrate the module into the kernel config tree. Signed-off-by: Markus Stockhausen diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile index 1698fb9..d400bf9 100644 --- a/arch/powerpc/crypto/Makefile +++ b/arch/powerpc/crypto/Makefile @@ -6,8 +6,10 @@ obj-$(CONFIG_CRYPTO_AES_PPC_SPE) += aes-ppc-spe.o obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o +obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes_spe_glue.o sha1-powerpc-y := sha1-powerpc-asm.o sha1.o +sha1-ppc-spe-y := sha1-spe-asm.o sha1_spe_glue.o sha256-ppc-spe-y := sha256-spe-asm.o sha256_spe_glue.o diff --git a/crypto/Kconfig b/crypto/Kconfig index f34d136..7fc084f 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -573,6 +573,13 @@ config CRYPTO_SHA1_PPC This is the powerpc hardware accelerated implementation of the SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2). +config CRYPTO_SHA1_PPC_SPE + tristate "SHA1 digest algorithm (PPC SPE)" + depends on PPC && SPE + help + SHA-1 secure hash standard (DFIPS 180-4) implemented + using powerpc SPE SIMD instruction set. + config CRYPTO_SHA1_MB tristate "SHA1 digest algorithm (x86_64 Multi-Buffer, Experimental)" depends on X86 && 64BIT Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. Ãber das Internet versandte E-Mails können unter fremden Namen erstellt oder manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine rechtsverbindliche Willenserklärung. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln Vorstand: Kadir Akin Dr. Michael Höhnerbach Vorsitzender des Aufsichtsrates: Hans Kristian Langva Registergericht: Amtsgericht Köln Registernummer: HRB 52 497 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. e-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln executive board: Kadir Akin Dr. Michael Höhnerbach President of the supervisory board: Hans Kristian Langva Registry office: district court Cologne Register number: HRB 52 497 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v1 2/3] SHA1 for PPC/SPE - glue
[PATCH v1 2/3] SHA1 for PPC/SPE - glue Glue code for crypto infrastructure. Call the assembler code where required. Disable preemption during calculation and enable SPE instructions in the kernel prior to the call. Avoid to disable preemption for too long. Take a little care about small input data. Kick out early for input chunks < 64 bytes and replace memset for context cleanup with simple loop. Signed-off-by: Markus Stockhausen diff --git a/arch/powerpc/sha1_spe_glue.c b/arch/powerpc/sha1_spe_glue.c new file mode 100644 index 000..3e1d222 --- /dev/null +++ b/arch/powerpc/sha1_spe_glue.c @@ -0,0 +1,210 @@ +/* + * Glue code for SHA-1 implementation for SPE instructions (PPC) + * + * Based on generic implementation. + * + * Copyright (c) 2015 Markus Stockhausen + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * MAX_BYTES defines the number of bytes that are allowed to be processed + * between preempt_disable() and preempt_enable(). SHA1 takes ~1000 + * operations per 64 bytes. e500 cores can issue two arithmetic instructions + * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2). + * Thus 2KB of input data will need an estimated maximum of 18,000 cycles. + * Headroom for cache misses included. Even with the low end model clocked + * at 667 MHz this equals to a critical time window of less than 27us. + * + */ +#define MAX_BYTES 2048 + +extern void ppc_spe_sha1_transform(u32 *state, const u8 *src, u32 blocks); + +static void spe_begin(void) +{ + /* We just start SPE operations and will save SPE registers later. */ + preempt_disable(); + enable_kernel_spe(); +} + +static void spe_end(void) +{ + /* reenable preemption */ + preempt_enable(); +} + +static inline void ppc_sha1_clear_context(struct sha1_state *sctx) +{ + int count = sizeof(struct sha1_state) >> 2; + u32 *ptr = (u32 *)sctx; + + /* make sure we can clear the fast way */ + BUILD_BUG_ON(sizeof(struct sha1_state) % 4); + do { *ptr++ = 0; } while (--count); +} + +static int ppc_spe_sha1_init(struct shash_desc *desc) +{ + struct sha1_state *sctx = shash_desc_ctx(desc); + + sctx->state[0] = SHA1_H0; + sctx->state[1] = SHA1_H1; + sctx->state[2] = SHA1_H2; + sctx->state[3] = SHA1_H3; + sctx->state[4] = SHA1_H4; + sctx->count = 0; + + return 0; +} + +static int ppc_spe_sha1_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + struct sha1_state *sctx = shash_desc_ctx(desc); + const unsigned int offset = sctx->count & 0x3f; + const unsigned int avail = 64 - offset; + unsigned int bytes; + const u8 *src = data; + + if (avail > len) { + sctx->count += len; + memcpy((char *)sctx->buffer + offset, src, len); + return 0; + } + + sctx->count += len; + + if (offset) { + memcpy((char *)sctx->buffer + offset, src, avail); + + spe_begin(); + ppc_spe_sha1_transform(sctx->state, (const u8 *)sctx->buffer, 1); + spe_end(); + + len -= avail; + src += avail; + } + + while (len > 63) { + bytes = (len > MAX_BYTES) ? MAX_BYTES : len; + bytes = bytes & ~0x3f; + + spe_begin(); + ppc_spe_sha1_transform(sctx->state, src, bytes >> 6); + spe_end(); + + src += bytes; + len -= bytes; + }; + + memcpy((char *)sctx->buffer, src, len); + return 0; +} + +static int ppc_spe_sha1_final(struct shash_desc *desc, u8 *out) +{ + struct sha1_state *sctx = shash_desc_ctx(desc); + const unsigned int offset = sctx->count & 0x3f; + char *p = (char *)sctx->buffer + offset; + int padlen; + __be64 *pbits = (__be64 *)(((char *)&sctx->buffer) + 56); + __be32 *dst = (__be32 *)out; + + padlen = 55 - offset; + *p++ = 0x80; + + spe_begin(); + + if (padlen < 0) { + memset(p, 0x00, padlen + sizeof (u64)); + ppc_spe_sha1_transform(sctx->state, sctx->buffer, 1); + p = (char *)sctx->buffer; + padlen = 56; + } + + memset(p, 0, padlen); + *pbits = cpu_to_be64(sctx->count << 3); + ppc_spe_sha1_transform(sctx->state, sctx->buffer, 1); + + spe_end(); + + dst[0] = cpu_to_be32(sctx->state[0]); + dst[1] = cpu_to_be32(sctx->state[1]); + dst[2] = cpu_to_be32(sctx->state[2]); + dst[3] = cpu_to_be32(sctx->state[3]); + dst[4] = cpu_to_be32(s
[PATCH v1 0/3] SHA1 for PPC/SPE
[PATCH v1 0/3] SHA1 for PPC/SPE The following patches add support for SIMD accelerated SHA1 calculation on PPC processors with SPE instruction set. The implementation takes care of the following constraints: - independant of processor endianess - save SPE registers for interrupt context compatibility - disable preemtion only for short intervals Performance numbers from insmod tcrypt sec=3 mode=303 taken on e500v2 800 MHz (TP Link WDR4900) dataper sha1-ppc this patch speedup cycles length update bytes/secbytes/secfactor per byte -- -- --- --- --- 16 169,686,688 13,195,285 x1.36 60.63 64 16 18,769,344 21,886,122 x1.17 36.55 64 64 26,187,712 33,181,184 x1.27 24.11 256 16 27,461,120 29,614,080 x1.08 27.01 256 64 45,257,898 52,748,373 x1.17 15.17 256 256 56,050,773 68,863,061 x1.23 11.62 1024 16 30,863,360 32,438,272 x1.05 24.66 1024 256 72,531,626 85,434,709 x1.18 9.36 10241024 78,640,469 94,731,605 x1.20 8.44 2048 16 31,771,989 32,970,752 x1.04 24.26 2048 256 76,478,464 89,234,090 x1.17 8.97 20481024 83,010,218 98,902,698 x1.19 8.09 20482048 84,336,640 101,038,762 x1.19 7.92 Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. Ãber das Internet versandte E-Mails können unter fremden Namen erstellt oder manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine rechtsverbindliche Willenserklärung. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln Vorstand: Kadir Akin Dr. Michael Höhnerbach Vorsitzender des Aufsichtsrates: Hans Kristian Langva Registergericht: Amtsgericht Köln Registernummer: HRB 52 497 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. e-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln executive board: Kadir Akin Dr. Michael Höhnerbach President of the supervisory board: Hans Kristian Langva Registry office: district court Cologne Register number: HRB 52 497 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v1 1/3] SHA1 for PPC/SPE - assembler
[PATCH v1 1/3] SHA1 for PPC/SPE - assembler This is the assembler code for SHA1 implementation with the SIMD SPE instruction set. With the enhanced instruction set we can operate on 2 32 bit words in parallel. That helps reducing the time to calculate W16-W79. For increasing performance even more the assembler function can compute hashes for more than one 64 byte input block. The state of the used SPE registers is preserved via the stack so we can run from interrupt context. There might be the case that we interrupt ourselves and push sensitive data from another context onto our stack. Clear this area in the stack afterwards to avoid information leakage. The code is endian independant. Signed-off-by: Markus Stockhausen diff --git a/arch/powerpc/sha1-spe-asm.S b/arch/powerpc/sha1-spe-asm.S new file mode 100644 index 000..fcb6cf0 --- /dev/null +++ b/arch/powerpc/sha1-spe-asm.S @@ -0,0 +1,299 @@ +/* + * Fast SHA-1 implementation for SPE instruction set (PPC) + * + * This code makes use of the SPE SIMD instruction set as defined in + * http://cache.freescale.com/files/32bit/doc/ref_manual/SPEPIM.pdf + * Implementation is based on optimization guide notes from + * http://cache.freescale.com/files/32bit/doc/app_note/AN2665.pdf + * + * Copyright (c) 2015 Markus Stockhausen + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + */ + +#include +#include + +#define rHPr3 /* pointer to hash value*/ +#define rWPr4 /* pointer to input */ +#define rKPr5 /* pointer to constants */ + +#define rW0r14 /* 64 bit round words */ +#define rW1r15 +#define rW2r16 +#define rW3r17 +#define rW4r18 +#define rW5r19 +#define rW6r20 +#define rW7r21 + +#define rH0r6 /* 32 bit hash values */ +#define rH1r7 +#define rH2r8 +#define rH3r9 +#define rH4r10 + +#define rT0r22 /* 64 bit temporary */ +#define rT1r0 /* 32 bit temporaries */ +#define rT2r11 +#define rT3r12 + +#define rK r23 /* 64 bit constant in volatile register */ + +#define LOAD_K01 + +#define LOAD_K11 \ + evlwwsplat rK,0(rKP); + +#define LOAD_K21 \ + evlwwsplat rK,4(rKP); + +#define LOAD_K31 \ + evlwwsplat rK,8(rKP); + +#define LOAD_K41 \ + evlwwsplat rK,12(rKP); + +#define INITIALIZE \ + stwur1,-128(r1);/* create stack frame */ \ + evstdw r14,8(r1); /* We must save non volatile*/ \ + evstdw r15,16(r1); /* registers. Take the chance */ \ + evstdw r16,24(r1); /* and save the SPE part too*/ \ + evstdw r17,32(r1);\ + evstdw r18,40(r1);\ + evstdw r19,48(r1);\ + evstdw r20,56(r1);\ + evstdw r21,64(r1);\ + evstdw r22,72(r1);\ + evstdw r23,80(r1); + + +#define FINALIZE \ + evldw r14,8(r1); /* restore SPE registers*/ \ + evldw r15,16(r1);\ + evldw r16,24(r1);\ + evldw r17,32(r1);\ + evldw r18,40(r1);\ + evldw r19,48(r1);\ + evldw r20,56(r1);\ + evldw r21,64(r1);\ + evldw r22,72(r1);\ + evldw r23,80(r1);\ + xor r0,r0,r0; \ + stw r0,8(r1); /* Delete sensitive data*/ \ + stw r0,16(r1); /* that we might have pushed*/ \ + stw r0,24(r1); /* from other context that runs */ \ + stw r0,32(r1); /* the same code. Assume that */ \ + stw r0,40(r1); /* the lower part of the GPRs */ \ + stw r0,48(r1); /* were already overwritten on */ \ + stw r0,56(r1); /* the way down to here */ \ + stw r0,64(r1);
Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board
Am 2015-02-24 um 12:08 schrieb Julian Margetson: > Problems with the Gib bisect > Kernel wont compile after 10th bisect . You can try "git bisect skip" to select another commit for testing. Hopefully that one compiles fine then. Gerhard > drivers/built-in.o: In function `drm_mode_atomic_ioctl': > (.text+0x865dc): undefined reference to `__get_user_bad' > make: *** [vmlinux] Error 1 > root@julian-VirtualBox:/usr/src/linux# git bisect log > git bisect start > # bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1 > git bisect bad c517d838eb7d07bbe9507871fab3931deccff539 > # good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19 > git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735 > # good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient > __FUNCTION__ hack > git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82 > # bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of > git://people.freedesktop.org/~airlied/linux > git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f > # good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag > 'driver-core-3.20-rc1' of > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core > git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce > # good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' > of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > git bisect good a9724125ad014decf008d782e60447c811391326 > # good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag > 'drm-amdkfd-next-fixes-2015-01-25' of > git://people.freedesktop.org/~gabbayo/linux into drm-next > git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590 > # bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio > infoframe > git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084 > # good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag > 'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into > drm-next > git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b > # bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize > udl_compress_hline16 (v2) > git bisect bad 86588ce80ccd714793e9ba4140d7ae214229 > # bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC > TM070DDH03 panel support > git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7 > # bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential > use of state after free > git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273 > root@julian-VirtualBox:/usr/src/linux# > > > > On 02/24/2015, you wrote: > >> On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote: >>> On 2/18/2015 11:25 PM, Julian Margetson wrote: >> >>> re PPC4XX PCI(E) MSI support. >>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html >> >> Hmm, I think all those comments were addressed before it was merged. >> >> I tried to get a 4xx board going here last week, but it doesn't seem happy. I >> can get a bit of uboot but then it hangs, might be overheating. >> >> cheers >> >> > > > ___ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board
On 2/24/2015 7:10 AM, Julian Margetson wrote: Problems with the Gib bisect Kernel wont compile after 10th bisect . drivers/built-in.o: In function `drm_mode_atomic_ioctl': (.text+0x865dc): undefined reference to `__get_user_bad' make: *** [vmlinux] Error 1 root@julian-VirtualBox:/usr/src/linux# git bisect log git bisect start # bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1 git bisect bad c517d838eb7d07bbe9507871fab3931deccff539 # good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19 git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735 # good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient __FUNCTION__ hack git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82 # bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f # good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce # good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty git bisect good a9724125ad014decf008d782e60447c811391326 # good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 'drm-amdkfd-next-fixes-2015-01-25' of git://people.freedesktop.org/~gabbayo/linux into drm-next git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590 # bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio infoframe git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084 # good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag 'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b # bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize udl_compress_hline16 (v2) git bisect bad 86588ce80ccd714793e9ba4140d7ae214229 # bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC TM070DDH03 panel support git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7 # bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential use of state after free git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273 root@julian-VirtualBox:/usr/src/linux# On 02/24/2015, you wrote: On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote: On 2/18/2015 11:25 PM, Julian Margetson wrote: re PPC4XX PCI(E) MSI support. https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html Hmm, I think all those comments were addressed before it was merged. I tried to get a 4xx board going here last week, but it doesn't seem happy. I can get a bit of uboot but then it hangs, might be overheating. cheers Kernel 4.0.0-rc1 boots ok when DVI output used but not when HDMI output used. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset
On Tue, Feb 24, 2015 at 3:00 AM, Bjorn Helgaas wrote: > On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote: >> From: Wei Yang >> >> On PowerNV platform, resource position in M64 implies the PE# the resource >> belongs to. In some cases, adjustment of a resource is necessary to locate >> it to a correct position in M64. >> >> Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address >> according to an offset. >> >> [bhelgaas: rework loops, rework overlap check, index resource[] >> conventionally, remove pci_regs.h include, squashed with next patch] >> Signed-off-by: Wei Yang >> Signed-off-by: Bjorn Helgaas > > ... > >> +#ifdef CONFIG_PCI_IOV >> +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) >> +{ >> + struct pci_dn *pdn = pci_get_pdn(dev); >> + int i; >> + struct resource *res, res2; >> + resource_size_t size; >> + u16 vf_num; >> + >> + if (!dev->is_physfn) >> + return -EINVAL; >> + >> + /* >> + * "offset" is in VFs. The M64 windows are sized so that when they >> + * are segmented, each segment is the same size as the IOV BAR. >> + * Each segment is in a separate PE, and the high order bits of the >> + * address are the PE number. Therefore, each VF's BAR is in a >> + * separate PE, and changing the IOV BAR start address changes the >> + * range of PEs the VFs are in. >> + */ >> + vf_num = pdn->vf_pes; >> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { >> + res = &dev->resource[i + PCI_IOV_RESOURCES]; >> + if (!res->flags || !res->parent) >> + continue; >> + >> + if (!pnv_pci_is_mem_pref_64(res->flags)) >> + continue; >> + >> + /* >> + * The actual IOV BAR range is determined by the start address >> + * and the actual size for vf_num VFs BAR. This check is to >> + * make sure that after shifting, the range will not overlap >> + * with another device. >> + */ >> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); >> + res2.flags = res->flags; >> + res2.start = res->start + (size * offset); >> + res2.end = res2.start + (size * vf_num) - 1; >> + >> + if (res2.end > res->end) { >> + dev_err(&dev->dev, "VF BAR%d: %pR would extend past >> %pR (trying to enable %d VFs shifted by %d)\n", >> + i, &res2, res, vf_num, offset); >> + return -EBUSY; >> + } >> + } >> + >> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { >> + res = &dev->resource[i + PCI_IOV_RESOURCES]; >> + if (!res->flags || !res->parent) >> + continue; >> + >> + if (!pnv_pci_is_mem_pref_64(res->flags)) >> + continue; >> + >> + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); >> + res2 = *res; >> + res->start += size * offset; > > I'm still not happy about this fiddling with res->start. > > Increasing res->start means that in principle, the "size * offset" bytes > that we just removed from res are now available for allocation to somebody > else. I don't think we *will* give that space to anything else because of > the alignment restrictions you're enforcing, but "res" now doesn't > correctly describe the real resource map. > > Would you be able to just update the BAR here while leaving the struct > resource alone? In that case, it would look a little funny that lspci > would show a BAR value in the middle of the region in /proc/iomem, but > the /proc/iomem region would be more correct. I guess this would also require a tweak where we compute the addresses of each of the VF resources. Today it's probably just "base + VF_num * size", where "base" is res->start. We'd have to account for the offset there if we don't adjust it here. >> + >> + dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d >> VFs shifted by %d)\n", >> + i, &res2, res, vf_num, offset); >> + pci_update_resource(dev, i + PCI_IOV_RESOURCES); >> + } >> + pdn->max_vfs -= offset; >> + return 0; >> +} >> +#endif /* CONFIG_PCI_IOV */ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Export __spin_yield
On 02/23/2015 09:38 PM, Benjamin Herrenschmidt wrote: > On Mon, 2015-02-23 at 18:10 -0600, Suresh E. Warrier wrote: >> Export __spin_yield so that the arch_spin_unlock() function >> can be invoked from a module. > > Make it EXPORT_SYMBOL_GPL. Also explain why a module might need it > Sure, I will change that to EXPORT_SYMBOL_GPL. Just curious, though, there is another symbol arch_spin_unlock_wait that is exported from the file without the _GPL prefix. Any idea why? I have mentioned that this needs to be exported to call the arch_spin_unlock() function from a module. What additional information do you think will be useful here ? Are you looking at something that explains why a module might need to call arch_spin_unlock()? Thanks. -suresh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 01/11] i2c: add quirk structure to describe adapter flaws
On Mon, Jan 19, 2015 at 04:05:15PM +0100, Wolfram Sang wrote: > > > > + struct i2c_adapter_quirks *quirks; > > > }; > > > #define to_i2c_adapter(d) container_of(d, struct i2c_adapter, dev) > > > > > > > I suggest to add const. > > const struct i2c_adapter_quirks *quirks; > > > > also, in i2c-core.c, should modify: > > const struct i2c_adapter_quirks *q = adap->quirks; > > Thanks, I'll think about it. And added it... signature.asc Description: Digital signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 02/11] i2c: add quirk checks to core
On Mon, Jan 12, 2015 at 12:08:14PM +, Russell King - ARM Linux wrote: > On Fri, Jan 09, 2015 at 06:21:32PM +0100, Wolfram Sang wrote: > > +static int i2c_quirk_error(struct i2c_adapter *adap, struct i2c_msg *msg, > > char *err_msg) > > +{ > > + dev_err(&adap->dev, "quirk: %s (addr 0x%04x, size %u)\n", err_msg, > > msg->addr, msg->len); > > + return -EOPNOTSUPP; > > +} > > So, what happens if I open an I2C adapter, find a message which causes > i2c_quirk_error() to be called, and then spin repeatedly calling that... > Shouldn't there be some rate limiting to this? Can be argued. Changed to dev_err_ratelimited(). Thanks! signature.asc Description: Digital signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC 02/11] i2c: add quirk checks to core
> > + if (msgs[i].flags & I2C_M_RD) { > > + if (i2c_quirk_exceeded(len, max_read)) > > + return i2c_quirk_error(adap, &msgs[i], "msg > > too long"); > > + } else { > > + if (i2c_quirk_exceeded(len, max_write)) > > + return i2c_quirk_error(adap, &msgs[i], "msg > > too long"); > > + } > > What about being more verbose in the error message, specifying if it > was a read or a write message that failed? Yes, done now. Thanks! signature.asc Description: Digital signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board
Problems with the Gib bisect Kernel wont compile after 10th bisect . drivers/built-in.o: In function `drm_mode_atomic_ioctl': (.text+0x865dc): undefined reference to `__get_user_bad' make: *** [vmlinux] Error 1 root@julian-VirtualBox:/usr/src/linux# git bisect log git bisect start # bad: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1 git bisect bad c517d838eb7d07bbe9507871fab3931deccff539 # good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19 git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735 # good: [02f1f2170d2831b3233e91091c60a66622f29e82] kernel.h: remove ancient __FUNCTION__ hack git bisect good 02f1f2170d2831b3233e91091c60a66622f29e82 # bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f # good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce # good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty git bisect good a9724125ad014decf008d782e60447c811391326 # good: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 'drm-amdkfd-next-fixes-2015-01-25' of git://people.freedesktop.org/~gabbayo/linux into drm-next git bisect good f43dff0ee00a259f524ce17ba4f8030553c66590 # bad: [cffe1e89dc9bf541a39d9287ced7c5addff07084] drm: sti: HDMI add audio infoframe git bisect bad cffe1e89dc9bf541a39d9287ced7c5addff07084 # good: [2f5b4ef15c60bc5292a3f006c018acb3da53737b] Merge tag 'drm/tegra/for-3.20-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next git bisect good 2f5b4ef15c60bc5292a3f006c018acb3da53737b # bad: [86588ce80ccd714793e9ba4140d7ae214229] drm/udl: optimize udl_compress_hline16 (v2) git bisect bad 86588ce80ccd714793e9ba4140d7ae214229 # bad: [d47df63393ed81977e0f6435988d9cbd70c867f7] drm/panel: simple: Add AVIC TM070DDH03 panel support git bisect bad d47df63393ed81977e0f6435988d9cbd70c867f7 # bad: [9469244d869623e8b54d9f3d4d00737e377af273] drm/atomic: Fix potential use of state after free git bisect bad 9469244d869623e8b54d9f3d4d00737e377af273 root@julian-VirtualBox:/usr/src/linux# On 02/24/2015, you wrote: > On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote: >> On 2/18/2015 11:25 PM, Julian Margetson wrote: > >> re PPC4XX PCI(E) MSI support. >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html > > Hmm, I think all those comments were addressed before it was merged. > > I tried to get a 4xx board going here last week, but it doesn't seem happy. I > can get a bit of uboot but then it hangs, might be overheating. > > cheers > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/7] Serialise oopses, BUGs, WARNs, dump_stack, soft lockups and hard lockups
On Tue, Feb 24, 2015 at 01:39:46AM -0800, Arjan van de Ven wrote: > one of the question is if you want to serialize, or if you just want > to label. If you take a cookie (could just be a monotonic increasing > number) at the start of the oops and then prefix/postfix the stack > printing with that number, you don't serialize (risk of locking up), > but you can pretty trivially see which line came from where.. > if you do the monotonic increasing number approach, you even get an > ordering out of it. it does mean changing the dump_stack() and co > function fingerprint to take an extra argument, but that is not TOO > insane. I like that idea, but it relies on ensuring that each line is printed by one printk() statement - which in itself is a good idea. I'd actually like a version of print_hex_dump() which we could use for stack and code dumping - the existing print_hex_dump() assumes that it's fine to dereference the pointer, whereas for stack and code dumping, we can't always make that assumption. That's a separate issue though. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with Kernels 3.17-rc1 and onwards on Acube Sam460 AMCC 460ex board
I had a hanging Uboot problem with a Sam440ep board.Never figured the problem out but It workedd for another two years after the problems began. Died for good last September with the hanging becoming a daily issue. Dont think that it was overheating. I thought that it could have been a problem with the on board ethernet. anyway I am still not giving up hopes of DRI and future kernals working, only into my second year of trying so too soon to give up . Doing a git bisect on the 4.00-rc1 now. On 02/24/2015, you wrote: > On Fri, 2015-02-20 at 15:25 -0400, Julian Margetson wrote: >> On 2/18/2015 11:25 PM, Julian Margetson wrote: > >> re PPC4XX PCI(E) MSI support. >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087273.html > > Hmm, I think all those comments were addressed before it was merged. > > I tried to get a 4xx board going here last week, but it doesn't seem happy. I > can get a bit of uboot but then it hangs, might be overheating. > > cheers > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 1/7] Add die_spin_lock_{irqsave,irqrestore}
From: Ingo Molnar ... > So why not trylock and time out here after a few seconds, > instead of indefinitely supressing some potentially vital > output due to some other CPU crashing/locking with the lock > held? I've used that for status requests that usually lock a table to get a consistent view. If trylock times out assume that the data is actually stable and access it anyway. Remember the pid doing the access and next time it tries to acquire the same lock do a trylock with no timeout. That way if (when) there is a locking fubar (or a driver oops with a lock held) at least some of the relevant status commands will work. David ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V4] powerpc, powernv: Add OPAL platform event driver
Hi Stewart, I looked into ACPI and found details about it. But before we go into discussing more details of it, would like to share a brief about OPAL platform events (EPOW/DPO) work and original design proposed. As if now OPAL platform events work supports two events: EPOW (Early Power Off Warning) and DPO (Delayed Power Off). On FSP based systems FSP notifies OPAL about EPOW and DPO events via mbox mechanism. Subsequently OPAL sends notifications for these events to pkvm kernel. Original design is to have a kernel driver maintain a queue and add these events to queue upon arrival. pkvm driver also provides a character device for host to consume these events. A daemon is proposed for pkvm host to poll/read these events from char device. This daemon would process these events and take action to log and shutdown host. Apart from this it would also send these event info to VMs which is handled by OSes running on VMs. Linux on VMs already has code in place to handle these events as it expects this info to reach it in PAPR format under EPOW (Environmental and Power Warnings) category. EPOW mbox msgs are received for below events: 1. UPS events - UPS Battery Low, UPS Bypassed, UPS Utility Failure, UPS On 2. SPCN events - Configuration Change, Log SPCN Fault, Impending Power Failure, Power Incomplete 3. Temprature events - Over Ambient temperature, Over internal temperature. Now ACPI: Looked into ACPI and tried to figure out how ACPI userspace/kernel framework can be helpful for our work. ACPI user space consists of below components. acpid - ACPI daemon to receive events from kernel acpid provides events and actions files in /etc/acpi dir to configure actions for various events. acpi, acpi_listen, acpitool - Commands to query and set various ACPI supported parameters. These tools work with various sysfs files to show/set various parameter values. As if today acpid and other tools don't exist for POWER so would need to be ported. acpid is useful for our work but other tools might not be helpful as they look into various sysfs files created by various ACPI kernel drivers which we won't have. Also we would need to map our EPOW/DPO events to acpid supported events and few events link SPCN ones won't map straight away and might need to be added in acpid as new events. ACPI in kernel has various drivers for fan, battery, laptop buttons etc. They handle events and uses netlink mechanism to sent out these events to userspace. Now looking into ACPI code it seems that we would be reusing a small chunk of acpi code but instead end up adding unnecessary complexity due to support a lot of stuff than needed by us. Here too mapping our EPOW/DPO events to ACPI defined structures in needed and we would need to add new member varaibles in ACPI event structures for unmapped events like SPCN ones. In nutshell it seems that by using ACPI we would end up adding lot more complexity with a little gain of code reuse. Netlink: On technology side netlink seems to be a faster method compared to character driver. So that could be a good alternative to use as a method of communication between our pkvm driver and userspace. But EPOW/DPO events occur at very low rate unlike network subsystem which receive data packets at a very high rate. So probably netlink could be a faster method but due to slow EPOW/DPO event traffic a character driver might be sufficient. We already have ppc64-diag package which is part of various distros so would be used for hosting daemon code. Thus it takes off overhead of convincing distros for adding something extra. This was my findings and opinions on alternatives. Apologies for a little lengthy text :-) Let me know if i missed out anything and any suggestions that you would have. Regards, Vipin On 02/11/2015 10:32 AM, Stewart Smith wrote: Vipin K Parashar writes: (1) Environmental and Power Warning (EPOW) (2) Delayed Power Off (DPO) The user interface for this driver is /dev/opal_event character device file where the user space clients can poll and read for new opal platform events. The expected sequence of events driven from user space should be like the following. (1) Open the character device file (2) Poll on the file for POLLIN event (3) When unblocked, must attempt to read OPAL_PLAT_EVENT_MAX_SIZE size (4) Kernel driver will pass at most one opal_plat_event structure (5) Poll again for more new events A few thoughts from discussing with Michael and Joel: - not convinced that a chardev is the most ideal way to notify userspace. It seems like yet-another powerpc specific notification mechanism, which isn't ideal. - netlink probably isn't right either (although maybe *sligthtly* better?) - it seems that the "standard" way is ACPI, so I wonder if we could emit an ACPI event and essentially fake having ACPI... that would make all existing userspace "just work", righ
Re: [PATCH 0/7] Serialise oopses, BUGs, WARNs, dump_stack, soft lockups and hard lockups
>> Some architectures already have their own recursive >> locking for oopses and we have another version for >> serialising dump_stack. >> >> Create a common version and use it everywhere (oopses, >> BUGs, WARNs, dump_stack, soft lockups and hard lockups). > > Dunno. I've had cases where the simultaneity of the oopses > (i.e. their garbled nature) gave me the clue about the type > of race to expect. > one of the question is if you want to serialize, or if you just want to label. If you take a cookie (could just be a monotonic increasing number) at the start of the oops and then prefix/postfix the stack printing with that number, you don't serialize (risk of locking up), but you can pretty trivially see which line came from where.. if you do the monotonic increasing number approach, you even get an ordering out of it. it does mean changing the dump_stack() and co function fingerprint to take an extra argument, but that is not TOO insane. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Build regressions/improvements in v4.0-rc1
Hi Michael, On Tue, Feb 24, 2015 at 5:52 AM, Michael Ellerman wrote: >> > + error: book3s_64_vio_hv.c: undefined reference to >> > `power7_wakeup_loss': => .text+0x408) >> >> pseries_defconfig > > This one is actually from pseries_defconfig+POWERNV=n, so I think I Thanks! > broke your script with the + notation in the config name :) Nope, my brain used the wrong separator. However, my scripts do have a problem with the subdirectories in arch/powerpc/configs/ (4xx/currituck_defconfig)... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
On Tue, Feb 24, 2015 at 02:35:04AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > M64 aperture size is limited on PHB3. When the IOV BAR is too big, this > will exceed the limitation and failed to be assigned. > > Introduce a different mechanism based on the IOV BAR size: > > - if IOV BAR size is smaller than 64MB, expand to total_pe > - if IOV BAR size is bigger than 64MB, roundup power2 > > [bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS] > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas > --- > arch/powerpc/include/asm/pci-bridge.h |2 ++ > arch/powerpc/platforms/powernv/pci-ioda.c | 33 > ++--- > 2 files changed, 32 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/include/asm/pci-bridge.h > b/arch/powerpc/include/asm/pci-bridge.h > index 011340df8583..d824bb184ab8 100644 > --- a/arch/powerpc/include/asm/pci-bridge.h > +++ b/arch/powerpc/include/asm/pci-bridge.h > @@ -179,6 +179,8 @@ struct pci_dn { > u16 max_vfs;/* number of VFs IOV BAR expended */ > u16 vf_pes; /* VF PE# under this PF */ > int offset; /* PE# for the first VF PE */ > +#define M64_PER_IOV 4 > + int m64_per_iov; > #define IODA_INVALID_M64(-1) > int m64_wins[PCI_SRIOV_NUM_BARS]; > #endif /* CONFIG_PCI_IOV */ > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index a3c2fbe35fc8..30b7c3909746 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -2242,6 +2242,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct > pci_dev *pdev) > int i; > resource_size_t size; > struct pci_dn *pdn; > + int mul, total_vfs; > > if (!pdev->is_physfn || pdev->is_added) > return; > @@ -2252,6 +2253,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct > pci_dev *pdev) > pdn = pci_get_pdn(pdev); > pdn->max_vfs = 0; > > + total_vfs = pci_sriov_get_totalvfs(pdev); > + pdn->m64_per_iov = 1; > + mul = phb->ioda.total_pe; > + > + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > + res = &pdev->resource[i + PCI_IOV_RESOURCES]; > + if (!res->flags || res->parent) > + continue; > + if (!pnv_pci_is_mem_pref_64(res->flags)) { > + dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", > + i, res); Why is this a dev_warn()? Can the user do anything about it? Do you want bug reports if users see this message? There are several other instances of this in the other patches, too. > + continue; > + } > + > + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); > + > + /* bigger than 64M */ > + if (size > (1 << 26)) { > + dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size > is bigger than 64M, roundup power2\n", > + i, res); > + pdn->m64_per_iov = M64_PER_IOV; > + mul = __roundup_pow_of_two(total_vfs); Why is this __roundup_pow_of_two() instead of roundup_pow_of_two()? I *think* __roundup_pow_of_two() is basically a helper function for implementing roundup_pow_of_two() and not intended to be used by itself. I think there are other patches that use __roundup_pow_of_two(), too. > + break; > + } > + } > + > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > res = &pdev->resource[i + PCI_IOV_RESOURCES]; > if (!res->flags || res->parent) > @@ -2264,12 +2291,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct > pci_dev *pdev) > > dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res); > size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); > - res->end = res->start + size * phb->ioda.total_pe - 1; > + res->end = res->start + size * mul - 1; > dev_dbg(&pdev->dev, " %pR\n", res); > dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE > alignment)", > - i, res, phb->ioda.total_pe); > + i, res, mul); > } > - pdn->max_vfs = phb->ioda.total_pe; > + pdn->max_vfs = mul; > } > > static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus) > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 4/4] powerpc/mpic: remove unused functions
Drop unused fsl_mpic_primary_get_version(), mpic_set_clk_ratio(), mpic_set_serial_int(). + fsl_mpic_primary_get_version() is just a safe wrapper around fsl_mpic_get_version() for SMP configurations. While the latter is called explicitly for handling PIC initialization and setting up error interrupt vector depending on PIC hardware version, the former isn't used for anything. + As for mpic_set_clk_ratio() and mpic_set_serial_int(), they both are almost nine years old[1] but still have no chance to be called even from out-of-tree modules because they both are __init and of course aren't exported. [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2006-June/023867.html Signed-off-by: Arseny Solokha Cc: hongtao@freescale.com --- arch/powerpc/include/asm/mpic.h | 20 arch/powerpc/sysdev/mpic.c | 35 --- 2 files changed, 55 deletions(-) diff --git a/arch/powerpc/include/asm/mpic.h b/arch/powerpc/include/asm/mpic.h index 754f93d..6ce63a7 100644 --- a/arch/powerpc/include/asm/mpic.h +++ b/arch/powerpc/include/asm/mpic.h @@ -34,10 +34,6 @@ #defineMPIC_GREG_GCONF_BASE_MASK 0x000f #defineMPIC_GREG_GCONF_MCK 0x0800 #define MPIC_GREG_GLOBAL_CONF_10x00030 -#defineMPIC_GREG_GLOBAL_CONF_1_SIE 0x0800 -#defineMPIC_GREG_GLOBAL_CONF_1_CLK_RATIO_MASK 0x7000 -#defineMPIC_GREG_GLOBAL_CONF_1_CLK_RATIO(r)\ - (((r) << 28) & MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO_MASK) #define MPIC_GREG_VENDOR_0 0x00040 #define MPIC_GREG_VENDOR_1 0x00050 #define MPIC_GREG_VENDOR_2 0x00060 @@ -395,16 +391,6 @@ extern struct bus_type mpic_subsys; #defineMPIC_REGSET_STANDARDMPIC_REGSET(0) /* Original MPIC */ #defineMPIC_REGSET_TSI108 MPIC_REGSET(1) /* Tsi108/109 PIC */ -/* Get the version of primary MPIC */ -#ifdef CONFIG_MPIC -extern u32 fsl_mpic_primary_get_version(void); -#else -static inline u32 fsl_mpic_primary_get_version(void) -{ - return 0; -} -#endif - /* Allocate the controller structure and setup the linux irq descs * for the range if interrupts passed in. No HW initialization is * actually performed. @@ -496,11 +482,5 @@ extern unsigned int mpic_get_coreint_irq(void); /* Fetch Machine Check interrupt from primary mpic */ extern unsigned int mpic_get_mcirq(void); -/* Set the EPIC clock ratio */ -void mpic_set_clk_ratio(struct mpic *mpic, u32 clock_ratio); - -/* Enable/Disable EPIC serial interrupt mode */ -void mpic_set_serial_int(struct mpic *mpic, int enable); - #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_MPIC_H */ diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c index bbfbbf2..f72b592 100644 --- a/arch/powerpc/sysdev/mpic.c +++ b/arch/powerpc/sysdev/mpic.c @@ -1219,16 +1219,6 @@ static u32 fsl_mpic_get_version(struct mpic *mpic) * Exported functions */ -u32 fsl_mpic_primary_get_version(void) -{ - struct mpic *mpic = mpic_primary; - - if (mpic) - return fsl_mpic_get_version(mpic); - - return 0; -} - struct mpic * __init mpic_alloc(struct device_node *node, phys_addr_t phys_addr, unsigned int flags, @@ -1676,31 +1666,6 @@ void __init mpic_init(struct mpic *mpic) mpic_err_int_init(mpic, MPIC_FSL_ERR_INT); } -void __init mpic_set_clk_ratio(struct mpic *mpic, u32 clock_ratio) -{ - u32 v; - - v = mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1); - v &= ~MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO_MASK; - v |= MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO(clock_ratio); - mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1, v); -} - -void __init mpic_set_serial_int(struct mpic *mpic, int enable) -{ - unsigned long flags; - u32 v; - - raw_spin_lock_irqsave(&mpic_lock, flags); - v = mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1); - if (enable) - v |= MPIC_GREG_GLOBAL_CONF_1_SIE; - else - v &= ~MPIC_GREG_GLOBAL_CONF_1_SIE; - mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1, v); - raw_spin_unlock_irqrestore(&mpic_lock, flags); -} - void mpic_irq_set_priority(unsigned int irq, unsigned int pri) { struct mpic *mpic = mpic_find(irq); -- 2.3.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/4] powrepc/qe: drop unused ucc_slow_poll_transmitter_now
Drop ucc_slow_poll_transmitter_now() which has no users since its inception in 2007 in commit 986585385131 ("[POWERPC] Add QUICC Engine (QE) infrastructure"). Signed-off-by: Arseny Solokha --- arch/powerpc/include/asm/ucc_slow.h | 13 - arch/powerpc/sysdev/qe_lib/ucc_slow.c | 5 - 2 files changed, 18 deletions(-) diff --git a/arch/powerpc/include/asm/ucc_slow.h b/arch/powerpc/include/asm/ucc_slow.h index c44131e..233ef5f 100644 --- a/arch/powerpc/include/asm/ucc_slow.h +++ b/arch/powerpc/include/asm/ucc_slow.h @@ -251,19 +251,6 @@ void ucc_slow_enable(struct ucc_slow_private * uccs, enum comm_dir mode); */ void ucc_slow_disable(struct ucc_slow_private * uccs, enum comm_dir mode); -/* ucc_slow_poll_transmitter_now - * Immediately forces a poll of the transmitter for data to be sent. - * Typically, the hardware performs a periodic poll for data that the - * transmit routine has set up to be transmitted. In cases where - * this polling cycle is not soon enough, this optional routine can - * be invoked to force a poll right away, instead. Proper use for - * each transmission for which this functionality is desired is to - * call the transmit routine and then this routine right after. - * - * uccs - (In) pointer to the slow UCC structure. - */ -void ucc_slow_poll_transmitter_now(struct ucc_slow_private * uccs); - /* ucc_slow_graceful_stop_tx * Smoothly stops transmission on a specified slow UCC. * diff --git a/arch/powerpc/sysdev/qe_lib/ucc_slow.c b/arch/powerpc/sysdev/qe_lib/ucc_slow.c index befaf11..5f91628 100644 --- a/arch/powerpc/sysdev/qe_lib/ucc_slow.c +++ b/arch/powerpc/sysdev/qe_lib/ucc_slow.c @@ -43,11 +43,6 @@ u32 ucc_slow_get_qe_cr_subblock(int uccs_num) } EXPORT_SYMBOL(ucc_slow_get_qe_cr_subblock); -void ucc_slow_poll_transmitter_now(struct ucc_slow_private * uccs) -{ - out_be16(&uccs->us_regs->utodr, UCC_SLOW_TOD); -} - void ucc_slow_graceful_stop_tx(struct ucc_slow_private * uccs) { struct ucc_slow_info *us_info = uccs->us_info; -- 2.3.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/4] kvm/ppc/mpic: drop unused IRQ_testbit
Drop unused static procedure which doesn't have callers within its translation unit. It had been already removed independently in QEMU[1] from the OpenPIC implementation borrowed from the kernel. [1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.html Signed-off-by: Arseny Solokha Cc: Alexander Graf Cc: Gleb Natapov Cc: Paolo Bonzini --- arch/powerpc/kvm/mpic.c | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c index 39b3a8f..a480d99 100644 --- a/arch/powerpc/kvm/mpic.c +++ b/arch/powerpc/kvm/mpic.c @@ -289,11 +289,6 @@ static inline void IRQ_resetbit(struct irq_queue *q, int n_IRQ) clear_bit(n_IRQ, q->queue); } -static inline int IRQ_testbit(struct irq_queue *q, int n_IRQ) -{ - return test_bit(n_IRQ, q->queue); -} - static void IRQ_check(struct openpic *opp, struct irq_queue *q) { int irq = -1; -- 2.3.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH V2 0/4] powerpc: trivial unused functions cleanup
This series removes unused functions from powerpc tree that I've been able to discover. Two machines at hands, e300 and e500 based, boot and run without regressions on my workload with this series applied. The removed code seems also been rarely touched, so it seems the series is safe at least in general. But I can't obviously express any strong point in support of the series, so it's completely OK to leave things as is. Arseny Solokha (4): powerpc/boot: drop planetcore_set_serial_speed kvm/ppc/mpic: drop unused IRQ_testbit powrepc/qe: drop unused ucc_slow_poll_transmitter_now powerpc/mpic: remove unused functions arch/powerpc/boot/planetcore.c| 33 - arch/powerpc/boot/planetcore.h| 3 --- arch/powerpc/include/asm/mpic.h | 20 arch/powerpc/include/asm/ucc_slow.h | 13 - arch/powerpc/kvm/mpic.c | 5 - arch/powerpc/sysdev/mpic.c| 35 --- arch/powerpc/sysdev/qe_lib/ucc_slow.c | 5 - 7 files changed, 114 deletions(-) -- 2.3.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/4] powerpc/boot: drop planetcore_set_serial_speed
Drop planetcore_set_serial_speed() which had no users since its inception in commit fec6047047fd ("[POWERPC] bootwrapper: Add PlanetCore firmware support") in 2007. Signed-off-by: Arseny Solokha --- arch/powerpc/boot/planetcore.c | 33 - arch/powerpc/boot/planetcore.h | 3 --- 2 files changed, 36 deletions(-) diff --git a/arch/powerpc/boot/planetcore.c b/arch/powerpc/boot/planetcore.c index 0d8558a..75117e6 100644 --- a/arch/powerpc/boot/planetcore.c +++ b/arch/powerpc/boot/planetcore.c @@ -131,36 +131,3 @@ void planetcore_set_stdout_path(const char *table) setprop_str(chosen, "linux,stdout-path", path); } - -void planetcore_set_serial_speed(const char *table) -{ - void *chosen, *stdout; - u64 baud; - u32 baud32; - int len; - - chosen = finddevice("/chosen"); - if (!chosen) - return; - - len = getprop(chosen, "linux,stdout-path", prop_buf, MAX_PROP_LEN); - if (len <= 0) - return; - - stdout = finddevice(prop_buf); - if (!stdout) { - printf("planetcore_set_serial_speed: " - "Bad /chosen/linux,stdout-path.\r\n"); - - return; - } - - if (!planetcore_get_decimal(table, PLANETCORE_KEY_SERIAL_BAUD, - &baud)) { - printf("planetcore_set_serial_speed: No SB tag.\r\n"); - return; - } - - baud32 = baud; - setprop(stdout, "current-speed", &baud32, 4); -} diff --git a/arch/powerpc/boot/planetcore.h b/arch/powerpc/boot/planetcore.h index 0d4094f..d53c733 100644 --- a/arch/powerpc/boot/planetcore.h +++ b/arch/powerpc/boot/planetcore.h @@ -43,7 +43,4 @@ void planetcore_set_mac_addrs(const char *table); */ void planetcore_set_stdout_path(const char *table); -/* Sets the current-speed property in the serial node. */ -void planetcore_set_serial_speed(const char *table); - #endif -- 2.3.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset
On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > On PowerNV platform, resource position in M64 implies the PE# the resource > belongs to. In some cases, adjustment of a resource is necessary to locate > it to a correct position in M64. > > Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address > according to an offset. I think I squashed the "powerpc/powernv: Allocate VF PE" into this one, but I didn't merge the changelog into this one. Those two patches don't seem super related to each other, but I think there really was some dependency. > [bhelgaas: rework loops, rework overlap check, index resource[] > conventionally, remove pci_regs.h include, squashed with next patch] > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas > --- > arch/powerpc/include/asm/pci-bridge.h |4 > arch/powerpc/kernel/pci_dn.c | 11 + > arch/powerpc/platforms/powernv/pci-ioda.c | 520 > - > arch/powerpc/platforms/powernv/pci.c | 18 + > arch/powerpc/platforms/powernv/pci.h |7 > 5 files changed, 543 insertions(+), 17 deletions(-) > > diff --git a/arch/powerpc/include/asm/pci-bridge.h > b/arch/powerpc/include/asm/pci-bridge.h > index de11de7d4547..011340df8583 100644 > --- a/arch/powerpc/include/asm/pci-bridge.h > +++ b/arch/powerpc/include/asm/pci-bridge.h > @@ -177,6 +177,10 @@ struct pci_dn { > int pe_number; > #ifdef CONFIG_PCI_IOV > u16 max_vfs;/* number of VFs IOV BAR expended */ > + u16 vf_pes; /* VF PE# under this PF */ > + int offset; /* PE# for the first VF PE */ > +#define IODA_INVALID_M64(-1) > + int m64_wins[PCI_SRIOV_NUM_BARS]; > #endif /* CONFIG_PCI_IOV */ > #endif > struct list_head child_list; > diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c > index f3a1a81d112f..5faf7ca45434 100644 > --- a/arch/powerpc/kernel/pci_dn.c > +++ b/arch/powerpc/kernel/pci_dn.c > @@ -217,6 +217,17 @@ void remove_dev_pci_info(struct pci_dev *pdev) > struct pci_dn *pdn, *tmp; > int i; > > + /* > + * VF and VF PE are created/released dynamically, so we need to > + * bind/unbind them. Otherwise the VF and VF PE would be mismatched > + * when re-enabling SR-IOV. > + */ > + if (pdev->is_virtfn) { > + pdn = pci_get_pdn(pdev); > + pdn->pe_number = IODA_INVALID_PE; > + return; > + } > + > /* Only support IOV PF for now */ > if (!pdev->is_physfn) > return; > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index 6a86690bb8de..a3c2fbe35fc8 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -44,6 +44,9 @@ > #include "powernv.h" > #include "pci.h" > > +/* 256M DMA window, 4K TCE pages, 8 bytes TCE */ > +#define TCE32_TABLE_SIZE ((0x1000 / 0x1000) * 8) > + > static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, > const char *fmt, ...) > { > @@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, > const char *level, > vaf.fmt = fmt; > vaf.va = &args; > > - if (pe->pdev) > + if (pe->flags & PNV_IODA_PE_DEV) > strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix)); > - else > + else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) > sprintf(pfix, "%04x:%02x ", > pci_domain_nr(pe->pbus), pe->pbus->number); > +#ifdef CONFIG_PCI_IOV > + else if (pe->flags & PNV_IODA_PE_VF) > + sprintf(pfix, "%04x:%02x:%2x.%d", > + pci_domain_nr(pe->parent_dev->bus), > + (pe->rid & 0xff00) >> 8, > + PCI_SLOT(pe->rid), PCI_FUNC(pe->rid)); > +#endif /* CONFIG_PCI_IOV*/ > > printk("%spci %s: [PE# %.3d] %pV", > level, pfix, pe->pe_number, &vaf); > @@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, > bool is_add) > { > struct pnv_ioda_pe *slave; > - struct pci_dev *pdev; > + struct pci_dev *pdev = NULL; > int ret; > > /* > @@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, > > if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS)) > pdev = pe->pbus->self; > - else > + else if (pe->flags & PNV_IODA_PE_DEV) > pdev = pe->pdev->bus->self; > +#ifdef CONFIG_PCI_IOV > + else if (pe->flags & PNV_IODA_PE_VF) > + pdev = pe->parent_dev->bus->self; > +#endif /* CONFIG_PCI_IOV */ > while (pdev) { > struct pci_dn *pdn = pci_get_pdn(pdev); > struct pnv_ioda_pe *parent; > @@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, >
Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset
On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > On PowerNV platform, resource position in M64 implies the PE# the resource > belongs to. In some cases, adjustment of a resource is necessary to locate > it to a correct position in M64. > > Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address > according to an offset. > > [bhelgaas: rework loops, rework overlap check, index resource[] > conventionally, remove pci_regs.h include, squashed with next patch] > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas ... > +#ifdef CONFIG_PCI_IOV > +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) > +{ > + struct pci_dn *pdn = pci_get_pdn(dev); > + int i; > + struct resource *res, res2; > + resource_size_t size; > + u16 vf_num; > + > + if (!dev->is_physfn) > + return -EINVAL; > + > + /* > + * "offset" is in VFs. The M64 windows are sized so that when they > + * are segmented, each segment is the same size as the IOV BAR. > + * Each segment is in a separate PE, and the high order bits of the > + * address are the PE number. Therefore, each VF's BAR is in a > + * separate PE, and changing the IOV BAR start address changes the > + * range of PEs the VFs are in. > + */ > + vf_num = pdn->vf_pes; > + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > + res = &dev->resource[i + PCI_IOV_RESOURCES]; > + if (!res->flags || !res->parent) > + continue; > + > + if (!pnv_pci_is_mem_pref_64(res->flags)) > + continue; > + > + /* > + * The actual IOV BAR range is determined by the start address > + * and the actual size for vf_num VFs BAR. This check is to > + * make sure that after shifting, the range will not overlap > + * with another device. > + */ > + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > + res2.flags = res->flags; > + res2.start = res->start + (size * offset); > + res2.end = res2.start + (size * vf_num) - 1; > + > + if (res2.end > res->end) { > + dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR > (trying to enable %d VFs shifted by %d)\n", > + i, &res2, res, vf_num, offset); > + return -EBUSY; > + } > + } > + > + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > + res = &dev->resource[i + PCI_IOV_RESOURCES]; > + if (!res->flags || !res->parent) > + continue; > + > + if (!pnv_pci_is_mem_pref_64(res->flags)) > + continue; > + > + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > + res2 = *res; > + res->start += size * offset; I'm still not happy about this fiddling with res->start. Increasing res->start means that in principle, the "size * offset" bytes that we just removed from res are now available for allocation to somebody else. I don't think we *will* give that space to anything else because of the alignment restrictions you're enforcing, but "res" now doesn't correctly describe the real resource map. Would you be able to just update the BAR here while leaving the struct resource alone? In that case, it would look a little funny that lspci would show a BAR value in the middle of the region in /proc/iomem, but the /proc/iomem region would be more correct. > + > + dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d > VFs shifted by %d)\n", > + i, &res2, res, vf_num, offset); > + pci_update_resource(dev, i + PCI_IOV_RESOURCES); > + } > + pdn->max_vfs -= offset; > + return 0; > +} > +#endif /* CONFIG_PCI_IOV */ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
On Tue, Feb 24, 2015 at 02:34:42AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > On PHB3, PF IOV BAR will be covered by M64 window to have better PE > isolation. The total_pe number is usually different from total_VFs, which > can lead to a conflict between MMIO space and the PE number. > > For example, if total_VFs is 128 and total_pe is 256, the second half of > M64 window will be part of other PCI device, which may already belong > to other PEs. I'm still trying to wrap my mind around the explanation here. I *think* what's going on is that the M64 window must be a power-of-two size. If the VF(n) BAR space doesn't completely fill it, we might allocate the leftover space to another device. Then the M64 window for *this* device may cause the other device to be associated with a PE it didn't expect. But I don't understand this well enough to describe it clearly. More serious code question below... > Prevent the conflict by reserving additional space for the PF IOV BAR, > which is total_pe number of VF's BAR size. > > [bhelgaas: make dev_printk() output more consistent, index resource[] > conventionally] > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas > --- > arch/powerpc/include/asm/machdep.h|4 ++ > arch/powerpc/include/asm/pci-bridge.h |3 ++ > arch/powerpc/kernel/pci-common.c |5 +++ > arch/powerpc/platforms/powernv/pci-ioda.c | 58 > + > 4 files changed, 70 insertions(+) > > diff --git a/arch/powerpc/include/asm/machdep.h > b/arch/powerpc/include/asm/machdep.h > index c8175a3fe560..965547c58497 100644 > --- a/arch/powerpc/include/asm/machdep.h > +++ b/arch/powerpc/include/asm/machdep.h > @@ -250,6 +250,10 @@ struct machdep_calls { > /* Reset the secondary bus of bridge */ > void (*pcibios_reset_secondary_bus)(struct pci_dev *dev); > > +#ifdef CONFIG_PCI_IOV > + void (*pcibios_fixup_sriov)(struct pci_bus *bus); > +#endif /* CONFIG_PCI_IOV */ > + > /* Called to shutdown machine specific hardware not already controlled >* by other drivers. >*/ > diff --git a/arch/powerpc/include/asm/pci-bridge.h > b/arch/powerpc/include/asm/pci-bridge.h > index 513f8f27060d..de11de7d4547 100644 > --- a/arch/powerpc/include/asm/pci-bridge.h > +++ b/arch/powerpc/include/asm/pci-bridge.h > @@ -175,6 +175,9 @@ struct pci_dn { > #define IODA_INVALID_PE (-1) > #ifdef CONFIG_PPC_POWERNV > int pe_number; > +#ifdef CONFIG_PCI_IOV > + u16 max_vfs;/* number of VFs IOV BAR expended */ > +#endif /* CONFIG_PCI_IOV */ > #endif > struct list_head child_list; > struct list_head list; > diff --git a/arch/powerpc/kernel/pci-common.c > b/arch/powerpc/kernel/pci-common.c > index 82031011522f..022e9feeb1f2 100644 > --- a/arch/powerpc/kernel/pci-common.c > +++ b/arch/powerpc/kernel/pci-common.c > @@ -1646,6 +1646,11 @@ void pcibios_scan_phb(struct pci_controller *hose) > if (ppc_md.pcibios_fixup_phb) > ppc_md.pcibios_fixup_phb(hose); > > +#ifdef CONFIG_PCI_IOV > + if (ppc_md.pcibios_fixup_sriov) > + ppc_md.pcibios_fixup_sriov(bus); > +#endif /* CONFIG_PCI_IOV */ > + > /* Configure PCI Express settings */ > if (bus && !pci_has_flag(PCI_PROBE_ONLY)) { > struct pci_bus *child; > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index cd1a56160ded..36c533da5ccb 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -1749,6 +1749,61 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) > static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { } > #endif /* CONFIG_PCI_MSI */ > > +#ifdef CONFIG_PCI_IOV > +static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) > +{ > + struct pci_controller *hose; > + struct pnv_phb *phb; > + struct resource *res; > + int i; > + resource_size_t size; > + struct pci_dn *pdn; > + > + if (!pdev->is_physfn || pdev->is_added) > + return; > + > + hose = pci_bus_to_host(pdev->bus); > + phb = hose->private_data; > + > + pdn = pci_get_pdn(pdev); > + pdn->max_vfs = 0; > + > + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > + res = &pdev->resource[i + PCI_IOV_RESOURCES]; > + if (!res->flags || res->parent) > + continue; > + if (!pnv_pci_is_mem_pref_64(res->flags)) { > + dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: > %pR\n", > + i, res); > + continue; > + } > + > + dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res); > + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); > + res->end = res->start + size * phb->ioda.total_pe - 1; > + dev_dbg(&pdev->dev, " %pR\n", r
Re: [PATCH v12 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
On Tue, Feb 24, 2015 at 02:34:35AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > Current iommu_table of a PE is a static field. This will have a problem > when iommu_free_table() is called. > > Allocate iommu_table dynamically. I'd like a little more explanation about why we're calling iommu_free_table() now when we didn't call it before. Maybe this happens when we disable SR-IOV and the VFs go away? Is there a hotplug remove path where we should also be calling iommu_free_table()? > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas > --- > arch/powerpc/include/asm/iommu.h |3 +++ > arch/powerpc/platforms/powernv/pci-ioda.c | 26 ++ > arch/powerpc/platforms/powernv/pci.h |2 +- > 3 files changed, 18 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/include/asm/iommu.h > b/arch/powerpc/include/asm/iommu.h > index 9cfa3706a1b8..5574eeb97634 100644 > --- a/arch/powerpc/include/asm/iommu.h > +++ b/arch/powerpc/include/asm/iommu.h > @@ -78,6 +78,9 @@ struct iommu_table { > struct iommu_group *it_group; > #endif > void (*set_bypass)(struct iommu_table *tbl, bool enable); > +#ifdef CONFIG_PPC_POWERNV > + void *data; > +#endif > }; > > /* Pure 2^n version of get_order */ > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index 58c4fc4ab63c..cd1a56160ded 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, > int all) > return; > } > > + pe->tce32_table = kzalloc_node(sizeof(struct iommu_table), > + GFP_KERNEL, hose->node); > + pe->tce32_table->data = pe; > + > /* Associate it with all child devices */ > pnv_ioda_setup_same_PE(bus, pe); > > @@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb > *phb, struct pci_dev *pdev > > pe = &phb->ioda.pe_array[pdn->pe_number]; > WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops); > - set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table); > + set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table); > } > > static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb, > @@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb > *phb, > } else { > dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n"); > set_dma_ops(&pdev->dev, &dma_iommu_ops); > - set_iommu_table_base(&pdev->dev, &pe->tce32_table); > + set_iommu_table_base(&pdev->dev, pe->tce32_table); > } > *pdev->dev.dma_mask = dma_mask; > return 0; > @@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe > *pe, > list_for_each_entry(dev, &bus->devices, bus_list) { > if (add_to_iommu_group) > set_iommu_table_base_and_group(&dev->dev, > -&pe->tce32_table); > +pe->tce32_table); > else > - set_iommu_table_base(&dev->dev, &pe->tce32_table); > + set_iommu_table_base(&dev->dev, pe->tce32_table); > > if (dev->subordinate) > pnv_ioda_setup_bus_dma(pe, dev->subordinate, > @@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct > pnv_ioda_pe *pe, > void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl, >__be64 *startp, __be64 *endp, bool rm) > { > - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, > - tce32_table); > + struct pnv_ioda_pe *pe = tbl->data; > struct pnv_phb *phb = pe->phb; > > if (phb->type == PNV_PHB_IODA1) > @@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb > *phb, > } > > /* Setup linux iommu table */ > - tbl = &pe->tce32_table; > + tbl = pe->tce32_table; > pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs, > base << 28, IOMMU_PAGE_SHIFT_4K); > > @@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb > *phb, > > static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable) > { > - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, > - tce32_table); > + struct pnv_ioda_pe *pe = tbl->data; > uint16_t window_id = (pe->pe_number << 1 ) + 1; > int64_t rc; > > @@ -1312,10 +1314,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct > pnv_phb *phb, > pe->tce_bypass_base = 1ull << 59; > > /* Install set_bypass callback for VFIO */ > - pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass; > + pe->tce32_table->se
Re: [PATCH v12 11/21] powerpc/pci: Don't unset PCI resources for VFs
On Tue, Feb 24, 2015 at 02:34:13AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > If we're going to reassign resources with flag PCI_REASSIGN_ALL_RSRC, all > resources will be cleaned out during device header fixup time and then get > reassigned by PCI core. However, the VF resources won't be reassigned and > thus, we shouldn't clean them out. > > If the pci_dev is a VF, skip the resource unset process. I think this patch is correct, but we should include a little more detail in the changelog to answer questions like mine and Ben's (http://lkml.kernel.org/r/1423528584.4924.70.ca...@au1.ibm.com). > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas > --- > arch/powerpc/kernel/pci-common.c |4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/powerpc/kernel/pci-common.c > b/arch/powerpc/kernel/pci-common.c > index 2a525c938158..82031011522f 100644 > --- a/arch/powerpc/kernel/pci-common.c > +++ b/arch/powerpc/kernel/pci-common.c > @@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev) > pci_name(dev)); > return; > } > + > + if (dev->is_virtfn) > + return; > + > for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { > struct resource *res = dev->resource + i; > struct pci_bus_region reg; > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 08/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()
On Tue, Feb 24, 2015 at 02:33:52AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > VFs are dynamically created when a driver enables them. On some platforms, > like PowerNV, special resources are necessary to enable VFs. > > Add platform hooks for enabling and disabling VFs. > > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas > --- > drivers/pci/iov.c | 19 +++ > 1 file changed, 19 insertions(+) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index 5643a1011e23..cc6fedf4a1b9 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, > int reset) > pci_dev_put(dev); > } > > +int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num) I think this "vf_num" parameter should be renamed to something like "num_vfs" instead. It's subtle, but "vf_num" suggests that we're talking about one of several VFs, e.g., VF1 or VF 2. But here we really mean the total number of VFs that we're enabling. There's similar code in the powerpc implementation that should be renamed the same way. > +{ > + return 0; > +} > + > static int sriov_enable(struct pci_dev *dev, int nr_virtfn) > { > int rc; > @@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int > nr_virtfn) > struct pci_sriov *iov = dev->sriov; > int bars = 0; > int bus; > + int retval; > > if (!nr_virtfn) > return 0; > @@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int > nr_virtfn) > if (nr_virtfn < initial) > initial = nr_virtfn; > > + if ((retval = pcibios_sriov_enable(dev, initial))) { > + dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n", > + retval); > + return retval; > + } > + > for (i = 0; i < initial; i++) { > rc = virtfn_add(dev, i, 0); > if (rc) > @@ -335,6 +347,11 @@ failed: > return rc; > } > > +int __weak pcibios_sriov_disable(struct pci_dev *pdev) > +{ > + return 0; > +} > + > static void sriov_disable(struct pci_dev *dev) > { > int i; > @@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev) > for (i = 0; i < iov->num_VFs; i++) > virtfn_remove(dev, i, 0); > > + pcibios_sriov_disable(dev); > + > iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE); > pci_cfg_access_lock(dev); > pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 10/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning
On Tue, Feb 24, 2015 at 02:34:06AM -0600, Bjorn Helgaas wrote: > From: Wei Yang > > When sizing and assigning resources, we divide the resources into two > lists: the requested list and the additional list. We don't consider the > alignment of additional VF(n) BAR space. > > This is reasonable because the alignment required for the VF(n) BAR space > is the size of an individual VF BAR, not the size of the space for *all* > VFs. But some platforms, e.g., PowerNV, require additional alignment. > > Consider the additional IOV BAR alignment when sizing and assigning > resources. When there is not enough system MMIO space, the PF's IOV BAR > alignment will not contribute to the bridge. When there is enough system > MMIO space, the additional alignment will contribute to the bridge. I don't understand the ""when there is not enough system MMIO space" part. How do we tell if there's enough MMIO space? > Also, take advantage of pci_dev_resource::min_align to store this > additional alignment. This comment doesn't seem to make sense; this patch doesn't save anything in min_align. Another question below... > [bhelgaas: changelog, printk cast] > Signed-off-by: Wei Yang > Signed-off-by: Bjorn Helgaas > --- > drivers/pci/setup-bus.c | 83 > --- > 1 file changed, 70 insertions(+), 13 deletions(-) > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c > index e3e17f3c0f0f..affbceae560f 100644 > --- a/drivers/pci/setup-bus.c > +++ b/drivers/pci/setup-bus.c > @@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head, > } > } > > -static resource_size_t get_res_add_size(struct list_head *head, > - struct resource *res) > +static struct pci_dev_resource *res_to_dev_res(struct list_head *head, > +struct resource *res) > { > struct pci_dev_resource *dev_res; > > @@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct > list_head *head, > int idx = res - &dev_res->dev->resource[0]; > > dev_printk(KERN_DEBUG, &dev_res->dev->dev, > - "res[%d]=%pR get_res_add_size add_size %llx\n", > + "res[%d]=%pR res_to_dev_res add_size %llx > min_align %llx\n", >idx, dev_res->res, > - (unsigned long long)dev_res->add_size); > + (unsigned long long)dev_res->add_size, > + (unsigned long long)dev_res->min_align); > > - return dev_res->add_size; > + return dev_res; > } > } > > - return 0; > + return NULL; > +} > + > +static resource_size_t get_res_add_size(struct list_head *head, > + struct resource *res) > +{ > + struct pci_dev_resource *dev_res; > + > + dev_res = res_to_dev_res(head, res); > + return dev_res ? dev_res->add_size : 0; > +} > + > +static resource_size_t get_res_add_align(struct list_head *head, > + struct resource *res) > +{ > + struct pci_dev_resource *dev_res; > + > + dev_res = res_to_dev_res(head, res); > + return dev_res ? dev_res->min_align : 0; > } > > + > /* Sort resources by alignment */ > static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head) > { > @@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head > *head, > LIST_HEAD(save_head); > LIST_HEAD(local_fail_head); > struct pci_dev_resource *save_res; > - struct pci_dev_resource *dev_res, *tmp_res; > + struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; > unsigned long fail_type; > + resource_size_t add_align, align; > > /* Check if optional add_size is there */ > if (!realloc_head || list_empty(realloc_head)) > @@ -384,10 +405,38 @@ static void __assign_resources_sorted(struct list_head > *head, > } > > /* Update res in head list with add_size in realloc_head list */ > - list_for_each_entry(dev_res, head, list) > + list_for_each_entry_safe(dev_res, tmp_res, head, list) { > dev_res->res->end += get_res_add_size(realloc_head, > dev_res->res); > > + /* > + * There are two kinds of additional resources in the list: > + * 1. bridge resource -- IORESOURCE_STARTALIGN > + * 2. SR-IOV resource -- IORESOURCE_SIZEALIGN > + * Here just fix the additional alignment for bridge > + */ > + if (!(dev_res->res->flags & IORESOURCE_STARTALIGN)) > + continue; > + > + add_align = get_res_add_align(realloc_head, dev_res->res); > + > + /* Reorder the list by their alignment */ Why do we nee
[PATCH v12 21/21] powerpc/pci: Add PCI resource alignment documentation
From: Wei Yang In order to enable SRIOV on PowerNV platform, the PF's IOV BAR needs to be adjusted: 1. size expanded 2. aligned to M64BT size This patch documents this change on the reason and how. [bhelgaas: reformat, clarify, expand] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- .../powerpc/pci_iov_resource_on_powernv.txt| 305 1 file changed, 305 insertions(+) create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt b/Documentation/powerpc/pci_iov_resource_on_powernv.txt new file mode 100644 index ..4e9bb2812238 --- /dev/null +++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt @@ -0,0 +1,305 @@ +Wei Yang +Benjamin Herrenschmidt +26 Aug 2014 + +This document describes the requirement from hardware for PCI MMIO resource +sizing and assignment on PowerNV platform and how generic PCI code handles +this requirement. The first two sections describe the concepts of +Partitionable Endpoints and the implementation on P8 (IODA2). + +1. Introduction to Partitionable Endpoints + +A Partitionable Endpoint (PE) is a way to group the various resources +associated with a device or a set of device to provide isolation between +partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism +to freeze a device that is causing errors in order to limit the possibility +of propagation of bad data. + +There is thus, in HW, a table of PE states that contains a pair of "frozen" +state bits (one for MMIO and one for DMA, they get set together but can be +cleared independently) for each PE. + +When a PE is frozen, all stores in any direction are dropped and all loads +return all 1's value. MSIs are also blocked. There's a bit more state +that captures things like the details of the error that caused the freeze +etc., but that's not critical. + +The interesting part is how the various PCIe transactions (MMIO, DMA, ...) +are matched to their corresponding PEs. + +The following section provides a rough description of what we have on P8 +(IODA2). Keep in mind that this is all per PHB (PCI host bridge). Each +PHB is a completely separate HW entity that replicates the entire logic, +so has its own set of PEs, etc. + +2. Implementation of Partitionable Endpoints on P8 (IODA2) + +P8 supports up to 256 Partitionable Endpoints per PHB. + + * Inbound + +For DMA, MSIs and inbound PCIe error messages, we have a table (in +memory but accessed in HW by the chip) that provides a direct +correspondence between a PCIe RID (bus/dev/fn) with a PE number. +We call this the RTT. + +- For DMA we then provide an entire address space for each PE that can + contains two "windows", depending on the value of PCI address bit 59. + Each window can be configured to be remapped via a "TCE table" (IOMMU + translation table), which has various configurable characteristics + not described here. + +- For MSIs, we have two windows in the address space (one at the top of + the 32-bit space and one much higher) which, via a combination of the + address and MSI value, will result in one of the 2048 interrupts per + bridge being triggered. There's a PE# in the interrupt controller + descriptor table as well which is compared with the PE# obtained from + the RTT to "authorize" the device to emit that specific interrupt. + +- Error messages just use the RTT. + + * Outbound. That's where the tricky part is. + +Like other PCI host bridges, the Power8 IODA2 PHB supports "windows" +from the CPU address space to the PCI address space. There is one M32 +window and sixteen M64 windows. They have different characteristics. +First what they have in common: they forward a configurable portion of +the CPU address space to the PCIe bus and must be naturally aligned +power of two in size. The rest is different: + +- The M32 window: + + * Is limited to 4GB in size. + + * Drops the top bits of the address (above the size) and replaces + them with a configurable value. This is typically used to generate + 32-bit PCIe accesses. We configure that window at boot from FW and + don't touch it from Linux; it's usually set to forward a 2GB + portion of address space from the CPU to PCIe + 0x8000_..0x_. (Note: The top 64KB are actually + reserved for MSIs but this is not a problem at this point; we just + need to ensure Linux doesn't assign anything there, the M32 logic + ignores that however and will forward in that space if we try). + + * It is divided into 256 segments of equal size. A table in the chip + maps each segment to a PE#. That allows portions of the MMIO space + to be assigned to PEs on a segment granularity. For a 2GB window, + the segment granularity is 2GB/256 = 8MB. + +Now, this is the "main"
[PATCH v12 20/21] powerpc/pci: Remove unused struct pci_dn.pcidev field
From: Wei Yang In struct pci_dn, the pcidev field is assigned but not used, so remove it. Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas Acked-by: Gavin Shan --- arch/powerpc/include/asm/pci-bridge.h |1 - arch/powerpc/platforms/powernv/pci-ioda.c |1 - 2 files changed, 2 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 958ea8675691..109efbaf384d 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -168,7 +168,6 @@ struct pci_dn { int pci_ext_config_space; /* for pci devices */ - struct pci_dev *pcidev;/* back-pointer to the pci device */ #ifdef CONFIG_EEH struct eeh_dev *edev; /* eeh device */ #endif diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index b265d5da601b..58d4ca01bfd9 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1024,7 +1024,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe) pci_name(dev)); continue; } - pdn->pcidev = dev; pdn->pe_number = pe->pe_number; pe->dma_weight += pnv_ioda_dma_weight(dev); if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 19/21] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
From: Wei Yang When IOV BAR is big, each is covered by 4 M64 windows. This leads to several VF PE sits in one PE in terms of M64. Group VF PEs according to the M64 allocation. [bhelgaas: use dev_printk() when possible] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- arch/powerpc/include/asm/pci-bridge.h |2 arch/powerpc/platforms/powernv/pci-ioda.c | 197 +++-- 2 files changed, 154 insertions(+), 45 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index d824bb184ab8..958ea8675691 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -182,7 +182,7 @@ struct pci_dn { #define M64_PER_IOV 4 int m64_per_iov; #define IODA_INVALID_M64(-1) - int m64_wins[PCI_SRIOV_NUM_BARS]; + int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 30b7c3909746..b265d5da601b 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1152,26 +1152,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev) struct pci_controller *hose; struct pnv_phb*phb; struct pci_dn *pdn; - inti; + inti, j; bus = pdev->bus; hose = pci_bus_to_host(bus); phb = hose->private_data; pdn = pci_get_pdn(pdev); - for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - if (pdn->m64_wins[i] == IODA_INVALID_M64) - continue; - opal_pci_phb_mmio_enable(phb->opal_id, - OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0); - clear_bit(pdn->m64_wins[i], &phb->ioda.m64_bar_alloc); - pdn->m64_wins[i] = IODA_INVALID_M64; - } + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) + for (j = 0; j < M64_PER_IOV; j++) { + if (pdn->m64_wins[i][j] == IODA_INVALID_M64) + continue; + opal_pci_phb_mmio_enable(phb->opal_id, + OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0); + clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc); + pdn->m64_wins[i][j] = IODA_INVALID_M64; + } return 0; } -static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) +static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 vf_num) { struct pci_bus*bus; struct pci_controller *hose; @@ -1179,17 +1180,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) struct pci_dn *pdn; unsigned int win; struct resource *res; - inti; + inti, j; int64_trc; + inttotal_vfs; + resource_size_tsize, start; + intpe_num; + intvf_groups; + intvf_per_group; bus = pdev->bus; hose = pci_bus_to_host(bus); phb = hose->private_data; pdn = pci_get_pdn(pdev); + total_vfs = pci_sriov_get_totalvfs(pdev); /* Initialize the m64_wins to IODA_INVALID_M64 */ for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) - pdn->m64_wins[i] = IODA_INVALID_M64; + for (j = 0; j < M64_PER_IOV; j++) + pdn->m64_wins[i][j] = IODA_INVALID_M64; + + if (pdn->m64_per_iov == M64_PER_IOV) { + vf_groups = (vf_num <= M64_PER_IOV) ? vf_num: M64_PER_IOV; + vf_per_group = (vf_num <= M64_PER_IOV)? 1: + __roundup_pow_of_two(vf_num) / pdn->m64_per_iov; + } else { + vf_groups = 1; + vf_per_group = 1; + } for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = &pdev->resource[i + PCI_IOV_RESOURCES]; @@ -1199,35 +1216,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev) if (!pnv_pci_is_mem_pref_64(res->flags)) continue; - do { - win = find_next_zero_bit(&phb->ioda.m64_bar_alloc, - phb->ioda.m64_bar_idx + 1, 0); - - if (win >= phb->ioda.m64_bar_idx + 1) - goto m64_failed; - } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc)); + for (j = 0; j < vf_groups; j++) { + do { + win = find_next_zero_bit(&phb->ioda.m64_bar_alloc, + phb->ioda.m64_bar_idx + 1, 0); + +
[PATCH v12 18/21] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
From: Wei Yang M64 aperture size is limited on PHB3. When the IOV BAR is too big, this will exceed the limitation and failed to be assigned. Introduce a different mechanism based on the IOV BAR size: - if IOV BAR size is smaller than 64MB, expand to total_pe - if IOV BAR size is bigger than 64MB, roundup power2 [bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- arch/powerpc/include/asm/pci-bridge.h |2 ++ arch/powerpc/platforms/powernv/pci-ioda.c | 33 ++--- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 011340df8583..d824bb184ab8 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -179,6 +179,8 @@ struct pci_dn { u16 max_vfs;/* number of VFs IOV BAR expended */ u16 vf_pes; /* VF PE# under this PF */ int offset; /* PE# for the first VF PE */ +#define M64_PER_IOV 4 + int m64_per_iov; #define IODA_INVALID_M64(-1) int m64_wins[PCI_SRIOV_NUM_BARS]; #endif /* CONFIG_PCI_IOV */ diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index a3c2fbe35fc8..30b7c3909746 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2242,6 +2242,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) int i; resource_size_t size; struct pci_dn *pdn; + int mul, total_vfs; if (!pdev->is_physfn || pdev->is_added) return; @@ -2252,6 +2253,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) pdn = pci_get_pdn(pdev); pdn->max_vfs = 0; + total_vfs = pci_sriov_get_totalvfs(pdev); + pdn->m64_per_iov = 1; + mul = phb->ioda.total_pe; + + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { + res = &pdev->resource[i + PCI_IOV_RESOURCES]; + if (!res->flags || res->parent) + continue; + if (!pnv_pci_is_mem_pref_64(res->flags)) { + dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", +i, res); + continue; + } + + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); + + /* bigger than 64M */ + if (size > (1 << 26)) { + dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n", +i, res); + pdn->m64_per_iov = M64_PER_IOV; + mul = __roundup_pow_of_two(total_vfs); + break; + } + } + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = &pdev->resource[i + PCI_IOV_RESOURCES]; if (!res->flags || res->parent) @@ -2264,12 +2291,12 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res); size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); - res->end = res->start + size * phb->ioda.total_pe - 1; + res->end = res->start + size * mul - 1; dev_dbg(&pdev->dev, " %pR\n", res); dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)", - i, res, phb->ioda.total_pe); +i, res, mul); } - pdn->max_vfs = phb->ioda.total_pe; + pdn->max_vfs = mul; } static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset
From: Wei Yang On PowerNV platform, resource position in M64 implies the PE# the resource belongs to. In some cases, adjustment of a resource is necessary to locate it to a correct position in M64. Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address according to an offset. [bhelgaas: rework loops, rework overlap check, index resource[] conventionally, remove pci_regs.h include, squashed with next patch] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- arch/powerpc/include/asm/pci-bridge.h |4 arch/powerpc/kernel/pci_dn.c | 11 + arch/powerpc/platforms/powernv/pci-ioda.c | 520 - arch/powerpc/platforms/powernv/pci.c | 18 + arch/powerpc/platforms/powernv/pci.h |7 5 files changed, 543 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index de11de7d4547..011340df8583 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -177,6 +177,10 @@ struct pci_dn { int pe_number; #ifdef CONFIG_PCI_IOV u16 max_vfs;/* number of VFs IOV BAR expended */ + u16 vf_pes; /* VF PE# under this PF */ + int offset; /* PE# for the first VF PE */ +#define IODA_INVALID_M64(-1) + int m64_wins[PCI_SRIOV_NUM_BARS]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index f3a1a81d112f..5faf7ca45434 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -217,6 +217,17 @@ void remove_dev_pci_info(struct pci_dev *pdev) struct pci_dn *pdn, *tmp; int i; + /* +* VF and VF PE are created/released dynamically, so we need to +* bind/unbind them. Otherwise the VF and VF PE would be mismatched +* when re-enabling SR-IOV. +*/ + if (pdev->is_virtfn) { + pdn = pci_get_pdn(pdev); + pdn->pe_number = IODA_INVALID_PE; + return; + } + /* Only support IOV PF for now */ if (!pdev->is_physfn) return; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 6a86690bb8de..a3c2fbe35fc8 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -44,6 +44,9 @@ #include "powernv.h" #include "pci.h" +/* 256M DMA window, 4K TCE pages, 8 bytes TCE */ +#define TCE32_TABLE_SIZE ((0x1000 / 0x1000) * 8) + static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, const char *fmt, ...) { @@ -56,11 +59,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, vaf.fmt = fmt; vaf.va = &args; - if (pe->pdev) + if (pe->flags & PNV_IODA_PE_DEV) strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix)); - else + else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)) sprintf(pfix, "%04x:%02x ", pci_domain_nr(pe->pbus), pe->pbus->number); +#ifdef CONFIG_PCI_IOV + else if (pe->flags & PNV_IODA_PE_VF) + sprintf(pfix, "%04x:%02x:%2x.%d", + pci_domain_nr(pe->parent_dev->bus), + (pe->rid & 0xff00) >> 8, + PCI_SLOT(pe->rid), PCI_FUNC(pe->rid)); +#endif /* CONFIG_PCI_IOV*/ printk("%spci %s: [PE# %.3d] %pV", level, pfix, pe->pe_number, &vaf); @@ -591,7 +601,7 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, bool is_add) { struct pnv_ioda_pe *slave; - struct pci_dev *pdev; + struct pci_dev *pdev = NULL; int ret; /* @@ -630,8 +640,12 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, if (pe->flags & (PNV_IODA_PE_BUS_ALL | PNV_IODA_PE_BUS)) pdev = pe->pbus->self; - else + else if (pe->flags & PNV_IODA_PE_DEV) pdev = pe->pdev->bus->self; +#ifdef CONFIG_PCI_IOV + else if (pe->flags & PNV_IODA_PE_VF) + pdev = pe->parent_dev->bus->self; +#endif /* CONFIG_PCI_IOV */ while (pdev) { struct pci_dn *pdn = pci_get_pdn(pdev); struct pnv_ioda_pe *parent; @@ -649,6 +663,87 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb, return 0; } +#ifdef CONFIG_PCI_IOV +static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe) +{ + struct pci_dev *parent; + uint8_t bcomp, dcomp, fcomp; + int64_t rc; + long rid_end, rid; + + /* Currently, we just deconfigure VF PE. Bus PE will always there.*/ + if (pe->pbus) { + int count; + + dcomp = OPAL_IGNORE_RID_DEVI
[PATCH v12 16/21] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
From: Wei Yang Implement pcibios_iov_resource_alignment() on powernv platform. On PowerNV platform, there are 3 cases for the IOV BAR: 1. initial state, the IOV BAR size is multiple times of VF BAR size 2. after expanded, the IOV BAR size is expanded to meet the M64 segment size 3. sizing stage, the IOV BAR is truncated to 0 pnv_pci_iov_resource_alignment() handle these three cases respectively. [bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size() if no ppc_md machdep_call version] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- arch/powerpc/include/asm/machdep.h|1 + arch/powerpc/kernel/pci-common.c | 10 ++ arch/powerpc/platforms/powernv/pci-ioda.c | 20 3 files changed, 31 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 965547c58497..045448f9e8b2 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -252,6 +252,7 @@ struct machdep_calls { #ifdef CONFIG_PCI_IOV void (*pcibios_fixup_sriov)(struct pci_bus *bus); + resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno); #endif /* CONFIG_PCI_IOV */ /* Called to shutdown machine specific hardware not already controlled diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 022e9feeb1f2..2f1ad9ef4402 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -130,6 +130,16 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev) pci_reset_secondary_bus(dev); } +#ifdef CONFIG_PCI_IOV +resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno) +{ + if (ppc_md.pcibios_iov_resource_alignment) + return ppc_md.pcibios_iov_resource_alignment(pdev, resno); + + return pci_iov_resource_size(dev, resno); +} +#endif /* CONFIG_PCI_IOV */ + static resource_size_t pcibios_io_size(const struct pci_controller *hose) { #ifdef CONFIG_PPC64 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 36c533da5ccb..6a86690bb8de 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1980,6 +1980,25 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus, return phb->ioda.io_segsize; } +#ifdef CONFIG_PCI_IOV +static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev, + int resno) +{ + struct pci_dn *pdn = pci_get_pdn(pdev); + resource_size_t align, iov_align; + + iov_align = resource_size(&pdev->resource[resno]); + if (iov_align) + return iov_align; + + align = pci_iov_resource_size(pdev, resno); + if (pdn->max_vfs) + return pdn->max_vfs * align; + + return align; +} +#endif /* CONFIG_PCI_IOV */ + /* Prevent enabling devices for which we couldn't properly * assign a PE */ @@ -2182,6 +2201,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus; #ifdef CONFIG_PCI_IOV ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov; + ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment; #endif /* CONFIG_PCI_IOV */ pci_add_flags(PCI_REASSIGN_ALL_RSRC); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 15/21] powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
From: Wei Yang On PHB3, PF IOV BAR will be covered by M64 window to have better PE isolation. The total_pe number is usually different from total_VFs, which can lead to a conflict between MMIO space and the PE number. For example, if total_VFs is 128 and total_pe is 256, the second half of M64 window will be part of other PCI device, which may already belong to other PEs. Prevent the conflict by reserving additional space for the PF IOV BAR, which is total_pe number of VF's BAR size. [bhelgaas: make dev_printk() output more consistent, index resource[] conventionally] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- arch/powerpc/include/asm/machdep.h|4 ++ arch/powerpc/include/asm/pci-bridge.h |3 ++ arch/powerpc/kernel/pci-common.c |5 +++ arch/powerpc/platforms/powernv/pci-ioda.c | 58 + 4 files changed, 70 insertions(+) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index c8175a3fe560..965547c58497 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -250,6 +250,10 @@ struct machdep_calls { /* Reset the secondary bus of bridge */ void (*pcibios_reset_secondary_bus)(struct pci_dev *dev); +#ifdef CONFIG_PCI_IOV + void (*pcibios_fixup_sriov)(struct pci_bus *bus); +#endif /* CONFIG_PCI_IOV */ + /* Called to shutdown machine specific hardware not already controlled * by other drivers. */ diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 513f8f27060d..de11de7d4547 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -175,6 +175,9 @@ struct pci_dn { #define IODA_INVALID_PE(-1) #ifdef CONFIG_PPC_POWERNV int pe_number; +#ifdef CONFIG_PCI_IOV + u16 max_vfs;/* number of VFs IOV BAR expended */ +#endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; struct list_head list; diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 82031011522f..022e9feeb1f2 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -1646,6 +1646,11 @@ void pcibios_scan_phb(struct pci_controller *hose) if (ppc_md.pcibios_fixup_phb) ppc_md.pcibios_fixup_phb(hose); +#ifdef CONFIG_PCI_IOV + if (ppc_md.pcibios_fixup_sriov) + ppc_md.pcibios_fixup_sriov(bus); +#endif /* CONFIG_PCI_IOV */ + /* Configure PCI Express settings */ if (bus && !pci_has_flag(PCI_PROBE_ONLY)) { struct pci_bus *child; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index cd1a56160ded..36c533da5ccb 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1749,6 +1749,61 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { } #endif /* CONFIG_PCI_MSI */ +#ifdef CONFIG_PCI_IOV +static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) +{ + struct pci_controller *hose; + struct pnv_phb *phb; + struct resource *res; + int i; + resource_size_t size; + struct pci_dn *pdn; + + if (!pdev->is_physfn || pdev->is_added) + return; + + hose = pci_bus_to_host(pdev->bus); + phb = hose->private_data; + + pdn = pci_get_pdn(pdev); + pdn->max_vfs = 0; + + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { + res = &pdev->resource[i + PCI_IOV_RESOURCES]; + if (!res->flags || res->parent) + continue; + if (!pnv_pci_is_mem_pref_64(res->flags)) { + dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n", +i, res); + continue; + } + + dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res); + size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); + res->end = res->start + size * phb->ioda.total_pe - 1; + dev_dbg(&pdev->dev, " %pR\n", res); + dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)", + i, res, phb->ioda.total_pe); + } + pdn->max_vfs = phb->ioda.total_pe; +} + +static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus) +{ + struct pci_dev *pdev; + struct pci_bus *b; + + list_for_each_entry(pdev, &bus->devices, bus_list) { + b = pdev->subordinate; + + if (b) + pnv_pci_ioda_fixup_sriov(b); + + pnv_pci_ioda_fixup_iov_resources(pdev); + } +} +#endif /* CONFIG_PCI_IOV */ + /* * This function is supposed to be called on basi
[PATCH v12 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
From: Wei Yang Current iommu_table of a PE is a static field. This will have a problem when iommu_free_table() is called. Allocate iommu_table dynamically. Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- arch/powerpc/include/asm/iommu.h |3 +++ arch/powerpc/platforms/powernv/pci-ioda.c | 26 ++ arch/powerpc/platforms/powernv/pci.h |2 +- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 9cfa3706a1b8..5574eeb97634 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -78,6 +78,9 @@ struct iommu_table { struct iommu_group *it_group; #endif void (*set_bypass)(struct iommu_table *tbl, bool enable); +#ifdef CONFIG_PPC_POWERNV + void *data; +#endif }; /* Pure 2^n version of get_order */ diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 58c4fc4ab63c..cd1a56160ded 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -916,6 +916,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all) return; } + pe->tce32_table = kzalloc_node(sizeof(struct iommu_table), + GFP_KERNEL, hose->node); + pe->tce32_table->data = pe; + /* Associate it with all child devices */ pnv_ioda_setup_same_PE(bus, pe); @@ -1005,7 +1009,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev pe = &phb->ioda.pe_array[pdn->pe_number]; WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops); - set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table); + set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table); } static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb, @@ -1032,7 +1036,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb, } else { dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n"); set_dma_ops(&pdev->dev, &dma_iommu_ops); - set_iommu_table_base(&pdev->dev, &pe->tce32_table); + set_iommu_table_base(&pdev->dev, pe->tce32_table); } *pdev->dev.dma_mask = dma_mask; return 0; @@ -1069,9 +1073,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, list_for_each_entry(dev, &bus->devices, bus_list) { if (add_to_iommu_group) set_iommu_table_base_and_group(&dev->dev, - &pe->tce32_table); + pe->tce32_table); else - set_iommu_table_base(&dev->dev, &pe->tce32_table); + set_iommu_table_base(&dev->dev, pe->tce32_table); if (dev->subordinate) pnv_ioda_setup_bus_dma(pe, dev->subordinate, @@ -1161,8 +1165,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe, void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl, __be64 *startp, __be64 *endp, bool rm) { - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, - tce32_table); + struct pnv_ioda_pe *pe = tbl->data; struct pnv_phb *phb = pe->phb; if (phb->type == PNV_PHB_IODA1) @@ -1228,7 +1231,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, } /* Setup linux iommu table */ - tbl = &pe->tce32_table; + tbl = pe->tce32_table; pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs, base << 28, IOMMU_PAGE_SHIFT_4K); @@ -1266,8 +1269,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable) { - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, - tce32_table); + struct pnv_ioda_pe *pe = tbl->data; uint16_t window_id = (pe->pe_number << 1 ) + 1; int64_t rc; @@ -1312,10 +1314,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb, pe->tce_bypass_base = 1ull << 59; /* Install set_bypass callback for VFIO */ - pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass; + pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass; /* Enable bypass by default */ - pnv_pci_ioda2_set_bypass(&pe->tce32_table, true); + pnv_pci_ioda2_set_bypass(pe->tce32_table, true); } static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, @@ -1363,7 +1365,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, } /* Setup linux iommu table */ - tbl = &pe->tce32_table; + tbl = pe->tce32_table;
[PATCH v12 13/21] powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
From: Wei Yang The PCI config accessors previously relied on device_node. Unfortunately, VFs don't have a corresponding device_node, so change the accessors to use pci_dn instead. [bhelgaas: changelog] Signed-off-by: Gavin Shan Signed-off-by: Bjorn Helgaas --- arch/powerpc/platforms/powernv/eeh-powernv.c | 14 + arch/powerpc/platforms/powernv/pci.c | 69 ++ arch/powerpc/platforms/powernv/pci.h |4 +- 3 files changed, 40 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index e261869adc86..7a5021b95a14 100644 --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -430,21 +430,31 @@ static inline bool powernv_eeh_cfg_blocked(struct device_node *dn) static int powernv_eeh_read_config(struct device_node *dn, int where, int size, u32 *val) { + struct pci_dn *pdn = PCI_DN(dn); + + if (!pdn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (powernv_eeh_cfg_blocked(dn)) { *val = 0x; return PCIBIOS_SET_FAILED; } - return pnv_pci_cfg_read(dn, where, size, val); + return pnv_pci_cfg_read(pdn, where, size, val); } static int powernv_eeh_write_config(struct device_node *dn, int where, int size, u32 val) { + struct pci_dn *pdn = PCI_DN(dn); + + if (!pdn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (powernv_eeh_cfg_blocked(dn)) return PCIBIOS_SET_FAILED; - return pnv_pci_cfg_write(dn, where, size, val); + return pnv_pci_cfg_write(pdn, where, size, val); } /** diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index e69142f4af08..6c20d6e70383 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -366,9 +366,9 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no) spin_unlock_irqrestore(&phb->lock, flags); } -static void pnv_pci_config_check_eeh(struct pnv_phb *phb, -struct device_node *dn) +static void pnv_pci_config_check_eeh(struct pci_dn *pdn) { + struct pnv_phb *phb = pdn->phb->private_data; u8 fstate; __be16 pcierr; int pe_no; @@ -379,7 +379,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, * setup that yet. So all ER errors should be mapped to * reserved PE. */ - pe_no = PCI_DN(dn)->pe_number; + pe_no = pdn->pe_number; if (pe_no == IODA_INVALID_PE) { if (phb->type == PNV_PHB_P5IOC2) pe_no = 0; @@ -407,8 +407,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, } cfg_dbg(" -> EEH check, bdfn=%04x PE#%d fstate=%x\n", - (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn), - pe_no, fstate); + (pdn->busno << 8) | (pdn->devfn), pe_no, fstate); /* Clear the frozen state if applicable */ if (fstate == OPAL_EEH_STOPPED_MMIO_FREEZE || @@ -425,10 +424,9 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb, } } -int pnv_pci_cfg_read(struct device_node *dn, +int pnv_pci_cfg_read(struct pci_dn *pdn, int where, int size, u32 *val) { - struct pci_dn *pdn = PCI_DN(dn); struct pnv_phb *phb = pdn->phb->private_data; u32 bdfn = (pdn->busno << 8) | pdn->devfn; s64 rc; @@ -462,10 +460,9 @@ int pnv_pci_cfg_read(struct device_node *dn, return PCIBIOS_SUCCESSFUL; } -int pnv_pci_cfg_write(struct device_node *dn, +int pnv_pci_cfg_write(struct pci_dn *pdn, int where, int size, u32 val) { - struct pci_dn *pdn = PCI_DN(dn); struct pnv_phb *phb = pdn->phb->private_data; u32 bdfn = (pdn->busno << 8) | pdn->devfn; @@ -489,18 +486,17 @@ int pnv_pci_cfg_write(struct device_node *dn, } #if CONFIG_EEH -static bool pnv_pci_cfg_check(struct pci_controller *hose, - struct device_node *dn) +static bool pnv_pci_cfg_check(struct pci_dn *pdn) { struct eeh_dev *edev = NULL; - struct pnv_phb *phb = hose->private_data; + struct pnv_phb *phb = pdn->phb->private_data; /* EEH not enabled ? */ if (!(phb->flags & PNV_PHB_FLAG_EEH)) return true; /* PE reset or device removed ? */ - edev = of_node_to_eeh_dev(dn); + edev = pdn->edev; if (edev) { if (edev->pe && (edev->pe->state & EEH_PE_CFG_BLOCKED)) @@ -513,8 +509,7 @@ static bool pnv_pci_cfg_check(struct pci_controller *hose, return true; } #else -static inline pnv_pci_cfg_check(struct pci_controller *hose, - struct devi
[PATCH v12 12/21] powerpc/pci: Refactor pci_dn
From: Gavin Shan pci_dn is the extension of PCI device node and is created from device node. Unfortunately, VFs are enabled dynamically by PF's driver and they don't have corresponding device nodes, and pci_dn. Refactor pci_dn to support VFs: * pci_dn is organized as a hierarchy tree. VF's pci_dn is put to the child list of pci_dn of PF's bridge. pci_dn of other device put to the child list of pci_dn of its upstream bridge. * VF's pci_dn is expected to be created dynamically when PF enabling VFs. VF's pci_dn will be destroyed when PF disabling VFs. pci_dn of other device is still created from device node as before. * For one particular PCI device (VF or not), its pci_dn can be found from pdev->dev.archdata.firmware_data, PCI_DN(devnode), or parent's list. The fast path (fetching pci_dn through PCI device instance) is populated during early fixup time. [bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()] Signed-off-by: Gavin Shan Signed-off-by: Bjorn Helgaas --- arch/powerpc/include/asm/device.h |3 arch/powerpc/include/asm/pci-bridge.h | 14 +- arch/powerpc/kernel/pci_dn.c | 245 + arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++ 4 files changed, 272 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h index 38faeded7d59..29992cd020bb 100644 --- a/arch/powerpc/include/asm/device.h +++ b/arch/powerpc/include/asm/device.h @@ -34,6 +34,9 @@ struct dev_archdata { #ifdef CONFIG_SWIOTLB dma_addr_t max_direct_dma_addr; #endif +#ifdef CONFIG_PPC64 + void*firmware_data; +#endif #ifdef CONFIG_EEH struct eeh_dev *edev; #endif diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 546d036fe925..513f8f27060d 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -89,6 +89,7 @@ struct pci_controller { #ifdef CONFIG_PPC64 unsigned long buid; + void *firmware_data; #endif /* CONFIG_PPC64 */ void *private_data; @@ -154,9 +155,13 @@ static inline int isa_vaddr_is_ioport(void __iomem *address) struct iommu_table; struct pci_dn { + int flags; +#define PCI_DN_FLAG_IOV_VF 0x01 + int busno; /* pci bus number */ int devfn; /* pci device and function number */ + struct pci_dn *parent; struct pci_controller *phb;/* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ struct device_node *node; /* back-pointer to the device_node */ @@ -171,14 +176,19 @@ struct pci_dn { #ifdef CONFIG_PPC_POWERNV int pe_number; #endif + struct list_head child_list; + struct list_head list; }; /* Get the pointer to a device_node's pci_dn */ #define PCI_DN(dn) ((struct pci_dn *) (dn)->data) +extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus, + int devfn); extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev); - -extern void * update_dn_pci_info(struct device_node *dn, void *data); +extern struct pci_dn *add_dev_pci_info(struct pci_dev *pdev); +extern void remove_dev_pci_info(struct pci_dev *pdev); +extern void *update_dn_pci_info(struct device_node *dn, void *data); static inline int pci_device_from_OF_node(struct device_node *np, u8 *bus, u8 *devfn) diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index 83df3075d3df..f3a1a81d112f 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -32,12 +32,223 @@ #include #include +/* + * The function is used to find the firmware data of one + * specific PCI device, which is attached to the indicated + * PCI bus. For VFs, their firmware data is linked to that + * one of PF's bridge. For other devices, their firmware + * data is linked to that of their bridge. + */ +static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus) +{ + struct pci_bus *pbus; + struct device_node *dn; + struct pci_dn *pdn; + + /* +* We probably have virtual bus which doesn't +* have associated bridge. +*/ + pbus = bus; + while (pbus) { + if (pci_is_root_bus(pbus) || pbus->self) + break; + + pbus = pbus->parent; + } + + /* +* Except virtual bus, all PCI buses should +* have device nodes. +*/ + dn = pci_bus_to_OF_node(pbus); + pdn = dn ? PCI_DN(dn) : NULL; + + return pdn; +} + +struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus, + int devfn) +{ + struct device_node *dn = NULL; + struct pci_dn *parent, *pdn; +
[PATCH v12 11/21] powerpc/pci: Don't unset PCI resources for VFs
From: Wei Yang If we're going to reassign resources with flag PCI_REASSIGN_ALL_RSRC, all resources will be cleaned out during device header fixup time and then get reassigned by PCI core. However, the VF resources won't be reassigned and thus, we shouldn't clean them out. If the pci_dev is a VF, skip the resource unset process. Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- arch/powerpc/kernel/pci-common.c |4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 2a525c938158..82031011522f 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -788,6 +788,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev) pci_name(dev)); return; } + + if (dev->is_virtfn) + return; + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { struct resource *res = dev->resource + i; struct pci_bus_region reg; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 10/21] PCI: Consider additional PF's IOV BAR alignment in sizing and assigning
From: Wei Yang When sizing and assigning resources, we divide the resources into two lists: the requested list and the additional list. We don't consider the alignment of additional VF(n) BAR space. This is reasonable because the alignment required for the VF(n) BAR space is the size of an individual VF BAR, not the size of the space for *all* VFs. But some platforms, e.g., PowerNV, require additional alignment. Consider the additional IOV BAR alignment when sizing and assigning resources. When there is not enough system MMIO space, the PF's IOV BAR alignment will not contribute to the bridge. When there is enough system MMIO space, the additional alignment will contribute to the bridge. Also, take advantage of pci_dev_resource::min_align to store this additional alignment. [bhelgaas: changelog, printk cast] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/setup-bus.c | 83 --- 1 file changed, 70 insertions(+), 13 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index e3e17f3c0f0f..affbceae560f 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head, } } -static resource_size_t get_res_add_size(struct list_head *head, - struct resource *res) +static struct pci_dev_resource *res_to_dev_res(struct list_head *head, + struct resource *res) { struct pci_dev_resource *dev_res; @@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head, int idx = res - &dev_res->dev->resource[0]; dev_printk(KERN_DEBUG, &dev_res->dev->dev, -"res[%d]=%pR get_res_add_size add_size %llx\n", +"res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n", idx, dev_res->res, -(unsigned long long)dev_res->add_size); +(unsigned long long)dev_res->add_size, +(unsigned long long)dev_res->min_align); - return dev_res->add_size; + return dev_res; } } - return 0; + return NULL; +} + +static resource_size_t get_res_add_size(struct list_head *head, + struct resource *res) +{ + struct pci_dev_resource *dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? dev_res->add_size : 0; +} + +static resource_size_t get_res_add_align(struct list_head *head, +struct resource *res) +{ + struct pci_dev_resource *dev_res; + + dev_res = res_to_dev_res(head, res); + return dev_res ? dev_res->min_align : 0; } + /* Sort resources by alignment */ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head) { @@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head, LIST_HEAD(save_head); LIST_HEAD(local_fail_head); struct pci_dev_resource *save_res; - struct pci_dev_resource *dev_res, *tmp_res; + struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; unsigned long fail_type; + resource_size_t add_align, align; /* Check if optional add_size is there */ if (!realloc_head || list_empty(realloc_head)) @@ -384,10 +405,38 @@ static void __assign_resources_sorted(struct list_head *head, } /* Update res in head list with add_size in realloc_head list */ - list_for_each_entry(dev_res, head, list) + list_for_each_entry_safe(dev_res, tmp_res, head, list) { dev_res->res->end += get_res_add_size(realloc_head, dev_res->res); + /* +* There are two kinds of additional resources in the list: +* 1. bridge resource -- IORESOURCE_STARTALIGN +* 2. SR-IOV resource -- IORESOURCE_SIZEALIGN +* Here just fix the additional alignment for bridge +*/ + if (!(dev_res->res->flags & IORESOURCE_STARTALIGN)) + continue; + + add_align = get_res_add_align(realloc_head, dev_res->res); + + /* Reorder the list by their alignment */ + if (add_align > dev_res->res->start) { + dev_res->res->start = add_align; + dev_res->res->end = add_align + + resource_size(dev_res->res); + + list_for_each_entry(dev_res2, head, list) { + align = pci_resource_alignment(dev_res2->dev, +
[PATCH v12 09/21] PCI: Add pcibios_iov_resource_alignment() interface
From: Wei Yang Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV BAR is the size of an individual VF BAR, and the size consumed is the individual VF BAR size times NumVFs. The PowerNV platform has additional alignment requirements to help support its Partitionable Endpoint device isolation feature (see Documentation/powerpc/pci_iov_resource_on_powernv.txt). Add a pcibios_iov_resource_alignment() interface to allow platforms to request additional alignment. [bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(), drop "align" parameter] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c |8 +++- include/linux/pci.h |1 + 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index cc6fedf4a1b9..bde0f02cae32 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -569,6 +569,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) 4 * (resno - PCI_IOV_RESOURCES); } +resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev, + int resno) +{ + return pci_iov_resource_size(dev, resno); +} + /** * pci_sriov_resource_alignment - get resource alignment for VF BAR * @dev: the PCI device @@ -581,7 +587,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) */ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno) { - return pci_iov_resource_size(dev, resno); + return pcibios_iov_resource_alignment(dev, resno); } /** diff --git a/include/linux/pci.h b/include/linux/pci.h index 99ea94835fb6..4e1f17db1a81 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1174,6 +1174,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus); void pci_setup_bridge(struct pci_bus *bus); resource_size_t pcibios_window_alignment(struct pci_bus *bus, unsigned long type); +resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno); #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0) #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 08/21] PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()
From: Wei Yang VFs are dynamically created when a driver enables them. On some platforms, like PowerNV, special resources are necessary to enable VFs. Add platform hooks for enabling and disabling VFs. Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 5643a1011e23..cc6fedf4a1b9 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -220,6 +220,11 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) pci_dev_put(dev); } +int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 vf_num) +{ + return 0; +} + static int sriov_enable(struct pci_dev *dev, int nr_virtfn) { int rc; @@ -231,6 +236,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) struct pci_sriov *iov = dev->sriov; int bars = 0; int bus; + int retval; if (!nr_virtfn) return 0; @@ -307,6 +313,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) if (nr_virtfn < initial) initial = nr_virtfn; + if ((retval = pcibios_sriov_enable(dev, initial))) { + dev_err(&dev->dev, "failure %d from pcibios_sriov_enable()\n", + retval); + return retval; + } + for (i = 0; i < initial; i++) { rc = virtfn_add(dev, i, 0); if (rc) @@ -335,6 +347,11 @@ failed: return rc; } +int __weak pcibios_sriov_disable(struct pci_dev *pdev) +{ + return 0; +} + static void sriov_disable(struct pci_dev *dev) { int i; @@ -346,6 +363,8 @@ static void sriov_disable(struct pci_dev *dev) for (i = 0; i < iov->num_VFs; i++) virtfn_remove(dev, i, 0); + pcibios_sriov_disable(dev); + iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE); pci_cfg_access_lock(dev); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 07/21] PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()
From: Wei Yang On PowerNV, some resource reservation is needed for SR-IOV VFs that don't exist at the bootup stage. To do the match between resources and VFs, the code need to get the VF's BDF in advance. Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and pci_iov_virtfn_devfn() and export them. [bhelgaas: changelog, make "busnr" int] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c | 28 include/linux/pci.h | 11 +++ 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 2ae921f84bd3..5643a1011e23 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -19,16 +19,20 @@ #define VIRTFN_ID_LEN 16 -static inline u8 virtfn_bus(struct pci_dev *dev, int id) +int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id) { + if (!dev->is_physfn) + return -EINVAL; return dev->bus->number + ((dev->devfn + dev->sriov->offset + - dev->sriov->stride * id) >> 8); + dev->sriov->stride * vf_id) >> 8); } -static inline u8 virtfn_devfn(struct pci_dev *dev, int id) +int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id) { + if (!dev->is_physfn) + return -EINVAL; return (dev->devfn + dev->sriov->offset + - dev->sriov->stride * id) & 0xff; + dev->sriov->stride * vf_id) & 0xff; } /* @@ -58,11 +62,11 @@ static inline u8 virtfn_max_buses(struct pci_dev *dev) struct pci_sriov *iov = dev->sriov; int nr_virtfn; u8 max = 0; - u8 busnr; + int busnr; for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) { pci_iov_set_numvfs(dev, nr_virtfn); - busnr = virtfn_bus(dev, nr_virtfn - 1); + busnr = pci_iov_virtfn_bus(dev, nr_virtfn - 1); if (busnr > max) max = busnr; } @@ -116,7 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) struct pci_bus *bus; mutex_lock(&iov->dev->sriov->lock); - bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id)); + bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id)); if (!bus) goto failed; @@ -124,7 +128,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) if (!virtfn) goto failed0; - virtfn->devfn = virtfn_devfn(dev, id); + virtfn->devfn = pci_iov_virtfn_devfn(dev, id); virtfn->vendor = dev->vendor; pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device); pci_setup_device(virtfn); @@ -186,8 +190,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset) struct pci_sriov *iov = dev->sriov; virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus), -virtfn_bus(dev, id), -virtfn_devfn(dev, id)); +pci_iov_virtfn_bus(dev, id), +pci_iov_virtfn_devfn(dev, id)); if (!virtfn) return; @@ -226,7 +230,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) struct pci_dev *pdev; struct pci_sriov *iov = dev->sriov; int bars = 0; - u8 bus; + int bus; if (!nr_virtfn) return 0; @@ -263,7 +267,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) iov->offset = offset; iov->stride = stride; - bus = virtfn_bus(dev, nr_virtfn - 1); + bus = pci_iov_virtfn_bus(dev, nr_virtfn - 1); if (bus > dev->bus->busn_res.end) { dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n", nr_virtfn, bus, &dev->bus->busn_res); diff --git a/include/linux/pci.h b/include/linux/pci.h index 15596582e575..99ea94835fb6 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1669,6 +1669,9 @@ int pci_ext_cfg_avail(void); void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar); #ifdef CONFIG_PCI_IOV +int pci_iov_virtfn_bus(struct pci_dev *dev, int id); +int pci_iov_virtfn_devfn(struct pci_dev *dev, int id); + int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); void pci_disable_sriov(struct pci_dev *dev); int pci_num_vf(struct pci_dev *dev); @@ -1677,6 +1680,14 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs); int pci_sriov_get_totalvfs(struct pci_dev *dev); resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno); #else +static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id) +{ + return -ENOSYS; +} +static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id) +{ + return -ENOSYS; +} static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virt
[PATCH v12 06/21] PCI: Calculate maximum number of buses required for VFs
From: Wei Yang An SR-IOV device can change its First VF Offset and VF Stride based on the values of ARI Capable Hierarchy and NumVFs. The number of buses required for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see SR-IOV spec r1.1, sec 2.1.2). Previously pci_iov_bus_range() computed how many buses would be required by TotalVFs, but this was based on a single NumVFs value and may not have been the maximum for all NumVFs configurations. Iterate over all valid NumVFs and calculate the maximum number of bus numbers that could ever be required for VFs of this device. [bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove kerenl-doc comment marker] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c | 31 +++ drivers/pci/pci.h |1 + 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index a8752c2c2b53..2ae921f84bd3 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -46,6 +46,30 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride); } +/* + * The PF consumes one bus number. NumVFs, First VF Offset, and VF Stride + * determine how many additional bus numbers will be consumed by VFs. + * + * Iterate over all valid NumVFs and calculate the maximum number of bus + * numbers that could ever be required. + */ +static inline u8 virtfn_max_buses(struct pci_dev *dev) +{ + struct pci_sriov *iov = dev->sriov; + int nr_virtfn; + u8 max = 0; + u8 busnr; + + for (nr_virtfn = 1; nr_virtfn <= iov->total_VFs; nr_virtfn++) { + pci_iov_set_numvfs(dev, nr_virtfn); + busnr = virtfn_bus(dev, nr_virtfn - 1); + if (busnr > max) + max = busnr; + } + + return max; +} + static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr) { struct pci_bus *child; @@ -427,6 +451,7 @@ found: dev->sriov = iov; dev->is_physfn = 1; + iov->max_VF_buses = virtfn_max_buses(dev); return 0; @@ -556,15 +581,13 @@ void pci_restore_iov_state(struct pci_dev *dev) int pci_iov_bus_range(struct pci_bus *bus) { int max = 0; - u8 busnr; struct pci_dev *dev; list_for_each_entry(dev, &bus->devices, bus_list) { if (!dev->is_physfn) continue; - busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1); - if (busnr > max) - max = busnr; + if (dev->sriov->max_VF_buses > max) + max = dev->sriov->max_VF_buses; } return max ? max - bus->number : 0; diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 57329645dd01..bae593c04541 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -243,6 +243,7 @@ struct pci_sriov { u16 stride; /* following VF stride */ u32 pgsz; /* page size for BAR alignment */ u8 link;/* Function Dependency Link */ + u8 max_VF_buses;/* max buses consumed by VFs */ u16 driver_max_VFs; /* max num VFs driver supports */ struct pci_dev *dev;/* lowest numbered PF */ struct pci_dev *self; /* this PF */ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 05/21] PCI: Refresh First VF Offset and VF Stride when updating NumVFs
From: Wei Yang The First VF Offset and VF Stride fields depend on the NumVFs setting, so refresh the cached fields in struct pci_sriov when updating NumVFs. See the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10. [bhelgaas: changelog, remove kernel-doc comment marker] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c | 23 +++ 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 27b98c361823..a8752c2c2b53 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -31,6 +31,21 @@ static inline u8 virtfn_devfn(struct pci_dev *dev, int id) dev->sriov->stride * id) & 0xff; } +/* + * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may + * change when NumVFs changes. + * + * Update iov->offset and iov->stride when NumVFs is written. + */ +static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) +{ + struct pci_sriov *iov = dev->sriov; + + pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn); + pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset); + pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride); +} + static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr) { struct pci_bus *child; @@ -253,7 +268,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) return rc; } - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn); + pci_iov_set_numvfs(dev, nr_virtfn); iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE; pci_cfg_access_lock(dev); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); @@ -282,7 +297,7 @@ failed: iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE); pci_cfg_access_lock(dev); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0); + pci_iov_set_numvfs(dev, 0); ssleep(1); pci_cfg_access_unlock(dev); @@ -313,7 +328,7 @@ static void sriov_disable(struct pci_dev *dev) sysfs_remove_link(&dev->dev.kobj, "dep_link"); iov->num_VFs = 0; - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, 0); + pci_iov_set_numvfs(dev, 0); } static int sriov_init(struct pci_dev *dev, int pos) @@ -452,7 +467,7 @@ static void sriov_restore_state(struct pci_dev *dev) pci_update_resource(dev, i); pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz); - pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs); + pci_iov_set_numvfs(dev, iov->num_VFs); pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl); if (iov->ctrl & PCI_SRIOV_CTRL_VFE) msleep(100); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 04/21] PCI: Index IOV resources in the conventional style
Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i". Use that style in iov.c also. No functional change. Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 5bca0e1a2799..27b98c361823 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -95,7 +95,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) virtfn->multifunction = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; if (!res->parent) continue; virtfn->resource[i].name = pci_name(virtfn); @@ -212,7 +212,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) nres = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { bars |= (1 << (i + PCI_IOV_RESOURCES)); - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; if (res->parent) nres++; } @@ -373,7 +373,7 @@ found: nres = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; bar64 = __pci_read_base(dev, pci_bar_unknown, res, pos + PCI_SRIOV_BAR + i * 4); if (!res->flags) @@ -417,7 +417,7 @@ found: failed: for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { - res = dev->resource + PCI_IOV_RESOURCES + i; + res = &dev->resource[i + PCI_IOV_RESOURCES]; res->flags = 0; } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 03/21] PCI: Keep individual VF BAR size in struct pci_sriov
From: Wei Yang Currently we don't store the individual VF BAR size. We calculate it when needed by dividing the PF's IOV resource size (which contains space for *all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability again. Keep the individual VF BAR size in struct pci_sriov.barsz[], add pci_iov_resource_size() to retrieve it, and use that instead of doing the division or reading the SR-IOV capability BAR. [bhelgaas: rename to "barsz[]", simplify barsz[] index computation, remove SR-IOV capability BAR sizing] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c | 39 --- drivers/pci/pci.h |1 + include/linux/pci.h |3 +++ 3 files changed, 24 insertions(+), 19 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 05f9d97e4175..5bca0e1a2799 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -57,6 +57,14 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus) pci_remove_bus(virtbus); } +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno) +{ + if (!dev->is_physfn) + return 0; + + return dev->sriov->barsz[resno - PCI_IOV_RESOURCES]; +} + static int virtfn_add(struct pci_dev *dev, int id, int reset) { int i; @@ -92,8 +100,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) continue; virtfn->resource[i].name = pci_name(virtfn); virtfn->resource[i].flags = res->flags; - size = resource_size(res); - do_div(size, iov->total_VFs); + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); virtfn->resource[i].start = res->start + size * id; virtfn->resource[i].end = virtfn->resource[i].start + size - 1; rc = request_resource(res, &virtfn->resource[i]); @@ -311,7 +318,7 @@ static void sriov_disable(struct pci_dev *dev) static int sriov_init(struct pci_dev *dev, int pos) { - int i; + int i, bar64; int rc; int nres; u32 pgsz; @@ -360,29 +367,29 @@ found: pgsz &= ~(pgsz - 1); pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz); + iov = kzalloc(sizeof(*iov), GFP_KERNEL); + if (!iov) + return -ENOMEM; + nres = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = dev->resource + PCI_IOV_RESOURCES + i; - i += __pci_read_base(dev, pci_bar_unknown, res, -pos + PCI_SRIOV_BAR + i * 4); + bar64 = __pci_read_base(dev, pci_bar_unknown, res, + pos + PCI_SRIOV_BAR + i * 4); if (!res->flags) continue; if (resource_size(res) & (PAGE_SIZE - 1)) { rc = -EIO; goto failed; } + iov->barsz[i] = resource_size(res); res->end = res->start + resource_size(res) * total - 1; dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n", i, res, i, total); + i += bar64; nres++; } - iov = kzalloc(sizeof(*iov), GFP_KERNEL); - if (!iov) { - rc = -ENOMEM; - goto failed; - } - iov->pos = pos; iov->nres = nres; iov->ctrl = ctrl; @@ -414,6 +421,7 @@ failed: res->flags = 0; } + kfree(iov); return rc; } @@ -510,14 +518,7 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno) */ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno) { - struct resource tmp; - int reg = pci_iov_resource_bar(dev, resno); - - if (!reg) - return 0; - -__pci_read_base(dev, pci_bar_unknown, &tmp, reg); - return resource_alignment(&tmp); + return pci_iov_resource_size(dev, resno); } /** diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 4091f82239cd..57329645dd01 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -247,6 +247,7 @@ struct pci_sriov { struct pci_dev *dev;/* lowest numbered PF */ struct pci_dev *self; /* this PF */ struct mutex lock; /* lock for VF bus */ + resource_size_t barsz[PCI_SRIOV_NUM_BARS]; /* VF BAR size */ }; #ifdef CONFIG_PCI_ATS diff --git a/include/linux/pci.h b/include/linux/pci.h index 211e9da8a7d7..15596582e575 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1675,6 +1675,7 @@ int pci_num_vf(struct pci_dev *dev); int pci_vfs_assigned(struct pci_dev *dev); int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs); int pci_sriov_get_totalvfs(struct pci_dev *dev); +resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno); #e
[PATCH v12 02/21] PCI: Print PF SR-IOV resource that contains all VF(n) BAR space
From: Wei Yang When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we learn the alignment requirement and amount of space consumed by a single VF. But when VFs are enabled, *each* of the NumVFs consumes that amount of space, so the total size of the PF resource is "VF BAR size * NumVFs". Add a printk of the total space consumed by the VFs corresponding to what we already do for normal non-IOV BARs. No functional change; new message only. [bhelgaas: split out into its own patch] Signed-off-by: Wei Yang Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index c4c33ead03bc..05f9d97e4175 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -372,6 +372,8 @@ found: goto failed; } res->end = res->start + resource_size(res) * total - 1; + dev_info(&dev->dev, "VF(n) BAR%d space: %pR (contains BAR%d for %d VFs)\n", +i, res, i, total); nres++; } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 01/21] PCI: Print more info in sriov_enable() error message
If we don't have space for all the bus numbers required to enable VFs, print the largest bus number required and the range available. No functional change; improved error message only. Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c |7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 4b3a4eaad996..c4c33ead03bc 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -180,6 +180,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) struct pci_dev *pdev; struct pci_sriov *iov = dev->sriov; int bars = 0; + u8 bus; if (!nr_virtfn) return 0; @@ -216,8 +217,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) iov->offset = offset; iov->stride = stride; - if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) { - dev_err(&dev->dev, "SR-IOV: bus number out of range\n"); + bus = virtfn_bus(dev, nr_virtfn - 1); + if (bus > dev->bus->busn_res.end) { + dev_err(&dev->dev, "can't enable %d VFs (bus %02x out of range of %pR)\n", + nr_virtfn, bus, &dev->bus->busn_res); return -ENOMEM; } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v12 00/21] Enable SRIOV on Power8
Wei Yang's most recent POWER8 SR-IOV patchset was v11, posted on Jan 15, 2015. I'm having a hard time keeping everything straight between the tweaks I've made on my branch and incremental updates. I think it's easier to repost the whole series so one can easily collect everything that goes together. So here'a a v12 with the changes I've made. Wei, please follow up with a v13 to fix anything I broke here. Here's how I would do that using stgit: git checkout -b pci/virtualization-v13 pci/virtualization-v12 stg init stg uncommit -n 21 stg mail -v v13 ... pci-print-more-info-in..powerpc-pci-add-pci-resource I put v10, v11, and v12 on branches based on v4.0-rc1: pci/virtualization-v10(posted 12/22/2014) pci/virtualization-v11(posted 01/15/2015) pci/virtualization-v12(this posting) This makes it relatively easy to diff the versions, e.g., git diff pci/virtualization-v11 pci/virtualization-v12 These branches are at https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/ v12: * remove "align" parameter from pcibios_iov_resource_alignment() default version returns pci_iov_resource_size() instead of the "align" parameter * in powerpc pcibios_iov_resource_alignment(), return pci_iov_resource_size() if there's no ppc_md function pointer * in pci_sriov_resource_alignment(), don't re-read base, since we saved the required alignment when reading it the first time * remove "vf_num" parameter from add_dev_pci_info() and remove_dev_pci_info(); use pci_sriov_get_totalvfs() instead * use dev_warn() instead of pr_warn() when possible * check to be sure IOV BAR is still in range after shifting, change pnv_pci_vf_resource_shift() from void to int * improve sriov_enable() error message * improve SR-IOV BAR sizing message * index IOV resources in conventional style * include preamble patches (refresh offset/stride when updating numVFs, calculate max buses required * restructure pci_iov_max_bus_range() to return value instead of updating internally, rename to virtfn_max_buses() * fix typos & formatting * expand documentation Bjorn --- Bjorn Helgaas (2): PCI: Print more info in sriov_enable() error message PCI: Index IOV resources in the conventional style Gavin Shan (1): powerpc/pci: Refactor pci_dn Wei Yang (18): PCI: Print PF SR-IOV resource that contains all VF(n) BAR space PCI: Keep individual VF BAR size in struct pci_sriov PCI: Refresh First VF Offset and VF Stride when updating NumVFs PCI: Calculate maximum number of buses required for VFs PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn() PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable() PCI: Add pcibios_iov_resource_alignment() interface PCI: Consider additional PF's IOV BAR alignment in sizing and assigning powerpc/pci: Don't unset PCI resources for VFs powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv powerpc/powernv: Shift VF resource with an offset powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported powerpc/powernv: Group VF PE when IOV BAR is big on PHB3 powerpc/pci: Remove unused struct pci_dn.pcidev field powerpc/pci: Add PCI resource alignment documentation .../powerpc/pci_iov_resource_on_powernv.txt| 305 arch/powerpc/include/asm/device.h |3 arch/powerpc/include/asm/iommu.h |3 arch/powerpc/include/asm/machdep.h |5 arch/powerpc/include/asm/pci-bridge.h | 24 + arch/powerpc/kernel/pci-common.c | 19 arch/powerpc/kernel/pci_dn.c | 256 ++- arch/powerpc/platforms/powernv/eeh-powernv.c | 14 arch/powerpc/platforms/powernv/pci-ioda.c | 777 +++- arch/powerpc/platforms/powernv/pci.c | 87 +- arch/powerpc/platforms/powernv/pci.h | 13 drivers/pci/iov.c | 155 +++- drivers/pci/pci.h |2 drivers/pci/setup-bus.c| 83 ++ include/linux/pci.h| 15 15 files changed, 1622 insertions(+), 139 deletions(-) create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V11 08/17] powrepc/pci: Refactor pci_dn
On Tue, 2015-02-24 at 02:13 -0600, Bjorn Helgaas wrote: > > Ah, yes, now I see the problem. I don't really like having to export > pci_iov_virtfn_bus() and pci_iov_virtfn_devfn(), but it's probably not > worth the hassle of changing it, and I think adding more pcibios > interfaces > would be even worse. Aren't we going to eventually turn them all into host bridge ops ? :-) Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V11 08/17] powrepc/pci: Refactor pci_dn
On Mon, Feb 23, 2015 at 11:13:49AM +1100, Gavin Shan wrote: > On Fri, Feb 20, 2015 at 05:19:17PM -0600, Bjorn Helgaas wrote: > >On Thu, Jan 15, 2015 at 10:27:58AM +0800, Wei Yang wrote: > >> From: Gavin Shan > >> > >> pci_dn is the extension of PCI device node and it's created from > >> device node. Unfortunately, VFs that are enabled dynamically by > >> PF's driver and they don't have corresponding device nodes, and > >> pci_dn. The patch refactors pci_dn to support VFs: > >> > >>* pci_dn is organized as a hierarchy tree. VF's pci_dn is put > >> to the child list of pci_dn of PF's bridge. pci_dn of other > >> device put to the child list of pci_dn of its upstream bridge. > >> > >>* VF's pci_dn is expected to be created dynamically when PF > >> enabling VFs. VF's pci_dn will be destroyed when PF disabling > >> VFs. pci_dn of other device is still created from device node > >> as before. > >> > >>* For one particular PCI device (VF or not), its pci_dn can be > >> found from pdev->dev.archdata.firmware_data, PCI_DN(devnode), > >> or parent's list. The fast path (fetching pci_dn through PCI > >> device instance) is populated during early fixup time. > >> > >> Signed-off-by: Gavin Shan > >> --- > >> arch/powerpc/include/asm/device.h |3 + > >> arch/powerpc/include/asm/pci-bridge.h | 14 +- > >> arch/powerpc/kernel/pci_dn.c | 242 > >> - > >> arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++ > >> 4 files changed, 270 insertions(+), 5 deletions(-) > >> ... > > > >> +#ifdef CONFIG_PCI_IOV > >> +static struct pci_dn *add_one_dev_pci_info(struct pci_dn *parent, > >> + struct pci_dev *pdev, > >> + int busno, int devfn) > >> +{ > >> + struct pci_dn *pdn; > >> + > >> + /* Except PHB, we always have parent firmware data */ > >> + if (!parent) > >> + return NULL; > >> + > >> + pdn = kzalloc(sizeof(*pdn), GFP_KERNEL); > >> + if (!pdn) { > >> + pr_warn("%s: Out of memory !\n", __func__); > >> + return NULL; > >> + } > >> + > >> + pdn->phb = parent->phb; > >> + pdn->parent = parent; > >> + pdn->busno = busno; > >> + pdn->devfn = devfn; > >> +#ifdef CONFIG_PPC_POWERNV > >> + pdn->pe_number = IODA_INVALID_PE; > >> +#endif > >> + INIT_LIST_HEAD(&pdn->child_list); > >> + INIT_LIST_HEAD(&pdn->list); > >> + list_add_tail(&pdn->list, &parent->child_list); > >> + > >> + /* > >> + * If we already have PCI device instance, lets > >> + * bind them. > >> + */ > >> + if (pdev) > >> + pdev->dev.archdata.firmware_data = pdn; > >> + > >> + return pdn; > > > >I'd like to see this done in pcibios_add_device(), as I mentioned in > >response to "[PATCH V11 01/17] PCI/IOV: Export interface for retrieve VF's > >BDF". Maybe that's not feasible for some reason, but it would be a nicer > >design if it's possible. > > > >The remove_dev_pci_info() work would be done in pcibios_release_device() > >then, of course. > > > > Yes, it's not feasible. PCI config accessors rely on VF's pci_dn. Before > calling pcibios_add_device(), we need access VF's config space. That means > we need VF's pci_dn before pci_setup_device() as follows: > > sriov_enable() > pcibios_sriov_enable(); /* Currently, VF's pci_dn is created at > this point */ > virtfn_add(); > virtfn_add_bus(); /* Create virtual bus if necessary */ > /* ---> A */ > pci_alloc_dev();/* ---> B */ > pci_setup_device(vf); /* Access VF's config space */ > pci_read_config_byte(vf, PCI_HEADER_TYPE); > pci_read_config_dword(vf, PCI_CLASS_REVISION); > pci_fixup_device(pci_fixup_early, vf); > pci_read_irq(); > pci_read_bases(); > pci_device_add(vf); > device_initialize(&vf->dev); > pci_fixup_device(pci_fixup_header, vf); > pci_init_capabilities(vf); > pcibios_add_device(vf); > > We have couple of options here: > > 1) Keep current code. VF's pci_dn is going to be destroyed in >pcibios_sriov_disable() as we're doing currently. > 2) Introduce pcibios_iov_virtfn_add() (at A) for platform to override. >VF's pci_dn is going to be destroyed in pcibios_release_device(). > 3) Introduce pcibios_alloc_dev() (at B) for platform to override. The >VF's pci_dn is going to be destroyed in pcibios_release_device(). Ah, yes, now I see the problem. I don't really like having to export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn(), but it's probably not worth the hassle of changing it, and I think adding more pcibios interfaces would be even worse. So let's leave it as-is for now. > >> +} > >> +#endif // CONFIG_PCI_IOV > >> + > >> +struct pci_dn *add_dev_pci_info(struct pci_dev *pdev, u16 vf_