Re: RFC: Reducing the number of non volatile GPRs in the ppc64 kernel
I agree with Segher. We already know we have opportunities to do a better job
with shrink-wrapping (pushing this kind of useless activity down past early
exits), so having examples of code to look at to improve this would be useful.

-- Bill

Bill Schmidt, Ph.D.
Linux on Power Toolchain
IBM Linux Technology Center
wschm...@us.ibm.com (507) 319-6873

From: Segher Boessenkool
To: Anton Blanchard
Cc: linuxppc-dev@lists.ozlabs.org, Michael Gschwind/Watson/IBM@IBMUS, Alan Modra, Bill Schmidt/Rochester/IBM@IBMUS, Ulrich Weigand, pau...@samba.org
Date: 08/05/2015 06:20 AM
Subject: Re: RFC: Reducing the number of non volatile GPRs in the ppc64 kernel

Hi Anton,

On Wed, Aug 05, 2015 at 02:03:00PM +1000, Anton Blanchard wrote:
> While looking at traces of kernel workloads, I noticed places where gcc
> used a large number of non volatiles. Some of these functions
> did very little work, and we spent most of our time saving the
> non volatiles to the stack and reading them back.

That is something that should be fixed in GCC -- do you have an example
of such a function?

> It made me wonder if we have the right ratio of volatile to non
> volatile GPRs. Since the kernel is completely self contained, we could
> potentially change that ratio.
>
> Attached is a quick hack to gcc and the kernel to decrease the number
> of non volatile GPRs to 8. I'm not sure if this is a good idea (and if
> the volatile to non volatile ratio is right), but this gives us
> something to play with.

Instead of the GCC hack you can add a bunch of -fcall-used-r14 etc.
options; does that not work for you?


Segher

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode
On Fri, Aug 07, 2015 at 03:54:48PM +1000, Gavin Shan wrote: >On Fri, Aug 07, 2015 at 01:44:33PM +0800, Wei Yang wrote: >>On Fri, Aug 07, 2015 at 01:43:01PM +1000, Gavin Shan wrote: >>>On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote: On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote: >On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote: >>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote: >>>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote: When M64 BAR is set to Single PE mode, the PE# assigned to VF could be discrete. This patch restructures the patch to allocate discrete PE# for VFs when M64 BAR is set to Single PE mode. Signed-off-by: Wei Yang --- arch/powerpc/include/asm/pci-bridge.h |2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 69 + 2 files changed, 51 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 8aeba4c..72415c7 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -213,7 +213,7 @@ struct pci_dn { #ifdef CONFIG_PCI_IOV u16 vfs_expanded; /* number of VFs IOV BAR expanded */ u16 num_vfs;/* number of VFs enabled*/ - int offset; /* PE# for the first VF PE */ + int *offset;/* PE# for the first VF PE or array */ boolm64_single_mode;/* Use M64 BAR in Single Mode */ #define IODA_INVALID_M64(-1) int (*m64_map)[PCI_SRIOV_NUM_BARS]; >>> >>>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to >>>the comments >>>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. >>>Though not >>>all of them will be used, not too much memory will be wasted. >>> >> >>Thanks for your comment. >> >>I have thought about change the name to make it more self explain. While >>another fact I want to take in is this field is also used to be reflect >>the >>shift offset when M64 BAR is used in the Shared Mode. So I maintain the >>name. 
>> >>How about use "enum", one maintain the name "offset", and another one >>rename to >>"pe_num_map". And use the meaningful name at proper place? >> So I suppose you agree with my naming proposal. >>> >>>No, I dislike the "enum" things. >>> >> >>OK, then you suggest to rename it pe_num_map or keep it as offset? >> > >pe_num_map would be better. > > >Ok. I'm explaining it with more details. There are two cases: single vs >shared >mode. When PHB M64 BARs run in single mode, you need an array to track the >allocated discrete PE#. The VF_index is the index to the array. When PHB >M64 >BARs run in shared mode, you need continuous PE#. No array required for >this >case. Instead, the starting PE# should be stored to somewhere, which can >be pdn->offset[0] simply. > >So when allocating memory for this array, you just simply allocate >(sizeof(*pdn->offset) >*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is >nobody >can enable (max_vf_num + 1) VFs. The max_vf_num is 15? >>> >>>I don't understand why you said: the max_vf_num is 15. Since max_vf_num is >>>variable >>>on different PFs, how can it be fixed value - 15 ? >>> >> >>In Shared PE case, only one int to indicate the start PE# is fine. >>In Single PE mode, we totally could enable 15 VF, the same number of PEs for >>each VF, which is limited by the number M64 BARs we have in the system. >> >>If not, the number you expected is total_vfs? >> > >then it should be min(total_vfs, phb->ioda.m64_bar_idx), isn't it? > > >With above way, the arrays for PE# and M64 BAR remapping needn't be >allocated >when enabling SRIOV capability and releasing on disabling SRIOV capability. >Instead, those two arrays can be allocated during resource fixup time and >free'ed >when destroying the pdn. > My same point of view like previous, if the memory is not in the concern, how about define them static? >>> >>>It's a bad idea from my review. How many entries this array is going to have? >>>256 * NUM_OF_MAX_VF_BARS ? 
>>> >> >>No. >> >>It has 15 * 6, 15 VFs we could enable at most and 6 VF BARs a VF could have at >>most. >> > >It's min(total_vfs, phb->ioda.m64_bar_idx) VFs that can be enabled at maximal >degree, no? > Yes, you are right. The number 15 is the one I used when the field is static. If we want to allocate it dynamically, we need to choose the smaller one. While I suggest
[PATCH 5/5] powerpc/mm: Drop CONFIG_PPC_HAS_HASH_64K
The relation between CONFIG_PPC_HAS_HASH_64K and CONFIG_PPC_64K_PAGES is
painfully complicated.

But if we rearrange it enough we can see that PPC_HAS_HASH_64K essentially
depends on PPC_STD_MMU_64 && PPC_64K_PAGES.

We can then notice that PPC_HAS_HASH_64K is used in files that are only
built for PPC_STD_MMU_64, meaning it's equivalent to PPC_64K_PAGES.

So replace all uses and drop it.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/Kconfig            | 6 --
 arch/powerpc/mm/hash_low_64.S   | 4 ++--
 arch/powerpc/mm/hash_utils_64.c | 12 ++--
 3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3a4ba2809201..1e69dee29be3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -514,11 +514,6 @@ config NODES_SPAN_OTHER_NODES
 	def_bool y
 	depends on NEED_MULTIPLE_NODES
 
-config PPC_HAS_HASH_64K
-	bool
-	depends on PPC64
-	default n
-
 config STDBINUTILS
 	bool "Using standard binutils settings"
 	depends on 44x
@@ -566,7 +561,6 @@ config PPC_16K_PAGES
 config PPC_64K_PAGES
 	bool "64k page size"
 	depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
-	select PPC_HAS_HASH_64K if PPC_STD_MMU_64
 
 config PPC_256K_PAGES
 	bool "256k page size"
diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
index 463174a4a647..3b49e3295901 100644
--- a/arch/powerpc/mm/hash_low_64.S
+++ b/arch/powerpc/mm/hash_low_64.S
@@ -701,7 +701,7 @@ htab_pte_insert_failure:
 #endif /* CONFIG_PPC_64K_PAGES */
 
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
 /*
  *
  *
@@ -993,7 +993,7 @@ ht64_pte_insert_failure:
 	b	ht64_bail
 
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
 
 /*
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5ec987f65b2c..aee70171355b 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -640,7 +640,7 @@ extern u32 ht64_call_hpte_updatepp[];
 
 static void __init htab_finish_init(void)
 {
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
 	patch_branch(ht64_call_hpte_insert1,
 		ppc_function_entry(ppc_md.hpte_insert),
 		BRANCH_SET_LINK);
@@ -653,7 +653,7 @@ static void __init htab_finish_init(void)
 	patch_branch(ht64_call_hpte_updatepp,
 		ppc_function_entry(ppc_md.hpte_updatepp),
 		BRANCH_SET_LINK);
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
 
 	patch_branch(htab_call_hpte_insert1,
 		ppc_function_entry(ppc_md.hpte_insert),
@@ -1151,12 +1151,12 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea,
 		check_paca_psize(ea, mm, psize, user_region);
 #endif /* CONFIG_PPC_64K_PAGES */
 
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
 	if (psize == MMU_PAGE_64K)
 		rc = __hash_page_64K(ea, access, vsid, ptep, trap,
 				     flags, ssize);
 	else
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
 	{
 		int spp = subpage_protection(mm, ea);
 		if (access & spp)
@@ -1264,12 +1264,12 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
 		update_flags |= HPTE_LOCAL_UPDATE;
 
 	/* Hash it in */
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
 	if (mm->context.user_psize == MMU_PAGE_64K)
 		rc = __hash_page_64K(ea, access, vsid, ptep, trap,
 				     update_flags, ssize);
 	else
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
 		rc = __hash_page_4K(ea, access, vsid, ptep, trap,
 				    update_flags, ssize,
 				    subpage_protection(mm, ea));
-- 
2.1.4
[PATCH 4/5] powerpc/mm: Simplify page size kconfig dependencies
For config options with only a single value, guarding the single value
with 'if' is the same as adding a 'depends' statement. And it's more
standard to just use 'depends'.

And if the option has both an 'if' guard and a 'depends' we can collapse
them into a single 'depends' by combining them with &&.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/Kconfig | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5ef27113b898..3a4ba2809201 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -560,16 +560,17 @@ config PPC_4K_PAGES
 	bool "4k page size"
 
 config PPC_16K_PAGES
-	bool "16k page size" if 44x || PPC_8xx
+	bool "16k page size"
+	depends on 44x || PPC_8xx
 
 config PPC_64K_PAGES
-	bool "64k page size" if 44x || PPC_STD_MMU_64 || PPC_BOOK3E_64
-	depends on !PPC_FSL_BOOK3E
+	bool "64k page size"
+	depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
 	select PPC_HAS_HASH_64K if PPC_STD_MMU_64
 
 config PPC_256K_PAGES
-	bool "256k page size" if 44x
-	depends on !STDBINUTILS
+	bool "256k page size"
+	depends on 44x && !STDBINUTILS
 	help
 	  Make the page size 256k.
-- 
2.1.4
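The transformation reads well side by side on a made-up symbol (the two stanzas below are before/after alternatives, not meant to coexist in one Kconfig file). As the commit message notes, this is only an identity because the option has nothing else — no defaults or selects — hanging off the conditional prompt:

```kconfig
# Before: conditional prompt plus a separate dependency
config FOO
	bool "foo feature" if BAR
	depends on !BAZ

# After: one combined dependency
config FOO
	bool "foo feature"
	depends on BAR && !BAZ
```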
[PATCH 3/5] powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index()
Now that support for 64k pages with a 4K kernel is removed, this code is
unreachable.

CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
also true.

But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
crucially __real_pte, which means this definition can never be used.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 12 
 1 file changed, 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 7ee2300ee392..fa1dfb7f7b48 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -134,23 +134,11 @@
 #define pte_iterate_hashed_end() } while(0)
 
-#ifdef CONFIG_PPC_HAS_HASH_64K
 /*
  * We expect this to be called only for user addresses or kernel virtual
  * addresses other than the linear mapping.
  */
-#define pte_pagesize_index(mm, addr, pte) \
-	({ \
-		unsigned int psize; \
-		if (is_kernel_addr(addr)) \
-			psize = MMU_PAGE_4K;\
-		else\
-			psize = get_slice_psize(mm, addr); \
-		psize; \
-	})
-#else
 #define pte_pagesize_index(mm, addr, pte) MMU_PAGE_4K
-#endif
 
 #endif /* __real_pte */
-- 
2.1.4
[PATCH 2/5] powerpc/cell: Drop support for 64K local store on 4K kernels
Back in the olden days we added support for using 64K pages to map the
SPU (Synergistic Processing Unit) local store on Cell, when the main
kernel was using 4K pages.

This was useful at the time because distros were using 4K pages, but
using 64K pages on the SPUs could reduce TLB pressure there.

However these days the number of Cell users is approaching zero, and
supporting this option adds unpleasant complexity to the memory
management code.

So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/include/asm/spu_csa.h              |   6 --
 arch/powerpc/mm/hugetlbpage.c                   |   8 --
 arch/powerpc/platforms/cell/Kconfig             |  15 ---
 arch/powerpc/platforms/cell/spufs/file.c        |  55 ---
 arch/powerpc/platforms/cell/spufs/lscsa_alloc.c | 124 +---
 5 files changed, 2 insertions(+), 206 deletions(-)

diff --git a/arch/powerpc/include/asm/spu_csa.h b/arch/powerpc/include/asm/spu_csa.h
index a40fd491250c..51f80b41cda3 100644
--- a/arch/powerpc/include/asm/spu_csa.h
+++ b/arch/powerpc/include/asm/spu_csa.h
@@ -241,12 +241,6 @@ struct spu_priv2_collapsed {
  */
 struct spu_state {
 	struct spu_lscsa *lscsa;
-#ifdef CONFIG_SPU_FS_64K_LS
-	int use_big_pages;
-	/* One struct page per 64k page */
-#define SPU_LSCSA_NUM_BIG_PAGES	(sizeof(struct spu_lscsa) / 0x1)
-	struct page *lscsa_pages[SPU_LSCSA_NUM_BIG_PAGES];
-#endif
 	struct spu_problem_collapsed prob;
 	struct spu_priv1_collapsed priv1;
 	struct spu_priv2_collapsed priv2;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bb0bd7025cb8..06c14523b787 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -808,14 +808,6 @@ static int __init add_huge_page_size(unsigned long long size)
 	if ((mmu_psize = shift_to_mmu_psize(shift)) < 0)
 		return -EINVAL;
 
-#ifdef CONFIG_SPU_FS_64K_LS
-	/* Disable support for 64K huge pages when 64K SPU local store
-	 * support is enabled as the current implementation conflicts.
-	 */
-	if (shift == PAGE_SHIFT_64K)
-		return -EINVAL;
-#endif /* CONFIG_SPU_FS_64K_LS */
-
 	BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
 	/* Return if huge page size has already been setup */
diff --git a/arch/powerpc/platforms/cell/Kconfig b/arch/powerpc/platforms/cell/Kconfig
index 2f23133ab3d1..b0ac1773cea6 100644
--- a/arch/powerpc/platforms/cell/Kconfig
+++ b/arch/powerpc/platforms/cell/Kconfig
@@ -57,21 +57,6 @@ config SPU_FS
 	  Units on machines implementing the Broadband Processor
 	  Architecture.
 
-config SPU_FS_64K_LS
-	bool "Use 64K pages to map SPE local store"
-	# we depend on PPC_MM_SLICES for now rather than selecting
-	# it because we depend on hugetlbfs hooks being present. We
-	# will fix that when the generic code has been improved to
-	# not require hijacking hugetlbfs hooks.
-	depends on SPU_FS && PPC_MM_SLICES && !PPC_64K_PAGES
-	default y
-	select PPC_HAS_HASH_64K
-	help
-	  This option causes SPE local stores to be mapped in process
-	  address spaces using 64K pages while the rest of the kernel
-	  uses 4K pages. This can improve performances of applications
-	  using multiple SPEs by lowering the TLB pressure on them.
-
 config SPU_BASE
 	bool
 	default n
diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index d966bbe58b8f..5038fd578e65 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -239,23 +239,6 @@ spufs_mem_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	unsigned long address = (unsigned long)vmf->virtual_address;
 	unsigned long pfn, offset;
 
-#ifdef CONFIG_SPU_FS_64K_LS
-	struct spu_state *csa = &ctx->csa;
-	int psize;
-
-	/* Check what page size we are using */
-	psize = get_slice_psize(vma->vm_mm, address);
-
-	/* Some sanity checking */
-	BUG_ON(csa->use_big_pages != (psize == MMU_PAGE_64K));
-
-	/* Wow, 64K, cool, we need to align the address though */
-	if (csa->use_big_pages) {
-		BUG_ON(vma->vm_start & 0x);
-		address &= ~0xul;
-	}
-#endif /* CONFIG_SPU_FS_64K_LS */
-
 	offset = vmf->pgoff << PAGE_SHIFT;
 	if (offset >= LS_SIZE)
 		return VM_FAULT_SIGBUS;
@@ -310,22 +293,6 @@ static const struct vm_operations_struct spufs_mem_mmap_vmops = {
 
 static int spufs_mem_mmap(struct file *file, struct vm_area_struct *vma)
 {
-#ifdef CONFIG_SPU_FS_64K_LS
-	struct spu_context *ctx = file->private_data;
-	struct spu_state *csa = &ctx->csa;
-
-	/* Sanity check VMA alignment */
-	if (csa->use_big_pages) {
-		pr_debug(
[PATCH 1/5] powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash
The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f3302 "powerpc/mm: Increase the slice range to
64TB", the get_slice_psize() code was changed so that instead of a right
shift we do an array lookup based on the address. When passed a kernel
address this means we index way off the end of the slice array and
return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur
Signed-off-by: Michael Ellerman
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 3bb7488bd24b..7ee2300ee392 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -135,7 +135,19 @@
 #define pte_iterate_hashed_end() } while(0)
 
 #ifdef CONFIG_PPC_HAS_HASH_64K
-#define pte_pagesize_index(mm, addr, pte) get_slice_psize(mm, addr)
+/*
+ * We expect this to be called only for user addresses or kernel virtual
+ * addresses other than the linear mapping.
+ */
+#define pte_pagesize_index(mm, addr, pte) \
+	({ \
+		unsigned int psize; \
+		if (is_kernel_addr(addr)) \
+			psize = MMU_PAGE_4K;\
+		else\
+			psize = get_slice_psize(mm, addr); \
+		psize; \
+	})
 #else
 #define pte_pagesize_index(mm, addr, pte) MMU_PAGE_4K
 #endif
-- 
2.1.4
Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode
On Fri, Aug 07, 2015 at 01:44:33PM +0800, Wei Yang wrote: >On Fri, Aug 07, 2015 at 01:43:01PM +1000, Gavin Shan wrote: >>On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote: >>>On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote: On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote: >On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote: >>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote: >>>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be >>>discrete. >>> >>>This patch restructures the patch to allocate discrete PE# for VFs when >>>M64 >>>BAR is set to Single PE mode. >>> >>>Signed-off-by: Wei Yang >>>--- >>> arch/powerpc/include/asm/pci-bridge.h |2 +- >>> arch/powerpc/platforms/powernv/pci-ioda.c | 69 >>> + >>> 2 files changed, 51 insertions(+), 20 deletions(-) >>> >>>diff --git a/arch/powerpc/include/asm/pci-bridge.h >>>b/arch/powerpc/include/asm/pci-bridge.h >>>index 8aeba4c..72415c7 100644 >>>--- a/arch/powerpc/include/asm/pci-bridge.h >>>+++ b/arch/powerpc/include/asm/pci-bridge.h >>>@@ -213,7 +213,7 @@ struct pci_dn { >>> #ifdef CONFIG_PCI_IOV >>> u16 vfs_expanded; /* number of VFs IOV BAR >>> expanded */ >>> u16 num_vfs;/* number of VFs enabled*/ >>>-int offset; /* PE# for the first VF PE */ >>>+int *offset;/* PE# for the first VF PE or >>>array */ >>> boolm64_single_mode;/* Use M64 BAR in Single Mode */ >>> #define IODA_INVALID_M64(-1) >>> int (*m64_map)[PCI_SRIOV_NUM_BARS]; >> >>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the >>comments >>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. >>Though not >>all of them will be used, not too much memory will be wasted. >> > >Thanks for your comment. > >I have thought about change the name to make it more self explain. While >another fact I want to take in is this field is also used to be reflect the >shift offset when M64 BAR is used in the Shared Mode. So I maintain the >name. 
> >How about use "enum", one maintain the name "offset", and another one >rename to >"pe_num_map". And use the meaningful name at proper place? > >>> >>>So I suppose you agree with my naming proposal. >>> >> >>No, I dislike the "enum" things. >> > >OK, then you suggest to rename it pe_num_map or keep it as offset? > pe_num_map would be better. Ok. I'm explaining it with more details. There are two cases: single vs shared mode. When PHB M64 BARs run in single mode, you need an array to track the allocated discrete PE#. The VF_index is the index to the array. When PHB M64 BARs run in shared mode, you need continuous PE#. No array required for this case. Instead, the starting PE# should be stored to somewhere, which can be pdn->offset[0] simply. So when allocating memory for this array, you just simply allocate (sizeof(*pdn->offset) *max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is nobody can enable (max_vf_num + 1) VFs. >>> >>>The max_vf_num is 15? >>> >> >>I don't understand why you said: the max_vf_num is 15. Since max_vf_num is >>variable >>on different PFs, how can it be fixed value - 15 ? >> > >In Shared PE case, only one int to indicate the start PE# is fine. >In Single PE mode, we totally could enable 15 VF, the same number of PEs for >each VF, which is limited by the number M64 BARs we have in the system. > >If not, the number you expected is total_vfs? > then it should be min(total_vfs, phb->ioda.m64_bar_idx), isn't it? With above way, the arrays for PE# and M64 BAR remapping needn't be allocated when enabling SRIOV capability and releasing on disabling SRIOV capability. Instead, those two arrays can be allocated during resource fixup time and free'ed when destroying the pdn. >>> >>>My same point of view like previous, if the memory is not in the concern, how >>>about define them static? >>> >> >>It's a bad idea from my review. How many entries this array is going to have? >>256 * NUM_OF_MAX_VF_BARS ? >> > >No. 
> >It has 15 * 6, 15 VFs we could enable at most and 6 VF BARs a VF could have at >most. > It's min(total_vfs, phb->ioda.m64_bar_idx) VFs that can be enabled at maximal degree, no? >>>And for the long term, we may support more VFs. Then at that moment, we need >>>to restructure the code to meet it. >>> >>>So I suggest if we want to allocate it dynamically, we allocate the exact >>>number of space. >>> >> >>Fine... it can be improved when it has to be, as you said. >> > >-- >Richard Yang >Help you, Help me __
Re: [PATCH 2/4] cxl: sparse: Make declarations static
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote:
> A few declarations were identified by sparse as needing to be static:
>
> /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:408:6: warning: symbol 'afu_irq_name_free' was not declared. Should it be static?
> /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:467:6: warning: symbol 'afu_register_hwirqs' was not declared. Should it be static?
> /scratch/dja/linux-capi/drivers/misc/cxl/file.c:254:6: warning: symbol 'afu_compat_ioctl' was not declared. Should it be static?
> /scratch/dja/linux-capi/drivers/misc/cxl/file.c:399:30: warning: symbol 'afu_master_fops' was not declared. Should it be static?
>
> Make them static.
>
> Signed-off-by: Daniel Axtens

Acked-by: Michael Neuling

> ---
> drivers/misc/cxl/file.c | 4 ++--
> drivers/misc/cxl/irq.c  | 4 ++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
> index e3f4b69527a9..c8c8bfa2679b 100644
> --- a/drivers/misc/cxl/file.c
> +++ b/drivers/misc/cxl/file.c
> @@ -251,7 +251,7 @@ long afu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> 	return -EINVAL;
> }
>
> -long afu_compat_ioctl(struct file *file, unsigned int cmd,
> +static long afu_compat_ioctl(struct file *file, unsigned int cmd,
> 	     unsigned long arg)
> {
> 	return afu_ioctl(file, cmd, arg);
> @@ -396,7 +396,7 @@ const struct file_operations afu_fops = {
> 	.mmap = afu_mmap,
> };
>
> -const struct file_operations afu_master_fops = {
> +static const struct file_operations afu_master_fops = {
> 	.owner = THIS_MODULE,
> 	.open = afu_master_open,
> 	.poll = afu_poll,
> diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
> index 680cd263436d..77e5d0e7ebe1 100644
> --- a/drivers/misc/cxl/irq.c
> +++ b/drivers/misc/cxl/irq.c
> @@ -405,7 +405,7 @@ void cxl_release_psl_irq(struct cxl_afu *afu)
> 	kfree(afu->psl_irq_name);
> }
>
> -void afu_irq_name_free(struct cxl_context *ctx)
> +static void afu_irq_name_free(struct cxl_context *ctx)
> {
> 	struct cxl_irq_name *irq_name, *tmp;
>
> @@ -464,7 +464,7 @@ out:
> 	return -ENOMEM;
> }
>
> -void afu_register_hwirqs(struct cxl_context *ctx)
> +static void afu_register_hwirqs(struct cxl_context *ctx)
> {
> 	irq_hw_number_t hwirq;
> 	struct cxl_irq_name *irq_name;
Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode
On Fri, Aug 07, 2015 at 01:43:01PM +1000, Gavin Shan wrote: >On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote: >>On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote: >>>On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote: On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote: >On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote: >>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be >>discrete. >> >>This patch restructures the patch to allocate discrete PE# for VFs when >>M64 >>BAR is set to Single PE mode. >> >>Signed-off-by: Wei Yang >>--- >> arch/powerpc/include/asm/pci-bridge.h |2 +- >> arch/powerpc/platforms/powernv/pci-ioda.c | 69 >> + >> 2 files changed, 51 insertions(+), 20 deletions(-) >> >>diff --git a/arch/powerpc/include/asm/pci-bridge.h >>b/arch/powerpc/include/asm/pci-bridge.h >>index 8aeba4c..72415c7 100644 >>--- a/arch/powerpc/include/asm/pci-bridge.h >>+++ b/arch/powerpc/include/asm/pci-bridge.h >>@@ -213,7 +213,7 @@ struct pci_dn { >> #ifdef CONFIG_PCI_IOV >> u16 vfs_expanded; /* number of VFs IOV BAR expanded */ >> u16 num_vfs;/* number of VFs enabled*/ >>- int offset; /* PE# for the first VF PE */ >>+ int *offset;/* PE# for the first VF PE or array */ >> boolm64_single_mode;/* Use M64 BAR in Single Mode */ >> #define IODA_INVALID_M64(-1) >> int (*m64_map)[PCI_SRIOV_NUM_BARS]; > >how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the >comments >I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. >Though not >all of them will be used, not too much memory will be wasted. > Thanks for your comment. I have thought about change the name to make it more self explain. While another fact I want to take in is this field is also used to be reflect the shift offset when M64 BAR is used in the Shared Mode. So I maintain the name. How about use "enum", one maintain the name "offset", and another one rename to "pe_num_map". And use the meaningful name at proper place? 
>> >>So I suppose you agree with my naming proposal. >> > >No, I dislike the "enum" things. > OK, then you suggest to rename it pe_num_map or keep it as offset? >>> >>>Ok. I'm explaining it with more details. There are two cases: single vs >>>shared >>>mode. When PHB M64 BARs run in single mode, you need an array to track the >>>allocated discrete PE#. The VF_index is the index to the array. When PHB M64 >>>BARs run in shared mode, you need continuous PE#. No array required for this >>>case. Instead, the starting PE# should be stored to somewhere, which can >>>be pdn->offset[0] simply. >>> >>>So when allocating memory for this array, you just simply allocate >>>(sizeof(*pdn->offset) >>>*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is >>>nobody >>>can enable (max_vf_num + 1) VFs. >> >>The max_vf_num is 15? >> > >I don't understand why you said: the max_vf_num is 15. Since max_vf_num is >variable >on different PFs, how can it be fixed value - 15 ? > In Shared PE case, only one int to indicate the start PE# is fine. In Single PE mode, we totally could enable 15 VF, the same number of PEs for each VF, which is limited by the number M64 BARs we have in the system. If not, the number you expected is total_vfs? >>> >>>With above way, the arrays for PE# and M64 BAR remapping needn't be allocated >>>when enabling SRIOV capability and releasing on disabling SRIOV capability. >>>Instead, those two arrays can be allocated during resource fixup time and >>>free'ed >>>when destroying the pdn. >>> >> >>My same point of view like previous, if the memory is not in the concern, how >>about define them static? >> > >It's a bad idea from my review. How many entries this array is going to have? >256 * NUM_OF_MAX_VF_BARS ? > No. It has 15 * 6, 15 VFs we could enable at most and 6 VF BARs a VF could have at most. >>And for the long term, we may support more VFs. Then at that moment, we need >>to restructure the code to meet it. 
>> >>So I suggest if we want to allocate it dynamically, we allocate the exact >>number of space. >> > >Fine... it can be improved when it has to be, as you said. > -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 4/4] cxl: sparse: Silence iomem warning in debugfs file creation
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote:
> An IO address, tagged with __iomem, is passed to debugfs_create_file
> as private data. This requires that it be cast to void *. The cast
> creates a sparse warning:
> /scratch/dja/linux-capi/drivers/misc/cxl/debugfs.c:51:57: warning: cast removes address space of expression
>
> The address space marker is added back in the file operations
> (fops_io_u64).
>
> Silence the warning with __force.
>
> Signed-off-by: Daniel Axtens

Acked-by: Michael Neuling

> ---
> drivers/misc/cxl/debugfs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
> index 825c412580bc..18df6f44af2a 100644
> --- a/drivers/misc/cxl/debugfs.c
> +++ b/drivers/misc/cxl/debugfs.c
> @@ -48,7 +48,7 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_io_x64, debugfs_io_u64_get, debugfs_io_u64_set, "0x
> static struct dentry *debugfs_create_io_x64(const char *name, umode_t mode,
> 					    struct dentry *parent, u64 __iomem *value)
> {
> -	return debugfs_create_file(name, mode, parent, (void *)value, &fops_io_x64);
> +	return debugfs_create_file(name, mode, parent, (void __force *)value, &fops_io_x64);
> }
>
> int cxl_debugfs_adapter_add(struct cxl *adapter)
Re: [PATCH 3/4] cxl: sparse: Flag iomem pointers properly
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote: > Sparse identifies the following address space issues: > /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17: warning: incorrect > type in assignment (different address spaces) > /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:expected void > volatile [noderef] * > /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:got void * > /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23: warning: incorrect > type in assignment (different address spaces) > /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:expected void > [noderef] *cfg_data > /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:got void * > > Fix those by flagging __iomem in the relevant casts. > > Signed-off-by: Daniel Axtens Acked-by: Michael Neuling > --- > drivers/misc/cxl/vphb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c > index 2eba002b580b..a7b55772a91c 100644 > --- a/drivers/misc/cxl/vphb.c > +++ b/drivers/misc/cxl/vphb.c > @@ -122,7 +122,7 @@ static int cxl_pcie_config_info(struct pci_bus *bus, > unsigned int devfn, > return PCIBIOS_BAD_REGISTER_NUMBER; > addr = cxl_pcie_cfg_addr(phb, bus->number, devfn, offset); > > - *ioaddr = (void *)(addr & ~0x3ULL); > + *ioaddr = (void __iomem *)(addr & ~0x3ULL); > *shift = ((addr & 0x3) * 8); > switch (len) { > case 1: > @@ -217,7 +217,7 @@ int cxl_pci_vphb_add(struct cxl_afu *afu) > /* Setup the PHB using arch provided callback */ > phb->ops = &cxl_pcie_pci_ops; > phb->cfg_addr = afu->afu_desc_mmio + afu->crs_offset; > - phb->cfg_data = (void *)(u64)afu->crs_len; > + phb->cfg_data = (void __iomem *)(u64)afu->crs_len; > phb->private_data = afu; > phb->controller_ops = cxl_pci_controller_ops; > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/4] cxl: Compile with -Werror
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote: > It's a good idea, and it brings us in line with the rest of arch/powerpc. > > Signed-off-by: Daniel Axtens Acked-by: Michael Neuling > --- > drivers/misc/cxl/Makefile | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile > index 14e3f8219a11..6f484dfe78f9 100644 > --- a/drivers/misc/cxl/Makefile > +++ b/drivers/misc/cxl/Makefile > @@ -1,3 +1,5 @@ > +ccflags-y := -Werror > + > cxl-y+= main.o file.o irq.o fault.o native.o > cxl-y+= context.o sysfs.o debugfs.o pci.o > trace.o > cxl-y+= vphb.o api.o
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On Fri, Aug 07, 2015 at 10:24:05AM +0800, Wei Yang wrote: >On Fri, Aug 07, 2015 at 11:20:10AM +1000, Gavin Shan wrote: >>On Thu, Aug 06, 2015 at 10:10:10PM +0800, Wei Yang wrote: >>>On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: >On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If >a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from >M64 windwo, which means M64 BAR can't work on it. > s/PHB_IODA2/PHB3 s/windwo/window >This patch makes this explicit. > >Signed-off-by: Wei Yang The idea sounds right, but there is one question as below. >--- > arch/powerpc/platforms/powernv/pci-ioda.c | 25 + > 1 file changed, 9 insertions(+), 16 deletions(-) > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >b/arch/powerpc/platforms/powernv/pci-ioda.c >index 5738d31..9b41dba 100644 >--- a/arch/powerpc/platforms/powernv/pci-ioda.c >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >*dev, int offset) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > /* >* The actual IOV BAR range is determined by the start address >* and the actual size for num_vfs VFs BAR. 
This check is to >@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >*dev, int offset) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > res2 = *res; > res->start += size * offset; >@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev >*pdev, u16 num_vfs) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > for (j = 0; j < vf_groups; j++) { > do { > win = > find_next_zero_bit(&phb->ioda.m64_bar_alloc, >@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 >num_vfs) > pdn = pci_get_pdn(pdev); > > if (phb->type == PNV_PHB_IODA2) { >+ if (!pdn->vfs_expanded) { >+ dev_info(&pdev->dev, "don't support this SRIOV device" >+ " with non M64 VF BAR\n"); >+ return -EBUSY; >+ } >+ It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily unavailable. For this case, the VFs are permanently unavailable because of running out of space to accomodate M64 and non-M64 VF BARs. The error message could be printed with dev_warn() and it would be precise as below or something else you prefer: dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); >>> >>>Thanks for the comment, will change accordingly. >>> > /* Calculate available PE for required VFs */ > mutex_lock(&phb->ioda.pe_alloc_mutex); > pdn->offset = bitmap_find_next_zero_area( >@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >pci_dev *pdev) > if (!res->flags || res->parent) > continue; > if (!pnv_pci_is_mem_pref_64(res->flags)) { >- dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", >+ dev_warn(&pdev->dev, "Don't support SR-IOV with" >+ " non M64 VF BAR%d: %pR. 
\n", >i, res); >- continue; >+ return; > } > > size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); >@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >pci_dev *pdev) > res = &pdev->resource[i + PCI_IOV_RESOURCES]; > if (!res->flags || res->parent) > continue; >- if (!pnv_pci_is_mem_pref_64(res->flags)) { >- dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: >%pR\n", >- i, res); >- continue; >- } When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled. Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so, I think it can be avoided. >>> >>>Don't get your point. You mean to avoid this function?
Re: [PATCH] powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops
On Fri, 2015-08-07 at 13:45 +1000, Andrew Donnellan wrote: > Simplify the dma_get_required_mask call chain by moving it from pnv_phb to > pci_controller_ops, similar to commit 763d2d8df1ee2b92ff09c > ("powerpc/powernv: Move dma_set_mask from pnv_phb to pci_controller_ops"). > > Previous call chain: > > 0) call dma_get_required_mask() (kernel/dma.c) > 1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that > points to pnv_dma_get_required_mask() (platforms/powernv/setup.c) > 2) device is PCI, therefore call pnv_pci_dma_get_required_mask() > (platforms/powernv/pci.c) > 3) call phb->dma_get_required_mask if it exists > 4) it only exists in the ioda case, where it points to >pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c) > > New call chain: > > 0) call dma_get_required_mask() (kernel/dma.c) > 1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask > if it exists > 2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask() > (platforms/powernv/pci-ioda.c) > > In the p5ioc2 case, the call chain remains the same - > dma_get_required_mask() does not find either a ppc_md call or > pci_controller_ops call, so it calls __dma_get_required_mask(). > > Signed-off-by: Andrew Donnellan > Reviewed-by: Daniel Axtens -- Regards, Daniel
[PATCH] powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops
Simplify the dma_get_required_mask call chain by moving it from pnv_phb to pci_controller_ops, similar to commit 763d2d8df1ee2b92ff09c ("powerpc/powernv: Move dma_set_mask from pnv_phb to pci_controller_ops"). Previous call chain: 0) call dma_get_required_mask() (kernel/dma.c) 1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that points to pnv_dma_get_required_mask() (platforms/powernv/setup.c) 2) device is PCI, therefore call pnv_pci_dma_get_required_mask() (platforms/powernv/pci.c) 3) call phb->dma_get_required_mask if it exists 4) it only exists in the ioda case, where it points to pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c) New call chain: 0) call dma_get_required_mask() (kernel/dma.c) 1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask if it exists 2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c) In the p5ioc2 case, the call chain remains the same - dma_get_required_mask() does not find either a ppc_md call or pci_controller_ops call, so it calls __dma_get_required_mask(). Signed-off-by: Andrew Donnellan --- dma_get_required_mask() is only used in a small handful of drivers, for which I don't have access to appropriate hardware, so this was tested by artificially inserting calls into the tg3 PCI network card driver. 
--- arch/powerpc/include/asm/pci-bridge.h | 1 + arch/powerpc/kernel/dma.c | 7 +++ arch/powerpc/platforms/powernv/pci-ioda.c | 7 --- arch/powerpc/platforms/powernv/pci.c | 11 --- arch/powerpc/platforms/powernv/pci.h | 2 -- arch/powerpc/platforms/powernv/powernv.h | 6 -- arch/powerpc/platforms/powernv/setup.c| 9 - 7 files changed, 12 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 712add5..37fc535 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -42,6 +42,7 @@ struct pci_controller_ops { #endif int (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask); + u64 (*dma_get_required_mask)(struct pci_dev *dev); void(*shutdown)(struct pci_controller *); }; diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index 35e4dcc..9d68589 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -278,6 +278,13 @@ u64 dma_get_required_mask(struct device *dev) if (ppc_md.dma_get_required_mask) return ppc_md.dma_get_required_mask(dev); + if (dev_is_pci(dev)) { + struct pci_dev *pdev = to_pci_dev(dev); + struct pci_controller *phb = pci_bus_to_host(pdev->bus); + if (phb->controller_ops.dma_get_required_mask) + return phb->controller_ops.dma_get_required_mask(pdev); + } + return __dma_get_required_mask(dev); } EXPORT_SYMBOL_GPL(dma_get_required_mask); diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 85cbc96..6f015d4 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1630,9 +1630,10 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev *pdev, u64 dma_mask) return 0; } -static u64 pnv_pci_ioda_dma_get_required_mask(struct pnv_phb *phb, - struct pci_dev *pdev) +static u64 pnv_pci_ioda_dma_get_required_mask(struct pci_dev *pdev) { + struct pci_controller *hose = pci_bus_to_host(pdev->bus); + struct pnv_phb *phb = 
hose->private_data; struct pci_dn *pdn = pci_get_pdn(pdev); struct pnv_ioda_pe *pe; u64 end, mask; @@ -3057,6 +3058,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = { .window_alignment = pnv_pci_window_alignment, .reset_secondary_bus = pnv_pci_reset_secondary_bus, .dma_set_mask = pnv_pci_ioda_dma_set_mask, + .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask, .shutdown = pnv_pci_ioda_shutdown, }; @@ -3203,7 +3205,6 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, /* Setup TCEs */ phb->dma_dev_setup = pnv_pci_ioda_dma_dev_setup; - phb->dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask; /* Setup MSI support */ pnv_pci_init_ioda_msis(phb); diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index 765d8ed..3e7f6fd 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -761,17 +761,6 @@ void pnv_pci_dma_dev_setup(struct pci_dev *pdev) phb->dma_dev_setup(phb, pdev); } -u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev) -{ - struct pci_controller *hose = pci_bus_to_host(pdev->bus); - s
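The fallback order this patch establishes — a platform-wide ppc_md hook first, then the PHB's controller op for PCI devices, then the generic path — can be sketched as follows. The types and names here are simplified stand-ins for illustration, not the kernel's actual structures:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long u64;

struct device;	/* opaque, pointers only */

/* Hypothetical stand-ins for the two hook tables in the patch. */
struct machdep_sketch {
	u64 (*dma_get_required_mask)(struct device *dev);
};
struct controller_ops_sketch {
	u64 (*dma_get_required_mask)(struct device *dev);
};

/* Placeholder for __dma_get_required_mask(), the generic fallback. */
static u64 generic_required_mask(struct device *dev)
{
	(void)dev;
	return 0xffffffffULL;
}

/*
 * Mirrors the dispatch order in the patch: the ppc_md hook wins if set,
 * then a per-PHB controller op for PCI devices, then the generic path
 * (which is what the p5ioc2 case falls through to).
 */
static u64 get_required_mask(struct machdep_sketch *md,
			     struct controller_ops_sketch *ops,
			     struct device *dev, int is_pci)
{
	if (md->dma_get_required_mask)
		return md->dma_get_required_mask(dev);
	if (is_pci && ops->dma_get_required_mask)
		return ops->dma_get_required_mask(dev);
	return generic_required_mask(dev);
}
```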
Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode
On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote: >On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote: >>On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote: >>>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote: >When M64 BAR is set to Single PE mode, the PE# assigned to VF could be >discrete. > >This patch restructures the patch to allocate discrete PE# for VFs when M64 >BAR is set to Single PE mode. > >Signed-off-by: Wei Yang >--- > arch/powerpc/include/asm/pci-bridge.h |2 +- > arch/powerpc/platforms/powernv/pci-ioda.c | 69 > + > 2 files changed, 51 insertions(+), 20 deletions(-) > >diff --git a/arch/powerpc/include/asm/pci-bridge.h >b/arch/powerpc/include/asm/pci-bridge.h >index 8aeba4c..72415c7 100644 >--- a/arch/powerpc/include/asm/pci-bridge.h >+++ b/arch/powerpc/include/asm/pci-bridge.h >@@ -213,7 +213,7 @@ struct pci_dn { > #ifdef CONFIG_PCI_IOV > u16 vfs_expanded; /* number of VFs IOV BAR expanded */ > u16 num_vfs;/* number of VFs enabled*/ >- int offset; /* PE# for the first VF PE */ >+ int *offset;/* PE# for the first VF PE or array */ > boolm64_single_mode;/* Use M64 BAR in Single Mode */ > #define IODA_INVALID_M64(-1) > int (*m64_map)[PCI_SRIOV_NUM_BARS]; how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the comments I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. Though not all of them will be used, not too much memory will be wasted. >>> >>>Thanks for your comment. >>> >>>I have thought about change the name to make it more self explain. While >>>another fact I want to take in is this field is also used to be reflect the >>>shift offset when M64 BAR is used in the Shared Mode. So I maintain the name. >>> >>>How about use "enum", one maintain the name "offset", and another one rename >>>to >>>"pe_num_map". And use the meaningful name at proper place? >>> > >So I suppose you agree with my naming proposal. 
> No, I dislike the "enum" things. >> >>Ok. I'm explaining it with more details. There are two cases: single vs shared >>mode. When PHB M64 BARs run in single mode, you need an array to track the >>allocated discrete PE#. The VF_index is the index to the array. When PHB M64 >>BARs run in shared mode, you need continuous PE#. No array required for this >>case. Instead, the starting PE# should be stored to somewhere, which can >>be pdn->offset[0] simply. >> >>So when allocating memory for this array, you just simply allocate >>(sizeof(*pdn->offset) >>*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is >>nobody >>can enable (max_vf_num + 1) VFs. > >The max_vf_num is 15? > I don't understand why you said: the max_vf_num is 15. Since max_vf_num is variable on different PFs, how can it be fixed value - 15 ? >> >>With above way, the arrays for PE# and M64 BAR remapping needn't be allocated >>when enabling SRIOV capability and releasing on disabling SRIOV capability. >>Instead, those two arrays can be allocated during resource fixup time and >>free'ed >>when destroying the pdn. >> > >My same point of view like previous, if the memory is not in the concern, how >about define them static? > It's a bad idea from my review. How many entries this array is going to have? 256 * NUM_OF_MAX_VF_BARS ? >And for the long term, we may support more VFs. Then at that moment, we need >to restructure the code to meet it. > >So I suggest if we want to allocate it dynamically, we allocate the exact >number of space. > Fine... it can be improved when it has to be, as you said. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/4] cxl: Compile with -Werror
> Do you need to put this patch last so that you don't break bisecting the > series? > The warnings fixed in the rest of the series are from sparse, not the compiler, so they don't break building with -Werror. -- Regards, Daniel
Re: [PATCH 1/4] cxl: Compile with -Werror
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote: > It's a good idea, and it brings us in line with the rest of arch/powerpc. Do you need to put this patch last so that you don't break bisecting the series? Mikey > Signed-off-by: Daniel Axtens > --- > drivers/misc/cxl/Makefile | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile > index 14e3f8219a11..6f484dfe78f9 100644 > --- a/drivers/misc/cxl/Makefile > +++ b/drivers/misc/cxl/Makefile > @@ -1,3 +1,5 @@ > +ccflags-y := -Werror > + > cxl-y+= main.o file.o irq.o fault.o native.o > cxl-y+= context.o sysfs.o debugfs.o pci.o > trace.o > cxl-y+= vphb.o api.o
[PATCH 4/4] cxl: sparse: Silence iomem warning in debugfs file creation
An IO address, tagged with __iomem, is passed to debugfs_create_file as private data. This requires that it be cast to void *. The cast creates a sparse warning: /scratch/dja/linux-capi/drivers/misc/cxl/debugfs.c:51:57: warning: cast removes address space of expression The address space marker is added back in the file operations (fops_io_u64). Silence the warning with __force. Signed-off-by: Daniel Axtens --- drivers/misc/cxl/debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c index 825c412580bc..18df6f44af2a 100644 --- a/drivers/misc/cxl/debugfs.c +++ b/drivers/misc/cxl/debugfs.c @@ -48,7 +48,7 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_io_x64, debugfs_io_u64_get, debugfs_io_u64_set, "0x static struct dentry *debugfs_create_io_x64(const char *name, umode_t mode, struct dentry *parent, u64 __iomem *value) { - return debugfs_create_file(name, mode, parent, (void *)value, &fops_io_x64); + return debugfs_create_file(name, mode, parent, (void __force *)value, &fops_io_x64); } int cxl_debugfs_adapter_add(struct cxl *adapter) -- 2.1.4
[PATCH 3/4] cxl: sparse: Flag iomem pointers properly
Sparse identifies the following address space issues: /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17: warning: incorrect type in assignment (different address spaces) /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:expected void volatile [noderef] * /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:got void * /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23: warning: incorrect type in assignment (different address spaces) /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:expected void [noderef] *cfg_data /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:got void * Fix those by flagging __iomem in the relevant casts. Signed-off-by: Daniel Axtens --- drivers/misc/cxl/vphb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c index 2eba002b580b..a7b55772a91c 100644 --- a/drivers/misc/cxl/vphb.c +++ b/drivers/misc/cxl/vphb.c @@ -122,7 +122,7 @@ static int cxl_pcie_config_info(struct pci_bus *bus, unsigned int devfn, return PCIBIOS_BAD_REGISTER_NUMBER; addr = cxl_pcie_cfg_addr(phb, bus->number, devfn, offset); - *ioaddr = (void *)(addr & ~0x3ULL); + *ioaddr = (void __iomem *)(addr & ~0x3ULL); *shift = ((addr & 0x3) * 8); switch (len) { case 1: @@ -217,7 +217,7 @@ int cxl_pci_vphb_add(struct cxl_afu *afu) /* Setup the PHB using arch provided callback */ phb->ops = &cxl_pcie_pci_ops; phb->cfg_addr = afu->afu_desc_mmio + afu->crs_offset; - phb->cfg_data = (void *)(u64)afu->crs_len; + phb->cfg_data = (void __iomem *)(u64)afu->crs_len; phb->private_data = afu; phb->controller_ops = cxl_pci_controller_ops; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
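The sparse mechanics behind these warnings can be reproduced outside the kernel tree with a small standalone file: under __CHECKER__ the kernel defines __iomem with an address-space attribute (roughly as below), so casting to plain (void *) drops the address space and sparse complains, while casting to (void __iomem *) keeps it, and __force marks a deliberate drop. This is an illustrative sketch, not the kernel's exact headers:

```c
#include <assert.h>

/* Roughly how the kernel defines these in its compiler headers: the
 * attributes exist only when sparse (__CHECKER__) is running; a normal
 * compile sees empty macros. */
#ifdef __CHECKER__
# define __iomem __attribute__((noderef, address_space(2)))
# define __force __attribute__((force))
#else
# define __iomem
# define __force
#endif

/* Keeping the __iomem marker in the cast, as the vphb.c fix does;
 * the mask mirrors the low-bit stripping in cxl_pcie_config_info(). */
static void __iomem *to_iomem(unsigned long addr)
{
	return (void __iomem *)(addr & ~0x3UL);
}
```

Running `sparse` over a file like this with the cast changed to a bare `(void *)` reproduces the "different address spaces" warning; a regular compiler accepts both forms, which is why these bugs only show up under sparse.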
[PATCH 2/4] cxl: sparse: Make declarations static
A few declarations were identified by sparse as needing to be static: /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:408:6: warning: symbol 'afu_irq_name_free' was not declared. Should it be static? /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:467:6: warning: symbol 'afu_register_hwirqs' was not declared. Should it be static? /scratch/dja/linux-capi/drivers/misc/cxl/file.c:254:6: warning: symbol 'afu_compat_ioctl' was not declared. Should it be static? /scratch/dja/linux-capi/drivers/misc/cxl/file.c:399:30: warning: symbol 'afu_master_fops' was not declared. Should it be static? Make them static. Signed-off-by: Daniel Axtens --- drivers/misc/cxl/file.c | 4 ++-- drivers/misc/cxl/irq.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c index e3f4b69527a9..c8c8bfa2679b 100644 --- a/drivers/misc/cxl/file.c +++ b/drivers/misc/cxl/file.c @@ -251,7 +251,7 @@ long afu_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return -EINVAL; } -long afu_compat_ioctl(struct file *file, unsigned int cmd, +static long afu_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { return afu_ioctl(file, cmd, arg); @@ -396,7 +396,7 @@ const struct file_operations afu_fops = { .mmap = afu_mmap, }; -const struct file_operations afu_master_fops = { +static const struct file_operations afu_master_fops = { .owner = THIS_MODULE, .open = afu_master_open, .poll = afu_poll, diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c index 680cd263436d..77e5d0e7ebe1 100644 --- a/drivers/misc/cxl/irq.c +++ b/drivers/misc/cxl/irq.c @@ -405,7 +405,7 @@ void cxl_release_psl_irq(struct cxl_afu *afu) kfree(afu->psl_irq_name); } -void afu_irq_name_free(struct cxl_context *ctx) +static void afu_irq_name_free(struct cxl_context *ctx) { struct cxl_irq_name *irq_name, *tmp; @@ -464,7 +464,7 @@ out: return -ENOMEM; } -void afu_register_hwirqs(struct cxl_context *ctx) +static void afu_register_hwirqs(struct 
cxl_context *ctx) { irq_hw_number_t hwirq; struct cxl_irq_name *irq_name; -- 2.1.4
[PATCH 1/4] cxl: Compile with -Werror
It's a good idea, and it brings us in line with the rest of arch/powerpc. Signed-off-by: Daniel Axtens --- drivers/misc/cxl/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile index 14e3f8219a11..6f484dfe78f9 100644 --- a/drivers/misc/cxl/Makefile +++ b/drivers/misc/cxl/Makefile @@ -1,3 +1,5 @@ +ccflags-y := -Werror + cxl-y += main.o file.o irq.o fault.o native.o cxl-y += context.o sysfs.o debugfs.o pci.o trace.o cxl-y += vphb.o api.o -- 2.1.4
Re: [PATCH 1/3] Powerpc: mpc85xx: refactor the PM operations
On Fri, Aug 7, 2015 at 2:02 AM, Scott Wood wrote: On Thu, 2015-08-06 at 13:54 +0800, Chenhui Zhao wrote: On Thu, Aug 6, 2015 at 1:46 PM, Scott Wood wrote: > On Thu, 2015-08-06 at 12:20 +0800, Chenhui Zhao wrote: > > On Thu, Aug 6, 2015 at 10:57 AM, Scott Wood > > > > wrote: > > > On Wed, 2015-08-05 at 18:11 +0800, Chenhui Zhao wrote: > > > > On Tue, Aug 4, 2015 at 4:26 AM, Scott Wood > > > > > > wrote: > > > > > On Mon, 2015-08-03 at 19:32 +0800, Chenhui Zhao wrote: > > > > > > > > > > > > > > > > > > On Sat, Aug 1, 2015 at 7:59 AM, Scott Wood > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Could you explain irq_mask()? Why would there still be > > IRQs > > > > > > destined > > > > > > > for > > > > > > > this CPU at this point? > > > > > > > > > > > > This function just masks irq by setting the registers in > > RCPM > > > > (for > > > > > > example, RCPM_CPMIMR, RCPM_CPMCIMR). Actually, all irqs to > > > > this CPU > > > > > > have been migrated to other CPUs. > > > > > > > > > > So why do we need to set those bits in RCPM? Is it just > > caution? > > > > > > > > Setting these bits can mask interrupts signalled to RCPM from > > MPIC > > > > as a > > > > means of > > > > waking up from a lower power state. So, cores will not be > > waked up > > > > unexpectedly. > > > > > > Why would the MPIC be signalling those interrupts if they've been > > > masked at > > > the MPIC? > > > > > > -Scott > > > > > > > The interrupts to RCPM from MPIC are IRQ, Machine Check, NMI and > > Critical interrupts. Some of them didn't be masked in MPIC. > > What interrupt could actually happen to a sleeping cpu that this > protects > against? > > -Scott Not sure. Maybe spurious interrupts or hardware exceptions. Spurious interrupts happen due to race conditions. They don't happen because the MPIC is bored and decides to ring a CPU's doorbell and hide in the bushes. 
If by "hardware exceptions" you mean machine checks, how would such a machine check be generated by a core that is off? However, setting them make sure dead cpus can not be waked up unexpectedly. I'm not seeing enough value here to warrant resurrecting the old sleep node stuff. -Scott My guess maybe not accurate. My point is that electronic parts don't always work as expected. Taking preventative measures can make the system more robust. In addition, this step is required in deep sleep procedure. -Chenhui
Re: [PATCH] cxl: Add alternate MMIO error handling
On Thu, 2015-07-23 at 16:43 +1000, Ian Munsie wrote: > From: Ian Munsie > > userspace programs using cxl currently have to use two strategies for > dealing with MMIO errors simultaneously. They have to check every read > for a return of all Fs in case the adapter has gone away and the kernel > has not yet noticed, and they have to deal with SIGBUS in case the > kernel has already noticed, invalidated the mapping and marked the > context as failed. > > In order to simplify things, this patch adds an alternative approach > where the kernel will return a page filled with Fs instead of delivering > a SIGBUS. This allows userspace to only need to deal with one of these > two error paths, and is intended for use in libraries that use cxl > transparently and may not be able to safely install a signal handler. > > This approach will only work if certain constraints are met. Namely, if > the application is both reading and writing to an address in the problem > state area it cannot assume that a non-FF read is OK, as it may just be > reading out a value it has previously written. Further - since only one > page is used per context a write to a given offset would be visible when > reading the same offset from a different page in the mapping (this only > applies within a single context, not between contexts). > > An application could deal with this by e.g. making sure it also reads > from a read-only offset after any reads to a read/write offset. > > Due to these constraints, this functionality must be explicitly > requested by userspace when starting the context by passing in the > CXL_START_WORK_ERR_FF flag. 
> > Signed-off-by: Ian Munsie Acked-by: Michael Neuling > --- > drivers/misc/cxl/context.c | 14 ++ > drivers/misc/cxl/cxl.h | 4 +++- > drivers/misc/cxl/file.c| 4 +++- > include/uapi/misc/cxl.h| 4 +++- > 4 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c > index 1287148..6570f10 100644 > --- a/drivers/misc/cxl/context.c > +++ b/drivers/misc/cxl/context.c > @@ -126,6 +126,18 @@ static int cxl_mmap_fault(struct vm_area_struct *vma, > struct vm_fault *vmf) > if (ctx->status != STARTED) { > mutex_unlock(&ctx->status_mutex); > pr_devel("%s: Context not started, failing problem state > access\n", __func__); > + if (ctx->mmio_err_ff) { > + if (!ctx->ff_page) { > + ctx->ff_page = alloc_page(GFP_USER); > + if (!ctx->ff_page) > + return VM_FAULT_OOM; > + memset(page_address(ctx->ff_page), 0xff, > PAGE_SIZE); > + } > + get_page(ctx->ff_page); > + vmf->page = ctx->ff_page; > + vma->vm_page_prot = pgprot_cached(vma->vm_page_prot); > + return 0; > + } > return VM_FAULT_SIGBUS; > } > > @@ -253,6 +265,8 @@ static void reclaim_ctx(struct rcu_head *rcu) > struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu); > > free_page((u64)ctx->sstp); > + if (ctx->ff_page) > + __free_page(ctx->ff_page); > ctx->sstp = NULL; > > kfree(ctx); > diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h > index 4fd66ca..b7293a4 100644 > --- a/drivers/misc/cxl/cxl.h > +++ b/drivers/misc/cxl/cxl.h > @@ -34,7 +34,7 @@ extern uint cxl_verbose; > * Bump version each time a user API change is made, whether it is > * backwards compatible ot not. 
> */ > -#define CXL_API_VERSION 1 > +#define CXL_API_VERSION 2 > #define CXL_API_VERSION_COMPATIBLE 1 > > /* > @@ -418,6 +418,8 @@ struct cxl_context { > /* Used to unmap any mmaps when force detaching */ > struct address_space *mapping; > struct mutex mapping_lock; > + struct page *ff_page; > + bool mmio_err_ff; > > spinlock_t sste_lock; /* Protects segment table entries */ > struct cxl_sste *sstp; > diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c > index e3f4b69..34c7a5e 100644 > --- a/drivers/misc/cxl/file.c > +++ b/drivers/misc/cxl/file.c > @@ -179,6 +179,8 @@ static long afu_ioctl_start_work(struct cxl_context *ctx, > if (work.flags & CXL_START_WORK_AMR) > amr = work.amr & mfspr(SPRN_UAMOR); > > + ctx->mmio_err_ff = !!(work.flags & CXL_START_WORK_ERR_FF); > + > /* >* We grab the PID here and not in the file open to allow for the case >* where a process (master, some daemon, etc) has opened the chardev on > @@ -519,7 +521,7 @@ int __init cxl_file_init(void) >* If these change we really need to update API. Either change some >* flags or update API version number CXL_API_VERSION. >*/ > - BUILD_BUG_ON(CXL_API_VERSION != 1); > + BUILD_BUG_ON(CXL_API_VERSION != 2); > BUIL
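The userspace pattern this patch enables — treating an all-Fs read as the single error indicator, and re-checking via a read-only offset when a read/write offset is involved — might look roughly like the sketch below. This is a standalone illustration of the checks, not real cxl API usage; buffer contents stand in for problem-state MMIO reads:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With CXL_START_WORK_ERR_FF set, a failed context reads back as all Fs,
 * so one check covers both error paths (no SIGBUS handler required). */
static int mmio_read_ok(uint64_t val)
{
	return val != UINT64_MAX;
}

/* Caveat from the patch: a non-FF value read from a read/write offset may
 * just be data the process wrote earlier; confirm liveness by also reading
 * read-only offsets and checking those. */
static int context_alive(const uint64_t *ro_words, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ro_words[i] != UINT64_MAX)
			return 1;	/* any non-FF word: adapter still there */
	return 0;			/* everything reads as Fs: treat as failed */
}
```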
[PATCH] powerpc/vdso: Emit GNU & SysV hashes
Andy Lutomirski says: Some dynamic loaders may be slightly faster if a GNU hash is available. This is unlikely to have any measurable effect on the time it takes to resolve vdso symbols (since there are so few of them). In some contexts, it can be a win for a different reason: if every DSO has a GNU hash section, then libc can avoid calculating SysV hashes at all. Both musl and glibc appear to have this optimization. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/vdso32/Makefile | 2 +- arch/powerpc/kernel/vdso64/Makefile | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile index 53e6c9b979ec..6abffb7a8cd9 100644 --- a/arch/powerpc/kernel/vdso32/Makefile +++ b/arch/powerpc/kernel/vdso32/Makefile @@ -18,7 +18,7 @@ GCOV_PROFILE := n ccflags-y := -shared -fno-common -fno-builtin ccflags-y += -nostdlib -Wl,-soname=linux-vdso32.so.1 \ - $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) + $(call cc-ldoption, -Wl$(comma)--hash-style=both) asflags-y := -D__VDSO32__ -s obj-y += vdso32_wrapper.o diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile index effca9404b17..8c8f2ae43935 100644 --- a/arch/powerpc/kernel/vdso64/Makefile +++ b/arch/powerpc/kernel/vdso64/Makefile @@ -11,7 +11,7 @@ GCOV_PROFILE := n ccflags-y := -shared -fno-common -fno-builtin ccflags-y += -nostdlib -Wl,-soname=linux-vdso64.so.1 \ - $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) + $(call cc-ldoption, -Wl$(comma)--hash-style=both) asflags-y := -D__VDSO64__ -s obj-y += vdso64_wrapper.o -- 2.1.4
Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode
On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote: >On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote: >>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote: >>>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote: When M64 BAR is set to Single PE mode, the PE# assigned to VF could be discrete. This patch restructures the patch to allocate discrete PE# for VFs when M64 BAR is set to Single PE mode. Signed-off-by: Wei Yang --- arch/powerpc/include/asm/pci-bridge.h |2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 69 + 2 files changed, 51 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 8aeba4c..72415c7 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -213,7 +213,7 @@ struct pci_dn { #ifdef CONFIG_PCI_IOV u16 vfs_expanded; /* number of VFs IOV BAR expanded */ u16 num_vfs;/* number of VFs enabled*/ - int offset; /* PE# for the first VF PE */ + int *offset;/* PE# for the first VF PE or array */ boolm64_single_mode;/* Use M64 BAR in Single Mode */ #define IODA_INVALID_M64(-1) int (*m64_map)[PCI_SRIOV_NUM_BARS]; >>> >>>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the >>>comments >>>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. Though >>>not >>>all of them will be used, not too much memory will be wasted. >>> >> >>Thanks for your comment. >> >>I have thought about change the name to make it more self explain. While >>another fact I want to take in is this field is also used to be reflect the >>shift offset when M64 BAR is used in the Shared Mode. So I maintain the name. >> >>How about use "enum", one maintain the name "offset", and another one rename >>to >>"pe_num_map". And use the meaningful name at proper place? >> So I suppose you agree with my naming proposal. > >Ok. I'm explaining it with more details. There are two cases: single vs shared >mode. 
When PHB M64 BARs run in single mode, you need an array to track the >allocated discrete PE#. The VF_index is the index to the array. When PHB M64 >BARs run in shared mode, you need continuous PE#. No array required for this >case. Instead, the starting PE# should be stored to somewhere, which can >be pdn->offset[0] simply. > >So when allocating memory for this array, you just simply allocate >(sizeof(*pdn->offset) >*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is >nobody >can enable (max_vf_num + 1) VFs. The max_vf_num is 15? > >With above way, the arrays for PE# and M64 BAR remapping needn't be allocated >when enabling SRIOV capability and releasing on disabling SRIOV capability. >Instead, those two arrays can be allocated during resource fixup time and >free'ed >when destroying the pdn. > My same point of view like previous, if the memory is not in the concern, how about define them static? And for the long term, we may support more VFs. Then at that moment, we need to restructure the code to meet it. So I suggest if we want to allocate it dynamically, we allocate the exact number of space. -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
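Gavin's suggestion above amounts to sizing the PE# map once, by the maximum VF count, so the same allocation serves both modes: single mode fills one discrete PE# per VF, shared mode only ever uses slot 0 as the starting PE#. A userspace sketch with hypothetical names (`pe_num_map`, `max_vf_num`, `IODA_INVALID_PE`), not the actual kernel code:

```c
#include <stdlib.h>

#define IODA_INVALID_PE (-1)

/* One allocation sized by the maximum VF count serves both modes;
 * since nobody can enable more than max_vf_num VFs, the unused tail
 * in shared mode wastes only a few bytes. */
static int *alloc_pe_num_map(unsigned int max_vf_num)
{
    int *map = malloc(sizeof(*map) * max_vf_num);
    unsigned int i;

    if (!map)
        return NULL;
    for (i = 0; i < max_vf_num; i++)
        map[i] = IODA_INVALID_PE;
    return map;
}
```

Allocating at resource-fixup time and freeing when the pdn is destroyed, as proposed, then removes the alloc/free churn from the SRIOV enable/disable paths.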
Re: [PATCH V2 5/6] powerpc/powernv: boundary the total vf bar size instead of the individual one
On Fri, Aug 07, 2015 at 11:23:54AM +1000, Gavin Shan wrote: >On Thu, Aug 06, 2015 at 10:03:04PM +0800, Wei Yang wrote: >>On Thu, Aug 06, 2015 at 03:28:51PM +1000, Gavin Shan wrote: >>>On Wed, Aug 05, 2015 at 09:25:02AM +0800, Wei Yang wrote: Each VF could have 6 BARs at most. When the total BAR size exceeds the gate, after expanding it will also exhaust the M64 Window. This patch limits the boundary by checking the total VF BAR size instead of the individual BAR. Signed-off-by: Wei Yang >>> >>>Ok. I didn't look at this when giving comments to last patch. It turns >>>you have the change in this patch. Please merge it with the previous >>>patch. >>> >> >>Hmm... I prefer to have them in two patches. One focus on the calculation of >>gate and the other focus on checking the total VF BAR size. This would help >>record the change. >> > >It's fine to me as well. I'll take close look on your next revision since >you have to refresh the whole series. Is that fine to you? > Fine and thanks for your comments. --- arch/powerpc/platforms/powernv/pci-ioda.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 31dcedc..4042303 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2702,7 +2702,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) struct pnv_phb *phb; struct resource *res; int i; - resource_size_t size, gate; + resource_size_t size, gate, total_vf_bar_sz; struct pci_dn *pdn; int mul, total_vfs; @@ -2729,6 +2729,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) * Window and limit the system flexibility. 
*/ gate = phb->ioda.m64_segsize >> 1; + total_vf_bar_sz = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = &pdev->resource[i + PCI_IOV_RESOURCES]; @@ -2741,13 +2742,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) return; } - size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); + total_vf_bar_sz += pci_iov_resource_size(pdev, + i + PCI_IOV_RESOURCES); /* bigger than or equal to gate */ - if (size >= gate) { - dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size " - "is bigger than %lld, roundup power2\n", -i, res, gate); + if (total_vf_bar_sz >= gate) { + dev_info(&pdev->dev, "PowerNV: VF BAR Total IOV size " + "is bigger than %lld, roundup power2\n", gate); mul = roundup_pow_of_two(total_vfs); pdn->m64_single_mode = true; break; -- 1.7.9.5 >> >>-- >>Richard Yang >>Help you, Help me -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
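The control-flow change in this patch — accumulate a running total of VF BAR sizes and trip the gate on the sum rather than on any individual BAR — can be sketched on its own. The function name and the userspace stand-in for the kernel's `roundup_pow_of_two()` are mine, not from the patch:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t resource_size_t;

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static unsigned long roundup_pow_of_two_ul(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Accumulate all VF BAR sizes; once the running total reaches the
 * gate (half an M64 segment in the patch), fall back to Single PE
 * mode, since expanding further would exhaust the M64 window. */
static bool needs_single_mode(const resource_size_t *bar_sz, int nbars,
                              resource_size_t gate)
{
    resource_size_t total = 0;
    int i;

    for (i = 0; i < nbars; i++) {
        total += bar_sz[i];
        if (total >= gate)
            return true;
    }
    return false;
}
```

This is why the patch also drops `i` and `res` from the `dev_info()`: the decision is no longer attributable to a single BAR.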
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On Fri, Aug 07, 2015 at 11:20:10AM +1000, Gavin Shan wrote: >On Thu, Aug 06, 2015 at 10:10:10PM +0800, Wei Yang wrote: >>On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote: >>>On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from M64 windwo, which means M64 BAR can't work on it. >>> >>>s/PHB_IODA2/PHB3 >>>s/windwo/window >>> This patch makes this explicit. Signed-off-by: Wei Yang >>> >>>The idea sounds right, but there is one question as below. >>> --- arch/powerpc/platforms/powernv/pci-ioda.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 5738d31..9b41dba 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - /* * The actual IOV BAR range is determined by the start address * and the actual size for num_vfs VFs BAR. 
This check is to @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); res2 = *res; res->start += size * offset; @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - for (j = 0; j < vf_groups; j++) { do { win = find_next_zero_bit(&phb->ioda.m64_bar_alloc, @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs) pdn = pci_get_pdn(pdev); if (phb->type == PNV_PHB_IODA2) { + if (!pdn->vfs_expanded) { + dev_info(&pdev->dev, "don't support this SRIOV device" + " with non M64 VF BAR\n"); + return -EBUSY; + } + >>> >>>It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily >>>unavailable. For this case, the VFs are permanently unavailable because of >>>running out of space to accomodate M64 and non-M64 VF BARs. >>> >>>The error message could be printed with dev_warn() and it would be precise >>>as below or something else you prefer: >>> >>> dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); >>> >> >>Thanks for the comment, will change accordingly. >> >>> /* Calculate available PE for required VFs */ mutex_lock(&phb->ioda.pe_alloc_mutex); pdn->offset = bitmap_find_next_zero_area( @@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) if (!res->flags || res->parent) continue; if (!pnv_pci_is_mem_pref_64(res->flags)) { - dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", + dev_warn(&pdev->dev, "Don't support SR-IOV with" + " non M64 VF BAR%d: %pR. 
\n", i, res); - continue; + return; } size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); @@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) res = &pdev->resource[i + PCI_IOV_RESOURCES]; if (!res->flags || res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) { - dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n", -i, res); - continue; - } >>> >>>When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled. >>>Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so, >>>I think it can be avoided. >>> >> >>Don't get your point. You mean to avoid this function? >> >>Or clear the IOV BAR when we found one of it is non-M64? >> > >I mean to clear all IOV BARs in case any more of them are IO
Re: [PATCH] powerpc/prom: Use DRCONF flags while processing detected LMBs
On Thursday 06 August 2015 06:35 PM, Anshuman Khandual wrote: > This patch just replaces hard coded values with existing Please drop "This patch just" and start with "Replace hard ..." https://www.kernel.org/doc/Documentation/SubmittingPatches Maddy > DRCONF flags while procesing detected LMBs from the device > tree. This does not change any functionality. > > Signed-off-by: Anshuman Khandual > --- > arch/powerpc/kernel/prom.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > index 8b888b1..70a8cab 100644 > --- a/arch/powerpc/kernel/prom.c > +++ b/arch/powerpc/kernel/prom.c > @@ -476,9 +476,10 @@ static int __init > early_init_dt_scan_drconf_memory(unsigned long node) > flags = of_read_number(&dm[3], 1); > /* skip DRC index, pad, assoc. list index, flags */ > dm += 4; > - /* skip this block if the reserved bit is set in flags (0x80) > -or if the block is not assigned to this partition (0x8) */ > - if ((flags & 0x80) || !(flags & 0x8)) > + /* skip this block if the reserved bit is set in flags > +or if the block is not assigned to this partition */ > + if ((flags & DRCONF_MEM_RESERVED) || > + !(flags & DRCONF_MEM_ASSIGNED)) > continue; > size = memblock_size; > rngs = 1;
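The substitution in this patch is purely mechanical: the flag values match the magic numbers the original comment documented (`0x80` for reserved, `0x8` for assigned), so the skip predicate is unchanged, just readable. A standalone check of that equivalence:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values match the magic numbers the original comment documented;
 * the kernel defines these in asm/prom.h. */
#define DRCONF_MEM_ASSIGNED 0x00000008
#define DRCONF_MEM_RESERVED 0x00000080

/* An LMB is skipped when it is reserved, or when it is not assigned
 * to this partition -- the same predicate as the old
 * "(flags & 0x80) || !(flags & 0x8)", spelled with named flags. */
static bool skip_lmb(uint32_t flags)
{
    return (flags & DRCONF_MEM_RESERVED) || !(flags & DRCONF_MEM_ASSIGNED);
}
```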
Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
On Thu, Aug 06, 2015 at 08:04:58PM +1000, Alexey Kardashevskiy wrote: >On 08/05/2015 11:25 AM, Wei Yang wrote: >>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64 >>BAR in Single PE mode to cover the number of VFs required to be enabled. >>By doing so, several VFs would be in one VF Group and leads to interference >>between VFs in the same group. >> >>This patch changes the design by using one M64 BAR in Single PE mode for >>one VF BAR. This gives absolute isolation for VFs. >> >>Signed-off-by: Wei Yang >>--- >> arch/powerpc/include/asm/pci-bridge.h |5 +- >> arch/powerpc/platforms/powernv/pci-ioda.c | 180 >> - >> 2 files changed, 76 insertions(+), 109 deletions(-) >> >>diff --git a/arch/powerpc/include/asm/pci-bridge.h >>b/arch/powerpc/include/asm/pci-bridge.h >>index 712add5..8aeba4c 100644 >>--- a/arch/powerpc/include/asm/pci-bridge.h >>+++ b/arch/powerpc/include/asm/pci-bridge.h >>@@ -214,10 +214,9 @@ struct pci_dn { >> u16 vfs_expanded; /* number of VFs IOV BAR expanded */ >> u16 num_vfs;/* number of VFs enabled*/ >> int offset; /* PE# for the first VF PE */ >>-#define M64_PER_IOV 4 >>- int m64_per_iov; >>+ boolm64_single_mode;/* Use M64 BAR in Single Mode */ >> #define IODA_INVALID_M64(-1) >>- int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; >>+ int (*m64_map)[PCI_SRIOV_NUM_BARS]; >> #endif /* CONFIG_PCI_IOV */ >> #endif >> struct list_head child_list; >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>b/arch/powerpc/platforms/powernv/pci-ioda.c >>index 7192e62..f5d110c 100644 >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void) >> } >> >> #ifdef CONFIG_PCI_IOV >>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev) >>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs) >> { >> struct pci_bus*bus; >> struct pci_controller *hose; >> struct pnv_phb*phb; >> struct pci_dn *pdn; >> inti, j; >>+ 
intm64_bars; >> >> bus = pdev->bus; >> hose = pci_bus_to_host(bus); >> phb = hose->private_data; >> pdn = pci_get_pdn(pdev); >> >>+ if (pdn->m64_single_mode) >>+ m64_bars = num_vfs; >>+ else >>+ m64_bars = 1; >>+ >> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) >>- for (j = 0; j < M64_PER_IOV; j++) { >>- if (pdn->m64_wins[i][j] == IODA_INVALID_M64) >>+ for (j = 0; j < m64_bars; j++) { >>+ if (pdn->m64_map[j][i] == IODA_INVALID_M64) >> continue; >> opal_pci_phb_mmio_enable(phb->opal_id, >>- OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0); >>- clear_bit(pdn->m64_wins[i][j], >>&phb->ioda.m64_bar_alloc); >>- pdn->m64_wins[i][j] = IODA_INVALID_M64; >>+ OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0); >>+ clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc); >>+ pdn->m64_map[j][i] = IODA_INVALID_M64; >> } >> >>+ kfree(pdn->m64_map); >> return 0; >> } >> >>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, >>u16 num_vfs) >> inttotal_vfs; >> resource_size_tsize, start; >> intpe_num; >>- intvf_groups; >>- intvf_per_group; >>+ intm64_bars; >> >> bus = pdev->bus; >> hose = pci_bus_to_host(bus); >>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev >>*pdev, u16 num_vfs) >> pdn = pci_get_pdn(pdev); >> total_vfs = pci_sriov_get_totalvfs(pdev); >> >>- /* Initialize the m64_wins to IODA_INVALID_M64 */ >>- for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) >>- for (j = 0; j < M64_PER_IOV; j++) >>- pdn->m64_wins[i][j] = IODA_INVALID_M64; >>+ if (pdn->m64_single_mode) > > >This is a physical function's @pdn, right? Yes > > >>+ m64_bars = num_vfs; >>+ else >>+ m64_bars = 1; >>+ >>+ pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL); > > >Assume we have SRIOV device with 16VF. >So it was m64_wins[6][4], now it is (roughly speaking) m64_map[6][16] >(for a single PE mode) or m64_map[6][1]. I believe m64_bars cannot be >bigger than 16 on PHB3, right? Is this checked anywhere (does it have >to)? 
In pnv_pci_vf_assign_m64(), we need to find_next_zero_bit() and check the return value. If exceed m64_bar_idx, means fail. > >This m64_wins -> m64_map change - is was not a map (what was it?), >and it is, is not it? Hmm... G
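The `int (*m64_map)[PCI_SRIOV_NUM_BARS]` type being debated in this thread allocates `m64_bars` rows of six entries in one block, and `sizeof(*map)` is the size of a whole row, which is what makes the single `kmalloc(sizeof(*pdn->m64_map) * m64_bars, ...)` in the patch work. A userspace sketch of the allocation and the `m64_map[j][i]` indexing (j = M64 BAR group, i = VF BAR):

```c
#include <stdlib.h>

#define PCI_SRIOV_NUM_BARS 6
#define IODA_INVALID_M64   (-1)

/* One row per M64 BAR group: num_vfs rows in Single PE mode, one row
 * in shared mode. sizeof(*map) is one full 6-entry row, so a single
 * malloc covers the whole 2-D layout. */
static int (*alloc_m64_map(int m64_bars))[PCI_SRIOV_NUM_BARS]
{
    int (*map)[PCI_SRIOV_NUM_BARS] = malloc(sizeof(*map) * m64_bars);
    int i, j;

    if (!map)
        return NULL;
    for (j = 0; j < m64_bars; j++)
        for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
            map[j][i] = IODA_INVALID_M64;   /* matches the patch's init */
    return map;
}
```

Unlike the old fixed `m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]` member, the row count here is chosen at runtime, which is what lets Single PE mode scale to one M64 BAR per VF BAR.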
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On Thu, Aug 06, 2015 at 10:10:10PM +0800, Wei Yang wrote: >On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote: >>On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: >>>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If >>>a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from >>>M64 windwo, which means M64 BAR can't work on it. >>> >> >>s/PHB_IODA2/PHB3 >>s/windwo/window >> >>>This patch makes this explicit. >>> >>>Signed-off-by: Wei Yang >> >>The idea sounds right, but there is one question as below. >> >>>--- >>> arch/powerpc/platforms/powernv/pci-ioda.c | 25 + >>> 1 file changed, 9 insertions(+), 16 deletions(-) >>> >>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>>b/arch/powerpc/platforms/powernv/pci-ioda.c >>>index 5738d31..9b41dba 100644 >>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >>>*dev, int offset) >>> if (!res->flags || !res->parent) >>> continue; >>> >>>-if (!pnv_pci_is_mem_pref_64(res->flags)) >>>-continue; >>>- >>> /* >>> * The actual IOV BAR range is determined by the start address >>> * and the actual size for num_vfs VFs BAR. 
This check is to >>>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >>>*dev, int offset) >>> if (!res->flags || !res->parent) >>> continue; >>> >>>-if (!pnv_pci_is_mem_pref_64(res->flags)) >>>-continue; >>>- >>> size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); >>> res2 = *res; >>> res->start += size * offset; >>>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, >>>u16 num_vfs) >>> if (!res->flags || !res->parent) >>> continue; >>> >>>-if (!pnv_pci_is_mem_pref_64(res->flags)) >>>-continue; >>>- >>> for (j = 0; j < vf_groups; j++) { >>> do { >>> win = >>> find_next_zero_bit(&phb->ioda.m64_bar_alloc, >>>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 >>>num_vfs) >>> pdn = pci_get_pdn(pdev); >>> >>> if (phb->type == PNV_PHB_IODA2) { >>>+if (!pdn->vfs_expanded) { >>>+dev_info(&pdev->dev, "don't support this SRIOV device" >>>+" with non M64 VF BAR\n"); >>>+return -EBUSY; >>>+} >>>+ >> >>It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily >>unavailable. For this case, the VFs are permanently unavailable because of >>running out of space to accomodate M64 and non-M64 VF BARs. >> >>The error message could be printed with dev_warn() and it would be precise >>as below or something else you prefer: >> >> dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); >> > >Thanks for the comment, will change accordingly. > >> >>> /* Calculate available PE for required VFs */ >>> mutex_lock(&phb->ioda.pe_alloc_mutex); >>> pdn->offset = bitmap_find_next_zero_area( >>>@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>>pci_dev *pdev) >>> if (!res->flags || res->parent) >>> continue; >>> if (!pnv_pci_is_mem_pref_64(res->flags)) { >>>-dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", >>>+dev_warn(&pdev->dev, "Don't support SR-IOV with" >>>+" non M64 VF BAR%d: %pR. 
\n", >>> i, res); >>>-continue; >>>+return; >>> } >>> >>> size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); >>>@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>>pci_dev *pdev) >>> res = &pdev->resource[i + PCI_IOV_RESOURCES]; >>> if (!res->flags || res->parent) >>> continue; >>>-if (!pnv_pci_is_mem_pref_64(res->flags)) { >>>-dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: >>>%pR\n", >>>- i, res); >>>-continue; >>>-} >> >>When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled. >>Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so, >>I think it can be avoided. >> > >Don't get your point. You mean to avoid this function? > >Or clear the IOV BAR when we found one of it is non-M64? > I mean to clear all IOV BARs in case any more of them are IO or M32. In this case, the SRIOV capability won't be enabled. Otherwise, the resources for all IOV BARs are assigned and allocated by
Re: [PATCH V2 5/6] powerpc/powernv: boundary the total vf bar size instead of the individual one
On Thu, Aug 06, 2015 at 10:03:04PM +0800, Wei Yang wrote: >On Thu, Aug 06, 2015 at 03:28:51PM +1000, Gavin Shan wrote: >>On Wed, Aug 05, 2015 at 09:25:02AM +0800, Wei Yang wrote: >>>Each VF could have 6 BARs at most. When the total BAR size exceeds the >>>gate, after expanding it will also exhaust the M64 Window. >>> >>>This patch limits the boundary by checking the total VF BAR size instead of >>>the individual BAR. >>> >>>Signed-off-by: Wei Yang >> >>Ok. I didn't look at this when giving comments to last patch. It turns >>you have the change in this patch. Please merge it with the previous >>patch. >> > >Hmm... I prefer to have them in two patches. One focus on the calculation of >gate and the other focus on checking the total VF BAR size. This would help >record the change. > It's fine to me as well. I'll take close look on your next revision since you have to refresh the whole series. Is that fine to you? >>>--- >>> arch/powerpc/platforms/powernv/pci-ioda.c | 13 +++-- >>> 1 file changed, 7 insertions(+), 6 deletions(-) >>> >>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>>b/arch/powerpc/platforms/powernv/pci-ioda.c >>>index 31dcedc..4042303 100644 >>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>@@ -2702,7 +2702,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>>pci_dev *pdev) >>> struct pnv_phb *phb; >>> struct resource *res; >>> int i; >>>-resource_size_t size, gate; >>>+resource_size_t size, gate, total_vf_bar_sz; >>> struct pci_dn *pdn; >>> int mul, total_vfs; >>> >>>@@ -2729,6 +2729,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>>pci_dev *pdev) >>> * Window and limit the system flexibility. 
>>> */ >>> gate = phb->ioda.m64_segsize >> 1; >>>+total_vf_bar_sz = 0; >>> >>> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { >>> res = &pdev->resource[i + PCI_IOV_RESOURCES]; >>>@@ -2741,13 +2742,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>>pci_dev *pdev) >>> return; >>> } >>> >>>-size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); >>>+total_vf_bar_sz += pci_iov_resource_size(pdev, >>>+i + PCI_IOV_RESOURCES); >>> >>> /* bigger than or equal to gate */ >>>-if (size >= gate) { >>>-dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size " >>>-"is bigger than %lld, roundup power2\n", >>>- i, res, gate); >>>+if (total_vf_bar_sz >= gate) { >>>+dev_info(&pdev->dev, "PowerNV: VF BAR Total IOV size " >>>+"is bigger than %lld, roundup power2\n", gate); >>> mul = roundup_pow_of_two(total_vfs); >>> pdn->m64_single_mode = true; >>> break; >>>-- >>>1.7.9.5 >>> > >-- >Richard Yang >Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
On Thu, Aug 06, 2015 at 08:07:01PM +1000, Gavin Shan wrote: >On Thu, Aug 06, 2015 at 05:36:02PM +0800, Wei Yang wrote: >>On Thu, Aug 06, 2015 at 03:20:25PM +1000, Gavin Shan wrote: >>>On Wed, Aug 05, 2015 at 09:25:00AM +0800, Wei Yang wrote: In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64 BAR in Single PE mode to cover the number of VFs required to be enabled. By doing so, several VFs would be in one VF Group and leads to interference between VFs in the same group. This patch changes the design by using one M64 BAR in Single PE mode for one VF BAR. This gives absolute isolation for VFs. Signed-off-by: Wei Yang --- arch/powerpc/include/asm/pci-bridge.h |5 +- arch/powerpc/platforms/powernv/pci-ioda.c | 180 - 2 files changed, 76 insertions(+), 109 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 712add5..8aeba4c 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -214,10 +214,9 @@ struct pci_dn { u16 vfs_expanded; /* number of VFs IOV BAR expanded */ u16 num_vfs;/* number of VFs enabled*/ int offset; /* PE# for the first VF PE */ -#define M64_PER_IOV 4 - int m64_per_iov; + boolm64_single_mode;/* Use M64 BAR in Single Mode */ #define IODA_INVALID_M64(-1) - int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; + int (*m64_map)[PCI_SRIOV_NUM_BARS]; >>> >>>It can be explicit? 
For example: >>> >>> int *m64_map; >>> >>> /* Initialization */ >>> size_t size = sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * >>> num_of_max_VFs; >>> pdn->m64_map = kmalloc(size, GFP_KERNEL); >>> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) >>> for (j = 0; j < num_of_max_VFs; j++) >>> pdn->m64_map[i * PCI_SRIOV_NUM_BARS + j] = >>> PNV_INVALID_M64; >>> >>> /* Destroy */ >>> int step = 1; >>> >>> if (!pdn->m64_single_mode) >>> step = phb->ioda.total_pe; >>> for (i = 0; i < PCI_SRIOV_NUM_BARS * num_of_max_VFs; i += step) >>> if (pdn->m64_map[i] == PNV_INVALID_M64) >>> continue; >>> >>> /* Unmap the window */ >>> >> >>The m64_map is a pointer to an array with 6 elements, which represents the 6 >>M64 BAR index for the 6 VF BARs. >> >>When we use Shared Mode, one array is allocated. The six elements >>represents the six M64 BAR(at most) used to map the whole IOV BAR. >> >>When we use Single Mode, num_vfs array is allocate. Each array represents >>the map between one VF's BAR and M64 BAR index. >> >>During the map and un-map, M64 BAR is assigned one by one in VF BAR's order. >>So I think the code is explicit. >> >>In your code, you allocate a big one dimension array to hold the M64 BAR >>index. It works, while I don't think this is more explicit than original code. >> > >When M64 is in Single Mode, array with (num_vfs * 6) entries is allocated >because every VF BAR (6 at most) will have one corresponding PHB M64 BAR. >Anything I missed? > >The point in my code is you needn't worry about the mode (single vs shared) >As I said, not too much memory wasted. However, it's up to you. > If we don't want to save some memory, how about just define them static instead of dynamically allocate? >I'm not fan of "int (*m64_map)[PCI_SRIOV_NUM_BARS]". 
Instead, you can replace >it with "int *m64_map" and calculate its size using following formula: > > sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS; > > sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * num_vfs; > >>-- >>Richard Yang >>Help you, Help me -- Richard Yang Help you, Help me
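The two proposals in this thread — the pointer-to-array form in the patch and Gavin's flat `int *m64_map` with manual index arithmetic — describe the same row-major memory, so `flat[j * NBARS + i]` and `twod[j][i]` always address the same element. A small check of that equivalence (names are mine):

```c
#include <stdbool.h>

#define NBARS 6

/* A 2-D int array and a flat pointer over the same storage must agree
 * element-for-element under row-major index arithmetic. */
static bool layouts_agree(void)
{
    int twod[4][NBARS];
    int *flat = &twod[0][0];
    int j, i;

    for (j = 0; j < 4; j++)
        for (i = 0; i < NBARS; i++)
            twod[j][i] = j * 100 + i;

    for (j = 0; j < 4; j++)
        for (i = 0; i < NBARS; i++)
            if (flat[j * NBARS + i] != twod[j][i])
                return false;
    return true;
}
```

So the disagreement is purely about which spelling is clearer, not about layout or memory cost; the compiler generates the same address computation either way.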
Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode
On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote: >On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote: >>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote: >>>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be >>>discrete. >>> >>>This patch restructures the patch to allocate discrete PE# for VFs when M64 >>>BAR is set to Single PE mode. >>> >>>Signed-off-by: Wei Yang >>>--- >>> arch/powerpc/include/asm/pci-bridge.h |2 +- >>> arch/powerpc/platforms/powernv/pci-ioda.c | 69 >>> + >>> 2 files changed, 51 insertions(+), 20 deletions(-) >>> >>>diff --git a/arch/powerpc/include/asm/pci-bridge.h >>>b/arch/powerpc/include/asm/pci-bridge.h >>>index 8aeba4c..72415c7 100644 >>>--- a/arch/powerpc/include/asm/pci-bridge.h >>>+++ b/arch/powerpc/include/asm/pci-bridge.h >>>@@ -213,7 +213,7 @@ struct pci_dn { >>> #ifdef CONFIG_PCI_IOV >>> u16 vfs_expanded; /* number of VFs IOV BAR expanded */ >>> u16 num_vfs;/* number of VFs enabled*/ >>>-int offset; /* PE# for the first VF PE */ >>>+int *offset;/* PE# for the first VF PE or array */ >>> boolm64_single_mode;/* Use M64 BAR in Single Mode */ >>> #define IODA_INVALID_M64(-1) >>> int (*m64_map)[PCI_SRIOV_NUM_BARS]; >> >>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the >>comments >>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. Though >>not >>all of them will be used, not too much memory will be wasted. >> > >Thanks for your comment. > >I have thought about change the name to make it more self explain. While >another fact I want to take in is this field is also used to be reflect the >shift offset when M64 BAR is used in the Shared Mode. So I maintain the name. > >How about use "enum", one maintain the name "offset", and another one rename to >"pe_num_map". And use the meaningful name at proper place? > Ok. I'm explaining it with more details. There are two cases: single vs shared mode. 
When PHB M64 BARs run in single mode, you need an array to track the allocated discrete PE#. The VF_index is the index to the array. When PHB M64 BARs run in shared mode, you need continuous PE#. No array required for this case. Instead, the starting PE# should be stored to somewhere, which can be pdn->offset[0] simply. So when allocating memory for this array, you just simply allocate (sizeof(*pdn->offset) *max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is nobody can enable (max_vf_num + 1) VFs. With above way, the arrays for PE# and M64 BAR remapping needn't be allocated when enabling SRIOV capability and releasing on disabling SRIOV capability. Instead, those two arrays can be allocated during resource fixup time and free'ed when destroying the pdn. >>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>>b/arch/powerpc/platforms/powernv/pci-ioda.c >>>index 4042303..9953829 100644 >>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, >>>u16 num_vfs) >>> >>> /* Map the M64 here */ >>> if (pdn->m64_single_mode) { >>>-pe_num = pdn->offset + j; >>>+pe_num = pdn->offset[j]; >>> rc = opal_pci_map_pe_mmio_window(phb->opal_id, >>> pe_num, OPAL_M64_WINDOW_TYPE, >>> pdn->m64_map[j][i], 0); >>>@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev) >>> struct pnv_phb*phb; >>> struct pci_dn *pdn; >>> struct pci_sriov *iov; >>>-u16 num_vfs; >>>+u16num_vfs, i; >>> >>> bus = pdev->bus; >>> hose = pci_bus_to_host(bus); >>>@@ -1361,14 +1361,18 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev) >>> >>> if (phb->type == PNV_PHB_IODA2) { >>> if (!pdn->m64_single_mode) >>>-pnv_pci_vf_resource_shift(pdev, -pdn->offset); >>>+pnv_pci_vf_resource_shift(pdev, -*pdn->offset); >>> >>> /* Release M64 windows */ >>> pnv_pci_vf_release_m64(pdev, num_vfs); >>> >>> /* Release PE numbers */ >>>-bitmap_clear(phb->ioda.pe_alloc, 
pdn->offset, num_vfs); >>>-pdn->offset = 0; >>>+if (pdn->m64_single_mode) { >>>+for (i = 0; i < num_vfs; i++) >>>+pnv_ioda_free_pe(phb, pdn->offset[i]); >>>+} else >>>+bitmap_clear(phb->ioda.pe_alloc, *pdn->offset, num_vfs); >>>+kfree(pdn->offset); >> >>Can pnv_ioda_free_pe() be reused to release PE ? > >You mean use it to similar thing in pnv_ioda_deconfigure_pe()? > Forget it please
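The disable path in the diff above splits on mode: Single PE mode frees each discrete PE# recorded in the map one by one, while shared mode clears one contiguous range starting at the stored offset. A userspace sketch with stand-ins (`pe_alloc`, `free_pe`, `pe_num_map` are mine; the kernel uses `pnv_ioda_free_pe()` and `bitmap_clear()`):

```c
#include <stdbool.h>

#define MAX_PE 64

/* Minimal stand-in for the PHB's PE allocation bitmap. */
static bool pe_alloc[MAX_PE];

static void free_pe(int pe)
{
    pe_alloc[pe] = false;
}

/* Single mode: PE#s are discrete, so free each entry of the map.
 * Shared mode: PE#s are contiguous, so only map[0] (the starting
 * PE#) matters and the whole range is released from there. */
static void release_vf_pes(const int *pe_num_map, int num_vfs, bool single_mode)
{
    int i;

    if (single_mode) {
        for (i = 0; i < num_vfs; i++)
            free_pe(pe_num_map[i]);
    } else {
        for (i = 0; i < num_vfs; i++)
            free_pe(pe_num_map[0] + i);
    }
}
```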
Re: [PATCH v6 41/42] drivers/of: Export OF changeset functions
On Thu, Aug 06, 2015 at 08:48:10AM -0500, Rob Herring wrote: >On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan wrote: Thanks, Rob. All your comments will be covered in next revision. Thanks, Gavin >> The PowerNV PCI hotplug driver is going to use the OF changeset >> to manage the changed device sub-tree, which requires those OF >> changeset functions are exported. >> >> Signed-off-by: Gavin Shan >> --- >> drivers/of/dynamic.c | 65 >> --- >> drivers/of/overlay.c | 8 +++ >> drivers/of/unittest.c | 4 ++-- >> include/linux/of.h| 2 ++ >> 4 files changed, 54 insertions(+), 25 deletions(-) >> >> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c >> index 53826b8..af65b5b 100644 >> --- a/drivers/of/dynamic.c >> +++ b/drivers/of/dynamic.c >> @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs) >> memset(ocs, 0, sizeof(*ocs)); >> INIT_LIST_HEAD(&ocs->entries); >> } >> +EXPORT_SYMBOL(of_changeset_init); > >We probably want these to be the _GPL variant. > >> >> /** >> * of_changeset_destroy - Destroy a changeset >> @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs) >> list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) >> __of_changeset_entry_destroy(ce); >> } >> +EXPORT_SYMBOL(of_changeset_destroy); >> >> -/** >> - * of_changeset_apply - Applies a changeset >> - * >> - * @ocs: changeset pointer >> - * >> - * Applies a changeset to the live tree. >> - * Any side-effects of live tree state changes are applied here on >> - * sucess, like creation/destruction of devices and side-effects >> - * like creation of sysfs properties and directories. >> - * Returns 0 on success, a negative error value in case of an error. >> - * On error the partially applied effects are reverted. 
>> - */ >> -int of_changeset_apply(struct of_changeset *ocs) >> +int __of_changeset_apply(struct of_changeset *ocs) >> { >> struct of_changeset_entry *ce; >> int ret; >> @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs) >> } >> >> /** >> - * of_changeset_revert - Reverts an applied changeset >> + * of_changeset_apply - Applies a changeset >> * >> * @ocs: changeset pointer >> * >> - * Reverts a changeset returning the state of the tree to what it >> - * was before the application. >> - * Any side-effects like creation/destruction of devices and >> - * removal of sysfs properties and directories are applied. >> + * Applies a changeset to the live tree. >> + * Any side-effects of live tree state changes are applied here on >> + * sucess, like creation/destruction of devices and side-effects > >s/sucess/success/ > >> + * like creation of sysfs properties and directories. >> * Returns 0 on success, a negative error value in case of an error. >> + * On error the partially applied effects are reverted. >> */ >> -int of_changeset_revert(struct of_changeset *ocs) >> +int of_changeset_apply(struct of_changeset *ocs) >> +{ >> + int ret; >> + >> + mutex_lock(&of_mutex); >> + ret = __of_changeset_apply(ocs); >> + mutex_unlock(&of_mutex); >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(of_changeset_apply); >> + >> +int __of_changeset_revert(struct of_changeset *ocs) >> { >> struct of_changeset_entry *ce; >> int ret; >> @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs) >> } >> >> /** >> + * of_changeset_revert - Reverts an applied changeset >> + * >> + * @ocs: changeset pointer >> + * >> + * Reverts a changeset returning the state of the tree to what it >> + * was before the application. >> + * Any side-effects like creation/destruction of devices and >> + * removal of sysfs properties and directories are applied. >> + * Returns 0 on success, a negative error value in case of an error. 
>> + */ >> +int of_changeset_revert(struct of_changeset *ocs) >> +{ >> + int ret; >> + >> + mutex_lock(&of_mutex); >> + ret = __of_changeset_revert(ocs); >> + mutex_unlock(&of_mutex); >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(of_changeset_revert); >> + >> +/** >> * of_changeset_action - Perform a changeset action >> * >> * @ocs: changeset pointer >> @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, >> unsigned long action, >> list_add_tail(&ce->node, &ocs->entries); >> return 0; >> } >> +EXPORT_SYMBOL(of_changeset_action); >> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >> index 24e025f..804ea33 100644 >> --- a/drivers/of/overlay.c >> +++ b/drivers/of/overlay.c >> @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree) >> } >> >> /* apply the changeset */ >> - err = of_changeset_apply(&ov->cset); >> + err = __of_changeset_apply(&ov->cset); >> if (err) { >> - pr_err("%s: of_changeset_apply() failed for tree@%s\n", >> +
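The restructuring above follows a common kernel pattern: each entry point is split into a lock-free `__helper` plus a thin exported wrapper that takes `of_mutex`. A minimal userspace sketch of that pattern, using pthreads; the names and the `tree_applied` flag are illustrative, not the OF code itself:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t of_mutex = PTHREAD_MUTEX_INITIALIZER;
static int tree_applied;	/* stand-in for live-tree state */

/* __ variant: does the work, caller must hold of_mutex */
static int __changeset_apply(void)
{
	tree_applied = 1;
	return 0;
}

/* exported variant: wraps the helper in the lock */
static int changeset_apply(void)
{
	int ret;

	pthread_mutex_lock(&of_mutex);
	ret = __changeset_apply();
	pthread_mutex_unlock(&of_mutex);

	return ret;
}

static int __changeset_revert(void)
{
	tree_applied = 0;
	return 0;
}

static int changeset_revert(void)
{
	int ret;

	pthread_mutex_lock(&of_mutex);
	ret = __changeset_revert();
	pthread_mutex_unlock(&of_mutex);

	return ret;
}
```

Internal callers that already hold the lock, such as `of_overlay_create()` in the hunk above, switch to the `__` variants to avoid self-deadlock.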
Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource
On Thu, Aug 06, 2015 at 09:49:02PM +0800, Wei Yang wrote: >On Thu, Aug 06, 2015 at 02:51:40PM +1000, Gavin Shan wrote: >>On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote: >>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV >>>BAR. No matter whether the IOV BAR is truncated or not, the total size >>>could be calculated by (vfs_expanded * VF size). >>> >> >>s/VF size/VF BAR size >> >>I think the changelog would be more explicit: >> >>The alignment of IOV BAR on PowerNV platform is the total size of the >>IOV BAR, no matter whether the IOV BAR is extended with number of max >>VFs or number of max PE number (256). The alignment can be calculated > >number of max VFs is not correct. This should be >roundup_pow_of_two(total_vfs). > >The others look good to me. > Yes, you're correct. >>by (vfs_expanded * VF_BAR_size). >> >>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the >>>first case. >>> >>>Signed-off-by: Wei Yang >> >>Reviewed-by: Gavin Shan >> >>>--- >>> arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- >>> 1 file changed, 9 insertions(+), 5 deletions(-) >>> >>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>>b/arch/powerpc/platforms/powernv/pci-ioda.c >>>index 9b41dba..7192e62 100644 >>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>@@ -2987,12 +2987,16 @@ static resource_size_t >>>pnv_pci_iov_resource_alignment(struct pci_dev *pdev, >>> int resno) >>> { >>> struct pci_dn *pdn = pci_get_pdn(pdev); >>>-resource_size_t align, iov_align; >>>- >>>-iov_align = resource_size(&pdev->resource[resno]); >>>-if (iov_align) >>>-return iov_align; >>>+resource_size_t align; >>> >>>+/* >>>+ * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the >>>+ * SR-IOV. While from hardware perspective, the range mapped by M64 >>>+ * BAR should be size aligned. 
>>>+ * >>>+ * This function return the total IOV BAR size if expanded or just the >>>+ * individual size if not. >>>+ */ >>> align = pci_iov_resource_size(pdev, resno); >>> if (pdn->vfs_expanded) >>> return pdn->vfs_expanded * align; >>>-- >>>1.7.9.5 >>> > >-- >Richard Yang >Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
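The rule the new comment describes can be modeled in a few lines of userspace C. This is a sketch of the semantics only, not the kernel function: the alignment is the whole expanded IOV BAR size when the BAR was expanded, otherwise just the per-VF BAR size.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;

/* Model of pnv_pci_iov_resource_alignment() after the patch:
 * align = per-VF BAR size; total = vfs_expanded * align. */
static resource_size_t iov_resource_alignment(resource_size_t vf_bar_size,
					      uint16_t vfs_expanded)
{
	if (vfs_expanded)
		return (resource_size_t)vfs_expanded * vf_bar_size;
	return vf_bar_size;	/* not expanded: individual size */
}
```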
Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource
On Thu, Aug 06, 2015 at 08:15:55PM +1000, Alexey Kardashevskiy wrote: >On 08/06/2015 07:41 PM, Wei Yang wrote: >>On Thu, Aug 06, 2015 at 07:00:00PM +1000, Alexey Kardashevskiy wrote: >>>On 08/06/2015 02:51 PM, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote: >The alignment of IOV BAR on PowerNV platform is the total size of the IOV >BAR. No matter whether the IOV BAR is truncated or not, the total size >could be calculated by (vfs_expanded * VF size). > s/VF size/VF BAR size I think the changelog would be more explicit: The alignment of IOV BAR on PowerNV platform is the total size of the IOV BAR, no matter whether the IOV BAR is extended with number of max VFs or number of max PE number (256). The alignment can be calculated by (vfs_expanded * VF_BAR_size). >>> >>> >>> >>>Is that really a PowerNV-specific requirement, or is it valid for >>>every platform (I suspect this is the case here)? >>> >> >>Currently, it is PowerNV-specific. > > >How is x86 different on this matter? >Why would we need this extra alignment, not just VF's BAR alignment? > > The difference lies in the "PE" isolation on Power. From the MMIO perspective, we use M32 BAR and M64 BAR to map a MMIO range to a PE#. M32 is controlled with one M32 BAR, while M64 has 16 BARs. Now let's go back to the pure PCI world. Every PCI device has BARs, which are required to be size aligned. For example, BAR#0 is 1MB, which means it is 1MB aligned. Then let's look at the SRIOV case. The PF's IOV BAR (6 BARs at most) contains the start address for total_vfs number of VF BARs. For example, VF BAR#0 size is 1MB, and suppose PF's IOV BAR#0 is assigned to 0 and total_vfs is 8. The IOV BAR is mapped to a range [0 - (8MB -1)]. When the PCI core allocates the IOV BAR, it just makes sure the IOV BAR is 1MB aligned, since by doing so each VF BAR is size aligned. That is what x86 does, and maybe other platforms. I believe you understand this part. Now back to our platform, we want to use M64 BAR to map the IOV BAR. 
Coming back to the example above: the IOV BAR#0 is 8MB in size. If we still let the PCI core allocate it with 1MB alignment, our M64 BAR can't work, since the M64 BAR itself should be size aligned (in this case to 8MB), the same concept as a PCI BAR. >>> >>>Also, what is the exact meaning of "expanded" in @vfs_expanded? It is >>>either 255 (if individual VF BARs are <= 64MB) or >>>roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What >>>is expanded here? >>> >> >>PF's IOV BAR original size is (VF BAR size * total_vfs). >> >>After expanding, the IOV BAR size is (VF BAR size * 256) or (VF BAR size * >>roundup_pow_of_two(total_vfs)). > > >Ufff, got it now. I'd store just an expanded IOV BAR size (not some >magic VFs number) because this is what it actually is: >pdn->vfs_expanded * align > This would change the idea, since a VF could still have 6 BARs. In your proposal, we would need 6 variables to store the expanded size for each. > >>> >This patch simplifies the pnv_pci_iov_resource_alignment() by removing the >first case. > >Signed-off-by: Wei Yang Reviewed-by: Gavin Shan >--- >arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- >1 file changed, 9 insertions(+), 5 deletions(-) > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >b/arch/powerpc/platforms/powernv/pci-ioda.c >index 9b41dba..7192e62 100644 >--- a/arch/powerpc/platforms/powernv/pci-ioda.c >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >@@ -2987,12 +2987,16 @@ static resource_size_t >pnv_pci_iov_resource_alignment(struct pci_dev *pdev, > int resno) >{ > struct pci_dn *pdn = pci_get_pdn(pdev); >- resource_size_t align, iov_align; >- >- iov_align = resource_size(&pdev->resource[resno]); >- if (iov_align) >- return iov_align; >+ resource_size_t align; > >+ /* >+ * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the >+ * SR-IOV. While from hardware perspective, the range mapped by M64 >+ * BAR should be size aligned. 
>+ * >+ * This function return the total IOV BAR size if expanded or just the >+ * individual size if not. >+ */ > align = pci_iov_resource_size(pdev, resno); > if (pdn->vfs_expanded) > return pdn->vfs_expanded * align; >-- >1.7.9.5 > >>> >>> >>>-- >>>Alexey >> > > >-- >Alexey -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
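Wei's 1MB/8-VF example can be checked numerically. Under the stated assumptions (VF BAR#0 of 1MB, total_vfs = 8, IOV BAR assigned at 0), the IOV BAR spans 8MB, and an M64 BAR mapping it must be aligned to that full 8MB, not merely the 1MB the generic PCI core would use. A small sketch of the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Total span of one PF IOV BAR resource */
static uint64_t iov_bar_size(uint64_t vf_bar_size, unsigned int total_vfs)
{
	return vf_bar_size * total_vfs;
}

/* An M64 BAR, like any PCI BAR, must be aligned to its own size */
static int m64_alignment_ok(uint64_t start, uint64_t size)
{
	return (start % size) == 0;
}
```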
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On 08/07/2015 12:13 AM, Wei Yang wrote: On Thu, Aug 06, 2015 at 05:47:42PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 04:57 PM, Gavin Shan wrote: On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:35 PM, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from M64 windwo, which means M64 BAR can't work on it. The proper text would be something like this: === SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we cannot enable the device. === s/PHB_IODA2/PHB3 No, it is IODA2. OPEL does PHB3-specific bits, the host kernel just uses OPAL. Ok. s/windwo/window This patch makes this explicit. Signed-off-by: Wei Yang The idea sounds right, but there is one question as below. --- arch/powerpc/platforms/powernv/pci-ioda.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 5738d31..9b41dba 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - /* * The actual IOV BAR range is determined by the start address * and the actual size for num_vfs VFs BAR. 
This check is to @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); res2 = *res; res->start += size * offset; @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - for (j = 0; j < vf_groups; j++) { do { win = find_next_zero_bit(&phb->ioda.m64_bar_alloc, @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs) pdn = pci_get_pdn(pdev); if (phb->type == PNV_PHB_IODA2) { + if (!pdn->vfs_expanded) { + dev_info(&pdev->dev, "don't support this SRIOV device" + " with non M64 VF BAR\n"); + return -EBUSY; + } + It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily unavailable. For this case, the VFs are permanently unavailable because of running out of space to accomodate M64 and non-M64 VF BARs. The error message could be printed with dev_warn() and it would be precise as below or something else you prefer: dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); Both messages are cryptic. If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in the worst case - BAR#15?), the difference is if it is segmented or not, no? The VF BAR could be one of IO, M32, M64. If it's not M64, the VFs are supposed to be disabled and the (IO and M32) resources won't be allocted, but for sure, the IO and M32 resources can't be put into any one of the 16 PHB's M64 BARs. would you recommend one better message then? dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit MMIO window\n"); Or it is not "MMIO window"? The reason is not "no space left in 64bit MMIO window". 
The reason is that the IOV BAR is not 64-bit prefetchable, so in the Linux kernel it can't be allocated from M64 space, and then we can't use an M64 BAR to cover it. Oh. So now it is not M64 vs. M32 and IO, it is about prefetchable vs. non-prefetchable. Please choose one. Should it be this then? dev_warn(&pdev->dev, "Non-prefetchable BARs are not supported for SRIOV") But Gavin keeps insisting on mentioning "non-M64 BAR" - this part I do not understand. And where does this limit come from? Is it POWER8, IODA2, PHB3, SRIOV or something else? Is it all about isolation or the host _without_ KVM but with SRIOV also cannot use VFs if they have non-prefetchable BARs? Is this because of POWER8 or IODA2 or PHB3 or SRIOV or something else? /* Calculate available PE for required VFs */ mutex_lock(&phb->ioda.pe_alloc_mutex); pdn->offset = bitmap_find_next_zero_area( @@ -2774,9 +2771,10 @@ static voi
Re: [PATCH v5 RESEND] IFC: Change IO accessor based on endianness
On Thu, Aug 06, 2015 at 06:24:38PM -0500, Scott Wood wrote: > On Thu, 2015-08-06 at 09:52 -0700, Brian Norris wrote: > > Who takes patches for drivers/memory/ again? I can take it via MTD if > > no one else steps up. > There's no maintainer listed. IIRC, when we previously discussed the patch > we both said that it could go via either of our trees. I'll take it via mine. I recall the conversation. I was just confused (again) by there being no maintainer entry. Have at it. > > It's nothing new, but this patch continues the pattern of using a global > > pointer to the IFC device structure. Not pretty, but not worth holding > > this up over. > > It's not pretty, but I see little reason to come up with more complicated > infrastructure for the drivers finding each other given the low odds of ever > seeing a single-kernel system with more than one IFC. It's impossible with > current chips. If it ever does happen we can do something fancier. Yep, I figured as much. Brian ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop
On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote: > > The original loop was already optimal, as the comment said. > > The comment says that bdnz has zero overhead. That doesn't mean the adde > won't stall waiting for the load result. adde is execution serialising on those cores; it *always* stalls, that is, it won't run until it is next to complete. > > The new code adds extra instructions and a mispredicted branch. > > Outside the main loop. Sure, I never said it was super-bad or anything. > > You also might get less overlap between the loads and adde (I didn't check > > if there is any originally): those instructions are no longer > > interleaved. > > > > I think it is a stupid idea to optimise code for all 32-bit PowerPC > > CPUs based on solely what is best for a particularly simple, slow > > implementation; and that is what this patch is doing. > > The simple and slow implementation is the one that needs optimizations the > most. And, on the other hand, optimising for atypical (mostly) in-order single-issue chips without branch folding, hurts performance on other chips the most. Well, dual-issue in-order might be worse :-P > If this makes performance non-negligibly worse on other 32-bit chips, and is > an important improvement on 8xx, then we can use an ifdef since 8xx already > requires its own kernel build. I'd prefer to see a benchmark showing that it > actually does make things worse on those chips, though. And I'd like to see a benchmark that shows it *does not* hurt performance on most chips, and does improve things on 8xx, and by how much. But it isn't *me* who has to show that, it is not my patch. If these csum routines actually matter for performance that much, there really *should* be chip-specific implementations. Segher ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
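For readers following the adde discussion: the loop body is an add-with-carry over 32-bit words, and any unrolled variant must preserve the ones'-complement carry folding. A portable C model of that sum, useful as a reference when benchmarking an unrolled loop against the simple one; this is an illustration, not the kernel assembly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit words with carry propagation, as lwzu/adde does,
 * then fold the accumulated carries back into the low word. */
static uint32_t csum_words(const uint32_t *buf, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += buf[i];		/* adde: add, carry feeds the next add */
	while (sum >> 32)		/* fold carries (ones' complement) */
		sum = (sum & 0xffffffffu) + (sum >> 32);
	return (uint32_t)sum;
}
```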
Re: [PATCH v5 RESEND] IFC: Change IO accessor based on endianness
On Thu, 2015-08-06 at 09:52 -0700, Brian Norris wrote: > On Wed, May 20, 2015 at 09:17:11PM -0500, Scott Wood wrote: > > From: Jaiprakash Singh > > > > IFC IO accressor are set at run time based > > on IFC IP registers endianness.IFC node in > > DTS file contains information about > > endianness. > > > > Signed-off-by: Jaiprakash Singh > > Signed-off-by: Scott Wood > > --- > > v5: I'm assuming it's the same as v4, but I didn't send v4, and this > > comes from a versionless [RESEND] that never made it to the mailing list, > > so bumping the version just in case. > > Acked-by: Brian Norris > > Who takes patches for drivers/memory/ again? I can take it via MTD if > no one else steps up. There's no maintainer listed. IIRC, when we previously discussed the patch we both said that it could go via either of our trees. I'll take it via mine. > It's nothing new, but this patch continues the pattern of using a global > pointer to the IFC device structure. Not pretty, but not worth holding > this up over. It's not pretty, but I see little reason to come up with more complicated infrastructure for the drivers finding each other given the low odds of ever seeing a single-kernel system with more than one IFC. It's impossible with current chips. If it ever does happen we can do something fancier. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: windfarm: decrement client count when unregistering
On vr, 2015-08-07 at 00:21 +0200, Paul Bolle wrote: > On wo, 2015-08-05 at 14:16 +1000, Michael Ellerman wrote: > > I also get an oops when removing windfarm_lm75_sensor, so I suspect there > > are > > gremlins in the module ref counting for windfarm. > > (This I haven't (yet) looked into.) And that might be, sort of, related. Because the oops is probably triggered by what seems a rather straightforward chain of events when unloading an I2C module. (So windfarm_lm75_sensor refcount must be zero.) Which gets interesting at: wf_lm75_remove() wf_unregister_sensor(&wf_lm75_sensor->sens) wf_put_sensor(&wf_lm75_sensor->sens) module_put(wf_lm75_sensor->sens->ops->owner /* THIS_MODULE */) And in windfarm_lm75_sensor we trigger this issue because in the .probe() function there appears to be no corresponding call to try_module_get() preventing unloading the module, as we saw in windfarm_smu_sensors. So module refcounting looks broken for both these modules in opposite ways. Gremlins indeed. Good luck! Paul Bolle ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop
On Wed, 2015-08-05 at 23:39 -0500, Segher Boessenkool wrote: > On Wed, Aug 05, 2015 at 09:31:41PM -0500, Scott Wood wrote: > > On Wed, 2015-08-05 at 19:30 -0500, Segher Boessenkool wrote: > > > On Wed, Aug 05, 2015 at 03:29:35PM +0200, Christophe Leroy wrote: > > > > On the 8xx, load latency is 2 cycles and taking branches also takes > > > > 2 cycles. So let's unroll the loop. > > > > > > This is not true for most other 32-bit PowerPC; this patch makes > > > performance worse on e.g. 6xx/7xx/7xxx. Let's not! > > > > Chips with a load latency greater than 2 cycles should also benefit from > > the > > unrolling. Have you benchmarked this somewhere and seen it reduce > > performance? Do you know of any 32-bit PPC chips with a load latency > > less > > than 2 cycles? > > The original loop was already optimal, as the comment said. The comment says that bdnz has zero overhead. That doesn't mean the adde won't stall waiting for the load result. > The new code adds extra instructions and a mispredicted branch. Outside the main loop. > You also might get less overlap between the loads and adde (I didn't check > if there is any originally): those instructions are no longer > interleaved. > > I think it is a stupid idea to optimise code for all 32-bit PowerPC > CPUs based on solely what is best for a particularly simple, slow > implementation; and that is what this patch is doing. The simple and slow implementation is the one that needs optimizations the most. If this makes performance non-negligibly worse on other 32-bit chips, and is an important improvement on 8xx, then we can use an ifdef since 8xx already requires its own kernel build. I'd prefer to see a benchmark showing that it actually does make things worse on those chips, though. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: windfarm: decrement client count when unregistering
On wo, 2015-08-05 at 14:16 +1000, Michael Ellerman wrote: > On Fri, 2015-31-07 at 12:08:58 UTC, Paul Bolle wrote: > > windfarm_corex_exit() contains: > > BUG_ON(wf_client_count != 0); > > > > I wonder why that, apparently. never triggered. > > Hmm interesting. > > A quick test here on an iMacG5 shows that we get into a state where we can't > remove windfarm_core: > > $ lsmod > Module Size Used by > windfarm_smu_sensors7549 2 > windfarm_core 15391 1 windfarm_smu_sensors > > > Which means we can't trigger windfarm_core_exit() and the BUG_ON(). Perhaps this is what, roughly, happens: smu_sensors_init() smu_ads_create() /* Let's assume this happens ... */ ads->sens.ops = &smu_cpuamp_ops ads->sens.name = "cpu-current" smu_ads_create() /* ditto ... */ ads->sens.ops = &smu_cpuvolt_ops ads->sens.name = "cpu-voltage" /* ... so this would then be true */ if (volt_sensor && curr_sensor) /* and we do this */ smu_cpu_power_create(&volt_sensor->sens, &curr_sensor->sens) wf_get_sensor(&volt_sensor->sens) try_module_get(volt_sensor->sens->ops->owner /* THIS_MODULE */) wf_get_sensor(&curr_sensor->sens) try_module_get(curr_sensor->sens->ops->owner /* THIS_MODULE */) The cleanup would have happened here: smu_sensors_exit() while (!list_empty(&smu_ads) wf_unregister_sensor(&ads->sens) wf_put_sensor() /* would this also be done for sensors that never * triggered a call to module_get()? */ module_put(ads->sens->ops->owner /* THIS MODULE */) But, whatever it is that smu_sensors_exit() wants to do, it will never be called since there are these two references to this module that smu_sensors_init() created itself, preventing the unloading of this module. Does the above look plausible? Note that this was only cobbled together by staring at the code for far too long. If I had some powerpc machine at hand I could have actually tested this with a few strategically placed printk()'s. 
> I also get an oops when removing windfarm_lm75_sensor, so I suspect there are > gremlins in the module ref counting for windfarm. (This I haven't (yet) looked into.) > I'll merge this as probably correct. Hope this helps, Paul Bolle ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
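The imbalance Paul describes reduces to a toy model: windfarm_smu_sensors takes references in init that it can never drop (the module pins itself), while windfarm_lm75_sensor's remove path puts a reference its probe path never took. A hypothetical sketch of the counter, with names borrowed from the kernel API but implemented here only for illustration:

```c
#include <assert.h>

/* Toy module refcount: try_module_get() bumps, module_put() drops.
 * A module can only be unloaded once the count is back to zero. */
static int refcount;

static int try_module_get(void)
{
	refcount++;
	return 1;
}

static void module_put(void)
{
	refcount--;
}

static int can_unload(void)
{
	return refcount == 0;
}
```

Two gets with no matching puts pins the module forever; a put with no matching get drives the count negative, which is the shape of the reported oops.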
Re: [PATCH 1/3] Powerpc: mpc85xx: refactor the PM operations
On Thu, 2015-08-06 at 13:54 +0800, Chenhui Zhao wrote: > On Thu, Aug 6, 2015 at 1:46 PM, Scott Wood > wrote: > > On Thu, 2015-08-06 at 12:20 +0800, Chenhui Zhao wrote: > > > On Thu, Aug 6, 2015 at 10:57 AM, Scott Wood > > > > > > wrote: > > > > On Wed, 2015-08-05 at 18:11 +0800, Chenhui Zhao wrote: > > > > > On Tue, Aug 4, 2015 at 4:26 AM, Scott Wood > > > > > > > > wrote: > > > > > > On Mon, 2015-08-03 at 19:32 +0800, Chenhui Zhao wrote: > > > > > > > > > > > > > > > > > > > > > On Sat, Aug 1, 2015 at 7:59 AM, Scott Wood > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > Could you explain irq_mask()? Why would there still be > > > IRQs > > > > > > > destined > > > > > > > > for > > > > > > > > this CPU at this point? > > > > > > > > > > > > > > This function just masks irq by setting the registers in > > > RCPM > > > > > (for > > > > > > > example, RCPM_CPMIMR, RCPM_CPMCIMR). Actually, all irqs to > > > > > this CPU > > > > > > > have been migrated to other CPUs. > > > > > > > > > > > > So why do we need to set those bits in RCPM? Is it just > > > caution? > > > > > > > > > > Setting these bits can mask interrupts signalled to RCPM from > > > MPIC > > > > > as a > > > > > means of > > > > > waking up from a lower power state. So, cores will not be > > > waked up > > > > > unexpectedly. > > > > > > > > Why would the MPIC be signalling those interrupts if they've been > > > > masked at > > > > the MPIC? > > > > > > > > -Scott > > > > > > > > > > The interrupts to RCPM from MPIC are IRQ, Machine Check, NMI and > > > Critical interrupts. Some of them didn't be masked in MPIC. > > > > What interrupt could actually happen to a sleeping cpu that this > > protects > > against? > > > > -Scott > > Not sure. Maybe spurious interrupts or hardware exceptions. Spurious interrupts happen due to race conditions. They don't happen because the MPIC is bored and decides to ring a CPU's doorbell and hide in the bushes. 
If by "hardware exceptions" you mean machine checks, how would such a machine check be generated by a core that is off? > However, setting them make sure dead cpus can not be waked up unexpectedly. I'm not seeing enough value here to warrant resurrecting the old sleep node stuff. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] kvm:powerpc:Fix error handling in the function mpic_set_default_irq_routing
This fixes error handling in the function mpic_set_default_irq_routing by checking if the call to the function kvm_set_irq_routing has failed and, if so, exiting immediately to the caller by returning the error code from kvm_set_irq_routing. Signed-off-by: Nicholas Krause --- arch/powerpc/kvm/mpic.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c index 6249cdc..5a18859 100644 --- a/arch/powerpc/kvm/mpic.c +++ b/arch/powerpc/kvm/mpic.c @@ -1641,13 +1641,16 @@ static void mpic_destroy(struct kvm_device *dev) static int mpic_set_default_irq_routing(struct openpic *opp) { struct kvm_irq_routing_entry *routing; + int ret; /* Create a nop default map, so that dereferencing it still works */ routing = kzalloc((sizeof(*routing)), GFP_KERNEL); if (!routing) return -ENOMEM; - kvm_set_irq_routing(opp->kvm, routing, 0, 0); + ret = kvm_set_irq_routing(opp->kvm, routing, 0, 0); + if (ret) + return ret; kfree(routing); return 0; -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
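One thing worth noting about the hunk above: returning early on failure skips the kfree(), so the routing table would leak on the error path. A hedged userspace sketch of an error path that frees the allocation in both cases; set_routing() and the helper names are stand-ins, not the KVM API:

```c
#include <assert.h>
#include <stdlib.h>

#ifndef ENOMEM
#define ENOMEM 12
#endif

/* Stand-in for kvm_set_irq_routing(); fail != 0 forces an error */
static int set_routing(void *routing, int fail)
{
	(void)routing;
	return fail ? -1 : 0;
}

static int set_default_routing(int fail)
{
	void *routing;
	int ret;

	routing = calloc(1, 64);	/* nop default map */
	if (!routing)
		return -ENOMEM;

	ret = set_routing(routing, fail);
	free(routing);			/* freed on success *and* failure */

	return ret;
}
```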
Re: [PATCH v5 RESEND] IFC: Change IO accessor based on endianness
On Wed, May 20, 2015 at 09:17:11PM -0500, Scott Wood wrote: > From: Jaiprakash Singh > > IFC IO accressor are set at run time based > on IFC IP registers endianness.IFC node in > DTS file contains information about > endianness. > > Signed-off-by: Jaiprakash Singh > Signed-off-by: Scott Wood > --- > v5: I'm assuming it's the same as v4, but I didn't send v4, and this > comes from a versionless [RESEND] that never made it to the mailing list, > so bumping the version just in case. Acked-by: Brian Norris Who takes patches for drivers/memory/ again? I can take it via MTD if no one else steps up. It's nothing new, but this patch continues the pattern of using a global pointer to the IFC device structure. Not pretty, but not worth holding this up over. Brian > .../bindings/memory-controllers/fsl/ifc.txt| 3 + > drivers/memory/fsl_ifc.c | 43 ++-- > drivers/mtd/nand/fsl_ifc_nand.c| 258 > +++-- > include/linux/fsl_ifc.h| 50 > 4 files changed, 213 insertions(+), 141 deletions(-) > > diff --git a/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt > b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt > index d5e3704..89427b0 100644 > --- a/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt > +++ b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt > @@ -18,6 +18,8 @@ Properties: >interrupt (NAND_EVTER_STAT). If there is only one, >that interrupt reports both types of event. > > +- little-endian : If this property is absent, the big-endian mode will > + be in use as default for registers. > > - ranges : Each range corresponds to a single chipselect, and covers > the entire access window as configured. 
> @@ -34,6 +36,7 @@ Example: > #size-cells = <1>; > reg = <0x0 0xffe1e000 0 0x2000>; > interrupts = <16 2 19 2>; > + little-endian; > > /* NOR, NAND Flashes and CPLD on board */ > ranges = <0x0 0x0 0x0 0xee00 0x0200 > diff --git a/drivers/memory/fsl_ifc.c b/drivers/memory/fsl_ifc.c > index 410c397..e87459f 100644 > --- a/drivers/memory/fsl_ifc.c > +++ b/drivers/memory/fsl_ifc.c > @@ -62,7 +62,7 @@ int fsl_ifc_find(phys_addr_t addr_base) > return -ENODEV; > > for (i = 0; i < fsl_ifc_ctrl_dev->banks; i++) { > - u32 cspr = in_be32(&fsl_ifc_ctrl_dev->regs->cspr_cs[i].cspr); > + u32 cspr = ifc_in32(&fsl_ifc_ctrl_dev->regs->cspr_cs[i].cspr); > if (cspr & CSPR_V && (cspr & CSPR_BA) == > convert_ifc_address(addr_base)) > return i; > @@ -79,16 +79,16 @@ static int fsl_ifc_ctrl_init(struct fsl_ifc_ctrl *ctrl) > /* >* Clear all the common status and event registers >*/ > - if (in_be32(&ifc->cm_evter_stat) & IFC_CM_EVTER_STAT_CSER) > - out_be32(&ifc->cm_evter_stat, IFC_CM_EVTER_STAT_CSER); > + if (ifc_in32(&ifc->cm_evter_stat) & IFC_CM_EVTER_STAT_CSER) > + ifc_out32(IFC_CM_EVTER_STAT_CSER, &ifc->cm_evter_stat); > > /* enable all error and events */ > - out_be32(&ifc->cm_evter_en, IFC_CM_EVTER_EN_CSEREN); > + ifc_out32(IFC_CM_EVTER_EN_CSEREN, &ifc->cm_evter_en); > > /* enable all error and event interrupts */ > - out_be32(&ifc->cm_evter_intr_en, IFC_CM_EVTER_INTR_EN_CSERIREN); > - out_be32(&ifc->cm_erattr0, 0x0); > - out_be32(&ifc->cm_erattr1, 0x0); > + ifc_out32(IFC_CM_EVTER_INTR_EN_CSERIREN, &ifc->cm_evter_intr_en); > + ifc_out32(0x0, &ifc->cm_erattr0); > + ifc_out32(0x0, &ifc->cm_erattr1); > > return 0; > } > @@ -127,9 +127,9 @@ static u32 check_nand_stat(struct fsl_ifc_ctrl *ctrl) > > spin_lock_irqsave(&nand_irq_lock, flags); > > - stat = in_be32(&ifc->ifc_nand.nand_evter_stat); > + stat = ifc_in32(&ifc->ifc_nand.nand_evter_stat); > if (stat) { > - out_be32(&ifc->ifc_nand.nand_evter_stat, stat); > + ifc_out32(stat, &ifc->ifc_nand.nand_evter_stat); > ctrl->nand_stat = stat; 
> wake_up(&ctrl->nand_wait); > } > @@ -161,16 +161,16 @@ static irqreturn_t fsl_ifc_ctrl_irq(int irqno, void > *data) > irqreturn_t ret = IRQ_NONE; > > /* read for chip select error */ > - cs_err = in_be32(&ifc->cm_evter_stat); > + cs_err = ifc_in32(&ifc->cm_evter_stat); > if (cs_err) { > dev_err(ctrl->dev, "transaction sent to IFC is not mapped to" > "any memory bank 0x%08X\n", cs_err); > /* clear the chip select error */ > - out_be32(&ifc->cm_evter_stat, IFC_CM_EVTER_STAT_CSER); > + ifc_out32(IFC_CM_EVTER_STAT_CSER, &ifc->cm_evter_stat); > > /* read error attribute registers print the error informatio
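The run-time accessor selection the patch introduces can be sketched in portable C: pick a load routine once, based on the device tree's "little-endian" property, and use it everywhere. Byte-wise loads keep the sketch independent of the host's endianness; the names mirror the patch but are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint32_t load_le32(const uint8_t *p)
{
	return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Selected once at probe time from the "little-endian" DT property;
 * big-endian remains the default, as in the binding above. */
static uint32_t (*ifc_in32)(const uint8_t *reg);

static void ifc_init_accessors(int little_endian)
{
	ifc_in32 = little_endian ? load_le32 : load_be32;
}
```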
Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies
On Thu, 6 Aug 2015, Julien Grall wrote: > On 06/08/15 12:06, Stefano Stabellini wrote: > > On Thu, 6 Aug 2015, Julien Grall wrote: > >> Hi, > >> > >> > >> On 04/08/15 19:12, Julien Grall wrote: > >>> diff --git a/include/xen/page.h b/include/xen/page.h > >>> index c5ed20b..e7e1425 100644 > >>> --- a/include/xen/page.h > >>> +++ b/include/xen/page.h > >>> @@ -3,9 +3,9 @@ > >>> > >>> #include > >>> > >>> -static inline unsigned long page_to_mfn(struct page *page) > >>> +static inline unsigned long page_to_gfn(struct page *page) > >>> { > >>> - return pfn_to_mfn(page_to_pfn(page)); > >>> + return pfn_to_gfn(page_to_pfn(page)); > >>> } > >> > >> I've just noticed that there is a function gfn_to_page used for KVM. > >> > >> Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion > >> with KVM one? > > > > Yeah, prepending xen would help to avoid namespace pollution. > > Will do. May I keep your Reviewed-by for this mechanical change? Yes ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
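For reference, the renamed wrapper amounts to nothing more than this (toy stand-in types; treating pfn_to_gfn() as the identity is an assumption of this sketch for the auto-translated-guest case, not something the patch states):

```c
#include <assert.h>

/* Toy stand-in for the kernel's struct page. */
struct page { unsigned long pfn; };

static unsigned long page_to_pfn(const struct page *page)
{
    return page->pfn;
}

static unsigned long pfn_to_gfn(unsigned long pfn)
{
    return pfn; /* identity in the auto-translated case (assumed) */
}

/* Prefixed with xen_ so it cannot collide with KVM's gfn_to_page(). */
static unsigned long xen_page_to_gfn(const struct page *page)
{
    return pfn_to_gfn(page_to_pfn(page));
}
```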
Re: [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree()
On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan wrote: > This changes of_fdt_unflatten_tree() so that it returns the allocated > memory chunk for unflattened device-tree, which can be released once > it's obsoleted. > > Signed-off-by: Gavin Shan Acked-by: Rob Herring > --- > drivers/of/fdt.c | 11 ++- > include/linux/of_fdt.h | 2 +- > 2 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 074870a..8e1ba7e 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -408,7 +408,7 @@ static void *unflatten_dt_node(const void *blob, > * @dt_alloc: An allocator that provides a virtual address to memory > * for the resulting tree > */ > -static void __unflatten_device_tree(const void *blob, > +static void *__unflatten_device_tree(const void *blob, > struct device_node *dad, > struct device_node **mynodes, > void * (*dt_alloc)(u64 size, u64 align)) > @@ -421,7 +421,7 @@ static void __unflatten_device_tree(const void *blob, > > if (!blob) { > pr_debug("No device tree pointer\n"); > - return; > + return NULL; > } > > pr_debug("Unflattening device tree:\n"); > @@ -431,7 +431,7 @@ static void __unflatten_device_tree(const void *blob, > > if (fdt_check_header(blob)) { > pr_err("Invalid device tree blob header\n"); > - return; > + return NULL; > } > > /* First pass, scan for size */ > @@ -458,6 +458,7 @@ static void __unflatten_device_tree(const void *blob, >be32_to_cpup(mem + size)); > > pr_debug(" <- unflatten_device_tree()\n"); > + return mem; > } > > static void *kernel_tree_alloc(u64 size, u64 align) > @@ -473,11 +474,11 @@ static void *kernel_tree_alloc(u64 size, u64 align) > * pointers of the nodes so the normal device-tree walking functions > * can be used. 
> */ > -void of_fdt_unflatten_tree(const unsigned long *blob, > +void *of_fdt_unflatten_tree(const unsigned long *blob, > struct device_node *dad, > struct device_node **mynodes) > { > - __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc); > + return __unflatten_device_tree(blob, dad, mynodes, > &kernel_tree_alloc); > } > EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree); > > diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h > index 3644960..00db279 100644 > --- a/include/linux/of_fdt.h > +++ b/include/linux/of_fdt.h > @@ -37,7 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob, > unsigned long node); > extern int of_fdt_match(const void *blob, unsigned long node, > const char *const *compat); > -extern void of_fdt_unflatten_tree(const unsigned long *blob, > +extern void *of_fdt_unflatten_tree(const unsigned long *blob, >struct device_node *dad, >struct device_node **mynodes); > > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe devicetree" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
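The shape of the change — unflattening now returns the allocated chunk so the caller can release it once the sub-tree is obsolete — can be sketched like this (unflatten() is a stand-in, not the real __unflatten_device_tree(), and malloc() stands in for the dt_alloc callback):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate, build, and hand the allocation back; NULL on any error,
 * matching the early "return NULL" paths the patch adds. */
static void *unflatten(const void *blob, size_t size)
{
    void *mem;

    if (!blob)              /* reject a missing blob, as the patch does */
        return NULL;

    mem = malloc(size);     /* kernel code uses the dt_alloc callback */
    if (mem)
        memcpy(mem, blob, size);
    return mem;             /* caller frees this when it is obsolete */
}
```

Before the patch the allocation was unreachable from the caller, so an unflattened sub-tree could never be torn down cleanly; returning `mem` is what makes the later hotplug release path possible.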
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On Thu, Aug 06, 2015 at 05:47:42PM +1000, Alexey Kardashevskiy wrote: >On 08/06/2015 04:57 PM, Gavin Shan wrote: >>On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote: >>>On 08/06/2015 02:35 PM, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: >On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If >a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from >M64 windwo, which means M64 BAR can't work on it. > > >The proper text would be something like this: > >=== >SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we >cannot enable the device. >=== > > > s/PHB_IODA2/PHB3 >>> >>> >>>No, it is IODA2. OPEL does PHB3-specific bits, the host kernel just uses >>>OPAL. >>> >> >>Ok. >> >>> s/windwo/window >This patch makes this explicit. > >Signed-off-by: Wei Yang The idea sounds right, but there is one question as below. >--- >arch/powerpc/platforms/powernv/pci-ioda.c | 25 + >1 file changed, 9 insertions(+), 16 deletions(-) > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >b/arch/powerpc/platforms/powernv/pci-ioda.c >index 5738d31..9b41dba 100644 >--- a/arch/powerpc/platforms/powernv/pci-ioda.c >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >*dev, int offset) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > /* >* The actual IOV BAR range is determined by the start address >* and the actual size for num_vfs VFs BAR. 
This check is to >@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >*dev, int offset) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > res2 = *res; > res->start += size * offset; >@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev >*pdev, u16 num_vfs) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > for (j = 0; j < vf_groups; j++) { > do { > win = > find_next_zero_bit(&phb->ioda.m64_bar_alloc, >@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 >num_vfs) > pdn = pci_get_pdn(pdev); > > if (phb->type == PNV_PHB_IODA2) { >+ if (!pdn->vfs_expanded) { >+ dev_info(&pdev->dev, "don't support this SRIOV device" >+ " with non M64 VF BAR\n"); >+ return -EBUSY; >+ } >+ It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily unavailable. For this case, the VFs are permanently unavailable because of running out of space to accomodate M64 and non-M64 VF BARs. The error message could be printed with dev_warn() and it would be precise as below or something else you prefer: dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); >>> >>> >>>Both messages are cryptic. >>> >>>If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in >>>the worst case - BAR#15?), the difference is if it is segmented or not, no? >>> >> >>The VF BAR could be one of IO, M32, M64. If it's not M64, the VFs are supposed >>to be disabled and the (IO and M32) resources won't be allocted, but for sure, >>the IO and M32 resources can't be put into any one of the 16 PHB's M64 BARs. >>would you recommend one better message then? > > > >dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit >MMIO window\n"); > >Or it is not "MMIO window"? > The reason is not "no space left in 64bit MMIO window". 
The reason is the IOV BAR is not 64bit prefetchable, then in linux kernel this can't be allocated from M64 Space, then we can't use M64 BAR to cover it. > > >> > /* Calculate available PE for required VFs */ > mutex_lock(&phb->ioda.pe_alloc_mutex); > pdn->offset = bitmap_find_next_zero_area( >@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >pci_dev *pdev) > if (!res->flags || res->parent) > continue; > if (!pnv_pci_is_mem_pref_64(res->flags)) { >- dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", >+ dev_warn(&pdev->dev, "Don't su
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote: >On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: >>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If >>a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from >>M64 windwo, which means M64 BAR can't work on it. >> > >s/PHB_IODA2/PHB3 >s/windwo/window > >>This patch makes this explicit. >> >>Signed-off-by: Wei Yang > >The idea sounds right, but there is one question as below. > >>--- >> arch/powerpc/platforms/powernv/pci-ioda.c | 25 + >> 1 file changed, 9 insertions(+), 16 deletions(-) >> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>b/arch/powerpc/platforms/powernv/pci-ioda.c >>index 5738d31..9b41dba 100644 >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, >>int offset) >> if (!res->flags || !res->parent) >> continue; >> >>- if (!pnv_pci_is_mem_pref_64(res->flags)) >>- continue; >>- >> /* >> * The actual IOV BAR range is determined by the start address >> * and the actual size for num_vfs VFs BAR. 
This check is to >>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, >>int offset) >> if (!res->flags || !res->parent) >> continue; >> >>- if (!pnv_pci_is_mem_pref_64(res->flags)) >>- continue; >>- >> size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); >> res2 = *res; >> res->start += size * offset; >>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, >>u16 num_vfs) >> if (!res->flags || !res->parent) >> continue; >> >>- if (!pnv_pci_is_mem_pref_64(res->flags)) >>- continue; >>- >> for (j = 0; j < vf_groups; j++) { >> do { >> win = >> find_next_zero_bit(&phb->ioda.m64_bar_alloc, >>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 >>num_vfs) >> pdn = pci_get_pdn(pdev); >> >> if (phb->type == PNV_PHB_IODA2) { >>+ if (!pdn->vfs_expanded) { >>+ dev_info(&pdev->dev, "don't support this SRIOV device" >>+ " with non M64 VF BAR\n"); >>+ return -EBUSY; >>+ } >>+ > >It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily >unavailable. For this case, the VFs are permanently unavailable because of >running out of space to accomodate M64 and non-M64 VF BARs. > >The error message could be printed with dev_warn() and it would be precise >as below or something else you prefer: > > dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); > Thanks for the comment, will change accordingly. > >> /* Calculate available PE for required VFs */ >> mutex_lock(&phb->ioda.pe_alloc_mutex); >> pdn->offset = bitmap_find_next_zero_area( >>@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>pci_dev *pdev) >> if (!res->flags || res->parent) >> continue; >> if (!pnv_pci_is_mem_pref_64(res->flags)) { >>- dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", >>+ dev_warn(&pdev->dev, "Don't support SR-IOV with" >>+ " non M64 VF BAR%d: %pR. 
\n", >> i, res); >>- continue; >>+ return; >> } >> >> size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); >>@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>pci_dev *pdev) >> res = &pdev->resource[i + PCI_IOV_RESOURCES]; >> if (!res->flags || res->parent) >> continue; >>- if (!pnv_pci_is_mem_pref_64(res->flags)) { >>- dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: >>%pR\n", >>- i, res); >>- continue; >>- } > >When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled. >Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so, >I think it can be avoided. > Don't get your point. You mean to avoid this function? Or clear the IOV BAR when we found one of it is non-M64? >> >> dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res); >> size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); >>-- >>1.7.9.5 >> -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lis
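The error-code point raised above can be made concrete with a small sketch: a VF BAR that is not 64-bit prefetchable can never be backed by an M64 window, so the failure is permanent and -ENOSPC fits better than -EBUSY. The flag bits below are illustrative, not the kernel's IORESOURCE_* values:

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits (not the kernel's IORESOURCE_* values). */
#define RES_MEM_64    0x1
#define RES_PREFETCH  0x2

struct vf_bar { unsigned int flags; };

/* If any VF BAR is not 64-bit prefetchable it can never be mapped by
 * an M64 BAR, so SRIOV is permanently unavailable: -ENOSPC, not the
 * "try again later" -EBUSY. */
static int sriov_check_bars(const struct vf_bar *bars, int n)
{
    int i;

    for (i = 0; i < n; i++)
        if ((bars[i].flags & (RES_MEM_64 | RES_PREFETCH)) !=
            (RES_MEM_64 | RES_PREFETCH))
            return -ENOSPC;
    return 0;
}
```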
Re: [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level
On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan wrote: > unflatten_dt_node() is called recursively to unflatten FDT nodes > with the assumption that FDT blob has only one root node, which > isn't true when the FDT blob represents device sub-tree. This > improves the function to supporting device sub-tree that have > multiple nodes in the first level: > >* Rename original unflatten_dt_node() to __unflatten_dt_node(). >* Wrapper unflatten_dt_node() calls __unflatten_dt_node() with > adjusted current node depth to 1 to avoid underflow. > > Signed-off-by: Gavin Shan Acked-by: Rob Herring > --- > drivers/of/fdt.c | 53 - > 1 file changed, 40 insertions(+), 13 deletions(-) > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 0749656..a18a2ce 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -161,7 +161,7 @@ static void *unflatten_dt_alloc(void **mem, unsigned long > size, > } > > /** > - * unflatten_dt_node - Alloc and populate a device_node from the flat tree > + * __unflatten_dt_node - Alloc and populate a device_node from the flat tree > * @blob: The parent device tree blob > * @mem: Memory chunk to use for allocating device nodes and properties > * @poffset: pointer to node in flat tree > @@ -171,20 +171,20 @@ static void *unflatten_dt_alloc(void **mem, unsigned > long size, > * @dryrun: If true, do not allocate device nodes but still calculate needed > * memory size > */ > -static void * unflatten_dt_node(const void *blob, > +static void *__unflatten_dt_node(const void *blob, > void *mem, > int *poffset, > struct device_node *dad, > struct device_node **nodepp, > unsigned long fpsize, > - bool dryrun) > + bool dryrun, > + int *depth) > { > const __be32 *p; > struct device_node *np; > struct property *pp, **prev_pp = NULL; > const char *pathp; > unsigned int l, allocl; > - static int depth = 0; > int old_depth; > int offset; > int has_name = 0; > @@ -337,13 +337,25 @@ static void * unflatten_dt_node(const void *blob, > np->type = ""; > } > > - 
old_depth = depth; > - *poffset = fdt_next_node(blob, *poffset, &depth); > - if (depth < 0) > - depth = 0; > - while (*poffset > 0 && depth > old_depth) > - mem = unflatten_dt_node(blob, mem, poffset, np, NULL, > - fpsize, dryrun); > + /* Multiple nodes might be in the first depth level if > +* the device tree is sub-tree. All nodes in current > +* or deeper depth are unflattened after it returns. > +*/ > + old_depth = *depth; > + *poffset = fdt_next_node(blob, *poffset, depth); > + while (*poffset > 0) { > + if (*depth < old_depth) > + break; > + > + if (*depth == old_depth) > + mem = __unflatten_dt_node(blob, mem, poffset, > + dad, NULL, fpsize, > + dryrun, depth); > + else if (*depth > old_depth) > + mem = __unflatten_dt_node(blob, mem, poffset, > + np, NULL, fpsize, > + dryrun, depth); > + } > > if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND) > pr_err("unflatten: error %d processing FDT\n", *poffset); > @@ -369,6 +381,20 @@ static void * unflatten_dt_node(const void *blob, > return mem; > } > > +static void *unflatten_dt_node(const void *blob, > + void *mem, > + int *poffset, > + struct device_node *dad, > + struct device_node **nodepp, > + bool dryrun) > +{ > + int depth = 1; > + > + return __unflatten_dt_node(blob, mem, poffset, > + dad, nodepp, 0, > + dryrun, &depth); > +} > + > /** > * __unflatten_device_tree - create tree of device_nodes from flat blob > * > @@ -408,7 +434,8 @@ static void __unflatten_device_tree(const void *blob, > > /* First pass, scan for size */ > start = 0; > - size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, > NULL, 0, true); > + size = (unsigned long)unflatten_dt_node(blob, NULL, &start, > + NULL, NULL, true); > size = ALIGN(size, 4); > > pr_debug(" size is %lx, allocating...\n", size); >
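The control flow of the reworked recursion — depth carried through a pointer instead of a static variable, so several first-level nodes in a sub-tree are all consumed — can be modelled in userspace like this (a toy walker over an array of node depths, not the real unflatten_dt_node()):

```c
#include <assert.h>

/* Toy "flattened tree": each entry is a node's depth in traversal
 * order, mimicking what fdt_next_node() reports. A sub-tree may have
 * several depth-1 nodes, which the old static-depth code mishandled. */
static int visit(const int *depths, int n, int *pos, int *depth, int *count)
{
    int old_depth = *depth;         /* depth of the node being entered */

    (*count)++;                     /* "unflatten" the current node */
    (*pos)++;
    if (*pos < n)
        *depth = depths[*pos];

    while (*pos < n) {
        if (*depth < old_depth)     /* shallower: back out to parent */
            break;
        /* Same depth: a sibling; deeper: a child. Either way the
         * recursive call consumes that entire sub-tree. */
        visit(depths, n, pos, depth, count);
    }
    return *count;
}
```

Starting with depth = 1, as the new wrapper does, both a single-root tree and a sub-tree with multiple first-level siblings are walked to completion.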
Re: [PATCH V2 5/6] powerpc/powernv: boundary the total vf bar size instead of the individual one
On Thu, Aug 06, 2015 at 03:28:51PM +1000, Gavin Shan wrote: >On Wed, Aug 05, 2015 at 09:25:02AM +0800, Wei Yang wrote: >>Each VF could have 6 BARs at most. When the total BAR size exceeds the >>gate, after expanding it will also exhaust the M64 Window. >> >>This patch limits the boundary by checking the total VF BAR size instead of >>the individual BAR. >> >>Signed-off-by: Wei Yang > >Ok. I didn't look at this when giving comments to last patch. It turns >you have the change in this patch. Please merge it with the previous >patch. > Hmm... I prefer to have them in two patches. One focus on the calculation of gate and the other focus on checking the total VF BAR size. This would help record the change. >>--- >> arch/powerpc/platforms/powernv/pci-ioda.c | 13 +++-- >> 1 file changed, 7 insertions(+), 6 deletions(-) >> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>b/arch/powerpc/platforms/powernv/pci-ioda.c >>index 31dcedc..4042303 100644 >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>@@ -2702,7 +2702,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>pci_dev *pdev) >> struct pnv_phb *phb; >> struct resource *res; >> int i; >>- resource_size_t size, gate; >>+ resource_size_t size, gate, total_vf_bar_sz; >> struct pci_dn *pdn; >> int mul, total_vfs; >> >>@@ -2729,6 +2729,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>pci_dev *pdev) >> * Window and limit the system flexibility. 
>> */ >> gate = phb->ioda.m64_segsize >> 1; >>+ total_vf_bar_sz = 0; >> >> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { >> res = &pdev->resource[i + PCI_IOV_RESOURCES]; >>@@ -2741,13 +2742,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >>pci_dev *pdev) >> return; >> } >> >>- size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); >>+ total_vf_bar_sz += pci_iov_resource_size(pdev, >>+ i + PCI_IOV_RESOURCES); >> >> /* bigger than or equal to gate */ >>- if (size >= gate) { >>- dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size " >>- "is bigger than %lld, roundup power2\n", >>- i, res, gate); >>+ if (total_vf_bar_sz >= gate) { >>+ dev_info(&pdev->dev, "PowerNV: VF BAR Total IOV size " >>+ "is bigger than %lld, roundup power2\n", gate); >> mul = roundup_pow_of_two(total_vfs); >> pdn->m64_single_mode = true; >> break; >>-- >>1.7.9.5 >> -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
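The difference this patch makes is where the gate is applied: against the running total of the VF BAR sizes rather than each BAR alone, since six individually small BARs can still exhaust the M64 window once expanded. A small sketch of that check (names and the gate value are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* The gate is half an M64 segment; bar_sz[] holds per-VF BAR sizes.
 * Checking the running total (not each BAR alone) catches devices
 * whose several small BARs together would exhaust the M64 window. */
static bool needs_single_pe_mode(const unsigned long *bar_sz, int nbars,
                                 unsigned long gate)
{
    unsigned long total = 0;
    int i;

    for (i = 0; i < nbars; i++) {
        total += bar_sz[i];
        if (total >= gate)
            return true;    /* switch to Single PE mode */
    }
    return false;
}
```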
Re: [PATCH V6 2/6] mm: mlock: Add new mlock system call
On 07/29/2015 05:42 PM, Eric B Munson wrote: With the refactored mlock code, introduce a new system call for mlock. The new call will allow the user to specify what lock states are being added. mlock2 is trivial at the moment, but a follow on patch will add a new mlock state making it useful. Signed-off-by: Eric B Munson Cc: Michal Hocko Cc: Vlastimil Babka Acked-by: Vlastimil Babka ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource
On Thu, Aug 06, 2015 at 02:51:40PM +1000, Gavin Shan wrote: >On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote: >>The alignment of IOV BAR on PowerNV platform is the total size of the IOV >>BAR. No matter whether the IOV BAR is truncated or not, the total size >>could be calculated by (vfs_expanded * VF size). >> > >s/VF size/VF BAR size > >I think the changelog would be more explicit: > >The alignment of IOV BAR on PowerNV platform is the total size of the >IOV BAR, no matter whether the IOV BAR is extended with number of max >VFs or number of max PE number (256). The alignment can be calculated number of max VFs is not correct. This should be roundup_pow_of_two(total_vfs). Others looks good to me. >by (vfs_expaned * VF_BAR_size). > >>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the >>first case. >> >>Signed-off-by: Wei Yang > >Reviewed-by: Gavin Shan > >>--- >> arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- >> 1 file changed, 9 insertions(+), 5 deletions(-) >> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>b/arch/powerpc/platforms/powernv/pci-ioda.c >>index 9b41dba..7192e62 100644 >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>@@ -2987,12 +2987,16 @@ static resource_size_t >>pnv_pci_iov_resource_alignment(struct pci_dev *pdev, >>int resno) >> { >> struct pci_dn *pdn = pci_get_pdn(pdev); >>- resource_size_t align, iov_align; >>- >>- iov_align = resource_size(&pdev->resource[resno]); >>- if (iov_align) >>- return iov_align; >>+ resource_size_t align; >> >>+ /* >>+ * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the >>+ * SR-IOV. While from hardware perspective, the range mapped by M64 >>+ * BAR should be size aligned. >>+ * >>+ * This function return the total IOV BAR size if expanded or just the >>+ * individual size if not. 
>>+ */ >> align = pci_iov_resource_size(pdev, resno); >> if (pdn->vfs_expanded) >> return pdn->vfs_expanded * align; >>-- >>1.7.9.5 >> -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
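The simplified alignment rule reduces to a one-liner, sketched here with illustrative values (per-VF BAR size times the expansion factor when the IOV BAR was expanded, just the per-VF size otherwise):

```c
#include <assert.h>

/* On this platform the IOV BAR must be size-aligned for the M64 BAR
 * mapping, so the alignment is the total IOV BAR size. */
static unsigned long iov_bar_alignment(unsigned long per_vf_size,
                                       unsigned int vfs_expanded)
{
    if (vfs_expanded)
        return vfs_expanded * per_vf_size;
    return per_vf_size;
}
```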
Re: [PATCH v6 41/42] drivers/of: Export OF changeset functions
On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan wrote: > The PowerNV PCI hotplug driver is going to use the OF changeset > to manage the changed device sub-tree, which requires those OF > changeset functions are exported. > > Signed-off-by: Gavin Shan > --- > drivers/of/dynamic.c | 65 > --- > drivers/of/overlay.c | 8 +++ > drivers/of/unittest.c | 4 ++-- > include/linux/of.h| 2 ++ > 4 files changed, 54 insertions(+), 25 deletions(-) > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > index 53826b8..af65b5b 100644 > --- a/drivers/of/dynamic.c > +++ b/drivers/of/dynamic.c > @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs) > memset(ocs, 0, sizeof(*ocs)); > INIT_LIST_HEAD(&ocs->entries); > } > +EXPORT_SYMBOL(of_changeset_init); We probably want these to be the _GPL variant. > > /** > * of_changeset_destroy - Destroy a changeset > @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs) > list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) > __of_changeset_entry_destroy(ce); > } > +EXPORT_SYMBOL(of_changeset_destroy); > > -/** > - * of_changeset_apply - Applies a changeset > - * > - * @ocs: changeset pointer > - * > - * Applies a changeset to the live tree. > - * Any side-effects of live tree state changes are applied here on > - * sucess, like creation/destruction of devices and side-effects > - * like creation of sysfs properties and directories. > - * Returns 0 on success, a negative error value in case of an error. > - * On error the partially applied effects are reverted. 
> - */ > -int of_changeset_apply(struct of_changeset *ocs) > +int __of_changeset_apply(struct of_changeset *ocs) > { > struct of_changeset_entry *ce; > int ret; > @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs) > } > > /** > - * of_changeset_revert - Reverts an applied changeset > + * of_changeset_apply - Applies a changeset > * > * @ocs: changeset pointer > * > - * Reverts a changeset returning the state of the tree to what it > - * was before the application. > - * Any side-effects like creation/destruction of devices and > - * removal of sysfs properties and directories are applied. > + * Applies a changeset to the live tree. > + * Any side-effects of live tree state changes are applied here on > + * sucess, like creation/destruction of devices and side-effects s/sucess/success/ > + * like creation of sysfs properties and directories. > * Returns 0 on success, a negative error value in case of an error. > + * On error the partially applied effects are reverted. > */ > -int of_changeset_revert(struct of_changeset *ocs) > +int of_changeset_apply(struct of_changeset *ocs) > +{ > + int ret; > + > + mutex_lock(&of_mutex); > + ret = __of_changeset_apply(ocs); > + mutex_unlock(&of_mutex); > + > + return ret; > +} > +EXPORT_SYMBOL(of_changeset_apply); > + > +int __of_changeset_revert(struct of_changeset *ocs) > { > struct of_changeset_entry *ce; > int ret; > @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs) > } > > /** > + * of_changeset_revert - Reverts an applied changeset > + * > + * @ocs: changeset pointer > + * > + * Reverts a changeset returning the state of the tree to what it > + * was before the application. > + * Any side-effects like creation/destruction of devices and > + * removal of sysfs properties and directories are applied. > + * Returns 0 on success, a negative error value in case of an error. 
> + */ > +int of_changeset_revert(struct of_changeset *ocs) > +{ > + int ret; > + > + mutex_lock(&of_mutex); > + ret = __of_changeset_revert(ocs); > + mutex_unlock(&of_mutex); > + > + return ret; > +} > +EXPORT_SYMBOL(of_changeset_revert); > + > +/** > * of_changeset_action - Perform a changeset action > * > * @ocs: changeset pointer > @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, > unsigned long action, > list_add_tail(&ce->node, &ocs->entries); > return 0; > } > +EXPORT_SYMBOL(of_changeset_action); > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index 24e025f..804ea33 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c > @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree) > } > > /* apply the changeset */ > - err = of_changeset_apply(&ov->cset); > + err = __of_changeset_apply(&ov->cset); > if (err) { > - pr_err("%s: of_changeset_apply() failed for tree@%s\n", > + pr_err("%s: __of_changeset_apply() failed for tree@%s\n", > __func__, tree->full_name); > goto err_revert_overlay; > } > @@ -508,7 +508,7 @@ int of_overlay_destroy(int id) > > > list_del(&ov->node); > - of_changese
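The export split above follows a common kernel pattern: a public entry point that takes of_mutex itself, plus a double-underscore variant for callers (such as the overlay code) that already hold it. A self-checking userspace sketch of that pattern (the flag-based lock stands in for the real mutex):

```c
#include <assert.h>

static int lock_held;
static int applied;

static void of_mutex_lock(void)   { assert(!lock_held); lock_held = 1; }
static void of_mutex_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Core worker: of_mutex must already be held by the caller. */
static int __changeset_apply(void)
{
    assert(lock_held);
    applied = 1;
    return 0;
}

/* Public API: acquire the lock, delegate to the __ variant, release. */
static int changeset_apply(void)
{
    int ret;

    of_mutex_lock();
    ret = __changeset_apply();
    of_mutex_unlock();
    return ret;
}
```

Splitting the functions this way lets the patch export the locked variants to modules while internal overlay callers keep their existing locking discipline.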
Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode
On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote: >On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote: >>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be >>discrete. >> >>This patch restructures the patch to allocate discrete PE# for VFs when M64 >>BAR is set to Single PE mode. >> >>Signed-off-by: Wei Yang >>--- >> arch/powerpc/include/asm/pci-bridge.h |2 +- >> arch/powerpc/platforms/powernv/pci-ioda.c | 69 >> + >> 2 files changed, 51 insertions(+), 20 deletions(-) >> >>diff --git a/arch/powerpc/include/asm/pci-bridge.h >>b/arch/powerpc/include/asm/pci-bridge.h >>index 8aeba4c..72415c7 100644 >>--- a/arch/powerpc/include/asm/pci-bridge.h >>+++ b/arch/powerpc/include/asm/pci-bridge.h >>@@ -213,7 +213,7 @@ struct pci_dn { >> #ifdef CONFIG_PCI_IOV >> u16 vfs_expanded; /* number of VFs IOV BAR expanded */ >> u16 num_vfs;/* number of VFs enabled*/ >>- int offset; /* PE# for the first VF PE */ >>+ int *offset;/* PE# for the first VF PE or array */ >> boolm64_single_mode;/* Use M64 BAR in Single Mode */ >> #define IODA_INVALID_M64(-1) >> int (*m64_map)[PCI_SRIOV_NUM_BARS]; > >how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the >comments >I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. Though >not >all of them will be used, not too much memory will be wasted. > Thanks for your comment. I have thought about change the name to make it more self explain. While another fact I want to take in is this field is also used to be reflect the shift offset when M64 BAR is used in the Shared Mode. So I maintain the name. How about use "enum", one maintain the name "offset", and another one rename to "pe_num_map". And use the meaningful name at proper place? 
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>b/arch/powerpc/platforms/powernv/pci-ioda.c >>index 4042303..9953829 100644 >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, >>u16 num_vfs) >> >> /* Map the M64 here */ >> if (pdn->m64_single_mode) { >>- pe_num = pdn->offset + j; >>+ pe_num = pdn->offset[j]; >> rc = opal_pci_map_pe_mmio_window(phb->opal_id, >> pe_num, OPAL_M64_WINDOW_TYPE, >> pdn->m64_map[j][i], 0); >>@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev) >> struct pnv_phb*phb; >> struct pci_dn *pdn; >> struct pci_sriov *iov; >>- u16 num_vfs; >>+ u16num_vfs, i; >> >> bus = pdev->bus; >> hose = pci_bus_to_host(bus); >>@@ -1361,14 +1361,18 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev) >> >> if (phb->type == PNV_PHB_IODA2) { >> if (!pdn->m64_single_mode) >>- pnv_pci_vf_resource_shift(pdev, -pdn->offset); >>+ pnv_pci_vf_resource_shift(pdev, -*pdn->offset); >> >> /* Release M64 windows */ >> pnv_pci_vf_release_m64(pdev, num_vfs); >> >> /* Release PE numbers */ >>- bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs); >>- pdn->offset = 0; >>+ if (pdn->m64_single_mode) { >>+ for (i = 0; i < num_vfs; i++) >>+ pnv_ioda_free_pe(phb, pdn->offset[i]); >>+ } else >>+ bitmap_clear(phb->ioda.pe_alloc, *pdn->offset, num_vfs); >>+ kfree(pdn->offset); > >Can pnv_ioda_free_pe() be reused to release PE ? You mean use it to similar thing in pnv_ioda_deconfigure_pe()? 
> >> } >> } >> >>@@ -1394,7 +1398,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, >>u16 num_vfs) >> >> /* Reserve PE for each VF */ >> for (vf_index = 0; vf_index < num_vfs; vf_index++) { >>- pe_num = pdn->offset + vf_index; >>+ if (pdn->m64_single_mode) >>+ pe_num = pdn->offset[vf_index]; >>+ else >>+ pe_num = *pdn->offset + vf_index; >> >> pe = &phb->ioda.pe_array[pe_num]; >> pe->pe_number = pe_num; >>@@ -1436,6 +1443,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 >>num_vfs) >> struct pnv_phb*phb; >> struct pci_dn *pdn; >> intret; >>+ u16i; >> >> bus = pdev->bus; >> hose = pci_bus_to_host(bus); >>@@ -1462,19 +1470,38 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 >>num_vfs) >> } >> >> /* Calculate available PE for required VFs */ >>-
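The bookkeeping change under discussion — a scalar base PE# in shared mode versus one PE# per VF in single mode — can be sketched as follows (the field name follows the review suggestion; the values are made up):

```c
#include <assert.h>

/* In Single PE mode every VF gets its own, possibly discrete, PE
 * number, so an array of num_vfs entries is needed; in shared mode a
 * single base number plus the VF index suffices. */
struct vf_state {
    int single_mode;
    int *pe_num_map;   /* [num_vfs] in single mode, [1] otherwise */
};

static int pe_for_vf(const struct vf_state *s, int vf_index)
{
    if (s->single_mode)
        return s->pe_num_map[vf_index];
    return *s->pe_num_map + vf_index;
}
```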
Re: [V3] powerpc/irq: Enable some more exceptions in /proc/interrupts interface
On 08/04/2015 03:27 PM, Michael Ellerman wrote: > On Mon, 2015-13-07 at 08:16:06 UTC, Anshuman Khandual wrote: >> This patch enables facility unavailable exceptions for generic facility, >> FPU, ALTIVEC and VSX in /proc/interrupts listing by incrementing their >> newly added IRQ statistical counters as and when these exceptions happen. >> This also adds couple of helper functions which will be called from within >> the interrupt handler context to update their statistics. Similarly this >> patch also enables alignment and program check exceptions as well. > > ... > >> diff --git a/arch/powerpc/kernel/exceptions-64s.S >> b/arch/powerpc/kernel/exceptions-64s.S >> index 0a0399c2..a86180c 100644 >> --- a/arch/powerpc/kernel/exceptions-64s.S >> +++ b/arch/powerpc/kernel/exceptions-64s.S >> @@ -1158,6 +1158,7 @@ BEGIN_FTR_SECTION >> END_FTR_SECTION_IFSET(CPU_FTR_TM) >> #endif >> bl load_up_fpu >> +bl fpu_unav_exceptions_count > > Is it safe to call C code here? Hmm, is it not ? I had that question but was not really sure. Don't understand the difference between 'fast_exception_return' and 'ret_from_except' completely. Will converting the following sequence of code bl load_up_fpu + bl fpu_unav_exceptions_count b fast_exception_return into bl load_up_fpu RECONCILE_IRQ_STATE(r10, r11) addi r3,r1,STACK_FRAME_OVERHEAD + bl fpu_unav_exceptions_count b ret_from_except help solve the problem ? ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
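Whatever the right return path turns out to be, the C helper itself is just a statistics bump. A userspace model of the counters the patch adds (the real helpers run in interrupt context, and the enum names here are illustrative):

```c
#include <assert.h>

/* Illustrative exception indices for the /proc/interrupts counters. */
enum { EXC_FPU_UNAVAIL, EXC_ALTIVEC_UNAVAIL, EXC_VSX_UNAVAIL, EXC_MAX };

static unsigned long exc_count[EXC_MAX];

/* Analogue of fpu_unav_exceptions_count(): bump the stat and return;
 * the kernel version would update a per-CPU counter instead. */
static void fpu_unav_exceptions_count(void)
{
    exc_count[EXC_FPU_UNAVAIL]++;
}
```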
[PATCH] powerpc/prom: Use DRCONF flags while processing detected LMBs
This patch just replaces hard coded values with the existing DRCONF flags
while processing detected LMBs from the device tree. This does not change
any functionality.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/kernel/prom.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 8b888b1..70a8cab 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -476,9 +476,10 @@ static int __init early_init_dt_scan_drconf_memory(unsigned long node)
 		flags = of_read_number(&dm[3], 1);
 		/* skip DRC index, pad, assoc. list index, flags */
 		dm += 4;
-		/* skip this block if the reserved bit is set in flags (0x80)
-		   or if the block is not assigned to this partition (0x8) */
-		if ((flags & 0x80) || !(flags & 0x8))
+		/* skip this block if the reserved bit is set in flags
+		   or if the block is not assigned to this partition */
+		if ((flags & DRCONF_MEM_RESERVED) ||
+		    !(flags & DRCONF_MEM_ASSIGNED))
 			continue;
 		size = memblock_size;
 		rngs = 1;
--
2.1.0
Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface
On Sun, Jul 26, 2015 at 10:40:37PM -0700, Sukadev Bhattiprolu wrote: > @@ -3743,7 +3762,13 @@ static u64 perf_event_aggregate(struct perf_event > *event, u64 *enabled, > lockdep_assert_held(&event->child_mutex); > > list_for_each_entry(child, &event->child_list, child_list) { > +#if 0 > + /* > + * TODO: Do we need this read() for group events on PMUs that > + * don't implement PERF_PMU_TXN_READ transactions? > + */ > (void)perf_event_read(child, false); > +#endif > total += perf_event_count(child); > *enabled += child->total_time_enabled; > *running += child->total_time_running; Aw gawd, I've been an idiot!! I just realized this is a _CHILD_ loop, not a _SIBLING_ loop !! We need to flip the loops in perf_read_group(), find attached two patches that go on top of 1,2,4. After this you can add the perf_event_read() return value (just fold patches 6,8) after which you can do patch 10 (which has a broken Subject fwiw). Subject: perf: Add group reads to perf_event_read() From: Peter Zijlstra Date: Thu, 23 Jul 2015 10:04:35 +0200 Enable perf_event_read() to update entire groups at once, this will be useful for read transactions. 
Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Michael Ellerman Cc: Sukadev Bhattiprolu Signed-off-by: Peter Zijlstra (Intel) Link: http://lkml.kernel.org/r/20150723080435.ge25...@twins.programming.kicks-ass.net --- kernel/events/core.c | 39 --- 1 file changed, 32 insertions(+), 7 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3184,12 +3184,18 @@ void perf_event_exec(void) rcu_read_unlock(); } +struct perf_read_data { + struct perf_event *event; + bool group; +}; + /* * Cross CPU call to read the hardware event */ static void __perf_event_read(void *info) { - struct perf_event *event = info; + struct perf_read_data *data = info; + struct perf_event *sub, *event = data->event; struct perf_event_context *ctx = event->ctx; struct perf_cpu_context *cpuctx = __get_cpu_context(ctx); @@ -3208,9 +3214,21 @@ static void __perf_event_read(void *info update_context_time(ctx); update_cgrp_time_from_event(event); } + update_event_times(event); if (event->state == PERF_EVENT_STATE_ACTIVE) event->pmu->read(event); + + if (!data->group) + goto unlock; + + list_for_each_entry(sub, &event->sibling_list, group_entry) { + update_event_times(sub); + if (sub->state == PERF_EVENT_STATE_ACTIVE) + sub->pmu->read(sub); + } + +unlock: raw_spin_unlock(&ctx->lock); } @@ -3222,15 +3240,19 @@ static inline u64 perf_event_count(struc return __perf_event_count(event); } -static void perf_event_read(struct perf_event *event) +static void perf_event_read(struct perf_event *event, bool group) { /* * If event is enabled and currently active on a CPU, update the * value in the event structure: */ if (event->state == PERF_EVENT_STATE_ACTIVE) { + struct perf_read_data data = { + .event = event, + .group = group, + }; smp_call_function_single(event->oncpu, -__perf_event_read, event, 1); +__perf_event_read, &data, 1); } else if (event->state == PERF_EVENT_STATE_INACTIVE) { struct perf_event_context *ctx = event->ctx; unsigned long flags; @@ -3245,7 +3267,10 @@ static void 
perf_event_read(struct perf_ update_context_time(ctx); update_cgrp_time_from_event(event); } - update_event_times(event); + if (group) + update_group_times(event); + else + update_event_times(event); raw_spin_unlock_irqrestore(&ctx->lock, flags); } } @@ -3764,7 +3789,7 @@ u64 perf_event_read_value(struct perf_ev mutex_lock(&event->child_mutex); - perf_event_read(event); + perf_event_read(event, false); total += perf_event_count(event); *enabled += event->total_time_enabled + @@ -3773,7 +3798,7 @@ u64 perf_event_read_value(struct perf_ev atomic64_read(&event->child_total_time_running); list_for_each_entry(child, &event->child_list, child_list) { - perf_event_read(child); + perf_event_read(child, false); total += perf_event_count(child); *enabled += child->total_time_enabled; *running += child->total_time_running; @@ -3934,7 +3959,7 @@ static unsigned i
Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies
On 06/08/15 12:06, Stefano Stabellini wrote: > On Thu, 6 Aug 2015, Julien Grall wrote: >> Hi, >> >> >> On 04/08/15 19:12, Julien Grall wrote: >>> diff --git a/include/xen/page.h b/include/xen/page.h >>> index c5ed20b..e7e1425 100644 >>> --- a/include/xen/page.h >>> +++ b/include/xen/page.h >>> @@ -3,9 +3,9 @@ >>> >>> #include >>> >>> -static inline unsigned long page_to_mfn(struct page *page) >>> +static inline unsigned long page_to_gfn(struct page *page) >>> { >>> - return pfn_to_mfn(page_to_pfn(page)); >>> + return pfn_to_gfn(page_to_pfn(page)); >>> } >> >> I've just noticed that there is a function gfn_to_page used for KVM. >> >> Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion >> with KVM one? > > Yeah, prepending xen would help to avoid namespace pollution. Will do. May I keep your Reviewed-by for this mechanical change? Regards, -- Julien Grall ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On Thu, Aug 06, 2015 at 05:47:42PM +1000, Alexey Kardashevskiy wrote: >On 08/06/2015 04:57 PM, Gavin Shan wrote: >>On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote: >>>On 08/06/2015 02:35 PM, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: >On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If >a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from >M64 windwo, which means M64 BAR can't work on it. > > >The proper text would be something like this: > >=== >SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we cannot >enable the device. >=== > > > s/PHB_IODA2/PHB3 >>> >>> >>>No, it is IODA2. OPEL does PHB3-specific bits, the host kernel just uses >>>OPAL. >>> >> >>Ok. >> >>> s/windwo/window >This patch makes this explicit. > >Signed-off-by: Wei Yang The idea sounds right, but there is one question as below. >--- >arch/powerpc/platforms/powernv/pci-ioda.c | 25 + >1 file changed, 9 insertions(+), 16 deletions(-) > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >b/arch/powerpc/platforms/powernv/pci-ioda.c >index 5738d31..9b41dba 100644 >--- a/arch/powerpc/platforms/powernv/pci-ioda.c >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >*dev, int offset) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > /* >* The actual IOV BAR range is determined by the start address >* and the actual size for num_vfs VFs BAR. 
This check is to >@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev >*dev, int offset) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > res2 = *res; > res->start += size * offset; >@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev >*pdev, u16 num_vfs) > if (!res->flags || !res->parent) > continue; > >- if (!pnv_pci_is_mem_pref_64(res->flags)) >- continue; >- > for (j = 0; j < vf_groups; j++) { > do { > win = > find_next_zero_bit(&phb->ioda.m64_bar_alloc, >@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 >num_vfs) > pdn = pci_get_pdn(pdev); > > if (phb->type == PNV_PHB_IODA2) { >+ if (!pdn->vfs_expanded) { >+ dev_info(&pdev->dev, "don't support this SRIOV device" >+ " with non M64 VF BAR\n"); >+ return -EBUSY; >+ } >+ It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily unavailable. For this case, the VFs are permanently unavailable because of running out of space to accomodate M64 and non-M64 VF BARs. The error message could be printed with dev_warn() and it would be precise as below or something else you prefer: dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); >>> >>> >>>Both messages are cryptic. >>> >>>If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in >>>the worst case - BAR#15?), the difference is if it is segmented or not, no? >>> >> >>The VF BAR could be one of IO, M32, M64. If it's not M64, the VFs are supposed >>to be disabled and the (IO and M32) resources won't be allocted, but for sure, >>the IO and M32 resources can't be put into any one of the 16 PHB's M64 BARs. >>would you recommend one better message then? > > > >dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit MMIO >window\n"); > >Or it is not "MMIO window"? 
> It's a confusing message: It's not related to space and M64 window. When any VF BAR is IO or M32, we just give up attempting to allocate resources for it. I still think my original message is enough or similarly below one: dev_warn(&pdev->dev, "Disabled SRIOV because of non-M64 BAR" >> > /* Calculate available PE for required VFs */ > mutex_lock(&phb->ioda.pe_alloc_mutex); > pdn->offset = bitmap_find_next_zero_area( >@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct >pci_dev *pdev) > if (!res->flags || res->parent) > continue; > if (!pnv_pci_is_mem_pref_64(res->flags)) { >- dev_warn(&pdev->dev, " n
Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies
On Thu, 6 Aug 2015, Julien Grall wrote: > Hi, > > > On 04/08/15 19:12, Julien Grall wrote: > > diff --git a/include/xen/page.h b/include/xen/page.h > > index c5ed20b..e7e1425 100644 > > --- a/include/xen/page.h > > +++ b/include/xen/page.h > > @@ -3,9 +3,9 @@ > > > > #include > > > > -static inline unsigned long page_to_mfn(struct page *page) > > +static inline unsigned long page_to_gfn(struct page *page) > > { > > - return pfn_to_mfn(page_to_pfn(page)); > > + return pfn_to_gfn(page_to_pfn(page)); > > } > > I've just noticed that there is a function gfn_to_page used for KVM. > > Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion > with KVM one? Yeah, prepending xen would help to avoid namespace pollution. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies
Hi, On 04/08/15 19:12, Julien Grall wrote: > diff --git a/include/xen/page.h b/include/xen/page.h > index c5ed20b..e7e1425 100644 > --- a/include/xen/page.h > +++ b/include/xen/page.h > @@ -3,9 +3,9 @@ > > #include > > -static inline unsigned long page_to_mfn(struct page *page) > +static inline unsigned long page_to_gfn(struct page *page) > { > - return pfn_to_mfn(page_to_pfn(page)); > + return pfn_to_gfn(page_to_pfn(page)); > } I've just noticed that there is a function gfn_to_page used for KVM. Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion with KVM one? Regards, -- Julien Grall ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource
On 08/06/2015 07:41 PM, Wei Yang wrote: On Thu, Aug 06, 2015 at 07:00:00PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:51 PM, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote: The alignment of IOV BAR on PowerNV platform is the total size of the IOV BAR. No matter whether the IOV BAR is truncated or not, the total size could be calculated by (vfs_expanded * VF size). s/VF size/VF BAR size I think the changelog would be more explicit: The alignment of IOV BAR on PowerNV platform is the total size of the IOV BAR, no matter whether the IOV BAR is extended with number of max VFs or number of max PE number (256). The alignment can be calculated by (vfs_expaned * VF_BAR_size). Is that really a PowerNV-specific requirement or it is valid for every platform (I suspect this is the case here)? Currently, it is PowerNV-specific. How is x86 different on this matter? Why would we need this extra alignment, not just VF's BAR alignment? Also, what is the exact meaning of "expanded" in @vfs_expanded? It is either 255 (if individual VF BARs are <= 64MB) or roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What is expanded here? PF's IOV BAR original size is (VF BAR size * total_vfs). After expanding, the IOV BAR size is (VF BAR size * 256) or (VF BAR size * roundup_pow_of_two(total_vfs)). Ufff, got it now. I'd store just an expanded IOV BAR size (not some magic VFs number) because this is what it actually is: pdn->vfs_expanded * align This patch simplifies the pnv_pci_iov_resource_alignment() by removing the first case. 
Signed-off-by: Wei Yang Reviewed-by: Gavin Shan --- arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 9b41dba..7192e62 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2987,12 +2987,16 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev, int resno) { struct pci_dn *pdn = pci_get_pdn(pdev); - resource_size_t align, iov_align; - - iov_align = resource_size(&pdev->resource[resno]); - if (iov_align) - return iov_align; + resource_size_t align; + /* +* On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the +* SR-IOV. While from hardware perspective, the range mapped by M64 +* BAR should be size aligned. +* +* This function return the total IOV BAR size if expanded or just the +* individual size if not. +*/ align = pci_iov_resource_size(pdev, resno); if (pdn->vfs_expanded) return pdn->vfs_expanded * align; -- 1.7.9.5 -- Alexey -- Alexey ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling
Hi, I'd also like to see this patch in the mainstream as it fixes a bug appearing when we switch from vCPU context to hypervisor context (guest crash). Laurent On 06/08/2015 03:25, Sam Bobroff wrote: > Ping? > > I think I've addressed all the comments in this version. Is there anything > else > I need to look at? > > Cheers, > Sam. > > On Wed, May 27, 2015 at 09:56:57AM +1000, Sam Bobroff wrote: >> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 >> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is >> accessed as such. >> >> This patch corrects places where it is accessed as a 32 bit field by a >> 64 bit kernel. In some cases this is via a 32 bit load or store >> instruction which, depending on endianness, will cause either the >> lower or upper 32 bits to be missed. In another case it is cast as a >> u32, causing the upper 32 bits to be cleared. >> >> This patch corrects those places by extending the access methods to >> 64 bits. >> >> Signed-off-by: Sam Bobroff >> --- >> >> v3: >> Adjust booke set/get xer to match book3s. >> >> v2: >> >> Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit. 
>> >> arch/powerpc/include/asm/kvm_book3s.h |4 ++-- >> arch/powerpc/include/asm/kvm_book3s_asm.h |2 +- >> arch/powerpc/include/asm/kvm_booke.h |4 ++-- >> arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- >> arch/powerpc/kvm/book3s_segment.S |4 ++-- >> 5 files changed, 10 insertions(+), 10 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/kvm_book3s.h >> b/arch/powerpc/include/asm/kvm_book3s.h >> index b91e74a..05a875a 100644 >> --- a/arch/powerpc/include/asm/kvm_book3s.h >> +++ b/arch/powerpc/include/asm/kvm_book3s.h >> @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) >> return vcpu->arch.cr; >> } >> >> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) >> +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) >> { >> vcpu->arch.xer = val; >> } >> >> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) >> +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) >> { >> return vcpu->arch.xer; >> } >> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h >> b/arch/powerpc/include/asm/kvm_book3s_asm.h >> index 5bdfb5d..c4ccd2d 100644 >> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h >> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h >> @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu { >> bool in_use; >> ulong gpr[14]; >> u32 cr; >> -u32 xer; >> +ulong xer; >> ulong ctr; >> ulong lr; >> ulong pc; >> diff --git a/arch/powerpc/include/asm/kvm_booke.h >> b/arch/powerpc/include/asm/kvm_booke.h >> index 3286f0d..bc6e29e 100644 >> --- a/arch/powerpc/include/asm/kvm_booke.h >> +++ b/arch/powerpc/include/asm/kvm_booke.h >> @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) >> return vcpu->arch.cr; >> } >> >> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) >> +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) >> { >> vcpu->arch.xer = val; >> } >> >> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) >> +static inline ulong 
kvmppc_get_xer(struct kvm_vcpu *vcpu) >> { >> return vcpu->arch.xer; >> } >> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> index 4d70df2..d75be59 100644 >> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) >> blt hdec_soon >> >> ld r6, VCPU_CTR(r4) >> -lwz r7, VCPU_XER(r4) >> +ld r7, VCPU_XER(r4) >> >> mtctr r6 >> mtxer r7 >> @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) >> mfctr r3 >> mfxer r4 >> std r3, VCPU_CTR(r9) >> -stw r4, VCPU_XER(r9) >> +std r4, VCPU_XER(r9) >> >> /* If this is a page table miss then see if it's theirs or ours */ >> cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE >> @@ -1675,7 +1675,7 @@ kvmppc_hdsi: >> bl kvmppc_msr_interrupt >> fast_interrupt_c_return: >> 6: ld r7, VCPU_CTR(r9) >> -lwz r8, VCPU_XER(r9) >> +ld r8, VCPU_XER(r9) >> mtctr r7 >> mtxer r8 >> mr r4, r9 >> diff --git a/arch/powerpc/kvm/book3s_segment.S >> b/arch/powerpc/kvm/book3s_segment.S >> index acee37c..ca8f174 100644 >> --- a/arch/powerpc/kvm/book3s_segment.S >> +++ b/arch/powerpc/kvm/book3s_segment.S >> @@ -123,7 +123,7 @@ no_dcbz32_on: >> PPC_LL r8, SVCPU_CTR(r3) >> PPC_LL r9, SVCPU_LR(r3) >> lwz r10, SVCPU_CR(r3) >> -lwz r11, SVCPU_XER(r3) >> +PPC_LL r11, SVCPU_XER(r3) >> >> mtctr r8 >> mtlrr9 >> @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) >> mfctr r8 >> mflrr9 >> >> -stw
Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
On Thu, Aug 06, 2015 at 05:36:02PM +0800, Wei Yang wrote: >On Thu, Aug 06, 2015 at 03:20:25PM +1000, Gavin Shan wrote: >>On Wed, Aug 05, 2015 at 09:25:00AM +0800, Wei Yang wrote: >>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64 >>>BAR in Single PE mode to cover the number of VFs required to be enabled. >>>By doing so, several VFs would be in one VF Group and leads to interference >>>between VFs in the same group. >>> >>>This patch changes the design by using one M64 BAR in Single PE mode for >>>one VF BAR. This gives absolute isolation for VFs. >>> >>>Signed-off-by: Wei Yang >>>--- >>> arch/powerpc/include/asm/pci-bridge.h |5 +- >>> arch/powerpc/platforms/powernv/pci-ioda.c | 180 >>> - >>> 2 files changed, 76 insertions(+), 109 deletions(-) >>> >>>diff --git a/arch/powerpc/include/asm/pci-bridge.h >>>b/arch/powerpc/include/asm/pci-bridge.h >>>index 712add5..8aeba4c 100644 >>>--- a/arch/powerpc/include/asm/pci-bridge.h >>>+++ b/arch/powerpc/include/asm/pci-bridge.h >>>@@ -214,10 +214,9 @@ struct pci_dn { >>> u16 vfs_expanded; /* number of VFs IOV BAR expanded */ >>> u16 num_vfs;/* number of VFs enabled*/ >>> int offset; /* PE# for the first VF PE */ >>>-#define M64_PER_IOV 4 >>>-int m64_per_iov; >>>+boolm64_single_mode;/* Use M64 BAR in Single Mode */ >>> #define IODA_INVALID_M64(-1) >>>-int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; >>>+int (*m64_map)[PCI_SRIOV_NUM_BARS]; >> >>It can be explicit? 
For example: >> >> int *m64_map; >> >> /* Initialization */ >> size_t size = sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * >> num_of_max_VFs; >> pdn->m64_map = kmalloc(size, GFP_KERNEL); >> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) >> for (j = 0; j < num_of_max_VFs; j++) >> pdn->m64_map[i * PCI_SRIOV_NUM_BARS + j] = >> PNV_INVALID_M64; >> >> /* Destroy */ >> int step = 1; >> >> if (!pdn->m64_single_mode) >> step = phb->ioda.total_pe; >> for (i = 0; i < PCI_SRIOV_NUM_BARS * num_of_max_VFs; i += step) >> if (pdn->m64_map[i] == PNV_INVALID_M64) >> continue; >> >> /* Unmap the window */ >> > >The m64_map is a pointer to an array with 6 elements, which represents the 6 >M64 BAR index for the 6 VF BARs. > >When we use Shared Mode, one array is allocated. The six elements >represents the six M64 BAR(at most) used to map the whole IOV BAR. > >When we use Single Mode, num_vfs array is allocate. Each array represents >the map between one VF's BAR and M64 BAR index. > >During the map and un-map, M64 BAR is assigned one by one in VF BAR's order. >So I think the code is explicit. > >In your code, you allocate a big one dimension array to hold the M64 BAR >index. It works, while I don't think this is more explicit than original code. > When M64 is in Single Mode, array with (num_vfs * 6) entries is allocated because every VF BAR (6 at most) will have one corresponding PHB M64 BAR. Anything I missed? The point in my code is you needn't worry about the mode (single vs shared) As I said, not too much memory wasted. However, it's up to you. I'm not fan of "int (*m64_map)[PCI_SRIOV_NUM_BARS]". Instead, you can replace it with "int *m64_map" and calculate its size using following formula: sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS; sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * num_vfs; >-- >Richard Yang >Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
On 08/05/2015 11:25 AM, Wei Yang wrote: In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64 BAR in Single PE mode to cover the number of VFs required to be enabled. By doing so, several VFs would be in one VF Group and leads to interference between VFs in the same group. This patch changes the design by using one M64 BAR in Single PE mode for one VF BAR. This gives absolute isolation for VFs. Signed-off-by: Wei Yang --- arch/powerpc/include/asm/pci-bridge.h |5 +- arch/powerpc/platforms/powernv/pci-ioda.c | 180 - 2 files changed, 76 insertions(+), 109 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 712add5..8aeba4c 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -214,10 +214,9 @@ struct pci_dn { u16 vfs_expanded; /* number of VFs IOV BAR expanded */ u16 num_vfs;/* number of VFs enabled*/ int offset; /* PE# for the first VF PE */ -#define M64_PER_IOV 4 - int m64_per_iov; + boolm64_single_mode;/* Use M64 BAR in Single Mode */ #define IODA_INVALID_M64(-1) - int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; + int (*m64_map)[PCI_SRIOV_NUM_BARS]; #endif /* CONFIG_PCI_IOV */ #endif struct list_head child_list; diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 7192e62..f5d110c 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void) } #ifdef CONFIG_PCI_IOV -static int pnv_pci_vf_release_m64(struct pci_dev *pdev) +static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs) { struct pci_bus*bus; struct pci_controller *hose; struct pnv_phb*phb; struct pci_dn *pdn; inti, j; + intm64_bars; bus = pdev->bus; hose = pci_bus_to_host(bus); phb = hose->private_data; pdn = pci_get_pdn(pdev); + if (pdn->m64_single_mode) + m64_bars = num_vfs; + else + m64_bars = 1; + for (i = 0; i < 
PCI_SRIOV_NUM_BARS; i++) - for (j = 0; j < M64_PER_IOV; j++) { - if (pdn->m64_wins[i][j] == IODA_INVALID_M64) + for (j = 0; j < m64_bars; j++) { + if (pdn->m64_map[j][i] == IODA_INVALID_M64) continue; opal_pci_phb_mmio_enable(phb->opal_id, - OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0); - clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc); - pdn->m64_wins[i][j] = IODA_INVALID_M64; + OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0); + clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc); + pdn->m64_map[j][i] = IODA_INVALID_M64; } + kfree(pdn->m64_map); return 0; } @@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) inttotal_vfs; resource_size_tsize, start; intpe_num; - intvf_groups; - intvf_per_group; + intm64_bars; bus = pdev->bus; hose = pci_bus_to_host(bus); @@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) pdn = pci_get_pdn(pdev); total_vfs = pci_sriov_get_totalvfs(pdev); - /* Initialize the m64_wins to IODA_INVALID_M64 */ - for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) - for (j = 0; j < M64_PER_IOV; j++) - pdn->m64_wins[i][j] = IODA_INVALID_M64; + if (pdn->m64_single_mode) This is a physical function's @pdn, right? + m64_bars = num_vfs; + else + m64_bars = 1; + + pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL); Assume we have SRIOV device with 16VF. So it was m64_wins[6][4], now it is (roughly speaking) m64_map[6][16] (for a single PE mode) or m64_map[6][1]. I believe m64_bars cannot be bigger than 16 on PHB3, right? Is this checked anywhere (does it have to)? This m64_wins -> m64_map change - is was not a map (what was it?), and it is, is not it? What does it store? An index of M64 BAR (0..15)? + if (!pdn->m64_map) + return -ENOMEM; + /* Initialize the m64_map to IODA_INVALID_M64 */ + for (i = 0; i < m64_bars ; i++) + for (j = 0; j < PCI_SRIOV_NUM_BARS; j++) + pdn->m64_map[i][j] = IODA_INV
Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource
On Thu, Aug 06, 2015 at 07:00:00PM +1000, Alexey Kardashevskiy wrote: >On 08/06/2015 02:51 PM, Gavin Shan wrote: >>On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote: >>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV >>>BAR. No matter whether the IOV BAR is truncated or not, the total size >>>could be calculated by (vfs_expanded * VF size). >>> >> >>s/VF size/VF BAR size >> >>I think the changelog would be more explicit: >> >>The alignment of IOV BAR on PowerNV platform is the total size of the >>IOV BAR, no matter whether the IOV BAR is extended with number of max >>VFs or number of max PE number (256). The alignment can be calculated >>by (vfs_expaned * VF_BAR_size). > > > >Is that really a PowerNV-specific requirement or it is valid for >every platform (I suspect this is the case here)? > Currently, it is PowerNV-specific. > >Also, what is the exact meaning of "expanded" in @vfs_expanded? It is >either 255 (if individual VF BARs are <= 64MB) or >roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What >is expanded here? > PF's IOV BAR original size is (VF BAR size * total_vfs). After expanding, the IOV BAR size is (VF BAR size * 256) or (VF BAR size * roundup_pow_of_two(total_vfs)). > >> >>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the >>>first case. 
>>> >>>Signed-off-by: Wei Yang >> >>Reviewed-by: Gavin Shan >> >>>--- >>>arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- >>>1 file changed, 9 insertions(+), 5 deletions(-) >>> >>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>>b/arch/powerpc/platforms/powernv/pci-ioda.c >>>index 9b41dba..7192e62 100644 >>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>@@ -2987,12 +2987,16 @@ static resource_size_t >>>pnv_pci_iov_resource_alignment(struct pci_dev *pdev, >>> int resno) >>>{ >>> struct pci_dn *pdn = pci_get_pdn(pdev); >>>-resource_size_t align, iov_align; >>>- >>>-iov_align = resource_size(&pdev->resource[resno]); >>>-if (iov_align) >>>-return iov_align; >>>+resource_size_t align; >>> >>>+/* >>>+ * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the >>>+ * SR-IOV. While from hardware perspective, the range mapped by M64 >>>+ * BAR should be size aligned. >>>+ * >>>+ * This function return the total IOV BAR size if expanded or just the >>>+ * individual size if not. >>>+ */ >>> align = pci_iov_resource_size(pdev, resno); >>> if (pdn->vfs_expanded) >>> return pdn->vfs_expanded * align; >>>-- >>>1.7.9.5 >>> >> > > >-- >Alexey -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
On Thu, Aug 06, 2015 at 03:20:25PM +1000, Gavin Shan wrote: >On Wed, Aug 05, 2015 at 09:25:00AM +0800, Wei Yang wrote: >>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64 >>BAR in Single PE mode to cover the number of VFs required to be enabled. >>By doing so, several VFs would be in one VF Group and leads to interference >>between VFs in the same group. >> >>This patch changes the design by using one M64 BAR in Single PE mode for >>one VF BAR. This gives absolute isolation for VFs. >> >>Signed-off-by: Wei Yang >>--- >> arch/powerpc/include/asm/pci-bridge.h |5 +- >> arch/powerpc/platforms/powernv/pci-ioda.c | 180 >> - >> 2 files changed, 76 insertions(+), 109 deletions(-) >> >>diff --git a/arch/powerpc/include/asm/pci-bridge.h >>b/arch/powerpc/include/asm/pci-bridge.h >>index 712add5..8aeba4c 100644 >>--- a/arch/powerpc/include/asm/pci-bridge.h >>+++ b/arch/powerpc/include/asm/pci-bridge.h >>@@ -214,10 +214,9 @@ struct pci_dn { >> u16 vfs_expanded; /* number of VFs IOV BAR expanded */ >> u16 num_vfs;/* number of VFs enabled*/ >> int offset; /* PE# for the first VF PE */ >>-#define M64_PER_IOV 4 >>- int m64_per_iov; >>+ boolm64_single_mode;/* Use M64 BAR in Single Mode */ >> #define IODA_INVALID_M64(-1) >>- int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]; >>+ int (*m64_map)[PCI_SRIOV_NUM_BARS]; > >It can be explicit? 
For example: > > int *m64_map; > > /* Initialization */ > size_t size = sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * > num_of_max_VFs; > pdn->m64_map = kmalloc(size, GFP_KERNEL); > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) > for (j = 0; j < num_of_max_VFs; j++) > pdn->m64_map[i * PCI_SRIOV_NUM_BARS + j] = > PNV_INVALID_M64; > > /* Destroy */ > int step = 1; > > if (!pdn->m64_single_mode) > step = phb->ioda.total_pe; > for (i = 0; i < PCI_SRIOV_NUM_BARS * num_of_max_VFs; i += step) > if (pdn->m64_map[i] == PNV_INVALID_M64) > continue; > > /* Unmap the window */ > The m64_map is a pointer to an array with 6 elements, which represents the 6 M64 BAR index for the 6 VF BARs. When we use Shared Mode, one array is allocated. The six elements represents the six M64 BAR(at most) used to map the whole IOV BAR. When we use Single Mode, num_vfs array is allocate. Each array represents the map between one VF's BAR and M64 BAR index. During the map and un-map, M64 BAR is assigned one by one in VF BAR's order. So I think the code is explicit. In your code, you allocate a big one dimension array to hold the M64 BAR index. It works, while I don't think this is more explicit than original code. -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource
On 08/06/2015 02:51 PM, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote: The alignment of IOV BAR on PowerNV platform is the total size of the IOV BAR. No matter whether the IOV BAR is truncated or not, the total size could be calculated by (vfs_expanded * VF size). s/VF size/VF BAR size I think the changelog would be more explicit: The alignment of IOV BAR on PowerNV platform is the total size of the IOV BAR, no matter whether the IOV BAR is extended with the number of max VFs or the max PE number (256). The alignment can be calculated by (vfs_expanded * VF_BAR_size). Is that really a PowerNV-specific requirement or is it valid for every platform (I suspect this is the case here)? Also, what is the exact meaning of "expanded" in @vfs_expanded? It is either 255 (if individual VF BARs are <= 64MB) or roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What is expanded here? This patch simplifies pnv_pci_iov_resource_alignment() by removing the first case. Signed-off-by: Wei Yang Reviewed-by: Gavin Shan --- arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 9b41dba..7192e62 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2987,12 +2987,16 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev, int resno) { struct pci_dn *pdn = pci_get_pdn(pdev); - resource_size_t align, iov_align; - - iov_align = resource_size(&pdev->resource[resno]); - if (iov_align) - return iov_align; + resource_size_t align; + /* +* On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the +* SR-IOV. While from hardware perspective, the range mapped by M64 +* BAR should be size aligned. +* +* This function returns the total IOV BAR size if expanded or just the individual size if not. 
+*/ align = pci_iov_resource_size(pdev, resno); if (pdn->vfs_expanded) return pdn->vfs_expanded * align; -- 1.7.9.5 -- Alexey
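The alignment rule debated above can be sketched in isolation: the reported alignment is the whole (expanded) IOV BAR size, i.e. `vfs_expanded` times the single-VF BAR size, falling back to the per-VF size when the BAR was not expanded. This is a userspace model with assumed parameter names, not the kernel function itself.

```c
typedef unsigned long long resource_size_t;

/* Model of pnv_pci_iov_resource_alignment() after the quoted patch:
 * the M64 BAR that maps the IOV BAR requires the whole window to be
 * size-aligned, so the alignment is the total expanded size. */
static resource_size_t iov_resource_alignment(unsigned int vfs_expanded,
					      resource_size_t vf_bar_size)
{
	if (vfs_expanded)
		return (resource_size_t)vfs_expanded * vf_bar_size;
	return vf_bar_size;	/* not expanded: per-VF size is enough */
}
```

So with `vfs_expanded = 256` and a 1MB per-VF BAR, the required alignment is 256MB, which is exactly the "total IOV BAR size" the changelog describes.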
Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR
On 08/06/2015 04:57 PM, Gavin Shan wrote: On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote: On 08/06/2015 02:35 PM, Gavin Shan wrote: On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote: On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from M64 windwo, which means M64 BAR can't work on it. The proper text would be something like this: === SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we cannot enable the device. === s/PHB_IODA2/PHB3 No, it is IODA2. OPAL does PHB3-specific bits, the host kernel just uses OPAL. Ok. s/windwo/window This patch makes this explicit. Signed-off-by: Wei Yang The idea sounds right, but there is one question as below. --- arch/powerpc/platforms/powernv/pci-ioda.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 5738d31..9b41dba 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - /* * The actual IOV BAR range is determined by the start address * and the actual size for num_vfs VFs BAR. 
This check is to @@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); res2 = *res; res->start += size * offset; @@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs) if (!res->flags || !res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) - continue; - for (j = 0; j < vf_groups; j++) { do { win = find_next_zero_bit(&phb->ioda.m64_bar_alloc, @@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs) pdn = pci_get_pdn(pdev); if (phb->type == PNV_PHB_IODA2) { + if (!pdn->vfs_expanded) { + dev_info(&pdev->dev, "don't support this SRIOV device" + " with non M64 VF BAR\n"); + return -EBUSY; + } + It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temporarily unavailable. For this case, the VFs are permanently unavailable because of running out of space to accommodate M64 and non-M64 VF BARs. The error message could be printed with dev_warn() and it would be precise as below or something else you prefer: dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n"); Both messages are cryptic. If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in the worst case - BAR#15?), the difference is if it is segmented or not, no? The VF BAR could be one of IO, M32, M64. If it's not M64, the VFs are supposed to be disabled and the (IO and M32) resources won't be allocated, but for sure, the IO and M32 resources can't be put into any one of the 16 PHB's M64 BARs. would you recommend a better message then? dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit MMIO window\n"); Or it is not "MMIO window"? 
/* Calculate available PE for required VFs */ mutex_lock(&phb->ioda.pe_alloc_mutex); pdn->offset = bitmap_find_next_zero_area( @@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) if (!res->flags || res->parent) continue; if (!pnv_pci_is_mem_pref_64(res->flags)) { - dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n", + dev_warn(&pdev->dev, "Don't support SR-IOV with" + " non M64 VF BAR%d: %pR. \n", i, res); - continue; + return; } size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES); @@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev) res = &pdev->resource[i + PCI_IOV_RESOURCES]; if (!res->flags || res->parent) continue; - if (!pnv_pci_is_mem_pref_64(res->flags)) { -
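The hunk above reserves the VF PE numbers with `bitmap_find_next_zero_area()` under `pe_alloc_mutex`. A minimal userspace model of that search (one byte per PE instead of a real bitmap, names invented for illustration) shows the contiguous-range semantics the caller relies on:

```c
/* Scan an allocation map for nr consecutive free PEs; return the first
 * index of such a range, or -1 when the PE space is exhausted (the real
 * caller would then fail sriov_enable with an error such as -ENOSPC). */
static int find_free_pe_range(const char *alloc, int total_pe, int nr)
{
	int start, i;

	for (start = 0; start + nr <= total_pe; start++) {
		for (i = 0; i < nr; i++)
			if (alloc[start + i])
				break;		/* PE in use: try next start */
		if (i == nr)
			return start;		/* nr contiguous free PEs found */
	}
	return -1;
}
```

This is why the code computes a single `offset` for all VF PEs in Shared Mode: the PE numbers are guaranteed contiguous, so `PE# = offset + vf_index` holds for every VF.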
Re: [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node()
On 07/30/2015 07:41 PM, Johannes Weiner wrote: On Thu, Jul 30, 2015 at 06:34:31PM +0200, Vlastimil Babka wrote: numa_mem_id() is able to handle allocation from CPUs on memory-less nodes, so it's a more robust fallback than the currently used numa_node_id(). Won't it fall through to the next closest memory node in the zonelist anyway? Right, I would expect the zonelist of a memoryless node to be the same as of the closest node. Documentation/vm/numa seems to agree. Is this for callers doing NUMA_NO_NODE with __GFP_THISNODE? I guess that's the only scenario where that matters, yeah. And there might well be no such caller now, but maybe some will sneak in without the author testing on a system with a memoryless node. Note that with !CONFIG_HAVE_MEMORYLESS_NODES, numa_mem_id() just does numa_node_id(). So yeah I think "a more robust fallback" is correct :) But let's put it explicitly in the changelog then: 8< alloc_pages_node() might fail when called with NUMA_NO_NODE and __GFP_THISNODE on a CPU belonging to a memoryless node. To make the local-node fallback more robust and prevent such situations, use numa_mem_id(), which was introduced for similar scenarios in the slab context.
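The distinction the thread turns on can be illustrated with a toy model: `numa_node_id()` returns the node the CPU sits on even if that node has no memory, while `numa_mem_id()` returns the nearest node that actually has memory (here modeled with an invented four-node topology and per-node fallback orders standing in for the zonelist):

```c
#define MAX_NODES 4

/* Invented topology: node 1 is memoryless. */
static const int node_has_memory[MAX_NODES] = { 1, 0, 1, 1 };

/* Per-node fallback order, modeling the zonelist: nearest nodes first. */
static const int fallback[MAX_NODES][MAX_NODES] = {
	{ 0, 1, 2, 3 },
	{ 1, 0, 2, 3 },
	{ 2, 3, 0, 1 },
	{ 3, 2, 1, 0 },
};

static int numa_node_id_model(int cpu_node)
{
	return cpu_node;		/* may be a memoryless node */
}

static int numa_mem_id_model(int cpu_node)
{
	int i;

	/* Walk the fallback list until a node with memory is found. */
	for (i = 0; i < MAX_NODES; i++)
		if (node_has_memory[fallback[cpu_node][i]])
			return fallback[cpu_node][i];
	return cpu_node;
}
```

With `__GFP_THISNODE` the allocator is pinned to the returned node, so a caller on the memoryless node 1 fails under the `numa_node_id()` model but succeeds with `numa_mem_id()`, which is exactly the robustness argument in the changelog above.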