Re: RFC: Reducing the number of non volatile GPRs in the ppc64 kernel

2015-08-06 Thread Bill Schmidt

I agree with Segher.  We already know we have opportunities to do a better
job with shrink-wrapping (pushing this kind of useless activity down past
early exits), so having examples of code to look at to improve this would
be useful.
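To make the shrink-wrapping point concrete, here is a hypothetical C function of the shape being described (names and structure are illustrative, not taken from the kernel traces): the common early-exit path needs no non-volatile registers, so saving and restoring them unconditionally in the prologue/epilogue is pure overhead on that path.

```c
#include <stddef.h>

struct item {
	int key;
	struct item *next;
};

/* The early exit below touches no callee-saved (non-volatile) GPRs.
 * Without shrink-wrapping, the prologue still saves every non-volatile
 * register the loop might use, and the epilogue restores them, even
 * when head == NULL and we return immediately. */
int find_key(const struct item *head, int key)
{
	if (head == NULL)	/* fast path: no register traffic needed */
		return -1;

	for (; head != NULL; head = head->next)	/* slow path uses the registers */
		if (head->key == key)
			return 0;
	return -1;
}
```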

-- Bill

Bill Schmidt, Ph.D.
Linux on Power Toolchain
IBM Linux Technology Center
wschm...@us.ibm.com   (507) 319-6873




From:   Segher Boessenkool 
To: Anton Blanchard 
Cc: linuxppc-dev@lists.ozlabs.org, Michael
Gschwind/Watson/IBM@IBMUS, Alan Modra , Bill
Schmidt/Rochester/IBM@IBMUS, Ulrich Weigand
, pau...@samba.org
Date:   08/05/2015 06:20 AM
Subject:Re: RFC: Reducing the number of non volatile GPRs in the ppc64
kernel



Hi Anton,

On Wed, Aug 05, 2015 at 02:03:00PM +1000, Anton Blanchard wrote:
> While looking at traces of kernel workloads, I noticed places where gcc
> used a large number of non volatiles. Some of these functions
> did very little work, and we spent most of our time saving the
> non volatiles to the stack and reading them back.

That is something that should be fixed in GCC -- do you have an example
of such a function?

> It made me wonder if we have the right ratio of volatile to non
> volatile GPRs. Since the kernel is completely self contained, we could
> potentially change that ratio.
>
> Attached is a quick hack to gcc and the kernel to decrease the number
> of non volatile GPRs to 8. I'm not sure if this is a good idea (and if
> the volatile to non volatile ratio is right), but this gives us
> something to play with.

Instead of the GCC hack you can add a bunch of -fcall-used-r14 etc.
options; does that not work for you?
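Concretely, a hypothetical kernel-side Makefile fragment (the register range is purely illustrative of whatever ratio is being experimented with) that applies this suggestion without carrying a GCC patch might look like:

```make
# Hypothetical arch/powerpc/Makefile fragment: treat r14-r22 as
# call-clobbered (volatile) without modifying GCC itself.
KBUILD_CFLAGS += $(foreach r,14 15 16 17 18 19 20 21 22,-fcall-used-r$(r))
```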


Segher


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode

2015-08-06 Thread Wei Yang
On Fri, Aug 07, 2015 at 03:54:48PM +1000, Gavin Shan wrote:
>On Fri, Aug 07, 2015 at 01:44:33PM +0800, Wei Yang wrote:
>>On Fri, Aug 07, 2015 at 01:43:01PM +1000, Gavin Shan wrote:
>>>On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote:
On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote:
>On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote:
>>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote:
>>>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote:
When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
discrete.

This patch restructures the patch to allocate discrete PE# for VFs when 
M64
BAR is set to Single PE mode.

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |   69 
 +
 2 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 8aeba4c..72415c7 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -213,7 +213,7 @@ struct pci_dn {
 #ifdef CONFIG_PCI_IOV
u16 vfs_expanded;   /* number of VFs IOV BAR 
 expanded */
u16 num_vfs;/* number of VFs enabled*/
-   int offset; /* PE# for the first VF PE */
+   int *offset;/* PE# for the first VF PE or 
array */
boolm64_single_mode;/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64(-1)
int (*m64_map)[PCI_SRIOV_NUM_BARS];
>>>
>>>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to 
>>>the comments
>>>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. 
>>>Though not
>>>all of them will be used, not too much memory will be wasted.
>>>
>>
>>Thanks for your comment.
>>
>>I have thought about change the name to make it more self explain. While
>>another fact I want to take in is this field is also used to be reflect 
>>the
>>shift offset when M64 BAR is used in the Shared Mode. So I maintain the 
>>name.
>>
>>How about use "enum", one maintain the name "offset", and another one 
>>rename to
>>"pe_num_map". And use the meaningful name at proper place?
>>

So I suppose you agree with my naming proposal.

>>>
>>>No, I dislike the "enum" things.
>>>
>>
>>OK, then you suggest to rename it pe_num_map or keep it as offset?
>>
>
>pe_num_map would be better.
>
>
>Ok. I'm explaining it with more details. There are two cases: single vs 
>shared
>mode. When PHB M64 BARs run in single mode, you need an array to track the
>allocated discrete PE#. The VF_index is the index to the array. When PHB 
>M64
>BARs run in shared mode, you need continuous PE#. No array required for 
>this
>case. Instead, the starting PE# should be stored to somewhere, which can
>be pdn->offset[0] simply.
>
>So when allocating memory for this array, you just simply allocate 
>(sizeof(*pdn->offset)
>*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is 
>nobody
>can enable (max_vf_num + 1) VFs.

The max_vf_num is 15?

>>>
>>>I don't understand why you said: the max_vf_num is 15. Since max_vf_num is 
>>>variable
>>>on different PFs, how can it be fixed value - 15 ?
>>>
>>
>>In Shared PE case, only one int to indicate the start PE# is fine.
>>In Single PE mode, we totally could enable 15 VF, the same number of PEs for
>>each VF, which is limited by the number M64 BARs we have in the system.
>>
>>If not, the number you expected is total_vfs?
>>
>
>then it should be min(total_vfs, phb->ioda.m64_bar_idx), isn't it? 
>
>
>With above way, the arrays for PE# and M64 BAR remapping needn't be 
>allocated
>when enabling SRIOV capability and releasing on disabling SRIOV capability.
>Instead, those two arrays can be allocated during resource fixup time and 
>free'ed
>when destroying the pdn.
>

Same point of view as before: if memory is not a concern, how about defining
them static?

>>>
>>>It's a bad idea from my review. How many entries this array is going to have?
>>>256 * NUM_OF_MAX_VF_BARS ?
>>>
>>
>>No.
>>
>>It has 15 * 6, 15 VFs we could enable at most and 6 VF BARs a VF could have at
>>most.
>>
>
>It's min(total_vfs, phb->ioda.m64_bar_idx) VFs that can be enabled at maximal
>degree, no?
>

Yes, you are right. The number 15 is the one I used when the field is static.
If we want to allocate it dynamically, we need to choose the smaller one.

While I suggest 

[PATCH 5/5] powerpc/mm: Drop CONFIG_PPC_HAS_HASH_64K

2015-08-06 Thread Michael Ellerman
The relation between CONFIG_PPC_HAS_HASH_64K and CONFIG_PPC_64K_PAGES is
painfully complicated.

But if we rearrange it enough we can see that PPC_HAS_HASH_64K
essentially depends on PPC_STD_MMU_64 && PPC_64K_PAGES.

We can then notice that PPC_HAS_HASH_64K is used in files that are only
built for PPC_STD_MMU_64, meaning it's equivalent to PPC_64K_PAGES.

So replace all uses and drop it.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig|  6 --
 arch/powerpc/mm/hash_low_64.S   |  4 ++--
 arch/powerpc/mm/hash_utils_64.c | 12 ++--
 3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3a4ba2809201..1e69dee29be3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -514,11 +514,6 @@ config NODES_SPAN_OTHER_NODES
def_bool y
depends on NEED_MULTIPLE_NODES
 
-config PPC_HAS_HASH_64K
-   bool
-   depends on PPC64
-   default n
-
 config STDBINUTILS
bool "Using standard binutils settings"
depends on 44x
@@ -566,7 +561,6 @@ config PPC_16K_PAGES
 config PPC_64K_PAGES
bool "64k page size"
depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
-   select PPC_HAS_HASH_64K if PPC_STD_MMU_64
 
 config PPC_256K_PAGES
bool "256k page size"
diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
index 463174a4a647..3b49e3295901 100644
--- a/arch/powerpc/mm/hash_low_64.S
+++ b/arch/powerpc/mm/hash_low_64.S
@@ -701,7 +701,7 @@ htab_pte_insert_failure:
 
 #endif /* CONFIG_PPC_64K_PAGES */
 
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
 
 /*
  *   *
@@ -993,7 +993,7 @@ ht64_pte_insert_failure:
b   ht64_bail
 
 
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
 
 
 /*
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5ec987f65b2c..aee70171355b 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -640,7 +640,7 @@ extern u32 ht64_call_hpte_updatepp[];
 
 static void __init htab_finish_init(void)
 {
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
patch_branch(ht64_call_hpte_insert1,
ppc_function_entry(ppc_md.hpte_insert),
BRANCH_SET_LINK);
@@ -653,7 +653,7 @@ static void __init htab_finish_init(void)
patch_branch(ht64_call_hpte_updatepp,
ppc_function_entry(ppc_md.hpte_updatepp),
BRANCH_SET_LINK);
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
 
patch_branch(htab_call_hpte_insert1,
ppc_function_entry(ppc_md.hpte_insert),
@@ -1151,12 +1151,12 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea,
check_paca_psize(ea, mm, psize, user_region);
 #endif /* CONFIG_PPC_64K_PAGES */
 
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
if (psize == MMU_PAGE_64K)
rc = __hash_page_64K(ea, access, vsid, ptep, trap,
 flags, ssize);
else
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
{
int spp = subpage_protection(mm, ea);
if (access & spp)
@@ -1264,12 +1264,12 @@ void hash_preload(struct mm_struct *mm, unsigned long 
ea,
update_flags |= HPTE_LOCAL_UPDATE;
 
/* Hash it in */
-#ifdef CONFIG_PPC_HAS_HASH_64K
+#ifdef CONFIG_PPC_64K_PAGES
if (mm->context.user_psize == MMU_PAGE_64K)
rc = __hash_page_64K(ea, access, vsid, ptep, trap,
 update_flags, ssize);
else
-#endif /* CONFIG_PPC_HAS_HASH_64K */
+#endif /* CONFIG_PPC_64K_PAGES */
rc = __hash_page_4K(ea, access, vsid, ptep, trap, update_flags,
ssize, subpage_protection(mm, ea));
 
-- 
2.1.4


[PATCH 4/5] powerpc/mm: Simplify page size kconfig dependencies

2015-08-06 Thread Michael Ellerman
For config options with only a single value, guarding the single value
with 'if' is the same as adding a 'depends' statement. And it's more
standard to just use 'depends'.

And if the option has both an 'if' guard and a 'depends' we can collapse
them into a single 'depends' by combining them with &&.
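As a generic illustration of the transformation (FOO/BAR/BAZ are placeholder symbols, not real config options):

```kconfig
# Before: prompt guarded with 'if', plus a separate 'depends'
config FOO
	bool "foo" if BAR
	depends on BAZ

# After: one combined 'depends'
config FOO
	bool "foo"
	depends on BAR && BAZ
```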

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5ef27113b898..3a4ba2809201 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -560,16 +560,17 @@ config PPC_4K_PAGES
bool "4k page size"
 
 config PPC_16K_PAGES
-   bool "16k page size" if 44x || PPC_8xx
+   bool "16k page size"
+   depends on 44x || PPC_8xx
 
 config PPC_64K_PAGES
-   bool "64k page size" if 44x || PPC_STD_MMU_64 || PPC_BOOK3E_64
-   depends on !PPC_FSL_BOOK3E
+   bool "64k page size"
+   depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
select PPC_HAS_HASH_64K if PPC_STD_MMU_64
 
 config PPC_256K_PAGES
-   bool "256k page size" if 44x
-   depends on !STDBINUTILS
+   bool "256k page size"
+   depends on 44x && !STDBINUTILS
help
  Make the page size 256k.
 
-- 
2.1.4


[PATCH 3/5] powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index()

2015-08-06 Thread Michael Ellerman
Now that support for 64k pages with a 4K kernel is removed, this code is
unreachable.

CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
also true.

But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
crucially __real_pte, which means this definition can never be used.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 12 
 1 file changed, 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index 7ee2300ee392..fa1dfb7f7b48 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -134,23 +134,11 @@
 
 #define pte_iterate_hashed_end() } while(0)
 
-#ifdef CONFIG_PPC_HAS_HASH_64K
 /*
  * We expect this to be called only for user addresses or kernel virtual
  * addresses other than the linear mapping.
  */
-#define pte_pagesize_index(mm, addr, pte)  \
-   ({  \
-   unsigned int psize; \
-   if (is_kernel_addr(addr))   \
-   psize = MMU_PAGE_4K;\
-   else\
-   psize = get_slice_psize(mm, addr);  \
-   psize;  \
-   })
-#else
 #define pte_pagesize_index(mm, addr, pte)  MMU_PAGE_4K
-#endif
 
 #endif /* __real_pte */
 
-- 
2.1.4


[PATCH 2/5] powerpc/cell: Drop support for 64K local store on 4K kernels

2015-08-06 Thread Michael Ellerman
Back in the olden days we added support for using 64K pages to map the
SPU (Synergistic Processing Unit) local store on Cell, when the main
kernel was using 4K pages.

This was useful at the time because distros were using 4K pages, but
using 64K pages on the SPUs could reduce TLB pressure there.

However these days the number of Cell users is approaching zero, and
supporting this option adds unpleasant complexity to the memory
management code.

So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/spu_csa.h  |   6 --
 arch/powerpc/mm/hugetlbpage.c   |   8 --
 arch/powerpc/platforms/cell/Kconfig |  15 ---
 arch/powerpc/platforms/cell/spufs/file.c|  55 ---
 arch/powerpc/platforms/cell/spufs/lscsa_alloc.c | 124 +---
 5 files changed, 2 insertions(+), 206 deletions(-)

diff --git a/arch/powerpc/include/asm/spu_csa.h 
b/arch/powerpc/include/asm/spu_csa.h
index a40fd491250c..51f80b41cda3 100644
--- a/arch/powerpc/include/asm/spu_csa.h
+++ b/arch/powerpc/include/asm/spu_csa.h
@@ -241,12 +241,6 @@ struct spu_priv2_collapsed {
  */
 struct spu_state {
struct spu_lscsa *lscsa;
-#ifdef CONFIG_SPU_FS_64K_LS
-   int use_big_pages;
-   /* One struct page per 64k page */
-#define SPU_LSCSA_NUM_BIG_PAGES	(sizeof(struct spu_lscsa) / 0x10000)
-   struct page *lscsa_pages[SPU_LSCSA_NUM_BIG_PAGES];
-#endif
struct spu_problem_collapsed prob;
struct spu_priv1_collapsed priv1;
struct spu_priv2_collapsed priv2;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bb0bd7025cb8..06c14523b787 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -808,14 +808,6 @@ static int __init add_huge_page_size(unsigned long long 
size)
if ((mmu_psize = shift_to_mmu_psize(shift)) < 0)
return -EINVAL;
 
-#ifdef CONFIG_SPU_FS_64K_LS
-   /* Disable support for 64K huge pages when 64K SPU local store
-* support is enabled as the current implementation conflicts.
-*/
-   if (shift == PAGE_SHIFT_64K)
-   return -EINVAL;
-#endif /* CONFIG_SPU_FS_64K_LS */
-
BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
/* Return if huge page size has already been setup */
diff --git a/arch/powerpc/platforms/cell/Kconfig 
b/arch/powerpc/platforms/cell/Kconfig
index 2f23133ab3d1..b0ac1773cea6 100644
--- a/arch/powerpc/platforms/cell/Kconfig
+++ b/arch/powerpc/platforms/cell/Kconfig
@@ -57,21 +57,6 @@ config SPU_FS
  Units on machines implementing the Broadband Processor
  Architecture.
 
-config SPU_FS_64K_LS
-   bool "Use 64K pages to map SPE local  store"
-   # we depend on PPC_MM_SLICES for now rather than selecting
-   # it because we depend on hugetlbfs hooks being present. We
-   # will fix that when the generic code has been improved to
-   # not require hijacking hugetlbfs hooks.
-   depends on SPU_FS && PPC_MM_SLICES && !PPC_64K_PAGES
-   default y
-   select PPC_HAS_HASH_64K
-   help
- This option causes SPE local stores to be mapped in process
- address spaces using 64K pages while the rest of the kernel
- uses 4K pages. This can improve performances of applications
- using multiple SPEs by lowering the TLB pressure on them.
-
 config SPU_BASE
bool
default n
diff --git a/arch/powerpc/platforms/cell/spufs/file.c 
b/arch/powerpc/platforms/cell/spufs/file.c
index d966bbe58b8f..5038fd578e65 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -239,23 +239,6 @@ spufs_mem_mmap_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf)
unsigned long address = (unsigned long)vmf->virtual_address;
unsigned long pfn, offset;
 
-#ifdef CONFIG_SPU_FS_64K_LS
-   struct spu_state *csa = &ctx->csa;
-   int psize;
-
-   /* Check what page size we are using */
-   psize = get_slice_psize(vma->vm_mm, address);
-
-   /* Some sanity checking */
-   BUG_ON(csa->use_big_pages != (psize == MMU_PAGE_64K));
-
-   /* Wow, 64K, cool, we need to align the address though */
-   if (csa->use_big_pages) {
-   BUG_ON(vma->vm_start & 0xffff);
-   address &= ~0xfffful;
-   }
-#endif /* CONFIG_SPU_FS_64K_LS */
-
offset = vmf->pgoff << PAGE_SHIFT;
if (offset >= LS_SIZE)
return VM_FAULT_SIGBUS;
@@ -310,22 +293,6 @@ static const struct vm_operations_struct 
spufs_mem_mmap_vmops = {
 
 static int spufs_mem_mmap(struct file *file, struct vm_area_struct *vma)
 {
-#ifdef CONFIG_SPU_FS_64K_LS
-   struct spu_context  *ctx = file->private_data;
-   struct spu_state*csa = &ctx->csa;
-
-   /* Sanity check VMA alignment */
-   if (csa->use_big_pages) {
-   pr_debug(

[PATCH 1/5] powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash

2015-08-06 Thread Michael Ellerman
The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f3302 "powerpc/mm: Increase the slice range to
64TB", the get_slice_psize() code was changed so that instead of a right
shift we do an array lookup based on the address. When passed a kernel
address this means we index way off the end of the slice array and
return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.
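The failure mode can be seen in a toy userspace model (the constants are illustrative, not the real powerpc layout): indexing a 64-entry per-slice array by `ea >> shift` works for user addresses inside the 64TB range, but a kernel address like 0xc000... lands far past the end, which is exactly why the fix short-circuits kernel addresses to MMU_PAGE_4K.

```c
#include <stdint.h>

#define SLICE_SHIFT       40			/* 1TB slices (illustrative) */
#define SLICE_ARRAY_SIZE  64			/* covers a 64TB user range  */
#define KERNEL_BASE       0xc000000000000000UL	/* ppc64 linear-map base     */

/* Guard used by the fix: kernel addresses never go near the slice array. */
int is_kernel_addr_demo(uint64_t ea)
{
	return ea >= KERNEL_BASE;
}

/* The post-7aa0727f3302 lookup: fine for user addresses, but a kernel
 * address produces an index vastly larger than SLICE_ARRAY_SIZE, so the
 * real code read random junk past the end of the array. */
unsigned int slice_index(uint64_t ea)
{
	return (unsigned int)(ea >> SLICE_SHIFT);
}
```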

Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index 3bb7488bd24b..7ee2300ee392 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -135,7 +135,19 @@
 #define pte_iterate_hashed_end() } while(0)
 
 #ifdef CONFIG_PPC_HAS_HASH_64K
-#define pte_pagesize_index(mm, addr, pte)  get_slice_psize(mm, addr)
+/*
+ * We expect this to be called only for user addresses or kernel virtual
+ * addresses other than the linear mapping.
+ */
+#define pte_pagesize_index(mm, addr, pte)  \
+   ({  \
+   unsigned int psize; \
+   if (is_kernel_addr(addr))   \
+   psize = MMU_PAGE_4K;\
+   else\
+   psize = get_slice_psize(mm, addr);  \
+   psize;  \
+   })
 #else
 #define pte_pagesize_index(mm, addr, pte)  MMU_PAGE_4K
 #endif
-- 
2.1.4


Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode

2015-08-06 Thread Gavin Shan
On Fri, Aug 07, 2015 at 01:44:33PM +0800, Wei Yang wrote:
>On Fri, Aug 07, 2015 at 01:43:01PM +1000, Gavin Shan wrote:
>>On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote:
>>>On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote:
On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote:
>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote:
>>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote:
>>>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
>>>discrete.
>>>
>>>This patch restructures the patch to allocate discrete PE# for VFs when 
>>>M64
>>>BAR is set to Single PE mode.
>>>
>>>Signed-off-by: Wei Yang 
>>>---
>>> arch/powerpc/include/asm/pci-bridge.h |2 +-
>>> arch/powerpc/platforms/powernv/pci-ioda.c |   69 
>>> +
>>> 2 files changed, 51 insertions(+), 20 deletions(-)
>>>
>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>>b/arch/powerpc/include/asm/pci-bridge.h
>>>index 8aeba4c..72415c7 100644
>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>@@ -213,7 +213,7 @@ struct pci_dn {
>>> #ifdef CONFIG_PCI_IOV
>>> u16 vfs_expanded;   /* number of VFs IOV BAR 
>>> expanded */
>>> u16 num_vfs;/* number of VFs enabled*/
>>>-int offset; /* PE# for the first VF PE */
>>>+int *offset;/* PE# for the first VF PE or 
>>>array */
>>> boolm64_single_mode;/* Use M64 BAR in Single Mode */
>>> #define IODA_INVALID_M64(-1)
>>> int (*m64_map)[PCI_SRIOV_NUM_BARS];
>>
>>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the 
>>comments
>>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. 
>>Though not
>>all of them will be used, not too much memory will be wasted.
>>
>
>Thanks for your comment.
>
>I have thought about change the name to make it more self explain. While
>another fact I want to take in is this field is also used to be reflect the
>shift offset when M64 BAR is used in the Shared Mode. So I maintain the 
>name.
>
>How about use "enum", one maintain the name "offset", and another one 
>rename to
>"pe_num_map". And use the meaningful name at proper place?
>
>>>
>>>So I suppose you agree with my naming proposal.
>>>
>>
>>No, I dislike the "enum" things.
>>
>
>OK, then you suggest to rename it pe_num_map or keep it as offset?
>

pe_num_map would be better.


OK, let me explain in more detail. There are two cases: single vs. shared
mode. When PHB M64 BARs run in single mode, you need an array to track the
allocated discrete PE#. The VF_index is the index to the array. When PHB M64
BARs run in shared mode, you need continuous PE#. No array required for this
case. Instead, the starting PE# should be stored to somewhere, which can
be pdn->offset[0] simply.

So when allocating memory for this array, you just simply allocate 
(sizeof(*pdn->offset)
*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is 
nobody
can enable (max_vf_num + 1) VFs.
>>>
>>>The max_vf_num is 15?
>>>
>>
>>I don't understand why you said: the max_vf_num is 15. Since max_vf_num is 
>>variable
>>on different PFs, how can it be fixed value - 15 ?
>>
>
>In Shared PE case, only one int to indicate the start PE# is fine.
>In Single PE mode, we totally could enable 15 VF, the same number of PEs for
>each VF, which is limited by the number M64 BARs we have in the system.
>
>If not, the number you expected is total_vfs?
>

then it should be min(total_vfs, phb->ioda.m64_bar_idx), isn't it? 


With the above approach, the arrays for PE# and M64 BAR remapping needn't be allocated
when enabling SRIOV capability and releasing on disabling SRIOV capability.
Instead, those two arrays can be allocated during resource fixup time and 
free'ed
when destroying the pdn.

>>>
>>>My same point of view like previous, if the memory is not in the concern, how
>>>about define them static?
>>>
>>
>>It's a bad idea from my review. How many entries this array is going to have?
>>256 * NUM_OF_MAX_VF_BARS ?
>>
>
>No.
>
>It has 15 * 6, 15 VFs we could enable at most and 6 VF BARs a VF could have at
>most.
>

It's min(total_vfs, phb->ioda.m64_bar_idx) VFs that can be enabled at most, no?

>>>And for the long term, we may support more VFs. Then at that moment, we need
>>>to restructure the code to meet it.
>>>
>>>So I suggest if we want to allocate it dynamically, we allocate the exact
>>>number of space.
>>>
>>
>>Fine... it can be improved when it has to be, as you said.
>>
>
>-- 
>Richard Yang
>Help you, Help me


Re: [PATCH 2/4] cxl: sparse: Make declarations static

2015-08-06 Thread Michael Neuling
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote:
> A few declarations were identified by sparse as needing to be static:
> 
> /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:408:6: warning: symbol 
> 'afu_irq_name_free' was not declared. Should it be static?
> /scratch/dja/linux-capi/drivers/misc/cxl/irq.c:467:6: warning: symbol 
> 'afu_register_hwirqs' was not declared. Should it be static?
> /scratch/dja/linux-capi/drivers/misc/cxl/file.c:254:6: warning: symbol 
> 'afu_compat_ioctl' was not declared. Should it be static?
> /scratch/dja/linux-capi/drivers/misc/cxl/file.c:399:30: warning: symbol 
> 'afu_master_fops' was not declared. Should it be static?
> 
> Make them static.
> 
> Signed-off-by: Daniel Axtens 

Acked-by: Michael Neuling 

> ---
>  drivers/misc/cxl/file.c | 4 ++--
>  drivers/misc/cxl/irq.c  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
> index e3f4b69527a9..c8c8bfa2679b 100644
> --- a/drivers/misc/cxl/file.c
> +++ b/drivers/misc/cxl/file.c
> @@ -251,7 +251,7 @@ long afu_ioctl(struct file *file, unsigned int cmd, 
> unsigned long arg)
>   return -EINVAL;
>  }
>  
> -long afu_compat_ioctl(struct file *file, unsigned int cmd,
> +static long afu_compat_ioctl(struct file *file, unsigned int cmd,
>unsigned long arg)
>  {
>   return afu_ioctl(file, cmd, arg);
> @@ -396,7 +396,7 @@ const struct file_operations afu_fops = {
>   .mmap   = afu_mmap,
>  };
>  
> -const struct file_operations afu_master_fops = {
> +static const struct file_operations afu_master_fops = {
>   .owner  = THIS_MODULE,
>   .open   = afu_master_open,
>   .poll   = afu_poll,
> diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
> index 680cd263436d..77e5d0e7ebe1 100644
> --- a/drivers/misc/cxl/irq.c
> +++ b/drivers/misc/cxl/irq.c
> @@ -405,7 +405,7 @@ void cxl_release_psl_irq(struct cxl_afu *afu)
>   kfree(afu->psl_irq_name);
>  }
>  
> -void afu_irq_name_free(struct cxl_context *ctx)
> +static void afu_irq_name_free(struct cxl_context *ctx)
>  {
>   struct cxl_irq_name *irq_name, *tmp;
>  
> @@ -464,7 +464,7 @@ out:
>   return -ENOMEM;
>  }
>  
> -void afu_register_hwirqs(struct cxl_context *ctx)
> +static void afu_register_hwirqs(struct cxl_context *ctx)
>  {
>   irq_hw_number_t hwirq;
>   struct cxl_irq_name *irq_name;
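For readers unfamiliar with this class of warning, here is a minimal userspace illustration (hypothetical names, not the cxl code) of the point sparse — run as `make C=2` in a kernel tree — is making: helpers used only inside one translation unit should have internal linkage, so they stay out of the global symbol table and cannot clash with symbols elsewhere.

```c
/* Illustration of the sparse fix above: the helper used only inside
 * this file gets internal linkage with `static`; only the intended
 * entry point remains externally visible.  Names are made up. */
static long demo_compat_ioctl(unsigned int cmd, unsigned long arg)
{
	return (long)(cmd + arg);	/* stand-in for the real dispatch */
}

long demo_ioctl_entry(unsigned int cmd, unsigned long arg)
{
	/* External entry point; the static helper stays file-local. */
	return demo_compat_ioctl(cmd, arg);
}
```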


Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode

2015-08-06 Thread Wei Yang
On Fri, Aug 07, 2015 at 01:43:01PM +1000, Gavin Shan wrote:
>On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote:
>>On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote:
>>>On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote:
On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote:
>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote:
>>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
>>discrete.
>>
>>This patch restructures the patch to allocate discrete PE# for VFs when 
>>M64
>>BAR is set to Single PE mode.
>>
>>Signed-off-by: Wei Yang 
>>---
>> arch/powerpc/include/asm/pci-bridge.h |2 +-
>> arch/powerpc/platforms/powernv/pci-ioda.c |   69 
>> +
>> 2 files changed, 51 insertions(+), 20 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index 8aeba4c..72415c7 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -213,7 +213,7 @@ struct pci_dn {
>> #ifdef CONFIG_PCI_IOV
>>  u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>>  u16 num_vfs;/* number of VFs enabled*/
>>- int offset; /* PE# for the first VF PE */
>>+ int *offset;/* PE# for the first VF PE or array */
>>  boolm64_single_mode;/* Use M64 BAR in Single Mode */
>> #define IODA_INVALID_M64(-1)
>>  int (*m64_map)[PCI_SRIOV_NUM_BARS];
>
>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the 
>comments
>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. 
>Though not
>all of them will be used, not too much memory will be wasted.
>

Thanks for your comment.

I have thought about change the name to make it more self explain. While
another fact I want to take in is this field is also used to be reflect the
shift offset when M64 BAR is used in the Shared Mode. So I maintain the 
name.

How about use "enum", one maintain the name "offset", and another one 
rename to
"pe_num_map". And use the meaningful name at proper place?

>>
>>So I suppose you agree with my naming proposal.
>>
>
>No, I dislike the "enum" things.
>

OK, then you suggest to rename it pe_num_map or keep it as offset?

>>>
>>>Ok. I'm explaining it with more details. There are two cases: single vs 
>>>shared
>>>mode. When PHB M64 BARs run in single mode, you need an array to track the
>>>allocated discrete PE#. The VF_index is the index to the array. When PHB M64
>>>BARs run in shared mode, you need continuous PE#. No array required for this
>>>case. Instead, the starting PE# should be stored to somewhere, which can
>>>be pdn->offset[0] simply.
>>>
>>>So when allocating memory for this array, you just simply allocate 
>>>(sizeof(*pdn->offset)
>>>*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is 
>>>nobody
>>>can enable (max_vf_num + 1) VFs.
>>
>>The max_vf_num is 15?
>>
>
>I don't understand why you said: the max_vf_num is 15. Since max_vf_num is 
>variable
>on different PFs, how can it be fixed value - 15 ?
>

In the Shared PE case, a single int to indicate the starting PE# is fine.
In Single PE mode, we could enable at most 15 VFs, one PE for each VF; the
limit comes from the number of M64 BARs we have in the system.

If not, the number you expected is total_vfs?

>>>
>>>With above way, the arrays for PE# and M64 BAR remapping needn't be allocated
>>>when enabling SRIOV capability and releasing on disabling SRIOV capability.
>>>Instead, those two arrays can be allocated during resource fixup time and 
>>>free'ed
>>>when destroying the pdn.
>>>
>>
>>My same point of view like previous, if the memory is not in the concern, how
>>about define them static?
>>
>
>It's a bad idea from my review. How many entries this array is going to have?
>256 * NUM_OF_MAX_VF_BARS ?
>

No.

It has 15 * 6 entries: 15 VFs we could enable at most, and 6 VF BARs a VF
could have at most.

>>And for the long term, we may support more VFs. Then at that moment, we need
>>to restructure the code to meet it.
>>
>>So I suggest if we want to allocate it dynamically, we allocate the exact
>>number of space.
>>
>
>Fine... it can be improved when it has to be, as you said.
>

-- 
Richard Yang
Help you, Help me
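Stepping back from the thread, the sizing argument under discussion could be sketched in userspace C as follows (names echo the patch, but this is only an illustration of the scheme, not the kernel code): allocate one PE#-map entry per VF that could ever be enabled, i.e. min(total_vfs, m64_bar_idx), regardless of mode.

```c
#include <stdlib.h>

#define IODA_INVALID_PE (-1)

/* Shared mode uses only pe_num_map[0] for the starting PE#; single
 * mode records one discrete PE# per VF index.  Either way the array
 * is sized for the largest VF count that can actually be enabled. */
int *alloc_pe_num_map(int total_vfs, int m64_bar_idx)
{
	int max_vf_num = total_vfs < m64_bar_idx ? total_vfs : m64_bar_idx;
	int *pe_num_map = malloc(max_vf_num * sizeof(*pe_num_map));
	int i;

	if (!pe_num_map)
		return NULL;
	for (i = 0; i < max_vf_num; i++)
		pe_num_map[i] = IODA_INVALID_PE;
	return pe_num_map;
}
```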


Re: [PATCH 4/4] cxl: sparse: Silence iomem warning in debugfs file creation

2015-08-06 Thread Michael Neuling
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote:
> An IO address, tagged with __iomem, is passed to debugfs_create_file
> as private data. This requires that it be cast to void *. The cast
> creates a sparse warning:
> /scratch/dja/linux-capi/drivers/misc/cxl/debugfs.c:51:57: warning: cast 
> removes address space of expression
> 
> The address space marker is added back in the file operations
> (fops_io_u64).
> 
> Silence the warning with __force.
> 
> Signed-off-by: Daniel Axtens 

Acked-by: Michael Neuling 

> ---
>  drivers/misc/cxl/debugfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
> index 825c412580bc..18df6f44af2a 100644
> --- a/drivers/misc/cxl/debugfs.c
> +++ b/drivers/misc/cxl/debugfs.c
> @@ -48,7 +48,7 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_io_x64, debugfs_io_u64_get, 
> debugfs_io_u64_set, "0x
>  static struct dentry *debugfs_create_io_x64(const char *name, umode_t mode,
>   struct dentry *parent, u64 __iomem 
> *value)
>  {
> - return debugfs_create_file(name, mode, parent, (void *)value, 
> &fops_io_x64);
> + return debugfs_create_file(name, mode, parent, (void __force *)value, 
> &fops_io_x64);
>  }
>  
>  int cxl_debugfs_adapter_add(struct cxl *adapter)



Re: [PATCH 3/4] cxl: sparse: Flag iomem pointers properly

2015-08-06 Thread Michael Neuling
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote:
> Sparse identifies the following address space issues:
> /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17: warning: incorrect 
> type in assignment (different address spaces)
> /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:expected void 
> volatile [noderef] *
> /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:got void *
> /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23: warning: incorrect 
> type in assignment (different address spaces)
> /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:expected void 
> [noderef] *cfg_data
> /scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:got void *
> 
> Fix those by flagging __iomem in the relevant casts.
> 
> Signed-off-by: Daniel Axtens 

Acked-by: Michael Neuling 

> ---
>  drivers/misc/cxl/vphb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
> index 2eba002b580b..a7b55772a91c 100644
> --- a/drivers/misc/cxl/vphb.c
> +++ b/drivers/misc/cxl/vphb.c
> @@ -122,7 +122,7 @@ static int cxl_pcie_config_info(struct pci_bus *bus, 
> unsigned int devfn,
>   return PCIBIOS_BAD_REGISTER_NUMBER;
>   addr = cxl_pcie_cfg_addr(phb, bus->number, devfn, offset);
>  
> - *ioaddr = (void *)(addr & ~0x3ULL);
> + *ioaddr = (void __iomem *)(addr & ~0x3ULL);
>   *shift = ((addr & 0x3) * 8);
>   switch (len) {
>   case 1:
> @@ -217,7 +217,7 @@ int cxl_pci_vphb_add(struct cxl_afu *afu)
>   /* Setup the PHB using arch provided callback */
>   phb->ops = &cxl_pcie_pci_ops;
>   phb->cfg_addr = afu->afu_desc_mmio + afu->crs_offset;
> - phb->cfg_data = (void *)(u64)afu->crs_len;
> + phb->cfg_data = (void __iomem *)(u64)afu->crs_len;
>   phb->private_data = afu;
>   phb->controller_ops = cxl_pci_controller_ops;
>  



Re: [PATCH 1/4] cxl: Compile with -Werror

2015-08-06 Thread Michael Neuling
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote:
> It's a good idea, and it brings us in line with the rest of arch/powerpc.
> 
> Signed-off-by: Daniel Axtens 

Acked-by: Michael Neuling 

> ---
>  drivers/misc/cxl/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
> index 14e3f8219a11..6f484dfe78f9 100644
> --- a/drivers/misc/cxl/Makefile
> +++ b/drivers/misc/cxl/Makefile
> @@ -1,3 +1,5 @@
> +ccflags-y := -Werror
> +
>  cxl-y+= main.o file.o irq.o fault.o native.o
>  cxl-y+= context.o sysfs.o debugfs.o pci.o 
> trace.o
>  cxl-y+= vphb.o api.o



Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Gavin Shan
On Fri, Aug 07, 2015 at 10:24:05AM +0800, Wei Yang wrote:
>On Fri, Aug 07, 2015 at 11:20:10AM +1000, Gavin Shan wrote:
>>On Thu, Aug 06, 2015 at 10:10:10PM +0800, Wei Yang wrote:
>>>On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote:
On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:
>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
>M64 windwo, which means M64 BAR can't work on it.
>

s/PHB_IODA2/PHB3
s/windwo/window

>This patch makes this explicit.
>
>Signed-off-by: Wei Yang 

The idea sounds right, but there is one question as below.

>---
> arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 5738d31..9b41dba 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>*dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   /*
>* The actual IOV BAR range is determined by the start address
>* and the actual size for num_vfs VFs BAR.  This check is to
>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>*dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>   res2 = *res;
>   res->start += size * offset;
>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev 
>*pdev, u16 num_vfs)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   for (j = 0; j < vf_groups; j++) {
>   do {
>   win = 
> find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
>num_vfs)
>   pdn = pci_get_pdn(pdev);
>
>   if (phb->type == PNV_PHB_IODA2) {
>+  if (!pdn->vfs_expanded) {
>+  dev_info(&pdev->dev, "don't support this SRIOV device"
>+  " with non M64 VF BAR\n");
>+  return -EBUSY;
>+  }
>+

It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily
unavailable. For this case, the VFs are permanently unavailable because of
running out of space to accomodate M64 and non-M64 VF BARs.

The error message could be printed with dev_warn() and it would be precise
as below or something else you prefer:

dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");

>>>
>>>Thanks for the comment, will change accordingly.
>>>

>   /* Calculate available PE for required VFs */
>   mutex_lock(&phb->ioda.pe_alloc_mutex);
>   pdn->offset = bitmap_find_next_zero_area(
>@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>pci_dev *pdev)
>   if (!res->flags || res->parent)
>   continue;
>   if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-  dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>+  dev_warn(&pdev->dev, "Don't support SR-IOV with"
>+  " non M64 VF BAR%d: %pR. \n",
>i, res);
>-  continue;
>+  return;
>   }
>
>   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>pci_dev *pdev)
>   res = &pdev->resource[i + PCI_IOV_RESOURCES];
>   if (!res->flags || res->parent)
>   continue;
>-  if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-  dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: 
>%pR\n",
>-   i, res);
>-  continue;
>-  }

When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled.
Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so,
I think it can be avoided.

>>>
>>>Don't get your point. You mean to avoid this function?

Re: [PATCH] powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops

2015-08-06 Thread Daniel Axtens
On Fri, 2015-08-07 at 13:45 +1000, Andrew Donnellan wrote:
> Simplify the dma_get_required_mask call chain by moving it from pnv_phb to
> pci_controller_ops, similar to commit 763d2d8df1ee2b92ff09c
> ("powerpc/powernv: Move dma_set_mask from pnv_phb to pci_controller_ops").
> 
> Previous call chain:
> 
>   0) call dma_get_required_mask() (kernel/dma.c)
>   1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that
>  points to pnv_dma_get_required_mask() (platforms/powernv/setup.c)
>   2) device is PCI, therefore call pnv_pci_dma_get_required_mask()
>  (platforms/powernv/pci.c)
>   3) call phb->dma_get_required_mask if it exists
>   4) it only exists in the ioda case, where it points to
>pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c)
> 
> New call chain:
> 
>   0) call dma_get_required_mask() (kernel/dma.c)
>   1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask
>  if it exists
>   2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask()
>  (platforms/powernv/pci-ioda.c)
> 
> In the p5ioc2 case, the call chain remains the same -
> dma_get_required_mask() does not find either a ppc_md call or
> pci_controller_ops call, so it calls __dma_get_required_mask().
> 
> Signed-off-by: Andrew Donnellan 
> 
Reviewed-by: Daniel Axtens 

-- 
Regards,
Daniel


signature.asc
Description: This is a digitally signed message part

[PATCH] powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops

2015-08-06 Thread Andrew Donnellan
Simplify the dma_get_required_mask call chain by moving it from pnv_phb to
pci_controller_ops, similar to commit 763d2d8df1ee2b92ff09c
("powerpc/powernv: Move dma_set_mask from pnv_phb to pci_controller_ops").

Previous call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that
 points to pnv_dma_get_required_mask() (platforms/powernv/setup.c)
  2) device is PCI, therefore call pnv_pci_dma_get_required_mask()
 (platforms/powernv/pci.c)
  3) call phb->dma_get_required_mask if it exists
  4) it only exists in the ioda case, where it points to
   pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c)

New call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask
 if it exists
  2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask()
 (platforms/powernv/pci-ioda.c)

In the p5ioc2 case, the call chain remains the same -
dma_get_required_mask() does not find either a ppc_md call or
pci_controller_ops call, so it calls __dma_get_required_mask().

Signed-off-by: Andrew Donnellan 

---

dma_get_required_mask() is only used in a small handful of drivers, for
which I don't have access to appropriate hardware, so this was tested by
artificially inserting calls into the tg3 PCI network card driver.

---
 arch/powerpc/include/asm/pci-bridge.h |  1 +
 arch/powerpc/kernel/dma.c |  7 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |  7 ---
 arch/powerpc/platforms/powernv/pci.c  | 11 ---
 arch/powerpc/platforms/powernv/pci.h  |  2 --
 arch/powerpc/platforms/powernv/powernv.h  |  6 --
 arch/powerpc/platforms/powernv/setup.c|  9 -
 7 files changed, 12 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..37fc535 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -42,6 +42,7 @@ struct pci_controller_ops {
 #endif
 
int (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
+   u64 (*dma_get_required_mask)(struct pci_dev *dev);
 
void(*shutdown)(struct pci_controller *);
 };
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 35e4dcc..9d68589 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -278,6 +278,13 @@ u64 dma_get_required_mask(struct device *dev)
if (ppc_md.dma_get_required_mask)
return ppc_md.dma_get_required_mask(dev);
 
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_controller *phb = pci_bus_to_host(pdev->bus);
+   if (phb->controller_ops.dma_get_required_mask)
+   return phb->controller_ops.dma_get_required_mask(pdev);
+   }
+
return __dma_get_required_mask(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_required_mask);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 85cbc96..6f015d4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1630,9 +1630,10 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev 
*pdev, u64 dma_mask)
return 0;
 }
 
-static u64 pnv_pci_ioda_dma_get_required_mask(struct pnv_phb *phb,
- struct pci_dev *pdev)
+static u64 pnv_pci_ioda_dma_get_required_mask(struct pci_dev *pdev)
 {
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
struct pci_dn *pdn = pci_get_pdn(pdev);
struct pnv_ioda_pe *pe;
u64 end, mask;
@@ -3057,6 +3058,7 @@ static const struct pci_controller_ops 
pnv_pci_ioda_controller_ops = {
.window_alignment = pnv_pci_window_alignment,
.reset_secondary_bus = pnv_pci_reset_secondary_bus,
.dma_set_mask = pnv_pci_ioda_dma_set_mask,
+   .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask,
.shutdown = pnv_pci_ioda_shutdown,
 };
 
@@ -3203,7 +3205,6 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 
/* Setup TCEs */
phb->dma_dev_setup = pnv_pci_ioda_dma_dev_setup;
-   phb->dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask;
 
/* Setup MSI support */
pnv_pci_init_ioda_msis(phb);
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 765d8ed..3e7f6fd 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -761,17 +761,6 @@ void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
phb->dma_dev_setup(phb, pdev);
 }
 
-u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev)
-{
-   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
-   s

Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode

2015-08-06 Thread Gavin Shan
On Fri, Aug 07, 2015 at 10:33:33AM +0800, Wei Yang wrote:
>On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote:
>>On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote:
>>>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote:
On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote:
>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
>discrete.
>
>This patch restructures the patch to allocate discrete PE# for VFs when M64
>BAR is set to Single PE mode.
>
>Signed-off-by: Wei Yang 
>---
> arch/powerpc/include/asm/pci-bridge.h |2 +-
> arch/powerpc/platforms/powernv/pci-ioda.c |   69 
> +
> 2 files changed, 51 insertions(+), 20 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>b/arch/powerpc/include/asm/pci-bridge.h
>index 8aeba4c..72415c7 100644
>--- a/arch/powerpc/include/asm/pci-bridge.h
>+++ b/arch/powerpc/include/asm/pci-bridge.h
>@@ -213,7 +213,7 @@ struct pci_dn {
> #ifdef CONFIG_PCI_IOV
>   u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>   u16 num_vfs;/* number of VFs enabled*/
>-  int offset; /* PE# for the first VF PE */
>+  int *offset;/* PE# for the first VF PE or array */
>   boolm64_single_mode;/* Use M64 BAR in Single Mode */
> #define IODA_INVALID_M64(-1)
>   int (*m64_map)[PCI_SRIOV_NUM_BARS];

how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the 
comments
I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. 
Though not
all of them will be used, not too much memory will be wasted.

>>>
>>>Thanks for your comment.
>>>
>>>I have thought about change the name to make it more self explain. While
>>>another fact I want to take in is this field is also used to be reflect the
>>>shift offset when M64 BAR is used in the Shared Mode. So I maintain the name.
>>>
>>>How about use "enum", one maintain the name "offset", and another one rename 
>>>to
>>>"pe_num_map". And use the meaningful name at proper place?
>>>
>
>So I suppose you agree with my naming proposal.
>

No, I dislike the "enum" things.

>>
>>Ok. I'm explaining it with more details. There are two cases: single vs shared
>>mode. When PHB M64 BARs run in single mode, you need an array to track the
>>allocated discrete PE#. The VF_index is the index to the array. When PHB M64
>>BARs run in shared mode, you need continuous PE#. No array required for this
>>case. Instead, the starting PE# should be stored to somewhere, which can
>>be pdn->offset[0] simply.
>>
>>So when allocating memory for this array, you just simply allocate 
>>(sizeof(*pdn->offset)
>>*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is 
>>nobody
>>can enable (max_vf_num + 1) VFs.
>
>The max_vf_num is 15?
>

I don't understand why you said the max_vf_num is 15. Since max_vf_num
varies on different PFs, how can it be a fixed value of 15?

>>
>>With above way, the arrays for PE# and M64 BAR remapping needn't be allocated
>>when enabling SRIOV capability and releasing on disabling SRIOV capability.
>>Instead, those two arrays can be allocated during resource fixup time and 
>>free'ed
>>when destroying the pdn.
>>
>
>My same point of view like previous, if the memory is not in the concern, how
>about define them static?
>

It's a bad idea from my review. How many entries is this array going to
have? 256 * NUM_OF_MAX_VF_BARS?

>And for the long term, we may support more VFs. Then at that moment, we need
>to restructure the code to meet it.
>
>So I suggest if we want to allocate it dynamically, we allocate the exact
>number of space.
>

Fine... it can be improved when it has to be, as you said.
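As a side note for readers following along, the allocation scheme described
above (one array sized by max_vf_num that serves both M64 BAR modes) can be
sketched in plain C. This is only an illustration of the idea under
discussion, not the kernel's code; every name here is made up:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch: one array of max_vf_num PE numbers covers both modes.
 * In shared mode only entry 0 (the starting PE#) is meaningful and
 * VFs use contiguous PE numbers from it; in single mode every VF
 * gets its own discrete entry. */
struct pdn_sketch {
	int *pe_num_map;	/* "pdn->offset" in the patch under review */
	int  max_vf_num;
	int  single_mode;	/* "pdn->m64_single_mode" */
};

static int pdn_alloc_pe_map(struct pdn_sketch *pdn, int max_vf_num)
{
	pdn->max_vf_num = max_vf_num;
	pdn->pe_num_map = calloc(max_vf_num, sizeof(*pdn->pe_num_map));
	return pdn->pe_num_map ? 0 : -1;
}

static int pdn_vf_pe(const struct pdn_sketch *pdn, int vf_index)
{
	if (pdn->single_mode)
		return pdn->pe_num_map[vf_index];	/* discrete PE# */
	return pdn->pe_num_map[0] + vf_index;		/* contiguous run */
}
```

Because the array is sized by max_vf_num regardless of mode, it can be
allocated once (e.g. at resource fixup time) and freed with the pdn, as
suggested above.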


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/4] cxl: Compile with -Werror

2015-08-06 Thread Daniel Axtens
> Do you need to put this patch last so that you don't break bisecting the
> series?
> 
The warnings fixed in the rest of the series are from sparse, not the
compiler, so they don't break building with -Werror.

-- 
Regards,
Daniel



Re: [PATCH 1/4] cxl: Compile with -Werror

2015-08-06 Thread Michael Neuling
On Fri, 2015-08-07 at 13:18 +1000, Daniel Axtens wrote:
> It's a good idea, and it brings us in line with the rest of arch/powerpc.

Do you need to put this patch last so that you don't break bisecting the
series?

Mikey

> Signed-off-by: Daniel Axtens 
> ---
>  drivers/misc/cxl/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
> index 14e3f8219a11..6f484dfe78f9 100644
> --- a/drivers/misc/cxl/Makefile
> +++ b/drivers/misc/cxl/Makefile
> @@ -1,3 +1,5 @@
> +ccflags-y := -Werror
> +
>  cxl-y+= main.o file.o irq.o fault.o native.o
>  cxl-y+= context.o sysfs.o debugfs.o pci.o 
> trace.o
>  cxl-y+= vphb.o api.o


[PATCH 4/4] cxl: sparse: Silence iomem warning in debugfs file creation

2015-08-06 Thread Daniel Axtens
An IO address, tagged with __iomem, is passed to debugfs_create_file
as private data. This requires that it be cast to void *. The cast
creates a sparse warning:
/scratch/dja/linux-capi/drivers/misc/cxl/debugfs.c:51:57: warning: cast removes 
address space of expression

The address space marker is added back in the file operations
(fops_io_u64).

Silence the warning with __force.

Signed-off-by: Daniel Axtens 
---
 drivers/misc/cxl/debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
index 825c412580bc..18df6f44af2a 100644
--- a/drivers/misc/cxl/debugfs.c
+++ b/drivers/misc/cxl/debugfs.c
@@ -48,7 +48,7 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_io_x64, debugfs_io_u64_get, 
debugfs_io_u64_set, "0x
 static struct dentry *debugfs_create_io_x64(const char *name, umode_t mode,
struct dentry *parent, u64 __iomem 
*value)
 {
-   return debugfs_create_file(name, mode, parent, (void *)value, 
&fops_io_x64);
+   return debugfs_create_file(name, mode, parent, (void __force *)value, 
&fops_io_x64);
 }
 
 int cxl_debugfs_adapter_add(struct cxl *adapter)
-- 
2.1.4


[PATCH 3/4] cxl: sparse: Flag iomem pointers properly

2015-08-06 Thread Daniel Axtens
Sparse identifies the following address space issues:
/scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17: warning: incorrect type 
in assignment (different address spaces)
/scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:expected void 
volatile [noderef] *
/scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:125:17:got void *
/scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23: warning: incorrect type 
in assignment (different address spaces)
/scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:expected void 
[noderef] *cfg_data
/scratch/dja/linux-capi/drivers/misc/cxl/vphb.c:220:23:got void *

Fix those by flagging __iomem in the relevant casts.

Signed-off-by: Daniel Axtens 
---
 drivers/misc/cxl/vphb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 2eba002b580b..a7b55772a91c 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -122,7 +122,7 @@ static int cxl_pcie_config_info(struct pci_bus *bus, 
unsigned int devfn,
return PCIBIOS_BAD_REGISTER_NUMBER;
addr = cxl_pcie_cfg_addr(phb, bus->number, devfn, offset);
 
-   *ioaddr = (void *)(addr & ~0x3ULL);
+   *ioaddr = (void __iomem *)(addr & ~0x3ULL);
*shift = ((addr & 0x3) * 8);
switch (len) {
case 1:
@@ -217,7 +217,7 @@ int cxl_pci_vphb_add(struct cxl_afu *afu)
/* Setup the PHB using arch provided callback */
phb->ops = &cxl_pcie_pci_ops;
phb->cfg_addr = afu->afu_desc_mmio + afu->crs_offset;
-   phb->cfg_data = (void *)(u64)afu->crs_len;
+   phb->cfg_data = (void __iomem *)(u64)afu->crs_len;
phb->private_data = afu;
phb->controller_ops = cxl_pci_controller_ops;
 
-- 
2.1.4


[PATCH 2/4] cxl: sparse: Make declarations static

2015-08-06 Thread Daniel Axtens
A few declarations were identified by sparse as needing to be static:

/scratch/dja/linux-capi/drivers/misc/cxl/irq.c:408:6: warning: symbol 
'afu_irq_name_free' was not declared. Should it be static?
/scratch/dja/linux-capi/drivers/misc/cxl/irq.c:467:6: warning: symbol 
'afu_register_hwirqs' was not declared. Should it be static?
/scratch/dja/linux-capi/drivers/misc/cxl/file.c:254:6: warning: symbol 
'afu_compat_ioctl' was not declared. Should it be static?
/scratch/dja/linux-capi/drivers/misc/cxl/file.c:399:30: warning: symbol 
'afu_master_fops' was not declared. Should it be static?

Make them static.

Signed-off-by: Daniel Axtens 
---
 drivers/misc/cxl/file.c | 4 ++--
 drivers/misc/cxl/irq.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index e3f4b69527a9..c8c8bfa2679b 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -251,7 +251,7 @@ long afu_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
return -EINVAL;
 }
 
-long afu_compat_ioctl(struct file *file, unsigned int cmd,
+static long afu_compat_ioctl(struct file *file, unsigned int cmd,
 unsigned long arg)
 {
return afu_ioctl(file, cmd, arg);
@@ -396,7 +396,7 @@ const struct file_operations afu_fops = {
.mmap   = afu_mmap,
 };
 
-const struct file_operations afu_master_fops = {
+static const struct file_operations afu_master_fops = {
.owner  = THIS_MODULE,
.open   = afu_master_open,
.poll   = afu_poll,
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 680cd263436d..77e5d0e7ebe1 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -405,7 +405,7 @@ void cxl_release_psl_irq(struct cxl_afu *afu)
kfree(afu->psl_irq_name);
 }
 
-void afu_irq_name_free(struct cxl_context *ctx)
+static void afu_irq_name_free(struct cxl_context *ctx)
 {
struct cxl_irq_name *irq_name, *tmp;
 
@@ -464,7 +464,7 @@ out:
return -ENOMEM;
 }
 
-void afu_register_hwirqs(struct cxl_context *ctx)
+static void afu_register_hwirqs(struct cxl_context *ctx)
 {
irq_hw_number_t hwirq;
struct cxl_irq_name *irq_name;
-- 
2.1.4


[PATCH 1/4] cxl: Compile with -Werror

2015-08-06 Thread Daniel Axtens
It's a good idea, and it brings us in line with the rest of arch/powerpc.

Signed-off-by: Daniel Axtens 
---
 drivers/misc/cxl/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index 14e3f8219a11..6f484dfe78f9 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -1,3 +1,5 @@
+ccflags-y := -Werror
+
 cxl-y  += main.o file.o irq.o fault.o native.o
 cxl-y  += context.o sysfs.o debugfs.o pci.o trace.o
 cxl-y  += vphb.o api.o
-- 
2.1.4


Re: [PATCH 1/3] Powerpc: mpc85xx: refactor the PM operations

2015-08-06 Thread Chenhui Zhao



On Fri, Aug 7, 2015 at 2:02 AM, Scott Wood  wrote:
> On Thu, 2015-08-06 at 13:54 +0800, Chenhui Zhao wrote:
> > On Thu, Aug 6, 2015 at 1:46 PM, Scott Wood  wrote:
> > > On Thu, 2015-08-06 at 12:20 +0800, Chenhui Zhao wrote:
> > > > On Thu, Aug 6, 2015 at 10:57 AM, Scott Wood  wrote:
> > > > > On Wed, 2015-08-05 at 18:11 +0800, Chenhui Zhao wrote:
> > > > > > On Tue, Aug 4, 2015 at 4:26 AM, Scott Wood  wrote:
> > > > > > > On Mon, 2015-08-03 at 19:32 +0800, Chenhui Zhao wrote:
> > > > > > > > On Sat, Aug 1, 2015 at 7:59 AM, Scott Wood  wrote:
> > > > > > > > > Could you explain irq_mask()?  Why would there still be IRQs
> > > > > > > > > destined for this CPU at this point?
> > > > > > > >
> > > > > > > > This function just masks irqs by setting the registers in RCPM
> > > > > > > > (for example, RCPM_CPMIMR, RCPM_CPMCIMR). Actually, all irqs to
> > > > > > > > this CPU have been migrated to other CPUs.
> > > > > > >
> > > > > > > So why do we need to set those bits in RCPM?  Is it just caution?
> > > > > >
> > > > > > Setting these bits masks interrupts signalled to RCPM from the MPIC
> > > > > > as a means of waking up from a lower power state. So, cores will
> > > > > > not be woken up unexpectedly.
> > > > >
> > > > > Why would the MPIC be signalling those interrupts if they've been
> > > > > masked at the MPIC?
> > > > >
> > > > > -Scott
> > > >
> > > > The interrupts to RCPM from the MPIC are IRQ, Machine Check, NMI and
> > > > Critical interrupts. Some of them are not masked in the MPIC.
> > >
> > > What interrupt could actually happen to a sleeping cpu that this
> > > protects against?
> > >
> > > -Scott
> >
> > Not sure. Maybe spurious interrupts or hardware exceptions.
>
> Spurious interrupts happen due to race conditions.  They don't happen
> because the MPIC is bored and decides to ring a CPU's doorbell and hide
> in the bushes.
>
> If by "hardware exceptions" you mean machine checks, how would such a
> machine check be generated by a core that is off?
>
> > However, setting them makes sure dead cpus can not be woken up
> > unexpectedly.
>
> I'm not seeing enough value here to warrant resurrecting the old sleep
> node stuff.
>
> -Scott

My guess may not be accurate. My point is that electronic parts don't
always work as expected. Taking preventative measures can make the system
more robust. In addition, this step is required in the deep sleep
procedure.

-Chenhui




Re: [PATCH] cxl: Add alternate MMIO error handling

2015-08-06 Thread Michael Neuling


On Thu, 2015-07-23 at 16:43 +1000, Ian Munsie wrote:
> From: Ian Munsie 
> 
> userspace programs using cxl currently have to use two strategies for
> dealing with MMIO errors simultaneously. They have to check every read
> for a return of all Fs in case the adapter has gone away and the kernel
> has not yet noticed, and they have to deal with SIGBUS in case the
> kernel has already noticed, invalidated the mapping and marked the
> context as failed.
> 
> In order to simplify things, this patch adds an alternative approach
> where the kernel will return a page filled with Fs instead of delivering
> a SIGBUS. This allows userspace to only need to deal with one of these
> two error paths, and is intended for use in libraries that use cxl
> transparently and may not be able to safely install a signal handler.
> 
> This approach will only work if certain constraints are met. Namely, if
> the application is both reading and writing to an address in the problem
> state area it cannot assume that a non-FF read is OK, as it may just be
> reading out a value it has previously written. Further - since only one
> page is used per context a write to a given offset would be visible when
> reading the same offset from a different page in the mapping (this only
> applies within a single context, not between contexts).
> 
> An application could deal with this by e.g. making sure it also reads
> from a read-only offset after any reads to a read/write offset.
> 
> Due to these constraints, this functionality must be explicitly
> requested by userspace when starting the context by passing in the
> CXL_START_WORK_ERR_FF flag.
> 
> Signed-off-by: Ian Munsie 

Acked-by: Michael Neuling 

> ---
>  drivers/misc/cxl/context.c | 14 ++
>  drivers/misc/cxl/cxl.h |  4 +++-
>  drivers/misc/cxl/file.c|  4 +++-
>  include/uapi/misc/cxl.h|  4 +++-
>  4 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
> index 1287148..6570f10 100644
> --- a/drivers/misc/cxl/context.c
> +++ b/drivers/misc/cxl/context.c
> @@ -126,6 +126,18 @@ static int cxl_mmap_fault(struct vm_area_struct *vma, 
> struct vm_fault *vmf)
>   if (ctx->status != STARTED) {
>   mutex_unlock(&ctx->status_mutex);
>   pr_devel("%s: Context not started, failing problem state 
> access\n", __func__);
> + if (ctx->mmio_err_ff) {
> + if (!ctx->ff_page) {
> + ctx->ff_page = alloc_page(GFP_USER);
> + if (!ctx->ff_page)
> + return VM_FAULT_OOM;
> + memset(page_address(ctx->ff_page), 0xff, 
> PAGE_SIZE);
> + }
> + get_page(ctx->ff_page);
> + vmf->page = ctx->ff_page;
> + vma->vm_page_prot = pgprot_cached(vma->vm_page_prot);
> + return 0;
> + }
>   return VM_FAULT_SIGBUS;
>   }
>  
> @@ -253,6 +265,8 @@ static void reclaim_ctx(struct rcu_head *rcu)
>   struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu);
>  
>   free_page((u64)ctx->sstp);
> + if (ctx->ff_page)
> + __free_page(ctx->ff_page);
>   ctx->sstp = NULL;
>  
>   kfree(ctx);
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index 4fd66ca..b7293a4 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -34,7 +34,7 @@ extern uint cxl_verbose;
>   * Bump version each time a user API change is made, whether it is
>   * backwards compatible ot not.
>   */
> -#define CXL_API_VERSION 1
> +#define CXL_API_VERSION 2
>  #define CXL_API_VERSION_COMPATIBLE 1
>  
>  /*
> @@ -418,6 +418,8 @@ struct cxl_context {
>   /* Used to unmap any mmaps when force detaching */
>   struct address_space *mapping;
>   struct mutex mapping_lock;
> + struct page *ff_page;
> + bool mmio_err_ff;
>  
>   spinlock_t sste_lock; /* Protects segment table entries */
>   struct cxl_sste *sstp;
> diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
> index e3f4b69..34c7a5e 100644
> --- a/drivers/misc/cxl/file.c
> +++ b/drivers/misc/cxl/file.c
> @@ -179,6 +179,8 @@ static long afu_ioctl_start_work(struct cxl_context *ctx,
>   if (work.flags & CXL_START_WORK_AMR)
>   amr = work.amr & mfspr(SPRN_UAMOR);
>  
> + ctx->mmio_err_ff = !!(work.flags & CXL_START_WORK_ERR_FF);
> +
>   /*
>* We grab the PID here and not in the file open to allow for the case
>* where a process (master, some daemon, etc) has opened the chardev on
> @@ -519,7 +521,7 @@ int __init cxl_file_init(void)
>* If these change we really need to update API.  Either change some
>* flags or update API version number CXL_API_VERSION.
>*/
> - BUILD_BUG_ON(CXL_API_VERSION != 1);
> + BUILD_BUG_ON(CXL_API_VERSION != 2);
>   BUIL

[PATCH] powerpc/vdso: Emit GNU & SysV hashes

2015-08-06 Thread Michael Ellerman
Andy Lutomirski says:

  Some dynamic loaders may be slightly faster if a GNU hash is
  available.

  This is unlikely to have any measurable effect on the time it takes
  to resolve vdso symbols (since there are so few of them).  In some
  contexts, it can be a win for a different reason: if every DSO has a
  GNU hash section, then libc can avoid calculating SysV hashes at
  all. Both musl and glibc appear to have this optimization.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/vdso32/Makefile | 2 +-
 arch/powerpc/kernel/vdso64/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile 
b/arch/powerpc/kernel/vdso32/Makefile
index 53e6c9b979ec..6abffb7a8cd9 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -18,7 +18,7 @@ GCOV_PROFILE := n
 
 ccflags-y := -shared -fno-common -fno-builtin
 ccflags-y += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
-   $(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
+   $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 asflags-y := -D__VDSO32__ -s
 
 obj-y += vdso32_wrapper.o
diff --git a/arch/powerpc/kernel/vdso64/Makefile 
b/arch/powerpc/kernel/vdso64/Makefile
index effca9404b17..8c8f2ae43935 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -11,7 +11,7 @@ GCOV_PROFILE := n
 
 ccflags-y := -shared -fno-common -fno-builtin
 ccflags-y += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
-   $(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
+   $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 asflags-y := -D__VDSO64__ -s
 
 obj-y += vdso64_wrapper.o
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode

2015-08-06 Thread Wei Yang
On Fri, Aug 07, 2015 at 11:36:56AM +1000, Gavin Shan wrote:
>On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote:
>>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote:
>>>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote:
When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
discrete.

This patch restructures the patch to allocate discrete PE# for VFs when M64
BAR is set to Single PE mode.

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |   69 
 +
 2 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 8aeba4c..72415c7 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -213,7 +213,7 @@ struct pci_dn {
 #ifdef CONFIG_PCI_IOV
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
-   int offset; /* PE# for the first VF PE */
+   int *offset;/* PE# for the first VF PE or array */
boolm64_single_mode;/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64(-1)
int (*m64_map)[PCI_SRIOV_NUM_BARS];
>>>
>>>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the 
>>>comments
>>>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. Though 
>>>not
>>>all of them will be used, not too much memory will be wasted.
>>>
>>
>>Thanks for your comment.
>>
>>I have thought about change the name to make it more self explain. While
>>another fact I want to take in is this field is also used to be reflect the
>>shift offset when M64 BAR is used in the Shared Mode. So I maintain the name.
>>
>>How about use "enum", one maintain the name "offset", and another one rename 
>>to
>>"pe_num_map". And use the meaningful name at proper place?
>>

So I suppose you agree with my naming proposal.

>
>Ok. I'm explaining it with more details. There are two cases: single vs shared
>mode. When PHB M64 BARs run in single mode, you need an array to track the
>allocated discrete PE#. The VF_index is the index to the array. When PHB M64
>BARs run in shared mode, you need continuous PE#. No array required for this
>case. Instead, the starting PE# should be stored to somewhere, which can
>be pdn->offset[0] simply.
>
>So when allocating memory for this array, you just simply allocate 
>(sizeof(*pdn->offset)
>*max_vf_num) no matter what mode PHB's M64 BARs will run in. The point is 
>nobody
>can enable (max_vf_num + 1) VFs.

The max_vf_num is 15?

>
>With above way, the arrays for PE# and M64 BAR remapping needn't be allocated
>when enabling SRIOV capability and releasing on disabling SRIOV capability.
>Instead, those two arrays can be allocated during resource fixup time and 
>free'ed
>when destroying the pdn.
>

Same as my earlier point: if memory is not a concern, how about defining them
statically?

In the long term we may support more VFs, and at that point we would need to
restructure the code anyway.

So I suggest that if we allocate it dynamically, we allocate exactly the
space needed.

-- 
Richard Yang
Help you, Help me


Re: [PATCH V2 5/6] powerpc/powernv: boundary the total vf bar size instead of the individual one

2015-08-06 Thread Wei Yang
On Fri, Aug 07, 2015 at 11:23:54AM +1000, Gavin Shan wrote:
>On Thu, Aug 06, 2015 at 10:03:04PM +0800, Wei Yang wrote:
>>On Thu, Aug 06, 2015 at 03:28:51PM +1000, Gavin Shan wrote:
>>>On Wed, Aug 05, 2015 at 09:25:02AM +0800, Wei Yang wrote:
Each VF could have 6 BARs at most. When the total BAR size exceeds the
gate, after expanding it will also exhaust the M64 Window.

This patch limits the boundary by checking the total VF BAR size instead of
the individual BAR.

Signed-off-by: Wei Yang 
>>>
>>>Ok. I didn't look at this when giving comments to last patch. It turns
>>>you have the change in this patch. Please merge it with the previous
>>>patch.
>>>
>>
>>Hmm... I prefer to have them in two patches. One focus on the calculation of
>>gate and the other focus on checking the total VF BAR size. This would help
>>record the change.
>>
>
>It's fine to me as well. I'll take close look on your next revision since
>you have to refresh the whole series. Is that fine to you?
>

Fine and thanks for your comments.
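For reference, the gating logic under discussion can be sketched stand-alone. This is a simplified model, not the kernel function itself (the gate value and the shared-mode multiplier of 256 PEs are illustrative for IODA2): the running total of all VF BAR sizes is compared against the gate, rather than each BAR individually.

```c
#include <stdint.h>

/* Sum the VF BAR sizes; once the total reaches the gate (half an M64
 * segment), switch to single-PE mode and round the VF multiplier up to
 * a power of two.  Otherwise stay in shared mode. */
static unsigned int pick_mul(const uint64_t *bar_sz, int nbars,
                             uint64_t gate, unsigned int total_vfs,
                             int *single_mode)
{
    uint64_t total = 0;

    *single_mode = 0;
    for (int i = 0; i < nbars; i++) {
        total += bar_sz[i];
        if (total >= gate) {            /* the total, not one BAR */
            unsigned int mul = 1;

            *single_mode = 1;
            while (mul < total_vfs)     /* roundup_pow_of_two() */
                mul <<= 1;
            return mul;
        }
    }
    return 256; /* shared mode: one PE per M64 segment (illustrative) */
}
```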

---
 arch/powerpc/platforms/powernv/pci-ioda.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 31dcedc..4042303 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2702,7 +2702,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
struct pnv_phb *phb;
struct resource *res;
int i;
-   resource_size_t size, gate;
+   resource_size_t size, gate, total_vf_bar_sz;
struct pci_dn *pdn;
int mul, total_vfs;

@@ -2729,6 +2729,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
 * Window and limit the system flexibility.
 */
gate = phb->ioda.m64_segsize >> 1;
+   total_vf_bar_sz = 0;

for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -2741,13 +2742,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
return;
}

-   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+   total_vf_bar_sz += pci_iov_resource_size(pdev,
+   i + PCI_IOV_RESOURCES);

/* bigger than or equal to gate */
-   if (size >= gate) {
-   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
-   "is bigger than %lld, roundup power2\n",
-i, res, gate);
+   if (total_vf_bar_sz >= gate) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR Total IOV size "
+   "is bigger than %lld, roundup power2\n", gate);
mul = roundup_pow_of_two(total_vfs);
pdn->m64_single_mode = true;
break;
-- 
1.7.9.5

>>
>>-- 
>>Richard Yang
>>Help you, Help me

-- 
Richard Yang
Help you, Help me


Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Wei Yang
On Fri, Aug 07, 2015 at 11:20:10AM +1000, Gavin Shan wrote:
>On Thu, Aug 06, 2015 at 10:10:10PM +0800, Wei Yang wrote:
>>On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote:
>>>On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:
On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
M64 windwo, which means M64 BAR can't work on it.

>>>
>>>s/PHB_IODA2/PHB3
>>>s/windwo/window
>>>
This patch makes this explicit.

Signed-off-by: Wei Yang 
>>>
>>>The idea sounds right, but there is one question as below.
>>>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5738d31..9b41dba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
*dev, int offset)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
/*
 * The actual IOV BAR range is determined by the start address
 * and the actual size for num_vfs VFs BAR.  This check is to
@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
*dev, int offset)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
res2 = *res;
res->start += size * offset;
@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev 
*pdev, u16 num_vfs)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
for (j = 0; j < vf_groups; j++) {
do {
win = 
 find_next_zero_bit(&phb->ioda.m64_bar_alloc,
@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
num_vfs)
pdn = pci_get_pdn(pdev);

if (phb->type == PNV_PHB_IODA2) {
+   if (!pdn->vfs_expanded) {
+   dev_info(&pdev->dev, "don't support this SRIOV device"
+   " with non M64 VF BAR\n");
+   return -EBUSY;
+   }
+
>>>
>>>It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily
>>>unavailable. For this case, the VFs are permanently unavailable because of
>>>running out of space to accomodate M64 and non-M64 VF BARs.
>>>
>>>The error message could be printed with dev_warn() and it would be precise
>>>as below or something else you prefer:
>>>
>>> dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");
>>>
>>
>>Thanks for the comment, will change accordingly.
>>
>>>
/* Calculate available PE for required VFs */
mutex_lock(&phb->ioda.pe_alloc_mutex);
pdn->offset = bitmap_find_next_zero_area(
@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
if (!res->flags || res->parent)
continue;
if (!pnv_pci_is_mem_pref_64(res->flags)) {
-   dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+   dev_warn(&pdev->dev, "Don't support SR-IOV with"
+   " non M64 VF BAR%d: %pR. \n",
 i, res);
-   continue;
+   return;
}

size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
res = &pdev->resource[i + PCI_IOV_RESOURCES];
if (!res->flags || res->parent)
continue;
-   if (!pnv_pci_is_mem_pref_64(res->flags)) {
-   dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: 
%pR\n",
-i, res);
-   continue;
-   }
>>>
>>>When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled.
>>>Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so,
>>>I think it can be avoided.
>>>
>>
>>Don't get your point. You mean to avoid this function?
>>
>>Or clear the IOV BAR when we found one of it is non-M64?
>>
>
>I mean to clear all IOV BARs in case any more of them are IO

Re: [PATCH] powerpc/prom: Use DRCONF flags while processing detected LMBs

2015-08-06 Thread Madhavan Srinivasan


On Thursday 06 August 2015 06:35 PM, Anshuman Khandual wrote:
> This patch just replaces hard coded values with existing

  Please drop "This patch just" and start with "Replace hard ..."

  https://www.kernel.org/doc/Documentation/SubmittingPatches 

  Maddy

> DRCONF flags while processing detected LMBs from the device
> tree. This does not change any functionality.
>
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/kernel/prom.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 8b888b1..70a8cab 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -476,9 +476,10 @@ static int __init 
> early_init_dt_scan_drconf_memory(unsigned long node)
>   flags = of_read_number(&dm[3], 1);
>   /* skip DRC index, pad, assoc. list index, flags */
>   dm += 4;
> - /* skip this block if the reserved bit is set in flags (0x80)
> -or if the block is not assigned to this partition (0x8) */
> - if ((flags & 0x80) || !(flags & 0x8))
> + /* skip this block if the reserved bit is set in flags
> +or if the block is not assigned to this partition */
> + if ((flags & DRCONF_MEM_RESERVED) ||
> + !(flags & DRCONF_MEM_ASSIGNED))
>   continue;
>   size = memblock_size;
>   rngs = 1;


Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 08:04:58PM +1000, Alexey Kardashevskiy wrote:
>On 08/05/2015 11:25 AM, Wei Yang wrote:
>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>BAR in Single PE mode to cover the number of VFs required to be enabled.
>>By doing so, several VFs would be in one VF Group and leads to interference
>>between VFs in the same group.
>>
>>This patch changes the design by using one M64 BAR in Single PE mode for
>>one VF BAR. This gives absolute isolation for VFs.
>>
>>Signed-off-by: Wei Yang 
>>---
>>  arch/powerpc/include/asm/pci-bridge.h |5 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c |  180 
>> -
>>  2 files changed, 76 insertions(+), 109 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index 712add5..8aeba4c 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -214,10 +214,9 @@ struct pci_dn {
>>  u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>>  u16 num_vfs;/* number of VFs enabled*/
>>  int offset; /* PE# for the first VF PE */
>>-#define M64_PER_IOV 4
>>- int m64_per_iov;
>>+ boolm64_single_mode;/* Use M64 BAR in Single Mode */
>>  #define IODA_INVALID_M64(-1)
>>- int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>+ int (*m64_map)[PCI_SRIOV_NUM_BARS];
>>  #endif /* CONFIG_PCI_IOV */
>>  #endif
>>  struct list_head child_list;
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 7192e62..f5d110c 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>>  }
>>
>>  #ifdef CONFIG_PCI_IOV
>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>>  {
>>  struct pci_bus*bus;
>>  struct pci_controller *hose;
>>  struct pnv_phb*phb;
>>  struct pci_dn *pdn;
>>  inti, j;
>>+ intm64_bars;
>>
>>  bus = pdev->bus;
>>  hose = pci_bus_to_host(bus);
>>  phb = hose->private_data;
>>  pdn = pci_get_pdn(pdev);
>>
>>+ if (pdn->m64_single_mode)
>>+ m64_bars = num_vfs;
>>+ else
>>+ m64_bars = 1;
>>+
>>  for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>- for (j = 0; j < M64_PER_IOV; j++) {
>>- if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>+ for (j = 0; j < m64_bars; j++) {
>>+ if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>>  continue;
>>  opal_pci_phb_mmio_enable(phb->opal_id,
>>- OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>- clear_bit(pdn->m64_wins[i][j], 
>>&phb->ioda.m64_bar_alloc);
>>- pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>+ OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>+ clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>+ pdn->m64_map[j][i] = IODA_INVALID_M64;
>>  }
>>
>>+ kfree(pdn->m64_map);
>>  return 0;
>>  }
>>
>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
>>u16 num_vfs)
>>  inttotal_vfs;
>>  resource_size_tsize, start;
>>  intpe_num;
>>- intvf_groups;
>>- intvf_per_group;
>>+ intm64_bars;
>>
>>  bus = pdev->bus;
>>  hose = pci_bus_to_host(bus);
>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev 
>>*pdev, u16 num_vfs)
>>  pdn = pci_get_pdn(pdev);
>>  total_vfs = pci_sriov_get_totalvfs(pdev);
>>
>>- /* Initialize the m64_wins to IODA_INVALID_M64 */
>>- for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>- for (j = 0; j < M64_PER_IOV; j++)
>>- pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>+ if (pdn->m64_single_mode)
>
>
>This is a physical function's @pdn, right?

Yes

>
>
>>+ m64_bars = num_vfs;
>>+ else
>>+ m64_bars = 1;
>>+
>>+ pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
>
>
>Assume we have SRIOV device with 16VF.
>So it was m64_wins[6][4], now it is (roughly speaking) m64_map[6][16]
>(for a single PE mode) or m64_map[6][1]. I believe m64_bars cannot be
>bigger than 16 on PHB3, right? Is this checked anywhere (does it have
>to)?

In pnv_pci_vf_assign_m64(), we call find_next_zero_bit() and check the
return value. If it exceeds m64_bar_idx, the allocation fails.
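That bound check can be sketched as a stand-alone model (a simplified stand-in for the kernel's find_next_zero_bit over m64_bar_alloc, with illustrative names):

```c
/* Scan a bitmap of PHB M64 BARs for a free window and claim it.
 * An index past m64_bar_idx means all usable windows are taken. */
static int alloc_m64_win(unsigned long *bitmap, int m64_bar_idx)
{
    for (int win = 0; win <= m64_bar_idx; win++) {
        if (!(*bitmap & (1UL << win))) {
            *bitmap |= 1UL << win;  /* claim the window */
            return win;
        }
    }
    return -1;  /* exceeded m64_bar_idx: allocation fails */
}
```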

>
>This m64_wins -> m64_map change - it was not a map before (what was it?),
>but it is one now, is it not?

Hmm... G

Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Gavin Shan
On Thu, Aug 06, 2015 at 10:10:10PM +0800, Wei Yang wrote:
>On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote:
>>On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:
>>>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>>>a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
>>>M64 windwo, which means M64 BAR can't work on it.
>>>
>>
>>s/PHB_IODA2/PHB3
>>s/windwo/window
>>
>>>This patch makes this explicit.
>>>
>>>Signed-off-by: Wei Yang 
>>
>>The idea sounds right, but there is one question as below.
>>
>>>---
>>> arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
>>> 1 file changed, 9 insertions(+), 16 deletions(-)
>>>
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 5738d31..9b41dba 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>>>*dev, int offset)
>>> if (!res->flags || !res->parent)
>>> continue;
>>>
>>>-if (!pnv_pci_is_mem_pref_64(res->flags))
>>>-continue;
>>>-
>>> /*
>>>  * The actual IOV BAR range is determined by the start address
>>>  * and the actual size for num_vfs VFs BAR.  This check is to
>>>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>>>*dev, int offset)
>>> if (!res->flags || !res->parent)
>>> continue;
>>>
>>>-if (!pnv_pci_is_mem_pref_64(res->flags))
>>>-continue;
>>>-
>>> size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>>> res2 = *res;
>>> res->start += size * offset;
>>>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
>>>u16 num_vfs)
>>> if (!res->flags || !res->parent)
>>> continue;
>>>
>>>-if (!pnv_pci_is_mem_pref_64(res->flags))
>>>-continue;
>>>-
>>> for (j = 0; j < vf_groups; j++) {
>>> do {
>>> win = 
>>> find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
>>>num_vfs)
>>> pdn = pci_get_pdn(pdev);
>>>
>>> if (phb->type == PNV_PHB_IODA2) {
>>>+if (!pdn->vfs_expanded) {
>>>+dev_info(&pdev->dev, "don't support this SRIOV device"
>>>+" with non M64 VF BAR\n");
>>>+return -EBUSY;
>>>+}
>>>+
>>
>>It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temparily
>>unavailable. For this case, the VFs are permanently unavailable because of
>>running out of space to accomodate M64 and non-M64 VF BARs.
>>
>>The error message could be printed with dev_warn() and it would be precise
>>as below or something else you prefer:
>>
>>  dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");
>>
>
>Thanks for the comment, will change accordingly.
>
>>
>>> /* Calculate available PE for required VFs */
>>> mutex_lock(&phb->ioda.pe_alloc_mutex);
>>> pdn->offset = bitmap_find_next_zero_area(
>>>@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>>pci_dev *pdev)
>>> if (!res->flags || res->parent)
>>> continue;
>>> if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>>-dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>>>+dev_warn(&pdev->dev, "Don't support SR-IOV with"
>>>+" non M64 VF BAR%d: %pR. \n",
>>>  i, res);
>>>-continue;
>>>+return;
>>> }
>>>
>>> size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>>@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>>pci_dev *pdev)
>>> res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>> if (!res->flags || res->parent)
>>> continue;
>>>-if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>>-dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: 
>>>%pR\n",
>>>- i, res);
>>>-continue;
>>>-}
>>
>>When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled.
>>Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so,
>>I think it can be avoided.
>>
>
>Don't get your point. You mean to avoid this function?
>
>Or clear the IOV BAR when we found one of it is non-M64?
>

I mean to clear all IOV BARs if any one of them is IO or M32. In this
case, the SRIOV capability won't be enabled. Otherwise, the resources for
all IOV BARs are assigned and allocated by 

Re: [PATCH V2 5/6] powerpc/powernv: boundary the total vf bar size instead of the individual one

2015-08-06 Thread Gavin Shan
On Thu, Aug 06, 2015 at 10:03:04PM +0800, Wei Yang wrote:
>On Thu, Aug 06, 2015 at 03:28:51PM +1000, Gavin Shan wrote:
>>On Wed, Aug 05, 2015 at 09:25:02AM +0800, Wei Yang wrote:
>>>Each VF could have 6 BARs at most. When the total BAR size exceeds the
>>>gate, after expanding it will also exhaust the M64 Window.
>>>
>>>This patch limits the boundary by checking the total VF BAR size instead of
>>>the individual BAR.
>>>
>>>Signed-off-by: Wei Yang 
>>
>>Ok. I didn't look at this when giving comments to last patch. It turns
>>you have the change in this patch. Please merge it with the previous
>>patch.
>>
>
>Hmm... I prefer to have them in two patches. One focus on the calculation of
>gate and the other focus on checking the total VF BAR size. This would help
>record the change.
>

That's fine with me as well. I'll take a close look at your next revision,
since you have to refresh the whole series anyway. Is that fine with you?

>>>---
>>> arch/powerpc/platforms/powernv/pci-ioda.c |   13 +++--
>>> 1 file changed, 7 insertions(+), 6 deletions(-)
>>>
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 31dcedc..4042303 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -2702,7 +2702,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>>pci_dev *pdev)
>>> struct pnv_phb *phb;
>>> struct resource *res;
>>> int i;
>>>-resource_size_t size, gate;
>>>+resource_size_t size, gate, total_vf_bar_sz;
>>> struct pci_dn *pdn;
>>> int mul, total_vfs;
>>>
>>>@@ -2729,6 +2729,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>>pci_dev *pdev)
>>>  * Window and limit the system flexibility.
>>>  */
>>> gate = phb->ioda.m64_segsize >> 1;
>>>+total_vf_bar_sz = 0;
>>>
>>> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>> res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>>@@ -2741,13 +2742,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>>pci_dev *pdev)
>>> return;
>>> }
>>>
>>>-size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>>+total_vf_bar_sz += pci_iov_resource_size(pdev,
>>>+i + PCI_IOV_RESOURCES);
>>>
>>> /* bigger than or equal to gate */
>>>-if (size >= gate) {
>>>-dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
>>>-"is bigger than %lld, roundup power2\n",
>>>- i, res, gate);
>>>+if (total_vf_bar_sz >= gate) {
>>>+dev_info(&pdev->dev, "PowerNV: VF BAR Total IOV size "
>>>+"is bigger than %lld, roundup power2\n", gate);
>>> mul = roundup_pow_of_two(total_vfs);
>>> pdn->m64_single_mode = true;
>>> break;
>>>-- 
>>>1.7.9.5
>>>
>
>-- 
>Richard Yang
>Help you, Help me


Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 08:07:01PM +1000, Gavin Shan wrote:
>On Thu, Aug 06, 2015 at 05:36:02PM +0800, Wei Yang wrote:
>>On Thu, Aug 06, 2015 at 03:20:25PM +1000, Gavin Shan wrote:
>>>On Wed, Aug 05, 2015 at 09:25:00AM +0800, Wei Yang wrote:
In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
BAR in Single PE mode to cover the number of VFs required to be enabled.
By doing so, several VFs would be in one VF Group and leads to interference
between VFs in the same group.

This patch changes the design by using one M64 BAR in Single PE mode for
one VF BAR. This gives absolute isolation for VFs.

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |5 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  180 
 -
 2 files changed, 76 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..8aeba4c 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -214,10 +214,9 @@ struct pci_dn {
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
int offset; /* PE# for the first VF PE */
-#define M64_PER_IOV 4
-   int m64_per_iov;
+   boolm64_single_mode;/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64(-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
+   int (*m64_map)[PCI_SRIOV_NUM_BARS];
>>>
>>>It can be explicit? For example:
>>>
>>> int *m64_map;
>>>
>>> /* Initialization */
>>> size_t size = sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * 
>>> num_of_max_VFs;
>>> pdn->m64_map = kmalloc(size, GFP_KERNEL);
>>> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>> for (j = 0; j < num_of_max_VFs; j++)
>>> pdn->m64_map[i * PCI_SRIOV_NUM_BARS + j] = 
>>> PNV_INVALID_M64;
>>>
>>> /* Destroy */
>>> int step = 1;
>>>
>>> if (!pdn->m64_single_mode)
>>> step = phb->ioda.total_pe;
>>> for (i = 0; i < PCI_SRIOV_NUM_BARS * num_of_max_VFs; i += step)
>>> if (pdn->m64_map[i] == PNV_INVALID_M64)
>>> continue;
>>>
>>> /* Unmap the window */
>>> 
>>
>>The m64_map is a pointer to an array with 6 elements, which represents the 6
>>M64 BAR index for the 6 VF BARs.
>>
>>When we use Shared Mode, one array is allocated. The six elements
>>represents the six M64 BAR(at most) used to map the whole IOV BAR.
>>
>>When we use Single Mode, num_vfs array is allocate. Each array represents
>>the map between one VF's BAR and M64 BAR index.
>>
>>During the map and un-map, M64 BAR is assigned one by one in VF BAR's order.
>>So I think the code is explicit.
>>
>>In your code, you allocate a big one dimension array to hold the M64 BAR
>>index. It works, while I don't think this is more explicit than original code.
>>
>
>When M64 is in Single Mode, array with (num_vfs * 6) entries is allocated
>because every VF BAR (6 at most) will have one corresponding PHB M64 BAR.
>Anything I missed?
>
>The point in my code is you needn't worry about the mode (single vs shared)
>As I said, not too much memory wasted. However, it's up to you.
>

If we aren't concerned about saving memory, how about just defining them
statically instead of allocating dynamically?

>I'm not fan of "int (*m64_map)[PCI_SRIOV_NUM_BARS]". Instead, you can replace
>it with "int *m64_map" and calculate its size using following formula:
>
>   sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS;
>
>   sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * num_vfs;
>
>>-- 
>>Richard Yang
>>Help you, Help me

-- 
Richard Yang
Help you, Help me


Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode

2015-08-06 Thread Gavin Shan
On Thu, Aug 06, 2015 at 09:41:41PM +0800, Wei Yang wrote:
>On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote:
>>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote:
>>>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
>>>discrete.
>>>
>>>This patch restructures the patch to allocate discrete PE# for VFs when M64
>>>BAR is set to Single PE mode.
>>>
>>>Signed-off-by: Wei Yang 
>>>---
>>> arch/powerpc/include/asm/pci-bridge.h |2 +-
>>> arch/powerpc/platforms/powernv/pci-ioda.c |   69 
>>> +
>>> 2 files changed, 51 insertions(+), 20 deletions(-)
>>>
>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>>b/arch/powerpc/include/asm/pci-bridge.h
>>>index 8aeba4c..72415c7 100644
>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>@@ -213,7 +213,7 @@ struct pci_dn {
>>> #ifdef CONFIG_PCI_IOV
>>> u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>>> u16 num_vfs;/* number of VFs enabled*/
>>>-int offset; /* PE# for the first VF PE */
>>>+int *offset;/* PE# for the first VF PE or array */
>>> boolm64_single_mode;/* Use M64 BAR in Single Mode */
>>> #define IODA_INVALID_M64(-1)
>>> int (*m64_map)[PCI_SRIOV_NUM_BARS];
>>
>>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the 
>>comments
>>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. Though 
>>not
>>all of them will be used, not too much memory will be wasted.
>>
>
>Thanks for your comment.
>
>I have thought about changing the name to make it more self-explanatory. But
>another fact I want to capture is that this field is also used to reflect the
>shift offset when the M64 BAR is used in Shared Mode. So I kept the name.
>
>How about using an "enum": one keeps the name "offset", and the other is
>renamed to "pe_num_map", with the meaningful name used at the proper place?
>

Ok. I'm explaining it with more details. There are two cases: single vs shared
mode. When the PHB's M64 BARs run in single mode, you need an array to track
the allocated discrete PE#s; the VF index is the index into the array. When
the PHB's M64 BARs run in shared mode, you need contiguous PE#s. No array is
required for that case; instead, the starting PE# should be stored somewhere,
which can simply be pdn->offset[0].

So when allocating memory for this array, you simply allocate
(sizeof(*pdn->offset) * max_vf_num) no matter which mode the PHB's M64 BARs
will run in. The point is that nobody can enable (max_vf_num + 1) VFs.

With the above approach, the arrays for PE# and M64 BAR remapping needn't be
allocated when enabling the SRIOV capability and released when disabling it.
Instead, those two arrays can be allocated at resource fixup time and freed
when destroying the pdn.
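The allocation scheme described above can be sketched in plain userspace C
(illustrative names, and a linear stand-in for the kernel's PE bitmap
allocator; this is not the actual pci-ioda.c code):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Illustrative userspace sketch, not the pci-ioda.c code: one map with
 * max_vf_num entries is allocated no matter which mode the PHB's M64
 * BARs run in, since no one can enable more than max_vf_num VFs.
 *
 *   single mode: slot i holds the discrete PE# for VF i
 *   shared mode: only slot 0 is used, as the base of a contiguous range
 *
 * first_free_pe is a linear stand-in for the kernel's PE allocator.
 */
int *alloc_pe_num_map(int single_mode, int num_vfs, int max_vf_num,
                      int first_free_pe)
{
    int *pe_num_map;
    int i;

    if (num_vfs > max_vf_num)
        return NULL;

    pe_num_map = calloc(max_vf_num, sizeof(*pe_num_map));
    if (!pe_num_map)
        return NULL;

    if (single_mode) {
        for (i = 0; i < num_vfs; i++)
            pe_num_map[i] = first_free_pe + i;   /* discrete PE#s */
    } else {
        pe_num_map[0] = first_free_pe;           /* contiguous base */
    }
    return pe_num_map;
}
```

Either way the same single allocation serves both modes, which is the point
of allocating it once at fixup time and freeing it when the pdn is destroyed.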

>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 4042303..9953829 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
>>>u16 num_vfs)
>>>
>>> /* Map the M64 here */
>>> if (pdn->m64_single_mode) {
>>>-pe_num = pdn->offset + j;
>>>+pe_num = pdn->offset[j];
>>> rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>> pe_num, OPAL_M64_WINDOW_TYPE,
>>> pdn->m64_map[j][i], 0);
>>>@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>> struct pnv_phb*phb;
>>> struct pci_dn *pdn;
>>> struct pci_sriov  *iov;
>>>-u16 num_vfs;
>>>+u16num_vfs, i;
>>>
>>> bus = pdev->bus;
>>> hose = pci_bus_to_host(bus);
>>>@@ -1361,14 +1361,18 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>>
>>> if (phb->type == PNV_PHB_IODA2) {
>>> if (!pdn->m64_single_mode)
>>>-pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>>+pnv_pci_vf_resource_shift(pdev, -*pdn->offset);
>>>
>>> /* Release M64 windows */
>>> pnv_pci_vf_release_m64(pdev, num_vfs);
>>>
>>> /* Release PE numbers */
>>>-bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>>-pdn->offset = 0;
>>>+if (pdn->m64_single_mode) {
>>>+for (i = 0; i < num_vfs; i++)
>>>+pnv_ioda_free_pe(phb, pdn->offset[i]);
>>>+} else
>>>+bitmap_clear(phb->ioda.pe_alloc, *pdn->offset, num_vfs);
>>>+kfree(pdn->offset);
>>
>>Can pnv_ioda_free_pe() be reused to release PE ?
>
>You mean use it to similar thing in pnv_ioda_deconfigure_pe()?
>

Forget it please

Re: [PATCH v6 41/42] drivers/of: Export OF changeset functions

2015-08-06 Thread Gavin Shan
On Thu, Aug 06, 2015 at 08:48:10AM -0500, Rob Herring wrote:
>On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan  wrote:

Thanks, Rob. All your comments will be covered in the next revision.

Thanks,
Gavin

>> The PowerNV PCI hotplug driver is going to use the OF changeset
>> to manage the changed device sub-tree, which requires those OF
>> changeset functions are exported.
>>
>> Signed-off-by: Gavin Shan 
>> ---
>>  drivers/of/dynamic.c  | 65 
>> ---
>>  drivers/of/overlay.c  |  8 +++
>>  drivers/of/unittest.c |  4 ++--
>>  include/linux/of.h|  2 ++
>>  4 files changed, 54 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index 53826b8..af65b5b 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
>> memset(ocs, 0, sizeof(*ocs));
>> INIT_LIST_HEAD(&ocs->entries);
>>  }
>> +EXPORT_SYMBOL(of_changeset_init);
>
>We probably want these to be the _GPL variant.
>
>>
>>  /**
>>   * of_changeset_destroy - Destroy a changeset
>> @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
>> list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
>> __of_changeset_entry_destroy(ce);
>>  }
>> +EXPORT_SYMBOL(of_changeset_destroy);
>>
>> -/**
>> - * of_changeset_apply - Applies a changeset
>> - *
>> - * @ocs:   changeset pointer
>> - *
>> - * Applies a changeset to the live tree.
>> - * Any side-effects of live tree state changes are applied here on
>> - * sucess, like creation/destruction of devices and side-effects
>> - * like creation of sysfs properties and directories.
>> - * Returns 0 on success, a negative error value in case of an error.
>> - * On error the partially applied effects are reverted.
>> - */
>> -int of_changeset_apply(struct of_changeset *ocs)
>> +int __of_changeset_apply(struct of_changeset *ocs)
>>  {
>> struct of_changeset_entry *ce;
>> int ret;
>> @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
>>  }
>>
>>  /**
>> - * of_changeset_revert - Reverts an applied changeset
>> + * of_changeset_apply - Applies a changeset
>>   *
>>   * @ocs:   changeset pointer
>>   *
>> - * Reverts a changeset returning the state of the tree to what it
>> - * was before the application.
>> - * Any side-effects like creation/destruction of devices and
>> - * removal of sysfs properties and directories are applied.
>> + * Applies a changeset to the live tree.
>> + * Any side-effects of live tree state changes are applied here on
>> + * sucess, like creation/destruction of devices and side-effects
>
>s/sucess/success/
>
>> + * like creation of sysfs properties and directories.
>>   * Returns 0 on success, a negative error value in case of an error.
>> + * On error the partially applied effects are reverted.
>>   */
>> -int of_changeset_revert(struct of_changeset *ocs)
>> +int of_changeset_apply(struct of_changeset *ocs)
>> +{
>> +   int ret;
>> +
>> +   mutex_lock(&of_mutex);
>> +   ret = __of_changeset_apply(ocs);
>> +   mutex_unlock(&of_mutex);
>> +
>> +   return ret;
>> +}
>> +EXPORT_SYMBOL(of_changeset_apply);
>> +
>> +int __of_changeset_revert(struct of_changeset *ocs)
>>  {
>> struct of_changeset_entry *ce;
>> int ret;
>> @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
>>  }
>>
>>  /**
>> + * of_changeset_revert - Reverts an applied changeset
>> + *
>> + * @ocs:   changeset pointer
>> + *
>> + * Reverts a changeset returning the state of the tree to what it
>> + * was before the application.
>> + * Any side-effects like creation/destruction of devices and
>> + * removal of sysfs properties and directories are applied.
>> + * Returns 0 on success, a negative error value in case of an error.
>> + */
>> +int of_changeset_revert(struct of_changeset *ocs)
>> +{
>> +   int ret;
>> +
>> +   mutex_lock(&of_mutex);
>> +   ret = __of_changeset_revert(ocs);
>> +   mutex_unlock(&of_mutex);
>> +
>> +   return ret;
>> +}
>> +EXPORT_SYMBOL(of_changeset_revert);
>> +
>> +/**
>>   * of_changeset_action - Perform a changeset action
>>   *
>>   * @ocs:   changeset pointer
>> @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, 
>> unsigned long action,
>> list_add_tail(&ce->node, &ocs->entries);
>> return 0;
>>  }
>> +EXPORT_SYMBOL(of_changeset_action);
>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
>> index 24e025f..804ea33 100644
>> --- a/drivers/of/overlay.c
>> +++ b/drivers/of/overlay.c
>> @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
>> }
>>
>> /* apply the changeset */
>> -   err = of_changeset_apply(&ov->cset);
>> +   err = __of_changeset_apply(&ov->cset);
>> if (err) {
>> -   pr_err("%s: of_changeset_apply() failed for tree@%s\n",
>> +   

Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource

2015-08-06 Thread Gavin Shan
On Thu, Aug 06, 2015 at 09:49:02PM +0800, Wei Yang wrote:
>On Thu, Aug 06, 2015 at 02:51:40PM +1000, Gavin Shan wrote:
>>On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote:
>>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>>BAR. No matter whether the IOV BAR is truncated or not, the total size
>>>could be calculated by (vfs_expanded * VF size).
>>>
>>
>>s/VF size/VF BAR size
>>
>>I think the changelog would be more explicit:
>>
>>The alignment of IOV BAR on PowerNV platform is the total size of the
>>IOV BAR, no matter whether the IOV BAR is extended with number of max
>>VFs or number of max PE number (256). The alignment can be calculated
>
>number of max VFs is not correct. This should be
>roundup_pow_of_two(total_vfs).
>
>Others looks good to me.
>

Yes, You're correct.

>>by (vfs_expanded * VF_BAR_size).
>>
>>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>>first case.
>>>
>>>Signed-off-by: Wei Yang 
>>
>>Reviewed-by: Gavin Shan 
>>
>>>---
>>> arch/powerpc/platforms/powernv/pci-ioda.c |   14 +-
>>> 1 file changed, 9 insertions(+), 5 deletions(-)
>>>
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 9b41dba..7192e62 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -2987,12 +2987,16 @@ static resource_size_t 
>>>pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>   int resno)
>>> {
>>> struct pci_dn *pdn = pci_get_pdn(pdev);
>>>-resource_size_t align, iov_align;
>>>-
>>>-iov_align = resource_size(&pdev->resource[resno]);
>>>-if (iov_align)
>>>-return iov_align;
>>>+resource_size_t align;
>>>
>>>+/*
>>>+ * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>>+ * SR-IOV. While from hardware perspective, the range mapped by M64
>>>+ * BAR should be size aligned.
>>>+ *
>>>+ * This function return the total IOV BAR size if expanded or just the
>>>+ * individual size if not.
>>>+ */
>>> align = pci_iov_resource_size(pdev, resno);
>>> if (pdn->vfs_expanded)
>>> return pdn->vfs_expanded * align;
>>>-- 
>>>1.7.9.5
>>>
>
>-- 
>Richard Yang
>Help you, Help me


Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 08:15:55PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 07:41 PM, Wei Yang wrote:
>>On Thu, Aug 06, 2015 at 07:00:00PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:51 PM, Gavin Shan wrote:
On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote:
>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>BAR. No matter whether the IOV BAR is truncated or not, the total size
>could be calculated by (vfs_expanded * VF size).
>

s/VF size/VF BAR size

I think the changelog would be more explicit:

The alignment of IOV BAR on PowerNV platform is the total size of the
IOV BAR, no matter whether the IOV BAR is extended with number of max
VFs or number of max PE number (256). The alignment can be calculated
by (vfs_expanded * VF_BAR_size).
>>>
>>>
>>>
>>>Is that really a PowerNV-specific requirement or it is valid for
>>>every platform (I suspect this is the case here)?
>>>
>>
>>Currently, it is PowerNV-specific.
>
>
>How is x86 different on this matter?
>Why would we need this extra alignment, not just VF's BAR alignment?
>
>

The difference lies in the "PE" isolation on Power.

From the MMIO perspective, we use M32 and M64 BARs to map an MMIO range to a
PE#.  M32 is controlled with one M32 BAR, while M64 has 16 BARs.

Now let's go back to the pure PCI world. Every PCI BAR is required to be
size-aligned. For example, a 1MB BAR#0 must be 1MB-aligned. Then let's look
at the SRIOV case. The PF's IOV BARs (6 at most) each cover the corresponding
VF BAR for all total_vfs VFs. For example, suppose VF BAR#0 is 1MB, the PF's
IOV BAR#0 is assigned address 0, and total_vfs is 8. The IOV BAR then maps
the range [0, 8MB - 1]. When the PCI core allocates the IOV BAR, it only
makes sure the IOV BAR is 1MB-aligned, since that already makes each VF BAR
size-aligned. That is what x86 does, and maybe other platforms too. I believe
you understand this part.

Now back to our platform: we want to use an M64 BAR to map the IOV BAR. In
the example above, the IOV BAR#0 is 8MB. If we still let the PCI core
allocate it with 1MB alignment, our M64 BAR can't work, since the M64 BAR
itself must be size-aligned (in this case, 8MB). The same concept as a PCI BAR.
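The alignment argument above can be sketched as follows (a userspace
illustration with made-up helper names, not kernel API):

```c
#include <assert.h>

/*
 * Illustration of the alignment argument above; helper names are
 * invented, not kernel API.
 *
 * Plain SR-IOV only needs the IOV BAR aligned to one VF BAR size so
 * every VF BAR inside it ends up size-aligned.  Mapping the whole IOV
 * BAR with one M64 BAR additionally requires the IOV BAR to be aligned
 * to its own total size, because an M64 BAR must be size-aligned.
 */
unsigned long long generic_iov_alignment(unsigned long long vf_bar_size)
{
    return vf_bar_size;                      /* what x86 needs */
}

unsigned long long powernv_iov_alignment(unsigned long long vf_bar_size,
                                         unsigned int vfs_expanded)
{
    return vf_bar_size * vfs_expanded;       /* total IOV BAR size */
}
```

With the 1MB/8-VF example above, the generic alignment is 1MB while the
PowerNV alignment is the full 8MB of the IOV BAR.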

>>>
>>>Also, what is the exact meaning of "expanded" in @vfs_expanded? It is
>>>either 255 (if individual VF BARs are <= 64MB) or
>>>roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What
>>>is expanded here?
>>>
>>
>>The PF's IOV BAR's original size is (VF BAR size * total_vfs).
>>
>>After expanding, the IOV BAR size is (VF BAR size * 256) or (VF BAR size *
>>roundup_pow_of_two(total_vfs)).
>
>
>Ufff, got it now. I'd store just an expanded IOV BAR size (not some
>magic VFs number) because this is what it actually is:
>pdn->vfs_expanded * align
>

This would change the idea, since a VF can still have 6 BARs. In your
proposal, we would need 6 variables to store the expanded size of each.
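The two expansion cases being discussed can be sketched as (a userspace
illustration, not the kernel code; the 64MB threshold and the 256 max PE
number come from the thread above):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch of the "expanded" IOV BAR size discussed above,
 * not the kernel code: the IOV BAR is expanded to cover either the max
 * PE number (256) when each VF BAR fits in a 64MB M64 segment, or
 * roundup_pow_of_two(total_vfs) otherwise.
 */
static uint64_t roundup_pow_of_two_u64(uint64_t x)
{
    uint64_t r = 1;

    while (r < x)
        r <<= 1;
    return r;
}

uint64_t expanded_iov_bar_size(uint64_t vf_bar_size, unsigned int total_vfs)
{
    const uint64_t SZ_64M = 64ULL << 20;
    uint64_t vfs_expanded;

    if (vf_bar_size <= SZ_64M)
        vfs_expanded = 256;                          /* max PE number */
    else
        vfs_expanded = roundup_pow_of_two_u64(total_vfs);

    return vfs_expanded * vf_bar_size;
}
```

So for six VF BARs there are six such expanded sizes, one per BAR, which is
why a single stored size would not be enough.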

>
>>>

>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>first case.
>
>Signed-off-by: Wei Yang 

Reviewed-by: Gavin Shan 

>---
>arch/powerpc/platforms/powernv/pci-ioda.c |   14 +-
>1 file changed, 9 insertions(+), 5 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 9b41dba..7192e62 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -2987,12 +2987,16 @@ static resource_size_t 
>pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
> int resno)
>{
>   struct pci_dn *pdn = pci_get_pdn(pdev);
>-  resource_size_t align, iov_align;
>-
>-  iov_align = resource_size(&pdev->resource[resno]);
>-  if (iov_align)
>-  return iov_align;
>+  resource_size_t align;
>
>+  /*
>+   * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>+   * SR-IOV. While from hardware perspective, the range mapped by M64
>+   * BAR should be size aligned.
>+   *
>+   * This function return the total IOV BAR size if expanded or just the
>+   * individual size if not.
>+   */
>   align = pci_iov_resource_size(pdev, resno);
>   if (pdn->vfs_expanded)
>   return pdn->vfs_expanded * align;
>--
>1.7.9.5
>

>>>
>>>
>>>--
>>>Alexey
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me


Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Alexey Kardashevskiy

On 08/07/2015 12:13 AM, Wei Yang wrote:

On Thu, Aug 06, 2015 at 05:47:42PM +1000, Alexey Kardashevskiy wrote:

On 08/06/2015 04:57 PM, Gavin Shan wrote:

On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote:

On 08/06/2015 02:35 PM, Gavin Shan wrote:

On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:

On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
M64 windwo, which means M64 BAR can't work on it.



The proper text would be something like this:

===
SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we
cannot enable the device.
===






s/PHB_IODA2/PHB3



No, it is IODA2. OPAL does the PHB3-specific bits; the host kernel just uses OPAL.



Ok.




s/windwo/window


This patch makes this explicit.

Signed-off-by: Wei Yang 


The idea sounds right, but there is one question as below.


---
arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5738d31..9b41dba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
int offset)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
/*
 * The actual IOV BAR range is determined by the start address
 * and the actual size for num_vfs VFs BAR.  This check is to
@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
int offset)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
res2 = *res;
res->start += size * offset;
@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
for (j = 0; j < vf_groups; j++) {
do {
win = 
find_next_zero_bit(&phb->ioda.m64_bar_alloc,
@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
num_vfs)
pdn = pci_get_pdn(pdev);

if (phb->type == PNV_PHB_IODA2) {
+   if (!pdn->vfs_expanded) {
+   dev_info(&pdev->dev, "don't support this SRIOV device"
+   " with non M64 VF BAR\n");
+   return -EBUSY;
+   }
+


It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temporarily
unavailable. For this case, the VFs are permanently unavailable because of
running out of space to accommodate M64 and non-M64 VF BARs.

The error message could be printed with dev_warn() and it would be precise
as below or something else you prefer:

dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");



Both messages are cryptic.

If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in
the worst case - BAR#15?), the difference is if it is segmented or not, no?



The VF BAR could be one of IO, M32, or M64. If it's not M64, the VFs are
supposed to be disabled and the (IO and M32) resources won't be allocated;
for sure, the IO and M32 resources can't be put into any of the PHB's 16 M64
BARs. Would you recommend a better message then?




dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit
MMIO window\n");

Or it is not "MMIO window"?



The reason is not "no space left in 64bit MMIO window".

The reason is that the IOV BAR is not 64-bit prefetchable, so in the Linux
kernel it can't be allocated from the M64 space, and then we can't use an M64
BAR to cover it.


Oh. So now it is not M64 vs. M32 and IO, it is about prefetchable vs. 
non-prefetchable. Please choose one.


Should it be this then?
dev_warn(&pdev->dev, "Non-prefetchable BARs are not supported for SRIOV")


But Gavin keeps insisting on mentioning "non-M64 BAR" - this part I do not 
understand.


And where does this limit come from? Is it POWER8, IODA2, PHB3, SRIOV or 
something else? Is it all about isolation or the host _without_ KVM but 
with SRIOV also cannot use VFs if they have non-prefetchable BARs? Is this 
because of POWER8 or IODA2 or PHB3 or SRIOV or something else?

/* Calculate available PE for required VFs */
mutex_lock(&phb->ioda.pe_alloc_mutex);
pdn->offset = bitmap_find_next_zero_area(
@@ -2774,9 +2771,10 @@ static voi

Re: [PATCH v5 RESEND] IFC: Change IO accessor based on endianness

2015-08-06 Thread Brian Norris
On Thu, Aug 06, 2015 at 06:24:38PM -0500, Scott Wood wrote:
> On Thu, 2015-08-06 at 09:52 -0700, Brian Norris wrote:
> > Who takes patches for drivers/memory/ again? I can take it via MTD if
> > no one else steps up.
>  There's no maintainer listed.  IIRC, when we previously discussed the patch 
> we both said that it could go via either of our trees.  I'll take it via mine.

I recall the conversation. I was just confused (again) by there being no
maintainer entry. Have at it.

> > It's nothing new, but this patch continues the pattern of using a global
> > pointer to the IFC device structure. Not pretty, but not worth holding
> > this up over.
> 
> It's not pretty, but I see little reason to come up with more complicated 
> infrastructure for the drivers finding each other given the low odds of ever 
> seeing a single-kernel system with more than one IFC.  It's impossible with 
> current chips.  If it ever does happen we can do something fancier.

Yep, I figured as much.

Brian

Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop

2015-08-06 Thread Segher Boessenkool
On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
> > The original loop was already optimal, as the comment said.
> 
> The comment says that bdnz has zero overhead.  That doesn't mean the adde 
> won't stall waiting for the load result.

adde is execution serialising on those cores; it *always* stalls,
that is, it won't run until it is next to complete.

> > The new code adds extra instructions and a mispredicted branch.
> 
> Outside the main loop.

Sure, I never said it was super-bad or anything.

> >   You also might get less overlap between the loads and adde (I didn't check
> > if there is any originally): those instructions are no longer
> > interleaved.
> >
> > I think it is a stupid idea to optimise code for all 32-bit PowerPC
> > CPUs based on solely what is best for a particularly simple, slow
> > implementation; and that is what this patch is doing.
> 
> The simple and slow implementation is the one that needs optimizations the 
> most.

And, on the other hand, optimising for atypical (mostly) in-order
single-issue chips without branch folding, hurts performance on
other chips the most.  Well, dual-issue in-order might be worse :-P

> If this makes performance non-negligibly worse on other 32-bit chips, and is 
> an important improvement on 8xx, then we can use an ifdef since 8xx already 
> requires its own kernel build.  I'd prefer to see a benchmark showing that it 
> actually does make things worse on those chips, though.

And I'd like to see a benchmark that shows it *does not* hurt performance
on most chips, and does improve things on 8xx, and by how much.  But it
isn't *me* who has to show that, it is not my patch.

If these csum routines actually matter for performance that much, there
really *should* be chip-specific implementations.
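Whatever the scheduling, any unrolled variant must fold to the same
one's-complement sum as the simple loop. A portable C model of that invariant
(not the kernel's hand-written assembler, just an illustration of what
csum_partial computes over 32-bit words):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of a 32-bit-word one's-complement accumulate, as
 * csum_partial does with adde; not the kernel's assembler. */
uint32_t csum_words_simple(const uint32_t *p, size_t n)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += p[i];                 /* carries accumulate in bits 32+ */
    while (sum >> 32)
        sum = (sum & 0xffffffffu) + (sum >> 32);  /* fold carries back */
    return (uint32_t)sum;
}

/* Two-way unrolled variant: same result, different scheduling. */
uint32_t csum_words_unrolled(const uint32_t *p, size_t n)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        sum += p[i];
        sum += p[i + 1];
    }
    if (i < n)                       /* odd trailing word */
        sum += p[i];
    while (sum >> 32)
        sum = (sum & 0xffffffffu) + (sum >> 32);
    return (uint32_t)sum;
}
```

The performance argument in this thread is only about how the loads, adds,
and branches are scheduled on a given core; the folded result is identical.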


Segher

Re: [PATCH v5 RESEND] IFC: Change IO accessor based on endianness

2015-08-06 Thread Scott Wood
On Thu, 2015-08-06 at 09:52 -0700, Brian Norris wrote:
> On Wed, May 20, 2015 at 09:17:11PM -0500, Scott Wood wrote:
> > From: Jaiprakash Singh 
> > 
> > IFC IO accessors are set at run time based
> > on the IFC IP registers' endianness. The IFC
> > node in the DTS file contains information
> > about the endianness.
> > 
> > Signed-off-by: Jaiprakash Singh 
> > Signed-off-by: Scott Wood 
> > ---
> > v5: I'm assuming it's the same as v4, but I didn't send v4, and this
> > comes from a versionless [RESEND] that never made it to the mailing list,
> > so bumping the version just in case.
> 
> Acked-by: Brian Norris 
> 
> Who takes patches for drivers/memory/ again? I can take it via MTD if
> no one else steps up.
 There's no maintainer listed.  IIRC, when we previously discussed the patch 
we both said that it could go via either of our trees.  I'll take it via mine.

> It's nothing new, but this patch continues the pattern of using a global
> pointer to the IFC device structure. Not pretty, but not worth holding
> this up over.

It's not pretty, but I see little reason to come up with more complicated 
infrastructure for the drivers finding each other given the low odds of ever 
seeing a single-kernel system with more than one IFC.  It's impossible with 
current chips.  If it ever does happen we can do something fancier.

-Scott


Re: windfarm: decrement client count when unregistering

2015-08-06 Thread Paul Bolle
On vr, 2015-08-07 at 00:21 +0200, Paul Bolle wrote:
> On wo, 2015-08-05 at 14:16 +1000, Michael Ellerman wrote:
> > I also get an oops when removing windfarm_lm75_sensor, so I suspect there 
> > are
> > gremlins in the module ref counting for windfarm.
> 
> (This I haven't (yet) looked into.)

And that might be, sort of, related, because the oops is probably triggered
by what seems to be a rather straightforward chain of events when unloading
an I2C module. (So the windfarm_lm75_sensor refcount must be zero.) It gets
interesting at:

wf_lm75_remove()
wf_unregister_sensor(&wf_lm75_sensor->sens)
wf_put_sensor(&wf_lm75_sensor->sens)
module_put(wf_lm75_sensor->sens->ops->owner /* THIS_MODULE */)

And in windfarm_lm75_sensor we trigger this issue because in the
.probe() function there appears to be no corresponding call to
try_module_get() preventing the module from being unloaded, as we saw in
windfarm_smu_sensors.

So module refcounting looks broken for both these modules in opposite
ways. Gremlins indeed.

Good luck!


Paul Bolle

Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop

2015-08-06 Thread Scott Wood
On Wed, 2015-08-05 at 23:39 -0500, Segher Boessenkool wrote:
> On Wed, Aug 05, 2015 at 09:31:41PM -0500, Scott Wood wrote:
> > On Wed, 2015-08-05 at 19:30 -0500, Segher Boessenkool wrote:
> > > On Wed, Aug 05, 2015 at 03:29:35PM +0200, Christophe Leroy wrote:
> > > > On the 8xx, load latency is 2 cycles and taking branches also takes
> > > > 2 cycles. So let's unroll the loop.
> > > 
> > > This is not true for most other 32-bit PowerPC; this patch makes
> > > performance worse on e.g. 6xx/7xx/7xxx.  Let's not!
> > 
> > Chips with a load latency greater than 2 cycles should also benefit from 
> > the 
> > unrolling.  Have you benchmarked this somewhere and seen it reduce 
> > performance?  Do you know of any 32-bit PPC chips with a load latency 
> > less 
> > than 2 cycles?
> 
> The original loop was already optimal, as the comment said.

The comment says that bdnz has zero overhead.  That doesn't mean the adde 
won't stall waiting for the load result.

> The new code adds extra instructions and a mispredicted branch.

Outside the main loop.

>   You also might get less overlap between the loads and adde (I didn't check
> if there is any originally): those instructions are no longer
> interleaved.
>
> I think it is a stupid idea to optimise code for all 32-bit PowerPC
> CPUs based on solely what is best for a particularly simple, slow
> implementation; and that is what this patch is doing.

The simple and slow implementation is the one that needs optimizations the 
most.

If this makes performance non-negligibly worse on other 32-bit chips, and is 
an important improvement on 8xx, then we can use an ifdef since 8xx already 
requires its own kernel build.  I'd prefer to see a benchmark showing that it 
actually does make things worse on those chips, though.

-Scott


Re: windfarm: decrement client count when unregistering

2015-08-06 Thread Paul Bolle
On wo, 2015-08-05 at 14:16 +1000, Michael Ellerman wrote:
> On Fri, 2015-31-07 at 12:08:58 UTC, Paul Bolle wrote:
> > windfarm_corex_exit() contains:
> > BUG_ON(wf_client_count != 0);
> > 
> > I wonder why that, apparently, never triggered.
> 
> Hmm interesting.
> 
> A quick test here on an iMacG5 shows that we get into a state where we can't
> remove windfarm_core:
> 
>   $ lsmod
>   Module  Size  Used by
>   windfarm_smu_sensors7549  2
>   windfarm_core  15391  1 windfarm_smu_sensors
> 
> 
> Which means we can't trigger windfarm_core_exit() and the BUG_ON().

Perhaps this is what, roughly, happens:

smu_sensors_init()
smu_ads_create()
/* Let's assume this happens ... */
ads->sens.ops = &smu_cpuamp_ops
ads->sens.name = "cpu-current"
smu_ads_create()
/* ditto ... */
ads->sens.ops = &smu_cpuvolt_ops
ads->sens.name = "cpu-voltage"

/* ... so this would then be true */
if (volt_sensor && curr_sensor)
/* and we do this */
smu_cpu_power_create(&volt_sensor->sens, &curr_sensor->sens)
wf_get_sensor(&volt_sensor->sens)
try_module_get(volt_sensor->sens->ops->owner /* THIS_MODULE */)
wf_get_sensor(&curr_sensor->sens)
try_module_get(curr_sensor->sens->ops->owner /* THIS_MODULE */)

The cleanup would have happened here:

smu_sensors_exit()
while (!list_empty(&smu_ads))
wf_unregister_sensor(&ads->sens)
wf_put_sensor()
/* would this also be done for sensors that never 
 * triggered a call to module_get()? */
module_put(ads->sens->ops->owner /* THIS MODULE */)

But, whatever it is that smu_sensors_exit() wants to do, it will never
be called since there are these two references to this module that
smu_sensors_init() created itself, preventing the unloading of this
module.

Does the above look plausible?
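The suspected imbalance can be modelled with plain counters (a toy model with
invented names, not the kernel's module machinery): init takes two references
on the module itself, and the exit path that would drop them can only run
once the count is already zero, so it never runs.

```c
#include <assert.h>

/* Toy model of the refcount deadlock described above; not the
 * kernel's real module machinery. */
struct toy_module {
    int refcount;
};

void toy_module_get(struct toy_module *m) { m->refcount++; }
void toy_module_put(struct toy_module *m) { m->refcount--; }

/* smu_sensors_init(): the cpu-power sensor takes a reference on the
 * voltage and current sensors' owner -- this same module. */
void toy_init(struct toy_module *self)
{
    toy_module_get(self);   /* wf_get_sensor(&volt_sensor->sens) */
    toy_module_get(self);   /* wf_get_sensor(&curr_sensor->sens) */
}

/* The exit path that would call module_put() twice only runs when the
 * refcount is already zero, so the module can never be unloaded. */
int toy_can_unload(const struct toy_module *m)
{
    return m->refcount == 0;
}
```

This matches the `Used by 2` seen in the lsmod output earlier in the thread.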

Note that this was only cobbled together by staring at the code for far
too long. If I had some powerpc machine at hand I could have actually
tested this with a few strategically placed printk()'s.

> I also get an oops when removing windfarm_lm75_sensor, so I suspect there are
> gremlins in the module ref counting for windfarm.

(This I haven't (yet) looked into.)

> I'll merge this as probably correct.

Hope this helps,


Paul Bolle

Re: [PATCH 1/3] Powerpc: mpc85xx: refactor the PM operations

2015-08-06 Thread Scott Wood
On Thu, 2015-08-06 at 13:54 +0800, Chenhui Zhao wrote:
> On Thu, Aug 6, 2015 at 1:46 PM, Scott Wood  
> wrote:
> > On Thu, 2015-08-06 at 12:20 +0800, Chenhui Zhao wrote:
> > >  On Thu, Aug 6, 2015 at 10:57 AM, Scott Wood 
> > > 
> > >  wrote:
> > >  > On Wed, 2015-08-05 at 18:11 +0800, Chenhui Zhao wrote:
> > >  > >  On Tue, Aug 4, 2015 at 4:26 AM, Scott Wood 
> > > 
> > >  > >  wrote:
> > >  > >  > On Mon, 2015-08-03 at 19:32 +0800, Chenhui Zhao wrote:
> > >  > >  > >  >
> > >  > >  >
> > >  > >  > >  On Sat, Aug 1, 2015 at 7:59 AM, Scott Wood
> > >  > > 
> > >  > >  > >  wrote:
> > >  > >  >
> > >  > >  > >  >
> > >  > >  > >  > Could you explain irq_mask()?  Why would there still be 
> > > IRQs
> > >  > >  > > destined
> > >  > >  > >  > for
> > >  > >  > >  > this CPU at this point?
> > >  > >  > >
> > >  > >  > >  This function just masks irq by setting the registers in 
> > > RCPM
> > >  > > (for
> > >  > >  > >  example, RCPM_CPMIMR, RCPM_CPMCIMR). Actually, all irqs to
> > >  > > this CPU
> > >  > >  > >  have been migrated to other CPUs.
> > >  > >  >
> > >  > >  > So why do we need to set those bits in RCPM?  Is it just 
> > > caution?
> > >  > >
> > >  > >  Setting these bits can mask interrupts signalled to RCPM from 
> > > MPIC
> > >  > > as a
> > >  > >  means of
> > >  > >  waking up from a lower power state. So, cores will not be 
> > > waked up
> > >  > >  unexpectedly.
> > >  >
> > >  > Why would the MPIC be signalling those interrupts if they've been
> > >  > masked at
> > >  > the MPIC?
> > >  >
> > >  > -Scott
> > >  >
> > > 
> > >  The interrupts to RCPM from MPIC are IRQ, Machine Check, NMI and
> > >  Critical interrupts. Some of them didn't be masked in MPIC.
> > 
> > What interrupt could actually happen to a sleeping cpu that this 
> > protects
> > against?
> > 
> > -Scott
> 
> Not sure. Maybe spurious interrupts or hardware exceptions.

Spurious interrupts happen due to race conditions.  They don't happen because 
the MPIC is bored and decides to ring a CPU's doorbell and hide in the bushes.

If by "hardware exceptions" you mean machine checks, how would such a machine 
check be generated by a core that is off?

>  However, setting them makes sure dead cpus cannot be woken up unexpectedly.

I'm not seeing enough value here to warrant resurrecting the old sleep node 
stuff.

-Scott


[PATCH] kvm:powerpc:Fix error handling in the function mpic_set_default_irq_routing

2015-08-06 Thread Nicholas Krause
This fixes error handling in the function mpic_set_default_irq_routing
by checking whether the call to the function kvm_set_irq_routing has failed,
and if so, returning the error code from kvm_set_irq_routing to the caller
immediately.

Signed-off-by: Nicholas Krause 
---
 arch/powerpc/kvm/mpic.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 6249cdc..5a18859 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1641,13 +1641,16 @@ static void mpic_destroy(struct kvm_device *dev)
 static int mpic_set_default_irq_routing(struct openpic *opp)
 {
struct kvm_irq_routing_entry *routing;
+   int ret;
 
/* Create a nop default map, so that dereferencing it still works */
routing = kzalloc((sizeof(*routing)), GFP_KERNEL);
if (!routing)
return -ENOMEM;
 
-   kvm_set_irq_routing(opp->kvm, routing, 0, 0);
+   ret = kvm_set_irq_routing(opp->kvm, routing, 0, 0);
+   if (ret)
+   return ret;
 
kfree(routing);
return 0;
-- 
2.1.4
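Note that the early return in the posted diff skips the kfree() that follows it, so a variant that releases the buffer on both paths avoids leaking the allocation. A user-space sketch of that error path — stub names only, not the kernel APIs:

```c
#include <stdlib.h>
#include <assert.h>

struct kvm_irq_routing_entry { int gsi; };

static int force_failure;   /* test knob standing in for a failing call */

/* Stub for kvm_set_irq_routing(); returns -EINVAL (-22) when forced. */
static int kvm_set_irq_routing_stub(struct kvm_irq_routing_entry *routing)
{
    (void)routing;
    return force_failure ? -22 : 0;
}

static int set_default_irq_routing_sketch(void)
{
    struct kvm_irq_routing_entry *routing;
    int ret;

    routing = calloc(1, sizeof(*routing));
    if (!routing)
        return -12;                   /* -ENOMEM */

    ret = kvm_set_irq_routing_stub(routing);
    free(routing);                    /* freed on success *and* failure */
    return ret;
}
```

The error code still propagates to the caller; only the cleanup ordering changes.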


Re: [PATCH v5 RESEND] IFC: Change IO accessor based on endianness

2015-08-06 Thread Brian Norris
On Wed, May 20, 2015 at 09:17:11PM -0500, Scott Wood wrote:
> From: Jaiprakash Singh 
> 
> IFC IO accessors are set at run time based
> on the endianness of the IFC IP registers.
> The IFC node in the DTS file contains
> information about endianness.
> 
> Signed-off-by: Jaiprakash Singh 
> Signed-off-by: Scott Wood 
> ---
> v5: I'm assuming it's the same as v4, but I didn't send v4, and this
> comes from a versionless [RESEND] that never made it to the mailing list,
> so bumping the version just in case.

Acked-by: Brian Norris 

Who takes patches for drivers/memory/ again? I can take it via MTD if
no one else steps up.

It's nothing new, but this patch continues the pattern of using a global
pointer to the IFC device structure. Not pretty, but not worth holding
this up over.

Brian

>  .../bindings/memory-controllers/fsl/ifc.txt|   3 +
>  drivers/memory/fsl_ifc.c   |  43 ++--
>  drivers/mtd/nand/fsl_ifc_nand.c| 258 
> +++--
>  include/linux/fsl_ifc.h|  50 
>  4 files changed, 213 insertions(+), 141 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt 
> b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
> index d5e3704..89427b0 100644
> --- a/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
> +++ b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
> @@ -18,6 +18,8 @@ Properties:
>interrupt (NAND_EVTER_STAT).  If there is only one,
>that interrupt reports both types of event.
>  
> +- little-endian : If this property is absent, the big-endian mode will
> +  be in use as default for registers.
>  
>  - ranges : Each range corresponds to a single chipselect, and covers
> the entire access window as configured.
> @@ -34,6 +36,7 @@ Example:
>   #size-cells = <1>;
>   reg = <0x0 0xffe1e000 0 0x2000>;
>   interrupts = <16 2 19 2>;
> + little-endian;
>  
>   /* NOR, NAND Flashes and CPLD on board */
>   ranges = <0x0 0x0 0x0 0xee00 0x0200
> diff --git a/drivers/memory/fsl_ifc.c b/drivers/memory/fsl_ifc.c
> index 410c397..e87459f 100644
> --- a/drivers/memory/fsl_ifc.c
> +++ b/drivers/memory/fsl_ifc.c
> @@ -62,7 +62,7 @@ int fsl_ifc_find(phys_addr_t addr_base)
>   return -ENODEV;
>  
>   for (i = 0; i < fsl_ifc_ctrl_dev->banks; i++) {
> - u32 cspr = in_be32(&fsl_ifc_ctrl_dev->regs->cspr_cs[i].cspr);
> + u32 cspr = ifc_in32(&fsl_ifc_ctrl_dev->regs->cspr_cs[i].cspr);
>   if (cspr & CSPR_V && (cspr & CSPR_BA) ==
>   convert_ifc_address(addr_base))
>   return i;
> @@ -79,16 +79,16 @@ static int fsl_ifc_ctrl_init(struct fsl_ifc_ctrl *ctrl)
>   /*
>* Clear all the common status and event registers
>*/
> - if (in_be32(&ifc->cm_evter_stat) & IFC_CM_EVTER_STAT_CSER)
> - out_be32(&ifc->cm_evter_stat, IFC_CM_EVTER_STAT_CSER);
> + if (ifc_in32(&ifc->cm_evter_stat) & IFC_CM_EVTER_STAT_CSER)
> + ifc_out32(IFC_CM_EVTER_STAT_CSER, &ifc->cm_evter_stat);
>  
>   /* enable all error and events */
> - out_be32(&ifc->cm_evter_en, IFC_CM_EVTER_EN_CSEREN);
> + ifc_out32(IFC_CM_EVTER_EN_CSEREN, &ifc->cm_evter_en);
>  
>   /* enable all error and event interrupts */
> - out_be32(&ifc->cm_evter_intr_en, IFC_CM_EVTER_INTR_EN_CSERIREN);
> - out_be32(&ifc->cm_erattr0, 0x0);
> - out_be32(&ifc->cm_erattr1, 0x0);
> + ifc_out32(IFC_CM_EVTER_INTR_EN_CSERIREN, &ifc->cm_evter_intr_en);
> + ifc_out32(0x0, &ifc->cm_erattr0);
> + ifc_out32(0x0, &ifc->cm_erattr1);
>  
>   return 0;
>  }
> @@ -127,9 +127,9 @@ static u32 check_nand_stat(struct fsl_ifc_ctrl *ctrl)
>  
>   spin_lock_irqsave(&nand_irq_lock, flags);
>  
> - stat = in_be32(&ifc->ifc_nand.nand_evter_stat);
> + stat = ifc_in32(&ifc->ifc_nand.nand_evter_stat);
>   if (stat) {
> - out_be32(&ifc->ifc_nand.nand_evter_stat, stat);
> + ifc_out32(stat, &ifc->ifc_nand.nand_evter_stat);
>   ctrl->nand_stat = stat;
>   wake_up(&ctrl->nand_wait);
>   }
> @@ -161,16 +161,16 @@ static irqreturn_t fsl_ifc_ctrl_irq(int irqno, void 
> *data)
>   irqreturn_t ret = IRQ_NONE;
>  
>   /* read for chip select error */
> - cs_err = in_be32(&ifc->cm_evter_stat);
> + cs_err = ifc_in32(&ifc->cm_evter_stat);
>   if (cs_err) {
>   dev_err(ctrl->dev, "transaction sent to IFC is not mapped to"
>   "any memory bank 0x%08X\n", cs_err);
>   /* clear the chip select error */
> - out_be32(&ifc->cm_evter_stat, IFC_CM_EVTER_STAT_CSER);
> + ifc_out32(IFC_CM_EVTER_STAT_CSER, &ifc->cm_evter_stat);
>  
>   /* read error attribute registers print the error informatio
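The runtime-selected accessor pattern behind ifc_in32()/ifc_out32() can be sketched as below. This is an illustrative user-space model — a flag set once from the device tree's "little-endian" property, with a value-level byte swap standing in for the driver's real MMIO accessor selection; the names and mechanics are assumptions, not the actual fsl_ifc code:

```c
#include <stdint.h>

/* Set once at probe time from the "little-endian" DT property. */
static int ifc_little_endian;

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00u) |
           ((v << 8) & 0xff0000u) | (v << 24);
}

/* Read a register, swapping when the register endianness differs from
 * the accessor's convention (illustrative; the kernel driver selects
 * between big- and little-endian I/O accessors instead). */
static uint32_t ifc_in32(const volatile uint32_t *reg)
{
    uint32_t raw = *reg;
    return ifc_little_endian ? raw : bswap32(raw);
}

static void ifc_out32(uint32_t val, volatile uint32_t *reg)
{
    *reg = ifc_little_endian ? val : bswap32(val);
}
```

Either way, a write followed by a read round-trips to the same value; only the stored representation differs between the two modes.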

Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies

2015-08-06 Thread Stefano Stabellini
On Thu, 6 Aug 2015, Julien Grall wrote:
> On 06/08/15 12:06, Stefano Stabellini wrote:
> > On Thu, 6 Aug 2015, Julien Grall wrote:
> >> Hi,
> >>
> >>
> >> On 04/08/15 19:12, Julien Grall wrote:
> >>> diff --git a/include/xen/page.h b/include/xen/page.h
> >>> index c5ed20b..e7e1425 100644
> >>> --- a/include/xen/page.h
> >>> +++ b/include/xen/page.h
> >>> @@ -3,9 +3,9 @@
> >>>  
> >>>  #include 
> >>>  
> >>> -static inline unsigned long page_to_mfn(struct page *page)
> >>> +static inline unsigned long page_to_gfn(struct page *page)
> >>>  {
> >>> - return pfn_to_mfn(page_to_pfn(page));
> >>> + return pfn_to_gfn(page_to_pfn(page));
> >>>  }
> >>
> >> I've just noticed that there is a function gfn_to_page used for KVM.
> >>
> >> Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion
> >> with KVM one?
> > 
> > Yeah, prepending xen would help to avoid namespace pollution.
> 
> Will do. May I keep your Reviewed-by for this mechanical change?

Yes


Re: [PATCH v6 40/42] drivers/of: Return allocated memory chunk from of_fdt_unflatten_tree()

2015-08-06 Thread Rob Herring
On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan  wrote:
> This changes of_fdt_unflatten_tree() so that it returns the allocated
> memory chunk for unflattened device-tree, which can be released once
> it's obsoleted.
>
> Signed-off-by: Gavin Shan 

Acked-by: Rob Herring 

> ---
>  drivers/of/fdt.c   | 11 ++-
>  include/linux/of_fdt.h |  2 +-
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 074870a..8e1ba7e 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -408,7 +408,7 @@ static void *unflatten_dt_node(const void *blob,
>   * @dt_alloc: An allocator that provides a virtual address to memory
>   * for the resulting tree
>   */
> -static void __unflatten_device_tree(const void *blob,
> +static void *__unflatten_device_tree(const void *blob,
>  struct device_node *dad,
>  struct device_node **mynodes,
>  void * (*dt_alloc)(u64 size, u64 align))
> @@ -421,7 +421,7 @@ static void __unflatten_device_tree(const void *blob,
>
> if (!blob) {
> pr_debug("No device tree pointer\n");
> -   return;
> +   return NULL;
> }
>
> pr_debug("Unflattening device tree:\n");
> @@ -431,7 +431,7 @@ static void __unflatten_device_tree(const void *blob,
>
> if (fdt_check_header(blob)) {
> pr_err("Invalid device tree blob header\n");
> -   return;
> +   return NULL;
> }
>
> /* First pass, scan for size */
> @@ -458,6 +458,7 @@ static void __unflatten_device_tree(const void *blob,
>be32_to_cpup(mem + size));
>
> pr_debug(" <- unflatten_device_tree()\n");
> +   return mem;
>  }
>
>  static void *kernel_tree_alloc(u64 size, u64 align)
> @@ -473,11 +474,11 @@ static void *kernel_tree_alloc(u64 size, u64 align)
>   * pointers of the nodes so the normal device-tree walking functions
>   * can be used.
>   */
> -void of_fdt_unflatten_tree(const unsigned long *blob,
> +void *of_fdt_unflatten_tree(const unsigned long *blob,
> struct device_node *dad,
> struct device_node **mynodes)
>  {
> -   __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
> +   return __unflatten_device_tree(blob, dad, mynodes, 
> &kernel_tree_alloc);
>  }
>  EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
>
> diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
> index 3644960..00db279 100644
> --- a/include/linux/of_fdt.h
> +++ b/include/linux/of_fdt.h
> @@ -37,7 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
>  unsigned long node);
>  extern int of_fdt_match(const void *blob, unsigned long node,
> const char *const *compat);
> -extern void of_fdt_unflatten_tree(const unsigned long *blob,
> +extern void *of_fdt_unflatten_tree(const unsigned long *blob,
>struct device_node *dad,
>struct device_node **mynodes);
>
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 05:47:42PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 04:57 PM, Gavin Shan wrote:
>>On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:35 PM, Gavin Shan wrote:
On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:
>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
>M64 windwo, which means M64 BAR can't work on it.
>
>
>The proper text would be something like this:
>
>===
>SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we
>cannot enable the device.
>===
>
>
>

s/PHB_IODA2/PHB3
>>>
>>>
>>>No, it is IODA2. OPEL does PHB3-specific bits, the host kernel just uses 
>>>OPAL.
>>>
>>
>>Ok.
>>
>>>
s/windwo/window

>This patch makes this explicit.
>
>Signed-off-by: Wei Yang 

The idea sounds right, but there is one question as below.

>---
>arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
>1 file changed, 9 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 5738d31..9b41dba 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>*dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   /*
>* The actual IOV BAR range is determined by the start address
>* and the actual size for num_vfs VFs BAR.  This check is to
>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>*dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>   res2 = *res;
>   res->start += size * offset;
>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev 
>*pdev, u16 num_vfs)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   for (j = 0; j < vf_groups; j++) {
>   do {
>   win = 
> find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
>num_vfs)
>   pdn = pci_get_pdn(pdev);
>
>   if (phb->type == PNV_PHB_IODA2) {
>+  if (!pdn->vfs_expanded) {
>+  dev_info(&pdev->dev, "don't support this SRIOV device"
>+  " with non M64 VF BAR\n");
>+  return -EBUSY;
>+  }
>+

It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temporarily
unavailable. For this case, the VFs are permanently unavailable because of
running out of space to accommodate M64 and non-M64 VF BARs.

The error message could be printed with dev_warn() and it would be precise
as below or something else you prefer:

dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");
>>>
>>>
>>>Both messages are cryptic.
>>>
>>>If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in
>>>the worst case - BAR#15?), the difference is if it is segmented or not, no?
>>>
>>
>>The VF BAR could be one of IO, M32, M64. If it's not M64, the VFs are supposed
>>to be disabled and the (IO and M32) resources won't be allocated; but for sure,
>>the IO and M32 resources can't be put into any one of the 16 PHB's M64 BARs.
>>Would you recommend a better message then?
>
>
>
>dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit
>MMIO window\n");
>
>Or it is not "MMIO window"?
>

The reason is not "no space left in 64bit MMIO window".

The reason is that the IOV BAR is not 64-bit prefetchable, so in the Linux kernel
it can't be allocated from the M64 space, and thus we can't use an M64 BAR to cover it.

>
>
>>


>   /* Calculate available PE for required VFs */
>   mutex_lock(&phb->ioda.pe_alloc_mutex);
>   pdn->offset = bitmap_find_next_zero_area(
>@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>pci_dev *pdev)
>   if (!res->flags || res->parent)
>   continue;
>   if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-  dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>+  dev_warn(&pdev->dev, "Don't su

Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote:
>On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:
>>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>>a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
>>M64 windwo, which means M64 BAR can't work on it.
>>
>
>s/PHB_IODA2/PHB3
>s/windwo/window
>
>>This patch makes this explicit.
>>
>>Signed-off-by: Wei Yang 
>
>The idea sounds right, but there is one question as below.
>
>>---
>> arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
>> 1 file changed, 9 insertions(+), 16 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 5738d31..9b41dba 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
>>int offset)
>>  if (!res->flags || !res->parent)
>>  continue;
>>
>>- if (!pnv_pci_is_mem_pref_64(res->flags))
>>- continue;
>>-
>>  /*
>>   * The actual IOV BAR range is determined by the start address
>>   * and the actual size for num_vfs VFs BAR.  This check is to
>>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
>>int offset)
>>  if (!res->flags || !res->parent)
>>  continue;
>>
>>- if (!pnv_pci_is_mem_pref_64(res->flags))
>>- continue;
>>-
>>  size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>>  res2 = *res;
>>  res->start += size * offset;
>>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
>>u16 num_vfs)
>>  if (!res->flags || !res->parent)
>>  continue;
>>
>>- if (!pnv_pci_is_mem_pref_64(res->flags))
>>- continue;
>>-
>>  for (j = 0; j < vf_groups; j++) {
>>  do {
>>  win = 
>> find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
>>num_vfs)
>>  pdn = pci_get_pdn(pdev);
>>
>>  if (phb->type == PNV_PHB_IODA2) {
>>+ if (!pdn->vfs_expanded) {
>>+ dev_info(&pdev->dev, "don't support this SRIOV device"
>>+ " with non M64 VF BAR\n");
>>+ return -EBUSY;
>>+ }
>>+
>
>It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temporarily
>unavailable. For this case, the VFs are permanently unavailable because of
>running out of space to accommodate M64 and non-M64 VF BARs.
>
>The error message could be printed with dev_warn() and it would be precise
>as below or something else you prefer:
>
>   dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");
>

Thanks for the comment, will change accordingly.

>
>>  /* Calculate available PE for required VFs */
>>  mutex_lock(&phb->ioda.pe_alloc_mutex);
>>  pdn->offset = bitmap_find_next_zero_area(
>>@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>pci_dev *pdev)
>>  if (!res->flags || res->parent)
>>  continue;
>>  if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>- dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>>+ dev_warn(&pdev->dev, "Don't support SR-IOV with"
>>+ " non M64 VF BAR%d: %pR. \n",
>>   i, res);
>>- continue;
>>+ return;
>>  }
>>
>>  size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>pci_dev *pdev)
>>  res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>  if (!res->flags || res->parent)
>>  continue;
>>- if (!pnv_pci_is_mem_pref_64(res->flags)) {
>>- dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: 
>>%pR\n",
>>-  i, res);
>>- continue;
>>- }
>
>When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled.
>Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so,
>I think it can be avoided.
>

I don't get your point. Do you mean to avoid this function?

Or to clear the IOV BAR when we find one of them is non-M64?

>>
>>  dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>>  size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>-- 
>>1.7.9.5
>>

-- 
Richard Yang
Help you, Help me


Re: [PATCH v6 38/42] drivers/of: Unflatten subordinate nodes after specified level

2015-08-06 Thread Rob Herring
On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan  wrote:
> unflatten_dt_node() is called recursively to unflatten FDT nodes
> with the assumption that FDT blob has only one root node, which
> isn't true when the FDT blob represents a device sub-tree. This
> improves the function to support device sub-trees that have
> multiple nodes in the first level:
>
>* Rename original unflatten_dt_node() to __unflatten_dt_node().
>* The wrapper unflatten_dt_node() calls __unflatten_dt_node() with
>  the current node depth adjusted to 1 to avoid underflow.
>
> Signed-off-by: Gavin Shan 

Acked-by: Rob Herring 

> ---
>  drivers/of/fdt.c | 53 -
>  1 file changed, 40 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 0749656..a18a2ce 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -161,7 +161,7 @@ static void *unflatten_dt_alloc(void **mem, unsigned long 
> size,
>  }
>
>  /**
> - * unflatten_dt_node - Alloc and populate a device_node from the flat tree
> + * __unflatten_dt_node - Alloc and populate a device_node from the flat tree
>   * @blob: The parent device tree blob
>   * @mem: Memory chunk to use for allocating device nodes and properties
>   * @poffset: pointer to node in flat tree
> @@ -171,20 +171,20 @@ static void *unflatten_dt_alloc(void **mem, unsigned 
> long size,
>   * @dryrun: If true, do not allocate device nodes but still calculate needed
>   * memory size
>   */
> -static void * unflatten_dt_node(const void *blob,
> +static void *__unflatten_dt_node(const void *blob,
> void *mem,
> int *poffset,
> struct device_node *dad,
> struct device_node **nodepp,
> unsigned long fpsize,
> -   bool dryrun)
> +   bool dryrun,
> +   int *depth)
>  {
> const __be32 *p;
> struct device_node *np;
> struct property *pp, **prev_pp = NULL;
> const char *pathp;
> unsigned int l, allocl;
> -   static int depth = 0;
> int old_depth;
> int offset;
> int has_name = 0;
> @@ -337,13 +337,25 @@ static void * unflatten_dt_node(const void *blob,
> np->type = "";
> }
>
> -   old_depth = depth;
> -   *poffset = fdt_next_node(blob, *poffset, &depth);
> -   if (depth < 0)
> -   depth = 0;
> -   while (*poffset > 0 && depth > old_depth)
> -   mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
> -   fpsize, dryrun);
> +   /* Multiple nodes might be in the first depth level if
> +* the device tree is sub-tree. All nodes in current
> +* or deeper depth are unflattened after it returns.
> +*/
> +   old_depth = *depth;
> +   *poffset = fdt_next_node(blob, *poffset, depth);
> +   while (*poffset > 0) {
> +   if (*depth < old_depth)
> +   break;
> +
> +   if (*depth == old_depth)
> +   mem = __unflatten_dt_node(blob, mem, poffset,
> + dad, NULL, fpsize,
> + dryrun, depth);
> +   else if (*depth > old_depth)
> +   mem = __unflatten_dt_node(blob, mem, poffset,
> + np, NULL, fpsize,
> + dryrun, depth);
> +   }
>
> if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
> pr_err("unflatten: error %d processing FDT\n", *poffset);
> @@ -369,6 +381,20 @@ static void * unflatten_dt_node(const void *blob,
> return mem;
>  }
>
> +static void *unflatten_dt_node(const void *blob,
> +  void *mem,
> +  int *poffset,
> +  struct device_node *dad,
> +  struct device_node **nodepp,
> +  bool dryrun)
> +{
> +   int depth = 1;
> +
> +   return __unflatten_dt_node(blob, mem, poffset,
> +  dad, nodepp, 0,
> +  dryrun, &depth);
> +}
> +
>  /**
>   * __unflatten_device_tree - create tree of device_nodes from flat blob
>   *
> @@ -408,7 +434,8 @@ static void __unflatten_device_tree(const void *blob,
>
> /* First pass, scan for size */
> start = 0;
> -   size = (unsigned long)unflatten_dt_node(blob, NULL, &start, NULL, 
> NULL, 0, true);
> +   size = (unsigned long)unflatten_dt_node(blob, NULL, &start,
> +   NULL, NULL, true);
> size = ALIGN(size, 4);
>
> pr_debug("  size is %lx, allocating...\n", size);
>
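The recursion pattern in the patch above — starting at depth 1 so that several sibling nodes at the first level of a sub-tree are walked without the depth counter underflowing — can be sketched as follows. This is an illustrative model over a flat pre-order token list, not the kernel's FDT walker:

```c
#include <stddef.h>

/* A flattened pre-order node: its nesting depth and its name. */
struct token { int depth; const char *name; };

/* Walks tokens at 'depth' starting from *pos; counting nodes at that
 * level stands in for allocating device_nodes.  Recurses for children
 * (depth + 1) and returns when the depth climbs back above this level,
 * so multiple first-level siblings are handled naturally. */
static int walk_level(const struct token *tokens, size_t ntok,
                      size_t *pos, int depth)
{
    int count = 0;

    while (*pos < ntok) {
        int d = tokens[*pos].depth;

        if (d < depth)          /* climbed back above this level: done */
            break;
        if (d > depth)          /* malformed input; let the caller cope */
            break;
        ++*pos;                 /* consume this node... */
        walk_level(tokens, ntok, pos, depth + 1);  /* ...then its children */
        count++;
    }
    return count;
}
```

A caller corresponding to the new wrapper simply starts with `size_t pos = 0;` and `walk_level(tokens, ntok, &pos, 1)`.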

Re: [PATCH V2 5/6] powerpc/powernv: boundary the total vf bar size instead of the individual one

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 03:28:51PM +1000, Gavin Shan wrote:
>On Wed, Aug 05, 2015 at 09:25:02AM +0800, Wei Yang wrote:
>>Each VF can have at most 6 BARs. When the total BAR size exceeds the
>>gate, expanding it will also exhaust the M64 window.
>>
>>This patch limits the boundary by checking the total VF BAR size instead of
>>the individual BAR.
>>
>>Signed-off-by: Wei Yang 
>
>Ok. I didn't look at this when giving comments on the last patch. It turns
>out you have the change in this patch. Please merge it with the previous
>patch.
>

Hmm... I prefer to have them in two patches. One focuses on the calculation of
the gate and the other on checking the total VF BAR size. This would help
record the changes.

>>---
>> arch/powerpc/platforms/powernv/pci-ioda.c |   13 +++--
>> 1 file changed, 7 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 31dcedc..4042303 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2702,7 +2702,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>pci_dev *pdev)
>>  struct pnv_phb *phb;
>>  struct resource *res;
>>  int i;
>>- resource_size_t size, gate;
>>+ resource_size_t size, gate, total_vf_bar_sz;
>>  struct pci_dn *pdn;
>>  int mul, total_vfs;
>>
>>@@ -2729,6 +2729,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>pci_dev *pdev)
>>   * Window and limit the system flexibility.
>>   */
>>  gate = phb->ioda.m64_segsize >> 1;
>>+ total_vf_bar_sz = 0;
>>
>>  for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>  res = &pdev->resource[i + PCI_IOV_RESOURCES];
>>@@ -2741,13 +2742,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>>pci_dev *pdev)
>>  return;
>>  }
>>
>>- size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>>+ total_vf_bar_sz += pci_iov_resource_size(pdev,
>>+ i + PCI_IOV_RESOURCES);
>>
>>  /* bigger than or equal to gate */
>>- if (size >= gate) {
>>- dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
>>- "is bigger than %lld, roundup power2\n",
>>-  i, res, gate);
>>+ if (total_vf_bar_sz >= gate) {
>>+ dev_info(&pdev->dev, "PowerNV: VF BAR Total IOV size "
>>+ "is bigger than %lld, roundup power2\n", gate);
>>  mul = roundup_pow_of_two(total_vfs);
>>  pdn->m64_single_mode = true;
>>  break;
>>-- 
>>1.7.9.5
>>

-- 
Richard Yang
Help you, Help me
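For reference, the gate check discussed in this thread can be sketched as below — a standalone model of the assumed semantics (accumulate the VF BAR sizes; once the running total reaches half an M64 segment, switch to single-PE mode and expand by roundup_pow_of_two(total_vfs) instead of the default maximum PE number). Names and the 256 default are illustrative, not the kernel code:

```c
#include <stddef.h>
#include <stdint.h>

static unsigned roundup_pow_of_two_u(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Returns the expansion multiplier and sets *single_mode when the summed
 * VF BAR size is bigger than or equal to the gate. */
static unsigned choose_expansion(const uint64_t *bar_sizes, size_t nbars,
                                 unsigned total_vfs, uint64_t m64_segsize,
                                 int *single_mode)
{
    uint64_t gate = m64_segsize >> 1;   /* half an M64 segment */
    uint64_t total = 0;
    size_t i;

    *single_mode = 0;
    for (i = 0; i < nbars; i++) {
        total += bar_sizes[i];
        if (total >= gate) {            /* bigger than or equal to gate */
            *single_mode = 1;
            return roundup_pow_of_two_u(total_vfs);
        }
    }
    return 256;                         /* default: max PE number */
}
```

The point of the patch is visible here: the comparison is against the running total across all VF BARs, not each BAR individually.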


Re: [PATCH V6 2/6] mm: mlock: Add new mlock system call

2015-08-06 Thread Vlastimil Babka

On 07/29/2015 05:42 PM, Eric B Munson wrote:

With the refactored mlock code, introduce a new system call for mlock.
The new call will allow the user to specify what lock states are being
added.  mlock2 is trivial at the moment, but a follow on patch will add
a new mlock state making it useful.

Signed-off-by: Eric B Munson 
Cc: Michal Hocko 
Cc: Vlastimil Babka 


Acked-by: Vlastimil Babka 


Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 02:51:40PM +1000, Gavin Shan wrote:
>On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote:
>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>BAR. No matter whether the IOV BAR is truncated or not, the total size
>>could be calculated by (vfs_expanded * VF size).
>>
>
>s/VF size/VF BAR size
>
>I think the changelog would be more explicit:
>
>The alignment of IOV BAR on PowerNV platform is the total size of the
>IOV BAR, no matter whether the IOV BAR is extended with number of max
>VFs or number of max PE number (256). The alignment can be calculated

"number of max VFs" is not correct. This should be
roundup_pow_of_two(total_vfs).

Others looks good to me.

>by (vfs_expanded * VF_BAR_size).
>
>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>first case.
>>
>>Signed-off-by: Wei Yang 
>
>Reviewed-by: Gavin Shan 
>
>>---
>> arch/powerpc/platforms/powernv/pci-ioda.c |   14 +-
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 9b41dba..7192e62 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2987,12 +2987,16 @@ static resource_size_t 
>>pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>int resno)
>> {
>>  struct pci_dn *pdn = pci_get_pdn(pdev);
>>- resource_size_t align, iov_align;
>>-
>>- iov_align = resource_size(&pdev->resource[resno]);
>>- if (iov_align)
>>- return iov_align;
>>+ resource_size_t align;
>>
>>+ /*
>>+  * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>+  * SR-IOV. While from hardware perspective, the range mapped by M64
>>+  * BAR should be size aligned.
>>+  *
>>+  * This function return the total IOV BAR size if expanded or just the
>>+  * individual size if not.
>>+  */
>>  align = pci_iov_resource_size(pdev, resno);
>>  if (pdn->vfs_expanded)
>>  return pdn->vfs_expanded * align;
>>-- 
>>1.7.9.5
>>

-- 
Richard Yang
Help you, Help me


Re: [PATCH v6 41/42] drivers/of: Export OF changeset functions

2015-08-06 Thread Rob Herring
On Wed, Aug 5, 2015 at 11:11 PM, Gavin Shan  wrote:
> The PowerNV PCI hotplug driver is going to use the OF changeset
> to manage the changed device sub-tree, which requires those OF
> changeset functions are exported.
>
> Signed-off-by: Gavin Shan 
> ---
>  drivers/of/dynamic.c  | 65 
> ---
>  drivers/of/overlay.c  |  8 +++
>  drivers/of/unittest.c |  4 ++--
>  include/linux/of.h|  2 ++
>  4 files changed, 54 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 53826b8..af65b5b 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -646,6 +646,7 @@ void of_changeset_init(struct of_changeset *ocs)
> memset(ocs, 0, sizeof(*ocs));
> INIT_LIST_HEAD(&ocs->entries);
>  }
> +EXPORT_SYMBOL(of_changeset_init);

We probably want these to be the _GPL variant.

>
>  /**
>   * of_changeset_destroy - Destroy a changeset
> @@ -662,20 +663,9 @@ void of_changeset_destroy(struct of_changeset *ocs)
> list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
> __of_changeset_entry_destroy(ce);
>  }
> +EXPORT_SYMBOL(of_changeset_destroy);
>
> -/**
> - * of_changeset_apply - Applies a changeset
> - *
> - * @ocs:   changeset pointer
> - *
> - * Applies a changeset to the live tree.
> - * Any side-effects of live tree state changes are applied here on
> - * sucess, like creation/destruction of devices and side-effects
> - * like creation of sysfs properties and directories.
> - * Returns 0 on success, a negative error value in case of an error.
> - * On error the partially applied effects are reverted.
> - */
> -int of_changeset_apply(struct of_changeset *ocs)
> +int __of_changeset_apply(struct of_changeset *ocs)
>  {
> struct of_changeset_entry *ce;
> int ret;
> @@ -704,17 +694,30 @@ int of_changeset_apply(struct of_changeset *ocs)
>  }
>
>  /**
> - * of_changeset_revert - Reverts an applied changeset
> + * of_changeset_apply - Applies a changeset
>   *
>   * @ocs:   changeset pointer
>   *
> - * Reverts a changeset returning the state of the tree to what it
> - * was before the application.
> - * Any side-effects like creation/destruction of devices and
> - * removal of sysfs properties and directories are applied.
> + * Applies a changeset to the live tree.
> + * Any side-effects of live tree state changes are applied here on
> + * sucess, like creation/destruction of devices and side-effects

s/sucess/success/

> + * like creation of sysfs properties and directories.
>   * Returns 0 on success, a negative error value in case of an error.
> + * On error the partially applied effects are reverted.
>   */
> -int of_changeset_revert(struct of_changeset *ocs)
> +int of_changeset_apply(struct of_changeset *ocs)
> +{
> +   int ret;
> +
> +   mutex_lock(&of_mutex);
> +   ret = __of_changeset_apply(ocs);
> +   mutex_unlock(&of_mutex);
> +
> +   return ret;
> +}
> +EXPORT_SYMBOL(of_changeset_apply);
> +
> +int __of_changeset_revert(struct of_changeset *ocs)
>  {
> struct of_changeset_entry *ce;
> int ret;
> @@ -742,6 +745,29 @@ int of_changeset_revert(struct of_changeset *ocs)
>  }
>
>  /**
> + * of_changeset_revert - Reverts an applied changeset
> + *
> + * @ocs:   changeset pointer
> + *
> + * Reverts a changeset returning the state of the tree to what it
> + * was before the application.
> + * Any side-effects like creation/destruction of devices and
> + * removal of sysfs properties and directories are applied.
> + * Returns 0 on success, a negative error value in case of an error.
> + */
> +int of_changeset_revert(struct of_changeset *ocs)
> +{
> +   int ret;
> +
> +   mutex_lock(&of_mutex);
> +   ret = __of_changeset_revert(ocs);
> +   mutex_unlock(&of_mutex);
> +
> +   return ret;
> +}
> +EXPORT_SYMBOL(of_changeset_revert);
> +
> +/**
>   * of_changeset_action - Perform a changeset action
>   *
>   * @ocs:   changeset pointer
> @@ -779,3 +805,4 @@ int of_changeset_action(struct of_changeset *ocs, 
> unsigned long action,
> list_add_tail(&ce->node, &ocs->entries);
> return 0;
>  }
> +EXPORT_SYMBOL(of_changeset_action);
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 24e025f..804ea33 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -378,9 +378,9 @@ int of_overlay_create(struct device_node *tree)
> }
>
> /* apply the changeset */
> -   err = of_changeset_apply(&ov->cset);
> +   err = __of_changeset_apply(&ov->cset);
> if (err) {
> -   pr_err("%s: of_changeset_apply() failed for tree@%s\n",
> +   pr_err("%s: __of_changeset_apply() failed for tree@%s\n",
> __func__, tree->full_name);
> goto err_revert_overlay;
> }
> @@ -508,7 +508,7 @@ int of_overlay_destroy(int id)
>
>
> list_del(&ov->node);
> -   of_changese

Re: [PATCH V2 6/6] powerpc/powernv: allocate discrete PE# when using M64 BAR in Single PE mode

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 03:36:01PM +1000, Gavin Shan wrote:
>On Wed, Aug 05, 2015 at 09:25:03AM +0800, Wei Yang wrote:
>>When M64 BAR is set to Single PE mode, the PE# assigned to VF could be
>>discrete.
>>
>>This patch restructures the patch to allocate discrete PE# for VFs when M64
>>BAR is set to Single PE mode.
>>
>>Signed-off-by: Wei Yang 
>>---
>> arch/powerpc/include/asm/pci-bridge.h |2 +-
>> arch/powerpc/platforms/powernv/pci-ioda.c |   69 
>> +
>> 2 files changed, 51 insertions(+), 20 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index 8aeba4c..72415c7 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -213,7 +213,7 @@ struct pci_dn {
>> #ifdef CONFIG_PCI_IOV
>>  u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>>  u16 num_vfs;/* number of VFs enabled*/
>>- int offset; /* PE# for the first VF PE */
>>+ int *offset;/* PE# for the first VF PE or array */
>>  boolm64_single_mode;/* Use M64 BAR in Single Mode */
>> #define IODA_INVALID_M64(-1)
>>  int (*m64_map)[PCI_SRIOV_NUM_BARS];
>
>how about renaming "offset" to "pe_num_map", or "pe_map" ? Similar to the 
>comments
>I gave to the "m64_bar_map", num_of_max_vfs entries can be allocated. Though 
>not
>all of them will be used, not too much memory will be wasted.
>

Thanks for your comment.

I have thought about changing the name to make it more self-explanatory. But
another fact I want to take into account is that this field is also used to
reflect the shift offset when the M64 BAR is used in Shared Mode. So I kept the name.

How about using an "enum": keep the name "offset" in one case and rename the
other to "pe_num_map", so the meaningful name is used in the proper place?
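
The idea of letting each code path use a meaningful name for the same storage can be modeled with a union (which is presumably what is meant by "enum" here). This is a rough userspace sketch, not the actual kernel patch; the struct name, `single_mode` flag, and helper are assumptions for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: the same field serves as a PE# shift offset in
 * Shared Mode and as a per-VF PE# array in Single PE mode, so a union
 * lets each code path use the meaningful name. */
struct iov_pe_map {
	int single_mode;          /* nonzero when M64 BARs are in Single PE mode */
	union {
		int offset;       /* Shared Mode: PE# of the first VF */
		int *pe_num_map;  /* Single Mode: discrete PE# per VF */
	};
};

static int vf_pe_number(const struct iov_pe_map *m, int vf_index)
{
	/* Single Mode reads a discrete entry; Shared Mode adds a fixed shift */
	return m->single_mode ? m->pe_num_map[vf_index] : m->offset + vf_index;
}
```

Anonymous unions need C11 (or the GNU extension the kernel already relies on), so this would not cost extra memory compared with keeping only one of the two members.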

>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 4042303..9953829 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
>>u16 num_vfs)
>>
>>  /* Map the M64 here */
>>  if (pdn->m64_single_mode) {
>>- pe_num = pdn->offset + j;
>>+ pe_num = pdn->offset[j];
>>  rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>  pe_num, OPAL_M64_WINDOW_TYPE,
>>  pdn->m64_map[j][i], 0);
>>@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>  struct pnv_phb*phb;
>>  struct pci_dn *pdn;
>>  struct pci_sriov  *iov;
>>- u16 num_vfs;
>>+ u16num_vfs, i;
>>
>>  bus = pdev->bus;
>>  hose = pci_bus_to_host(bus);
>>@@ -1361,14 +1361,18 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
>>
>>  if (phb->type == PNV_PHB_IODA2) {
>>  if (!pdn->m64_single_mode)
>>- pnv_pci_vf_resource_shift(pdev, -pdn->offset);
>>+ pnv_pci_vf_resource_shift(pdev, -*pdn->offset);
>>
>>  /* Release M64 windows */
>>  pnv_pci_vf_release_m64(pdev, num_vfs);
>>
>>  /* Release PE numbers */
>>- bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
>>- pdn->offset = 0;
>>+ if (pdn->m64_single_mode) {
>>+ for (i = 0; i < num_vfs; i++)
>>+ pnv_ioda_free_pe(phb, pdn->offset[i]);
>>+ } else
>>+ bitmap_clear(phb->ioda.pe_alloc, *pdn->offset, num_vfs);
>>+ kfree(pdn->offset);
>
>Can pnv_ioda_free_pe() be reused to release PE ?

You mean using it to do a similar thing to pnv_ioda_deconfigure_pe()?

>
>>  }
>> }
>>
>>@@ -1394,7 +1398,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, 
>>u16 num_vfs)
>>
>>  /* Reserve PE for each VF */
>>  for (vf_index = 0; vf_index < num_vfs; vf_index++) {
>>- pe_num = pdn->offset + vf_index;
>>+ if (pdn->m64_single_mode)
>>+ pe_num = pdn->offset[vf_index];
>>+ else
>>+ pe_num = *pdn->offset + vf_index;
>>
>>  pe = &phb->ioda.pe_array[pe_num];
>>  pe->pe_number = pe_num;
>>@@ -1436,6 +1443,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
>>num_vfs)
>>  struct pnv_phb*phb;
>>  struct pci_dn *pdn;
>>  intret;
>>+ u16i;
>>
>>  bus = pdev->bus;
>>  hose = pci_bus_to_host(bus);
>>@@ -1462,19 +1470,38 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
>>num_vfs)
>>  }
>>
>>  /* Calculate available PE for required VFs */
>>-  

Re: [V3] powerpc/irq: Enable some more exceptions in /proc/interrupts interface

2015-08-06 Thread Anshuman Khandual
On 08/04/2015 03:27 PM, Michael Ellerman wrote:
> On Mon, 2015-13-07 at 08:16:06 UTC, Anshuman Khandual wrote:
>> This patch enables facility unavailable exceptions for generic facility,
>> FPU, ALTIVEC and VSX in /proc/interrupts listing by incrementing their
>> newly added IRQ statistical counters as and when these exceptions happen.
>> This also adds couple of helper functions which will be called from within
>> the interrupt handler context to update their statistics. Similarly this
>> patch also enables alignment and program check exceptions as well.
> 
> ...
> 
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
>> b/arch/powerpc/kernel/exceptions-64s.S
>> index 0a0399c2..a86180c 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -1158,6 +1158,7 @@ BEGIN_FTR_SECTION
>>  END_FTR_SECTION_IFSET(CPU_FTR_TM)
>>  #endif
>>  bl  load_up_fpu
>> +bl  fpu_unav_exceptions_count
> 
> Is it safe to call C code here?

Hmm, is it not? I had that question but was not really sure. I don't
completely understand the difference between 'fast_exception_return' and
'ret_from_except'. Will converting the following sequence
of code

bl  load_up_fpu
+   bl  fpu_unav_exceptions_count
b   fast_exception_return

into

bl  load_up_fpu
RECONCILE_IRQ_STATE(r10, r11)
addir3,r1,STACK_FRAME_OVERHEAD
+   bl  fpu_unav_exceptions_count
b   ret_from_except

help solve the problem?

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/prom: Use DRCONF flags while processing detected LMBs

2015-08-06 Thread Anshuman Khandual
This patch just replaces hard-coded values with existing
DRCONF flags while processing detected LMBs from the device
tree. This does not change any functionality.

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/kernel/prom.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 8b888b1..70a8cab 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -476,9 +476,10 @@ static int __init 
early_init_dt_scan_drconf_memory(unsigned long node)
flags = of_read_number(&dm[3], 1);
/* skip DRC index, pad, assoc. list index, flags */
dm += 4;
-   /* skip this block if the reserved bit is set in flags (0x80)
-  or if the block is not assigned to this partition (0x8) */
-   if ((flags & 0x80) || !(flags & 0x8))
+   /* skip this block if the reserved bit is set in flags
+  or if the block is not assigned to this partition */
+   if ((flags & DRCONF_MEM_RESERVED) ||
+   !(flags & DRCONF_MEM_ASSIGNED))
continue;
size = memblock_size;
rngs = 1;
-- 
2.1.0
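
The check the patch rewrites can be sketched as a small userspace model; the flag values below are the hard-coded constants the patch replaces (0x80 reserved, 0x8 assigned), not new definitions:

```c
#include <assert.h>

/* Values from the constants the patch replaces */
#define DRCONF_MEM_ASSIGNED	0x00000008
#define DRCONF_MEM_RESERVED	0x00000080

static int lmb_usable(unsigned int flags)
{
	/* skip blocks that are reserved or not assigned to this partition */
	if ((flags & DRCONF_MEM_RESERVED) || !(flags & DRCONF_MEM_ASSIGNED))
		return 0;
	return 1;
}
```

The named flags make the intent of the two-part condition obvious at the call site, which is the whole point of the patch.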


Re: [PATCH 09/10] Define PERF_PMU_TXN_READ interface

2015-08-06 Thread Peter Zijlstra
On Sun, Jul 26, 2015 at 10:40:37PM -0700, Sukadev Bhattiprolu wrote:
> @@ -3743,7 +3762,13 @@ static u64 perf_event_aggregate(struct perf_event 
> *event, u64 *enabled,
>   lockdep_assert_held(&event->child_mutex);
>  
>   list_for_each_entry(child, &event->child_list, child_list) {
> +#if 0
> + /*
> +  * TODO: Do we need this read() for group events on PMUs that
> +  *   don't implement PERF_PMU_TXN_READ transactions?
> +  */
>   (void)perf_event_read(child, false);
> +#endif
>   total += perf_event_count(child);
>   *enabled += child->total_time_enabled;
>   *running += child->total_time_running;

Aw gawd, I've been an idiot!!

I just realized this is a _CHILD_ loop, not a _SIBLING_ loop !!

We need to flip the loops in perf_read_group(), find attached two
patches that go on top of 1,2,4.

After this you can add the perf_event_read() return value (just fold
patches 6,8) after which you can do patch 10 (which has a broken
Subject fwiw).


Subject: perf: Add group reads to perf_event_read()
From: Peter Zijlstra 
Date: Thu, 23 Jul 2015 10:04:35 +0200

Enable perf_event_read() to update entire groups at once, this will be
useful for read transactions.

Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Michael Ellerman 
Cc: Sukadev Bhattiprolu 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
http://lkml.kernel.org/r/20150723080435.ge25...@twins.programming.kicks-ass.net
---
 kernel/events/core.c |   39 ---
 1 file changed, 32 insertions(+), 7 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3184,12 +3184,18 @@ void perf_event_exec(void)
rcu_read_unlock();
 }
 
+struct perf_read_data {
+   struct perf_event *event;
+   bool group;
+};
+
 /*
  * Cross CPU call to read the hardware event
  */
 static void __perf_event_read(void *info)
 {
-   struct perf_event *event = info;
+   struct perf_read_data *data = info;
+   struct perf_event *sub, *event = data->event;
struct perf_event_context *ctx = event->ctx;
struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
@@ -3208,9 +3214,21 @@ static void __perf_event_read(void *info
update_context_time(ctx);
update_cgrp_time_from_event(event);
}
+
update_event_times(event);
if (event->state == PERF_EVENT_STATE_ACTIVE)
event->pmu->read(event);
+
+   if (!data->group)
+   goto unlock;
+
+   list_for_each_entry(sub, &event->sibling_list, group_entry) {
+   update_event_times(sub);
+   if (sub->state == PERF_EVENT_STATE_ACTIVE)
+   sub->pmu->read(sub);
+   }
+
+unlock:
raw_spin_unlock(&ctx->lock);
 }
 
@@ -3222,15 +3240,19 @@ static inline u64 perf_event_count(struc
return __perf_event_count(event);
 }
 
-static void perf_event_read(struct perf_event *event)
+static void perf_event_read(struct perf_event *event, bool group)
 {
/*
 * If event is enabled and currently active on a CPU, update the
 * value in the event structure:
 */
if (event->state == PERF_EVENT_STATE_ACTIVE) {
+   struct perf_read_data data = {
+   .event = event,
+   .group = group,
+   };
smp_call_function_single(event->oncpu,
-__perf_event_read, event, 1);
+__perf_event_read, &data, 1);
} else if (event->state == PERF_EVENT_STATE_INACTIVE) {
struct perf_event_context *ctx = event->ctx;
unsigned long flags;
@@ -3245,7 +3267,10 @@ static void perf_event_read(struct perf_
update_context_time(ctx);
update_cgrp_time_from_event(event);
}
-   update_event_times(event);
+   if (group)
+   update_group_times(event);
+   else
+   update_event_times(event);
raw_spin_unlock_irqrestore(&ctx->lock, flags);
}
 }
@@ -3764,7 +3789,7 @@ u64 perf_event_read_value(struct perf_ev
 
mutex_lock(&event->child_mutex);
 
-   perf_event_read(event);
+   perf_event_read(event, false);
total += perf_event_count(event);
 
*enabled += event->total_time_enabled +
@@ -3773,7 +3798,7 @@ u64 perf_event_read_value(struct perf_ev
atomic64_read(&event->child_total_time_running);
 
list_for_each_entry(child, &event->child_list, child_list) {
-   perf_event_read(child);
+   perf_event_read(child, false);
total += perf_event_count(child);
*enabled += child->total_time_enabled;
*running += child->total_time_running;
@@ -3934,7 +3959,7 @@ static unsigned i

Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies

2015-08-06 Thread Julien Grall
On 06/08/15 12:06, Stefano Stabellini wrote:
> On Thu, 6 Aug 2015, Julien Grall wrote:
>> Hi,
>>
>>
>> On 04/08/15 19:12, Julien Grall wrote:
>>> diff --git a/include/xen/page.h b/include/xen/page.h
>>> index c5ed20b..e7e1425 100644
>>> --- a/include/xen/page.h
>>> +++ b/include/xen/page.h
>>> @@ -3,9 +3,9 @@
>>>  
>>>  #include 
>>>  
>>> -static inline unsigned long page_to_mfn(struct page *page)
>>> +static inline unsigned long page_to_gfn(struct page *page)
>>>  {
>>> -   return pfn_to_mfn(page_to_pfn(page));
>>> +   return pfn_to_gfn(page_to_pfn(page));
>>>  }
>>
>> I've just noticed that there is a function gfn_to_page used for KVM.
>>
>> Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion
>> with KVM one?
> 
> Yeah, prepending xen would help to avoid namespace pollution.

Will do. May I keep your Reviewed-by for this mechanical change?

Regards,

-- 
Julien Grall
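
The rename under discussion can be sketched as follows; struct page and the pfn/gfn helpers are stubbed here for a self-contained example and do not match the real kernel types:

```c
#include <assert.h>

/* Stubs standing in for the real kernel definitions */
struct page { unsigned long pfn; };

static unsigned long page_to_pfn(struct page *page) { return page->pfn; }
static unsigned long pfn_to_gfn(unsigned long pfn)  { return pfn; } /* identity stub */

/* The xen_ prefix avoids clashing with KVM's gfn_to_page() naming */
static inline unsigned long xen_page_to_gfn(struct page *page)
{
	return pfn_to_gfn(page_to_pfn(page));
}
```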

Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Gavin Shan
On Thu, Aug 06, 2015 at 05:47:42PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 04:57 PM, Gavin Shan wrote:
>>On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote:
>>>On 08/06/2015 02:35 PM, Gavin Shan wrote:
On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:
>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
>a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
>M64 windwo, which means M64 BAR can't work on it.
>
>
>The proper text would be something like this:
>
>===
>SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we cannot
>enable the device.
>===
>
>
>

s/PHB_IODA2/PHB3
>>>
>>>
>>>No, it is IODA2. OPEL does PHB3-specific bits, the host kernel just uses 
>>>OPAL.
>>>
>>
>>Ok.
>>
>>>
s/windwo/window

>This patch makes this explicit.
>
>Signed-off-by: Wei Yang 

The idea sounds right, but there is one question as below.

>---
>arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
>1 file changed, 9 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 5738d31..9b41dba 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>*dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   /*
>* The actual IOV BAR range is determined by the start address
>* and the actual size for num_vfs VFs BAR.  This check is to
>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev 
>*dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>   res2 = *res;
>   res->start += size * offset;
>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev 
>*pdev, u16 num_vfs)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   for (j = 0; j < vf_groups; j++) {
>   do {
>   win = 
> find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
>num_vfs)
>   pdn = pci_get_pdn(pdev);
>
>   if (phb->type == PNV_PHB_IODA2) {
>+  if (!pdn->vfs_expanded) {
>+  dev_info(&pdev->dev, "don't support this SRIOV device"
>+  " with non M64 VF BAR\n");
>+  return -EBUSY;
>+  }
>+

It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temporarily
unavailable. For this case, the VFs are permanently unavailable because of
running out of space to accommodate M64 and non-M64 VF BARs.

The error message could be printed with dev_warn() and it would be precise
as below or something else you prefer:

dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");
>>>
>>>
>>>Both messages are cryptic.
>>>
>>>If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in
>>>the worst case - BAR#15?), the difference is if it is segmented or not, no?
>>>
>>
>>The VF BAR could be one of IO, M32, M64. If it's not M64, the VFs are supposed
>>to be disabled and the (IO and M32) resources won't be allocated, but for sure,
>>the IO and M32 resources can't be put into any one of the 16 PHB's M64 BARs.
>>Would you recommend a better message then?
>
>
>
>dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit MMIO
>window\n");
>
>Or it is not "MMIO window"?
>

It's a confusing message: it's not related to space or the M64 window.
When any VF BAR is IO or M32, we just give up attempting to allocate
resources for it. I still think my original message is enough, or
something like the one below:

dev_warn(&pdev->dev, "Disabled SRIOV because of non-M64 BAR\n");

>>


>   /* Calculate available PE for required VFs */
>   mutex_lock(&phb->ioda.pe_alloc_mutex);
>   pdn->offset = bitmap_find_next_zero_area(
>@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>pci_dev *pdev)
>   if (!res->flags || res->parent)
>   continue;
>   if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-  dev_warn(&pdev->dev, " n

Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies

2015-08-06 Thread Stefano Stabellini
On Thu, 6 Aug 2015, Julien Grall wrote:
> Hi,
> 
> 
> On 04/08/15 19:12, Julien Grall wrote:
> > diff --git a/include/xen/page.h b/include/xen/page.h
> > index c5ed20b..e7e1425 100644
> > --- a/include/xen/page.h
> > +++ b/include/xen/page.h
> > @@ -3,9 +3,9 @@
> >  
> >  #include 
> >  
> > -static inline unsigned long page_to_mfn(struct page *page)
> > +static inline unsigned long page_to_gfn(struct page *page)
> >  {
> > -   return pfn_to_mfn(page_to_pfn(page));
> > +   return pfn_to_gfn(page_to_pfn(page));
> >  }
> 
> I've just noticed that there is a function gfn_to_page used for KVM.
> 
> Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion
> with KVM one?

Yeah, prepending xen would help to avoid namespace pollution.

Re: [Xen-devel] [PATCH v2 4/8] xen: Use the correctly the Xen memory terminologies

2015-08-06 Thread Julien Grall
Hi,


On 04/08/15 19:12, Julien Grall wrote:
> diff --git a/include/xen/page.h b/include/xen/page.h
> index c5ed20b..e7e1425 100644
> --- a/include/xen/page.h
> +++ b/include/xen/page.h
> @@ -3,9 +3,9 @@
>  
>  #include 
>  
> -static inline unsigned long page_to_mfn(struct page *page)
> +static inline unsigned long page_to_gfn(struct page *page)
>  {
> - return pfn_to_mfn(page_to_pfn(page));
> + return pfn_to_gfn(page_to_pfn(page));
>  }

I've just noticed that there is a function gfn_to_page used for KVM.

Maybe I should rename page_to_gfn to xen_page_to_gfn to avoid confusion
with KVM one?

Regards,

-- 
Julien Grall

Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource

2015-08-06 Thread Alexey Kardashevskiy

On 08/06/2015 07:41 PM, Wei Yang wrote:

On Thu, Aug 06, 2015 at 07:00:00PM +1000, Alexey Kardashevskiy wrote:

On 08/06/2015 02:51 PM, Gavin Shan wrote:

On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote:

The alignment of IOV BAR on PowerNV platform is the total size of the IOV
BAR. No matter whether the IOV BAR is truncated or not, the total size
could be calculated by (vfs_expanded * VF size).



s/VF size/VF BAR size

I think the changelog would be more explicit:

The alignment of IOV BAR on PowerNV platform is the total size of the
IOV BAR, no matter whether the IOV BAR is extended with number of max
VFs or the max PE number (256). The alignment can be calculated
by (vfs_expanded * VF_BAR_size).




Is that really a PowerNV-specific requirement or is it valid for
every platform (I suspect this is the case here)?



Currently, it is PowerNV-specific.



How is x86 different on this matter?
Why would we need this extra alignment, not just VF's BAR alignment?




Also, what is the exact meaning of "expanded" in @vfs_expanded? It is
either 255 (if individual VF BARs are <= 64MB) or
roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What
is expanded here?



PF's IOV BAR original size is (VF BAR size * total_vfs).

After expanding, the IOV BAR size  is (VF BAR size * 256) or (VF BAR size *
roundup_pow_of_two(total_vfs)).



Ufff, got it now. I'd store just an expanded IOV BAR size (not some magic 
VFs number) because this is what it actually is:

pdn->vfs_expanded * align
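
The sizing rule being agreed on here can be sketched as a userspace model (not the kernel code); the 64MB threshold and the expand-to-256 case follow the description above, and the helper names are assumptions:

```c
#include <assert.h>

/* Minimal stand-in for the kernel's roundup_pow_of_two() */
static unsigned int roundup_pow_of_two_u32(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* The IOV BAR is expanded either to the max PE number (256, when a single
 * VF BAR fits in 64MB) or to roundup_pow_of_two(total_vfs); the expanded
 * total size is also the required alignment. */
static unsigned long iov_bar_alignment(unsigned long vf_bar_size,
				       unsigned int total_vfs)
{
	unsigned int vfs_expanded;

	if (vf_bar_size <= (64UL << 20))
		vfs_expanded = 256;
	else
		vfs_expanded = roundup_pow_of_two_u32(total_vfs);

	return vfs_expanded * vf_bar_size;
}
```

This makes Alexey's point concrete: storing the expanded IOV BAR size directly would save recomputing `vfs_expanded * align` at every alignment query.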







This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
first case.

Signed-off-by: Wei Yang 


Reviewed-by: Gavin Shan 


---
arch/powerpc/platforms/powernv/pci-ioda.c |   14 +-
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9b41dba..7192e62 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2987,12 +2987,16 @@ static resource_size_t 
pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
  int resno)
{
struct pci_dn *pdn = pci_get_pdn(pdev);
-   resource_size_t align, iov_align;
-
-   iov_align = resource_size(&pdev->resource[resno]);
-   if (iov_align)
-   return iov_align;
+   resource_size_t align;

+   /*
+* On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
+* SR-IOV. While from hardware perspective, the range mapped by M64
+* BAR should be size aligned.
+*
+* This function return the total IOV BAR size if expanded or just the
+* individual size if not.
+*/
align = pci_iov_resource_size(pdev, resno);
if (pdn->vfs_expanded)
return pdn->vfs_expanded * align;
--
1.7.9.5






--
Alexey





--
Alexey

Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-08-06 Thread Laurent Vivier
Hi,

I'd also like to see this patch in mainline, as it fixes a bug that
appears when we switch from vCPU context to hypervisor context (guest
crash).

Laurent

On 06/08/2015 03:25, Sam Bobroff wrote:
> Ping?
> 
> I think I've addressed all the comments in this version. Is there anything 
> else
> I need to look at?
> 
> Cheers,
> Sam.
> 
> On Wed, May 27, 2015 at 09:56:57AM +1000, Sam Bobroff wrote:
>> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
>> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
>> accessed as such.
>>
>> This patch corrects places where it is accessed as a 32 bit field by a
>> 64 bit kernel.  In some cases this is via a 32 bit load or store
>> instruction which, depending on endianness, will cause either the
>> lower or upper 32 bits to be missed.  In another case it is cast as a
>> u32, causing the upper 32 bits to be cleared.
>>
>> This patch corrects those places by extending the access methods to
>> 64 bits.
>>
>> Signed-off-by: Sam Bobroff 
>> ---
>>
>> v3:
>> Adjust booke set/get xer to match book3s.
>>
>> v2:
>>
>> Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit.
>>
>>  arch/powerpc/include/asm/kvm_book3s.h |4 ++--
>>  arch/powerpc/include/asm/kvm_book3s_asm.h |2 +-
>>  arch/powerpc/include/asm/kvm_booke.h  |4 ++--
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |6 +++---
>>  arch/powerpc/kvm/book3s_segment.S |4 ++--
>>  5 files changed, 10 insertions(+), 10 deletions(-)
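
The bug class the patch fixes can be demonstrated in a few lines of userspace C (a model only, not the kernel code): treating the 64-bit XER as 32 bits, e.g. via a u32 cast, silently drops the upper half of the value.

```c
#include <assert.h>
#include <stdint.h>

/* The buggy narrow access: the u32 cast clears the upper 32 bits,
 * just like the lwz/stw instructions the patch widens to ld/std. */
static uint64_t xer_via_u32(uint64_t xer)
{
	return (uint32_t)xer;
}

/* The fixed full-width access */
static uint64_t xer_via_u64(uint64_t xer)
{
	return xer;
}
```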
>>
>> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
>> b/arch/powerpc/include/asm/kvm_book3s.h
>> index b91e74a..05a875a 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s.h
>> @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
>>  return vcpu->arch.cr;
>>  }
>>  
>> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
>> +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
>>  {
>>  vcpu->arch.xer = val;
>>  }
>>  
>> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
>> +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
>>  {
>>  return vcpu->arch.xer;
>>  }
>> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
>> b/arch/powerpc/include/asm/kvm_book3s_asm.h
>> index 5bdfb5d..c4ccd2d 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
>> @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu {
>>  bool in_use;
>>  ulong gpr[14];
>>  u32 cr;
>> -u32 xer;
>> +ulong xer;
>>  ulong ctr;
>>  ulong lr;
>>  ulong pc;
>> diff --git a/arch/powerpc/include/asm/kvm_booke.h 
>> b/arch/powerpc/include/asm/kvm_booke.h
>> index 3286f0d..bc6e29e 100644
>> --- a/arch/powerpc/include/asm/kvm_booke.h
>> +++ b/arch/powerpc/include/asm/kvm_booke.h
>> @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
>>  return vcpu->arch.cr;
>>  }
>>  
>> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
>> +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
>>  {
>>  vcpu->arch.xer = val;
>>  }
>>  
>> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
>> +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
>>  {
>>  return vcpu->arch.xer;
>>  }
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
>> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index 4d70df2..d75be59 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>>  blt hdec_soon
>>  
>>  ld  r6, VCPU_CTR(r4)
>> -lwz r7, VCPU_XER(r4)
>> +ld  r7, VCPU_XER(r4)
>>  
>>  mtctr   r6
>>  mtxer   r7
>> @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>>  mfctr   r3
>>  mfxer   r4
>>  std r3, VCPU_CTR(r9)
>> -stw r4, VCPU_XER(r9)
>> +std r4, VCPU_XER(r9)
>>  
>>  /* If this is a page table miss then see if it's theirs or ours */
>>  cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
>> @@ -1675,7 +1675,7 @@ kvmppc_hdsi:
>>  bl  kvmppc_msr_interrupt
>>  fast_interrupt_c_return:
>>  6:  ld  r7, VCPU_CTR(r9)
>> -lwz r8, VCPU_XER(r9)
>> +ld  r8, VCPU_XER(r9)
>>  mtctr   r7
>>  mtxer   r8
>>  mr  r4, r9
>> diff --git a/arch/powerpc/kvm/book3s_segment.S 
>> b/arch/powerpc/kvm/book3s_segment.S
>> index acee37c..ca8f174 100644
>> --- a/arch/powerpc/kvm/book3s_segment.S
>> +++ b/arch/powerpc/kvm/book3s_segment.S
>> @@ -123,7 +123,7 @@ no_dcbz32_on:
>>  PPC_LL  r8, SVCPU_CTR(r3)
>>  PPC_LL  r9, SVCPU_LR(r3)
>>  lwz r10, SVCPU_CR(r3)
>> -lwz r11, SVCPU_XER(r3)
>> +PPC_LL  r11, SVCPU_XER(r3)
>>  
>>  mtctr   r8
>>  mtlrr9
>> @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>  mfctr   r8
>>  mflrr9
>>  
>> -stw 

Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-08-06 Thread Gavin Shan
On Thu, Aug 06, 2015 at 05:36:02PM +0800, Wei Yang wrote:
>On Thu, Aug 06, 2015 at 03:20:25PM +1000, Gavin Shan wrote:
>>On Wed, Aug 05, 2015 at 09:25:00AM +0800, Wei Yang wrote:
>>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>>BAR in Single PE mode to cover the number of VFs required to be enabled.
>>>By doing so, several VFs would be in one VF Group and leads to interference
>>>between VFs in the same group.
>>>
>>>This patch changes the design by using one M64 BAR in Single PE mode for
>>>one VF BAR. This gives absolute isolation for VFs.
>>>
>>>Signed-off-by: Wei Yang 
>>>---
>>> arch/powerpc/include/asm/pci-bridge.h |5 +-
>>> arch/powerpc/platforms/powernv/pci-ioda.c |  180 
>>> -
>>> 2 files changed, 76 insertions(+), 109 deletions(-)
>>>
>>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>>b/arch/powerpc/include/asm/pci-bridge.h
>>>index 712add5..8aeba4c 100644
>>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>>@@ -214,10 +214,9 @@ struct pci_dn {
>>> u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>>> u16 num_vfs;/* number of VFs enabled*/
>>> int offset; /* PE# for the first VF PE */
>>>-#define M64_PER_IOV 4
>>>-int m64_per_iov;
>>>+boolm64_single_mode;/* Use M64 BAR in Single Mode */
>>> #define IODA_INVALID_M64(-1)
>>>-int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>>+int (*m64_map)[PCI_SRIOV_NUM_BARS];
>>
>>It can be explicit? For example:
>>
>>  int *m64_map;
>>
>>  /* Initialization */
>>  size_t size = sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * 
>> num_of_max_VFs;
>>  pdn->m64_map = kmalloc(size, GFP_KERNEL);
>>  for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>  for (j = 0; j < num_of_max_VFs; j++)
>>  pdn->m64_map[i * PCI_SRIOV_NUM_BARS + j] = 
>> PNV_INVALID_M64;
>>
>>  /* Destroy */
>>  int step = 1;
>>
>>  if (!pdn->m64_single_mode)
>>  step = phb->ioda.total_pe;
>>  for (i = 0; i < PCI_SRIOV_NUM_BARS * num_of_max_VFs; i += step)
>>  if (pdn->m64_map[i] == PNV_INVALID_M64)
>>  continue;
>>
>>  /* Unmap the window */
>>  
>
>The m64_map is a pointer to an array with 6 elements, which represent the 6
>M64 BAR indexes for the 6 VF BARs.
>
>When we use Shared Mode, one array is allocated. The six elements
>represents the six M64 BAR(at most) used to map the whole IOV BAR.
>
>When we use Single Mode, num_vfs arrays are allocated. Each array represents
>the map between one VF's BAR and M64 BAR index.
>
>During the map and un-map, M64 BARs are assigned one by one in VF BAR order.
>So I think the code is explicit.
>
>In your code, you allocate a big one dimension array to hold the M64 BAR
>index. It works, but I don't think this is more explicit than the original code.
>

When M64 is in Single Mode, array with (num_vfs * 6) entries is allocated
because every VF BAR (6 at most) will have one corresponding PHB M64 BAR.
Anything I missed?

The point of my code is that you needn't worry about the mode (single vs shared).
As I said, not too much memory is wasted. However, it's up to you.

I'm not fan of "int (*m64_map)[PCI_SRIOV_NUM_BARS]". Instead, you can replace
it with "int *m64_map" and calculate its size using following formula:

sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS;

sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * num_vfs;
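
The flat-array layout Gavin suggests can be modeled in userspace as follows. This is a sketch under stated assumptions: the indexing here is BAR-major (one slot per BAR per VF), and the constants are stand-ins, not the kernel's values:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in constants for illustration only */
#define SKETCH_NUM_BARS		6
#define SKETCH_MAX_VFS		16
#define SKETCH_INVALID_M64	(-1)

/* Allocate one flat map covering every (BAR, VF) pair, all invalid */
static int *m64_map_alloc(void)
{
	int *map = malloc(sizeof(*map) * SKETCH_NUM_BARS * SKETCH_MAX_VFS);
	int i;

	if (!map)
		return NULL;
	for (i = 0; i < SKETCH_NUM_BARS * SKETCH_MAX_VFS; i++)
		map[i] = SKETCH_INVALID_M64;
	return map;
}

/* One indexing helper keeps the flattening consistent everywhere */
static int *m64_map_slot(int *map, int bar, int vf)
{
	return &map[bar * SKETCH_MAX_VFS + vf];
}
```

Centralizing the index calculation in a helper is what makes the flat `int *m64_map` as readable as the two-dimensional `int (*m64_map)[PCI_SRIOV_NUM_BARS]` while sidestepping the pointer-to-array syntax both sides find awkward.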

>-- 
>Richard Yang
>Help you, Help me


Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-08-06 Thread Alexey Kardashevskiy

On 08/05/2015 11:25 AM, Wei Yang wrote:

In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
BAR in Single PE mode to cover the number of VFs required to be enabled.
By doing so, several VFs would be in one VF Group and leads to interference
between VFs in the same group.

This patch changes the design by using one M64 BAR in Single PE mode for
one VF BAR. This gives absolute isolation for VFs.

Signed-off-by: Wei Yang 
---
  arch/powerpc/include/asm/pci-bridge.h |5 +-
  arch/powerpc/platforms/powernv/pci-ioda.c |  180 -
  2 files changed, 76 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..8aeba4c 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -214,10 +214,9 @@ struct pci_dn {
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
int offset; /* PE# for the first VF PE */
-#define M64_PER_IOV 4
-   int m64_per_iov;
+   boolm64_single_mode;/* Use M64 BAR in Single Mode */
  #define IODA_INVALID_M64(-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
+   int (*m64_map)[PCI_SRIOV_NUM_BARS];
  #endif /* CONFIG_PCI_IOV */
  #endif
struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7192e62..f5d110c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
  }

  #ifdef CONFIG_PCI_IOV
-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
  {
struct pci_bus*bus;
struct pci_controller *hose;
struct pnv_phb*phb;
struct pci_dn *pdn;
inti, j;
+   intm64_bars;

bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);

+   if (pdn->m64_single_mode)
+   m64_bars = num_vfs;
+   else
+   m64_bars = 1;
+
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   for (j = 0; j < M64_PER_IOV; j++) {
-   if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+   for (j = 0; j < m64_bars; j++) {
+   if (pdn->m64_map[j][i] == IODA_INVALID_M64)
continue;
opal_pci_phb_mmio_enable(phb->opal_id,
-   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
-   clear_bit(pdn->m64_wins[i][j], 
&phb->ioda.m64_bar_alloc);
-   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
+   clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
+   pdn->m64_map[j][i] = IODA_INVALID_M64;
}

+   kfree(pdn->m64_map);
return 0;
  }

@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
inttotal_vfs;
resource_size_tsize, start;
intpe_num;
-   intvf_groups;
-   intvf_per_group;
+   intm64_bars;

bus = pdev->bus;
hose = pci_bus_to_host(bus);
@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
pdn = pci_get_pdn(pdev);
total_vfs = pci_sriov_get_totalvfs(pdev);

-   /* Initialize the m64_wins to IODA_INVALID_M64 */
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   for (j = 0; j < M64_PER_IOV; j++)
-   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   if (pdn->m64_single_mode)



This is a physical function's @pdn, right?



+   m64_bars = num_vfs;
+   else
+   m64_bars = 1;
+
+   pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);



Assume we have an SRIOV device with 16 VFs.
So it was m64_wins[6][4]; now it is (roughly speaking) m64_map[16][6] (for 
Single PE mode) or m64_map[1][6]. I believe m64_bars cannot be bigger 
than 16 on PHB3, right? Is this checked anywhere (does it have to be)?


This m64_wins -> m64_map change - it was not a map before (then what was it?), 
but now it is, is it not?


What does it store? An index of M64 BAR (0..15)?




+   if (!pdn->m64_map)
+   return -ENOMEM;
+   /* Initialize the m64_map to IODA_INVALID_M64 */
+   for (i = 0; i < m64_bars ; i++)
+   for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
pdn->m64_map[i][j] = IODA_INVALID_M64;

Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 07:00:00PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2015 02:51 PM, Gavin Shan wrote:
>>On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote:
>>>The alignment of IOV BAR on PowerNV platform is the total size of the IOV
>>>BAR. No matter whether the IOV BAR is truncated or not, the total size
>>>could be calculated by (vfs_expanded * VF size).
>>>
>>
>>s/VF size/VF BAR size
>>
>>I think the changelog would be more explicit:
>>
>>The alignment of IOV BAR on PowerNV platform is the total size of the
>>IOV BAR, no matter whether the IOV BAR is extended with number of max
>>VFs or number of max PE number (256). The alignment can be calculated
>>by (vfs_expanded * VF_BAR_size).
>
>
>
>Is that really a PowerNV-specific requirement, or is it valid for
>every platform (I suspect this is the case here)?
>

Currently, it is PowerNV-specific.

>
>Also, what is the exact meaning of "expanded" in @vfs_expanded? It is
>either 255 (if individual VF BARs are <= 64MB) or
>roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What
>is expanded here?
>

PF's IOV BAR original size is (VF BAR size * total_vfs). 

After expanding, the IOV BAR size is (VF BAR size * 256) or (VF BAR size *
roundup_pow_of_two(total_vfs)).

>
>>
>>>This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
>>>first case.
>>>
>>>Signed-off-by: Wei Yang 
>>
>>Reviewed-by: Gavin Shan 
>>
>>>---
>>>arch/powerpc/platforms/powernv/pci-ioda.c |   14 +-
>>>1 file changed, 9 insertions(+), 5 deletions(-)
>>>
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 9b41dba..7192e62 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -2987,12 +2987,16 @@ static resource_size_t 
>>>pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>>>   int resno)
>>>{
>>> struct pci_dn *pdn = pci_get_pdn(pdev);
>>>-resource_size_t align, iov_align;
>>>-
>>>-iov_align = resource_size(&pdev->resource[resno]);
>>>-if (iov_align)
>>>-return iov_align;
>>>+resource_size_t align;
>>>
>>>+/*
>>>+ * On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
>>>+ * SR-IOV. While from hardware perspective, the range mapped by M64
>>>+ * BAR should be size aligned.
>>>+ *
>>>+ * This function returns the total IOV BAR size if expanded or just the
>>>+ * individual size if not.
>>>+ */
>>> align = pci_iov_resource_size(pdev, resno);
>>> if (pdn->vfs_expanded)
>>> return pdn->vfs_expanded * align;
>>>--
>>>1.7.9.5
>>>
>>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me


Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-08-06 Thread Wei Yang
On Thu, Aug 06, 2015 at 03:20:25PM +1000, Gavin Shan wrote:
>On Wed, Aug 05, 2015 at 09:25:00AM +0800, Wei Yang wrote:
>>In the current implementation, when a VF BAR is bigger than 64MB, it uses 4 M64
>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>By doing so, several VFs would be in one VF Group, which leads to interference
>>between VFs in the same group.
>>
>>This patch changes the design by using one M64 BAR in Single PE mode for
>>one VF BAR. This gives absolute isolation for VFs.
>>
>>Signed-off-by: Wei Yang 
>>---
>> arch/powerpc/include/asm/pci-bridge.h |5 +-
>> arch/powerpc/platforms/powernv/pci-ioda.c |  180 
>> -
>> 2 files changed, 76 insertions(+), 109 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index 712add5..8aeba4c 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -214,10 +214,9 @@ struct pci_dn {
>>  u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>>  u16 num_vfs;/* number of VFs enabled*/
>>  int offset; /* PE# for the first VF PE */
>>-#define M64_PER_IOV 4
>>- int m64_per_iov;
>>+ boolm64_single_mode;/* Use M64 BAR in Single Mode */
>> #define IODA_INVALID_M64(-1)
>>- int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>+ int (*m64_map)[PCI_SRIOV_NUM_BARS];
>
>Can it be more explicit? For example:
>
>   int *m64_map;
>
>   /* Initialization */
>   size_t size = sizeof(*pdn->m64_map) * PCI_SRIOV_NUM_BARS * num_of_max_VFs;
>   pdn->m64_map = kmalloc(size, GFP_KERNEL);
>   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>   for (j = 0; j < num_of_max_VFs; j++)
>   pdn->m64_map[i * PCI_SRIOV_NUM_BARS + j] = PNV_INVALID_M64;
>
>   /* Destroy */
>   int step = 1;
>
>   if (!pdn->m64_single_mode)
>   step = phb->ioda.total_pe;
>   for (i = 0; i < PCI_SRIOV_NUM_BARS * num_of_max_VFs; i += step)
>   if (pdn->m64_map[i] == PNV_INVALID_M64)
>   continue;
>
>   /* Unmap the window */
>   

The m64_map is a pointer to an array with 6 elements, which represent the 6
M64 BAR indexes for the 6 VF BARs.

When we use Shared Mode, one array is allocated. Its six elements represent
the (at most) six M64 BARs used to map the whole IOV BAR.

When we use Single Mode, num_vfs arrays are allocated. Each array represents
the map between one VF's BARs and the M64 BAR indexes.

During map and unmap, M64 BARs are assigned one by one in VF BAR order.
So I think the code is explicit.

In your code, you allocate one big one-dimensional array to hold the M64 BAR
indexes. It works, but I don't think it is more explicit than the original code.

-- 
Richard Yang
Help you, Help me


Re: [PATCH V2 2/6] powerpc/powernv: simplify the calculation of iov resource

2015-08-06 Thread Alexey Kardashevskiy

On 08/06/2015 02:51 PM, Gavin Shan wrote:

On Wed, Aug 05, 2015 at 09:24:59AM +0800, Wei Yang wrote:

The alignment of IOV BAR on PowerNV platform is the total size of the IOV
BAR. No matter whether the IOV BAR is truncated or not, the total size
could be calculated by (vfs_expanded * VF size).



s/VF size/VF BAR size

I think the changelog would be more explicit:

The alignment of IOV BAR on PowerNV platform is the total size of the
IOV BAR, no matter whether the IOV BAR is extended with number of max
VFs or number of max PE number (256). The alignment can be calculated
by (vfs_expanded * VF_BAR_size).




Is that really a PowerNV-specific requirement, or is it valid for every 
platform (I suspect this is the case here)?



Also, what is the exact meaning of "expanded" in @vfs_expanded? It is 
either 255 (if individual VF BARs are <= 64MB) or 
roundup_pow_of_two(total_vfs) (which is something like 4 or 16). What is 
expanded here?






This patch simplifies the pnv_pci_iov_resource_alignment() by removing the
first case.

Signed-off-by: Wei Yang 


Reviewed-by: Gavin Shan 


---
arch/powerpc/platforms/powernv/pci-ioda.c |   14 +-
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9b41dba..7192e62 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2987,12 +2987,16 @@ static resource_size_t 
pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
  int resno)
{
struct pci_dn *pdn = pci_get_pdn(pdev);
-   resource_size_t align, iov_align;
-
-   iov_align = resource_size(&pdev->resource[resno]);
-   if (iov_align)
-   return iov_align;
+   resource_size_t align;

+   /*
+* On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
+* SR-IOV. While from hardware perspective, the range mapped by M64
+* BAR should be size aligned.
+*
+* This function returns the total IOV BAR size if expanded or just the
+* individual size if not.
+*/
align = pci_iov_resource_size(pdev, resno);
if (pdn->vfs_expanded)
return pdn->vfs_expanded * align;
--
1.7.9.5






--
Alexey

Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-06 Thread Alexey Kardashevskiy

On 08/06/2015 04:57 PM, Gavin Shan wrote:

On Thu, Aug 06, 2015 at 04:10:21PM +1000, Alexey Kardashevskiy wrote:

On 08/06/2015 02:35 PM, Gavin Shan wrote:

On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:

On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
M64 windwo, which means M64 BAR can't work on it.



The proper text would be something like this:

===
SRIOV only supports 64bit MMIO. So if we fail to assign 64bit BAR, we 
cannot enable the device.

===






s/PHB_IODA2/PHB3



No, it is IODA2. OPAL does the PHB3-specific bits; the host kernel just uses OPAL.



Ok.




s/windwo/window


This patch makes this explicit.

Signed-off-by: Wei Yang 


The idea sounds right, but there is one question as below.


---
arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5738d31..9b41dba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
int offset)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
/*
 * The actual IOV BAR range is determined by the start address
 * and the actual size for num_vfs VFs BAR.  This check is to
@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
int offset)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
res2 = *res;
res->start += size * offset;
@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
if (!res->flags || !res->parent)
continue;

-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
for (j = 0; j < vf_groups; j++) {
do {
win = 
find_next_zero_bit(&phb->ioda.m64_bar_alloc,
@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
num_vfs)
pdn = pci_get_pdn(pdev);

if (phb->type == PNV_PHB_IODA2) {
+   if (!pdn->vfs_expanded) {
+   dev_info(&pdev->dev, "don't support this SRIOV device"
+   " with non M64 VF BAR\n");
+   return -EBUSY;
+   }
+


It would be -ENOSPC since -EBUSY indicates the devices (VFs) are temporarily
unavailable. For this case, the VFs are permanently unavailable because we run
out of space to accommodate M64 and non-M64 VF BARs.

The error message could be printed with dev_warn() and it would be precise
as below or something else you prefer:

dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");



Both messages are cryptic.

If it is not M64 BAR, then what is it? It is always in one of M64 BARs (in
the worst case - BAR#15?), the difference is if it is segmented or not, no?



The VF BAR could be one of IO, M32, M64. If it's not M64, the VFs are supposed
to be disabled and the (IO and M32) resources won't be allocated; for sure,
the IO and M32 resources can't be put into any one of the 16 PHB M64 BARs.
Would you recommend a better message then?




dev_warn(&pdev->dev, "SRIOV is disabled as no space is left in 64bit MMIO 
window\n");


Or it is not "MMIO window"?









/* Calculate available PE for required VFs */
mutex_lock(&phb->ioda.pe_alloc_mutex);
pdn->offset = bitmap_find_next_zero_area(
@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
if (!res->flags || res->parent)
continue;
if (!pnv_pci_is_mem_pref_64(res->flags)) {
-   dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+   dev_warn(&pdev->dev, "Don't support SR-IOV with"
+   " non M64 VF BAR%d: %pR. \n",
 i, res);
-   continue;
+   return;
}

size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
res = &pdev->resource[i + PCI_IOV_RESOURCES];
if (!res->flags || res->parent)
continue;
-   if (!pnv_pci_is_mem_pref_64(res->flags)) {
-  

Re: [PATCH v3 3/3] mm: use numa_mem_id() in alloc_pages_node()

2015-08-06 Thread Vlastimil Babka

On 07/30/2015 07:41 PM, Johannes Weiner wrote:

On Thu, Jul 30, 2015 at 06:34:31PM +0200, Vlastimil Babka wrote:

numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
so it's a more robust fallback than the currently used numa_node_id().


Won't it fall through to the next closest memory node in the zonelist
anyway?


Right, I would expect the zonelist of a memoryless node to be the same as 
that of the closest node. Documentation/vm/numa seems to agree.


Is this for callers doing NUMA_NO_NODE with __GFP_THISNODE?

I guess that's the only scenario where that matters, yeah. And there 
might well be no such caller now, but maybe one will sneak in without 
the author testing on a system with a memoryless node.


Note that with !CONFIG_HAVE_MEMORYLESS_NODES, numa_mem_id() just does 
numa_node_id().


So yeah I think "a more robust fallback" is correct :) But let's put it 
explicitly in the changelog then:


8<

alloc_pages_node() might fail when called with NUMA_NO_NODE and 
__GFP_THISNODE on a CPU belonging to a memoryless node. To make the 
local-node fallback more robust and prevent such situations, use 
numa_mem_id(), which was introduced for similar scenarios in the slab 
context.
