Re: GPMC in device tree

2015-08-09 Thread Ran Shalit
On Thu, Aug 6, 2015 at 6:07 AM, Scott Wood scottw...@freescale.com wrote:
 On Wed, 2015-08-05 at 17:27 +0300, Ran Shalit wrote:
 On Wed, Aug 5, 2015 at 9:11 AM, Ran Shalit ransha...@gmail.com wrote:
  On Wed, Aug 5, 2015 at 6:56 AM, Ran Shalit ransha...@gmail.com wrote:
   On Wed, Aug 5, 2015 at 12:25 AM, Scott Wood scottw...@freescale.com
   wrote:
On Wed, 2015-08-05 at 00:22 +0300, Ran Shalit wrote:
 One more thing, if I may.
 The localbus is also connected to nvram and cpld.
 I've noticed that read/write works well, even though I didn't define
 anything in device tree.
 Is there any reason to add these devices into device tree, or can we
 use the cpld and nvram without the definition in device tree?
   
I don't know what you're doing in your kernel to access devices that aren't
in the device tree.  You should add the devices to the device tree, and have
the kernel use it rather than hardcoded info.
   
-Scott
   
   Hi,
  
   Yes I understand.
   But it is worth noting that I have no localbus entry in the device tree.
   Yet the nvram and cpld, which are both connected to the localbus, seem to
   work without any issues.
  
   Thanks,
   Ran
 
  I apologize for the bad English; I meant it is worth noting that there
  is no localbus entry at all in the device tree.
  So I wonder how the nvram and cpld worked...

 I don't know how it worked -- presumably there's something in your kernel
 that hardcodes knowledge of those devices.

  If I may ask, what should the compatible be for generic
  devices such as nvram/cpld?

 CPLD is not a generic device.  The compatible should describe the logic that
 has been programmed into the CPLD.

  I assume that if they worked without any entry, it means that there is
  no need for specific driver.
 
  Regards,
  Ran

 Hi,

 After studying how the localbus should be configured in the
 device tree for powerpc, I think I have come up with the following
 configuration (not yet tested on board):


  localbus@e0005000{
 #address-cells = 2;
 #size-cells = 1;
 compatible = fsl,mpc8349-localbus, simple-bus;
  reg = 0xe0005000 0x1000;
  interrupts = 77 0x8;
  interrupt-parent = ipic;


 /* NOR and NAND Flashes */
ranges = 0x0 0x0 0xff80 0x0080 /* 8MB NOR Flash */
  0x1 0x0 0xF800 0x0800  /* User flash (same
 nor, in burst mode) 128M */
  0x2 0x0 0xf7e0 0x0020;/*NVRAM/CPLD C2 is
 selected in CPLD , */
 /*nvram 0xf7e0 1MB */
 /*cpld  0xf7f0 1M  (- different address!)*/
 nor@0,0 {
 #address-cells = 1;
 #size-cells = 1;
 compatible = cfi-flash;
 reg = 0x0 0x0 0x100;
 #bank-width = 1;
 device-width = 4;

 };
 };
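
As a rough illustration of what a localbus ranges property is meant to express, the Python sketch below maps a child address (chip-select, offset) onto a CPU physical address and checks whether a child reg fits inside a single window. The hex constants in the listing above appear to have been truncated by the archive, so the values used here are illustrative assumptions, and translate() and check_reg() are hypothetical helper names rather than kernel functions.

```python
# Sketch: translate a localbus child address (chip-select, offset) into a
# CPU physical address using the parent's "ranges" entries.
# All numeric values are illustrative assumptions, not the board's real map.

# Each entry: (chip_select, child_offset, parent_base, size)
RANGES = [
    (0, 0x0, 0xFF800000, 0x00800000),  # hypothetical 8 MB window on CS0
    (2, 0x0, 0xF7E00000, 0x00200000),  # hypothetical 2 MB window on CS2
]


def translate(chip_select, offset, ranges=RANGES):
    """Return the CPU physical address, or None if no window covers it."""
    for cs, child_off, parent_base, size in ranges:
        if cs == chip_select and child_off <= offset < child_off + size:
            return parent_base + (offset - child_off)
    return None


def check_reg(chip_select, offset, length, ranges=RANGES):
    """True if a child reg of (cs, offset, length) fits inside one window."""
    if length <= 0:
        return False
    first = translate(chip_select, offset, ranges)
    last = translate(chip_select, offset + length - 1, ranges)
    return first is not None and last is not None and last - first == length - 1
```

With these assumed windows, a reg that asks for more bytes than the chip-select window provides is rejected, which is one of the mismatches worth ruling out when a flash node fails to probe.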


Hi ,

I rebooted the board with the new device tree localbus node, but I don't
have any new /dev/mtdX entry for the NOR flash.
There is no HW issue, because we can R/W access the NOR flash from u-boot.
Is there any hint as to what the issue could be? I've checked the kernel
config and validated that mtd is supported.
The NOR flash is a Spansion S29GL512P.

localbus@e0005000 {
#address-cells = 2;
#size-cells = 1;
   compatible = fsl,mpc8349-localbus, simple-bus;
reg = 0xe0005000 0x1000;
interrupts = 77 0x8;
interrupt-parent = ipic;


# NOR and NAND Flashes
  ranges = 0x0 0x0 0xff80 0x0080
0x1 0x0 0xF800 0x0800
0x2 0x0 0xf7e0 0x0020;
nor@0,0 {
#address-cells = 1;
#size-cells = 1;
compatible = cfi-flash;
reg = 0x0 0x0 0x0080;
#bank-width = 1;
device-width = 1;

};
};

Best Regards,
Ran
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: 4.1-rc6: ATA link is slow to respond, please be patient

2015-08-09 Thread Christian Kujau
On Sat, 8 Aug 2015, Christian Kujau wrote:

 [Adding linux-...@vger.kernel.org]
 
 On Fri, 7 Aug 2015, Christian Kujau wrote:
  this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
  to latest mainline. However, during bootup the following happens:
  
  ===
  [2.237102] ata1: PATA max UDMA/100 irq 39
  [2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
  [2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
  [2.417633] ata1.00: configured for UDMA/100
  [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
  [   44.920452] ata1.00: failed command: READ DMA
  [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 
  69632 in
  [   44.927257] ata1.00: status: { DRDY }
  [   49.971784] ata1.00: qc timeout (cmd 0xec)
  [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  [   49.978908] ata1.00: revalidation failed (errno=-5)
  [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
  [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
  [   60.012670] ata1: soft resetting link
  [   60.193638] ata1.00: configured for UDMA/100
  [   60.196158] ata1.00: device reported invalid CHS sector 0
  [   60.198610] ata1: EH complete
  ===
  
  This happens only once, but systemd thinks there's a hard problem and will 
  drop to a recovery shell. I can start sshd and login remotely and then the 
  system appears to be running just fine.

I played around with libata* kernel parameters, the only success I had 
was with libata.dma=0 - which disables DMA and the system booted without 
the error. But of course the disk throughput was much slower - is there a 
way to enable DMA again once the system is booted? hdparm -d would 
return HDIO_SET_DMA, of course[0].

Tried something more drastic and disabled libata completely and enabled 
CONFIG_IDE (and CONFIG_BLK_DEV_IDE_PMAC) again and a similar error appears 
(sometimes) during bootup:

[   39.971392] ide-pmac lost interrupt, dma status: 8480
[   39.972704] hda: lost interrupt
[   39.973951] hda: dma_intr: status=0xd8 { Busy }
[   39.975231] hda: possibly failed opcode: 0x25
[   39.978855] hda: DMA disabled
[   40.019388] ide0: reset: success

But the host seems to recover more quickly and systemd wasn't thrown off 
by the small ATA delay. But DMA got disabled again :-\

Ideas welcome! :-)

Christian.

[0] https://ata.wiki.kernel.org/index.php/Libata_FAQ

  This happened in 4.2.0-rc5 so I went back a few versions and found that
  4.1-rc5 was OK (the error does not show up and the system boots just fine)
  and 4.1-rc6 is not.
  
 
 After more digging around I noticed that the same error (with 
 changed wording) happens with a Debian 3.16.0-4-powerpc kernel - so it
 doesn't appear to be a recent regression as I suspected at first:
 
 ==
 [   46.907147] ata1: drained 572 bytes to clear DRQ
 [   46.907166] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
 [   46.908419] ata1.00: failed command: READ DMA
 [   46.909058] ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 
 65536 in
  res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x20 (host bus error)
 [   46.910303] ata1.00: status: { DRDY }
 [   46.970579] ata1.00: configured for UDMA/100
 [   46.971853] ata1.00: device reported invalid CHS sector 0
 [   46.972524] ata1: EH complete
 ==
 
 Also, the error cannot be reproduced as reliably as I thought: sometimes, the
 machine just boots w/o a hitch - and that might be the reason why my
 bisect attempts failed and incorrectly blamed totally unrelated commits: 
 after each git bisect {good,bad} (+compiling) I rebooted but there was a 
 chance that the system came up just fine / showed the same ATA error and 
 thus falsified the git-bisect results.
 
 I noticed that with this Debian 3.16 kernel, it happens less often when I 
 use the irqpoll option. But with 4.2-rc5 this doesn't seem to help much, 
 the system still hangs during boot but continues after the "EH complete"
 message. And it doesn't appear afterwards, I can read from my root disk 
 just fine and a long SMART check also comes back fine.
 
 Because the error only appears to happen on the very first access after a 
 reboot, I tried to boot with rootdelay=30 - but of course then it just 
 waits before accessing the root disk. I'd need a magic option to wait a 
 few seconds after the first disk access, so that the boot framework 
 (systemd) won't be thrown off when /dev/sda isn't responding as fast as 
 expected.
 
 What _did_ seem to help a bit was to disable the swap device, which
 is configured as an encrypted dm-device here - and systemd was almost 
 always stumbling over this particular service during bootup. Because of 
 the ATA timeout, the dm-device could not be setup correctly and systemd 
 would bail out and drop me into a 

Re: GPMC in device tree

2015-08-09 Thread Ran Shalit
On Sun, Aug 9, 2015 at 9:27 AM, Ran Shalit ransha...@gmail.com wrote:
 On Thu, Aug 6, 2015 at 6:07 AM, Scott Wood scottw...@freescale.com wrote:
 On Wed, 2015-08-05 at 17:27 +0300, Ran Shalit wrote:
 On Wed, Aug 5, 2015 at 9:11 AM, Ran Shalit ransha...@gmail.com wrote:
  On Wed, Aug 5, 2015 at 6:56 AM, Ran Shalit ransha...@gmail.com wrote:
   On Wed, Aug 5, 2015 at 12:25 AM, Scott Wood scottw...@freescale.com
   wrote:
On Wed, 2015-08-05 at 00:22 +0300, Ran Shalit wrote:
 One more thing, if I may.
 The localbus is also connected to nvram and cpld.
 I've noticed that read/write works well, even though I didn't define
 anything in device tree.
 Is there any reason to add these devices into device tree, or can we
 use the cpld and nvram without the definition in device tree?
   
I don't know what you're doing in your kernel to access devices that aren't
in the device tree.  You should add the devices to the device tree, and have
the kernel use it rather than hardcoded info.
   
-Scott
   
   Hi,
  
   Yes I understand.
   But it is worth noting that I have no localbus entry in the device tree.
   Yet the nvram and cpld, which are both connected to the localbus, seem to
   work without any issues.
  
   Thanks,
   Ran
 
  I apologize for the bad English; I meant it is worth noting that there
  is no localbus entry at all in the device tree.
  So I wonder how the nvram and cpld worked...

 I don't know how it worked -- presumably there's something in your kernel
 that hardcodes knowledge of those devices.

  If I may please ask, what should be the compatible for generic
  devices such as  nvram/cpld ?

 CPLD is not a generic device.  The compatible should describe the logic that
 has been programmed into the CPLD.

  I assume that if they worked without any entry, it means that there is
  no need for specific driver.
 
  Regards,
  Ran

 Hi,

 After studying how the localbus should be configured in the
 device tree for powerpc, I think I have come up with the following
 configuration (not yet tested on board):


  localbus@e0005000{
 #address-cells = 2;
 #size-cells = 1;
 compatible = fsl,mpc8349-localbus, simple-bus;
  reg = 0xe0005000 0x1000;
  interrupts = 77 0x8;
  interrupt-parent = ipic;


 /* NOR and NAND Flashes */
ranges = 0x0 0x0 0xff80 0x0080 /* 8MB NOR Flash */
  0x1 0x0 0xF800 0x0800  /* User flash (same
 nor, in burst mode) 128M */
  0x2 0x0 0xf7e0 0x0020;/*NVRAM/CPLD C2 is
 selected in CPLD , */
 /*nvram 0xf7e0 1MB */
 /*cpld  0xf7f0 1M  (- different address!)*/
 nor@0,0 {
 #address-cells = 1;
 #size-cells = 1;
 compatible = cfi-flash;
 reg = 0x0 0x0 0x100;
 #bank-width = 1;
 device-width = 4;

 };
 };


 Hi ,

 I rebooted the board with the new device tree localbus node, but I don't
 have any new /dev/mtdX entry for the NOR flash.
 There is no HW issue, because we can R/W access the NOR flash from u-boot.
 Is there any hint as to what the issue could be? I've checked the kernel
 config and validated that mtd is supported.
 The NOR flash is a Spansion S29GL512P.

 localbus@e0005000 {
 #address-cells = 2;
 #size-cells = 1;
compatible = fsl,mpc8349-localbus, simple-bus;
 reg = 0xe0005000 0x1000;
 interrupts = 77 0x8;
 interrupt-parent = ipic;


 # NOR and NAND Flashes
   ranges = 0x0 0x0 0xff80 0x0080
 0x1 0x0 0xF800 0x0800
 0x2 0x0 0xf7e0 0x0020;
 nor@0,0 {
 #address-cells = 1;
 #size-cells = 1;
 compatible = cfi-flash;
 reg = 0x0 0x0 0x0080;
 #bank-width = 1;
 device-width = 1;

 };
 };

 Best Regards,
 Ran

Hello,

Just to update,
I eventually solved this issue.
I don't do any configuration in device tree. All BRx configuration is
already done in u-boot (as was done from the start), and everything
seems to work OK: cpld, nvram.

For NOR FPGA I only added NOR configuration to kernel:

CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_PHYSMAP_START=0xf800
CONFIG_MTD_PHYSMAP_LEN=0x780
CONFIG_MTD_PHYSMAP_BANKWIDTH=4

Thank you for the tips,
Ran

EEH regression: PE - device binding lost after reset

2015-08-09 Thread Daniel Axtens
Hi,

I'm experiencing a regression in EEH that was introduced somewhere
between 4.0 and 4.1.

I have been reproducing this with a CAPI (CXL) card, but the behaviour
isn't CAPI related and the triggering code hasn't changed. CAPI cards
are reprogrammed by PERSTing the slot they sit in, so CAPI exposes a
'reset' file in sysfs that does pci_set_pcie_reset_state(dev,
pcie_warm_reset), and then relies on EEH noticing to properly reset the
card.

In 4.0 and earlier, this worked: the slot would be persted, EEH would
notice and hotplug. You could do this as many times as you liked.

In 4.1 and later, you can do 1 successful reset, but any subsequent
reset causes the following to be printed in dmesg:

[  225.118656] cxl-pci 0006:01:00.0: CXL reset
[  225.118663] pcibios_set_pcie_reset_state: No PE found on PCI device 
0006:01:00.0
[  225.118672] cxl-pci 0006:01:00.0: cxl: pcie_warm_reset failed

I'm digging through the commits between 4.0 and 4.1 at the moment, but I
thought I'd post it here in hopes someone had an idea what the root
cause was. 


-- 
Regards,
Daniel



Re: [PATCH 1/5] powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash

2015-08-09 Thread Aneesh Kumar K.V
Michael Ellerman m...@ellerman.id.au writes:

 The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
 PAGE_SIZE.

 However when built with a 4K PAGE_SIZE there is an additional config
 option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
 also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

 This is used in one obscure configuration, to support 64K pages for SPU
 local store on the Cell processor when the rest of the kernel is using
 4K pages.

 In this configuration, pte_pagesize_index() is defined to just pass
 through its arguments to get_slice_psize(). However pte_pagesize_index()
 is called for both user and kernel addresses, whereas get_slice_psize()
 only knows how to handle user addresses.

 This has been broken forever, however until recently it happened to
 work. That was because in get_slice_psize() the large kernel address
 would cause the right shift of the slice mask to return zero.

 However in commit 7aa0727f3302 ("powerpc/mm: Increase the slice range to
 64TB"), the get_slice_psize() code was changed so that instead of a right
 shift we do an array lookup based on the address. When passed a kernel
 address this means we index way off the end of the slice array and
 return random junk.

 That is only fatal if we happen to hit something non-zero, but when we
 do return a non-zero value we confuse the MMU code and eventually cause
 a check stop.

 This fix is ugly, but simple. When we're called for a kernel address we
 return 4K, which is always correct in this configuration, otherwise we
 use the slice mask.

 Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB")
 Reported-by: Cyril Bur cyril...@gmail.com
 Signed-off-by: Michael Ellerman m...@ellerman.id.au


Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ---
  arch/powerpc/include/asm/pgtable-ppc64.h | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

 diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
 b/arch/powerpc/include/asm/pgtable-ppc64.h
 index 3bb7488bd24b..7ee2300ee392 100644
 --- a/arch/powerpc/include/asm/pgtable-ppc64.h
 +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
 @@ -135,7 +135,19 @@
  #define pte_iterate_hashed_end() } while(0)

  #ifdef CONFIG_PPC_HAS_HASH_64K
 -#define pte_pagesize_index(mm, addr, pte)       get_slice_psize(mm, addr)
 +/*
 + * We expect this to be called only for user addresses or kernel virtual
 + * addresses other than the linear mapping.
 + */
 +#define pte_pagesize_index(mm, addr, pte)                       \
 +        ({                                                      \
 +                unsigned int psize;                             \
 +                if (is_kernel_addr(addr))                       \
 +                        psize = MMU_PAGE_4K;                    \
 +                else                                            \
 +                        psize = get_slice_psize(mm, addr);      \
 +                psize;                                          \
 +        })
  #else
  #define pte_pagesize_index(mm, addr, pte)       MMU_PAGE_4K
  #endif
 -- 
 2.1.4
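
The failure mode described in the commit message can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the constants (PAGE_OFFSET, a 1 TB slice size, 64 slices, the MMU_PAGE_* values) are chosen for illustration, and the real get_slice_psize() uses a more compact table than the plain list shown here. The point is that the array lookup is valid only for user addresses, so the patched macro short-circuits kernel addresses to 4K.

```python
# Simplified model of the lookup described in the commit message.
# All constants are assumptions chosen for illustration.
MMU_PAGE_4K = 0
MMU_PAGE_64K = 2
PAGE_OFFSET = 0xC000000000000000  # assumed start of kernel addresses
SLICE_SHIFT = 40                  # assumed 1 TB slices
NUM_SLICES = 64                   # 64 slices x 1 TB = 64 TB of user space


def is_kernel_addr(addr):
    return addr >= PAGE_OFFSET


def get_slice_psize(slice_psize, addr):
    """Array lookup that is only valid for user addresses."""
    index = addr >> SLICE_SHIFT
    # In C an out-of-range index silently reads unrelated memory; Python
    # raises IndexError instead, which makes the bug visible in this model.
    return slice_psize[index]


def pte_pagesize_index(slice_psize, addr):
    """Patched behaviour: kernel addresses always report 4K."""
    if is_kernel_addr(addr):
        return MMU_PAGE_4K
    return get_slice_psize(slice_psize, addr)
```

Calling get_slice_psize() directly with a kernel address in this model raises IndexError, the Python stand-in for "index way off the end of the slice array".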


Re: [PATCH 3/5] powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index()

2015-08-09 Thread Aneesh Kumar K.V
Michael Ellerman m...@ellerman.id.au writes:

 Now that support for 64k pages with a 4K kernel is removed, this code is
 unreachable.

 CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
 also true.

 But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
 includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
 crucially __real_pte, which means this definition can never be used.

 Signed-off-by: Michael Ellerman m...@ellerman.id.au

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  arch/powerpc/include/asm/pgtable-ppc64.h | 12 
  1 file changed, 12 deletions(-)

 diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
 b/arch/powerpc/include/asm/pgtable-ppc64.h
 index 7ee2300ee392..fa1dfb7f7b48 100644
 --- a/arch/powerpc/include/asm/pgtable-ppc64.h
 +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
 @@ -134,23 +134,11 @@

  #define pte_iterate_hashed_end() } while(0)

 -#ifdef CONFIG_PPC_HAS_HASH_64K
  /*
   * We expect this to be called only for user addresses or kernel virtual
   * addresses other than the linear mapping.
   */
 -#define pte_pagesize_index(mm, addr, pte)                       \
 -        ({                                                      \
 -                unsigned int psize;                             \
 -                if (is_kernel_addr(addr))                       \
 -                        psize = MMU_PAGE_4K;                    \
 -                else                                            \
 -                        psize = get_slice_psize(mm, addr);      \
 -                psize;                                          \
 -        })
 -#else
  #define pte_pagesize_index(mm, addr, pte)       MMU_PAGE_4K
 -#endif

  #endif /* __real_pte */

 -- 
 2.1.4


Re: [PATCH v5 2/6] powerpc/powernv: Add definition of OPAL_MSG_OCC message type

2015-08-09 Thread Stewart Smith
Shilpasri G Bhat shilpa.b...@linux.vnet.ibm.com writes:
 Add OPAL_MSG_OCC message definition to opal_message_type to receive
 OCC events like reset, load and throttled. Host performance can be
 affected when OCC is reset or OCC throttles the max Pstate.
 We can register to opal_message_notifier to receive OPAL_MSG_OCC type
 of message and report it to the userspace so as to keep the user
 informed about the reason for a performance drop in workloads.

 The reset and load OCC events are notified to kernel when FSP sends
 OCC_RESET and OCC_LOAD commands.  Both reset and load messages are
 sent to kernel on successful completion of reset and load operation
 respectively.

How is this done on OpenPower systems? Explanation involving just what
OPAL does is likely better, rather than explaining in context of FSP,
which Linux has no real knowledge of (OPAL provides all abstraction of
it).


RE: [v3] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support

2015-08-09 Thread Priyanka Jain
Hello Scott, 

T1040D4RDB and T1042D4RDB are completely new boards.
They support DDR4 memory and a new SerDes protocol (0x86); the Ethernet PHY
addresses and the number of Ethernet ports also differ from previous boards, etc.

Regards
Priyanka




 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, August 08, 2015 8:11 AM
 To: Jain Priyanka-B32167
 Cc: linuxppc-dev@lists.ozlabs.org; Sun York-R58495
 Subject: Re: [v3] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board
 support
 
 On Thu, Jul 30, 2015 at 10:33:55AM +0530, Priyanka Jain wrote:
  T1040D4RDB/T1042D4RDB are Freescale Reference Design Board which can
  support T1040/T1042 QorIQ Power Architecture™ processor respectively
 
 What is the actual name of this board?
 http://patchwork.ozlabs.org/patch/504944/ changes the name in U-Boot
 from T1040D4RDB to T1040RDB_DDR4.
 
 Is it really a different board, or just different RAM?  If the latter, we
 don't specify the RAM type in the device tree, so why do we need separate
 device trees?
 
 -Scott

Re: 4.1-rc6: ATA link is slow to respond, please be patient

2015-08-09 Thread Michael Ellerman
On Sat, 2015-08-08 at 21:17 -0700, Christian Kujau wrote:
 [Adding linux-...@vger.kernel.org]
 
 On Fri, 7 Aug 2015, Christian Kujau wrote:
  this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
  to latest mainline. However, during bootup the following happens:
  
  ===
  [2.237102] ata1: PATA max UDMA/100 irq 39
  [2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
  [2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
  [2.417633] ata1.00: configured for UDMA/100
  [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
  [   44.920452] ata1.00: failed command: READ DMA
  [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 
  69632 in
  [   44.927257] ata1.00: status: { DRDY }
  [   49.971784] ata1.00: qc timeout (cmd 0xec)
  [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  [   49.978908] ata1.00: revalidation failed (errno=-5)
  [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
  [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
  [   60.012670] ata1: soft resetting link
  [   60.193638] ata1.00: configured for UDMA/100
  [   60.196158] ata1.00: device reported invalid CHS sector 0
  [   60.198610] ata1: EH complete
  ===
  
  This happens only once, but systemd thinks there's a hard problem and will 
  drop to a recovery shell. I can start sshd and login remotely and then the 
  system appears to be running just fine.
  
  This happened in 4.2.0-rc5 so I went back a few versions and found that
  4.1-rc5 was OK (the error does not show up and the system boots just fine)
  and 4.1-rc6 is not.
  
 
 After more digging around I noticed that the same error (with 
 changed wording) happens with a Debian 3.16.0-4-powerpc kernel - so it
 doesn't appear to be a recent regression as I suspected at first:
 
 ==
 [   46.907147] ata1: drained 572 bytes to clear DRQ
 [   46.907166] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
 [   46.908419] ata1.00: failed command: READ DMA
 [   46.909058] ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 
 65536 in
  res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x20 (host bus error)
 [   46.910303] ata1.00: status: { DRDY }
 [   46.970579] ata1.00: configured for UDMA/100
 [   46.971853] ata1.00: device reported invalid CHS sector 0
 [   46.972524] ata1: EH complete
 ==
 
 Also, the error cannot be reproduced as reliably as I thought: sometimes, the
 machine just boots w/o a hitch - and that might be the reason why my
 bisect attempts failed and incorrectly blamed totally unrelated commits: 
 after each git bisect {good,bad} (+compiling) I rebooted but there was a 
 chance that the system came up just fine / showed the same ATA error and 
 thus falsified the git-bisect results.

Yes that would explain why the bisect went wrong. If you have an intermittent
bug like that you have to be very careful about which commits you mark good or
bad.
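
To put a rough number on "very careful": if a bad kernel shows the error on only a fraction of boots, a single clean boot says little. The Python sketch below estimates how many clean boots are needed before marking a commit good. It assumes boots are independent and that the per-boot failure rate is known, neither of which is established in this thread; boots_needed() is a hypothetical helper.

```python
import math


def boots_needed(failure_rate, max_false_good):
    """Clean boots required before calling a commit 'good'.

    failure_rate: assumed probability that a bad kernel shows the error on
                  any single boot (0 < failure_rate < 1).
    max_false_good: acceptable probability of wrongly marking a bad commit
                    as good (0 < max_false_good < 1).
    """
    if not (0 < failure_rate < 1 and 0 < max_false_good < 1):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    # A bad commit survives n boots without failing with probability
    # (1 - failure_rate) ** n; find the smallest n at or below the limit.
    return math.ceil(math.log(max_false_good) / math.log(1 - failure_rate))
```

For example, with an assumed 50% failure rate and a 5% tolerance for a wrong "good", five clean boots are needed per bisect step; a single failing boot is always enough to mark a commit bad.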

I don't really know anything about disk drivers, so hopefully someone who does
can chime in.

cheers



Re: [PATCH V2 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-08-09 Thread Wei Yang
On Fri, Aug 07, 2015 at 06:59:58PM +1000, Alexey Kardashevskiy wrote:
On 08/07/2015 12:01 PM, Wei Yang wrote:
On Thu, Aug 06, 2015 at 08:04:58PM +1000, Alexey Kardashevskiy wrote:
On 08/05/2015 11:25 AM, Wei Yang wrote:
In the current implementation, when a VF BAR is bigger than 64MB, it uses 4 M64
BARs in Single PE mode to cover the number of VFs required to be enabled.
By doing so, several VFs end up in one VF Group, which leads to interference
between VFs in the same group.

This patch changes the design by using one M64 BAR in Single PE mode for
one VF BAR. This gives absolute isolation for VFs.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
---
  arch/powerpc/include/asm/pci-bridge.h |5 +-
  arch/powerpc/platforms/powernv/pci-ioda.c |  180 
 -
  2 files changed, 76 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..8aeba4c 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -214,10 +214,9 @@ struct pci_dn {
    u16     vfs_expanded;       /* number of VFs IOV BAR expanded */
    u16     num_vfs;            /* number of VFs enabled*/
    int     offset;             /* PE# for the first VF PE */
-#define M64_PER_IOV 4
-   int     m64_per_iov;
+   bool    m64_single_mode;    /* Use M64 BAR in Single Mode */
  #define IODA_INVALID_M64        (-1)
-   int     m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
+   int     (*m64_map)[PCI_SRIOV_NUM_BARS];
  #endif /* CONFIG_PCI_IOV */
  #endif
    struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7192e62..f5d110c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
  }

  #ifdef CONFIG_PCI_IOV
-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
  {
    struct pci_bus        *bus;
    struct pci_controller *hose;
    struct pnv_phb        *phb;
    struct pci_dn         *pdn;
    int                    i, j;
+   int                    m64_bars;

    bus = pdev->bus;
    hose = pci_bus_to_host(bus);
    phb = hose->private_data;
    pdn = pci_get_pdn(pdev);

+   if (pdn->m64_single_mode)
+       m64_bars = num_vfs;
+   else
+       m64_bars = 1;
+
    for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-       for (j = 0; j < M64_PER_IOV; j++) {
-           if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+       for (j = 0; j < m64_bars; j++) {
+           if (pdn->m64_map[j][i] == IODA_INVALID_M64)
                continue;
            opal_pci_phb_mmio_enable(phb->opal_id,
-               OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
-           clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
-           pdn->m64_wins[i][j] = IODA_INVALID_M64;
+               OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
+           clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
+           pdn->m64_map[j][i] = IODA_INVALID_M64;
        }

+   kfree(pdn->m64_map);
    return 0;
  }

@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
    int                    total_vfs;
    resource_size_t        size, start;
    int                    pe_num;
-   int                    vf_groups;
-   int                    vf_per_group;
+   int                    m64_bars;

    bus = pdev->bus;
    hose = pci_bus_to_host(bus);
@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
    pdn = pci_get_pdn(pdev);
    total_vfs = pci_sriov_get_totalvfs(pdev);

-   /* Initialize the m64_wins to IODA_INVALID_M64 */
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-       for (j = 0; j < M64_PER_IOV; j++)
-           pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   if (pdn->m64_single_mode)


This is a physical function's @pdn, right?

Yes



+       m64_bars = num_vfs;
+   else
+       m64_bars = 1;
+
+   pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);


Assume we have SRIOV device with 16VF.
So it was m64_wins[6][4], now it is (roughly speaking) m64_map[6][16]
(for a single PE mode) or m64_map[6][1]. I believe m64_bars cannot be
bigger than 16 on PHB3, right? Is this checked anywhere (does it have
to)?

In pnv_pci_vf_assign_m64(), we call find_next_zero_bit() and check the
return value. If it exceeds m64_bar_idx, the allocation fails.
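
The allocation check described in this reply can be sketched in Python as follows. This is an illustrative model only: a list of booleans stands in for the m64_bar_alloc bitmap, the 16-window limit comes from the question above about PHB3, and alloc_m64_window() and assign_windows() are hypothetical names, not the kernel's functions.

```python
def alloc_m64_window(bitmap, num_windows):
    """Claim the lowest free window index, or return None when none is left.

    bitmap: list of booleans, True meaning "in use" (stand-in for
            m64_bar_alloc); num_windows: number of usable windows.
    """
    for index in range(num_windows):
        if not bitmap[index]:
            bitmap[index] = True
            return index
    return None  # caller must treat this as failure and unwind


def assign_windows(bitmap, num_windows, wanted):
    """Allocate `wanted` windows; on failure release what was taken."""
    taken = []
    for _ in range(wanted):
        index = alloc_m64_window(bitmap, num_windows)
        if index is None:
            for t in taken:
                bitmap[t] = False
            return None
        taken.append(index)
    return taken
```

In this model, asking for one window per VF simply fails once the free windows run out, which is the implicit bound on m64_bars that the question is probing.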


This m64_wins -> m64_map change: it was not a map before (what was it?),
but now it is, isn't it?

Hmm... Gavin likes this name.


What does it store? An index of M64 BAR (0..15)?


Yes.



+   if (!pdn->m64_map)
+       return -ENOMEM;
+   /* Initialize the m64_map to IODA_INVALID_M64 */
+   for (i = 0; i < m64_bars ; i++)
+   for 

Re: powerpc/fsl_book3e: fix the relocatable bug in debug interrupt handler

2015-08-09 Thread Huang, Yuanjie

Hi Scott,

On 08/08/2015 10:29 AM, Scott Wood wrote:

[Please wrap commit messages at around 74 columns]

Ok, I will when sending a new version.


On Fri, Aug 07, 2015 at 02:58:10PM +0800, Yuanjie Huang wrote:

PowerPC Book3E processor features hardware-supported single instruction
execution, and it is used for ptrace(PTRACE_SINGLESTEP, ...).  When a
debugger loads a debuggee, it typically sets the CPU to yield debug
interrupt on first instruction complete or branch taken.  However, the
newly-forked child process could run into instruction TLB miss
exception handler when switched to, and causes a debug interrupt in the
exception entry sequence.  This is not expected by caller of
ptrace(PTRACE_SINGLESTEP, ...), so the next instruction address saved
in DSRR0 is checked against the boundary of exception entry sequence,
to ensure the kernel only processes the interrupt as a normal exception
if the address does not fall in the exception entry sequence.  Failure
to obtain the correct boundary leads to such a debug exception being
handled as if it came from privileged mode, and causes a kernel oops.

LOAD_REG_IMMEDIATE can't be used to load the boundary addresses
when relocatable is enabled, so this patch replaces it with
LOAD_REG_ADDR_PIC.  LR is backed up and restored before and after
calling LOAD_REG_ADDR_PIC, because LOAD_REG_ADDR_PIC clobbers it.

Signed-off-by: Yuanjie Huang yuanjie.hu...@windriver.com
---
  arch/powerpc/kernel/exceptions-64e.S | 24 
  1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 3e68d1c..c475f569 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -735,12 +735,24 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
andis.  r15,r14,(DBSR_IC|DBSR_BT)@h
beq+    1f
  
+#ifdef CONFIG_RELOCATABLE

+   mflr    r14
+   LOAD_REG_ADDR_PIC(r15,interrupt_base_book3e)
+   mtlr    r14
+   cmpld   cr0,r10,r15
+   blt+    cr0,1f
+   LOAD_REG_ADDR_PIC(r15,interrupt_end_book3e)
+   mtlr    r14
+   cmpld   cr0,r10,r15
+   bge+    cr0,1f
+#else

CONFIG_RELOCATABLE is not supported on 64-bit book3e without applying
additional patches, such as the RFC patchset I posted recently that
contained the patch "powerpc/book3e-64: rename interrupt_end_book3e with
__end_interrupts".  But if you've applied that patchset, then you
wouldn't be working with the name interrupt_base_book3e, so how are you
seeing this?


Actually I have applied additional patches that were submitted but not yet
merged, to make CONFIG_RELOCATABLE work with 64-bit book3e. I am happy to
delay this until those patches are merged, and send an adjusted version
then. Shall I wait until they are merged?



Also, why not use the RELOCATABLE version unconditionally?  I don't think
this is a performance-critical path.


The difference is 15 instructions against 14. If that is not important, we
can certainly use only the RELOCATABLE version.
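
For reference, the boundary test that both variants of the assembly perform can be written out as a small Python sketch. The function names and the addresses used with it are hypothetical; only the comparison logic (below the base or at/after the end means "take the normal exception path") follows the patch and its commit message.

```python
def in_exception_entry(addr, interrupt_base, interrupt_end):
    """True when addr lies in [interrupt_base, interrupt_end)."""
    return interrupt_base <= addr < interrupt_end


def handle_as_normal_exception(addr, interrupt_base, interrupt_end):
    """Mirror the two branches of the check.

    Returns True when the saved address is outside the exception entry
    sequence, i.e. the debug interrupt is processed as a normal exception.
    """
    if addr < interrupt_base:   # corresponds to: blt+ cr0,1f
        return True
    if addr >= interrupt_end:   # corresponds to: bge+ cr0,1f
        return True
    return False
```

The check is only as good as the two bounds, which is why loading a stale link-time address under CONFIG_RELOCATABLE sends the interrupt down the wrong path.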


Best,
Yuanjie


-Scott



Re: RFC: Reducing the number of non volatile GPRs in the ppc64 kernel

2015-08-09 Thread Anton Blanchard
Hi Bill, Segher,

 I agree with Segher.  We already know we have opportunities to do a
 better job with shrink-wrapping (pushing this kind of useless
 activity down past early exits), so having examples of code to look
 at to improve this would be useful.

I'll look out for specific examples. I noticed this one today when
analysing malloc(8). It is an instruction trace of _int_malloc().

The overall function is pretty huge, which I assume leads to gcc using
so many non volatiles. Perhaps in this case we should separate out the
slow path into another function marked noinline.
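
A minimal sketch of that split, assuming a toy allocator rather than glibc's actual code: alloc_fast, alloc_slow, SMALL_MAX and POOL_SIZE are hypothetical names, and __attribute__((noinline)) is the GCC/Clang spelling.

```c
#include <stddef.h>
#include <stdlib.h>

#define SMALL_MAX 64      /* hypothetical fast-path size limit */
#define POOL_SIZE 4096    /* hypothetical pool size */

static _Alignas(16) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/*
 * Rare, register-hungry work lives here. noinline keeps the compiler from
 * folding it back into the caller, so its register pressure does not
 * dictate the caller's prologue.
 */
__attribute__((noinline))
static void *alloc_slow(size_t n)
{
    return malloc(n ? n : 1);
}

/* Common case: a few compares and an add before the early exit. */
void *alloc_fast(size_t n)
{
    size_t need = (n + 15) & ~(size_t)15;   /* round up to 16 bytes */

    if (n == 0 || n > SMALL_MAX || need > POOL_SIZE - pool_used)
        return alloc_slow(n);

    void *p = &pool[pool_used];
    pool_used += need;
    return p;
}
```

Whether this actually reduces the number of non-volatile saves depends on the compiler and options, so it is worth checking the generated prologue. Note that pool memory in this toy must not be passed to free().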

This is just an upstream glibc build, but I'll send the preprocessed
source off list.

Anton
--

0x410d538   mflr    r0
0x410d53c   li  r9,-65
0x410d540   std r14,-144(r1) # 0x000fff00efe0
0x410d544   std r15,-136(r1) # 0x000fff00efe8
0x410d548   cmpld   cr7,r4,r9
0x410d54c   std r16,-128(r1) # 0x000fff00eff0
0x410d550   std r17,-120(r1) # 0x000fff00eff8
0x410d554   std r18,-112(r1) # 0x000fff00f000
0x410d558   std r19,-104(r1) # 0x000fff00f008
0x410d55c   std r20,-96(r1)  # 0x000fff00f010
0x410d560   std r21,-88(r1)  # 0x000fff00f018
0x410d564   std r22,-80(r1)  # 0x000fff00f020
0x410d568   std r23,-72(r1)  # 0x000fff00f028
0x410d56c   std r0,16(r1)    # 0x000fff00f080
0x410d570   std r24,-64(r1)  # 0x000fff00f030
0x410d574   std r25,-56(r1)  # 0x000fff00f038
0x410d578   std r26,-48(r1)  # 0x000fff00f040
0x410d57c   std r27,-40(r1)  # 0x000fff00f048
0x410d580   std r28,-32(r1)  # 0x000fff00f050
0x410d584   std r29,-24(r1)  # 0x000fff00f058
0x410d588   std r30,-16(r1)  # 0x000fff00f060
0x410d58c   std r31,-8(r1)   # 0x000fff00f068
0x410d590   stdu    r1,-224(r1)  # 0x000fff00ef90
0x410d594   bgt cr7,0x410dda4
0x410d598   addi    r9,r4,23
0x410d59c   li  r16,32
0x410d5a0   cmpldi  cr7,r9,31
0x410d5a4   bgt cr7,0x410d700
0x410d5a8   cmpdi   cr7,r3,0
0x410d5ac   mr  r14,r3
0x410d5b0   mr  r30,r4
0x410d5b4   beq cr7,0x410ddc0
0x410d5b8   nop
0x410d5bc   ld  r9,-19136(r2)    # 0x04222840
0x410d5c0   rlwinm  r29,r16,28,4,31
0x410d5c4   cmpld   cr7,r16,r9
0x410d5c8   bgt cr7,0x410d650
0x410d5cc   addi    r6,r29,-2
0x410d5d0   clrldi  r9,r6,32
0x410d5d4   rldicr  r10,r9,3,60
0x410d5d8   addi    r7,r9,1
0x410d5dc   add r10,r3,r10
0x410d5e0   rldicr  r7,r7,3,60
0x410d5e4   add r7,r3,r7
0x410d5e8   ld  r9,8(r10)    # 0x04220ce0
0x410d5ec   cmpdi   cr7,r9,0
0x410d5f0   beq cr7,0x410d650
0x410d5f4   ld  r10,16(r9)   # 0x10030010
0x410d5f8   ldarx   r15,0,r7,1   # 0x04220ce0
0x410d5fc   cmpd    r15,r9
0x410d600   bne 0x410d60c
0x410d604   stdcx.  r10,0,r7 # 0x04220ce0
0x410d608   bne-    0x410d5f8
0x410d60c   isync
0x410d610   cmpld   cr7,r15,r9
0x410d614   bne cr7,0x410d648
0x410d618   b   0x410da40
0x410da40   ld  r9,8(r15)    # 0x10030008
0x410da44   rlwinm  r9,r9,28,4,31
0x410da48   addi    r9,r9,-2
0x410da4c   cmplw   cr7,r9,r6
0x410da50   bne cr7,0x410de08
0x410da54   nop
0x410da58   addi    r31,r15,16
0x410da5c   lwa r9,-19080(r2)    # 0x04222878
0x410da60   cmpdi   cr7,r9,0
0x410da64   bne cr7,0x410d6e4
0x410da68   addi    r1,r1,224
0x410da6c   mr  r3,r31
0x410da70   ld  r0,16(r1)    # 0x000fff00f080
0x410da74   ld  r14,-144(r1) # 0x000fff00efe0
0x410da78   ld  r15,-136(r1) # 0x000fff00efe8
0x410da7c   ld  r16,-128(r1) # 0x000fff00eff0
0x410da80   ld  r17,-120(r1) # 0x000fff00eff8
0x410da84   ld  r18,-112(r1) # 0x000fff00f000
0x410da88   ld  r19,-104(r1) # 0x000fff00f008
0x410da8c   ld  r20,-96(r1)  # 0x000fff00f010
0x410da90   ld  r21,-88(r1)  # 0x000fff00f018
0x410da94   ld  r22,-80(r1)  # 0x000fff00f020
0x410da98   ld  r23,-72(r1)  # 0x000fff00f028
0x410da9c   ld  r24,-64(r1)  # 0x000fff00f030
0x410daa0   mtlr    r0

Re: [PATCH 4/5] powerpc/mm: Simplify page size kconfig dependencies

2015-08-09 Thread Aneesh Kumar K.V
Michael Ellerman m...@ellerman.id.au writes:

 For config options with only a single value, guarding the single value
 with 'if' is the same as adding a 'depends' statement. And it's more
 standard to just use 'depends'.

 And if the option has both an 'if' guard and a 'depends' we can collapse
 them into a single 'depends' by combining them with &&.

 Signed-off-by: Michael Ellerman m...@ellerman.id.au

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ---
  arch/powerpc/Kconfig | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

 diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
 index 5ef27113b898..3a4ba2809201 100644
 --- a/arch/powerpc/Kconfig
 +++ b/arch/powerpc/Kconfig
 @@ -560,16 +560,17 @@ config PPC_4K_PAGES
   bool "4k page size"

  config PPC_16K_PAGES
 - bool "16k page size" if 44x || PPC_8xx
 + bool "16k page size"
 + depends on 44x || PPC_8xx

  config PPC_64K_PAGES
 - bool "64k page size" if 44x || PPC_STD_MMU_64 || PPC_BOOK3E_64
 - depends on !PPC_FSL_BOOK3E
 + bool "64k page size"
 + depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
   select PPC_HAS_HASH_64K if PPC_STD_MMU_64

  config PPC_256K_PAGES
 - bool "256k page size" if 44x
 - depends on !STDBINUTILS
 + bool "256k page size"
 + depends on 44x && !STDBINUTILS
   help
 Make the page size 256k.

 -- 
 2.1.4


Re: [PATCH v5 3/6] cpufreq: powernv: Register for OCC related opal_message notification

2015-08-09 Thread Stewart Smith
Shilpasri G Bhat shilpa.b...@linux.vnet.ibm.com writes:
 diff --git a/drivers/cpufreq/powernv-cpufreq.c 
 b/drivers/cpufreq/powernv-cpufreq.c
 index d0c18c9..a634199 100644
 --- a/drivers/cpufreq/powernv-cpufreq.c
 +++ b/drivers/cpufreq/powernv-cpufreq.c
 @@ -33,6 +33,7 @@
  #include <asm/firmware.h>
  #include <asm/reg.h>
  #include <asm/smp.h> /* Required for cpu_sibling_mask() in UP configs */
 +#include <asm/opal.h>
  
  #define POWERNV_MAX_PSTATES  256
  #define PMSR_PSAFE_ENABLE    (1UL << 30)
 @@ -41,7 +42,7 @@
  #define PMSR_LP(x)   ((x >> 48) & 0xFF)
  
  static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 -static bool rebooting, throttled;
 +static bool rebooting, throttled, occ_reset;
  
  static struct chip {
   unsigned int id;
 @@ -414,6 +415,74 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
   .notifier_call = powernv_cpufreq_reboot_notifier,
  };
  
 +static char throttle_reason[][30] = {
 + "No throttling",
 + "Power Cap",
 + "Processor Over Temperature",
 + "Power Supply Failure",
 + "Over Current",
 + "OCC Reset"
 +  };
 +
 +static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 +unsigned long msg_type, void *_msg)
 +{
 + struct opal_msg *msg = _msg;
 + struct opal_occ_msg omsg;
 +
 + if (msg_type != OPAL_MSG_OCC)
 + return 0;
 +
 + omsg.type = be64_to_cpu(msg->params[0]);
 +
 + switch (omsg.type) {
 + case OCC_RESET:
 + occ_reset = true;
 + /*
 +  * powernv_cpufreq_throttle_check() is called in
 +  * target() callback which can detect the throttle state
 +  * for governors like ondemand.
 +  * But static governors will not call target() often thus
 +  * report throttling here.
 +  */
 + if (!throttled) {
 + throttled = true;
 + pr_crit("CPU Frequency is throttled\n");
 + }
 + pr_info("OCC: Reset\n");
 + break;
 + case OCC_LOAD:
 + pr_info("OCC: Loaded\n");
 + break;

I wonder if we could make the log messages a bit clearer here. Odds are,
unless you're one of the people reading this code, you have no idea what
an OCC is, what on earth "OCC: Loaded" means, or why it *doesn't* mean
that your CPUs are no longer throttled so that your computer doesn't
catch fire/break/add 1+1 and get 4.

Also, do we export this information via sysfs somewhere? It would seem
to want to go along with other cpufreq/cpu info there.

It feels like we could do much better at informing users as to what is
going on, maybe something like:

OCC (On Chip Controller - enforces hard thermal/power limits) Resetting: CPU frequency throttled for duration
OCC Loading, CPU frequency throttled until OCC started
OCC Active, CPU frequency no longer throttled
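
One way such messages could be kept in one place is a lookup table keyed by event. The sketch below is plain userspace C under stated assumptions: the EV_* enum and occ_describe() are hypothetical names, not the kernel's OCC_* constants or any existing helper, and the strings are the ones proposed above.

```c
#include <stddef.h>

/* Hypothetical event codes; the real values come from the OPAL headers. */
enum occ_event { EV_OCC_RESET, EV_OCC_LOAD, EV_OCC_ACTIVE, EV_OCC_COUNT };

/* Descriptive text per event, using the wording suggested in this mail. */
static const char *const occ_event_text[EV_OCC_COUNT] = {
    [EV_OCC_RESET]  = "OCC (On Chip Controller - enforces hard thermal/power "
                      "limits) Resetting: CPU frequency throttled for duration",
    [EV_OCC_LOAD]   = "OCC Loading, CPU frequency throttled until OCC started",
    [EV_OCC_ACTIVE] = "OCC Active, CPU frequency no longer throttled",
};

/* Returns the message for ev, or a fallback for codes outside the table. */
const char *occ_describe(int ev)
{
    if (ev < 0 || ev >= EV_OCC_COUNT)
        return "OCC: unknown event";
    return occ_event_text[ev];
}
```

In a driver, the switch statement would then only need to pick the log level and pass the looked-up string to the matching pr_* call.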


Re: [PATCH V2 1/6] powerpc/powernv: don't enable SRIOV when VF BAR contains non M64 BAR

2015-08-09 Thread Wei Yang
On Fri, Aug 07, 2015 at 05:14:41PM +1000, Alexey Kardashevskiy wrote:
On 08/07/2015 12:24 PM, Wei Yang wrote:
On Fri, Aug 07, 2015 at 11:20:10AM +1000, Gavin Shan wrote:
On Thu, Aug 06, 2015 at 10:10:10PM +0800, Wei Yang wrote:
On Thu, Aug 06, 2015 at 02:35:57PM +1000, Gavin Shan wrote:
On Wed, Aug 05, 2015 at 09:24:58AM +0800, Wei Yang wrote:
On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
a SRIOV device's BAR is not 64-bit prefetchable, this is not assigned from
M64 windwo, which means M64 BAR can't work on it.


s/PHB_IODA2/PHB3
s/windwo/window

This patch makes this explicit.

Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com

The idea sounds right, but there is one question as below.

---
arch/powerpc/platforms/powernv/pci-ioda.c |   25 +
1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5738d31..9b41dba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
  if (!res->flags || !res->parent)
  continue;

- if (!pnv_pci_is_mem_pref_64(res->flags))
- continue;
-
  /*
   * The actual IOV BAR range is determined by the start address
   * and the actual size for num_vfs VFs BAR.  This check is to
@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
  if (!res->flags || !res->parent)
  continue;

- if (!pnv_pci_is_mem_pref_64(res->flags))
- continue;
-
  size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
  res2 = *res;
  res-start += size * offset;
@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
  if (!res->flags || !res->parent)
  continue;

- if (!pnv_pci_is_mem_pref_64(res->flags))
- continue;
-
  for (j = 0; j < vf_groups; j++) {
  do {
  win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
  pdn = pci_get_pdn(pdev);

  if (phb->type == PNV_PHB_IODA2) {
+ if (!pdn->vfs_expanded) {
+ dev_info(&pdev->dev, "don't support this SRIOV device"
+  " with non M64 VF BAR\n");
+ return -EBUSY;
+ }
+

It should be -ENOSPC, since -EBUSY indicates the devices (VFs) are temporarily
unavailable. In this case, the VFs are permanently unavailable because of
running out of space to accommodate M64 and non-M64 VF BARs.

The error message could be printed with dev_warn() and it would be precise
as below or something else you prefer:

   dev_warn(&pdev->dev, "SRIOV not supported because of non-M64 VF BAR\n");


Thanks for the comment, will change accordingly.


  /* Calculate available PE for required VFs */
  mutex_lock(&phb->ioda.pe_alloc_mutex);
  pdn->offset = bitmap_find_next_zero_area(
@@ -2774,9 +2771,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
  if (!res->flags || res->parent)
  continue;
  if (!pnv_pci_is_mem_pref_64(res->flags)) {
- dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+ dev_warn(&pdev->dev, "Don't support SR-IOV with"
+  " non M64 VF BAR%d: %pR. \n",
   i, res);
- continue;
+ return;
  }

  size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2795,11 +2793,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
  res = &pdev->resource[i + PCI_IOV_RESOURCES];
  if (!res->flags || res->parent)
  continue;
- if (!pnv_pci_is_mem_pref_64(res->flags)) {
- dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
-  i, res);
- continue;
- }

When any one IOV BAR on the PF is non-M64, none of the VFs can be enabled.
Will we still allocate/assign M64 or M32 resources for the IOV BARs? If so,
I think it can be avoided.


I don't get your point. Do you mean to avoid this function?

Or to clear the IOV BARs when we find that one of them is non-M64?


I mean to clear all IOV BARs in case one or more of them are IO or M32. In
this case, the SRIOV capability won't be enabled. Otherwise, the resources
for all IOV BARs are assigned and allocated by the PCI subsystem, but they
won't be used. Does it make sense to you?


If we want to save MMIO space, this is not necessary.

The IOV BAR will be put into the optional list in assignment stage. So when
there is not enough MMIO space, they will not be assigned.


If we are not going to use non-64bit IOV BAR, why would we assign

Re: [PATCH 5/5] powerpc/mm: Drop CONFIG_PPC_HAS_HASH_64K

2015-08-09 Thread Aneesh Kumar K.V
Michael Ellerman m...@ellerman.id.au writes:

 The relation between CONFIG_PPC_HAS_HASH_64K and CONFIG_PPC_64K_PAGES is
 painfully complicated.

 But if we rearrange it enough we can see that PPC_HAS_HASH_64K
 essentially depends on PPC_STD_MMU_64 && PPC_64K_PAGES.

 We can then notice that PPC_HAS_HASH_64K is used in files that are only
 built for PPC_STD_MMU_64, meaning it's equivalent to PPC_64K_PAGES.

 So replace all uses and drop it.
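
The equivalence argued above can be checked by brute force. The sketch below is an illustration only: it models the old `select PPC_HAS_HASH_64K if PPC_STD_MMU_64` rule as a boolean function (the function names are hypothetical) and confirms that, whenever PPC_STD_MMU_64 is set, the old symbol has the same value as PPC_64K_PAGES.

```c
#include <stdbool.h>

/* Old rule: PPC_64K_PAGES selected PPC_HAS_HASH_64K if PPC_STD_MMU_64. */
bool has_hash_64k(bool std_mmu_64, bool pages_64k)
{
    return std_mmu_64 && pages_64k;
}

/*
 * Returns true when, for every configuration with PPC_STD_MMU_64 set,
 * the old symbol equals PPC_64K_PAGES.
 */
bool equivalent_under_std_mmu_64(void)
{
    for (int p = 0; p <= 1; p++) {
        bool pages_64k = (p != 0);
        if (has_hash_64k(true, pages_64k) != pages_64k)
            return false;
    }
    return true;
}
```

The two symbols do differ when PPC_STD_MMU_64 is unset, which is why the argument relies on the affected files only being built for PPC_STD_MMU_64.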

 Signed-off-by: Michael Ellerman m...@ellerman.id.au

Reviewed-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ---
  arch/powerpc/Kconfig|  6 --
  arch/powerpc/mm/hash_low_64.S   |  4 ++--
  arch/powerpc/mm/hash_utils_64.c | 12 ++--
  3 files changed, 8 insertions(+), 14 deletions(-)

 diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
 index 3a4ba2809201..1e69dee29be3 100644
 --- a/arch/powerpc/Kconfig
 +++ b/arch/powerpc/Kconfig
 @@ -514,11 +514,6 @@ config NODES_SPAN_OTHER_NODES
   def_bool y
   depends on NEED_MULTIPLE_NODES

 -config PPC_HAS_HASH_64K
 - bool
 - depends on PPC64
 - default n
 -
  config STDBINUTILS
   bool "Using standard binutils settings"
   depends on 44x
 @@ -566,7 +561,6 @@ config PPC_16K_PAGES
  config PPC_64K_PAGES
   bool "64k page size"
   depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
 - select PPC_HAS_HASH_64K if PPC_STD_MMU_64

  config PPC_256K_PAGES
   bool "256k page size"
 diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
 index 463174a4a647..3b49e3295901 100644
 --- a/arch/powerpc/mm/hash_low_64.S
 +++ b/arch/powerpc/mm/hash_low_64.S
 @@ -701,7 +701,7 @@ htab_pte_insert_failure:

  #endif /* CONFIG_PPC_64K_PAGES */

 -#ifdef CONFIG_PPC_HAS_HASH_64K
 +#ifdef CONFIG_PPC_64K_PAGES

  
 /*
   *   
 *
 @@ -993,7 +993,7 @@ ht64_pte_insert_failure:
   b   ht64_bail


 -#endif /* CONFIG_PPC_HAS_HASH_64K */
 +#endif /* CONFIG_PPC_64K_PAGES */


  
 /*
 diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
 index 5ec987f65b2c..aee70171355b 100644
 --- a/arch/powerpc/mm/hash_utils_64.c
 +++ b/arch/powerpc/mm/hash_utils_64.c
 @@ -640,7 +640,7 @@ extern u32 ht64_call_hpte_updatepp[];

  static void __init htab_finish_init(void)
  {
 -#ifdef CONFIG_PPC_HAS_HASH_64K
 +#ifdef CONFIG_PPC_64K_PAGES
   patch_branch(ht64_call_hpte_insert1,
   ppc_function_entry(ppc_md.hpte_insert),
   BRANCH_SET_LINK);
 @@ -653,7 +653,7 @@ static void __init htab_finish_init(void)
   patch_branch(ht64_call_hpte_updatepp,
   ppc_function_entry(ppc_md.hpte_updatepp),
   BRANCH_SET_LINK);
 -#endif /* CONFIG_PPC_HAS_HASH_64K */
 +#endif /* CONFIG_PPC_64K_PAGES */

   patch_branch(htab_call_hpte_insert1,
   ppc_function_entry(ppc_md.hpte_insert),
 @@ -1151,12 +1151,12 @@ int hash_page_mm(struct mm_struct *mm, unsigned long 
 ea,
   check_paca_psize(ea, mm, psize, user_region);
  #endif /* CONFIG_PPC_64K_PAGES */

 -#ifdef CONFIG_PPC_HAS_HASH_64K
 +#ifdef CONFIG_PPC_64K_PAGES
   if (psize == MMU_PAGE_64K)
   rc = __hash_page_64K(ea, access, vsid, ptep, trap,
flags, ssize);
   else
 -#endif /* CONFIG_PPC_HAS_HASH_64K */
 +#endif /* CONFIG_PPC_64K_PAGES */
   {
   int spp = subpage_protection(mm, ea);
   if (access & spp)
 @@ -1264,12 +1264,12 @@ void hash_preload(struct mm_struct *mm, unsigned long 
 ea,
   update_flags |= HPTE_LOCAL_UPDATE;

   /* Hash it in */
 -#ifdef CONFIG_PPC_HAS_HASH_64K
 +#ifdef CONFIG_PPC_64K_PAGES
   if (mm->context.user_psize == MMU_PAGE_64K)
   rc = __hash_page_64K(ea, access, vsid, ptep, trap,
update_flags, ssize);
   else
 -#endif /* CONFIG_PPC_HAS_HASH_64K */
 +#endif /* CONFIG_PPC_64K_PAGES */
   rc = __hash_page_4K(ea, access, vsid, ptep, trap, update_flags,
   ssize, subpage_protection(mm, ea));

 -- 
 2.1.4
