[PATCH] powerpc/memtrace: Remove memory in chunks

2018-08-16 Thread Rashmica Gupta
When hot-removing memory, release_mem_region_adjustable() splits iomem
resources if they are not the exact size of the memory being
hot-deleted. Adding this memory back to the kernel adds a new resource.

Eg a node has memory 0x0 - 0xf. Hot-removing 1GB from
0xf4000 results in the single resource 0x0-0xf being split
into two resources: 0x0-0xf3fff and 0xf8000-0xf.

When we hot-add the memory back we now have three resources:
0x0-0xf3fff, 0xf4000-0xf7fff, and 0xf8000-0xf.

This is an issue if we try to remove some memory that overlaps
resources. Eg when trying to remove 2GB at address 0xf4000,
release_mem_region_adjustable() fails as it expects the chunk of memory
to be within the boundaries of a single resource. We then get the
warning: "Unable to release resource" and attempting to use memtrace
again gives us this error: "bash: echo: write error: Resource
temporarily unavailable"

This patch makes memtrace remove memory in chunks that are always the
same size, from an address that is always equal to end_of_memory -
n*size for some n. Hot-removing and hot-adding memory of different
sizes will then no longer attempt to remove memory that spans multiple
resources.

Signed-off-by: Rashmica Gupta 
---

To replicate the above issue hot-remove and hot-add memory of different
sizes a bunch of times. This does it for me on POWER8 and POWER9: 
for i in `seq 1 10`; do
echo $(( $i * 268435456))  >  /sys/kernel/debug/powerpc/memtrace/enable 
echo '.'
done

 arch/powerpc/platforms/powernv/memtrace.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51fe0862dcab..c5749f898652 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -119,17 +119,15 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
  change_memblock_state);
 
-   lock_device_hotplug();
-   remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
-   unlock_device_hotplug();
 
return true;
 }
 
 static u64 memtrace_alloc_node(u32 nid, u64 size)
 {
-   u64 start_pfn, end_pfn, nr_pages;
+   u64 start_pfn, end_pfn, nr_pages, pfn;
u64 base_pfn;
+   u64 bytes = memory_block_size_bytes();
 
if (!node_spanned_pages(nid))
return 0;
@@ -142,8 +140,20 @@ static u64 memtrace_alloc_node(u32 nid, u64 size)
end_pfn = round_down(end_pfn - nr_pages, nr_pages);
 
for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
-   if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
+   if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) {
+   /* Remove memory in memory block size chunks so that
+* iomem resources are always split to the same size
+* and we never try to remove memory that spans two
+* iomem resources.
+*/
+   lock_device_hotplug();
+   end_pfn = base_pfn + nr_pages;
+   for (pfn = base_pfn; pfn < end_pfn; pfn += bytes >> PAGE_SHIFT) {
+   remove_memory(nid, pfn << PAGE_SHIFT, bytes);
+   }
+   unlock_device_hotplug();
return base_pfn << PAGE_SHIFT;
+   }
}
 
return 0;
-- 
2.14.4



[GIT PULL] Please pull powerpc/linux.git powerpc-4.19-1 tag

2018-08-16 Thread Michael Ellerman
Hi Linus,

Please pull powerpc updates for 4.19.

There's a trivial conflict in Documentation/admin-guide/kernel-parameters.txt
between our addition of nospectre_v1 and the x86 documentation for nosmt.

There's another in arch/m68k/mac/misc.c, which was coordinated with
Geert but unfortunately we still caused a conflict. The resolution is to
take the time64_t change and the CONFIG_ADB_PMU change.

The change to include/uapi/linux/pmu.h is safe and not perf related as
one would assume, the pmu there means Power Management Unit.

Finally there's the change to struct page. That was sent to linux-mm but
we couldn't tease an Ack out of anyone. Willy mentioned he wants to make
pt_mm generic in future, at which point we will happily shuffle our
field elsewhere.

cheers


The following changes since commit 021c91791a5e7e85c567452f1be3e4c2c6cb6063:

  Linux 4.18-rc3 (2018-07-01 16:04:53 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.19-1

for you to fetch changes up to a2dc009afa9ae8b92305be7728676562a104cb40:

  powerpc/mm/book3s/radix: Add mapping statistics (2018-08-13 16:35:05 +1000)


powerpc updates for 4.19

Notable changes:

 - A fix for a bug in our page table fragment allocator, where a page table page
   could be freed and reallocated for something else while still in use, leading
   to memory corruption etc. The fix reuses pt_mm in struct page (x86 only) for
   a powerpc only refcount.

 - Fixes to our pkey support. Several are user-visible changes, but bring us in
   to line with x86 behaviour and/or fix outright bugs. Thanks to Florian Weimer
   for reporting many of these.

 - A series to improve the hvc driver & related OPAL console code, which have
   been seen to cause hardlockups at times. The hvc driver changes in particular
   have been in linux-next for about a month.

 - Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.

 - Remove Power8 DD1 and Power9 DD1 support, neither chip should be in use
   anywhere other than as a paper weight.

 - An optimised memcmp implementation using Power7-or-later VMX instructions.

 - Support for barrier_nospec on some NXP CPUs.

 - Support for flushing the count cache on context switch on some IBM CPUs
   (controlled by firmware), as a Spectre v2 mitigation.

 - A series to enhance the information we print on unhandled signals to bring it
   into line with other arches, including showing the offending VMA and dumping
   the instructions around the fault.

Thanks to:
  Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Alexey
  Spirkov, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar,
  Arnd Bergmann, Bartosz Golaszewski, Benjamin Herrenschmidt, Bharat Bhushan,
  Bjoern Noetel, Boqun Feng, Breno Leitao, Bryant G. Ly, Camelia Groza,
  Christophe Leroy, Christoph Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt,
  Darren Stevens, Dave Young, David Gibson, Diana Craciun, Finn Thain, Florian
  Weimer, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
  Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel Stanley,
  Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus
  Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues, Michael Hanselmann, Michael
  Neuling, Michael Schmitz, Mukesh Ojha, Murilo Opsfelder Araujo, Nicholas
  Piggin, Parth Y Shah, Paul Mackerras, Paul Menzel, Ram Pai, Randy Dunlap,
  Rashmica Gupta, Reza Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff,
  Scott Wood, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson,
  Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat Rao
  B, zhong jiang.


Aaro Koskinen (1):
  powerpc: Enable kernel XZ compression option on BOOK3S_32

Akshay Adiga (2):
  powernv/cpuidle: Parse dt idle properties into global structure
  powernv/cpuidle: Use parsed device tree values for cpuidle_init

Alastair D'Silva (7):
  Revert "cxl: Add kernel API to allow a context to operate with relocate 
disabled"
  Revert "cxl: Add support for interrupts on the Mellanox CX4"
  Revert "cxl: Add preliminary workaround for CX4 interrupt limitation"
  Revert "cxl: Add kernel APIs to get & set the max irqs per context"
  Revert "cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards"
  Revert "cxl: Add support for using the kernel API with a real PHB"
  Revert "powerpc/powernv: Add support for the cxl kernel api on the real phb"

Alexey Kardashevskiy (8):
  powerpc/powernv/ioda2: Reduce upper limit for DMA window size
  powerpc/powernv/ioda2: Add 256M IOMMU page size to the default POWER8 case
  powerpc/powernv: Remove useless wrapper
  powerpc/powernv: Move TCE manupulation code to its own file
  KVM: PPC: Make 

Re: [PATCH v2 3/4] powerpc/mm: fix a warning when a cache is common to PGD and hugepages

2018-08-16 Thread Aneesh Kumar K.V

On 08/14/2018 08:24 PM, Christophe Leroy wrote:

While implementing TLB miss HW assistance on the 8xx, the following
warning was encountered:

[  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 
___slab_alloc.constprop.30+0x26c/0x46c
[  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 
4.18.0-rc8-00664-g2dfff9121c55 #671
[  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 0004
[  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  
(4.18.0-rc8-00664-g2dfff9121c55)
[  423.733147] MSR:  00021032   CR: 24224848  XER: 2000
[  423.733319]
[  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 c0011b34 
c7fa41e0 c455be30
[  423.733319] GPR08: 0001 c00103a0 c7fa41e0 c49afcc4 24282842 10018840 
c079b37c 0040
[  423.733319] GPR16: 73f0 00210d00  0001 c455a000 0100 
0200 c455a000
[  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 c7fa41e0 
 9032
[  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
[  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734283] Call Trace:
[  423.734326] [c455bc50] [0100] 0x100 (unreliable)
[  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
[  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
[  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
[  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
[  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
[  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
[  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
[  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
[  423.735271] Instruction dump:
[  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 4bfffe24 
81370010
[  423.735536] 71280004 41a2ff88 4840c571 4b80 <0fe0> 4bfffeb8 81340010 
712a0004
[  423.735757] ---[ end trace e9b222919a470790 ]---

This warning occurs when calling kmem_cache_zalloc() on a
cache having a constructor.

In this case it happens because the PGD cache and the 512k hugepte cache
are the same size (4k). While a cache with a constructor is created for
the PGD, hugepages create their cache without a constructor and use
kmem_cache_zalloc(). As both expect a cache of the same size,
the hugepages reuse the cache created for the PGD, hence the conflict.

In order to avoid this conflict, this patch:
- modifies pgtable_cache_add() so that a zeroising constructor is
added for any cache size.
- replaces calls to kmem_cache_zalloc() by kmem_cache_alloc()



Can't we just do kmem_cache_alloc() with the __GFP_ZERO gfp flag and remove
the constructor completely?



-aneesh



Re: [PATCH v2 1/4] powerpc/tm: Remove msr_tm_active()

2018-08-16 Thread Michael Ellerman
Michael Neuling  writes:

> On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
>> Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
>> CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
>> returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.
>> 
>> This function is not necessary, since MSR_TM_ACTIVE() just do the same,
>> checking for the TS bits and does not require any TM facility.
>> 
>> This patchset remove every instance of msr_tm_active() and replaced it
>> by MSR_TM_ACTIVE().
>> 
>> Signed-off-by: Breno Leitao 
>> 
>
> Patch looks good... one minor nit below...
>
>>  
>> -if (!msr_tm_active(regs->msr) &&
>> -!current->thread.load_fp && !loadvec(current->thread))
>> +if (!current->thread.load_fp && !loadvec(current->thread)) {
>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> +if (!MSR_TM_ACTIVE(regs->msr))
>> +return;
>
> Can you make a MSR_TM_ACTIVE() that returns false when
> !CONFIG_PPC_TRANSACTIONAL_MEM. Then you don't need this inline #ifdef.

Is that safe?

I see ~50 callers of MSR_TM_ACTIVE(), are they all inside #ifdef TM ?

cheers


[PATCH] powerpc/powernv/pci: Work around races in PCI bridge enabling

2018-08-16 Thread Benjamin Herrenschmidt
The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.

This leads to drivers trying to access a device through a not-yet-enabled
bridge, and thus to EEH errors under various circumstances when using parallel
driver probing.

There is work going on to fix that properly in the PCI core but it will
take some time.

x86 gets away with it because (outside of hotplug), the BIOS enables all
the bridges at boot time.

This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's suitable
for backporting to stable and distros, while the proper PCI fix will probably
be significantly more invasive.

Signed-off-by: Benjamin Herrenschmidt 
CC: sta...@vger.kernel.org
---

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 70b2e1e0f23c..0975b0aaf210 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3368,12 +3368,52 @@ static void pnv_pci_ioda_create_dbgfs(void)
 #endif /* CONFIG_DEBUG_FS */
 }
 
+static void pnv_pci_enable_bridge(struct pci_bus *bus)
+{
+   struct pci_dev *dev = bus->self;
+   struct pci_bus *child;
+
+   /* Empty bus ? bail */
+   if (list_empty(&bus->devices))
+   return;
+
+   /*
+* If there's a bridge associated with that bus enable it. This works
+* around races in the generic code if the enabling is done during
+* parallel probing. This can be removed once those races have been
+* fixed.
+*/
+   if (dev) {
+   int rc = pci_enable_device(dev);
+   if (rc)
+   pci_err(dev, "Error enabling bridge (%d)\n", rc);
+   pci_set_master(dev);
+   }
+
+   /* Perform the same to child busses */
+   list_for_each_entry(child, &bus->children, node)
+   pnv_pci_enable_bridge(child);
+}
+
+static void pnv_pci_enable_bridges(void)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct pci_bus *bus;
+   struct pci_dev *pdev;
+
+   list_for_each_entry(hose, &hose_list, list_node)
+   pnv_pci_enable_bridge(hose->bus);
+}
+
 static void pnv_pci_ioda_fixup(void)
 {
pnv_pci_ioda_setup_PEs();
pnv_pci_ioda_setup_iommu_api();
pnv_pci_ioda_create_dbgfs();
 
+   pnv_pci_enable_bridges();
+
 #ifdef CONFIG_EEH
pnv_eeh_post_init();
 #endif



Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

2018-08-16 Thread Laurent Dufour
On 30/07/2018 16:22, Aneesh Kumar K.V wrote:
> Michael Ellerman  writes:
> 
>> Hi Laurent,
>>
>> Just one comment below.
>>
>> Laurent Dufour  writes:
>>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>>> index 96b8cd8a802d..41ed03245eb4 100644
>>> --- a/arch/powerpc/platforms/pseries/lpar.c
>>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>> BUG_ON(lpar_rc != H_SUCCESS);
>>>  }
>>>  
>>> +
>>> +/*
>>> + * As defined in the PAPR's section 14.5.4.1.8
>>> + * The control mask doesn't include the returned reference and change bit 
>>> from
>>> + * the processed PTE.
>>> + */
>>> +#define HBLKR_AVPN 0x0100UL
>>> +#define HBLKR_CTRL_MASK0xf800UL
>>> +#define HBLKR_CTRL_SUCCESS 0x8000UL
>>> +#define HBLKR_CTRL_ERRNOTFOUND 0x8800UL
>>> +#define HBLKR_CTRL_ERRBUSY 0xa000UL
>>> +
>>> +/**
>>> + * H_BLOCK_REMOVE caller.
>>> + * @idx should point to the latest @param entry set with a PTEX.
>>> + * If PTE cannot be processed because another CPUs has already locked that
>>> + * group, those entries are put back in @param starting at index 1.
>>> + * If entries has to be retried and @retry_busy is set to true, these 
>>> entries
>>> + * are retried until success. If @retry_busy is set to false, the returned
>>> + * is the number of entries yet to process.
>>> + */
>>> +static unsigned long call_block_remove(unsigned long idx, unsigned long 
>>> *param,
>>> +  bool retry_busy)
>>> +{
>>> +   unsigned long i, rc, new_idx;
>>> +   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>>> +
>>> +again:
>>> +   new_idx = 0;
>>> +   BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
>>
>> I count 1 ..
>>
>>> +   if (idx < PLPAR_HCALL9_BUFSIZE)
>>> +   param[idx] = HBR_END;
>>> +
>>> +   rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>>> + param[0], /* AVA */
>>> + param[1],  param[2],  param[3],  param[4], /* TS0-7 */
>>> + param[5],  param[6],  param[7],  param[8]);
>>> +   if (rc == H_SUCCESS)
>>> +   return 0;
>>> +
>>> +   BUG_ON(rc != H_PARTIAL);
>>
>> 2 ...
>>
>>> +   /* Check that the unprocessed entries were 'not found' or 'busy' */
>>> +   for (i = 0; i < idx-1; i++) {
>>> +   unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>>> +
>>> +   if (ctrl == HBLKR_CTRL_ERRBUSY) {
>>> +   param[++new_idx] = param[i+1];
>>> +   continue;
>>> +   }
>>> +
>>> +   BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>>> +  && ctrl != HBLKR_CTRL_ERRNOTFOUND);
>>
>> 3 ...
>>
>> BUG_ON()s.
>>
>> I know the code in this file is already pretty liberal with the use of
>> BUG_ON() but I'd prefer if we don't make it any worse.
>>
>> Given this is an optimisation it seems like we should be able to fall
>> back to the existing implementation in the case of error (which will
>> probably then BUG_ON() )
>>
>> If there's some reason we can't then I guess I can live with it.
> 
> It would be nice to log the error in case we are not expecting the
> error return. We recently did
> https://marc.info/?i=20180629083904.29250-1-aneesh.ku...@linux.ibm.com

I'm not sure that a failure during an invalidation should just result in an
error message being displayed because the page remains accessible and could
potentially be accessed later.
A comment in the caller hash__tlb_flush(), is quite explicit about that:
/* If there's a TLB batch pending, then we must flush it because the
 * pages are going to be freed and we really don't want to have a CPU
 * access a freed page because it has a stale TLB
 */

Getting an error when adding an entry may not be fatal but when removing one,
this could lead to data being exposed.

Laurent.



[PATCH v3] powerpc/tm: Remove msr_tm_active()

2018-08-16 Thread Breno Leitao
Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.

This function is not necessary, since MSR_TM_ACTIVE() does the same and
could be used instead, removing the duplication and simplifying the code.

This patch removes every instance of msr_tm_active() and replaces it
with MSR_TM_ACTIVE().

Signed-off-by: Breno Leitao 
---
 arch/powerpc/include/asm/reg.h |  7 ++-
 arch/powerpc/kernel/process.c  | 21 +
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 562568414cf4..58393bc6c964 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -116,11 +116,16 @@
 #define MSR_TS_S   __MASK(MSR_TS_S_LG) /*  Transaction Suspended */
 #define MSR_TS_T   __MASK(MSR_TS_T_LG) /*  Transaction Transactional */
 #define MSR_TS_MASK(MSR_TS_T | MSR_TS_S)   /* Transaction State bits */
-#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
 #define MSR_TM_RESV(x) (((x) & MSR_TS_MASK) == MSR_TS_MASK) /* Reserved */
 #define MSR_TM_TRANSACTIONAL(x)(((x) & MSR_TS_MASK) == MSR_TS_T)
 #define MSR_TM_SUSPENDED(x)(((x) & MSR_TS_MASK) == MSR_TS_S)
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
+#else
+#define MSR_TM_ACTIVE(x) 0
+#endif
+
 #if defined(CONFIG_PPC_BOOK3S_64)
 #define MSR_64BIT  MSR_SF
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9ef4aea9fffe..f18dba07ea16 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -102,24 +102,18 @@ static void check_if_tm_restore_required(struct task_struct *tsk)
}
 }
 
-static inline bool msr_tm_active(unsigned long msr)
-{
-   return MSR_TM_ACTIVE(msr);
-}
-
 static bool tm_active_with_fp(struct task_struct *tsk)
 {
-   return msr_tm_active(tsk->thread.regs->msr) &&
+   return MSR_TM_ACTIVE(tsk->thread.regs->msr) &&
(tsk->thread.ckpt_regs.msr & MSR_FP);
 }
 
 static bool tm_active_with_altivec(struct task_struct *tsk)
 {
-   return msr_tm_active(tsk->thread.regs->msr) &&
+   return MSR_TM_ACTIVE(tsk->thread.regs->msr) &&
(tsk->thread.ckpt_regs.msr & MSR_VEC);
 }
 #else
-static inline bool msr_tm_active(unsigned long msr) { return false; }
 static inline void check_if_tm_restore_required(struct task_struct *tsk) { }
 static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; }
 static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; }
@@ -247,7 +241,8 @@ void enable_kernel_fp(void)
 * giveup as this would save  to the 'live' structure not the
 * checkpointed structure.
 */
-   if(!msr_tm_active(cpumsr) && msr_tm_active(current->thread.regs->msr))
+   if (!MSR_TM_ACTIVE(cpumsr) &&
+       MSR_TM_ACTIVE(current->thread.regs->msr))
return;
__giveup_fpu(current);
}
@@ -311,7 +306,8 @@ void enable_kernel_altivec(void)
 * giveup as this would save  to the 'live' structure not the
 * checkpointed structure.
 */
-   if(!msr_tm_active(cpumsr) && msr_tm_active(current->thread.regs->msr))
+   if (!MSR_TM_ACTIVE(cpumsr) &&
+       MSR_TM_ACTIVE(current->thread.regs->msr))
return;
__giveup_altivec(current);
}
@@ -397,7 +393,8 @@ void enable_kernel_vsx(void)
 * giveup as this would save  to the 'live' structure not the
 * checkpointed structure.
 */
-   if(!msr_tm_active(cpumsr) && msr_tm_active(current->thread.regs->msr))
+   if (!MSR_TM_ACTIVE(cpumsr) &&
+       MSR_TM_ACTIVE(current->thread.regs->msr))
return;
__giveup_vsx(current);
}
@@ -530,7 +527,7 @@ void restore_math(struct pt_regs *regs)
 {
unsigned long msr;
 
-   if (!msr_tm_active(regs->msr) &&
+   if (!MSR_TM_ACTIVE(regs->msr) &&
!current->thread.load_fp && !loadvec(current->thread))
return;
 
-- 
2.16.3



Re: [PATCH v7 4/9] powerpc/pseries: Define MCE error event section.

2018-08-16 Thread Segher Boessenkool
Hi!

On Thu, Aug 16, 2018 at 02:14:39PM +1000, Michael Ellerman wrote:
> Mahesh Jagannath Salgaonkar  writes:
> > On 08/08/2018 08:12 PM, Michael Ellerman wrote:
> >>> + uint8_t reserved_1[6];
> >>> + __be64  effective_address;
> >>> + __be64  logical_address;
> >>> + } ue_error;
> >>> + struct {
> >>> + uint8_t soft_err_type;
> >>> + /* 
> >>> +  * X1: Effective address provided.
> >>> +  *  X   5: Reserved.
> >>> +  *   XX 2: Type of SLB/ERAT/TLB error.
> >>> +  */
> >>> + uint8_t reserved_1[6];
> >>> + __be64  effective_address;
> >>> + uint8_t reserved_2[8];
> >>> + } soft_error;
> >>> + } u;
> >>> +};
> >>> +#pragma pack(pop)
> >> 
> >> Why not __packed ?
> >
> > Because when used __packed it added 1 byte extra padding between
> > reserved_1[6] and effective_address. That caused wrong effective address
> > to be printed on the console. Hence I switched to #pragma pack to force
> > 1 byte alignment for this structure alone.
> 
> OK, that's weird.

Yes, if that is true, then please open a GCC bugzilla report.


Segher


Re: [PATCH v2 4/4] powerpc/tm: Do not recheckpoint non-tm task

2018-08-16 Thread Breno Leitao
Hey Mikey,

Thanks for the review.

On 08/15/2018 08:50 PM, Michael Neuling wrote:
> On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
>> If __switch_to() tries to context switch from task A to task B, and task A
>>  had task->thread->regs->msr[TM] enabled, then __switch_to_tm() will call
>> tm_recheckpoint_new_task(), which will call trecheckpoint, for task B, which
>>  is clearly wrong since task B might not be an active TM user.
>>
>> This does not cause a lot of damage because tm_recheckpoint() will abort
>> the call since it realize that the current task does not have msr[TM] bit
>> set.
>>
>> Signed-off-by: Breno Leitao 
>> ---
>>  arch/powerpc/kernel/process.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index f8beee03f00a..d26a150766ef 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1036,7 +1036,8 @@ static inline void __switch_to_tm(struct task_struct
>> *prev,
>>  prev->thread.regs->msr &= ~MSR_TM;
>>  }
>>  
>> -tm_recheckpoint_new_task(new);
>> +if (tm_enabled(new))
>> +tm_recheckpoint_new_task(new);
> 
> I'm not sure we need this patch as tm_recheckpoint_new_task() does this 
> itself.

My plan is to move this check prior to calling these TM functions, doing
early checking and avoiding calling tm_recheckpoint on a non-TM-enabled task.
It is very weird when you see, during tracing, a kernel thread (PF_KTHREAD)
being tm_recheckpointed. :-/

That said, the TM functions would then simply do the operation, rather than
operating in "check and do it" mode.

Ideally I would like to check the thread before calling any TM functions,
and warning (WARN_ON) if we detect, later in the game, that a thread is not
TM enabled.

This helps on two different fronts, in my opinion:

 * Code readability
 * Understanding tracing (ftrace) outputs.


Re: [PATCH v2 2/4] powerpc/tm: Fix HTM documentation

2018-08-16 Thread Michael Neuling
On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
> This patch simply fix part of the documentation on the HTM code.
> 
> This fixes reference to old fields that were renamed in commit
> 000ec280e3dd ("powerpc: tm: Rename transct_(*) to ck(\1)_state")
> 
> It also documents better the flow after commit eb5c3f1c8647 ("powerpc:
> Always save/restore checkpointed regs during treclaim/trecheckpoint"),
> where tm_recheckpoint can recheckpoint what is in ck{fp,vr}_state blindly.
> 
> Signed-off-by: Breno Leitao 

Acked-By: Michael Neuling 

Thanks

> ---
>  arch/powerpc/kernel/tm.S| 10 +-
>  arch/powerpc/kernel/traps.c | 15 +--
>  2 files changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
> index ff12f47a96b6..019d73053cd3 100644
> --- a/arch/powerpc/kernel/tm.S
> +++ b/arch/powerpc/kernel/tm.S
> @@ -95,9 +95,9 @@ EXPORT_SYMBOL_GPL(tm_abort);
>   *  uint8_t cause)
>   *
>   *   - Performs a full reclaim.  This destroys outstanding
> - * transactions and updates thread->regs.tm_ckpt_* with the
> - * original checkpointed state.  Note that thread->regs is
> - * unchanged.
> + * transactions and updates thread.ckpt_regs, thread.ckfp_state and
> + * thread.ckvr_state with the original checkpointed state.  Note that
> + * thread->regs is unchanged.
>   *
>   * Purpose is to both abort transactions of, and preserve the state of,
>   * a transactions at a context switch. We preserve/restore both sets of
> process
> @@ -260,7 +260,7 @@ _GLOBAL(tm_reclaim)
>  
>   /* Altivec (VEC/VMX/VR)*/
>   addir7, r3, THREAD_CKVRSTATE
> - SAVE_32VRS(0, r6, r7)   /* r6 scratch, r7 transact vr state */
> + SAVE_32VRS(0, r6, r7)   /* r6 scratch, r7 ckvr_state */
>   mfvscr  v0
>   li  r6, VRSTATE_VSCR
>   stvxv0, r7, r6
> @@ -271,7 +271,7 @@ _GLOBAL(tm_reclaim)
>  
>   /* Floating Point (FP) */
>   addir7, r3, THREAD_CKFPSTATE
> - SAVE_32FPRS_VSRS(0, R6, R7) /* r6 scratch, r7 transact fp state */
> + SAVE_32FPRS_VSRS(0, R6, R7) /* r6 scratch, r7 ckfp_state */
>   mffsfr0
>   stfdfr0,FPSTATE_FPSCR(r7)
>  
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 0e17dcb48720..6742b6b3eb37 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1719,16 +1719,19 @@ void fp_unavailable_tm(struct pt_regs *regs)
>   * checkpointed FP registers need to be loaded.
>*/
>   tm_reclaim_current(TM_CAUSE_FAC_UNAV);
> - /* Reclaim didn't save out any FPRs to transact_fprs. */
> +
> + /* Reclaim initially saved out bogus (lazy) FPRs to ckfp_state, and
> +  * then it was overwrite by the thr->fp_state by tm_reclaim_thread().
> +  *
> +  * At this point, ck{fp,vr}_state contains the exact values we want to
> +  * recheckpoint.
> +  */
>  
>   /* Enable FP for the task: */
>   current->thread.load_fp = 1;
>  
> - /* This loads and recheckpoints the FP registers from
> -  * thread.fpr[].  They will remain in registers after the
> -  * checkpoint so we don't need to reload them after.
> -  * If VMX is in use, the VRs now hold checkpointed values,
> -  * so we don't want to load the VRs from the thread_struct.
> + /*
> +  * Recheckpoint all the checkpointed ckpt, ck{fp, vr}_state registers.
>*/
>   tm_recheckpoint(&current->thread);
>  }


Re: [PATCH v2 4/4] powerpc/tm: Do not recheckpoint non-tm task

2018-08-16 Thread Michael Neuling
On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
> If __switch_to() tries to context switch from task A to task B, and task A
>  had task->thread->regs->msr[TM] enabled, then __switch_to_tm() will call
> tm_recheckpoint_new_task(), which will call trecheckpoint, for task B, which
>  is clearly wrong since task B might not be an active TM user.
> 
> This does not cause a lot of damage because tm_recheckpoint() will abort
> the call since it realize that the current task does not have msr[TM] bit
> set.
> 
> Signed-off-by: Breno Leitao 
> ---
>  arch/powerpc/kernel/process.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index f8beee03f00a..d26a150766ef 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1036,7 +1036,8 @@ static inline void __switch_to_tm(struct task_struct
> *prev,
>   prev->thread.regs->msr &= ~MSR_TM;
>   }
>  
> - tm_recheckpoint_new_task(new);
> + if (tm_enabled(new))
> + tm_recheckpoint_new_task(new);

I'm not sure we need this patch as tm_recheckpoint_new_task() does this itself.

---
static inline void tm_recheckpoint_new_task(struct task_struct *new)
{
if (!cpu_has_feature(CPU_FTR_TM))
return;

/* Recheckpoint the registers of the thread we're about to switch to.
 *
 * If the task was using FP, we non-lazily reload both the original and
 * the speculative FP register states.  This is because the kernel
 * doesn't see if/when a TM rollback occurs, so if we take an FP
 * unavailable later, we are unable to determine which set of FP regs
 * need to be restored.
 */
if (!tm_enabled(new))
return;


---
Mikey


Re: [PATCH v2 3/4] powerpc/tm: Adjust tm_reclaim_thread() parameters

2018-08-16 Thread Michael Neuling
On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
> From: Cyril Bur 
> 
> tm_reclaim_thread() doesn't use the parameter anymore, both callers have
> to bother getting it as they have no need for a struct thread_info
> either.
> 
> It was previously used but became unused in commit
> dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live
> registers")
> 
> Just remove it and adjust the callers.
> 
> Signed-off-by: Cyril Bur 
> Signed-off-by: Breno Leitao 

Acked-by: Michael Neuling 

> ---
>  arch/powerpc/kernel/process.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 9ef4aea9fffe..f8beee03f00a 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -866,8 +866,7 @@ static inline bool tm_enabled(struct task_struct *tsk)
>   return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
>  }
>  
> -static void tm_reclaim_thread(struct thread_struct *thr,
> -   struct thread_info *ti, uint8_t cause)
> +static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
>  {
>   /*
>* Use the current MSR TM suspended bit to track if we have
> @@ -914,7 +913,7 @@ static void tm_reclaim_thread(struct thread_struct *thr,
>  void tm_reclaim_current(uint8_t cause)
>  {
>   tm_enable();
> - tm_reclaim_thread(&current->thread, current_thread_info(), cause);
> + tm_reclaim_thread(&current->thread, cause);
>  }
>  
>  static inline void tm_reclaim_task(struct task_struct *tsk)
> @@ -945,7 +944,7 @@ static inline void tm_reclaim_task(struct task_struct
> *tsk)
>thr->regs->ccr, thr->regs->msr,
>thr->regs->trap);
>  
> - tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED);
> + tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
>  
>   TM_DEBUG("--- tm_reclaim on pid %d complete\n",
>tsk->pid);


Re: [PATCH v2 1/4] powerpc/tm: Remove msr_tm_active()

2018-08-16 Thread Michael Neuling
On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
> Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
> CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
> returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.
> 
> This function is not necessary, since MSR_TM_ACTIVE() does the same
> thing: it checks the TS bits and does not require any TM facility.
> 
> This patch removes every instance of msr_tm_active() and replaces it
> with MSR_TM_ACTIVE().
> 
> Signed-off-by: Breno Leitao 
> 

Patch looks good... one minor nit below...

>  
> - if (!msr_tm_active(regs->msr) &&
> - !current->thread.load_fp && !loadvec(current->thread))
> + if (!current->thread.load_fp && !loadvec(current->thread)) {
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> + if (!MSR_TM_ACTIVE(regs->msr))
> + return;

Can you make MSR_TM_ACTIVE() return false when CONFIG_PPC_TRANSACTIONAL_MEM
is not set? Then you don't need this inline #ifdef.

Mikey

> +#else
>   return;
> +#endif
> + }
>  
>   msr = regs->msr;
>   msr_check_and_set(msr_all_available);


Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

2018-08-16 Thread Laurent Dufour
On 30/07/2018 15:47, Michael Ellerman wrote:
> Hi Laurent,
> 
> Just one comment below.
> 
> Laurent Dufour  writes:
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
>> b/arch/powerpc/platforms/pseries/lpar.c
>> index 96b8cd8a802d..41ed03245eb4 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long 
>> slot, unsigned long vpn,
>>  BUG_ON(lpar_rc != H_SUCCESS);
>>  }
>>  
>> +
>> +/*
>> + * As defined in the PAPR's section 14.5.4.1.8
>> + * The control mask doesn't include the returned reference and change bit
>> + * from the processed PTE.
>> + */
>> +#define HBLKR_AVPN		0x0100000000000000UL
>> +#define HBLKR_CTRL_MASK		0xf800000000000000UL
>> +#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
>> +#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
>> +#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
>> +
>> +/**
>> + * H_BLOCK_REMOVE caller.
>> + * @idx should point to the latest @param entry set with a PTEX.
>> + * If a PTE cannot be processed because another CPU has already locked that
>> + * group, those entries are put back in @param starting at index 1.
>> + * If entries have to be retried and @retry_busy is set to true, these
>> + * entries are retried until success. If @retry_busy is set to false, the
>> + * returned value is the number of entries yet to be processed.
>> + */
>> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
>> +				       bool retry_busy)
>> +{
>> +	unsigned long i, rc, new_idx;
>> +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>> +
>> +again:
>> +	new_idx = 0;
>> +	BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
> 
> I count 1 ..
> 
>> +	if (idx < PLPAR_HCALL9_BUFSIZE)
>> +		param[idx] = HBR_END;
>> +
>> +	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>> +			  param[0], /* AVA */
>> +			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
>> +			  param[5],  param[6],  param[7],  param[8]);
>> +	if (rc == H_SUCCESS)
>> +		return 0;
>> +
>> +	BUG_ON(rc != H_PARTIAL);
> 
> 2 ...
> 
>> +	/* Check that the unprocessed entries were 'not found' or 'busy' */
>> +	for (i = 0; i < idx-1; i++) {
>> +		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>> +
>> +		if (ctrl == HBLKR_CTRL_ERRBUSY) {
>> +			param[++new_idx] = param[i+1];
>> +			continue;
>> +		}
>> +
>> +		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>> +		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);
> 
> 3 ...
> 
> BUG_ON()s.
> 
> I know the code in this file is already pretty liberal with the use of
> BUG_ON() but I'd prefer if we don't make it any worse.

The first one is clearly not required. But I would keep the following two,
because this call is not expected to fail unless there is a discrepancy
between the Linux kernel's view of the hash table and the hypervisor's, which
could have dramatic consequences.

> 
> Given this is an optimisation it seems like we should be able to fall
> back to the existing implementation in the case of error (which will
> probably then BUG_ON() )

I don't think falling back to the H_BULK call will be helpful, since it does
the same thing, so the same errors are expected. Furthermore, this hcall can
do partial work, which means complex fallback code on H_BULK, as we would
have to identify the already-processed entries.

> If there's some reason we can't then I guess I can live with it.

I'm proposing to send a new series with _only_ 2 calls to BUG_ON().

Furthermore, this patch does not handle huge pages correctly. I was in too
much of a hurry when I pushed it last time.

Cheers,
Laurent.



ptrace compile failure with gcc-8.2 on 32-bit powerpc

2018-08-16 Thread Meelis Roos
After upgrading my distro compiler to gcc-8.2, Linux fails to compile on 
32-bit powerpc (tested with 4.17, 4.18 and v4.18-7873-gf91e654474d4).


  CC  arch/powerpc/kernel/ptrace.o
In file included from ./include/linux/bitmap.h:9,
 from ./include/linux/cpumask.h:12,
 from ./include/linux/rcupdate.h:44,
 from ./include/linux/rculist.h:11,
 from ./include/linux/pid.h:5,
 from ./include/linux/sched.h:14,
 from arch/powerpc/kernel/ptrace.c:19:
In function ‘memcpy’,
inlined from ‘user_regset_copyin’ at ./include/linux/regset.h:295:4,
inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:619:9:
./include/linux/string.h:345:9: error: ‘__builtin_memcpy’ offset [-527, -529] 
is out of the bounds [0, 16] of object ‘vrsave’ with type ‘union <anonymous>’ 
[-Werror=array-bounds]
  return __builtin_memcpy(p, q, size);
 ^~~~
arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
arch/powerpc/kernel/ptrace.c:614:5: note: ‘vrsave’ declared here
   } vrsave;
 ^~
In file included from ./include/linux/bitmap.h:9,
 from ./include/linux/cpumask.h:12,
 from ./include/linux/rcupdate.h:44,
 from ./include/linux/rculist.h:11,
 from ./include/linux/pid.h:5,
 from ./include/linux/sched.h:14,
 from arch/powerpc/kernel/ptrace.c:19:
In function ‘memcpy’,
inlined from ‘user_regset_copyout’ at ./include/linux/regset.h:270:4,
inlined from ‘vr_get’ at arch/powerpc/kernel/ptrace.c:572:9:
./include/linux/string.h:345:9: error: ‘__builtin_memcpy’ offset [-527, -529] 
is out of the bounds [0, 16] of object ‘vrsave’ with type ‘union <anonymous>’ 
[-Werror=array-bounds]
  return __builtin_memcpy(p, q, size);
 ^~~~
arch/powerpc/kernel/ptrace.c: In function ‘vr_get’:
arch/powerpc/kernel/ptrace.c:567:5: note: ‘vrsave’ declared here
   } vrsave;
 ^~
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:311: arch/powerpc/kernel/ptrace.o] Error 1


-- 
Meelis Roos (mr...@linux.ee)