Re: [PATCH 3/6] powerpc/64s: Use htab_convert_pte_flags() in hash__mark_rodata_ro()

2021-02-15 Thread Daniel Axtens
Hi Michael,

> In hash__mark_rodata_ro() we pass the raw PP_RXXX value to
> hash__change_memory_range(). That has the effect of setting the key to
> zero, because PP_RXXX contains no key value.
>
> Fix it by using htab_convert_pte_flags(), which knows how to convert a
> pgprot into a pp value, including the key.

So far as I can tell by chasing the definitions around, this appears
to do what it claims to do.
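
To convince myself, I found it easiest to model the bug: the pp value
is a composite of protection bits and key bits, so passing a bare
PP_RXXX leaves the key field at zero. A toy illustration (invented bit
positions, not the real HPTE layout from mmu-hash.h):

	/* Toy model only: bit positions are illustrative. */
	#include <stdio.h>

	#define TOY_PP_MASK   0x7UL                      /* protection bits */
	#define TOY_KEY_SHIFT 3
	#define TOY_KEY_MASK  (0x1fUL << TOY_KEY_SHIFT)  /* 5-bit memory key */

	static unsigned long toy_convert(unsigned long prot, unsigned int key)
	{
		return (prot & TOY_PP_MASK) |
		       (((unsigned long)key << TOY_KEY_SHIFT) & TOY_KEY_MASK);
	}

	int main(void)
	{
		unsigned long bare = 0x2UL;                 /* "PP_RXXX" alone */
		unsigned long conv = toy_convert(0x2UL, 3); /* with kernel key 3 */

		/* the bare pp carries key 0; the converted pp carries key 3 */
		printf("bare key=%lu, converted key=%lu\n",
		       (bare & TOY_KEY_MASK) >> TOY_KEY_SHIFT,
		       (conv & TOY_KEY_MASK) >> TOY_KEY_SHIFT);
		return 0;
	}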

So, for what it's worth:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>
> Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation")
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/mm/book3s64/hash_pgtable.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> index 567e0c6b3978..03819c259f0a 100644
> --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> @@ -428,12 +428,14 @@ static bool hash__change_memory_range(unsigned long start, unsigned long end,
>  
>  void hash__mark_rodata_ro(void)
>  {
> - unsigned long start, end;
> + unsigned long start, end, pp;
>  
>   start = (unsigned long)_stext;
>   end = (unsigned long)__init_begin;
>  
> - WARN_ON(!hash__change_memory_range(start, end, PP_RXXX));
> + pp = htab_convert_pte_flags(pgprot_val(PAGE_KERNEL_ROX), HPTE_USE_KERNEL_KEY);
> +
> + WARN_ON(!hash__change_memory_range(start, end, pp));
>  }
>  
>  void hash__mark_initmem_nx(void)
> -- 
> 2.25.1


Re: [PATCH 2/6] powerpc/pseries: Add key to flags in pSeries_lpar_hpte_updateboltedpp()

2021-02-15 Thread Daniel Axtens
Michael Ellerman  writes:

> The flags argument to plpar_pte_protect() (aka. H_PROTECT), includes
> the key in bits 9-13, but currently we always set those bits to zero.
>
> In the past that hasn't been a problem because we always used key 0
> for the kernel, and updateboltedpp() is only used for kernel mappings.
>
> However since commit d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3
> for kernel mapping with hash translation") we are now inadvertently
> changing the key (to zero) when we call plpar_pte_protect().
>
> That hasn't broken anything because updateboltedpp() is only used for
> STRICT_KERNEL_RWX, which is currently disabled on 64s due to other
> bugs.
>
> But we want to fix that, so first we need to pass the key correctly to
> plpar_pte_protect(). In the `newpp` value the low 3 bits of the key
> are already in the correct spot, but the high 2 bits of the key need
> to be shifted down.
>
> Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation")
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/platforms/pseries/lpar.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 764170fdb0f7..8bbbddff7226 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -976,11 +976,13 @@ static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
>   slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
>   BUG_ON(slot == -1);
>  
> - flags = newpp & 7;
> + flags = newpp & (HPTE_R_PP | HPTE_R_N);
>   if (mmu_has_feature(MMU_FTR_KERNEL_RO))
>   /* Move pp0 into bit 8 (IBM 55) */
>   flags |= (newpp & HPTE_R_PP0) >> 55;
>  
> + flags |= ((newpp & HPTE_R_KEY_HI) >> 48) | (newpp & HPTE_R_KEY_LO);
> +

I'm really confused about how these bits are getting packed into the
flags parameter. It seems to match how they are unpacked in
kvmppc_h_pr_protect, but I cannot figure out why they are packed in that
order, and the LoPAR doesn't seem especially illuminating on this topic
- although I may have missed the relevant section.
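
The mechanics at least seem to work out when I expand the constants
from mmu-hash.h (my reading, so treat this as a sketch):

	/* HPTE_R_KEY_LO is 0x0e00: key bits 0-2 already sit at flag
	 * bits 9-11, so they are OR-ed in unshifted.
	 * HPTE_R_KEY_HI is 0x3000000000000000: key bits 3-4 live at
	 * bits 60-61 of newpp, and >> 48 moves them down to flag bits
	 * 12-13. Together that fills the key field in bits 9-13. */
	flags |= ((newpp & HPTE_R_KEY_HI) >> 48) | (newpp & HPTE_R_KEY_LO);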

Kind regards,
Daniel

>   lpar_rc = plpar_pte_protect(flags, slot, 0);
>  
>   BUG_ON(lpar_rc != H_SUCCESS);
> -- 
> 2.25.1


[PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation

2021-02-15 Thread Alexey Kardashevskiy
Most platforms allocate IOMMU table structures (specifically it_map)
at boot time, and when this fails it is a valid reason for panic().

However the powernv platform allocates it_map after a device is returned
to the host OS after being passed through, and this happens long after
the host OS booted. It is quite possible to trigger the it_map allocation
panic() and kill the host even though it is not necessary - the host OS
can still use the DMA bypass mode (which requires a tiny fraction of
it_map's memory) and even if that fails, the host OS is runnable as it
was without the device for which allocating it_map caused the panic.

Instead of immediately crashing on a powernv/ioda2 system, this prints
an error and continues. All other platforms still call panic().
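
The new contract, roughly, is that iommu_init_table() returns NULL on
allocation failure and each caller decides whether that is fatal (a
sketch of the pattern the diff below applies):

	/* Boot-time platforms keep the old fatal behaviour themselves... */
	if (!iommu_init_table(tbl, nid, 0, 0))
		panic("Failed to initialize iommu table");

	/* ...while powernv/ioda2 treats the failure as a recoverable
	 * -ENOMEM and can still fall back to DMA bypass mode. */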

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/iommu.c   |  6 --
 arch/powerpc/platforms/cell/iommu.c   |  3 ++-
 arch/powerpc/platforms/pasemi/iommu.c |  4 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 15 ---
 arch/powerpc/platforms/pseries/iommu.c| 10 +++---
 arch/powerpc/sysdev/dart_iommu.c  |  3 ++-
 6 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 8eb6eb0afa97..c1a5c366a664 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -728,8 +728,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
 
tbl->it_map = vzalloc_node(sz, nid);
-   if (!tbl->it_map)
-   panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
+   if (!tbl->it_map) {
+   pr_err("%s: Can't allocate %ld bytes\n", __func__, sz);
+   return NULL;
+   }
 
iommu_table_reserve_pages(tbl, res_start, res_end);
 
diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 2124831cf57c..fa08699aedeb 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -486,7 +486,8 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
window->table.it_size = size >> window->table.it_page_shift;
window->table.it_ops = &cell_iommu_ops;
 
-   iommu_init_table(&window->table, iommu->nid, 0, 0);
+   if (!iommu_init_table(&window->table, iommu->nid, 0, 0))
+   panic("Failed to initialize iommu table");
 
pr_debug("\tioid  %d\n", window->ioid);
pr_debug("\tblocksize %ld\n", window->table.it_blocksize);
diff --git a/arch/powerpc/platforms/pasemi/iommu.c b/arch/powerpc/platforms/pasemi/iommu.c
index b500a6e47e6b..5be7242fbd86 100644
--- a/arch/powerpc/platforms/pasemi/iommu.c
+++ b/arch/powerpc/platforms/pasemi/iommu.c
@@ -146,7 +146,9 @@ static void iommu_table_iobmap_setup(void)
 */
iommu_table_iobmap.it_blocksize = 4;
iommu_table_iobmap.it_ops = &iommu_table_iobmap_ops;
-   iommu_init_table(&iommu_table_iobmap, 0, 0, 0);
+   if (!iommu_init_table(&iommu_table_iobmap, 0, 0, 0))
+   panic("Failed to initialize iommu table");
+
pr_debug(" <- %s\n", __func__);
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f0f901683a2f..66c3c3337334 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1762,7 +1762,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
tbl->it_ops = &pnv_ioda1_iommu_ops;
pe->table_group.tce32_start = tbl->it_offset << tbl->it_page_shift;
pe->table_group.tce32_size = tbl->it_size << tbl->it_page_shift;
-   iommu_init_table(tbl, phb->hose->node, 0, 0);
+   if (!iommu_init_table(tbl, phb->hose->node, 0, 0))
+   panic("Failed to initialize iommu table");
 
pe->dma_setup_done = true;
return;
@@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
}
-   iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
 
-   rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+   if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
+   rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
+   else
+   rc = -ENOMEM;
if (rc) {
-   pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
-   rc);
+   pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
iommu_tce_table_put(tbl);
-   return rc;
+   tbl = NULL; /* This clears iommu_table_base below */
}
-
if (!pnv_iommu_bypass_disabled)
pnv_pci_ioda2_set_bypass(pe, true);

[PATCH kernel 0/2] powerpc/iommu: Stop crashing the host when VM is terminated

2021-02-15 Thread Alexey Kardashevskiy
Killing a VM on a host under memory pressure can kill the host, which is
annoying. 1/2 reduces the chances of that happening, 2/2 eliminates the
panic() on ioda2.


This is based on sha1
f40ddce88593 Linus Torvalds "Linux 5.11".

Please comment. Thanks.



Alexey Kardashevskiy (2):
  powerpc/iommu: Allocate it_map by vmalloc
  powerpc/iommu: Do not immediately panic when failed IOMMU table
allocation

 arch/powerpc/kernel/iommu.c   | 19 ++-
 arch/powerpc/platforms/cell/iommu.c   |  3 ++-
 arch/powerpc/platforms/pasemi/iommu.c |  4 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 15 ---
 arch/powerpc/platforms/pseries/iommu.c| 10 +++---
 arch/powerpc/sysdev/dart_iommu.c  |  3 ++-
 6 files changed, 28 insertions(+), 26 deletions(-)

-- 
2.17.1



[PATCH kernel 1/2] powerpc/iommu: Allocate it_map by vmalloc

2021-02-15 Thread Alexey Kardashevskiy
The IOMMU table uses the it_map bitmap to keep track of allocated DMA
pages. This has always been a contiguous array, allocated either at
boot time or when a passed-through device is returned to the host OS.
The it_map memory is allocated by alloc_pages(), which allocates
contiguous physical memory.

Such an allocation method occasionally creates a problem when there is
no big chunk of memory available (no free memory or memory that is too
fragmented). On powernv/ioda2 the default DMA window requires 16MB for
it_map.

This replaces alloc_pages_node() with vzalloc_node(), which allocates a
contiguous block, but in virtual memory. This should reduce the chances
of failure but should not cause other behavioural changes, as it_map is
only used by the kernel's DMA hooks/API when the MMU is on.
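
For a sense of scale, it_map is sized at one bit per TCE entry, so
(a back-of-the-envelope sketch, numbers illustrative):

	/* One it_map bit per TCE entry. */
	it_size = window_size >> it_page_shift;               /* # of TCEs */
	sz = BITS_TO_LONGS(it_size) * sizeof(unsigned long);  /* bytes */

	/* A large default window with small IOMMU pages pushes sz into
	 * the megabytes - a high-order, physically contiguous
	 * alloc_pages() request that can fail on a fragmented host.
	 * vzalloc_node() only needs virtually contiguous memory, so
	 * order-0 pages suffice. */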

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/iommu.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -719,7 +719,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 {
unsigned long sz;
static int welcomed = 0;
-   struct page *page;
unsigned int i;
struct iommu_pool *p;
 
@@ -728,11 +727,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
/* number of bytes needed for the bitmap */
sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
 
-   page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
-   if (!page)
+   tbl->it_map = vzalloc_node(sz, nid);
+   if (!tbl->it_map)
panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
-   tbl->it_map = page_address(page);
-   memset(tbl->it_map, 0, sz);
 
iommu_table_reserve_pages(tbl, res_start, res_end);
 
@@ -774,8 +771,6 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 
 static void iommu_table_free(struct kref *kref)
 {
-   unsigned long bitmap_sz;
-   unsigned int order;
struct iommu_table *tbl;
 
tbl = container_of(kref, struct iommu_table, it_kref);
@@ -796,12 +791,8 @@ static void iommu_table_free(struct kref *kref)
if (!bitmap_empty(tbl->it_map, tbl->it_size))
pr_warn("%s: Unexpected TCEs\n", __func__);
 
-   /* calculate bitmap size in bytes */
-   bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
-
/* free bitmap */
-   order = get_order(bitmap_sz);
-   free_pages((unsigned long) tbl->it_map, order);
+   vfree(tbl->it_map);
 
/* free table */
kfree(tbl);
-- 
2.17.1



[PATCH kernel] powerpc/iommu: Annotate nested lock for lockdep

2021-02-15 Thread Alexey Kardashevskiy
The IOMMU table is divided into pools for concurrent mappings and each
pool has a separate spinlock. When taking ownership of an IOMMU group
to pass a device through to a VM, we lock all of these spinlocks, which
triggers a false positive warning in lockdep (below).

This fixes it by annotating the large pool's spinlock as a nest lock.
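
For reference, the nest-lock pattern has this general shape (a generic
sketch, not the exact code from the patch below):

	/* The outer lock serializes all takers of the whole set, so
	 * lockdep can treat the same-class inner locks as nested under
	 * it rather than as recursive locking. */
	spin_lock_irqsave(&outer->lock, flags);
	for (i = 0; i < n; i++)
		spin_lock_nest_lock(&inner[i].lock, &outer->lock);
	/* ... operate on all pools ... */
	for (i = 0; i < n; i++)
		spin_unlock(&inner[i].lock);
	spin_unlock_irqrestore(&outer->lock, flags);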

===
WARNING: possible recursive locking detected
5.11.0-le_syzkaller_a+fstn1 #100 Not tainted

qemu-system-ppc/4129 is trying to acquire lock:
c000119bddb0 (&(p->lock)/1){}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

but task is already holding lock:
c000119bdd30 (&(p->lock)/1){}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(&(p->lock)/1);
  lock(&(p->lock)/1);
===

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 557a09dd5b2f..2ee642a6731a 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1089,7 +1089,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
 
spin_lock_irqsave(&tbl->large_pool.lock, flags);
for (i = 0; i < tbl->nr_pools; i++)
-   spin_lock(&tbl->pools[i].lock);
+   spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
 
iommu_table_release_pages(tbl);
 
-- 
2.17.1



Re: [PATCH kernel] powerpc/perf: Stop crashing with generic_compat_pmu

2021-02-15 Thread Alexey Kardashevskiy

On 03/12/2020 16:27, Madhavan Srinivasan wrote:


On 12/2/20 8:31 AM, Alexey Kardashevskiy wrote:

Hi Maddy,

I just noticed that I still have "powerpc/perf: Add checks for 
reserved values" in my pile (pushed here 
https://github.com/aik/linux/commit/61e1bc3f2e19d450e2e2d39174d422160b21957b 
), do we still need it? The lockups I saw were fixed by 
https://github.com/aik/linux/commit/17899eaf88d689 but it is hardly a 
replacement. Thanks,


Sorry, missed this. Will look at this again, since we will need
generation-specific checks for the reserved field.

So any luck with this? Cheers,

Maddy

On 04/06/2020 02:34, Madhavan Srinivasan wrote:

On 6/2/20 8:26 AM, Alexey Kardashevskiy wrote:
The bhrb_filter_map ("The Branch History Rolling Buffer") callback is
only defined in raw CPUs' power_pmu structs. The "architected" CPUs use
generic_compat_pmu, which does not have this callback, so crashes occur.

This adds a NULL pointer check for bhrb_filter_map() which behaves as if
the callback returned an error.

This does not add the same check for config_bhrb(), as the only caller
checks cpuhw->bhrb_users, which remains zero if bhrb_filter_map==0.


Changes look fine.
Reviewed-by: Madhavan Srinivasan 

The commit be80e758d0c2e ('powerpc/perf: Add generic compat mode pmu
driver') which introduced generic_compat_pmu was merged in v5.2. So we
need to CC stable starting from 5.2 :( . My bad, sorry.

Maddy


Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/perf/core-book3s.c | 19 ++-
  1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf858f3..36870569bf9c 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1515,9 +1515,16 @@ static int power_pmu_add(struct perf_event *event, int ef_flags)

  ret = 0;
   out:
  if (has_branch_stack(event)) {
-    power_pmu_bhrb_enable(event);
-    cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
-    event->attr.branch_sample_type);
+    u64 bhrb_filter = -1;
+
+    if (ppmu->bhrb_filter_map)
+    bhrb_filter = ppmu->bhrb_filter_map(
+    event->attr.branch_sample_type);
+
+    if (bhrb_filter != -1) {
+    cpuhw->bhrb_filter = bhrb_filter;
+    power_pmu_bhrb_enable(event); /* Does bhrb_users++ */
+    }
  }

  perf_pmu_enable(event->pmu);
@@ -1839,7 +1846,6 @@ static int power_pmu_event_init(struct perf_event *event)

  int n;
  int err;
  struct cpu_hw_events *cpuhw;
-    u64 bhrb_filter;

  if (!ppmu)
  return -ENOENT;
@@ -1945,7 +1951,10 @@ static int power_pmu_event_init(struct perf_event *event)

  err = power_check_constraints(cpuhw, events, cflags, n + 1);

  if (has_branch_stack(event)) {
-    bhrb_filter = ppmu->bhrb_filter_map(
+    u64 bhrb_filter = -1;
+
+    if (ppmu->bhrb_filter_map)
+    bhrb_filter = ppmu->bhrb_filter_map(
  event->attr.branch_sample_type);

  if (bhrb_filter == -1) {

--
Alexey


Re: [PATCH v18 03/11] of: Add a common kexec FDT setup function

2021-02-15 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> From: Rob Herring 
>
> Both arm64 and powerpc do essentially the same FDT /chosen setup for
> kexec.  The differences are either omissions that arm64 should have
> or additional properties that will be ignored.  The setup code can be
> combined and shared by both powerpc and arm64.
>
> The differences relative to the arm64 version:
>  - If /chosen doesn't exist, it will be created (should never happen).
>  - Any old dtb and initrd reserved memory will be released.
>  - The new initrd and elfcorehdr are marked reserved.
>  - "linux,booted-from-kexec" is set.
>
> The differences relative to the powerpc version:
>  - "kaslr-seed" and "rng-seed" may be set.
>  - "linux,elfcorehdr" is set.
>  - Any existing "linux,usable-memory-range" is removed.
>
> Combine the code for setting up the /chosen node in the FDT and updating
> the memory reservation for kexec, for powerpc and arm64, in
> of_kexec_alloc_and_setup_fdt() and move it to "drivers/of/kexec.c".
>
> Signed-off-by: Rob Herring 
> Signed-off-by: Lakshmi Ramasubramanian 
> ---
>  drivers/of/Makefile |   6 +
>  drivers/of/kexec.c  | 265 
>  include/linux/of.h  |   5 +
>  3 files changed, 276 insertions(+)
>  create mode 100644 drivers/of/kexec.c

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v18 06/11] powerpc: Move ima buffer fields to struct kimage

2021-02-15 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> The fields ima_buffer_addr and ima_buffer_size in "struct kimage_arch"
> for powerpc are used to carry forward the IMA measurement list across
> kexec system call.  These fields are not architecture specific, but are
> currently limited to powerpc.
>
> arch_ima_add_kexec_buffer() defined in "arch/powerpc/kexec/ima.c"
> sets ima_buffer_addr and ima_buffer_size for the kexec system call.
> This function does not have architecture specific code, but is
> currently limited to powerpc.
>
> Move ima_buffer_addr and ima_buffer_size to "struct kimage".
> Set ima_buffer_addr and ima_buffer_size in ima_add_kexec_buffer()
> in security/integrity/ima/ima_kexec.c.
>
> Co-developed-by: Prakhar Srivastava 
> Signed-off-by: Prakhar Srivastava 
> Signed-off-by: Lakshmi Ramasubramanian 
> Suggested-by: Will Deacon 
> ---
>  arch/powerpc/include/asm/ima.h |  3 ---
>  arch/powerpc/include/asm/kexec.h   |  5 -
>  arch/powerpc/kexec/ima.c   | 29 ++---
>  include/linux/kexec.h  |  3 +++
>  security/integrity/ima/ima_kexec.c |  8 ++--
>  5 files changed, 11 insertions(+), 37 deletions(-)

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v18 05/11] powerpc: Use common of_kexec_alloc_and_setup_fdt()

2021-02-15 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> From: Rob Herring 
>
> The code for setting up the /chosen node in the device tree
> and updating the memory reservation for the next kernel has been
> moved to of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c".
>
> Use the common of_kexec_alloc_and_setup_fdt() to setup the device tree
> and update the memory reservation for kexec for powerpc.
>
> Signed-off-by: Rob Herring 
> Signed-off-by: Lakshmi Ramasubramanian 
> ---
>  arch/powerpc/include/asm/kexec.h  |   1 +
>  arch/powerpc/kexec/elf_64.c   |  30 ---
>  arch/powerpc/kexec/file_load.c| 132 +-
>  arch/powerpc/kexec/file_load_64.c |   3 +
>  4 files changed, 26 insertions(+), 140 deletions(-)

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v18 04/11] arm64: Use common of_kexec_alloc_and_setup_fdt()

2021-02-15 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> From: Rob Herring 
>
> The code for setting up the /chosen node in the device tree
> and updating the memory reservation for the next kernel has been
> moved to of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c".
>
> Use the common of_kexec_alloc_and_setup_fdt() to setup the device tree
> and update the memory reservation for kexec for arm64.
>
> Signed-off-by: Rob Herring 
> Signed-off-by: Lakshmi Ramasubramanian 
> ---
>  arch/arm64/kernel/machine_kexec_file.c | 180 ++---
>  1 file changed, 8 insertions(+), 172 deletions(-)

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v18 01/11] powerpc: Rename kexec elfcorehdr_addr to elf_load_addr

2021-02-15 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> From: Rob Herring 
>
> The architecture specific field, elfcorehdr_addr in struct kimage_arch,
> that holds the address of the buffer in memory for ELF core header for
> powerpc has a different name than the one used for x86_64.  This makes
> it hard to have a common code for setting up the device tree for
> kexec system call.
>
> Rename elfcorehdr_addr to elf_load_addr to align with x86_64 name so
> common code can use it.
>
> Signed-off-by: Rob Herring 
> Reviewed-by: Lakshmi Ramasubramanian 
> ---
>  arch/powerpc/include/asm/kexec.h  | 2 +-
>  arch/powerpc/kexec/file_load.c| 4 ++--
>  arch/powerpc/kexec/file_load_64.c | 4 ++--
>  3 files changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v18 02/11] arm64: Rename kexec elf_headers_mem to elf_load_addr

2021-02-15 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> The architecture specific field, elf_headers_mem in struct kimage_arch,
> that holds the address of the buffer in memory for ELF core header for
> arm64 has a different name than the one used for powerpc.  This makes
> it hard to have a common code for setting up the device tree for
> kexec system call.
>
> Rename elf_headers_mem to elf_load_addr to align with powerpc name so
> common code can use it.
>
> Signed-off-by: Lakshmi Ramasubramanian 
> Suggested-by: Thiago Jung Bauermann 
> ---
>  arch/arm64/include/asm/kexec.h | 2 +-
>  arch/arm64/kernel/machine_kexec_file.c | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH 1/4] add generic builtin command line

2021-02-15 Thread Daniel Gimpelevich
On Thu, 2019-03-21 at 15:15 -0700, Andrew Morton wrote:
> On Thu, 21 Mar 2019 08:13:08 -0700 Daniel Walker  wrote:
> > On Wed, Mar 20, 2019 at 08:14:33PM -0700, Andrew Morton wrote:
> > > The patches (or some version of them) are already in linux-next,
> > > which messes me up.  I'll disable them for now.
> >  
> > Those are from my tree, but I removed them when you picked up the series.
> > The next linux-next should not have them.
> 
> Yup, thanks, all looks good now.

This patchset is currently neither in mainline nor in -next. May I ask
what happened to it? Thanks.



Re: [PATCH 1/4] ibmvfc: simplify handling of sub-CRQ initialization

2021-02-15 Thread Brian King
Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [PATCH for 5.10] powerpc/32: Preserve cr1 in exception prolog stack check to fix build error

2021-02-15 Thread Greg KH
On Fri, Feb 12, 2021 at 08:57:14AM +, Christophe Leroy wrote:
> This is a backport of 3642eb21256a ("powerpc/32: Preserve cr1 in
> exception prolog stack check to fix build error") for kernel 5.10
> 
> It fixes the build failure on v5.10 reported by kernel test robot
> and by David Michael.
> 
> This fix is not in Linux tree yet, it is in next branch in powerpc tree.

Then there's nothing I can do about it until that happens :(



Re: [PATCH v4 1/3] powerpc/book3s64/radix/tlb: tlbie primitives for process-scoped invalidations from guests

2021-02-15 Thread kernel test robot
Hi Bharata,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v5.11]
[cannot apply to powerpc/next next-20210212]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Bharata-B-Rao/Support-for-H_RPT_INVALIDATE-in-PowerPC-KVM/20210215-143815
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: powerpc64-randconfig-r005-20210215 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project c9439ca36342fb6013187d0a69aef92736951476)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# https://github.com/0day-ci/linux/commit/2a2c1320dc2bc67ec962721c39e7639cc1abfa9d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Bharata-B-Rao/Support-for-H_RPT_INVALIDATE-in-PowerPC-KVM/20210215-143815
git checkout 2a2c1320dc2bc67ec962721c39e7639cc1abfa9d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/powerpc/mm/book3s64/radix_tlb.c:399:20: error: unused function '_tlbie_pid_lpid' [-Werror,-Wunused-function]
   static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
  ^
>> arch/powerpc/mm/book3s64/radix_tlb.c:643:20: error: unused function '_tlbie_va_range_lpid' [-Werror,-Wunused-function]
   static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
  ^
   2 errors generated.


vim +/_tlbie_pid_lpid +399 arch/powerpc/mm/book3s64/radix_tlb.c

   398  
 > 399  static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
   400 unsigned long ric)
   401  {
   402  asm volatile("ptesync" : : : "memory");
   403  
   404  /*
   405   * Workaround the fact that the "ric" argument to __tlbie_pid
   406   * must be a compile-time contraint to match the "i" constraint
   407   * in the asm statement.
   408   */
   409  switch (ric) {
   410  case RIC_FLUSH_TLB:
   411  __tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
   412  fixup_tlbie_pid_lpid(pid, lpid);
   413  break;
   414  case RIC_FLUSH_PWC:
   415  __tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
   416  break;
   417  case RIC_FLUSH_ALL:
   418  default:
   419  __tlbie_pid_lpid(pid, lpid, RIC_FLUSH_ALL);
   420  fixup_tlbie_pid_lpid(pid, lpid);
   421  }
   422  asm volatile("eieio; tlbsync; ptesync" : : : "memory");
   423  }
   424  struct tlbiel_pid {
   425  unsigned long pid;
   426  unsigned long ric;
   427  };
   428  
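
(Aside: the switch in the excerpt exists because the "ric" operand
feeds an inline-asm "i" constraint, which requires a compile-time
immediate at every call site. A generic illustration of the pattern,
not the kernel's tlbie code:)

	/* Each case hands the asm a literal constant, satisfying the
	 * "i" (immediate) constraint; a plain runtime variable would
	 * not compile. */
	static inline void op(unsigned long arg, int mode)
	{
		switch (mode) {
		case 0:
			asm volatile("# op %0, %1" : : "r"(arg), "i"(0));
			break;
		default:
			asm volatile("# op %0, %1" : : "r"(arg), "i"(1));
		}
	}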

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH 00/27] arch: syscalls: unifiy all syscalltbl.sh into scripts/syscalltbl.sh

2021-02-15 Thread Masahiro Yamada
On Thu, Jan 28, 2021 at 9:51 AM Masahiro Yamada  wrote:
>
>
> As of v5.11-rc1, 12 architectures duplicate similar shell scripts:
>
>   $ find arch -name syscalltbl.sh | sort
>   arch/alpha/kernel/syscalls/syscalltbl.sh
>   arch/arm/tools/syscalltbl.sh
>   arch/ia64/kernel/syscalls/syscalltbl.sh
>   arch/m68k/kernel/syscalls/syscalltbl.sh
>   arch/microblaze/kernel/syscalls/syscalltbl.sh
>   arch/mips/kernel/syscalls/syscalltbl.sh
>   arch/parisc/kernel/syscalls/syscalltbl.sh
>   arch/powerpc/kernel/syscalls/syscalltbl.sh
>   arch/sh/kernel/syscalls/syscalltbl.sh
>   arch/sparc/kernel/syscalls/syscalltbl.sh
>   arch/x86/entry/syscalls/syscalltbl.sh
>   arch/xtensa/kernel/syscalls/syscalltbl.sh
>
> This patch set unifies all of them into a single file,
> scripts/syscalltbl.sh.
>
> The code-diff is attractive:
>
>  51 files changed, 254 insertions(+), 674 deletions(-)
>  delete mode 100644 arch/alpha/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/arm/tools/syscalltbl.sh
>  delete mode 100644 arch/ia64/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/m68k/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/microblaze/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/mips/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/parisc/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/sh/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/sparc/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/x86/entry/syscalls/syscalltbl.sh
>  delete mode 100644 arch/xtensa/kernel/syscalls/syscalltbl.sh
>  create mode 100644 scripts/syscalltbl.sh
>
> Also, this includes Makefile fixes, and some x86 fixes and cleanups.
>
> My question is, how to merge this series.
>
> I am touching all architectures, but the first patch is a prerequisite
> of the rest of this series.
>
> One possibility is to ask the x86 maintainers to pickup the first 5
> patches for v5.12-rc1, and then send the rest for v5.13-rc1,
> splitting per-arch.
>
> I want the x86 maintainers to check the first 5 patches because
> I cleaned up the x32 code.


Never mind.

Sending too big a patch set tends to fail.

I will apply the generic script parts to my tree,
then split the rest per arch in the next development cycle
(aiming for v5.13-rc1).

> I know x32 was considered for deprecation, but my motivation is to
> clean-up scripts across the tree without changing the functionality.
>
>
>
> Masahiro Yamada (27):
>   scripts: add generic syscalltbl.sh
>   x86/syscalls: fix -Wmissing-prototypes warnings from COND_SYSCALL()
>   x86/build: add missing FORCE and fix 'targets' to make if_changed work
>   x86/entry/x32: rename __x32_compat_sys_* to __x64_compat_sys_*
>   x86/syscalls: switch to generic syscalltbl.sh
>   ARM: syscalls: switch to generic syscalltbl.sh
>   alpha: add missing FORCE and fix 'targets' to make if_changed work
>   alpha: syscalls: switch to generic syscalltbl.sh
>   ia64: add missing FORCE and fix 'targets' to make if_changed work
>   ia64: syscalls: switch to generic syscalltbl.sh
>   m68k: add missing FORCE and fix 'targets' to make if_changed work
>   m68k: syscalls: switch to generic syscalltbl.sh
>   microblaze: add missing FORCE and fix 'targets' to make if_changed
> work
>   microblaze: syscalls: switch to generic syscalltbl.sh
>   mips: add missing FORCE and fix 'targets' to make if_changed work
>   mips: syscalls: switch to generic syscalltbl.sh
>   parisc: add missing FORCE and fix 'targets' to make if_changed work
>   parisc: syscalls: switch to generic syscalltbl.sh
>   sh: add missing FORCE and fix 'targets' to make if_changed work
>   sh: syscalls: switch to generic syscalltbl.sh
>   sparc: remove wrong comment from arch/sparc/include/asm/Kbuild
>   sparc: add missing FORCE and fix 'targets' to make if_changed work
>   sparc: syscalls: switch to generic syscalltbl.sh
>   powerpc: add missing FORCE and fix 'targets' to make if_changed work
>   powerpc: syscalls: switch to generic syscalltbl.sh
>   xtensa: add missing FORCE and fix 'targets' to make if_changed work
>   xtensa: syscalls: switch to generic syscalltbl.sh
>
>  arch/alpha/kernel/syscalls/Makefile   | 18 +++
>  arch/alpha/kernel/syscalls/syscalltbl.sh  | 32 ---
>  arch/alpha/kernel/systbls.S   |  3 +-
>  arch/arm/kernel/entry-common.S|  8 +--
>  arch/arm/tools/Makefile   |  9 ++--
>  arch/arm/tools/syscalltbl.sh  | 22 
>  arch/ia64/kernel/entry.S  |  3 +-
>  arch/ia64/kernel/syscalls/Makefile| 19 +++
>  arch/ia64/kernel/syscalls/syscalltbl.sh   | 32 ---
>  arch/m68k/kernel/syscalls/Makefile| 18 +++
>  arch/m68k/kernel/syscalls/syscalltbl.sh   | 32 ---
>  arch/m68k/kernel/syscalltable.S   |  3 +-
>  arch/microblaze/kernel/syscall_table.S|  3 +-
>  arch/microblaze/kern

[PATCH v2] powerpc/pseries: Don't enforce MSI affinity with kdump

2021-02-15 Thread Greg Kurz
Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as per design for managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq:
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.

This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.

The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue
  block driver, that was used to create the MSI vectors, and the single
  queue mode enforced later on by blk-mq because of kdump (i.e. keeping
  all queues fixes the issue)
- CPU#0 offline (i.e. kdump always succeed with CPU#0)

Given that I couldn't reproduce on x86, which seems to always have CPU#0
online even during kdump, I'm not sure where this should be fixed. Hence
going for another approach : fine-grained affinity is for performance
and we don't really care about that during kdump. Simply revert to the
previous working behavior of ignoring affinity masks in this case only.
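
To restate the failure mode (toy logic, not actual kernel code; the
affinity_mask name below is a stand-in for the IRQ's managed mask):

	/* Managed IRQs whose affinity mask contains no online CPU are
	 * left shut down by design (IRQD_MANAGED_SHUTDOWN). */
	bool started = cpumask_intersects(affinity_mask, cpu_online_mask);

	/* In a kdump kernel the pre-computed mask for the single blk-mq
	 * queue may contain only the offline CPU #0, so started ==
	 * false and the device's completion interrupt never fires. */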

Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: lviv...@redhat.com
Cc: sta...@vger.kernel.org
Reviewed-by: Laurent Vivier 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Greg Kurz 
---

v2: - added missing #include 

 arch/powerpc/platforms/pseries/msi.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index b3ac2455faad..637300330507 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -4,6 +4,7 @@
  * Copyright 2006-2007 Michael Ellerman, IBM Corp.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -458,8 +459,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
return hwirq;
}
 
-   virq = irq_create_mapping_affinity(NULL, hwirq,
-  entry->affinity);
+   /*
+* Depending on the number of online CPUs in the original
+* kernel, it is likely for CPU #0 to be offline in a kdump
+* kernel. The associated IRQs in the affinity mappings
+* provided by irq_create_affinity_masks() are thus not
+* started by irq_startup(), as per-design for managed IRQs.
+* This can be a problem with multi-queue block devices driven
+* by blk-mq : such a non-started IRQ is very likely paired
+* with the single queue enforced by blk-mq during kdump (see
+* blk_mq_alloc_tag_set()). This causes the device to remain
+* silent and likely hangs the guest at some point.
+*
+* We don't really care for fine-grained affinity when doing
+* kdump actually : simply ignore the pre-computed affinity
+* masks in this case and let the default mask with all CPUs
+* be used when creating the IRQ mappings.
+*/
+   if (is_kdump_kernel())
+   virq = irq_create_mapping(NULL, hwirq);
+   else
+   virq = irq_create_mapping_affinity(NULL, hwirq,
+  entry->affinity);
 
if (!virq) {
pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.26.2