Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-09 Thread Nicholas Piggin
Nicholas Piggin wrote on June 10, 2019 2:38 pm:
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + BUG_ON(page_shift != PAGE_SIZE);
> + return vmap_pages_range(start, end, prot, pages);
> +}

That's a false positive BUG_ON for !HUGE_VMAP configs. I'll fix that
and repost after a round of feedback.

Thanks,
Nick



Re: [PATCH 2/4] arm64: support huge vmap vmalloc

2019-06-09 Thread Anshuman Khandual



On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
> Applying huge vmap to vmalloc requires vmalloc_to_page to walk huge
> pages. Define pud_large and pmd_large to support this.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/arm64/include/asm/pgtable.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 2c41b04708fe..30fe7b344bf7 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  			 PMD_TYPE_TABLE)
>  #define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) == \
>  			 PMD_TYPE_SECT)
> +#define pmd_large(pmd)	pmd_sect(pmd)
>  
>  #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
>  #define pud_sect(pud)(0)
> @@ -438,6 +439,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  #define pud_table(pud)	((pud_val(pud) & PUD_TYPE_MASK) == \
>  			 PUD_TYPE_TABLE)
>  #endif
> +#define pud_large(pud)	pud_sect(pud)
>  
>  extern pgd_t init_pg_dir[PTRS_PER_PGD];
>  extern pgd_t init_pg_end[];

Another series (I guess not merged yet) is trying to add these wrappers
on arm64 (https://patchwork.kernel.org/patch/10883887/).


Re: [PATCH 1/4] mm: Move ioremap page table mapping function to mm/

2019-06-09 Thread Anshuman Khandual



On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
> ioremap_page_range is a generic function to create a kernel virtual
> mapping, move it to mm/vmalloc.c and rename it vmap_range.

Absolutely. It belongs in mm/vmalloc.c as it's a kernel virtual range.
But what is the rationale for renaming it to vmap_range?
 
> 
> For clarity with this move, also:
> - Rename vunmap_page_range (vmap_range's inverse) to vunmap_range.

Will it be the inverse for both vmap_range() and vmap_page[s]_range() ?

> - Rename vmap_page_range (which takes a page array) to vmap_pages.

s/vmap_pages/vmap_pages_range instead here ^^

This deviates from the subject of this patch, which suggests it is related
to ioremap only. I believe what this patch intends is to create:

- vunmap_range() takes [VA range]

This will be the common kernel virtual range tear down
function for ranges created either with vmap_range() or
vmap_pages_range(). Is that correct ?

- vmap_range() takes [VA range, PA range, prot]
- vmap_pages_range() takes [VA range, struct pages, prot] 

Can we re-order the arguments (pages <--> prot) for vmap_pages_range()
just to make it sync with vmap_range() ?

static int vmap_pages_range(unsigned long start, unsigned long end,
			    pgprot_t prot, struct page **pages)


Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-09 Thread Alexey Kardashevskiy



On 07/06/2019 11:41, Alistair Popple wrote:
> On Thursday, 6 June 2019 10:07:54 PM AEST Oliver wrote:
>> On Thu, Jun 6, 2019 at 5:17 PM Alistair Popple wrote:
>>> I have been hitting EEH address errors testing this with some network
>>> cards which map/unmap DMA addresses more frequently. For example:
>>>
>>> PHB4 PHB#5 Diag-data (Version: 1)
>>> brdgCtl:    0002
>>> RootSts:    00060020 00402000 a0220008 00100107 0800
>>> PhbSts:     001c 001c
>>> Lem:        00010080  0080
>>> PhbErr:     0280 0200 214898000240 a0084000
>>> RxeTceErr:  2000 2000 c000
>>> PblErr:     0002 0002
>>> RegbErr:    0040 0040 61000c48
>>> PE[000] A/B: 8300b038 8000
>>>
>>> Interestingly the PE[000] A/B data is the same across different cards
>>> and drivers.
>>
>> TCE page fault due to permissions so odds are the DMA address was unmapped.
>>
>> What cards did you get this with? I tried with one of the common
>> BCM5719 NICs and generated network traffic by using rsync to copy a
>> linux git tree to the system and it worked fine.
> 
> Personally I've seen it with the BCM5719 with the driver modified to set a
> DMA mask of 48 bits instead of 64, and using scp to copy a random 1GB file
> to the system repeatedly until it crashes.
> 
> I have also had reports of someone hitting the same error using a Mellanox 
> CX-5 adaptor with a similar driver modification.
> 
> - Alistair


Seems to be a race (I could see broken TCEs), this helps on my setup:


diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index f920697c03ef..215351e16ae8 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -113,6 +113,7 @@ struct iommu_table {
int it_nid;
	unsigned long it_reserved_start; /* Start of not-DMA-able (MMIO) area */
	unsigned long it_reserved_end;
+   spinlock_t lock;
 };

 #define IOMMU_TABLE_USERSPACE_ENTRY_RO(tbl, entry) \
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index c75ec37bf0cd..7045b6518243 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -29,6 +29,7 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
tbl->it_size = tce_size >> 3;
tbl->it_busno = 0;
tbl->it_type = TCE_PCI;
+	spin_lock_init(&tbl->lock);
 }

 static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
@@ -65,14 +66,17 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)

if (!alloc)
return NULL;
-
-   tmp2 = pnv_alloc_tce_level(tbl->it_nid,
-   ilog2(tbl->it_level_size) + 3);
-   if (!tmp2)
-   return NULL;
-
-   tmp[n] = cpu_to_be64(__pa(tmp2) |
-   TCE_PCI_READ | TCE_PCI_WRITE);
+   }
+	spin_lock(&tbl->lock);
+	if (tmp[n] == 0) {
+		tmp2 = pnv_alloc_tce_level(tbl->it_nid,
+				ilog2(tbl->it_level_size) + 3);
+		if (!tmp2)
+			return NULL;
+		tmp[n] = cpu_to_be64(__pa(tmp2) |
+				TCE_PCI_READ | TCE_PCI_WRITE);
+	}
+	spin_unlock(&tbl->lock);
}
tce = be64_to_cpu(tmp[n]);




-- 
Alexey


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-09 Thread Anshuman Khandual



On 06/10/2019 10:27 AM, Dave Hansen wrote:
> On 6/9/19 9:34 PM, Anshuman Khandual wrote:
>>> Do you really think this is easier to read?
>>>
>>> Why not just move the x86 version to include/linux/kprobes.h, and replace
>>> the int with bool?
>> Will just return bool directly without an additional variable here as 
>> suggested
>> before. But for the conditional statement, I guess the proposed one here is 
>> more
>> compact than the x86 one.
> 
> FWIW, I don't think "compact" is generally a good goal for code.  Being
> readable is 100x more important than being compact and being un-compact
> is only a problem when it hurts readability.
> 
> For a function like the one in question, having the individual return
> conditions clearly commented is way more important than saving 10 lines
> of code.

Fair enough. Will keep the existing code flow from x86.


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-09 Thread Dave Hansen
On 6/9/19 9:34 PM, Anshuman Khandual wrote:
>> Do you really think this is easier to read?
>>
>> Why not just move the x86 version to include/linux/kprobes.h, and replace
>> the int with bool?
> Will just return bool directly without an additional variable here as 
> suggested
> before. But for the conditional statement, I guess the proposed one here is 
> more
> compact than the x86 one.

FWIW, I don't think "compact" is generally a good goal for code.  Being
readable is 100x more important than being compact and being un-compact
is only a problem when it hurts readability.

For a function like the one in question, having the individual return
conditions clearly commented is way more important than saving 10 lines
of code.


[PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-09 Thread Nicholas Piggin
For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
allocate huge pages and map them.

This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
(performance is in the noise, under 1% difference, page tables are likely
to be well cached for this workload). Similar numbers are seen on POWER9.

Signed-off-by: Nicholas Piggin 
---
 include/asm-generic/4level-fixup.h |   1 +
 include/asm-generic/5level-fixup.h |   1 +
 include/linux/vmalloc.h|   1 +
 mm/vmalloc.c   | 132 +++--
 4 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/include/asm-generic/4level-fixup.h b/include/asm-generic/4level-fixup.h
index e3667c9a33a5..3cc65a4dd093 100644
--- a/include/asm-generic/4level-fixup.h
+++ b/include/asm-generic/4level-fixup.h
@@ -20,6 +20,7 @@
 #define pud_none(pud)  0
 #define pud_bad(pud)   0
 #define pud_present(pud)   1
+#define pud_large(pud) 0
 #define pud_ERROR(pud) do { } while (0)
 #define pud_clear(pud) pgd_clear(pud)
 #define pud_val(pud)   pgd_val(pud)
diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
index bb6cb347018c..c4377db09a4f 100644
--- a/include/asm-generic/5level-fixup.h
+++ b/include/asm-generic/5level-fixup.h
@@ -22,6 +22,7 @@
 #define p4d_none(p4d)  0
 #define p4d_bad(p4d)   0
 #define p4d_present(p4d)   1
+#define p4d_large(p4d) 0
 #define p4d_ERROR(p4d) do { } while (0)
 #define p4d_clear(p4d) pgd_clear(p4d)
 #define p4d_val(p4d)   pgd_val(p4d)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 812bea5866d6..4c92dc608928 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -42,6 +42,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+   unsigned intpage_shift;
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd27cfb29b10..0cf8e861caeb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,6 +36,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, unsigned long end,
return ret;
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   unsigned long addr = start;
+   unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
+
+   for (i = 0; i < nr; i++) {
+   int err;
+
+   err = vmap_range_noflush(addr,
+   addr + (PAGE_SIZE << page_shift),
+   __pa(page_address(pages[i])), prot,
+   PAGE_SHIFT + page_shift);
+   if (err)
+   return err;
+
+   addr += PAGE_SIZE << page_shift;
+   }
+   flush_cache_vmap(start, end);
+
+   return nr;
+}
+#else
+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   BUG_ON(page_shift != PAGE_SIZE);
+   return vmap_pages_range(start, end, prot, pages);
+}
+#endif
+
+
 int is_vmalloc_or_module_addr(const void *x)
 {
/*
@@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
-   pgd_t *pgd = pgd_offset_k(addr);
+   pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
@@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 */
VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
 
+   pgd = pgd_offset_k(addr);
if (pgd_none(*pgd))
return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, addr);
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+   if (p4d_large(*p4d))
+   return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+#endif
+   if (WARN_ON_ONCE(p4d_bad(*p4d)))
+   return NULL;
 
-   /*
-* Don't dereference bad PUD or PMD (below) entries. This will also
-* identify huge mappings, which we may encounter on architectures
-* that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-* identified as 

[PATCH 3/4] powerpc/64s/radix: support huge vmap vmalloc

2019-06-09 Thread Nicholas Piggin
Applying huge vmap to vmalloc requires vmalloc_to_page to walk huge
pages. Define pud_large and pmd_large to support this.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 24 
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 5faceeefd9f9..8e02077b11fb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -923,6 +923,11 @@ static inline int pud_present(pud_t pud)
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+static inline int pud_large(pud_t pud)
+{
+   return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
+
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
 static inline pte_t pud_pte(pud_t pud)
@@ -966,6 +971,11 @@ static inline int pgd_present(pgd_t pgd)
return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+static inline int pgd_large(pgd_t pgd)
+{
+   return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
+}
+
 static inline pte_t pgd_pte(pgd_t pgd)
 {
return __pte_raw(pgd_raw(pgd));
@@ -1091,6 +1101,11 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mk_savedwrite(pmd) pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
 #define pmd_clear_savedwrite(pmd)	pte_pmd(pte_clear_savedwrite(pmd_pte(pmd)))
 
+static inline int pmd_large(pmd_t pmd)
+{
+   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #define pmd_soft_dirty(pmd)pte_soft_dirty(pmd_pte(pmd))
 #define pmd_mksoft_dirty(pmd)  pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)))
@@ -1159,15 +1174,6 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp,
return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set);
 }
 
-/*
- * returns true for pmd migration entries, THP, devmap, hugetlb
- * But compile time dependent on THP config
- */
-static inline int pmd_large(pmd_t pmd)
-{
-   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
 {
return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
-- 
2.20.1



[PATCH 2/4] arm64: support huge vmap vmalloc

2019-06-09 Thread Nicholas Piggin
Applying huge vmap to vmalloc requires vmalloc_to_page to walk huge
pages. Define pud_large and pmd_large to support this.

Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2c41b04708fe..30fe7b344bf7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -428,6 +428,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			 PMD_TYPE_TABLE)
 #define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) == \
 			 PMD_TYPE_SECT)
+#define pmd_large(pmd)	pmd_sect(pmd)
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
 #define pud_sect(pud)  (0)
@@ -438,6 +439,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 #define pud_table(pud)	((pud_val(pud) & PUD_TYPE_MASK) == \
 			 PUD_TYPE_TABLE)
 #endif
+#define pud_large(pud)	pud_sect(pud)
 
 extern pgd_t init_pg_dir[PTRS_PER_PGD];
 extern pgd_t init_pg_end[];
-- 
2.20.1



[PATCH 1/4] mm: Move ioremap page table mapping function to mm/

2019-06-09 Thread Nicholas Piggin
ioremap_page_range is a generic function to create a kernel virtual
mapping, move it to mm/vmalloc.c and rename it vmap_range.

For clarity with this move, also:
- Rename vunmap_page_range (vmap_range's inverse) to vunmap_range.
- Rename vmap_page_range (which takes a page array) to vmap_pages.

Signed-off-by: Nicholas Piggin 
---

Fixed up the arm64 compile errors, fixed a few bugs, and tidied
things up a bit more.

Have tested powerpc and x86 but not arm64, would appreciate a review
and test of the arm64 patch if possible.

 include/linux/vmalloc.h |   3 +
 lib/ioremap.c   | 173 +++---
 mm/vmalloc.c| 228 
 3 files changed, 229 insertions(+), 175 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 51e131245379..812bea5866d6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -147,6 +147,9 @@ extern struct vm_struct *find_vm_area(const void *addr);
 extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
struct page **pages);
 #ifdef CONFIG_MMU
+extern int vmap_range(unsigned long addr,
+  unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
+  unsigned int max_page_shift);
 extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
 extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size);
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 063213685563..e13946da8ec3 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -58,165 +58,24 @@ static inline int ioremap_pud_enabled(void) { return 0; }
 static inline int ioremap_pmd_enabled(void) { return 0; }
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
-{
-   pte_t *pte;
-   u64 pfn;
-
-   pfn = phys_addr >> PAGE_SHIFT;
-   pte = pte_alloc_kernel(pmd, addr);
-   if (!pte)
-   return -ENOMEM;
-   do {
-   BUG_ON(!pte_none(*pte));
-	set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
-   pfn++;
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   return 0;
-}
-
-static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
-{
-   if (!ioremap_pmd_enabled())
-   return 0;
-
-   if ((end - addr) != PMD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PMD_SIZE))
-   return 0;
-
-   if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
-   return 0;
-
-   return pmd_set_huge(pmd, phys_addr, prot);
-}
-
-static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
-{
-   pmd_t *pmd;
-   unsigned long next;
-
-	pmd = pmd_alloc(&init_mm, pud, addr);
-   if (!pmd)
-   return -ENOMEM;
-   do {
-   next = pmd_addr_end(addr, end);
-
-   if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
-   continue;
-
-   if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
-   return -ENOMEM;
-   } while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
-   return 0;
-}
-
-static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
-{
-   if (!ioremap_pud_enabled())
-   return 0;
-
-   if ((end - addr) != PUD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PUD_SIZE))
-   return 0;
-
-   if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
-   return 0;
-
-   return pud_set_huge(pud, phys_addr, prot);
-}
-
-static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
-{
-   pud_t *pud;
-   unsigned long next;
-
-	pud = pud_alloc(&init_mm, p4d, addr);
-   if (!pud)
-   return -ENOMEM;
-   do {
-   next = pud_addr_end(addr, end);
-
-   if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot))
-   continue;
-
-   if (ioremap_pmd_range(pud, addr, next, phys_addr, prot))
-   return -ENOMEM;
-   } while (pud++, phys_addr += (next - addr), addr = next, addr != end);
-   return 0;
-}
-
-static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
-{
-   if 

Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-09 Thread Anshuman Khandual



On 06/07/2019 08:36 PM, Dave Hansen wrote:
> On 6/7/19 3:34 AM, Anshuman Khandual wrote:
>> +static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
>> +  unsigned int trap)
>> +{
>> +int ret = 0;
>> +
>> +/*
>> + * To be potentially processing a kprobe fault and to be allowed
>> + * to call kprobe_running(), we have to be non-preemptible.
>> + */
>> +if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
>> +if (kprobe_running() && kprobe_fault_handler(regs, trap))
>> +ret = 1;
>> +}
>> +return ret;
>> +}
> 
> Nits: Other than taking the nice, readable, x86 one and globbing it onto
> a single line, looks OK to me.  It does seem a _bit_ silly to go to the
> trouble of converting to 'bool' and then using 0/1 and an 'int'
> internally instead of true/false and a bool, though.  It's also not a

Changing to 'bool'...

> horrible thing to add a single line comment to this sucker to say:
> 
> /* returns true if kprobes handled the fault */
> 

Picking this in-code comment.

> In any case, and even if you don't clean any of this up:
> 
> Reviewed-by: Dave Hansen 
> 

Thanks !


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-09 Thread Anshuman Khandual



On 06/08/2019 01:42 AM, Matthew Wilcox wrote:
> Before:
> 
>> @@ -46,23 +46,6 @@ kmmio_fault(struct pt_regs *regs, unsigned long addr)
>>  return 0;
>>  }
>>  
>> -static nokprobe_inline int kprobes_fault(struct pt_regs *regs)
>> -{
>> -if (!kprobes_built_in())
>> -return 0;
>> -if (user_mode(regs))
>> -return 0;
>> -/*
>> - * To be potentially processing a kprobe fault and to be allowed to call
>> - * kprobe_running(), we have to be non-preemptible.
>> - */
>> -if (preemptible())
>> -return 0;
>> -if (!kprobe_running())
>> -return 0;
>> -return kprobe_fault_handler(regs, X86_TRAP_PF);
>> -}
> 
> After:
> 
>> +++ b/include/linux/kprobes.h
>> @@ -458,4 +458,20 @@ static inline bool is_kprobe_optinsn_slot(unsigned long 
>> addr)
>>  }
>>  #endif
>>  
>> +static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
>> +  unsigned int trap)
>> +{
>> +int ret = 0;
>> +
>> +/*
>> + * To be potentially processing a kprobe fault and to be allowed
>> + * to call kprobe_running(), we have to be non-preemptible.
>> + */
>> +if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
>> +if (kprobe_running() && kprobe_fault_handler(regs, trap))
>> +ret = 1;
>> +}
>> +return ret;
>> +}
> 
> Do you really think this is easier to read?
> 
> Why not just move the x86 version to include/linux/kprobes.h, and replace
> the int with bool?

Will just return bool directly without an additional variable here as suggested
before. But for the conditional statement, I guess the proposed one here is more
compact than the x86 one.

> 
> On Fri, Jun 07, 2019 at 04:04:15PM +0530, Anshuman Khandual wrote:
>> Very similar definitions for notify_page_fault() are being used by multiple
>> architectures duplicating much of the same code. This attempts to unify all
>> of them into a generic implementation, rename it as kprobe_page_fault() and
>> then move it to a common header.
> 
> I think this description suffers from having been written for v1 of
> this patch.  It describes what you _did_, but it's not what this patch
> currently _is_.
> 
> Why not something like:
> 
> Architectures which support kprobes have very similar boilerplate around
> calling kprobe_fault_handler().  Use a helper function in kprobes.h to
> unify them, based on the x86 code.
> 
> This changes the behaviour for other architectures when preemption
> is enabled.  Previously, they would have disabled preemption while
> calling the kprobe handler.  However, preemption would be disabled
> if this fault was due to a kprobe, so we know the fault was not due
> to a kprobe handler and can simply return failure.  This behaviour was
> introduced in commit a980c0ef9f6d ("x86/kprobes: Refactor kprobes_fault()
> like kprobe_exceptions_notify()")

Will replace commit message with above.

> 
>>  arch/arm/mm/fault.c  | 24 +---
>>  arch/arm64/mm/fault.c| 24 +---
>>  arch/ia64/mm/fault.c | 24 +---
>>  arch/powerpc/mm/fault.c  | 23 ++-
>>  arch/s390/mm/fault.c | 16 +---
>>  arch/sh/mm/fault.c   | 18 ++
>>  arch/sparc/mm/fault_64.c | 16 +---
>>  arch/x86/mm/fault.c  | 21 ++---
>>  include/linux/kprobes.h  | 16 
> 
> What about arc and mips?

+ Vineet Gupta  
+ linux-snps-...@lists.infradead.org

+ James Hogan 
+ Paul Burton 
+ Ralf Baechle 
+ linux-m...@vger.kernel.org

Both the above architectures don't call kprobe_fault_handler() from the
page fault context (do_page_fault() to be specific). Though it gets called
from the mips kprobe_exceptions_notify() (DIE_PAGE_FAULT). Am I missing
something here?


[PATCH 3/3] powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP

2019-06-09 Thread Nicholas Piggin
This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
page table functions.

This enables huge (2MB and 1GB) ioremap mappings. I don't have a
benchmark for this change, but huge vmap will be used by a later core
kernel change to enable huge vmalloc memory mappings. This improves
cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB
size dentry cache hash.

  Profiling git diff dTLB misses with a vanilla kernel:

  81.75%  git  [kernel.vmlinux][k] __d_lookup_rcu
   7.21%  git  [kernel.vmlinux][k] strncpy_from_user
   1.77%  git  [kernel.vmlinux][k] find_get_entry
   1.59%  git  [kernel.vmlinux][k] kmem_cache_free

40,168  dTLB-miss
   0.100342754 seconds time elapsed

  With powerpc huge vmalloc:

 2,987  dTLB-miss
   0.095933138 seconds time elapsed

Signed-off-by: Nicholas Piggin 
---
 .../admin-guide/kernel-parameters.txt |   2 +-
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  |   8 ++
 arch/powerpc/mm/book3s64/radix_pgtable.c  | 100 ++
 4 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..a4c3538857e9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2927,7 +2927,7 @@
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
 
-   nohugeiomap [KNL,x86] Disable kernel huge I/O mappings.
+   nohugeiomap [KNL,x86,PPC] Disable kernel huge I/O mappings.
 
nosmt   [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..f0e5b38d52e8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -167,6 +167,7 @@ config PPC
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
+	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
select HAVE_ARCH_KGDB
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ccf00a8b98c6..5faceeefd9f9 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -274,6 +274,14 @@ extern unsigned long __vmalloc_end;
 #define VMALLOC_START  __vmalloc_start
 #define VMALLOC_END__vmalloc_end
 
+static inline unsigned int ioremap_max_order(void)
+{
+   if (radix_enabled())
+   return PUD_SHIFT;
+   return 7 + PAGE_SHIFT; /* default from linux/vmalloc.h */
+}
+#define IOREMAP_MAX_ORDER ioremap_max_order()
+
 extern unsigned long __kernel_virt_start;
 extern unsigned long __kernel_virt_size;
 extern unsigned long __kernel_io_start;
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index db993bc1aef3..35cf04bbd50b 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1124,6 +1124,106 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
+int __init arch_ioremap_pud_supported(void)
+{
+   /* HPT does not cope with large pages in the vmalloc area */
+   return radix_enabled();
+}
+
+int __init arch_ioremap_pmd_supported(void)
+{
+   return radix_enabled();
+}
+
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+   return 0;
+}
+
+int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
+{
+   pte_t *ptep = (pte_t *)pud;
+   pte_t new_pud = pfn_pte(__phys_to_pfn(addr), prot);
+
+   if (!radix_enabled())
+   return 0;
+
+	set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pud);
+
+   return 1;
+}
+
+int pud_clear_huge(pud_t *pud)
+{
+   if (pud_huge(*pud)) {
+   pud_clear(pud);
+   return 1;
+   }
+
+   return 0;
+}
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+   pmd_t *pmd;
+   int i;
+
+   pmd = (pmd_t *)pud_page_vaddr(*pud);
+   pud_clear(pud);
+
+   flush_tlb_kernel_range(addr, addr + PUD_SIZE);
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmd[i])) {
+   pte_t *pte;
+   pte = (pte_t *)pmd_page_vaddr(pmd[i]);
+
+			pte_free_kernel(&init_mm, pte);
+   }
+   }
+
+	pmd_free(&init_mm, pmd);
+
+   return 1;
+}
+
+int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
+{
+   pte_t *ptep = (pte_t *)pmd;
+   pte_t new_pmd = pfn_pte(__phys_to_pfn(addr), prot);
+
+ 

[PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range

2019-06-09 Thread Nicholas Piggin
Radix can use ioremap_page_range for ioremap, after slab is available.
This makes it possible to enable huge ioremap mapping support.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
 arch/powerpc/mm/book3s64/pgtable.c | 21 +
 arch/powerpc/mm/book3s64/radix_pgtable.c   | 21 +
 arch/powerpc/mm/pgtable_64.c   |  2 +-
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 574eca33f893..e04a839cb5b9 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -266,6 +266,9 @@ extern void radix__vmemmap_remove_mapping(unsigned long start,
 extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
 pgprot_t flags, unsigned int psz);
 
+extern int radix__ioremap_range(unsigned long ea, phys_addr_t pa,
+   unsigned long size, pgprot_t prot, int nid);
+
 static inline unsigned long radix__get_tree_size(void)
 {
unsigned long rts_field;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index ff98b663c83e..953850a602f7 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -450,3 +450,24 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
 
return true;
 }
+
+int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
+{
+   unsigned long i;
+
+   if (radix_enabled())
+   return radix__ioremap_range(ea, pa, size, prot, nid);
+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);
+   if (err) {
+   if (slab_is_available())
+   unmap_kernel_range(ea, size);
+   else
+   WARN_ON_ONCE(1); /* Should clean up */
+   return err;
+   }
+   }
+
+   return 0;
+}
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c9bcf428dd2b..db993bc1aef3 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -11,6 +11,7 @@
 
 #define pr_fmt(fmt) "radix-mmu: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -1122,3 +1123,23 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
 
set_pte_at(mm, addr, ptep, pte);
 }
+
+int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
+   pgprot_t prot, int nid)
+{
+   if (likely(slab_is_available())) {
+   int err = ioremap_page_range(ea, ea + size, pa, prot);
+   if (err)
+   unmap_kernel_range(ea, size);
+   return err;
+   } else {
+   unsigned long i;
+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);
+   if (WARN_ON_ONCE(err)) /* Should clean up */
+   return err;
+   }
+   return 0;
+   }
+}
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 6bd3660388aa..63cd81130643 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -108,7 +108,7 @@ unsigned long ioremap_bot;
 unsigned long ioremap_bot = IOREMAP_BASE;
 #endif
 
-static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
+int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
 {
unsigned long i;
 
-- 
2.20.1



[PATCH 1/3] powerpc/64: __ioremap_at clean up in the error case

2019-06-09 Thread Nicholas Piggin
__ioremap_at error handling is wonky, it requires caller to clean up
after it. Implement a helper that does the map and error cleanup and
remove the requirement from the caller.

Signed-off-by: Nicholas Piggin 
---

This series is a different approach to the problem, using the generic
ioremap_page_range directly which reduces added code, and moves
the radix specific code into radix files. Thanks to Christophe for
pointing out various problems with the previous patch.

 arch/powerpc/mm/pgtable_64.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index d2d976ff8a0e..6bd3660388aa 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -108,14 +108,30 @@ unsigned long ioremap_bot;
 unsigned long ioremap_bot = IOREMAP_BASE;
 #endif
 
+static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
+{
+   unsigned long i;
+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);
+   if (err) {
+   if (slab_is_available())
+   unmap_kernel_range(ea, size);
+   else
+   WARN_ON_ONCE(1); /* Should clean up */
+   return err;
+   }
+   }
+
+   return 0;
+}
+
 /**
  * __ioremap_at - Low level function to establish the page tables
  *for an IO mapping
  */
void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
 {
-   unsigned long i;
-
/* We don't support the 4K PFN hack with ioremap */
if (pgprot_val(prot) & H_PAGE_4K_PFN)
return NULL;
@@ -129,9 +145,8 @@ void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_
WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
WARN_ON(size & ~PAGE_MASK);
 
-   for (i = 0; i < size; i += PAGE_SIZE)
-   if (map_kernel_page((unsigned long)ea + i, pa + i, prot))
-   return NULL;
+   if (ioremap_range((unsigned long)ea, pa, size, prot, NUMA_NO_NODE))
+   return NULL;
 
return (void __iomem *)ea;
 }
@@ -182,8 +197,6 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
 
area->phys_addr = paligned;
ret = __ioremap_at(paligned, area->addr, size, prot);
-   if (!ret)
-   vunmap(area->addr);
} else {
ret = __ioremap_at(paligned, (void *)ioremap_bot, size, prot);
if (ret)
-- 
2.20.1



Re: [PATCH net 0/3] ibmvnic: Fixes for device reset handling

2019-06-09 Thread David Miller
From: Thomas Falcon 
Date: Fri,  7 Jun 2019 16:03:52 -0500

> This series contains three unrelated fixes to issues seen during
> device resets. The first patch fixes an error when the driver requests
> to deactivate the link of an uninitialized device, resulting in a 
> failure to reset. Next, a patch to fix multicast transmission 
> failures seen after a driver reset. The final patch fixes mishandling
> of memory allocation failures during device initialization, which
> caused a kernel oops.

Series applied, thanks.


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-09 Thread Anshuman Khandual



On 06/07/2019 09:01 PM, Christophe Leroy wrote:
> 
> 
> Le 07/06/2019 à 12:34, Anshuman Khandual a écrit :
>> Very similar definitions for notify_page_fault() are being used by multiple
>> architectures duplicating much of the same code. This attempts to unify all
>> of them into a generic implementation, rename it as kprobe_page_fault() and
>> then move it to a common header.
>>
>> kprobes_built_in() can detect CONFIG_KPROBES, hence new kprobe_page_fault()
>> need not be wrapped again within CONFIG_KPROBES. Trap number argument can
>> now contain up to an 'unsigned int' accommodating all possible platforms.
>>
>> kprobe_page_fault() goes the x86 way while dealing with preemption context.
>> As explained in these following commits the invoking context in itself must
>> be non-preemptible for kprobes processing context irrespective of whether
>> kprobe_running() or perhaps smp_processor_id() is safe or not. It does not
>> make much sense to continue when original context is preemptible. Instead
>> just bail out earlier.
>>
>> commit a980c0ef9f6d
>> ("x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify()")
>>
>> commit b506a9d08bae ("x86: code clarification patch to Kprobes arch code")
>>
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linux-i...@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s...@vger.kernel.org
>> Cc: linux...@vger.kernel.org
>> Cc: sparcli...@vger.kernel.org
>> Cc: x...@kernel.org
>> Cc: Andrew Morton 
>> Cc: Michal Hocko 
>> Cc: Matthew Wilcox 
>> Cc: Mark Rutland 
>> Cc: Christophe Leroy 
>> Cc: Stephen Rothwell 
>> Cc: Andrey Konovalov 
>> Cc: Michael Ellerman 
>> Cc: Paul Mackerras 
>> Cc: Russell King 
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Tony Luck 
>> Cc: Fenghua Yu 
>> Cc: Martin Schwidefsky 
>> Cc: Heiko Carstens 
>> Cc: Yoshinori Sato 
>> Cc: "David S. Miller" 
>> Cc: Thomas Gleixner 
>> Cc: Peter Zijlstra 
>> Cc: Ingo Molnar 
>> Cc: Andy Lutomirski 
>> Cc: Dave Hansen 
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>> Testing:
>>
>> - Build and boot tested on arm64 and x86
>> - Build tested on some other archs (arm, sparc64, alpha, powerpc etc)
>>
>> Changes in RFC V3:
>>
>> - Updated the commit message with an explanation for new preemption 
>> behaviour
>> - Moved notify_page_fault() to kprobes.h with 'static nokprobe_inline' per 
>> Matthew
>> - Changed notify_page_fault() return type from int to bool per Michael 
>> Ellerman
>> - Renamed notify_page_fault() as kprobe_page_fault() per Peterz
>>
>> Changes in RFC V2: (https://patchwork.kernel.org/patch/10974221/)
>>
>> - Changed generic notify_page_fault() per Matthew Wilcox
>> - Changed x86 to use new generic notify_page_fault()
>> - s/must not/need not/ in commit message per Matthew Wilcox
>>
>> Changes in RFC V1: (https://patchwork.kernel.org/patch/10968273/)
>>
>>   arch/arm/mm/fault.c  | 24 +---
>>   arch/arm64/mm/fault.c    | 24 +---
>>   arch/ia64/mm/fault.c | 24 +---
>>   arch/powerpc/mm/fault.c  | 23 ++-
>>   arch/s390/mm/fault.c | 16 +---
>>   arch/sh/mm/fault.c   | 18 ++
>>   arch/sparc/mm/fault_64.c | 16 +---
>>   arch/x86/mm/fault.c  | 21 ++---
>>   include/linux/kprobes.h  | 16 
>>   9 files changed, 27 insertions(+), 155 deletions(-)
>>
> 
> [...]
> 
>> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
>> index 443d980..064dd15 100644
>> --- a/include/linux/kprobes.h
>> +++ b/include/linux/kprobes.h
>> @@ -458,4 +458,20 @@ static inline bool is_kprobe_optinsn_slot(unsigned long addr)
>>   }
>>   #endif
>>   +static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
>> +  unsigned int trap)
>> +{
>> +    int ret = 0;
> 
> ret is pointless.
> 
>> +
>> +    /*
>> + * To be potentially processing a kprobe fault and to be allowed
>> + * to call kprobe_running(), we have to be non-preemptible.
>> + */
>> +    if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
>> +    if (kprobe_running() && kprobe_fault_handler(regs, trap))
> 
> don't need an 'if A if B', can do 'if A && B'

Which will make it a very lengthy condition check.

> 
>> +    ret = 1;
> 
> can do 'return true;' directly here
> 
>> +    }
>> +    return ret;
> 
> And 'return false' here.

Makes sense, will drop ret.


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-09 Thread Anshuman Khandual



On 06/07/2019 05:33 PM, Stephen Rothwell wrote:
> Hi Anshuman,
> 
> On Fri,  7 Jun 2019 16:04:15 +0530 Anshuman Khandual  wrote:
>>
>> +static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
>> +  unsigned int trap)
>> +{
>> +int ret = 0;
>> +
>> +/*
>> + * To be potentially processing a kprobe fault and to be allowed
>> + * to call kprobe_running(), we have to be non-preemptible.
>> + */
>> +if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
>> +if (kprobe_running() && kprobe_fault_handler(regs, trap))
>> +ret = 1;
>> +}
>> +return ret;
>> +}
> 
> Since this is now declared as "bool" (thanks for that), you should make
> "ret" be bool and use true and false;

Sure, done.


[PATCH 4.4 145/241] cpufreq/pasemi: fix possible object reference leak

2019-06-09 Thread Greg Kroah-Hartman
[ Upstream commit a9acc26b75f652f697e02a9febe2ab0da648a571 ]

The call to of_get_cpu_node returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./drivers/cpufreq/pasemi-cpufreq.c:212:1-7: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 147, but without a 
corresponding object release within this function.
./drivers/cpufreq/pasemi-cpufreq.c:220:1-7: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 147, but without a 
corresponding object release within this function.

Signed-off-by: Wen Yang 
Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Viresh Kumar 
Signed-off-by: Sasha Levin 
---
 drivers/cpufreq/pasemi-cpufreq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/pasemi-cpufreq.c b/drivers/cpufreq/pasemi-cpufreq.c
index 35dd4d7ffee08..58c933f483004 100644
--- a/drivers/cpufreq/pasemi-cpufreq.c
+++ b/drivers/cpufreq/pasemi-cpufreq.c
@@ -146,6 +146,7 @@ static int pas_cpufreq_cpu_init(struct cpufreq_policy *policy)
 
cpu = of_get_cpu_node(policy->cpu, NULL);
 
+   of_node_put(cpu);
if (!cpu)
goto out;
 
-- 
2.20.1





[PATCH 4.4 163/241] ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put

2019-06-09 Thread Greg Kroah-Hartman
[ Upstream commit c705247136a523488eac806bd357c3e5d79a7acd ]

The call to of_parse_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./sound/soc/fsl/fsl_utils.c:74:2-8: ERROR: missing of_node_put; acquired a node 
pointer with refcount incremented on line 38, but without a corresponding 
object release within this function.

Signed-off-by: Wen Yang 
Cc: Timur Tabi 
Cc: Nicolin Chen 
Cc: Xiubo Li 
Cc: Fabio Estevam 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: alsa-de...@alsa-project.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_utils.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/fsl_utils.c b/sound/soc/fsl/fsl_utils.c
index b9e42b503a377..4f8bdb7650e84 100644
--- a/sound/soc/fsl/fsl_utils.c
+++ b/sound/soc/fsl/fsl_utils.c
@@ -75,6 +75,7 @@ int fsl_asoc_get_dma_channel(struct device_node *ssi_np,
iprop = of_get_property(dma_np, "cell-index", NULL);
if (!iprop) {
of_node_put(dma_np);
+   of_node_put(dma_channel_np);
return -EINVAL;
}
*dma_id = be32_to_cpup(iprop);
-- 
2.20.1





[PATCH 4.4 146/241] cpufreq: pmac32: fix possible object reference leak

2019-06-09 Thread Greg Kroah-Hartman
[ Upstream commit 8d10dc28a9ea6e8c02e825dab28699f3c72b02d9 ]

The call to of_find_node_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.

Detected by coccinelle with the following warnings:
./drivers/cpufreq/pmac32-cpufreq.c:557:2-8: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 552, but without a 
corresponding object release within this function.
./drivers/cpufreq/pmac32-cpufreq.c:569:1-7: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 552, but without a 
corresponding object release within this function.
./drivers/cpufreq/pmac32-cpufreq.c:598:1-7: ERROR: missing of_node_put; 
acquired a node pointer with refcount incremented on line 587, but without a 
corresponding object release within this function.

Signed-off-by: Wen Yang 
Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linux...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Viresh Kumar 
Signed-off-by: Sasha Levin 
---
 drivers/cpufreq/pmac32-cpufreq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/pmac32-cpufreq.c b/drivers/cpufreq/pmac32-cpufreq.c
index 1f49d97a70ea1..14928e0dc3265 100644
--- a/drivers/cpufreq/pmac32-cpufreq.c
+++ b/drivers/cpufreq/pmac32-cpufreq.c
@@ -549,6 +549,7 @@ static int pmac_cpufreq_init_7447A(struct device_node *cpunode)
volt_gpio_np = of_find_node_by_name(NULL, "cpu-vcore-select");
if (volt_gpio_np)
voltage_gpio = read_gpio(volt_gpio_np);
+   of_node_put(volt_gpio_np);
if (!voltage_gpio){
printk(KERN_ERR "cpufreq: missing cpu-vcore-select gpio\n");
return 1;
@@ -585,6 +586,7 @@ static int pmac_cpufreq_init_750FX(struct device_node *cpunode)
if (volt_gpio_np)
voltage_gpio = read_gpio(volt_gpio_np);
 
+   of_node_put(volt_gpio_np);
pvr = mfspr(SPRN_PVR);
has_cpu_l2lve = !((pvr & 0xf00) == 0x100);
 
-- 
2.20.1





Re: [PATCH v3 15/20] docs: move protection-keys.rst to the core-api book

2019-06-09 Thread Geert Uytterhoeven
Hi Mauro,

On Fri, Jun 7, 2019 at 9:38 PM Mauro Carvalho Chehab  wrote:
> This document is used by multiple architectures:

Indeed it is...

>
> $ echo $(git grep -l  pkey_mprotect arch|cut -d'/' -f 2|sort|uniq)
> alpha arm arm64 ia64 m68k microblaze mips parisc powerpc s390 sh 
> sparc x86 xtensa

... but not because we now have a unified space for new syscall numbers ;-)

$ git grep -w ARCH_HAS_PKEYS -- "*Kconf*"
arch/powerpc/Kconfig:   select ARCH_HAS_PKEYS
arch/x86/Kconfig:   select ARCH_HAS_PKEYS
mm/Kconfig:config ARCH_HAS_PKEYS

I.e. limited to x86 and powerpc.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[Bug 155231] powerpc : native aslr vdso randomization is not working in powerpc platform

2019-06-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=155231

Christophe Leroy (christophe.le...@c-s.fr) changed:

   What|Removed |Added

 CC||christophe.le...@c-s.fr

--- Comment #2 from Christophe Leroy (christophe.le...@c-s.fr) ---
See related ppc issue https://github.com/linuxppc/issues/issues/59

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

Christophe Leroy (christophe.le...@c-s.fr) changed:

   What|Removed |Added

 CC||christophe.le...@c-s.fr

--- Comment #5 from Christophe Leroy (christophe.le...@c-s.fr) ---
Then the problem is not due to the rework of syscalls.

Are you able to bisect ?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.