subject:"corruption"

[PATCH 5.11 012/210] selinux: fix cond_list corruption when changing booleans

2021-04-12 Thread Greg Kroah-Hartman

From: Ondrej Mosnacek 

commit d8f5f0ea5b86300390b026b6c6e7836b7150814a upstream.

Currently, duplicate_policydb_cond_list() first copies the whole
conditional avtab and then tries to link to the correct entries in
cond_dup_av_list() using avtab_search(). However, since the conditional
avtab may contain multiple entries with the same key, this approach
often fails to find the right entry, potentially leading to wrong rules
being activated/deactivated when booleans are changed.

To fix this, instead start with an empty conditional avtab and add the
individual entries one-by-one while building the new av_lists. This
approach leads to the correct result, since each entry is present in the
av_lists exactly once.

The issue can be reproduced with Fedora policy as follows:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name 
rename reparent rmdir setattr unlink watch watch_reads write }; [ 
ftpd_anon_write ]:True
# setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off 
ftpd_connect_db=off ftpd_full_access=off

On fixed kernels, the sesearch output is the same after the setsebool
command:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name 
rename reparent rmdir setattr unlink watch watch_reads write }; [ 
ftpd_anon_write ]:True

While on the broken kernels, it will be different:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True

While there, also simplify the computation of nslots. This changes the
nslots values for nrules 2 or 3 to just two slots instead of 4, which
makes the sequence more consistent.

Cc: sta...@vger.kernel.org
Fixes: c7c556f1e81b ("selinux: refactor changing booleans")
Signed-off-by: Ondrej Mosnacek 
Signed-off-by: Paul Moore 
Signed-off-by: Greg Kroah-Hartman 
---
 security/selinux/ss/avtab.c   |   86 +++---
 security/selinux/ss/avtab.h   |2 
 security/selinux/ss/conditional.c |   12 ++---
 3 files changed, 32 insertions(+), 68 deletions(-)

--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -308,24 +308,10 @@ void avtab_init(struct avtab *h)
h->mask = 0;
 }
 
-int avtab_alloc(struct avtab *h, u32 nrules)
+static int avtab_alloc_common(struct avtab *h, u32 nslot)
 {
-   u32 shift = 0;
-   u32 work = nrules;
-   u32 nslot;
-
-   if (nrules == 0)
-   goto avtab_alloc_out;
-
-   while (work) {
-   work  = work >> 1;
-   shift++;
-   }
-   if (shift > 2)
-   shift = shift - 2;
-   nslot = 1 << shift;
-   if (nslot > MAX_AVTAB_HASH_BUCKETS)
-   nslot = MAX_AVTAB_HASH_BUCKETS;
+   if (!nslot)
+   return 0;
 
h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL);
if (!h->htable)
@@ -333,59 +319,37 @@ int avtab_alloc(struct avtab *h, u32 nru
 
h->nslot = nslot;
h->mask = nslot - 1;
-
-avtab_alloc_out:
-   pr_debug("SELinux: %d avtab hash slots, %d rules.\n",
-  h->nslot, nrules);
return 0;
 }
 
-int avtab_duplicate(struct avtab *new, struct avtab *orig)
+int avtab_alloc(struct avtab *h, u32 nrules)
 {
-   int i;
-   struct avtab_node *node, *tmp, *tail;
-
-   memset(new, 0, sizeof(*new));
+   int rc;
+   u32 nslot = 0;
 
-   new->htable = kvcalloc(orig->nslot, sizeof(void *), GFP_KERNEL);
-   if (!new->htable)
-   return -ENOMEM;
-   new->nslot = orig->nslot;
-   new->mask = orig->mask;
-
-   for (i = 0; i < orig->nslot; i++) {
-   tail = NULL;
-   for (node = orig->htable[i]; node; node = node->next) {
-   tmp = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL);
-   if (!tmp)
-   goto error;
-

[PATCH 5.10 011/188] selinux: fix cond_list corruption when changing booleans

2021-04-12 Thread Greg Kroah-Hartman

From: Ondrej Mosnacek 

commit d8f5f0ea5b86300390b026b6c6e7836b7150814a upstream.

Currently, duplicate_policydb_cond_list() first copies the whole
conditional avtab and then tries to link to the correct entries in
cond_dup_av_list() using avtab_search(). However, since the conditional
avtab may contain multiple entries with the same key, this approach
often fails to find the right entry, potentially leading to wrong rules
being activated/deactivated when booleans are changed.

To fix this, instead start with an empty conditional avtab and add the
individual entries one-by-one while building the new av_lists. This
approach leads to the correct result, since each entry is present in the
av_lists exactly once.

The issue can be reproduced with Fedora policy as follows:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name 
rename reparent rmdir setattr unlink watch watch_reads write }; [ 
ftpd_anon_write ]:True
# setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off 
ftpd_connect_db=off ftpd_full_access=off

On fixed kernels, the sesearch output is the same after the setsebool
command:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t public_content_rw_t:dir { add_name create link remove_name 
rename reparent rmdir setattr unlink watch watch_reads write }; [ 
ftpd_anon_write ]:True

While on the broken kernels, it will be different:

# sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True
allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl 
link lock open read remove_name rename reparent rmdir search setattr unlink 
watch watch_reads write }; [ ftpd_full_access ]:True

While there, also simplify the computation of nslots. This changes the
nslots values for nrules 2 or 3 to just two slots instead of 4, which
makes the sequence more consistent.

Cc: sta...@vger.kernel.org
Fixes: c7c556f1e81b ("selinux: refactor changing booleans")
Signed-off-by: Ondrej Mosnacek 
Signed-off-by: Paul Moore 
Signed-off-by: Greg Kroah-Hartman 
---
 security/selinux/ss/avtab.c   |   86 +++---
 security/selinux/ss/avtab.h   |2 
 security/selinux/ss/conditional.c |   12 ++---
 3 files changed, 32 insertions(+), 68 deletions(-)

--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -308,24 +308,10 @@ void avtab_init(struct avtab *h)
h->mask = 0;
 }
 
-int avtab_alloc(struct avtab *h, u32 nrules)
+static int avtab_alloc_common(struct avtab *h, u32 nslot)
 {
-   u32 shift = 0;
-   u32 work = nrules;
-   u32 nslot;
-
-   if (nrules == 0)
-   goto avtab_alloc_out;
-
-   while (work) {
-   work  = work >> 1;
-   shift++;
-   }
-   if (shift > 2)
-   shift = shift - 2;
-   nslot = 1 << shift;
-   if (nslot > MAX_AVTAB_HASH_BUCKETS)
-   nslot = MAX_AVTAB_HASH_BUCKETS;
+   if (!nslot)
+   return 0;
 
h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL);
if (!h->htable)
@@ -333,59 +319,37 @@ int avtab_alloc(struct avtab *h, u32 nru
 
h->nslot = nslot;
h->mask = nslot - 1;
-
-avtab_alloc_out:
-   pr_debug("SELinux: %d avtab hash slots, %d rules.\n",
-  h->nslot, nrules);
return 0;
 }
 
-int avtab_duplicate(struct avtab *new, struct avtab *orig)
+int avtab_alloc(struct avtab *h, u32 nrules)
 {
-   int i;
-   struct avtab_node *node, *tmp, *tail;
-
-   memset(new, 0, sizeof(*new));
+   int rc;
+   u32 nslot = 0;
 
-   new->htable = kvcalloc(orig->nslot, sizeof(void *), GFP_KERNEL);
-   if (!new->htable)
-   return -ENOMEM;
-   new->nslot = orig->nslot;
-   new->mask = orig->mask;
-
-   for (i = 0; i < orig->nslot; i++) {
-   tail = NULL;
-   for (node = orig->htable[i]; node; node = node->next) {
-   tmp = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL);
-   if (!tmp)
-   goto error;
-

Re: [PATCH v2] mm: page_poison: print page info when corruption is caught

2021-04-08 Thread Vlastimil Babka

On 4/8/21 1:08 AM, Sergei Trofimovich wrote:
> When page_poison detects page corruption it's useful to see who
> freed a page recently to have a guess where write-after-free
> corruption happens.
> 
> After this change corruption report has extra page data.
> Example report from real corruption (includes only page_pwner part):
> 
> pagealloc: memory corruption
> e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60  
> 0..`
> e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20  
> ...`...
> ...
> CPU: 1 PID: 220402 Comm: cc1plus Not tainted 
> 5.12.0-rc5-00107-g9720c6f59ecf #245
> Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
> ...
> Call Trace:
>  [] show_stack+0x90/0xc0
>  [] dump_stack+0x150/0x1c0
>  [] __kernel_unpoison_pages+0x410/0x440
>  [] get_page_from_freelist+0x1460/0x2ca0
>  [] __alloc_pages_nodemask+0x3c0/0x660
>  [] alloc_pages_vma+0xb0/0x500
>  [] __handle_mm_fault+0x1230/0x1fe0
>  [] handle_mm_fault+0x310/0x4e0
>  [] ia64_do_page_fault+0x1f0/0xb80
>  [] ia64_leave_kernel+0x0/0x270
> page_owner tracks the page as freed
> page allocated via order 0, migratetype Movable,
>   gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
> 8173444098740
>  __reset_page_owner+0x40/0x200
>  free_pcp_prepare+0x4d0/0x600
>  free_unref_page+0x20/0x1c0
>  __put_page+0x110/0x1a0
>  migrate_pages+0x16d0/0x1dc0
>  compact_zone+0xfc0/0x1aa0
>  proactive_compact_node+0xd0/0x1e0
>  kcompactd+0x550/0x600
>  kthread+0x2c0/0x2e0
>  call_payload+0x50/0x80
> 
> Here we can see that page was freed by page migration but something
> managed to write to it afterwards.
> 
> CC: Vlastimil Babka 
> CC: Andrew Morton 
> CC: linux...@kvack.org
> Signed-off-by: Sergei Trofimovich 

Acked-by: Vlastimil Babka 

> ---
> Change since v1: use more generic 'dump_page()' suggested by Vlastimil
> Should supersede existing 
> mm-page_poison-print-page-owner-info-when-corruption-is-caught.patch
> 
>  mm/page_poison.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_poison.c b/mm/page_poison.c
> index 65cdf844c8ad..df03126f3b2b 100644
> --- a/mm/page_poison.c
> +++ b/mm/page_poison.c
> @@ -2,6 +2,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char 
> b)
>   return error && !(error & (error - 1));
>  }
>  
> -static void check_poison_mem(unsigned char *mem, size_t bytes)
> +static void check_poison_mem(struct page *page, unsigned char *mem, size_t 
> bytes)
>  {
>   static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
>   unsigned char *start;
> @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t 
> bytes)
>   print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
>   end - start + 1, 1);
>   dump_stack();
> + dump_page(page, "pagealloc: corrupted page details");
>  }
>  
>  static void unpoison_page(struct page *page)
> @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
>* that is freed to buddy. Thus no extra check is done to
>* see if a page was poisoned.
>*/
> - check_poison_mem(addr, PAGE_SIZE);
> + check_poison_mem(page, addr, PAGE_SIZE);
>   kunmap_atomic(addr);
>  }
>  
>

[PATCH v2] mm: page_poison: print page info when corruption is caught

2021-04-07 Thread Sergei Trofimovich

When page_poison detects page corruption it's useful to see who
freed a page recently to have a guess where write-after-free
corruption happens.

After this change corruption report has extra page data.
Example report from real corruption (includes only page_pwner part):

pagealloc: memory corruption
e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60  
0..`
e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20  
...`...
...
CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf 
#245
Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
...
Call Trace:
 [] show_stack+0x90/0xc0
 [] dump_stack+0x150/0x1c0
 [] __kernel_unpoison_pages+0x410/0x440
 [] get_page_from_freelist+0x1460/0x2ca0
 [] __alloc_pages_nodemask+0x3c0/0x660
 [] alloc_pages_vma+0xb0/0x500
 [] __handle_mm_fault+0x1230/0x1fe0
 [] handle_mm_fault+0x310/0x4e0
 [] ia64_do_page_fault+0x1f0/0xb80
 [] ia64_leave_kernel+0x0/0x270
page_owner tracks the page as freed
page allocated via order 0, migratetype Movable,
  gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
8173444098740
 __reset_page_owner+0x40/0x200
 free_pcp_prepare+0x4d0/0x600
 free_unref_page+0x20/0x1c0
 __put_page+0x110/0x1a0
 migrate_pages+0x16d0/0x1dc0
 compact_zone+0xfc0/0x1aa0
 proactive_compact_node+0xd0/0x1e0
 kcompactd+0x550/0x600
 kthread+0x2c0/0x2e0
 call_payload+0x50/0x80

Here we can see that page was freed by page migration but something
managed to write to it afterwards.

CC: Vlastimil Babka 
CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
Change since v1: use more generic 'dump_page()' suggested by Vlastimil
Should supersede existing 
mm-page_poison-print-page-owner-info-when-corruption-is-caught.patch

 mm/page_poison.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index 65cdf844c8ad..df03126f3b2b 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b)
return error && !(error & (error - 1));
 }
 
-static void check_poison_mem(unsigned char *mem, size_t bytes)
+static void check_poison_mem(struct page *page, unsigned char *mem, size_t 
bytes)
 {
static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
unsigned char *start;
@@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes)
print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
end - start + 1, 1);
dump_stack();
+   dump_page(page, "pagealloc: corrupted page details");
 }
 
 static void unpoison_page(struct page *page)
@@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
 * that is freed to buddy. Thus no extra check is done to
 * see if a page was poisoned.
 */
-   check_poison_mem(addr, PAGE_SIZE);
+   check_poison_mem(page, addr, PAGE_SIZE);
kunmap_atomic(addr);
 }
 
-- 
2.31.1

Re: [PATCH] mm: page_poison: print page owner info when corruption is caught

2021-04-07 Thread Sergei Trofimovich

On Wed, Apr 07, 2021 at 02:15:50PM +0200, Vlastimil Babka wrote:
> On 4/4/21 4:17 PM, Sergei Trofimovich wrote:
> > When page_poison detects page corruption it's useful to see who
> > freed a page recently to have a guess where write-after-free
> > corruption happens.
> > 
> > After this change corruption report has extra page_owner data.
> > Example report from real corruption:
> > 
> > pagealloc: memory corruption
> > e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
> > e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
> > ...
> > CPU: 1 PID: 220402 Comm: cc1plus Not tainted 
> > 5.12.0-rc5-00107-g9720c6f59ecf #245
> > Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
> > ...
> > Call Trace:
> >  [] show_stack+0x90/0xc0
> >  [] dump_stack+0x150/0x1c0
> >  [] __kernel_unpoison_pages+0x410/0x440
> >  [] get_page_from_freelist+0x1460/0x2ca0
> >  [] __alloc_pages_nodemask+0x3c0/0x660
> >  [] alloc_pages_vma+0xb0/0x500
> >  [] __handle_mm_fault+0x1230/0x1fe0
> >  [] handle_mm_fault+0x310/0x4e0
> >  [] ia64_do_page_fault+0x1f0/0xb80
> >  [] ia64_leave_kernel+0x0/0x270
> > page_owner tracks the page as freed
> > page allocated via order 0, migratetype Movable,
> >   gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
> > 8173444098740
> >  __reset_page_owner+0x40/0x200
> >  free_pcp_prepare+0x4d0/0x600
> >  free_unref_page+0x20/0x1c0
> >  __put_page+0x110/0x1a0
> >  migrate_pages+0x16d0/0x1dc0
> >  compact_zone+0xfc0/0x1aa0
> >  proactive_compact_node+0xd0/0x1e0
> >  kcompactd+0x550/0x600
> >  kthread+0x2c0/0x2e0
> >  call_payload+0x50/0x80
> > 
> > Here we can see that page was freed by page migration but something
> > managed to write to it afterwards.
> > 
> > CC: Andrew Morton 
> > CC: linux...@kvack.org
> > Signed-off-by: Sergei Trofimovich 
> > ---
> >  mm/page_poison.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/page_poison.c b/mm/page_poison.c
> > index 65cdf844c8ad..ef2a1eab13d7 100644
> > --- a/mm/page_poison.c
> > +++ b/mm/page_poison.c
> > @@ -4,6 +4,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned 
> > char b)
> > return error && !(error & (error - 1));
> >  }
> >  
> > -static void check_poison_mem(unsigned char *mem, size_t bytes)
> > +static void check_poison_mem(struct page *page, unsigned char *mem, size_t 
> > bytes)
> >  {
> > static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
> > unsigned char *start;
> > @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t 
> > bytes)
> > print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
> > end - start + 1, 1);
> > dump_stack();
> > +   dump_page_owner(page);
> 
> OK but why not a full dump_page()?

Oh, I did not know it existed! Looks even better.
Will send a v2 with dump_page().

> >  }
> >  
> >  static void unpoison_page(struct page *page)
> > @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
> >  * that is freed to buddy. Thus no extra check is done to
> >  * see if a page was poisoned.
> >  */
> > -   check_poison_mem(addr, PAGE_SIZE);
> > +   check_poison_mem(page, addr, PAGE_SIZE);
> > kunmap_atomic(addr);
> >  }
> >  
> > 
> 

-- 

  Sergei

Re: [PATCH] mm: page_poison: print page owner info when corruption is caught

2021-04-07 Thread Vlastimil Babka

On 4/4/21 4:17 PM, Sergei Trofimovich wrote:
> When page_poison detects page corruption it's useful to see who
> freed a page recently to have a guess where write-after-free
> corruption happens.
> 
> After this change corruption report has extra page_owner data.
> Example report from real corruption:
> 
> pagealloc: memory corruption
> e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
> e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
> ...
> CPU: 1 PID: 220402 Comm: cc1plus Not tainted 
> 5.12.0-rc5-00107-g9720c6f59ecf #245
> Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
> ...
> Call Trace:
>  [] show_stack+0x90/0xc0
>  [] dump_stack+0x150/0x1c0
>  [] __kernel_unpoison_pages+0x410/0x440
>  [] get_page_from_freelist+0x1460/0x2ca0
>  [] __alloc_pages_nodemask+0x3c0/0x660
>  [] alloc_pages_vma+0xb0/0x500
>  [] __handle_mm_fault+0x1230/0x1fe0
>  [] handle_mm_fault+0x310/0x4e0
>  [] ia64_do_page_fault+0x1f0/0xb80
>  [] ia64_leave_kernel+0x0/0x270
> page_owner tracks the page as freed
> page allocated via order 0, migratetype Movable,
>   gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
> 8173444098740
>  __reset_page_owner+0x40/0x200
>  free_pcp_prepare+0x4d0/0x600
>  free_unref_page+0x20/0x1c0
>  __put_page+0x110/0x1a0
>  migrate_pages+0x16d0/0x1dc0
>  compact_zone+0xfc0/0x1aa0
>  proactive_compact_node+0xd0/0x1e0
>  kcompactd+0x550/0x600
>  kthread+0x2c0/0x2e0
>  call_payload+0x50/0x80
> 
> Here we can see that page was freed by page migration but something
> managed to write to it afterwards.
> 
> CC: Andrew Morton 
> CC: linux...@kvack.org
> Signed-off-by: Sergei Trofimovich 
> ---
>  mm/page_poison.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_poison.c b/mm/page_poison.c
> index 65cdf844c8ad..ef2a1eab13d7 100644
> --- a/mm/page_poison.c
> +++ b/mm/page_poison.c
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char 
> b)
>   return error && !(error & (error - 1));
>  }
>  
> -static void check_poison_mem(unsigned char *mem, size_t bytes)
> +static void check_poison_mem(struct page *page, unsigned char *mem, size_t 
> bytes)
>  {
>   static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
>   unsigned char *start;
> @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t 
> bytes)
>   print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
>   end - start + 1, 1);
>   dump_stack();
> + dump_page_owner(page);

OK but why not a full dump_page()?

>  }
>  
>  static void unpoison_page(struct page *page)
> @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
>* that is freed to buddy. Thus no extra check is done to
>* see if a page was poisoned.
>*/
> - check_poison_mem(addr, PAGE_SIZE);
> + check_poison_mem(page, addr, PAGE_SIZE);
>   kunmap_atomic(addr);
>  }
>  
>

[PATCH 5.11 088/152] drm/amdkfd: dqm fence memory corruption

2021-04-05 Thread Greg Kroah-Hartman

From: Qu Huang 

commit e92049ae4548ba09e53eaa9c8f6964b07ea274c9 upstream.

Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.

Changes since v1:
  * Change dqm->fence_addr as a u64 pointer to fix this issue,
also fix up query_status and amdkfd_fence_wait_timeout function
uses 64 bit fence value to make them consistent.

Signed-off-by: Qu Huang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c |6 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c   |2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c|2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c|2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |8 
 7 files changed, 12 insertions(+), 12 deletions(-)

--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -155,7 +155,7 @@ static int dbgdev_diq_submit_ib(struct k
 
/* Wait till CP writes sync code: */
status = amdkfd_fence_wait_timeout(
-   (unsigned int *) rm_state,
+   rm_state,
QUEUESTATE__ACTIVE, 1500);
 
kfd_gtt_sa_free(dbgdev->dev, mem_obj);
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1167,7 +1167,7 @@ static int start_cpsch(struct device_que
if (retval)
goto fail_allocate_vidmem;
 
-   dqm->fence_addr = dqm->fence_mem->cpu_ptr;
+   dqm->fence_addr = (uint64_t *)dqm->fence_mem->cpu_ptr;
dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr;
 
init_interrupts(dqm);
@@ -1340,8 +1340,8 @@ out:
return retval;
 }
 
-int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
-   unsigned int fence_value,
+int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
+   uint64_t fence_value,
unsigned int timeout_ms)
 {
unsigned long end_jiffies = msecs_to_jiffies(timeout_ms) + jiffies;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -192,7 +192,7 @@ struct device_queue_manager {
uint16_tvmid_pasid[VMID_NUM];
uint64_tpipelines_addr;
uint64_tfence_gpu_addr;
-   unsigned int*fence_addr;
+   uint64_t*fence_addr;
struct kfd_mem_obj  *fence_mem;
boolactive_runlist;
int sched_policy;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -347,7 +347,7 @@ fail_create_runlist_ib:
 }
 
 int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address,
-   uint32_t fence_value)
+   uint64_t fence_value)
 {
uint32_t *buffer, size;
int retval = 0;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -283,7 +283,7 @@ static int pm_unmap_queues_v9(struct pac
 }
 
 static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer,
-   uint64_t fence_address, uint32_t fence_value)
+   uint64_t fence_address, uint64_t fence_value)
 {
struct pm4_mes_query_status *packet;
 
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c
@@ -263,7 +263,7 @@ static int pm_unmap_queues_vi(struct pac
 }
 
 static int pm_query_status_vi(struct packet_manager *pm, uint32_t *buffer,
-   uint64_t fence_address, uint32_t fence_value)
+   uint64_t fence_address, uint64_t fence_value)
 {
struct pm4_mes_query_status *packet;
 
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1003,8 +1003,8 @@ int pqm_get_wave_state(struct process_qu
   u32 *ctl_stack_used_size,
   u32 *save_area_used_size);
 
-int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
- unsigned int fence_value,
+int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
+ uint64_t fence_value,

[PATCH 5.10 073/126] drm/amdkfd: dqm fence memory corruption

2021-04-05 Thread Greg Kroah-Hartman

From: Qu Huang 

commit e92049ae4548ba09e53eaa9c8f6964b07ea274c9 upstream.

Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.

Changes since v1:
  * Change dqm->fence_addr as a u64 pointer to fix this issue,
also fix up query_status and amdkfd_fence_wait_timeout function
uses 64 bit fence value to make them consistent.

Signed-off-by: Qu Huang 
Reviewed-by: Felix Kuehling 
Signed-off-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c |6 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c   |2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c|2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c|2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |8 
 7 files changed, 12 insertions(+), 12 deletions(-)

--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -155,7 +155,7 @@ static int dbgdev_diq_submit_ib(struct k
 
/* Wait till CP writes sync code: */
status = amdkfd_fence_wait_timeout(
-   (unsigned int *) rm_state,
+   rm_state,
QUEUESTATE__ACTIVE, 1500);
 
kfd_gtt_sa_free(dbgdev->dev, mem_obj);
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1167,7 +1167,7 @@ static int start_cpsch(struct device_que
if (retval)
goto fail_allocate_vidmem;
 
-   dqm->fence_addr = dqm->fence_mem->cpu_ptr;
+   dqm->fence_addr = (uint64_t *)dqm->fence_mem->cpu_ptr;
dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr;
 
init_interrupts(dqm);
@@ -1340,8 +1340,8 @@ out:
return retval;
 }
 
-int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
-   unsigned int fence_value,
+int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
+   uint64_t fence_value,
unsigned int timeout_ms)
 {
unsigned long end_jiffies = msecs_to_jiffies(timeout_ms) + jiffies;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -192,7 +192,7 @@ struct device_queue_manager {
uint16_tvmid_pasid[VMID_NUM];
uint64_tpipelines_addr;
uint64_tfence_gpu_addr;
-   unsigned int*fence_addr;
+   uint64_t*fence_addr;
struct kfd_mem_obj  *fence_mem;
boolactive_runlist;
int sched_policy;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -345,7 +345,7 @@ fail_create_runlist_ib:
 }
 
 int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address,
-   uint32_t fence_value)
+   uint64_t fence_value)
 {
uint32_t *buffer, size;
int retval = 0;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -283,7 +283,7 @@ static int pm_unmap_queues_v9(struct pac
 }
 
 static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer,
-   uint64_t fence_address, uint32_t fence_value)
+   uint64_t fence_address, uint64_t fence_value)
 {
struct pm4_mes_query_status *packet;
 
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c
@@ -263,7 +263,7 @@ static int pm_unmap_queues_vi(struct pac
 }
 
 static int pm_query_status_vi(struct packet_manager *pm, uint32_t *buffer,
-   uint64_t fence_address, uint32_t fence_value)
+   uint64_t fence_address, uint64_t fence_value)
 {
struct pm4_mes_query_status *packet;
 
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1006,8 +1006,8 @@ int pqm_get_wave_state(struct process_qu
   u32 *ctl_stack_used_size,
   u32 *save_area_used_size);
 
-int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
- unsigned int fence_value,
+int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
+ uint64_t fence_value,

[PATCH] mm: page_poison: print page owner info when corruption is caught

2021-04-04 Thread Sergei Trofimovich

When page_poison detects page corruption it's useful to see who
freed a page recently to have a guess where write-after-free
corruption happens.

After this change corruption report has extra page_owner data.
Example report from real corruption:

pagealloc: memory corruption
e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
...
CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf 
#245
Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
...
Call Trace:
 [] show_stack+0x90/0xc0
 [] dump_stack+0x150/0x1c0
 [] __kernel_unpoison_pages+0x410/0x440
 [] get_page_from_freelist+0x1460/0x2ca0
 [] __alloc_pages_nodemask+0x3c0/0x660
 [] alloc_pages_vma+0xb0/0x500
 [] __handle_mm_fault+0x1230/0x1fe0
 [] handle_mm_fault+0x310/0x4e0
 [] ia64_do_page_fault+0x1f0/0xb80
 [] ia64_leave_kernel+0x0/0x270
page_owner tracks the page as freed
page allocated via order 0, migratetype Movable,
  gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
8173444098740
 __reset_page_owner+0x40/0x200
 free_pcp_prepare+0x4d0/0x600
 free_unref_page+0x20/0x1c0
 __put_page+0x110/0x1a0
 migrate_pages+0x16d0/0x1dc0
 compact_zone+0xfc0/0x1aa0
 proactive_compact_node+0xd0/0x1e0
 kcompactd+0x550/0x600
 kthread+0x2c0/0x2e0
 call_payload+0x50/0x80

Here we can see that page was freed by page migration but something
managed to write to it afterwards.

CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
 mm/page_poison.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index 65cdf844c8ad..ef2a1eab13d7 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b)
return error && !(error & (error - 1));
 }
 
-static void check_poison_mem(unsigned char *mem, size_t bytes)
+static void check_poison_mem(struct page *page, unsigned char *mem, size_t 
bytes)
 {
static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
unsigned char *start;
@@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes)
print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
end - start + 1, 1);
dump_stack();
+   dump_page_owner(page);
 }
 
 static void unpoison_page(struct page *page)
@@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
 * that is freed to buddy. Thus no extra check is done to
 * see if a page was poisoned.
 */
-   check_poison_mem(addr, PAGE_SIZE);
+   check_poison_mem(page, addr, PAGE_SIZE);
kunmap_atomic(addr);
 }
 
-- 
2.31.1

[PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect

2021-04-01 Thread Suren Baghdasaryan

From: Nadav Amit 

Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0cpu1cpu2

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
[ page-fault ]
...
wp_page_copy()
 cow_user_page()
  [ copy page ]
[ write to old
  page ]
...
 set_pte_at_notify()

A similar scenario can happen:

cpu0cpu1cpu2cpu3

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
userfaultfd_writeprotect()
[ write-unprotect ]
[ deferred TLB flush]
[ page-fault ]
wp_page_copy()
 cow_user_page()
 [ copy page ]
 ...[ write to page ]
set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit 
Suggested-by: Yu Zhao 
Reviewed-by: Peter Xu 
Tested-by: Peter Xu 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Minchan Kim 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: [5.9+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
---
 mm/memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 656d90a75cf8..fe6e92de9bec 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2825,6 +2825,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
 
+   /*
+* Userfaultfd write-protect can defer flushes. Ensure the TLB
+* is flushed in this case before copying.
+*/
+   if (unlikely(userfaultfd_wp(vmf->vma) &&
+mm_tlb_flush_pending(vmf->vma->vm_mm)))
+   flush_tlb_page(vmf->vma, vmf->address);
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*
-- 
2.31.0.291.g576ba9dcdaf-goog

[PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect

2021-04-01 Thread Suren Baghdasaryan

From: Nadav Amit 

Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0cpu1cpu2

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
[ page-fault ]
...
wp_page_copy()
 cow_user_page()
  [ copy page ]
[ write to old
  page ]
...
 set_pte_at_notify()

A similar scenario can happen:

cpu0cpu1cpu2cpu3

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
userfaultfd_writeprotect()
[ write-unprotect ]
[ deferred TLB flush]
[ page-fault ]
wp_page_copy()
 cow_user_page()
 [ copy page ]
 ...[ write to page ]
set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit 
Suggested-by: Yu Zhao 
Reviewed-by: Peter Xu 
Tested-by: Peter Xu 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Minchan Kim 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: [5.9+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
---
 mm/memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 14470ceaf3f2..3f33651a2a39 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2810,6 +2810,14 @@ static int do_wp_page(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
 
+   /*
+* Userfaultfd write-protect can defer flushes. Ensure the TLB
+* is flushed in this case before copying.
+*/
+   if (unlikely(userfaultfd_wp(vmf->vma) &&
+mm_tlb_flush_pending(vmf->vma->vm_mm)))
+   flush_tlb_page(vmf->vma, vmf->address);
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*
-- 
2.31.0.291.g576ba9dcdaf-goog

Re: [PATCH] drm/amdkfd: dqm fence memory corruption

2021-03-26 Thread Felix Kuehling

Am 2021-03-26 um 5:38 a.m. schrieb Qu Huang:
> On 2021/1/28 5:50, Felix Kuehling wrote:
>> Am 2021-01-27 um 7:33 a.m. schrieb Qu Huang:
>>> Amdgpu driver uses 4-byte data type as DQM fence memory,
>>> and transmits GPU address of fence memory to microcode
>>> through query status PM4 message. However, query status
>>> PM4 message definition and microcode processing are all
>>> processed according to 8 bytes. Fence memory only allocates
>>> 4 bytes of memory, but microcode does write 8 bytes of memory,
>>> so there is a memory corruption.
>>
>> Thank you for pointing out that discrepancy. That's a good catch!
>>
>> I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
>> We should probably also fix up the query_status and
>> amdkfd_fence_wait_timeout function interfaces to use a 64 bit fence
>> values everywhere to be consistent.
>>
>> Regards,
>>    Felix
> Hi Felix, Thanks for your advice, please check v2 at
> https://lore.kernel.org/patchwork/patch/1372584/

Thank you for the reminder. I somehow missed your v2 patch on the
mailing list. I have reviewed and applied it to amd-staging-drm-next now.

Regards,
  Felix


> Thanks,
> Qu.
>>
>>
>>>
>>> Signed-off-by: Qu Huang 
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> index e686ce2..8b38d0c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct
>>> device_queue_manager *dqm)
>>>   pr_debug("Allocating fence memory\n");
>>>     /* allocate fence memory on the gart */
>>> -    retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
>>> +    retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
>>>   &dqm->fence_mem);
>>>     if (retval)
>

Re: [PATCH] drm/amdkfd: dqm fence memory corruption

2021-03-26 Thread Qu Huang


On 2021/1/28 5:50, Felix Kuehling wrote:

Am 2021-01-27 um 7:33 a.m. schrieb Qu Huang:

Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.


Thank you for pointing out that discrepancy. That's a good catch!

I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
We should probably also fix up the query_status and
amdkfd_fence_wait_timeout function interfaces to use a 64 bit fence
values everywhere to be consistent.

Regards,
   Felix
Hi Felix, Thanks for your advice, please check v2 at 
https://lore.kernel.org/patchwork/patch/1372584/

Thanks,
Qu.





Signed-off-by: Qu Huang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e686ce2..8b38d0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
pr_debug("Allocating fence memory\n");
  
  	/* allocate fence memory on the gart */

-   retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
+   retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
&dqm->fence_mem);
  
  	if (retval)

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Minchan Kim

On Wed, Mar 24, 2021 at 11:02:47PM +0300, Dmitry Osipenko wrote:
> 24.03.2021 22:57, Minchan Kim пишет:
> > On Wed, Mar 24, 2021 at 10:49:58PM +0300, Dmitry Osipenko wrote:
> >> 24.03.2021 22:43, Dmitry Osipenko пишет:
> >>> 24.03.2021 22:20, Minchan Kim пишет:
>   static int __init cma_sysfs_init(void)
>   {
>  -int i = 0;
>  +struct kobject *cma_kobj_root;
>  +struct cma_kobject *cma_kobj;
>   struct cma *cma;
>  +unsigned int i;
> >>>
>   while (--i >= 0) {
> >>>
> >>> Do you realize that this doesn't work anymore?
> >>>
>   cma = &cma_areas[i];
>  -kobject_put(&cma->stat->kobj);
>  -}
>   
>  -kfree(cma_stats);
>  -kobject_put(cma_kobj);
>  +kobject_put(&cma->cma_kobj->kobj);
>  +kfree(cma->cma_kobj);
> >>>
> >>> Freeing a null pointer?
> >>>
>  +cma->cma_kobj = NULL;
>  +}
>  +kobject_put(cma_kobj_root);
> >>>
> >>
> >> Please try to simulate the errors and check that error path is working
> >> properly in the next version.
> >>
> >> Alternatively, we could remove the cma_kobj_release entirely, like Greg
> >> suggested previously, and then don't care about cleaning up at all.
> > 
> > Does he suggested it to remove cma_kobj_release?(Initially, I did but
> > was rejected from Greg)
> > 
> 
> Alright, I haven't followed the previous threads fully and only saw the
> reply where he suggested to removed it.

No problem. I just posted it new version. Hopefully, it tastes good
for you. ;-)

Thanks for the review!

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Dmitry Osipenko

24.03.2021 22:57, Minchan Kim пишет:
> On Wed, Mar 24, 2021 at 10:49:58PM +0300, Dmitry Osipenko wrote:
>> 24.03.2021 22:43, Dmitry Osipenko пишет:
>>> 24.03.2021 22:20, Minchan Kim пишет:
  static int __init cma_sysfs_init(void)
  {
 -  int i = 0;
 +  struct kobject *cma_kobj_root;
 +  struct cma_kobject *cma_kobj;
struct cma *cma;
 +  unsigned int i;
>>>
while (--i >= 0) {
>>>
>>> Do you realize that this doesn't work anymore?
>>>
cma = &cma_areas[i];
 -  kobject_put(&cma->stat->kobj);
 -  }
  
 -  kfree(cma_stats);
 -  kobject_put(cma_kobj);
 +  kobject_put(&cma->cma_kobj->kobj);
 +  kfree(cma->cma_kobj);
>>>
>>> Freeing a null pointer?
>>>
 +  cma->cma_kobj = NULL;
 +  }
 +  kobject_put(cma_kobj_root);
>>>
>>
>> Please try to simulate the errors and check that error path is working
>> properly in the next version.
>>
>> Alternatively, we could remove the cma_kobj_release entirely, like Greg
>> suggested previously, and then don't care about cleaning up at all.
> 
> Does he suggested it to remove cma_kobj_release?(Initially, I did but
> was rejected from Greg)
> 

Alright, I haven't followed the previous threads fully and only saw the
reply where he suggested to removed it.

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Minchan Kim

On Wed, Mar 24, 2021 at 10:49:58PM +0300, Dmitry Osipenko wrote:
> 24.03.2021 22:43, Dmitry Osipenko пишет:
> > 24.03.2021 22:20, Minchan Kim пишет:
> >>  static int __init cma_sysfs_init(void)
> >>  {
> >> -  int i = 0;
> >> +  struct kobject *cma_kobj_root;
> >> +  struct cma_kobject *cma_kobj;
> >>struct cma *cma;
> >> +  unsigned int i;
> > 
> >>while (--i >= 0) {
> > 
> > Do you realize that this doesn't work anymore?
> > 
> >>cma = &cma_areas[i];
> >> -  kobject_put(&cma->stat->kobj);
> >> -  }
> >>  
> >> -  kfree(cma_stats);
> >> -  kobject_put(cma_kobj);
> >> +  kobject_put(&cma->cma_kobj->kobj);
> >> +  kfree(cma->cma_kobj);
> > 
> > Freeing a null pointer?
> > 
> >> +  cma->cma_kobj = NULL;
> >> +  }
> >> +  kobject_put(cma_kobj_root);
> > 
> 
> Please try to simulate the errors and check that error path is working
> properly in the next version.
> 
> Alternatively, we could remove the cma_kobj_release entirely, like Greg
> suggested previously, and then don't care about cleaning up at all.

Does he suggested it to remove cma_kobj_release?(Initially, I did but
was rejected from Greg)

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Minchan Kim

On Wed, Mar 24, 2021 at 08:53:26PM +0100, David Hildenbrand wrote:
> On 24.03.21 20:45, John Hubbard wrote:
> > On 3/24/21 12:20 PM, Minchan Kim wrote:
> > > struct cma_stat's lifespan for cma_sysfs is different with
> > > struct cma because kobject for sysfs requires dynamic object
> > > while CMA is static object[1]. When CMA is initialized,
> > > it couldn't use slab to allocate cma_stat since slab was not
> > > initialized yet. Thus, it allocates the dynamic object
> > > in subsys_initcall.
> > > 
> > > However, the cma allocation can happens before subsys_initcall
> > > then, it goes crash.
> > > 
> > > Dmitry reported[2]:
> > > 
> > > ..
> > > [1.226190] [] (cma_sysfs_alloc_pages_count) from 
> > > [] (cma_alloc+0x153/0x274)
> > > [1.226720] [] (cma_alloc) from [] 
> > > (__alloc_from_contiguous+0x37/0x8c)
> > > [1.227272] [] (__alloc_from_contiguous) from [] 
> > > (atomic_pool_init+0x7b/0x126)
> > > [1.233596] [] (atomic_pool_init) from [] 
> > > (do_one_initcall+0x45/0x1e4)
> > > [1.234188] [] (do_one_initcall) from [] 
> > > (kernel_init_freeable+0x157/0x1a6)
> > > [1.234741] [] (kernel_init_freeable) from [] 
> > > (kernel_init+0xd/0xe0)
> > > [1.235289] [] (kernel_init) from [] 
> > > (ret_from_fork+0x11/0x1c)
> > > 
> > > This patch moves those statistic fields of cma_stat into struct cma
> > > and introduces cma_kobject wrapper to follow kobject's rule.
> > > 
> > > At the same time, it fixes other routines based on suggestions[3][4].
> > > 
> > > [1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/
> > > [2] 
> > > https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/
> > > [3] 
> > > https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/
> > > [4] 
> > > https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/
> > > 
> > > Reported-by: Dmitry Osipenko 
> > > Tested-by: Dmitry Osipenko 
> > > Suggested-by: Dmitry Osipenko 
> > > Suggested-by: John Hubbard 
> > > Suggested-by: Matthew Wilcox 
> > > Signed-off-by: Minchan Kim 
> > > ---
> > > I belive it's worth to have separate patch rather than replacing
> > > original patch. It will also help to merge without conflict
> > > since we already filed other patch based on it.
> > > Strictly speaking, separating fix part and readbility part
> > > in this patch would be better but it's gray to separate them
> > > since most code in this patch was done while we were fixing
> > > the bug. Since we don't release it yet, I hope it will work.
> > > Otherwise, I can send a replacement patch inclucing all of
> > > changes happend until now with gathering SoB.
> > 
> > If we still have a choice, we should not merge a patch that has a known
> > serious problem, such as a crash. That's only done if the broken problematic
> > patch has already been committed to a tree that doesn't allow rebasing,
> > such as of course the main linux.git.
> > 
> > Here, I *think* it's just in linux-next and mmotm, so we still are allowed
> > to fix the original patch.
> 
> Yes, that's what we should do in case it's not upstream yet. Clean resend +
> re-apply.

Okay, let me replace the original one including all other patches.

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread David Hildenbrand


On 24.03.21 20:45, John Hubbard wrote:

On 3/24/21 12:20 PM, Minchan Kim wrote:

struct cma_stat's lifespan for cma_sysfs is different with
struct cma because kobject for sysfs requires dynamic object
while CMA is static object[1]. When CMA is initialized,
it couldn't use slab to allocate cma_stat since slab was not
initialized yet. Thus, it allocates the dynamic object
in subsys_initcall.

However, the cma allocation can happens before subsys_initcall
then, it goes crash.

Dmitry reported[2]:

..
[1.226190] [] (cma_sysfs_alloc_pages_count) from [] 
(cma_alloc+0x153/0x274)
[1.226720] [] (cma_alloc) from [] 
(__alloc_from_contiguous+0x37/0x8c)
[1.227272] [] (__alloc_from_contiguous) from [] 
(atomic_pool_init+0x7b/0x126)
[1.233596] [] (atomic_pool_init) from [] 
(do_one_initcall+0x45/0x1e4)
[1.234188] [] (do_one_initcall) from [] 
(kernel_init_freeable+0x157/0x1a6)
[1.234741] [] (kernel_init_freeable) from [] 
(kernel_init+0xd/0xe0)
[1.235289] [] (kernel_init) from [] 
(ret_from_fork+0x11/0x1c)

This patch moves those statistic fields of cma_stat into struct cma
and introduces cma_kobject wrapper to follow kobject's rule.

At the same time, it fixes other routines based on suggestions[3][4].

[1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/
[2] 
https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/
[3] 
https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/
[4] 
https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/

Reported-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Suggested-by: Dmitry Osipenko 
Suggested-by: John Hubbard 
Suggested-by: Matthew Wilcox 
Signed-off-by: Minchan Kim 
---
I belive it's worth to have separate patch rather than replacing
original patch. It will also help to merge without conflict
since we already filed other patch based on it.
Strictly speaking, separating fix part and readbility part
in this patch would be better but it's gray to separate them
since most code in this patch was done while we were fixing
the bug. Since we don't release it yet, I hope it will work.
Otherwise, I can send a replacement patch inclucing all of
changes happend until now with gathering SoB.


If we still have a choice, we should not merge a patch that has a known
serious problem, such as a crash. That's only done if the broken problematic
patch has already been committed to a tree that doesn't allow rebasing,
such as of course the main linux.git.

Here, I *think* it's just in linux-next and mmotm, so we still are allowed
to fix the original patch.


Yes, that's what we should do in case it's not upstream yet. Clean 
resend + re-apply.



--
Thanks,

David / dhildenb

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Dmitry Osipenko

24.03.2021 22:43, Dmitry Osipenko пишет:
> 24.03.2021 22:20, Minchan Kim пишет:
>>  static int __init cma_sysfs_init(void)
>>  {
>> -int i = 0;
>> +struct kobject *cma_kobj_root;
>> +struct cma_kobject *cma_kobj;
>>  struct cma *cma;
>> +unsigned int i;
> 
>>  while (--i >= 0) {
> 
> Do you realize that this doesn't work anymore?
> 
>>  cma = &cma_areas[i];
>> -kobject_put(&cma->stat->kobj);
>> -}
>>  
>> -kfree(cma_stats);
>> -kobject_put(cma_kobj);
>> +kobject_put(&cma->cma_kobj->kobj);
>> +kfree(cma->cma_kobj);
> 
> Freeing a null pointer?
> 
>> +cma->cma_kobj = NULL;
>> +}
>> +kobject_put(cma_kobj_root);
> 

Please try to simulate the errors and check that error path is working
properly in the next version.

Alternatively, we could remove the cma_kobj_release entirely, like Greg
suggested previously, and then don't care about cleaning up at all.

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Minchan Kim

On Wed, Mar 24, 2021 at 10:43:49PM +0300, Dmitry Osipenko wrote:
> 24.03.2021 22:20, Minchan Kim пишет:
> >  static int __init cma_sysfs_init(void)
> >  {
> > -   int i = 0;
> > +   struct kobject *cma_kobj_root;
> > +   struct cma_kobject *cma_kobj;
> > struct cma *cma;
> > +   unsigned int i;
> 
> > while (--i >= 0) {
> 
> Do you realize that this doesn't work anymore?
> 
> > cma = &cma_areas[i];
> > -   kobject_put(&cma->stat->kobj);
> > -   }
> >  
> > -   kfree(cma_stats);
> > -   kobject_put(cma_kobj);
> > +   kobject_put(&cma->cma_kobj->kobj);
> > +   kfree(cma->cma_kobj);
> 
> Freeing a null pointer?

Need coffee.
 
diff --git a/mm/cma_sysfs.c b/mm/cma_sysfs.c
index a670a80aad6f..73463be08df7 100644
--- a/mm/cma_sysfs.c
+++ b/mm/cma_sysfs.c
@@ -79,8 +79,7 @@ static int __init cma_sysfs_init(void)
struct kobject *cma_kobj_root;
struct cma_kobject *cma_kobj;
struct cma *cma;
-   unsigned int i;
-   int err;
+   int i, err;

cma_kobj_root = kobject_create_and_add("cma", mm_kobj);
if (!cma_kobj_root)
@@ -108,10 +107,7 @@ static int __init cma_sysfs_init(void)
 out:
while (--i >= 0) {
cma = &cma_areas[i];
-
kobject_put(&cma->cma_kobj->kobj);
-   kfree(cma->cma_kobj);
-   cma->cma_kobj = NULL;
}
kobject_put(cma_kobj_root);

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread John Hubbard


On 3/24/21 12:20 PM, Minchan Kim wrote:

struct cma_stat's lifespan for cma_sysfs is different with
struct cma because kobject for sysfs requires dynamic object
while CMA is static object[1]. When CMA is initialized,
it couldn't use slab to allocate cma_stat since slab was not
initialized yet. Thus, it allocates the dynamic object
in subsys_initcall.

However, the cma allocation can happens before subsys_initcall
then, it goes crash.

Dmitry reported[2]:

..
[1.226190] [] (cma_sysfs_alloc_pages_count) from [] 
(cma_alloc+0x153/0x274)
[1.226720] [] (cma_alloc) from [] 
(__alloc_from_contiguous+0x37/0x8c)
[1.227272] [] (__alloc_from_contiguous) from [] 
(atomic_pool_init+0x7b/0x126)
[1.233596] [] (atomic_pool_init) from [] 
(do_one_initcall+0x45/0x1e4)
[1.234188] [] (do_one_initcall) from [] 
(kernel_init_freeable+0x157/0x1a6)
[1.234741] [] (kernel_init_freeable) from [] 
(kernel_init+0xd/0xe0)
[1.235289] [] (kernel_init) from [] 
(ret_from_fork+0x11/0x1c)

This patch moves those statistic fields of cma_stat into struct cma
and introduces cma_kobject wrapper to follow kobject's rule.

At the same time, it fixes other routines based on suggestions[3][4].

[1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/
[2] 
https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/
[3] 
https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/
[4] 
https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/

Reported-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Suggested-by: Dmitry Osipenko 
Suggested-by: John Hubbard 
Suggested-by: Matthew Wilcox 
Signed-off-by: Minchan Kim 
---
I belive it's worth to have separate patch rather than replacing
original patch. It will also help to merge without conflict
since we already filed other patch based on it.
Strictly speaking, separating fix part and readbility part
in this patch would be better but it's gray to separate them
since most code in this patch was done while we were fixing
the bug. Since we don't release it yet, I hope it will work.
Otherwise, I can send a replacement patch inclucing all of
changes happend until now with gathering SoB.


If we still have a choice, we should not merge a patch that has a known
serious problem, such as a crash. That's only done if the broken problematic
patch has already been committed to a tree that doesn't allow rebasing,
such as of course the main linux.git.

Here, I *think* it's just in linux-next and mmotm, so we still are allowed
to fix the original patch.

thanks,
--
John Hubbard
NVIDIA

Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Dmitry Osipenko

24.03.2021 22:20, Minchan Kim пишет:
>  static int __init cma_sysfs_init(void)
>  {
> - int i = 0;
> + struct kobject *cma_kobj_root;
> + struct cma_kobject *cma_kobj;
>   struct cma *cma;
> + unsigned int i;

>   while (--i >= 0) {

Do you realize that this doesn't work anymore?

>   cma = &cma_areas[i];
> - kobject_put(&cma->stat->kobj);
> - }
>  
> - kfree(cma_stats);
> - kobject_put(cma_kobj);
> + kobject_put(&cma->cma_kobj->kobj);
> + kfree(cma->cma_kobj);

Freeing a null pointer?

> + cma->cma_kobj = NULL;
> + }
> + kobject_put(cma_kobj_root);

[PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count

2021-03-24 Thread Minchan Kim

struct cma_stat's lifespan for cma_sysfs is different with
struct cma because kobject for sysfs requires dynamic object
while CMA is static object[1]. When CMA is initialized,
it couldn't use slab to allocate cma_stat since slab was not
initialized yet. Thus, it allocates the dynamic object
in subsys_initcall.

However, the cma allocation can happens before subsys_initcall
then, it goes crash.

Dmitry reported[2]:

..
[1.226190] [] (cma_sysfs_alloc_pages_count) from [] 
(cma_alloc+0x153/0x274)
[1.226720] [] (cma_alloc) from [] 
(__alloc_from_contiguous+0x37/0x8c)
[1.227272] [] (__alloc_from_contiguous) from [] 
(atomic_pool_init+0x7b/0x126)
[1.233596] [] (atomic_pool_init) from [] 
(do_one_initcall+0x45/0x1e4)
[1.234188] [] (do_one_initcall) from [] 
(kernel_init_freeable+0x157/0x1a6)
[1.234741] [] (kernel_init_freeable) from [] 
(kernel_init+0xd/0xe0)
[1.235289] [] (kernel_init) from [] 
(ret_from_fork+0x11/0x1c)

This patch moves those statistic fields of cma_stat into struct cma
and introduces cma_kobject wrapper to follow kobject's rule.

At the same time, it fixes other routines based on suggestions[3][4].

[1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/
[2] 
https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/
[3] 
https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/
[4] 
https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/

Reported-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Suggested-by: Dmitry Osipenko 
Suggested-by: John Hubbard 
Suggested-by: Matthew Wilcox 
Signed-off-by: Minchan Kim 
---
I belive it's worth to have separate patch rather than replacing
original patch. It will also help to merge without conflict
since we already filed other patch based on it.
Strictly speaking, separating fix part and readbility part
in this patch would be better but it's gray to separate them
since most code in this patch was done while we were fixing
the bug. Since we don't release it yet, I hope it will work.
Otherwise, I can send a replacement patch inclucing all of
changes happend until now with gathering SoB.

 mm/cma.c   |  4 +--
 mm/cma.h   | 25 +++---
 mm/cma_sysfs.c | 88 --
 3 files changed, 65 insertions(+), 52 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 90e27458ddb7..08c45157911a 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -509,11 +509,11 @@ struct page *cma_alloc(struct cma *cma, size_t count, 
unsigned int align,
 out:
if (page) {
count_vm_event(CMA_ALLOC_SUCCESS);
-   cma_sysfs_alloc_pages_count(cma, count);
+   cma_sysfs_account_success_pages(cma, count);
} else {
count_vm_event(CMA_ALLOC_FAIL);
if (cma)
-   cma_sysfs_fail_pages_count(cma, count);
+   cma_sysfs_account_fail_pages(cma, count);
}
 
return page;
diff --git a/mm/cma.h b/mm/cma.h
index 95d1aa2d808a..37b9b7858c8e 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -5,12 +5,8 @@
 #include 
 #include 
 
-struct cma_stat {
-   spinlock_t lock;
-   /* the number of CMA page successful allocations */
-   unsigned long nr_pages_succeeded;
-   /* the number of CMA page allocation failures */
-   unsigned long nr_pages_failed;
+struct cma_kobject {
+   struct cma *cma;
struct kobject kobj;
 };
 
@@ -27,7 +23,12 @@ struct cma {
 #endif
char name[CMA_MAX_NAME];
 #ifdef CONFIG_CMA_SYSFS
-   struct cma_stat *stat;
+   /* the number of CMA page successful allocations */
+   atomic64_t nr_pages_succeeded;
+   /* the number of CMA page allocation failures */
+   atomic64_t nr_pages_failed;
+   /* kobject requires dynamic object */
+   struct cma_kobject *cma_kobj;
 #endif
 };
 
@@ -40,10 +41,12 @@ static inline unsigned long cma_bitmap_maxno(struct cma 
*cma)
 }
 
 #ifdef CONFIG_CMA_SYSFS
-void cma_sysfs_alloc_pages_count(struct cma *cma, size_t count);
-void cma_sysfs_fail_pages_count(struct cma *cma, size_t count);
+void cma_sysfs_account_success_pages(struct cma *cma, unsigned long nr_pages);
+void cma_sysfs_account_fail_pages(struct cma *cma, unsigned long nr_pages);
 #else
-static inline void cma_sysfs_alloc_pages_count(struct cma *cma, size_t count) 
{};
-static inline void cma_sysfs_fail_pages_count(struct cma *cma, size_t count) 
{};
+static inline void cma_sysfs_account_success_pages(struct cma *cma,
+  unsigned long nr_pages) {};
+static inline void cma_sysfs_account_fail_pages(struct cma *cma,
+   unsigned long nr_pages) {};
 #endif
 #endif
diff --git a/mm/cma_sysfs.c b/mm/cma_sysfs.c
index 3134b2b3a96d..a670a80aad6f 100644
--- a/mm/cma_sysfs.c
+++ b/mm/cma_sysfs.c
@@ -11,50 +11,54 @@
 
 #include "cma.h"
 
-static struct cma_stat *cma_stats;
-
-void cma_sysfs

Re: [PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-22 Thread Florian Fainelli




On 3/22/2021 8:00 AM, Greg Kroah-Hartman wrote:
> On Mon, Mar 22, 2021 at 07:44:05PM +0530, Naresh Kamboju wrote:
>> On Mon, 22 Mar 2021 at 18:14, Greg Kroah-Hartman
>>  wrote:
>>>
>>> From: Thomas Bogendoerfer 
>>>
>>> [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ]
>>>
>>> BMIPS is one of the few platforms that do change the exception base.
>>> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
>>> with kernel_end") we started seeing BMIPS boards fail to boot with the
>>> built-in FDT being corrupted.
>>>
>>> Before the cited commit, early allocations would be in the [kernel_end,
>>> RAM_END] range, but after commit they would be within [RAM_START +
>>> PAGE_SIZE, RAM_END].
>>>
>>> The custom exception base handler that is installed by
>>> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
>>> memory region allocated by unflatten_and_copy_device_tree() thus
>>> corrupting the FDT used by the kernel.
>>>
>>> To fix this, we need to perform an early reservation of the custom
>>> exception space. Additional we reserve the first 4k (1k for R3k) for
>>> either normal exception vector space (legacy CPUs) or special vectors
>>> like cache exceptions.
>>>
>>> Huge thanks to Serge for analysing and proposing a solution to this
>>> issue.
>>>
>>> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
>>> kernel_end")
>>> Reported-by: Kamal Dasu 
>>> Debugged-by: Serge Semin 
>>> Acked-by: Mike Rapoport 
>>> Tested-by: Florian Fainelli 
>>> Reviewed-by: Serge Semin 
>>> Signed-off-by: Thomas Bogendoerfer 
>>> Signed-off-by: Sasha Levin 
>>> ---
>>>  arch/mips/include/asm/traps.h|  3 +++
>>>  arch/mips/kernel/cpu-probe.c |  6 ++
>>>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>>>  arch/mips/kernel/traps.c | 10 +-
>>
>> mipc tinyconfig and allnoconfig builds failed on stable-rc 5.10 branch
>>
>> make --silent --keep-going --jobs=8
>> O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=mips
>> CROSS_COMPILE=mips-linux-gnu- 'CC=sccache mips-linux-gnu-gcc'
>> 'HOSTCC=sccache gcc'
>> WARNING: modpost: vmlinux.o(.text+0x6a88): Section mismatch in
>> reference from the function reserve_exception_space() to the function
>> .meminit.text:memblock_reserve()
>> The function reserve_exception_space() references
>> the function __meminit memblock_reserve().
>> This is often because reserve_exception_space lacks a __meminit
>> annotation or the annotation of memblock_reserve is wrong.
>>
>> FATAL: modpost: Section mismatches detected.
>> Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.
>> make[2]: *** [/builds/linux/scripts/Makefile.modpost:59:
>> vmlinux.symvers] Error 1
>>
>> Here is the list of build failed,
>>  - gcc-8-allnoconfig
>>  - gcc-8-tinyconfig
>>  - gcc-9-allnoconfig
>>  - gcc-9-tinyconfig
>>  - gcc-10-allnoconfig
>>  - gcc-10-tinyconfig
>>  - clang-10-tinyconfig
>>  - clang-10-allnoconfig
>>  - clang-11-allnoconfig
>>  - clang-11-tinyconfig
>>  - clang-12-tinyconfig
>>  - clang-12-allnoconfig
>>
>> Reported-by: Naresh Kamboju 
>>
>> link:
>> https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/1117167411#L142
>>
>> steps to reproduce:
>> ---
>> # TuxMake is a command line tool and Python library that provides
>> # portable and repeatable Linux kernel builds across a variety of
>> # architectures, toolchains, kernel configurations, and make targets.
>> #
>> # TuxMake supports the concept of runtimes.
>> # See https://docs.tuxmake.org/runtimes/, for that to work it requires
>> # that you install podman or docker on your system.
>> #
>> # To install tuxmake on your system globally:
>> # sudo pip3 install -U tuxmake
>> #
>> # See https://docs.tuxmake.org/ for complete documentation.
>>
>>
>> tuxmake --runtime podman --target-arch mips --toolchain gcc-10
>> --kconfig tinyconfig
>>
> 
> Thanks for letting me know, I'll go drop this and push out a new -rc...

2dcb39645441 ("memblock: do not start bottom-up allocations with
kernel_end") is only present in v5.11 AFAICT, so not applicable for
v5.10. I did not observe this problem on v5.10.
-- 
Florian

Re: [PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-22 Thread Greg Kroah-Hartman

On Mon, Mar 22, 2021 at 07:44:05PM +0530, Naresh Kamboju wrote:
> On Mon, 22 Mar 2021 at 18:14, Greg Kroah-Hartman
>  wrote:
> >
> > From: Thomas Bogendoerfer 
> >
> > [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ]
> >
> > BMIPS is one of the few platforms that do change the exception base.
> > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> > with kernel_end") we started seeing BMIPS boards fail to boot with the
> > built-in FDT being corrupted.
> >
> > Before the cited commit, early allocations would be in the [kernel_end,
> > RAM_END] range, but after commit they would be within [RAM_START +
> > PAGE_SIZE, RAM_END].
> >
> > The custom exception base handler that is installed by
> > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> > memory region allocated by unflatten_and_copy_device_tree() thus
> > corrupting the FDT used by the kernel.
> >
> > To fix this, we need to perform an early reservation of the custom
> > exception space. Additional we reserve the first 4k (1k for R3k) for
> > either normal exception vector space (legacy CPUs) or special vectors
> > like cache exceptions.
> >
> > Huge thanks to Serge for analysing and proposing a solution to this
> > issue.
> >
> > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> > kernel_end")
> > Reported-by: Kamal Dasu 
> > Debugged-by: Serge Semin 
> > Acked-by: Mike Rapoport 
> > Tested-by: Florian Fainelli 
> > Reviewed-by: Serge Semin 
> > Signed-off-by: Thomas Bogendoerfer 
> > Signed-off-by: Sasha Levin 
> > ---
> >  arch/mips/include/asm/traps.h|  3 +++
> >  arch/mips/kernel/cpu-probe.c |  6 ++
> >  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
> >  arch/mips/kernel/traps.c | 10 +-
> 
> mipc tinyconfig and allnoconfig builds failed on stable-rc 5.10 branch
> 
> make --silent --keep-going --jobs=8
> O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=mips
> CROSS_COMPILE=mips-linux-gnu- 'CC=sccache mips-linux-gnu-gcc'
> 'HOSTCC=sccache gcc'
> WARNING: modpost: vmlinux.o(.text+0x6a88): Section mismatch in
> reference from the function reserve_exception_space() to the function
> .meminit.text:memblock_reserve()
> The function reserve_exception_space() references
> the function __meminit memblock_reserve().
> This is often because reserve_exception_space lacks a __meminit
> annotation or the annotation of memblock_reserve is wrong.
> 
> FATAL: modpost: Section mismatches detected.
> Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.
> make[2]: *** [/builds/linux/scripts/Makefile.modpost:59:
> vmlinux.symvers] Error 1
> 
> Here is the list of build failed,
>  - gcc-8-allnoconfig
>  - gcc-8-tinyconfig
>  - gcc-9-allnoconfig
>  - gcc-9-tinyconfig
>  - gcc-10-allnoconfig
>  - gcc-10-tinyconfig
>  - clang-10-tinyconfig
>  - clang-10-allnoconfig
>  - clang-11-allnoconfig
>  - clang-11-tinyconfig
>  - clang-12-tinyconfig
>  - clang-12-allnoconfig
> 
> Reported-by: Naresh Kamboju 
> 
> link:
> https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/1117167411#L142
> 
> steps to reproduce:
> ---
> # TuxMake is a command line tool and Python library that provides
> # portable and repeatable Linux kernel builds across a variety of
> # architectures, toolchains, kernel configurations, and make targets.
> #
> # TuxMake supports the concept of runtimes.
> # See https://docs.tuxmake.org/runtimes/, for that to work it requires
> # that you install podman or docker on your system.
> #
> # To install tuxmake on your system globally:
> # sudo pip3 install -U tuxmake
> #
> # See https://docs.tuxmake.org/ for complete documentation.
> 
> 
> tuxmake --runtime podman --target-arch mips --toolchain gcc-10
> --kconfig tinyconfig
> 

Thanks for letting me know, I'll go drop this and push out a new -rc...

greg k-h

Re: [PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-22 Thread Naresh Kamboju

On Mon, 22 Mar 2021 at 18:14, Greg Kroah-Hartman
 wrote:
>
> From: Thomas Bogendoerfer 
>
> [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ]
>
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
>
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
>
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
>
> To fix this, we need to perform an early reservation of the custom
> exception space. Additional we reserve the first 4k (1k for R3k) for
> either normal exception vector space (legacy CPUs) or special vectors
> like cache exceptions.
>
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
>
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Acked-by: Mike Rapoport 
> Tested-by: Florian Fainelli 
> Reviewed-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 
> Signed-off-by: Sasha Levin 
> ---
>  arch/mips/include/asm/traps.h|  3 +++
>  arch/mips/kernel/cpu-probe.c |  6 ++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/traps.c | 10 +-

mipc tinyconfig and allnoconfig builds failed on stable-rc 5.10 branch

make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=mips
CROSS_COMPILE=mips-linux-gnu- 'CC=sccache mips-linux-gnu-gcc'
'HOSTCC=sccache gcc'
WARNING: modpost: vmlinux.o(.text+0x6a88): Section mismatch in
reference from the function reserve_exception_space() to the function
.meminit.text:memblock_reserve()
The function reserve_exception_space() references
the function __meminit memblock_reserve().
This is often because reserve_exception_space lacks a __meminit
annotation or the annotation of memblock_reserve is wrong.

FATAL: modpost: Section mismatches detected.
Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.
make[2]: *** [/builds/linux/scripts/Makefile.modpost:59:
vmlinux.symvers] Error 1

Here is the list of build failed,
 - gcc-8-allnoconfig
 - gcc-8-tinyconfig
 - gcc-9-allnoconfig
 - gcc-9-tinyconfig
 - gcc-10-allnoconfig
 - gcc-10-tinyconfig
 - clang-10-tinyconfig
 - clang-10-allnoconfig
 - clang-11-allnoconfig
 - clang-11-tinyconfig
 - clang-12-tinyconfig
 - clang-12-allnoconfig

Reported-by: Naresh Kamboju 

link:
https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/1117167411#L142

steps to reproduce:
---
# TuxMake is a command line tool and Python library that provides
# portable and repeatable Linux kernel builds across a variety of
# architectures, toolchains, kernel configurations, and make targets.
#
# TuxMake supports the concept of runtimes.
# See https://docs.tuxmake.org/runtimes/, for that to work it requires
# that you install podman or docker on your system.
#
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.


tuxmake --runtime podman --target-arch mips --toolchain gcc-10
--kconfig tinyconfig


--
Linaro LKFT
https://lkft.linaro.org

[PATCH 4.14 34/43] PCI: rpadlpar: Fix potential drc_name corruption in store functions

2021-03-22 Thread Greg Kroah-Hartman

From: Tyrel Datwyler 

commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -39,12 +39,11 @@ static ssize_t add_slot_store(struct kob
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -70,12 +69,11 @@ static ssize_t remove_slot_store(struct
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

[PATCH 4.9 17/25] PCI: rpadlpar: Fix potential drc_name corruption in store functions

2021-03-22 Thread Greg Kroah-Hartman

From: Tyrel Datwyler 

commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -39,12 +39,11 @@ static ssize_t add_slot_store(struct kob
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -70,12 +69,11 @@ static ssize_t remove_slot_store(struct
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

[PATCH 4.4 10/14] PCI: rpadlpar: Fix potential drc_name corruption in store functions

2021-03-22 Thread Greg Kroah-Hartman

From: Tyrel Datwyler 

commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -39,12 +39,11 @@ static ssize_t add_slot_store(struct kob
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -70,12 +69,11 @@ static ssize_t remove_slot_store(struct
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

[PATCH 4.19 33/43] PCI: rpadlpar: Fix potential drc_name corruption in store functions

2021-03-22 Thread Greg Kroah-Hartman

From: Tyrel Datwyler 

commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

[PATCH 5.4 48/60] PCI: rpadlpar: Fix potential drc_name corruption in store functions

2021-03-22 Thread Greg Kroah-Hartman

From: Tyrel Datwyler 

commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

[PATCH 5.10 137/157] PCI: rpadlpar: Fix potential drc_name corruption in store functions

2021-03-22 Thread Greg Kroah-Hartman

From: Tyrel Datwyler 

commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

[PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-22 Thread Greg Kroah-Hartman

From: Thomas Bogendoerfer 

[ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ]

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
kernel_end")
Reported-by: Kamal Dasu 
Debugged-by: Serge Semin 
Acked-by: Mike Rapoport 
Tested-by: Florian Fainelli 
Reviewed-by: Serge Semin 
Signed-off-by: Thomas Bogendoerfer 
Signed-off-by: Sasha Levin 
---
 arch/mips/include/asm/traps.h|  3 +++
 arch/mips/kernel/cpu-probe.c |  6 ++
 arch/mips/kernel/cpu-r3k-probe.c |  3 +++
 arch/mips/kernel/traps.c | 10 +-
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6a0864bb604d..9038b91e2d8c 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -24,6 +24,9 @@ extern void (*board_ebase_setup)(void);
 extern void (*board_cache_error_setup)(void);
 
 extern int register_nmi_notifier(struct notifier_block *nb);
+extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
+
+#define VECTORSPACING 0x100/* for EI/VI mode */
 
 #define nmi_notifier(fn, pri)  \
 ({ \
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 31cb9199197c..21794db53c05 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "fpu-probe.h"
@@ -1619,6 +1620,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
c->cputype = CPU_BMIPS3300;
__cpu_name[cpu] = "Broadcom BMIPS3300";
set_elf_platform(cpu, "bmips3300");
+   reserve_exception_space(0x400, VECTORSPACING * 64);
break;
case PRID_IMP_BMIPS43XX: {
int rev = c->processor_id & PRID_REV_MASK;
@@ -1629,6 +1631,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS4380";
set_elf_platform(cpu, "bmips4380");
c->options |= MIPS_CPU_RIXI;
+   reserve_exception_space(0x400, VECTORSPACING * 64);
} else {
c->cputype = CPU_BMIPS4350;
__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1645,6 +1648,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS5000";
set_elf_platform(cpu, "bmips5000");
c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+   reserve_exception_space(0x1000, VECTORSPACING * 64);
break;
}
 }
@@ -2124,6 +2128,8 @@ void cpu_probe(void)
if (cpu == 0)
__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+   reserve_exception_space(0, 0x1000);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c
index abdbbe8c5a43..af654771918c 100644
--- a/arch/mips/kernel/cpu-r3k-probe.c
+++ b/arch/mips/kernel/cpu-r3k-probe.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fpu-probe.h"
 
@@ -158,6 +159,8 @@ void cpu_probe(void)
cpu_set_fpu_opts(c);
else
cpu_set_nofpu_opts(c);
+
+   reserve_exception_space(0, 0x400);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..808b8b61ded1 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs 
*regs)
nmi_exit();
 }
 
-#define VECTORSPACING 0x100/* for EI/VI mode */
-
 unsigned long ebase;
 EXPORT_SYMBOL_GPL(ebase);
 unsigned long exception_handlers[32];
 unsigned long vi_handlers[64];
 
+void reserve_exception_space(phys_addr_t addr, unsigned l

[PATCH 5.11 099/120] PCI: rpadlpar: Fix potential drc_name corruption in store functions

2021-03-22 Thread Greg Kroah-Hartman

From: Tyrel Datwyler 

commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.

Both add_slot_store() and remove_slot_store() try to fix up the
drc_name copied from the store buffer by placing a NUL terminator at
nbyte + 1 or in place of a '\n' if present. However, the static buffer
that we copy the drc_name data into is not zeroed and can contain
anything past the n-th byte.

This is problematic if a '\n' byte appears in that buffer after nbytes
and the string copied into the store buffer was not NUL terminated to
start with as the strchr() search for a '\n' byte will mark this
incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte.

Additionally it will cause us to overwrite that '\n' byte on the stack
with NUL, potentially corrupting data on the stack.

The following debugging shows an example of the drmgr utility writing
"PHB 4543" to the add_slot sysfs attribute, but add_slot_store()
logging a corrupted string value.

  drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
  add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string
is NUL terminated when copied into the static drc_name buffer.
Further, since the string is now NUL terminated the code only needs to
change '\n' to '\0' when present.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
[mpe: Reformat change log and add mention of possible stack corruption]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |   14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

[tip: irq/core] genirq/matrix: Prevent allocation counter corruption

2021-03-19 Thread tip-bot2 for Vitaly Kuznetsov

The following commit has been merged into the irq/core branch of tip:

Commit-ID: c93a5e20c3c2dabef8ea360a3d3f18c6f68233ab
Gitweb:
https://git.kernel.org/tip/c93a5e20c3c2dabef8ea360a3d3f18c6f68233ab
Author:Vitaly Kuznetsov 
AuthorDate:Fri, 19 Mar 2021 12:18:23 +01:00
Committer: Thomas Gleixner 
CommitterDate: Fri, 19 Mar 2021 22:52:11 +01:00

genirq/matrix: Prevent allocation counter corruption

When irq_matrix_free() is called for an unallocated vector the
managed_allocated and total_allocated counters get out of sync with the
real state of the matrix. Later, when the last interrupt is freed, these
counters will underflow resulting in UINTMAX because the counters are
unsigned.

While this is certainly a problem of the calling code, this can be catched
in the allocator by checking the allocation bit for the to be freed vector
which simplifies debugging.

An example of the problem described above:
https://lore.kernel.org/lkml/20210318192819.636943...@linutronix.de/

Add the missing sanity check and emit a warning when it triggers.

Suggested-by: Thomas Gleixner 
Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20210319111823.1105248-1-vkuzn...@redhat.com

---
 kernel/irq/matrix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 6f8b1d1..578596e 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -422,7 +422,9 @@ void irq_matrix_free(struct irq_matrix *m, unsigned int cpu,
if (WARN_ON_ONCE(bit < m->alloc_start || bit >= m->alloc_end))
return;
 
-   clear_bit(bit, cm->alloc_map);
+   if (WARN_ON_ONCE(!test_and_clear_bit(bit, cm->alloc_map)))
+   return;
+
cm->allocated--;
if(managed)
cm->managed_allocated--;

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-18 Thread Kees Cook

On Thu, Mar 18, 2021 at 01:56:05PM +0100, Vlastimil Babka wrote:
> I was going to suggest adding a panic_on_taint parameter... but turns out it 
> was
> already added last year! And various memory corruption detections already use
> TAINT_BAD_PAGE, including SLUB.
> If anything's missing an add_taint() it can be added, and with the parameter 
> you
> should get what you want.

Ah-ha! That works too. I hadn't seen that -- I wonder if I can wire some
other hardening things up to that. (e.g. refactor BUG_ON_CORRUPTION
finally.)

-- 
Kees Cook

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-18 Thread Vlastimil Babka

On 3/18/21 6:48 AM, Kees Cook wrote:
> On Tue, Mar 09, 2021 at 07:18:32PM +0100, Vlastimil Babka wrote:
>> On 3/9/21 7:14 PM, Georgi Djakov wrote:
>> > Hi Vlastimil,
>> > 
>> > Thanks for the comment!
>> > 
>> > On 3/9/21 17:09, Vlastimil Babka wrote:
>> >> On 3/9/21 2:47 PM, Georgi Djakov wrote:
>> >>> Being able to stop the system immediately when a memory corruption
>> >>> is detected is crucial to finding the source of it. This is very
>> >>> useful when the memory can be inspected with kdump or other tools.
>> >>
>> >> Is this in some testing scenarios where you would also use e.g. 
>> >> panic_on_warn?
>> >> We could hook to that. If not, we could introduce a new
>> >> panic_on_memory_corruption that would apply also for debug_pagealloc and 
>> >> whatnot?
>> > 
>> > I would prefer that we not tie it with panic_on_warn - there might be lots 
>> > of
>> > new code in multiple subsystems, so hitting some WARNing while testing is 
>> > not
>> > something unexpected.
>> > 
>> > Introducing an additional panic_on_memory_corruption would work, but i 
>> > noticed
>> > that we already have slub_debug and thought to re-use that. But indeed, 
>> > аdding
>> > an option to panic in for example bad_page() sounds also useful, if that's 
>> > what
>> > you suggest.
>> 
>> Yes, that would be another example.
>> Also CCing Kees for input, as besides the "kdump ASAP for debugging" case, I 
>> can
>> imagine security hardening folks could be interested in the "somebody might 
>> have
>> just failed to pwn the kernel, better panic than let them continue" angle. 
>> But
>> I'm naive wrt security, so it might be a stupid idea :)
> 
> I've really wanted such things, but Linus has been pretty adamant about
> not wanting to provide new "panic" paths (or even BUG usage[1]). It
> seems that panic_on_warn remains the way to get this behavior,
> with the understanding that WARN should only be produced on
> expected-to-be-impossible situations[1].
> 
> Hitting a WARN while testing should result in either finding and fixing
> a real bug, or removing the WARN in favor of pr_warn(). :)

I was going to suggest adding a panic_on_taint parameter... but turns out it was
already added last year! And various memory corruption detections already use
TAINT_BAD_PAGE, including SLUB.
If anything's missing an add_taint() it can be added, and with the parameter you
should get what you want.

> -Kees
> 
> [1] 
> https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on
>

[PATCH rdma-next 0/6] Fix memory corruption in CM

2021-03-18 Thread Leon Romanovsky

From: Leon Romanovsky 

Hi,

This series from Mark fixes long standing bug in CM migration logic,
reported by Ryan [1].

Thanks

[1] 
https://lore.kernel.org/linux-rdma/cafmmrnx9cg--nunzjfm8ywqfaetsmawv4eogkb3a0+hnjdt...@mail.gmail.com/

Mark Zhang (6):
  Revert "IB/cm: Mark stale CM id's whenever the mad agent was
unregistered"
  IB/cm: Remove "mad_agent" parameter of ib_cancel_mad
  IB/cm: Remove "mad_agent" parameter of ib_modify_mad
  IB/cm: Clear all associated AV's ports when remove a cm device
  IB/cm: Add lock protection when access av/alt_av's port of a cm_id
  IB/cm: Initialize av before acquire the spin lock in cm_lap_handler

 drivers/infiniband/core/cm.c   | 359 -
 drivers/infiniband/core/mad.c  |  17 +-
 drivers/infiniband/core/sa_query.c |   4 +-
 include/rdma/ib_mad.h  |  27 ++-
 4 files changed, 222 insertions(+), 185 deletions(-)

--
2.30.2

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-17 Thread Kees Cook

On Tue, Mar 09, 2021 at 07:18:32PM +0100, Vlastimil Babka wrote:
> On 3/9/21 7:14 PM, Georgi Djakov wrote:
> > Hi Vlastimil,
> > 
> > Thanks for the comment!
> > 
> > On 3/9/21 17:09, Vlastimil Babka wrote:
> >> On 3/9/21 2:47 PM, Georgi Djakov wrote:
> >>> Being able to stop the system immediately when a memory corruption
> >>> is detected is crucial to finding the source of it. This is very
> >>> useful when the memory can be inspected with kdump or other tools.
> >>
> >> Is this in some testing scenarios where you would also use e.g. 
> >> panic_on_warn?
> >> We could hook to that. If not, we could introduce a new
> >> panic_on_memory_corruption that would apply also for debug_pagealloc and 
> >> whatnot?
> > 
> > I would prefer that we not tie it with panic_on_warn - there might be lots 
> > of
> > new code in multiple subsystems, so hitting some WARNing while testing is 
> > not
> > something unexpected.
> > 
> > Introducing an additional panic_on_memory_corruption would work, but i 
> > noticed
> > that we already have slub_debug and thought to re-use that. But indeed, 
> > аdding
> > an option to panic in for example bad_page() sounds also useful, if that's 
> > what
> > you suggest.
> 
> Yes, that would be another example.
> Also CCing Kees for input, as besides the "kdump ASAP for debugging" case, I 
> can
> imagine security hardening folks could be interested in the "somebody might 
> have
> just failed to pwn the kernel, better panic than let them continue" angle. But
> I'm naive wrt security, so it might be a stupid idea :)

I've really wanted such things, but Linus has been pretty adamant about
not wanting to provide new "panic" paths (or even BUG usage[1]). It
seems that panic_on_warn remains the way to get this behavior,
with the understanding that WARN should only be produced on
expected-to-be-impossible situations[1].

Hitting a WARN while testing should result in either finding and fixing
a real bug, or removing the WARN in favor of pr_warn(). :)

-Kees

[1] 
https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on

-- 
Kees Cook

Re: BUG: KFENCE: memory corruption in usb_get_device_descriptor

2021-03-17 Thread Greg Kroah-Hartman

On Wed, Mar 17, 2021 at 04:56:15PM +0530, Naresh Kamboju wrote:
> On Wed, 17 Mar 2021 at 15:34, Greg Kroah-Hartman
>  wrote:
> >
> > On Wed, Mar 17, 2021 at 02:28:40PM +0530, Naresh Kamboju wrote:
> > > While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64
> > > Hikey device the following KFENCE bug was found.
> > >
> > > Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash.
> > > kernel BUG log:
> >
> > What USB traffic are you having here?
> 
> This is  getting triggered while booting the device.
> We are not running any traffic.

Ah, so this is device probe time.

> > And has this ever not triggered?
> 
> No.
> It was not triggered before.
> Since CONFIG_KFENCE=y is added to our builds recently we are able to
> reproduce always on recent builds.
> 
> Steps to reproduce:
> 1) Build arm64 kernel Image with this given config.
>   - tuxmake --runtime podman --target-arch arm64 --toolchain gcc-9
> --kconfig defconfig --kconfig-add
> https://builds.tuxbuild.com/1pfztfszUNcDwOAyMrw2wPMKNfc/config
> 2) Boot arm64 hikey hi6220 device
> 3) While booting the device you will get to see this kernel BUG:
> 
> [   18.243075] BUG: KFENCE: memory corruption in
> usb_get_device_descriptor+0x80/0xb0
> [   18.813861] BUG: KFENCE: memory corruption in
> __usbnet_read_cmd.isra.0+0xd0/0x1a0

There was a warning before this, from the hub code, when reading from
this device as well.  Perhaps this is just a side affect of the real
memory corruption issue somewhere else?

Bisection would be nice, but I'm placing odds on this always being an
issue here in this driver code...

thanks for the report.

greg k-h

Re: [PATCH v2] rpadlpar: fix potential drc_name corruption in store functions

2021-03-17 Thread Michael Ellerman

On Mon, 15 Mar 2021 15:48:21 -0600, Tyrel Datwyler wrote:
> Both add_slot_store() and remove_slot_store() try to fix up the drc_name
> copied from the store buffer by placing a NULL terminator at nbyte + 1
> or in place of a '\n' if present. However, the static buffer that we
> copy the drc_name data into is not zeored and can contain anything past
> the n-th byte. This is problematic if a '\n' byte appears in that buffer
> after nbytes and the string copied into the store buffer was not NULL
> terminated to start with as the strchr() search for a '\n' byte will mark
> this incorrectly as the end of the drc_name string resulting in a drc_name
> string that contains garbage data after the n-th byte. The following
> debugging shows an example of the drmgr utility writing "PHB 4543" to
> the add_slot sysfs attribute, but add_slot_store logging a corrupted
> string value.
> 
> [...]

Applied to powerpc/fixes.

[1/1] rpadlpar: fix potential drc_name corruption in store functions
  https://git.kernel.org/powerpc/c/cc7a0bb058b85ea03db87169c60c7cfdd5d34678

cheers

Re: BUG: KFENCE: memory corruption in usb_get_device_descriptor

2021-03-17 Thread Naresh Kamboju

On Wed, 17 Mar 2021 at 15:34, Greg Kroah-Hartman
 wrote:
>
> On Wed, Mar 17, 2021 at 02:28:40PM +0530, Naresh Kamboju wrote:
> > While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64
> > Hikey device the following KFENCE bug was found.
> >
> > Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash.
> > kernel BUG log:
>
> What USB traffic are you having here?

This is  getting triggered while booting the device.
We are not running any traffic.

>
> And has this ever not triggered?

No.
It was not triggered before.
Since CONFIG_KFENCE=y is added to our builds recently we are able to
reproduce always on recent builds.

Steps to reproduce:
1) Build arm64 kernel Image with this given config.
  - tuxmake --runtime podman --target-arch arm64 --toolchain gcc-9
--kconfig defconfig --kconfig-add
https://builds.tuxbuild.com/1pfztfszUNcDwOAyMrw2wPMKNfc/config
2) Boot arm64 hikey hi6220 device
3) While booting the device you will get to see this kernel BUG:

[   18.243075] BUG: KFENCE: memory corruption in
usb_get_device_descriptor+0x80/0xb0
[   18.813861] BUG: KFENCE: memory corruption in
__usbnet_read_cmd.isra.0+0xd0/0x1a0

link:
https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v5.12-rc2-487-gf296bfd5cd04/testrun/4155170/suite/linux-log-parser/test/check-kernel-bug-2388200/log

- Naresh

Re: BUG: KFENCE: memory corruption in usb_get_device_descriptor

2021-03-17 Thread Greg Kroah-Hartman

On Wed, Mar 17, 2021 at 02:28:40PM +0530, Naresh Kamboju wrote:
> While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64
> Hikey device the following KFENCE bug was found.
> 
> Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash.
> kernel BUG log:

What USB traffic are you having here?

And has this ever not triggered?

thanks,

greg k-h

BUG: KFENCE: memory corruption in usb_get_device_descriptor

2021-03-17 Thread Naresh Kamboju

While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64
Hikey device the following KFENCE bug was found.

Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash.
kernel BUG log:

[   18.243075] BUG: KFENCE: memory corruption in
usb_get_device_descriptor+0x80/0xb0
[   18.243075]
[   18.253016] Corrupted memory at 0xbb4567e7 [ ! ! . . . . .
. . . . . . . . . ] (in kfence-#118):
[   18.263817]  usb_get_device_descriptor+0x80/0xb0
[   18.268978]  hub_port_init+0x3e8/0xb70
[   18.273189]  hub_event+0x578/0x1628
[   18.277109]  process_one_work+0x1c8/0x488
[   18.281593]  worker_thread+0x54/0x428
[   18.285692]  kthread+0x120/0x158
[   18.289320]  ret_from_fork+0x10/0x34
[   18.293330]
[   18.295018] kfence-#118 [0xb55b54e8-0x1fc57965,
size=18, cache=kmalloc-128] allocated by task 204:

[   18.306534]  usb_get_device_descriptor+0x40/0xb0
[   18.311693]  hub_port_init+0x3e8/0xb70
[   18.315900]  hub_event+0x578/0x1628
[   18.319819]  process_one_work+0x1c8/0x488
[   18.324301]  worker_thread+0x54/0x428
[   18.328397]  kthread+0x120/0x158
[   18.332024]  ret_from_fork+0x10/0x34
root@hikey:~# [   18.33603. /lava-2388200/environment
3]
[   18.338544] CPU: 7 PID: 204 Comm: kworker/7:2 Not tainted 5.12.0-rc2 #2
[   18.345902] Hardware name: HiKey Development Board (DT)
[   18.351715] Workqueue: usb_hub_wq hub_event
[   18.356428] 
==
. /lava[   18.805771]
==
[   18.813861] BUG: KFENCE: memory corruption in
__usbnet_read_cmd.isra.0+0xd0/0x1a0
[   18.813861]
[   18.823804] Corrupted memory at 0x7cedde53 [ ! ! ! . . . .
. . . . . . . . . ] (in kfence-#121):
[   18.834603]  __usbnet_read_cmd.isra.0+0xd0/0x1a0
[   18.839765]  usbnet_read_cmd+0x70/0xa8
[   18.843965]  asix_read_cmd+0x60/0xa0
[   18.847981]  ax88772a_hw_reset+0x148/0x468
[   18.852570]  ax88772_bind+0x1c8/0x310
[   18.856683]  usbnet_probe+0x29c/0x7d8
[   18.860788]  usb_probe_interface+0xe0/0x2c0
-[   18.865236]  really_probe+0xf0/0x4d8
[   18.869016]  driver_probe_device+0xfc/0x168
[   18.873430]  __device_attach_driver+0x94/0x120
[   18.878116]  bus_for_each_drv+0x80/0xd8
[   18.882165]  __device_attach+0xfc/0x180
[   18.886214]  device_initial_probe+0x1c/0x28
[   18.890627]  bus_probe_device+0xa4/0xb0
[   18.894676]  device_add+0x3a8/0x7e8
[   18.898357]  usb_set_configuration+0x488/0x8e8
[   18.903044]  usb_generic_driver_probe+0x58/0x98
[   18.907823]  usb_probe_device+0x44/0x108
[   18.911964]  really_probe+0xf0/0x4d8
2[   18.924600]  driver_probe_device+0xfc/0x168
[   18.937379]  __device_attach_driver+0x94/0x120
[   18.950406]  bus_for_each_drv+0x80/0xd8
[   18.960383]  __device_attach+0xfc/0x180
[   18.969078]  device_initial_probe+0x1c/0x28
3[   18.977855]  bus_probe_device+0xa4/0xb0
[   18.986226]  device_add+0x3a8/0x7e8
[   18.994190]  usb_new_device+0x1e0/0x590
[   19.002475]  hub_event+0x5ec/0x1628
[   19.010352]  process_one_work+0x1c8/0x488
[   19.018792]  worker_thread+0x54/0x428
[   19.026921]  kthread+0x120/0x158
[   19.034614]  ret_from_fork+0x10/0x34
8[   19.042712]
[   19.048623] kfence-#121 [0x8a763b3c-0x8a763b3c,
size=1, cache=kmalloc-128] allocated by task 204:
[   19.063612]  __usbnet_read_cmd.isra.0+0x60/0x1a0
[   19.072924]  usbnet_read_cmd+0x70/0xa8
[   19.081325]  asix_read_cmd+0x60/0xa0
[   19.089503]  ax88772a_hw_reset+0x148/0x468
8[   19.098163]  ax88772_bind+0x1c8/0x310
[   19.106312]  usbnet_probe+0x29c/0x7d8
[   19.114407]  usb_probe_interface+0xe0/0x2c0
[   19.122950]  really_probe+0xf0/0x4d8
[   19.130811]  driver_probe_device+0xfc/0x168
[   19.139273]  __device_attach_driver+0x94/0x120
[   19.148025]  bus_for_each_drv+0x80/0xd8
[   19.156148]  __device_attach+0xfc/0x180
2[   19.164287]  device_initial_probe+0x1c/0x28
[   19.172782]  bus_probe_device+0xa4/0xb0
[   19.180948]  device_add+0x3a8/0x7e8
[   19.188758]  usb_set_configuration+0x488/0x8e8
[   19.197455]  usb_generic_driver_probe+0x58/0x98
[   19.206120]  usb_probe_device+0x44/0x108
[   19.214175]  really_probe+0xf0/0x4d8
0[   19.221885]  driver_probe_device+0xfc/0x168
[   19.230202]  __device_attach_driver+0x94/0x120
[   19.238794]  bus_for_each_drv+0x80/0xd8
[   19.246780]  __device_attach+0xfc/0x180
[   19.254790]  device_initial_probe+0x1c/0x28
[   19.263145]  bus_probe_device+0xa4/0xb0
[   19.27]  device_add+0x3a8/0x7e8
0[   19.278682]  usb_new_device+0x1e0/0x590
[   19.286583]  hub_event+0x5ec/0x1628
[   19.294055]  process_one_work+0x1c8/0x488
[   19.302102]  worker_thread+0x54/0x428
[   19.309743]  kthread+0x120/0x158
[   19.316894]  ret_from_fork+0x10/0x34
[   19.324306]
[   19.329495] CPU: 7 PID: 204 Comm: kworker/7:2 Tainted: GB
  5.12.0-rc2 #2
/[   19.341360] Hardware name: HiKey Development Board (DT)
[   19.350439] Workqueue: usb_hub_wq hub_event


Reported-by: Naresh Kamboju 

metadata:
  git branch: master
  git repo: https

[PATCH v2] rpadlpar: fix potential drc_name corruption in store functions

2021-03-15 Thread Tyrel Datwyler

Both add_slot_store() and remove_slot_store() try to fix up the drc_name
copied from the store buffer by placing a NULL terminator at nbyte + 1
or in place of a '\n' if present. However, the static buffer that we
copy the drc_name data into is not zeored and can contain anything past
the n-th byte. This is problematic if a '\n' byte appears in that buffer
after nbytes and the string copied into the store buffer was not NULL
terminated to start with as the strchr() search for a '\n' byte will mark
this incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte. The following
debugging shows an example of the drmgr utility writing "PHB 4543" to
the add_slot sysfs attribute, but add_slot_store logging a corrupted
string value.

[135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
[135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string is
NULL terminated when copied into the static drc_name buffer. Further,
since the string is now NULL terminated the code only needs to change
'\n' to '\0' when present.

Signed-off-by: Tyrel Datwyler 
---

Changes in v2:
* use strscpy instead of memcpy (suggested by mpe)

 drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
b/drivers/pci/hotplug/rpadlpar_sysfs.c
index cdbfa5df3a51..dbfa0b55d31a 100644
--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct 
kobj_attribute *attr,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)
-- 
2.27.0

Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-15 Thread Tyrel Datwyler

On 3/14/21 7:52 PM, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> On 3/13/21 1:17 AM, Michal Suchánek wrote:
>>> On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote:
 Both add_slot_store() and remove_slot_store() try to fix up the drc_name
 copied from the store buffer by placing a NULL terminator at nbyte + 1
 or in place of a '\n' if present. However, the static buffer that we
 copy the drc_name data into is not zeored and can contain anything past
 the n-th byte. This is problematic if a '\n' byte appears in that buffer
 after nbytes and the string copied into the store buffer was not NULL
 terminated to start with as the strchr() search for a '\n' byte will mark
 this incorrectly as the end of the drc_name string resulting in a drc_name
 string that contains garbage data after the n-th byte. The following
 debugging shows an example of the drmgr utility writing "PHB 4543" to
 the add_slot sysfs attribute, but add_slot_store logging a corrupted
 string value.

 [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
 [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

 Fix this by NULL terminating the string when we copy it into our static
 buffer by coping nbytes + 1 of data from the store buffer. The code has
>>> Why is it OK to copy nbytes + 1 and why is it expected that the buffer
>>> contains a nul after the content?
>>
>> It is my understanding that the store function buffer is allocated as a
>> zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user 
>> data
>> into. Anything after nbytes would therefore be zeroed.
> 
> I think that's true, but it would be nice if we didn't have to rely on
> that obscure detail in order for this code to be correct & understandable.

I think its a security guarantee, but I guess barring a comment that explicitly
outlines the correctness it probably isn't obvious.

> 
>>> Isn't it much saner to just nul terminate the string after copying?
>>
>> At the cost of an extra line of code, sure.
> 
> Is there a reason we can't use strscpy()? That should deal with all the
> corner cases around the string copy, and then all you have to do is look
> for a newline and turn it into nul.

Fine with me. I'll spin v2 with strscpy().

-Tyrel

> 
> cheers
>

[PATCH 5.11 301/306] mm/userfaultfd: fix memory corruption due to writeprotect

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Nadav Amit 

commit 6ce64428d62026a10cb5d80138ff2f90cc21d367 upstream.

Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0cpu1cpu2

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
[ page-fault ]
...
wp_page_copy()
 cow_user_page()
  [ copy page ]
[ write to old
  page ]
...
 set_pte_at_notify()

A similar scenario can happen:

cpu0cpu1cpu2cpu3

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
userfaultfd_writeprotect()
[ write-unprotect ]
[ deferred TLB flush]
[ page-fault ]
wp_page_copy()
 cow_user_page()
 [ copy page ]
 ...[ write to page ]
set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit 
Suggested-by: Yu Zhao 
Reviewed-by: Peter Xu 
Tested-by: Peter Xu 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Minchan Kim 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: [5.9+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 
---
 mm/memory.c |8 
 1 file changed, 8 insertions(+)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3092,6 +3092,14 @@ static vm_fault_t do_wp_page(struct vm_f
return handle_userfault(vmf, VM_UFFD_WP);
}

+   /*
+* Userfaultfd write-protect can defer flushes. Ensure the TLB
+* is flushed in this case before copying.
+*/
+   if (unlikely(userfaultfd_wp(vmf->vma) &&
+mm_tlb_flush_pending(vmf->vma->vm_mm)))
+   flush_tlb_page(vmf->vma, vmf->address);
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*

[PATCH 5.10 283/290] mm/userfaultfd: fix memory corruption due to writeprotect

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Nadav Amit 

commit 6ce64428d62026a10cb5d80138ff2f90cc21d367 upstream.

Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0cpu1cpu2

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
[ page-fault ]
...
wp_page_copy()
 cow_user_page()
  [ copy page ]
[ write to old
  page ]
...
 set_pte_at_notify()

A similar scenario can happen:

cpu0cpu1cpu2cpu3

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
userfaultfd_writeprotect()
[ write-unprotect ]
[ deferred TLB flush]
[ page-fault ]
wp_page_copy()
 cow_user_page()
 [ copy page ]
 ...[ write to page ]
set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit 
Suggested-by: Yu Zhao 
Reviewed-by: Peter Xu 
Tested-by: Peter Xu 
Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Minchan Kim 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: [5.9+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 
---
 mm/memory.c |8 
 1 file changed, 8 insertions(+)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3090,6 +3090,14 @@ static vm_fault_t do_wp_page(struct vm_f
return handle_userfault(vmf, VM_UFFD_WP);
}

+   /*
+* Userfaultfd write-protect can defer flushes. Ensure the TLB
+* is flushed in this case before copying.
+*/
+   if (unlikely(userfaultfd_wp(vmf->vma) &&
+mm_tlb_flush_pending(vmf->vma->vm_mm)))
+   flush_tlb_page(vmf->vma, vmf->address);
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*

[PATCH 5.11 224/306] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Dan Carpenter 

commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream.

The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 
1")
Signed-off-by: Dan Carpenter 
Cc: stable 
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c |5 +
 1 file changed, 5 insertions(+)

--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -791,6 +791,7 @@ int rtw_check_beacon_data(struct adapter
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, WLAN_EID_SSID, &ie_len,
   pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p && ie_len > 0) {
+   ie_len = min_t(int, ie_len, sizeof(pbss_network->ssid.ssid));
memset(&pbss_network->ssid, 0, sizeof(struct ndis_802_11_ssid));
memcpy(pbss_network->ssid.ssid, p + 2, ie_len);
pbss_network->ssid.ssid_length = ie_len;
@@ -811,6 +812,7 @@ int rtw_check_beacon_data(struct adapter
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, WLAN_EID_SUPP_RATES, &ie_len,
   pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX);
memcpy(supportRate, p + 2, ie_len);
supportRateNum = ie_len;
}
@@ -819,6 +821,8 @@ int rtw_check_beacon_data(struct adapter
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, WLAN_EID_EXT_SUPP_RATES,
   &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len,
+  NDIS_802_11_LENGTH_RATES_EX - supportRateNum);
memcpy(supportRate + supportRateNum, p + 2, ie_len);
supportRateNum += ie_len;
}
@@ -934,6 +938,7 @@ int rtw_check_beacon_data(struct adapter
 
pht_cap->mcs.rx_mask[0] = 0xff;
pht_cap->mcs.rx_mask[1] = 0x0;
+   ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap));
memcpy(&pmlmepriv->htpriv.ht_cap, p + 2, ie_len);
}

[PATCH 5.10 228/290] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Dan Carpenter 

commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream.

The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 
1")
Signed-off-by: Dan Carpenter 
Cc: stable 
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c |5 +
 1 file changed, 5 insertions(+)

--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -791,6 +791,7 @@ int rtw_check_beacon_data(struct adapter
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len,
   pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p && ie_len > 0) {
+   ie_len = min_t(int, ie_len, sizeof(pbss_network->ssid.ssid));
memset(&pbss_network->ssid, 0, sizeof(struct ndis_802_11_ssid));
memcpy(pbss_network->ssid.ssid, p + 2, ie_len);
pbss_network->ssid.ssid_length = ie_len;
@@ -811,6 +812,7 @@ int rtw_check_beacon_data(struct adapter
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len,
   pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX);
memcpy(supportRate, p + 2, ie_len);
supportRateNum = ie_len;
}
@@ -819,6 +821,8 @@ int rtw_check_beacon_data(struct adapter
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_,
   &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len,
+  NDIS_802_11_LENGTH_RATES_EX - supportRateNum);
memcpy(supportRate + supportRateNum, p + 2, ie_len);
supportRateNum += ie_len;
}
@@ -934,6 +938,7 @@ int rtw_check_beacon_data(struct adapter
 
pht_cap->mcs.rx_mask[0] = 0xff;
pht_cap->mcs.rx_mask[1] = 0x0;
+   ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap));
memcpy(&pmlmepriv->htpriv.ht_cap, p + 2, ie_len);
}

[PATCH 5.10 159/290] kasan: fix memory corruption in kasan_bitops_tags test

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Andrey Konovalov 

[ Upstream commit e66e1799a76621003e5b04c9c057826a2152e103 ]

Since the hardware tag-based KASAN mode might not have a redzone that
comes after an allocated object (when kasan.mode=prod is enabled), the
kasan_bitops_tags() test ends up corrupting the next object in memory.

Change the test so it always accesses the redzone that lies within the
allocated object's boundaries.

Link: 
https://linux-review.googlesource.com/id/I67f51d1ee48f0a8d0fe2658c2a39e4879fe0832a
Link: 
https://lkml.kernel.org/r/7d452ce4ae35bb1988d2c9244dfea56cf2cc9315.1610733117.git.andreyk...@google.com
Signed-off-by: Andrey Konovalov 
Reviewed-by: Marco Elver 
Reviewed-by: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Branislav Rankov 
Cc: Catalin Marinas 
Cc: Dmitry Vyukov 
Cc: Evgenii Stepanov 
Cc: Kevin Brodsky 
Cc: Peter Collingbourne 
Cc: Vincenzo Frascino 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
---
 lib/test_kasan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 662f862702fc..400507f1e5db 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -737,13 +737,13 @@ static void kasan_bitops_tags(struct kunit *test)
return;
}
 
-   /* Allocation size will be rounded to up granule size, which is 16. */
-   bits = kzalloc(sizeof(*bits), GFP_KERNEL);
+   /* kmalloc-64 cache will be used and the last 16 bytes will be the 
redzone. */
+   bits = kzalloc(48, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bits);
 
-   /* Do the accesses past the 16 allocated bytes. */
-   kasan_bitops_modify(test, BITS_PER_LONG, &bits[1]);
-   kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, 
&bits[1]);
+   /* Do the accesses past the 48 allocated bytes, but within the redone. 
*/
+   kasan_bitops_modify(test, BITS_PER_LONG, (void *)bits + 48);
+   kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, (void 
*)bits + 48);
 
kfree(bits);
 }
-- 
2.30.1

[PATCH 5.11 156/306] kasan: fix memory corruption in kasan_bitops_tags test

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Andrey Konovalov 

[ Upstream commit e66e1799a76621003e5b04c9c057826a2152e103 ]

Since the hardware tag-based KASAN mode might not have a redzone that
comes after an allocated object (when kasan.mode=prod is enabled), the
kasan_bitops_tags() test ends up corrupting the next object in memory.

Change the test so it always accesses the redzone that lies within the
allocated object's boundaries.

Link: 
https://linux-review.googlesource.com/id/I67f51d1ee48f0a8d0fe2658c2a39e4879fe0832a
Link: 
https://lkml.kernel.org/r/7d452ce4ae35bb1988d2c9244dfea56cf2cc9315.1610733117.git.andreyk...@google.com
Signed-off-by: Andrey Konovalov 
Reviewed-by: Marco Elver 
Reviewed-by: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Branislav Rankov 
Cc: Catalin Marinas 
Cc: Dmitry Vyukov 
Cc: Evgenii Stepanov 
Cc: Kevin Brodsky 
Cc: Peter Collingbourne 
Cc: Vincenzo Frascino 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
---
 lib/test_kasan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 2947274cc2d3..5a2f104ca13f 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -737,13 +737,13 @@ static void kasan_bitops_tags(struct kunit *test)
return;
}
 
-   /* Allocation size will be rounded to up granule size, which is 16. */
-   bits = kzalloc(sizeof(*bits), GFP_KERNEL);
+   /* kmalloc-64 cache will be used and the last 16 bytes will be the 
redzone. */
+   bits = kzalloc(48, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bits);
 
-   /* Do the accesses past the 16 allocated bytes. */
-   kasan_bitops_modify(test, BITS_PER_LONG, &bits[1]);
-   kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, 
&bits[1]);
+   /* Do the accesses past the 48 allocated bytes, but within the redone. 
*/
+   kasan_bitops_modify(test, BITS_PER_LONG, (void *)bits + 48);
+   kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, (void 
*)bits + 48);
 
kfree(bits);
 }
-- 
2.30.1

[PATCH 5.10 134/290] udf: fix silent AED tagLocation corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Steven J. Magnani 

[ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ]

When extending a file, udf_do_extend_file() may enter following empty
indirect extent. At the end of udf_do_extend_file() we revert prev_epos
to point to the last written extent. However if we end up not adding any
further extent in udf_do_extend_file(), the reverting points prev_epos
into the header area of the AED and following updates of the extents
(in udf_update_extents()) will corrupt the header.

Make sure that we do not follow indirect extent if we are not going to
add any more extents so that returning back to the last written extent
works correctly.

Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org
Signed-off-by: Steven J. Magnani 
Signed-off-by: Jan Kara 
Signed-off-by: Sasha Levin 
---
 fs/udf/inode.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index bb89c3e43212..0dd2f93ac048 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -544,11 +544,14 @@ static int udf_do_extend_file(struct inode *inode,

udf_write_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
+
/*
-* We've rewritten the last extent but there may be empty
-* indirect extent after it - enter it.
+* We've rewritten the last extent. If we are going to add
+* more extents, we may need to enter possible following
+* empty indirect extent.
 */
-   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
+   if (new_block_bytes || prealloc_len)
+   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
}

/* Managed to do everything necessary? */
-- 
2.30.1

[PATCH 5.4 130/168] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Dan Carpenter 

commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream.

The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 
1")
Signed-off-by: Dan Carpenter 
Cc: stable 
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c |5 +
 1 file changed, 5 insertions(+)

--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -784,6 +784,7 @@ int rtw_check_beacon_data(struct adapter
/* SSID */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, 
(pbss_network->ie_length - _BEACON_IE_OFFSET_));
if (p && ie_len > 0) {
+   ie_len = min_t(int, ie_len, sizeof(pbss_network->ssid.ssid));
memset(&pbss_network->ssid, 0, sizeof(struct ndis_802_11_ssid));
memcpy(pbss_network->ssid.ssid, (p + 2), ie_len);
pbss_network->ssid.ssid_length = ie_len;
@@ -802,6 +803,7 @@ int rtw_check_beacon_data(struct adapter
/*  get supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, 
(pbss_network->ie_length - _BEACON_IE_OFFSET_));
if (p) {
+   ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX);
memcpy(supportRate, p + 2, ie_len);
supportRateNum = ie_len;
}
@@ -809,6 +811,8 @@ int rtw_check_beacon_data(struct adapter
/* get ext_supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, 
&ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len,
+  NDIS_802_11_LENGTH_RATES_EX - supportRateNum);
memcpy(supportRate + supportRateNum, p + 2, ie_len);
supportRateNum += ie_len;
}
@@ -922,6 +926,7 @@ int rtw_check_beacon_data(struct adapter
 
pht_cap->mcs.rx_mask[0] = 0xff;
pht_cap->mcs.rx_mask[1] = 0x0;
+   ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap));
memcpy(&pmlmepriv->htpriv.ht_cap, p + 2, ie_len);
}

[PATCH 4.19 092/120] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Dan Carpenter 

commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream.

The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 
1")
Signed-off-by: Dan Carpenter 
Cc: stable 
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c |5 +
 1 file changed, 5 insertions(+)

--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -794,6 +794,7 @@ int rtw_check_beacon_data(struct adapter
/* SSID */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, 
(pbss_network->ie_length - _BEACON_IE_OFFSET_));
if (p && ie_len > 0) {
+   ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid));
memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid));
memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len);
pbss_network->Ssid.SsidLength = ie_len;
@@ -812,6 +813,7 @@ int rtw_check_beacon_data(struct adapter
/*  get supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, 
(pbss_network->ie_length - _BEACON_IE_OFFSET_));
if (p) {
+   ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX);
memcpy(supportRate, p + 2, ie_len);
supportRateNum = ie_len;
}
@@ -819,6 +821,8 @@ int rtw_check_beacon_data(struct adapter
/* get ext_supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, 
&ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len,
+  NDIS_802_11_LENGTH_RATES_EX - supportRateNum);
memcpy(supportRate + supportRateNum, p + 2, ie_len);
supportRateNum += ie_len;
}
@@ -932,6 +936,7 @@ int rtw_check_beacon_data(struct adapter
 
pht_cap->mcs.rx_mask[0] = 0xff;
pht_cap->mcs.rx_mask[1] = 0x0;
+   ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap));
memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len);
}

[PATCH 5.4 064/168] udf: fix silent AED tagLocation corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Steven J. Magnani 

[ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ]

When extending a file, udf_do_extend_file() may enter following empty
indirect extent. At the end of udf_do_extend_file() we revert prev_epos
to point to the last written extent. However if we end up not adding any
further extent in udf_do_extend_file(), the reverting points prev_epos
into the header area of the AED and following updates of the extents
(in udf_update_extents()) will corrupt the header.

Make sure that we do not follow indirect extent if we are not going to
add any more extents so that returning back to the last written extent
works correctly.

Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org
Signed-off-by: Steven J. Magnani 
Signed-off-by: Jan Kara 
Signed-off-by: Sasha Levin 
---
 fs/udf/inode.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 97a192eb9949..507f8f910327 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -547,11 +547,14 @@ static int udf_do_extend_file(struct inode *inode,

udf_write_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
+
/*
-* We've rewritten the last extent but there may be empty
-* indirect extent after it - enter it.
+* We've rewritten the last extent. If we are going to add
+* more extents, we may need to enter possible following
+* empty indirect extent.
 */
-   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
+   if (new_block_bytes || prealloc_len)
+   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
}

/* Managed to do everything necessary? */
-- 
2.30.1

[PATCH 4.19 043/120] udf: fix silent AED tagLocation corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Steven J. Magnani 

[ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ]

When extending a file, udf_do_extend_file() may enter following empty
indirect extent. At the end of udf_do_extend_file() we revert prev_epos
to point to the last written extent. However if we end up not adding any
further extent in udf_do_extend_file(), the reverting points prev_epos
into the header area of the AED and following updates of the extents
(in udf_update_extents()) will corrupt the header.

Make sure that we do not follow indirect extent if we are not going to
add any more extents so that returning back to the last written extent
works correctly.

Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org
Signed-off-by: Steven J. Magnani 
Signed-off-by: Jan Kara 
Signed-off-by: Sasha Levin 
---
 fs/udf/inode.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 3bf89a633836..f5500d2a3879 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -540,11 +540,14 @@ static int udf_do_extend_file(struct inode *inode,

udf_write_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
+
/*
-* We've rewritten the last extent but there may be empty
-* indirect extent after it - enter it.
+* We've rewritten the last extent. If we are going to add
+* more extents, we may need to enter possible following
+* empty indirect extent.
 */
-   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
+   if (new_block_bytes || prealloc_len)
+   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
}

/* Managed to do everything necessary? */
-- 
2.30.1

[PATCH 5.11 121/306] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Thomas Bogendoerfer 

[ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ]

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
kernel_end")
Reported-by: Kamal Dasu 
Debugged-by: Serge Semin 
Acked-by: Mike Rapoport 
Tested-by: Florian Fainelli 
Reviewed-by: Serge Semin 
Signed-off-by: Thomas Bogendoerfer 
Signed-off-by: Sasha Levin 
---
 arch/mips/include/asm/traps.h|  3 +++
 arch/mips/kernel/cpu-probe.c |  6 ++
 arch/mips/kernel/cpu-r3k-probe.c |  3 +++
 arch/mips/kernel/traps.c | 10 +-
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6a0864bb604d..9038b91e2d8c 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -24,6 +24,9 @@ extern void (*board_ebase_setup)(void);
 extern void (*board_cache_error_setup)(void);
 
 extern int register_nmi_notifier(struct notifier_block *nb);
+extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
+
+#define VECTORSPACING 0x100/* for EI/VI mode */
 
 #define nmi_notifier(fn, pri)  \
 ({ \
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 31cb9199197c..21794db53c05 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "fpu-probe.h"
@@ -1619,6 +1620,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
c->cputype = CPU_BMIPS3300;
__cpu_name[cpu] = "Broadcom BMIPS3300";
set_elf_platform(cpu, "bmips3300");
+   reserve_exception_space(0x400, VECTORSPACING * 64);
break;
case PRID_IMP_BMIPS43XX: {
int rev = c->processor_id & PRID_REV_MASK;
@@ -1629,6 +1631,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS4380";
set_elf_platform(cpu, "bmips4380");
c->options |= MIPS_CPU_RIXI;
+   reserve_exception_space(0x400, VECTORSPACING * 64);
} else {
c->cputype = CPU_BMIPS4350;
__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1645,6 +1648,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS5000";
set_elf_platform(cpu, "bmips5000");
c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+   reserve_exception_space(0x1000, VECTORSPACING * 64);
break;
}
 }
@@ -2124,6 +2128,8 @@ void cpu_probe(void)
if (cpu == 0)
__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+   reserve_exception_space(0, 0x1000);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c
index abdbbe8c5a43..af654771918c 100644
--- a/arch/mips/kernel/cpu-r3k-probe.c
+++ b/arch/mips/kernel/cpu-r3k-probe.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fpu-probe.h"
 
@@ -158,6 +159,8 @@ void cpu_probe(void)
cpu_set_fpu_opts(c);
else
cpu_set_nofpu_opts(c);
+
+   reserve_exception_space(0, 0x400);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..808b8b61ded1 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs 
*regs)
nmi_exit();
 }
 
-#define VECTORSPACING 0x100/* for EI/VI mode */
-
 unsigned long ebase;
 EXPORT_SYMBOL_GPL(ebase);
 unsigned long exception_handlers[32];
 unsigned long vi_handlers[64];
 
+void reserve_exception_space(p

[PATCH 5.11 129/306] udf: fix silent AED tagLocation corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Steven J. Magnani 

[ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ]

When extending a file, udf_do_extend_file() may enter following empty
indirect extent. At the end of udf_do_extend_file() we revert prev_epos
to point to the last written extent. However if we end up not adding any
further extent in udf_do_extend_file(), the reverting points prev_epos
into the header area of the AED and following updates of the extents
(in udf_update_extents()) will corrupt the header.

Make sure that we do not follow indirect extent if we are not going to
add any more extents so that returning back to the last written extent
works correctly.

Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org
Signed-off-by: Steven J. Magnani 
Signed-off-by: Jan Kara 
Signed-off-by: Sasha Levin 
---
 fs/udf/inode.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index bb89c3e43212..0dd2f93ac048 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -544,11 +544,14 @@ static int udf_do_extend_file(struct inode *inode,

udf_write_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
+
/*
-* We've rewritten the last extent but there may be empty
-* indirect extent after it - enter it.
+* We've rewritten the last extent. If we are going to add
+* more extents, we may need to enter possible following
+* empty indirect extent.
 */
-   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
+   if (new_block_bytes || prealloc_len)
+   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
}

/* Managed to do everything necessary? */
-- 
2.30.1

[PATCH 4.14 68/95] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Dan Carpenter 

commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream.

The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 
1")
Signed-off-by: Dan Carpenter 
Cc: stable 
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c |5 +
 1 file changed, 5 insertions(+)

--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -912,6 +912,7 @@ int rtw_check_beacon_data(struct adapter
/* SSID */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, 
(pbss_network->IELength - _BEACON_IE_OFFSET_));
if (p && ie_len > 0) {
+   ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid));
memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid));
memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len);
pbss_network->Ssid.SsidLength = ie_len;
@@ -930,6 +931,7 @@ int rtw_check_beacon_data(struct adapter
/*  get supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, 
(pbss_network->IELength - _BEACON_IE_OFFSET_));
if (p) {
+   ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX);
memcpy(supportRate, p + 2, ie_len);
supportRateNum = ie_len;
}
@@ -937,6 +939,8 @@ int rtw_check_beacon_data(struct adapter
/* get ext_supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, 
&ie_len, pbss_network->IELength - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len,
+  NDIS_802_11_LENGTH_RATES_EX - supportRateNum);
memcpy(supportRate + supportRateNum, p + 2, ie_len);
supportRateNum += ie_len;
}
@@ -1050,6 +1054,7 @@ int rtw_check_beacon_data(struct adapter
 
pht_cap->mcs.rx_mask[0] = 0xff;
pht_cap->mcs.rx_mask[1] = 0x0;
+   ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap));
memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len);
}

[PATCH 4.14 32/95] udf: fix silent AED tagLocation corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Steven J. Magnani 

[ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ]

When extending a file, udf_do_extend_file() may enter following empty
indirect extent. At the end of udf_do_extend_file() we revert prev_epos
to point to the last written extent. However if we end up not adding any
further extent in udf_do_extend_file(), the reverting points prev_epos
into the header area of the AED and following updates of the extents
(in udf_update_extents()) will corrupt the header.

Make sure that we do not follow indirect extent if we are not going to
add any more extents so that returning back to the last written extent
works correctly.

Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org
Signed-off-by: Steven J. Magnani 
Signed-off-by: Jan Kara 
Signed-off-by: Sasha Levin 
---
 fs/udf/inode.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index dd57bd446340..e0e2bc19c929 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -540,11 +540,14 @@ static int udf_do_extend_file(struct inode *inode,

udf_write_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
+
/*
-* We've rewritten the last extent but there may be empty
-* indirect extent after it - enter it.
+* We've rewritten the last extent. If we are going to add
+* more extents, we may need to enter possible following
+* empty indirect extent.
 */
-   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
+   if (new_block_bytes || prealloc_len)
+   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
}

/* Managed to do everything necessary? */
-- 
2.30.1

[PATCH 5.10 083/290] gpio: fix gpio-device list corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Johan Hovold 

commit cf25ef6b631c6fc6c0435fc91eba8734cca20511 upstream.

Make sure to hold the gpio_lock when removing the gpio device from the
gpio_devices list (when dropping the last reference) to avoid corrupting
the list when there are concurrent accesses.

Fixes: ff2b13592299 ("gpio: make the gpiochip a real device")
Cc: sta...@vger.kernel.org  # 4.6
Reviewed-by: Saravana Kannan 
Signed-off-by: Johan Hovold 
Signed-off-by: Bartosz Golaszewski 
[ johan: adjust context to 5.11 ]
Signed-off-by: Johan Hovold 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpio/gpiolib.c |4 
 1 file changed, 4 insertions(+)

--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -468,8 +468,12 @@ EXPORT_SYMBOL_GPL(gpiochip_line_is_valid
 static void gpiodevice_release(struct device *dev)
 {
struct gpio_device *gdev = dev_get_drvdata(dev);
+   unsigned long flags;
 
+   spin_lock_irqsave(&gpio_lock, flags);
list_del(&gdev->list);
+   spin_unlock_irqrestore(&gpio_lock, flags);
+
ida_free(&gpio_ida, gdev->id);
kfree_const(gdev->label);
kfree(gdev->descs);

[PATCH 5.11 038/306] gpio: fix gpio-device list corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Johan Hovold 

commit cf25ef6b631c6fc6c0435fc91eba8734cca20511 upstream.

Make sure to hold the gpio_lock when removing the gpio device from the
gpio_devices list (when dropping the last reference) to avoid corrupting
the list when there are concurrent accesses.

Fixes: ff2b13592299 ("gpio: make the gpiochip a real device")
Cc: sta...@vger.kernel.org  # 4.6
Reviewed-by: Saravana Kannan 
Signed-off-by: Johan Hovold 
Signed-off-by: Bartosz Golaszewski 
[ johan: adjust context to 5.11 ]
Signed-off-by: Johan Hovold 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpio/gpiolib.c |4 
 1 file changed, 4 insertions(+)

--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -469,8 +469,12 @@ EXPORT_SYMBOL_GPL(gpiochip_line_is_valid
 static void gpiodevice_release(struct device *dev)
 {
struct gpio_device *gdev = dev_get_drvdata(dev);
+   unsigned long flags;
 
+   spin_lock_irqsave(&gpio_lock, flags);
list_del(&gdev->list);
+   spin_unlock_irqrestore(&gpio_lock, flags);
+
ida_free(&gpio_ida, gdev->id);
kfree_const(gdev->label);
kfree(gdev->descs);

[PATCH 4.9 18/78] udf: fix silent AED tagLocation corruption

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Steven J. Magnani 

[ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ]

When extending a file, udf_do_extend_file() may enter following empty
indirect extent. At the end of udf_do_extend_file() we revert prev_epos
to point to the last written extent. However if we end up not adding any
further extent in udf_do_extend_file(), the reverting points prev_epos
into the header area of the AED and following updates of the extents
(in udf_update_extents()) will corrupt the header.

Make sure that we do not follow indirect extent if we are not going to
add any more extents so that returning back to the last written extent
works correctly.

Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org
Signed-off-by: Steven J. Magnani 
Signed-off-by: Jan Kara 
Signed-off-by: Sasha Levin 
---
 fs/udf/inode.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 149baf5f3d19..50607673a6a9 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -548,11 +548,14 @@ static int udf_do_extend_file(struct inode *inode,

udf_write_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
+
/*
-* We've rewritten the last extent but there may be empty
-* indirect extent after it - enter it.
+* We've rewritten the last extent. If we are going to add
+* more extents, we may need to enter possible following
+* empty indirect extent.
 */
-   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
+   if (new_block_bytes || prealloc_len)
+   udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
}

/* Managed to do everything necessary? */
-- 
2.30.1

[PATCH 4.9 47/78] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Dan Carpenter 

commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream.

The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 
1")
Signed-off-by: Dan Carpenter 
Cc: stable 
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c |5 +
 1 file changed, 5 insertions(+)

--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -917,6 +917,7 @@ int rtw_check_beacon_data(struct adapter
/* SSID */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, 
(pbss_network->IELength - _BEACON_IE_OFFSET_));
if (p && ie_len > 0) {
+   ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid));
memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid));
memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len);
pbss_network->Ssid.SsidLength = ie_len;
@@ -935,6 +936,7 @@ int rtw_check_beacon_data(struct adapter
/*  get supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, 
(pbss_network->IELength - _BEACON_IE_OFFSET_));
if (p) {
+   ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX);
memcpy(supportRate, p + 2, ie_len);
supportRateNum = ie_len;
}
@@ -942,6 +944,8 @@ int rtw_check_beacon_data(struct adapter
/* get ext_supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, 
&ie_len, pbss_network->IELength - _BEACON_IE_OFFSET_);
if (p) {
+   ie_len = min_t(int, ie_len,
+  NDIS_802_11_LENGTH_RATES_EX - supportRateNum);
memcpy(supportRate + supportRateNum, p + 2, ie_len);
supportRateNum += ie_len;
}
@@ -1057,6 +1061,7 @@ int rtw_check_beacon_data(struct adapter
pht_cap->mcs.rx_mask[0] = 0xff;
pht_cap->mcs.rx_mask[1] = 0x0;
}
+   ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap));
memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len);
}

[PATCH 4.4 46/75] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()

2021-03-15 Thread gregkh

From: Greg Kroah-Hartman 

From: Dan Carpenter 

commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream.

The "ie_len" is a value in the 1-255 range that comes from the user.  We
have to cap it to ensure that it's not too large or it could lead to
memory corruption.

Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 
1")
Signed-off-by: Dan Carpenter 
Cc: stable 
Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/rtl8188eu/core/rtw_ap.c |5 +
 1 file changed, 5 insertions(+)

--- a/drivers/staging/rtl8188eu/core/rtw_ap.c
+++ b/drivers/staging/rtl8188eu/core/rtw_ap.c
@@ -921,6 +921,7 @@ int rtw_check_beacon_data(struct adapter
/* SSID */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, 
(pbss_network->IELength - _BEACON_IE_OFFSET_));
if (p && ie_len > 0) {
+   ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid));
memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid));
memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len);
pbss_network->Ssid.SsidLength = ie_len;
@@ -939,6 +940,7 @@ int rtw_check_beacon_data(struct adapter
/*  get supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, 
(pbss_network->IELength - _BEACON_IE_OFFSET_));
if (p !=  NULL) {
+   ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX);
memcpy(supportRate, p+2, ie_len);
supportRateNum = ie_len;
}
@@ -946,6 +948,8 @@ int rtw_check_beacon_data(struct adapter
/* get ext_supported rates */
p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, 
&ie_len, pbss_network->IELength - _BEACON_IE_OFFSET_);
if (p !=  NULL) {
+   ie_len = min_t(int, ie_len,
+  NDIS_802_11_LENGTH_RATES_EX - supportRateNum);
memcpy(supportRate+supportRateNum, p+2, ie_len);
supportRateNum += ie_len;
}
@@ -1061,6 +1065,7 @@ int rtw_check_beacon_data(struct adapter
pht_cap->supp_mcs_set[0] = 0xff;
pht_cap->supp_mcs_set[1] = 0x0;
}
+   ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap));
memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len);
}

Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-14 Thread Michael Ellerman

Tyrel Datwyler  writes:
> On 3/13/21 1:17 AM, Michal Suchánek wrote:
>> On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote:
>>> Both add_slot_store() and remove_slot_store() try to fix up the drc_name
>>> copied from the store buffer by placing a NULL terminator at nbyte + 1
>>> or in place of a '\n' if present. However, the static buffer that we
>>> copy the drc_name data into is not zeored and can contain anything past
>>> the n-th byte. This is problematic if a '\n' byte appears in that buffer
>>> after nbytes and the string copied into the store buffer was not NULL
>>> terminated to start with as the strchr() search for a '\n' byte will mark
>>> this incorrectly as the end of the drc_name string resulting in a drc_name
>>> string that contains garbage data after the n-th byte. The following
>>> debugging shows an example of the drmgr utility writing "PHB 4543" to
>>> the add_slot sysfs attribute, but add_slot_store logging a corrupted
>>> string value.
>>>
>>> [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
>>> [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19
>>>
>>> Fix this by NULL terminating the string when we copy it into our static
>>> buffer by coping nbytes + 1 of data from the store buffer. The code has
>> Why is it OK to copy nbytes + 1 and why is it expected that the buffer
>> contains a nul after the content?
>
> It is my understanding that the store function buffer is allocated as a
> zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user data
> into. Anything after nbytes would therefore be zeroed.

I think that's true, but it would be nice if we didn't have to rely on
that obscure detail in order for this code to be correct & understandable.

>> Isn't it much saner to just nul terminate the string after copying?
>
> At the cost of an extra line of code, sure.

Is there a reason we can't use strscpy()? That should deal with all the
corner cases around the string copy, and then all you have to do is look
for a newline and turn it into nul.

cheers

Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-14 Thread Tyrel Datwyler

On 3/13/21 1:17 AM, Michal Suchánek wrote:
> On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote:
>> Both add_slot_store() and remove_slot_store() try to fix up the drc_name
>> copied from the store buffer by placing a NULL terminator at nbyte + 1
>> or in place of a '\n' if present. However, the static buffer that we
>> copy the drc_name data into is not zeored and can contain anything past
>> the n-th byte. This is problematic if a '\n' byte appears in that buffer
>> after nbytes and the string copied into the store buffer was not NULL
>> terminated to start with as the strchr() search for a '\n' byte will mark
>> this incorrectly as the end of the drc_name string resulting in a drc_name
>> string that contains garbage data after the n-th byte. The following
>> debugging shows an example of the drmgr utility writing "PHB 4543" to
>> the add_slot sysfs attribute, but add_slot_store logging a corrupted
>> string value.
>>
>> [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
>> [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19
>>
>> Fix this by NULL terminating the string when we copy it into our static
>> buffer by coping nbytes + 1 of data from the store buffer. The code has
> Why is it OK to copy nbytes + 1 and why is it expected that the buffer
> contains a nul after the content?

It is my understanding that the store function buffer is allocated as a
zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user data
into. Anything after nbytes would therefore be zeroed.

> 
> Isn't it much saner to just nul terminate the string after copying?

At the cost of an extra line of code, sure.

-Tyrel

> 
> diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
> b/drivers/pci/hotplug/rpadlpar_sysfs.c
> index cdbfa5df3a51..cfbad67447da 100644
> --- a/drivers/pci/hotplug/rpadlpar_sysfs.c
> +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
> @@ -35,11 +35,11 @@ static ssize_t add_slot_store(struct kobject *kobj, 
> struct kobj_attribute *attr,
>   return 0;
>  
>   memcpy(drc_name, buf, nbytes);
> + &drc_name[nbytes] = '\0';
>  
>   end = strchr(drc_name, '\n');
> - if (!end)
> - end = &drc_name[nbytes];
> - *end = '\0';
> + if (end)
> + *end = '\0';
>  
>   rc = dlpar_add_slot(drc_name);
>   if (rc)
> @@ -66,11 +66,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
>   return 0;
>  
>   memcpy(drc_name, buf, nbytes);
> + &drc_name[nbytes] = '\0';
>  
>   end = strchr(drc_name, '\n');
> - if (!end)
> - end = &drc_name[nbytes];
> - *end = '\0';
> + if (end)
> + *end = '\0';
>  
>   rc = dlpar_remove_slot(drc_name);
>   if (rc)
> 
> Thanks
> 
> Michal
> 
>> already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store
>> buffer is guaranteed to be zeroed beyond the nth-byte of data copied
>> from the user. Further, since the string is now NULL terminated the code
>> only needs to change '\n' to '\0' when present.
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++
>>  1 file changed, 6 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
>> b/drivers/pci/hotplug/rpadlpar_sysfs.c
>> index cdbfa5df3a51..375087921284 100644
>> --- a/drivers/pci/hotplug/rpadlpar_sysfs.c
>> +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
>> @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, 
>> struct kobj_attribute *attr,
>>  if (nbytes >= MAX_DRC_NAME_LEN)
>>  return 0;
>>  
>> -memcpy(drc_name, buf, nbytes);
>> +memcpy(drc_name, buf, nbytes + 1);
>>  
>>  end = strchr(drc_name, '\n');
>> -if (!end)
>> -end = &drc_name[nbytes];
>> -*end = '\0';
>> +if (end)
>> +*end = '\0';
>>  
>>  rc = dlpar_add_slot(drc_name);
>>  if (rc)
>> @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
>>  if (nbytes >= MAX_DRC_NAME_LEN)
>>  return 0;
>>  
>> -memcpy(drc_name, buf, nbytes);
>> +memcpy(drc_name, buf, nbytes + 1);
>>  
>>  end = strchr(drc_name, '\n');
>> -if (!end)
>> -end = &drc_name[nbytes];
>> -*end = '\0';
>> +if (end)
>> +*end = '\0';
>>  
>>  rc = dlpar_remove_slot(drc_name);
>>  if (rc)
>> -- 
>> 2.27.0
>>

Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-13 Thread Michal Suchánek

On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote:
> Both add_slot_store() and remove_slot_store() try to fix up the drc_name
> copied from the store buffer by placing a NULL terminator at nbyte + 1
> or in place of a '\n' if present. However, the static buffer that we
> copy the drc_name data into is not zeored and can contain anything past
> the n-th byte. This is problematic if a '\n' byte appears in that buffer
> after nbytes and the string copied into the store buffer was not NULL
> terminated to start with as the strchr() search for a '\n' byte will mark
> this incorrectly as the end of the drc_name string resulting in a drc_name
> string that contains garbage data after the n-th byte. The following
> debugging shows an example of the drmgr utility writing "PHB 4543" to
> the add_slot sysfs attribute, but add_slot_store logging a corrupted
> string value.
> 
> [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
> [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19
> 
> Fix this by NULL terminating the string when we copy it into our static
> buffer by coping nbytes + 1 of data from the store buffer. The code has
Why is it OK to copy nbytes + 1 and why is it expected that the buffer
contains a nul after the content?

Isn't it much saner to just nul terminate the string after copying?

diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
b/drivers/pci/hotplug/rpadlpar_sysfs.c
index cdbfa5df3a51..cfbad67447da 100644
--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -35,11 +35,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct 
kobj_attribute *attr,
return 0;
 
memcpy(drc_name, buf, nbytes);
+   &drc_name[nbytes] = '\0';
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -66,11 +66,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
return 0;
 
memcpy(drc_name, buf, nbytes);
+   &drc_name[nbytes] = '\0';
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)

Thanks

Michal

> already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store
> buffer is guaranteed to be zeroed beyond the nth-byte of data copied
> from the user. Further, since the string is now NULL terminated the code
> only needs to change '\n' to '\0' when present.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
> b/drivers/pci/hotplug/rpadlpar_sysfs.c
> index cdbfa5df3a51..375087921284 100644
> --- a/drivers/pci/hotplug/rpadlpar_sysfs.c
> +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
> @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, 
> struct kobj_attribute *attr,
>   if (nbytes >= MAX_DRC_NAME_LEN)
>   return 0;
>  
> - memcpy(drc_name, buf, nbytes);
> + memcpy(drc_name, buf, nbytes + 1);
>  
>   end = strchr(drc_name, '\n');
> - if (!end)
> - end = &drc_name[nbytes];
> - *end = '\0';
> + if (end)
> + *end = '\0';
>  
>   rc = dlpar_add_slot(drc_name);
>   if (rc)
> @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
>   if (nbytes >= MAX_DRC_NAME_LEN)
>   return 0;
>  
> - memcpy(drc_name, buf, nbytes);
> + memcpy(drc_name, buf, nbytes + 1);
>  
>   end = strchr(drc_name, '\n');
> - if (!end)
> - end = &drc_name[nbytes];
> - *end = '\0';
> + if (end)
> + *end = '\0';
>  
>   rc = dlpar_remove_slot(drc_name);
>   if (rc)
> -- 
> 2.27.0
>

[PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-10 Thread Tyrel Datwyler

Both add_slot_store() and remove_slot_store() try to fix up the drc_name
copied from the store buffer by placing a NULL terminator at nbyte + 1
or in place of a '\n' if present. However, the static buffer that we
copy the drc_name data into is not zeored and can contain anything past
the n-th byte. This is problematic if a '\n' byte appears in that buffer
after nbytes and the string copied into the store buffer was not NULL
terminated to start with as the strchr() search for a '\n' byte will mark
this incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte. The following
debugging shows an example of the drmgr utility writing "PHB 4543" to
the add_slot sysfs attribute, but add_slot_store logging a corrupted
string value.

[135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
[135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by NULL terminating the string when we copy it into our static
buffer by coping nbytes + 1 of data from the store buffer. The code has
already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store
buffer is guaranteed to be zeroed beyond the nth-byte of data copied
from the user. Further, since the string is now NULL terminated the code
only needs to change '\n' to '\0' when present.

Signed-off-by: Tyrel Datwyler 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
b/drivers/pci/hotplug/rpadlpar_sysfs.c
index cdbfa5df3a51..375087921284 100644
--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct 
kobj_attribute *attr,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   memcpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   memcpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)
-- 
2.27.0

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-09 Thread Vlastimil Babka

On 3/9/21 7:14 PM, Georgi Djakov wrote:
> Hi Vlastimil,
> 
> Thanks for the comment!
> 
> On 3/9/21 17:09, Vlastimil Babka wrote:
>> On 3/9/21 2:47 PM, Georgi Djakov wrote:
>>> Being able to stop the system immediately when a memory corruption
>>> is detected is crucial to finding the source of it. This is very
>>> useful when the memory can be inspected with kdump or other tools.
>>
>> Is this in some testing scenarios where you would also use e.g. 
>> panic_on_warn?
>> We could hook to that. If not, we could introduce a new
>> panic_on_memory_corruption that would apply also for debug_pagealloc and 
>> whatnot?
> 
> I would prefer that we not tie it with panic_on_warn - there might be lots of
> new code in multiple subsystems, so hitting some WARNing while testing is not
> something unexpected.
> 
> Introducing an additional panic_on_memory_corruption would work, but i noticed
> that we already have slub_debug and thought to re-use that. But indeed, аdding
> an option to panic in for example bad_page() sounds also useful, if that's 
> what
> you suggest.

Yes, that would be another example.
Also CCing Kees for input, as besides the "kdump ASAP for debugging" case, I can
imagine security hardening folks could be interested in the "somebody might have
just failed to pwn the kernel, better panic than let them continue" angle. But
I'm naive wrt security, so it might be a stupid idea :)

Vlastimil


> Thanks,
> Georgi

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-09 Thread Georgi Djakov


Hi Vlastimil,

Thanks for the comment!

On 3/9/21 17:09, Vlastimil Babka wrote:

On 3/9/21 2:47 PM, Georgi Djakov wrote:

Being able to stop the system immediately when a memory corruption
is detected is crucial to finding the source of it. This is very
useful when the memory can be inspected with kdump or other tools.


Is this in some testing scenarios where you would also use e.g. panic_on_warn?
We could hook to that. If not, we could introduce a new
panic_on_memory_corruption that would apply also for debug_pagealloc and 
whatnot?


I would prefer that we not tie it with panic_on_warn - there might be lots of
new code in multiple subsystems, so hitting some WARNing while testing is not
something unexpected.

Introducing an additional panic_on_memory_corruption would work, but i noticed
that we already have slub_debug and thought to re-use that. But indeed, аdding
an option to panic in for example bad_page() sounds also useful, if that's what
you suggest.

Thanks,
Georgi

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-09 Thread Georgi Djakov


Hi Christoph,

Thanks for the comments!

On 3/9/21 16:56, Christoph Lameter wrote:

On Tue, 9 Mar 2021, Georgi Djakov wrote:


Being able to stop the system immediately when a memory corruption
is detected is crucial to finding the source of it. This is very
useful when the memory can be inspected with kdump or other tools.


Hmmm ok.


The idea is to be able to collect data right after the corruption is
detected, otherwise more data might be corrupted and tracing becomes
more difficult.




  static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
void *from, void *to)
  {
+   if (slub_debug & SLAB_CORRUPTION_PANIC)
+   panic("slab: object overwritten\n");
slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data);
memset(from, data, to - from);
  }


Why panic here? This should only be called late in the bug reporting when
an error has already been printed.


This is called by both slab_pad_check() and check_bytes_and_report(), so it
seemed like a common place where i could put the panic(). I can move it to
the caller functions instead, if that's preferred.

Thanks,
Georgi

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-09 Thread Vlastimil Babka

On 3/9/21 2:47 PM, Georgi Djakov wrote:
> Being able to stop the system immediately when a memory corruption
> is detected is crucial to finding the source of it. This is very
> useful when the memory can be inspected with kdump or other tools.

Is this in some testing scenarios where you would also use e.g. panic_on_warn?
We could hook to that. If not, we could introduce a new
panic_on_memory_corruption that would apply also for debug_pagealloc and 
whatnot?

> Let's add an option panic the kernel when slab debug catches an
> object or list corruption.
> 
> This new option is not enabled by default (yet), so it needs to be
> enabled explicitly (for example by adding "slub_debug=FZPUC" to
> the kernel command line).
> 
> Signed-off-by: Georgi Djakov 
> ---
>  Documentation/vm/slub.rst | 1 +
>  include/linux/slab.h  | 3 +++
>  mm/slab.h | 2 +-
>  mm/slub.c | 9 +
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index 03f294a638bd..32878c44f3de 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -53,6 +53,7 @@ Possible debug options are::
>   Z   Red zoning
>   P   Poisoning (object and padding)
>   U   User tracking (free and alloc)
> + C   Panic on object corruption (enables 
> SLAB_CORRUPTION_PANIC)
>   T   Trace (please only use on single slabs)
>   A   Enable failslab filter mark for the cache
>   O   Switch debugging off for caches that would have
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 0c97d788762c..ebff5e704d08 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -39,6 +39,9 @@
>  #define SLAB_STORE_USER  ((slab_flags_t __force)0x0001U)
>  /* Panic if kmem_cache_create() fails */
>  #define SLAB_PANIC   ((slab_flags_t __force)0x0004U)
> +/* Panic if memory corruption is detected */
> +#define SLAB_CORRUPTION_PANIC((slab_flags_t __force)0x0008U)
> +
>  /*
>   * SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
>   *
> diff --git a/mm/slab.h b/mm/slab.h
> index 120b1d0dfb6d..ae0079017fc6 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -134,7 +134,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int 
> object_size,
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
>  #elif defined(CONFIG_SLUB_DEBUG)
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> -   SLAB_TRACE | SLAB_CONSISTENCY_CHECKS)
> +   SLAB_TRACE | SLAB_CONSISTENCY_CHECKS | 
> SLAB_CORRUPTION_PANIC)
>  #else
>  #define SLAB_DEBUG_FLAGS (0)
>  #endif
> diff --git a/mm/slub.c b/mm/slub.c
> index 077a019e4d7a..49351427f701 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -741,6 +741,8 @@ void object_err(struct kmem_cache *s, struct page *page,
>  {
>   slab_bug(s, "%s", reason);
>   print_trailer(s, page, object);
> + if (slub_debug & SLAB_CORRUPTION_PANIC)
> + panic(reason);
>  }
>  
>  static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page,
> @@ -755,6 +757,8 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, 
> struct page *page,
>   slab_bug(s, "%s", buf);
>   print_page_info(page);
>   dump_stack();
> + if (slub_debug & SLAB_CORRUPTION_PANIC)
> + panic("slab: slab error\n");
>  }
>  
>  static void init_object(struct kmem_cache *s, void *object, u8 val)
> @@ -776,6 +780,8 @@ static void init_object(struct kmem_cache *s, void 
> *object, u8 val)
>  static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
>   void *from, void *to)
>  {
> + if (slub_debug & SLAB_CORRUPTION_PANIC)
> + panic("slab: object overwritten\n");
>   slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data);
>   memset(from, data, to - from);
>  }
> @@ -1319,6 +1325,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, 
> char **slabs, bool init)
>   case 'a':
>   *flags |= SLAB_FAILSLAB;
>   break;
> + case 'c':
> + *flags |= SLAB_CORRUPTION_PANIC;
> + break;
>   case 'o':
>   /*
>* Avoid enabling debugging on caches if its minimum
>

Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-09 Thread Christoph Lameter

On Tue, 9 Mar 2021, Georgi Djakov wrote:

> Being able to stop the system immediately when a memory corruption
> is detected is crucial to finding the source of it. This is very
> useful when the memory can be inspected with kdump or other tools.

Hmmm ok.

>  static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
>   void *from, void *to)
>  {
> + if (slub_debug & SLAB_CORRUPTION_PANIC)
> + panic("slab: object overwritten\n");
>   slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data);
>   memset(from, data, to - from);
>  }

Why panic here? This should only be called late in the bug reporting when
an error has already been printed.

[PATCH] mm/slub: Add slub_debug option to panic on memory corruption

2021-03-09 Thread Georgi Djakov

Being able to stop the system immediately when a memory corruption
is detected is crucial to finding the source of it. This is very
useful when the memory can be inspected with kdump or other tools.

Let's add an option panic the kernel when slab debug catches an
object or list corruption.

This new option is not enabled by default (yet), so it needs to be
enabled explicitly (for example by adding "slub_debug=FZPUC" to
the kernel command line).

Signed-off-by: Georgi Djakov 
---
 Documentation/vm/slub.rst | 1 +
 include/linux/slab.h  | 3 +++
 mm/slab.h | 2 +-
 mm/slub.c | 9 +
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 03f294a638bd..32878c44f3de 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -53,6 +53,7 @@ Possible debug options are::
Z   Red zoning
P   Poisoning (object and padding)
U   User tracking (free and alloc)
+   C   Panic on object corruption (enables 
SLAB_CORRUPTION_PANIC)
T   Trace (please only use on single slabs)
A   Enable failslab filter mark for the cache
O   Switch debugging off for caches that would have
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0c97d788762c..ebff5e704d08 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -39,6 +39,9 @@
 #define SLAB_STORE_USER((slab_flags_t __force)0x0001U)
 /* Panic if kmem_cache_create() fails */
 #define SLAB_PANIC ((slab_flags_t __force)0x0004U)
+/* Panic if memory corruption is detected */
+#define SLAB_CORRUPTION_PANIC  ((slab_flags_t __force)0x0008U)
+
 /*
  * SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
  *
diff --git a/mm/slab.h b/mm/slab.h
index 120b1d0dfb6d..ae0079017fc6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -134,7 +134,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int 
object_size,
 #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
 #elif defined(CONFIG_SLUB_DEBUG)
 #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
- SLAB_TRACE | SLAB_CONSISTENCY_CHECKS)
+ SLAB_TRACE | SLAB_CONSISTENCY_CHECKS | 
SLAB_CORRUPTION_PANIC)
 #else
 #define SLAB_DEBUG_FLAGS (0)
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 077a019e4d7a..49351427f701 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -741,6 +741,8 @@ void object_err(struct kmem_cache *s, struct page *page,
 {
slab_bug(s, "%s", reason);
print_trailer(s, page, object);
+   if (slub_debug & SLAB_CORRUPTION_PANIC)
+   panic(reason);
 }
 
 static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page,
@@ -755,6 +757,8 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, 
struct page *page,
slab_bug(s, "%s", buf);
print_page_info(page);
dump_stack();
+   if (slub_debug & SLAB_CORRUPTION_PANIC)
+   panic("slab: slab error\n");
 }
 
 static void init_object(struct kmem_cache *s, void *object, u8 val)
@@ -776,6 +780,8 @@ static void init_object(struct kmem_cache *s, void *object, 
u8 val)
 static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
void *from, void *to)
 {
+   if (slub_debug & SLAB_CORRUPTION_PANIC)
+   panic("slab: object overwritten\n");
slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data);
memset(from, data, to - from);
 }
@@ -1319,6 +1325,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, 
char **slabs, bool init)
case 'a':
*flags |= SLAB_FAILSLAB;
break;
+   case 'c':
+   *flags |= SLAB_CORRUPTION_PANIC;
+   break;
case 'o':
/*
 * Avoid enabling debugging on caches if its minimum

Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-09 Thread Thomas Bogendoerfer

On Mon, Mar 08, 2021 at 10:24:47AM +0100, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. Additional we reserve the first 4k (1k for R3k) for
> either normal exception vector space (legacy CPUs) or special vectors
> like cache exceptions.
> 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 
> ---
> Changes in v3:
>  - always reserve the first 4k for all CPUs (1k for R3k)
> 
> Changes in v2:
>  - do only memblock reservation in reserve_exception_space()
>  - reserve 0..0x400 for all CPUs without ebase register and
>to addtional reserve_exception_space for BMIPS CPUs
> 
>  arch/mips/include/asm/traps.h|  3 +++
>  arch/mips/kernel/cpu-probe.c |  6 ++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/traps.c | 10 +-
>  4 files changed, 17 insertions(+), 5 deletions(-)

applied to mips-fixes.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-08 Thread Serge Semin

Hi Thomas

On Mon, Mar 08, 2021 at 10:24:47AM +0100, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. Additional we reserve the first 4k (1k for R3k) for
> either normal exception vector space (legacy CPUs) or special vectors
> like cache exceptions.
> 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.

Reviewed-by: Serge Semin 

-Sergey

> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 
> ---
> Changes in v3:
>  - always reserve the first 4k for all CPUs (1k for R3k)
> 
> Changes in v2:
>  - do only memblock reservation in reserve_exception_space()
>  - reserve 0..0x400 for all CPUs without ebase register and
>to addtional reserve_exception_space for BMIPS CPUs
> 
>  arch/mips/include/asm/traps.h|  3 +++
>  arch/mips/kernel/cpu-probe.c |  6 ++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/traps.c | 10 +-
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..b710e76c9c65 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
>  extern void (*board_cache_error_setup)(void);
>  
>  extern int register_nmi_notifier(struct notifier_block *nb);
> +extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
>  extern char except_vec_nmi[];
>  
> +#define VECTORSPACING 0x100  /* for EI/VI mode */
> +
>  #define nmi_notifier(fn, pri)
> \
>  ({   \
>   static struct notifier_block fn##_nb = {\
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 9a89637b4ecf..b71892064f27 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "fpu-probe.h"
> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   c->cputype = CPU_BMIPS3300;
>   __cpu_name[cpu] = "Broadcom BMIPS3300";
>   set_elf_platform(cpu, "bmips3300");
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   break;
>   case PRID_IMP_BMIPS43XX: {
>   int rev = c->processor_id & PRID_REV_MASK;
> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS4380";
>   set_elf_platform(cpu, "bmips4380");
>   c->options |= MIPS_CPU_RIXI;
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   } else {
>   c->cputype = CPU_BMIPS4350;
>   __cpu_name[cpu] = "Broadcom BMIPS4350";
> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS5000";
>   set_elf_platform(cpu, "bmips5000");
>   c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
> + reserve_exception_space(0x1000, VECTORSPACING * 64);
>   break;
>   }
>  }
> @@ -2133,6 +2137,8 @@ void cpu_probe(void)
>   if (cpu == 0)
>   __ua_limit = ~((1ull << cpu_vmbits) - 1);
>  #endif
> +
> + reserve_exception_space(0, 0x1000);
>  }
>  
>  void cpu_report(void)
> diff --git a/arch/mips/kernel/cpu-r3k-probe.c 
> b/arch/mips/kernel/cpu-r3k-probe.c
> index abdbbe8c5a43..af654771918c 100644
> --- a/arch/mips/kernel/cpu-r3k-probe.c
> +++ b/arch/mips/kernel/cpu-r3k-probe.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "fpu-probe.h"
>  
> @@ -158,6 +159,8 @@ void cpu_probe(void)
>   cpu_set_fpu_opts(c);
>   else
>   cpu_set_nofpu_opts(c);
> +
> + reserve_exception_space(0, 0x400);
>  }
>  
>  void cpu_report(void)
> diff --git a/a

Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-08 Thread Thomas Bogendoerfer

On Mon, Mar 08, 2021 at 10:21:10AM -0800, Florian Fainelli wrote:
> On 3/8/21 1:24 AM, Thomas Bogendoerfer wrote:
> > BMIPS is one of the few platforms that do change the exception base.
> > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> > with kernel_end") we started seeing BMIPS boards fail to boot with the
> > built-in FDT being corrupted.
> > 
> > Before the cited commit, early allocations would be in the [kernel_end,
> > RAM_END] range, but after commit they would be within [RAM_START +
> > PAGE_SIZE, RAM_END].
> > 
> > The custom exception base handler that is installed by
> > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> > memory region allocated by unflatten_and_copy_device_tree() thus
> > corrupting the FDT used by the kernel.
> > 
> > To fix this, we need to perform an early reservation of the custom
> > exception space. Additional we reserve the first 4k (1k for R3k) for
> > either normal exception vector space (legacy CPUs) or special vectors
> > like cache exceptions.
> > 
> > Huge thanks to Serge for analysing and proposing a solution to this
> > issue.
> > 
> > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> > kernel_end")
> > Reported-by: Kamal Dasu 
> > Debugged-by: Serge Semin 
> > Signed-off-by: Thomas Bogendoerfer 
> > ---
> > Changes in v3:
> >  - always reserve the first 4k for all CPUs (1k for R3k)
> > 
> > Changes in v2:
> >  - do only memblock reservation in reserve_exception_space()
> >  - reserve 0..0x400 for all CPUs without ebase register and
> >to addtional reserve_exception_space for BMIPS CPUs
> 
> Thomas, do you mind CC'ing me for subsequent versions so you can get a
> chance to have a Tested-by tag? Thank you!

sure, I hope it's the last version ;-)

> Tested-by: Florian Fainelli 

thank you.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-08 Thread Florian Fainelli

On 3/8/21 1:24 AM, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. Additional we reserve the first 4k (1k for R3k) for
> either normal exception vector space (legacy CPUs) or special vectors
> like cache exceptions.
> 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 
> ---
> Changes in v3:
>  - always reserve the first 4k for all CPUs (1k for R3k)
> 
> Changes in v2:
>  - do only memblock reservation in reserve_exception_space()
>  - reserve 0..0x400 for all CPUs without ebase register and
>to addtional reserve_exception_space for BMIPS CPUs

Thomas, do you mind CC'ing me for subsequent versions so you can get a
chance to have a Tested-by tag? Thank you!

Tested-by: Florian Fainelli 
-- 
Florian

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-08 Thread Roman Gushchin

On Sat, Mar 06, 2021 at 09:29:09AM +0100, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. So we reserve it already in cpu_probe() for the CPUs
> where this is fixed. For CPU with an ebase config register allocation
> of exception space will be done in trap_init().
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 
> ---
> Changes in v2:
>  - do only memblock reservation in reserve_exception_space()
>  - reserve 0..0x400 for all CPUs without ebase register and
>to addtional reserve_exception_space for BMIPS CPUs
> 
>  arch/mips/include/asm/traps.h|  3 +++
>  arch/mips/kernel/cpu-probe.c |  7 +++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/traps.c | 10 +-
>  4 files changed, 18 insertions(+), 5 deletions(-)

Acked-by: Roman Gushchin 

Thanks!

> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..b710e76c9c65 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
>  extern void (*board_cache_error_setup)(void);
>  
>  extern int register_nmi_notifier(struct notifier_block *nb);
> +extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
>  extern char except_vec_nmi[];
>  
> +#define VECTORSPACING 0x100  /* for EI/VI mode */
> +
>  #define nmi_notifier(fn, pri)
> \
>  ({   \
>   static struct notifier_block fn##_nb = {\
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 9a89637b4ecf..b565bc4b900d 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "fpu-probe.h"
> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   c->cputype = CPU_BMIPS3300;
>   __cpu_name[cpu] = "Broadcom BMIPS3300";
>   set_elf_platform(cpu, "bmips3300");
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   break;
>   case PRID_IMP_BMIPS43XX: {
>   int rev = c->processor_id & PRID_REV_MASK;
> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS4380";
>   set_elf_platform(cpu, "bmips4380");
>   c->options |= MIPS_CPU_RIXI;
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   } else {
>   c->cputype = CPU_BMIPS4350;
>   __cpu_name[cpu] = "Broadcom BMIPS4350";
> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS5000";
>   set_elf_platform(cpu, "bmips5000");
>   c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
> + reserve_exception_space(0x1000, VECTORSPACING * 64);
>   break;
>   }
>  }
> @@ -2133,6 +2137,9 @@ void cpu_probe(void)
>   if (cpu == 0)
>   __ua_limit = ~((1ull << cpu_vmbits) - 1);
>  #endif
> +
> + if (cpu_has_mips_r2_r6)
> + reserve_exception_space(0, 0x400);
>  }
>  
>  void cpu_report(void)
> diff --git a/arch/mips/kernel/cpu-r3k-probe.c 
> b/arch/mips/kernel/cpu-r3k-probe.c
> index abdbbe8c5a43..af654771918c 100644
> --- a/arch/mips/kernel/cpu-r3k-probe.c
> +++ b/arch/mips/kernel/cpu-r3k-probe.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "fpu-probe.h"
>  
> @@ -158,6 +159,8 @@ void cpu_probe(void)
>   cpu_set_fpu_opts(c);
>   else
>   cpu_set_nofpu_opts(c);
> +
> + reserve_exception_space(0, 0x400);
>  }
>  
>  void cpu_report(void)
> diff --git a/arch/mips/kernel/traps.c

Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-08 Thread Mike Rapoport

On Mon, Mar 08, 2021 at 10:24:47AM +0100, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. Additional we reserve the first 4k (1k for R3k) for
> either normal exception vector space (legacy CPUs) or special vectors
> like cache exceptions.

Just a side note, memblock always skips the first 4k when allocating
memory, but explicitly reserving the exception space is The Right Thing To
Do anyway :)
 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 

Acked-by: Mike Rapoport 

> ---
> Changes in v3:
>  - always reserve the first 4k for all CPUs (1k for R3k)
> 
> Changes in v2:
>  - do only memblock reservation in reserve_exception_space()
>  - reserve 0..0x400 for all CPUs without ebase register and
>to addtional reserve_exception_space for BMIPS CPUs
> 
>  arch/mips/include/asm/traps.h|  3 +++
>  arch/mips/kernel/cpu-probe.c |  6 ++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/traps.c | 10 +-
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..b710e76c9c65 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
>  extern void (*board_cache_error_setup)(void);
>  
>  extern int register_nmi_notifier(struct notifier_block *nb);
> +extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
>  extern char except_vec_nmi[];
>  
> +#define VECTORSPACING 0x100  /* for EI/VI mode */
> +
>  #define nmi_notifier(fn, pri)
> \
>  ({   \
>   static struct notifier_block fn##_nb = {\
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 9a89637b4ecf..b71892064f27 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "fpu-probe.h"
> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   c->cputype = CPU_BMIPS3300;
>   __cpu_name[cpu] = "Broadcom BMIPS3300";
>   set_elf_platform(cpu, "bmips3300");
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   break;
>   case PRID_IMP_BMIPS43XX: {
>   int rev = c->processor_id & PRID_REV_MASK;
> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS4380";
>   set_elf_platform(cpu, "bmips4380");
>   c->options |= MIPS_CPU_RIXI;
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   } else {
>   c->cputype = CPU_BMIPS4350;
>   __cpu_name[cpu] = "Broadcom BMIPS4350";
> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS5000";
>   set_elf_platform(cpu, "bmips5000");
>   c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
> + reserve_exception_space(0x1000, VECTORSPACING * 64);
>   break;
>   }
>  }
> @@ -2133,6 +2137,8 @@ void cpu_probe(void)
>   if (cpu == 0)
>   __ua_limit = ~((1ull << cpu_vmbits) - 1);
>  #endif
> +
> + reserve_exception_space(0, 0x1000);
>  }
>  
>  void cpu_report(void)
> diff --git a/arch/mips/kernel/cpu-r3k-probe.c 
> b/arch/mips/kernel/cpu-r3k-probe.c
> index abdbbe8c5a43..af654771918c 100644
> --- a/arch/mips/kernel/cpu-r3k-probe.c
> +++ b/arch/mips/kernel/cpu-r3k-probe.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "fpu-probe.h"
>  
> @@ -158,6 +159,8 @@ void cpu_probe(void)
>   cpu_set_fpu_opts(c);
>   else

list_add corruption in __percpu_counter_init

2021-03-08 Thread Marc Kleine-Budde

Hello,

since updating one of our compile cluster machines from kernel 4.19+105+deb10u9 
to
5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1 we're hit by this bug
every 1...2 days:

list_add corruption. next->prev should be prev (889a9840), but was 
. (next=9c3dcaf2a310).
[ cut here ]
kernel BUG at lib/list_debug.c:25!
invalid opcode:  [#1] SMP PTI
CPU: 9 PID: 3281 Comm: python3 Not tainted 5.10.0-0.bpo.3-amd64 #1 Debian 
5.10.13-1~bpo10+1
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS 
S5500.86B.01.00.0059.082320111421 08/23/2011
RIP: 0010:__list_add_valid.cold.0+0x12/0x28
Code: 80 46 32 88 48 89 ef e8 62 3e 01 00 48 c7 c0 f0 ff ff ff e9 bb 1e bf ff 
48 89 d1 48 c7 c7 f8 46 32 88 48 89 c2 e8 02 2a ff ff <0f> 0b 48 89 c1 4c 89 c6 
48 c7 c7 50 47 32 88 e8 ee 29 ff ff 0f 0b
RSP: 0018:bb0862ba3910 EFLAGS: 00010046
RAX: 0075 RBX: 9c3dcaf2a268 RCX: 
RDX:  RSI: 9c46ffb18a00 RDI: 9c46ffb18a00
RBP: 9c3dcaf2a278 R08:  R09: c000dfff
R10: 0001 R11: bb0862ba3720 R12: 9c3dcaf2a310
R13: 889a9840 R14: 0002 R15: 2800
FS:  7f4bfc65d740() GS:9c46ffb0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01f5b100 CR3: 0013298c4001 CR4: 000206e0
Call Trace:
 __percpu_counter_init+0x64/0xa0
 blkg_rwstat_init+0x3c/0xb0
 throtl_pd_alloc+0x63/0x230
 blkg_alloc+0x134/0x180
 blkg_create+0x232/0x320
 ? kmem_cache_alloc+0x30c/0x420
 bio_associate_blkg_from_css+0x1cb/0x2c0
 bio_associate_blkg+0x20/0x70
 ? ktime_get+0x3e/0xa0
 linear_map+0x50/0x90 [dm_mod]
 __map_bio+0x3a/0x130 [dm_mod]
 __split_and_process_non_flush+0x190/0x1e0 [dm_mod]
 dm_submit_bio+0x152/0x3b0 [dm_mod]
 submit_bio_noacct+0xfb/0x410
 ? iomap_page_mkwrite_actor+0x70/0x70
 submit_bio+0x43/0x190
 iomap_readahead+0xb5/0x190
 read_pages+0x8e/0x270
 page_cache_ra_unbounded+0x1a2/0x220
 generic_file_buffered_read+0x1a9/0x9b0
 xfs_file_buffered_aio_read+0x44/0xb0 [xfs]
 xfs_file_read_iter+0x6e/0xd0 [xfs]
 new_sync_read+0x118/0x1a0
 vfs_read+0xf1/0x180
 ksys_read+0x59/0xd0
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9


and another one:

list_add corruption. next->prev should be prev (8bba9840), but was 
. (next=9b12c9ce1b10).
[ cut here ]
kernel BUG at lib/list_debug.c:25!
invalid opcode:  [#1] SMP PTI
CPU: 1 PID: 11618 Comm: qtdeclarative-n Not tainted 5.10.0-0.bpo.3-amd64 #1 
Debian 5.10.13-1~bpo10+1
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS 
S5500.86B.01.00.0059.082320111421 08/23/2011
RIP: 0010:__list_add_valid.cold.0+0x12/0x28
Code: 80 46 52 8b 48 89 ef e8 62 3e 01 00 48 c7 c0 f0 ff ff ff e9 bb 1e bf ff 
48 89 d1 48 c7 c7 f8 46 52 8b 48 89 c2 e8 02 2a ff ff <0f> 0b 48 89 c1 4c 89 c6 
48 c7 c7 50 

RSP: 0018:bab42038b580 EFLAGS: 00010046
RAX: 0075 RBX: 9b1629c18268 RCX: 
RDX:  RSI: 9b1bbfa18a00 RDI: 9b1bbfa18a00
RBP: 9b1629c18278 R08:  R09: c000dfff
R10: 0001 R11: bab42038b390 R12: 9b12c9ce1b10
R13: 8bba9840 R14: 0006 R15: 2800
FS:  7f9f93fb2740() GS:9b1bbfa0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f9f9385dbc0 CR3: 00082c1f8001 CR4: 000206e0
Call Trace:
 __percpu_counter_init+0x64/0xa0
 blkg_rwstat_init+0x3c/0xb0
 throtl_pd_alloc+0x63/0x230
 blkg_alloc+0x134/0x180
 blkg_create+0x232/0x320
 ? find_busiest_group+0x41/0x360
 bio_associate_blkg_from_css+0x1cb/0x2c0
 bio_associate_blkg+0x20/0x70
 ? ktime_get+0x3e/0xa0
 linear_map+0x50/0x90 [dm_mod]
 __map_bio+0x3a/0x130 [dm_mod]
 __split_and_process_non_flush+0x190/0x1e0 [dm_mod]
 dm_submit_bio+0x152/0x3b0 [dm_mod]
 submit_bio_noacct+0xfb/0x410
 submit_bio+0x43/0x190
 ? bio_add_page+0x62/0x90
 _xfs_buf_ioapply+0x2af/0x410 [xfs]
[...]


and a 3rd one without xfs involved

list_add corruption. next->prev should be prev (a4ba9840), but was 
. (next=95071f379310).
[ cut here ]
kernel BUG at lib/list_debug.c:25!
invalid opcode:  [#1] SMP PTI
CPU: 21 PID: 12894 Comm: vgs Not tainted 5.10.0-0.bpo.3-amd64 #1 Debian 
5.10.13-1~bpo10+1
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS 
S5500.86B.01.00.0059.082320111421 08/23/2011
RIP: 0010:__list_add_valid.cold.0+0x12/0x28
Code: 80 46 52 a4 48 89 ef e8 62 3e 01 00 48 c7 c0 f0 ff ff ff e9 bb 1e bf ff 
48 89 d1 48 c7 c7 f8 46 52 a4 48 89 c2 e8 02 2a ff ff <0f> 0b 48 89 c1 4c 89 c6 
48 c7 c7 50 47 52 a4 e8 ee 29 ff ff 0f 0b
RSP: 0018:ad41c741fa80 EFLAGS: 00010046
RAX: 0075 RBX: 95071f379268 RCX: 
RDX:  RSI: 950dffc98a00 RDI: 950dffc98a00
RBP: 95071f379278 R08: 00

[PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-08 Thread Thomas Bogendoerfer

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
kernel_end")
Reported-by: Kamal Dasu 
Debugged-by: Serge Semin 
Signed-off-by: Thomas Bogendoerfer 
---
Changes in v3:
 - always reserve the first 4k for all CPUs (1k for R3k)

Changes in v2:
 - do only memblock reservation in reserve_exception_space()
 - reserve 0..0x400 for all CPUs without ebase register and
   to addtional reserve_exception_space for BMIPS CPUs

 arch/mips/include/asm/traps.h|  3 +++
 arch/mips/kernel/cpu-probe.c |  6 ++
 arch/mips/kernel/cpu-r3k-probe.c |  3 +++
 arch/mips/kernel/traps.c | 10 +-
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..b710e76c9c65 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
 extern void (*board_cache_error_setup)(void);
 
 extern int register_nmi_notifier(struct notifier_block *nb);
+extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
 extern char except_vec_nmi[];
 
+#define VECTORSPACING 0x100/* for EI/VI mode */
+
 #define nmi_notifier(fn, pri)  \
 ({ \
static struct notifier_block fn##_nb = {\
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9a89637b4ecf..b71892064f27 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "fpu-probe.h"
@@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
c->cputype = CPU_BMIPS3300;
__cpu_name[cpu] = "Broadcom BMIPS3300";
set_elf_platform(cpu, "bmips3300");
+   reserve_exception_space(0x400, VECTORSPACING * 64);
break;
case PRID_IMP_BMIPS43XX: {
int rev = c->processor_id & PRID_REV_MASK;
@@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS4380";
set_elf_platform(cpu, "bmips4380");
c->options |= MIPS_CPU_RIXI;
+   reserve_exception_space(0x400, VECTORSPACING * 64);
} else {
c->cputype = CPU_BMIPS4350;
__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS5000";
set_elf_platform(cpu, "bmips5000");
c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+   reserve_exception_space(0x1000, VECTORSPACING * 64);
break;
}
 }
@@ -2133,6 +2137,8 @@ void cpu_probe(void)
if (cpu == 0)
__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+   reserve_exception_space(0, 0x1000);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c
index abdbbe8c5a43..af654771918c 100644
--- a/arch/mips/kernel/cpu-r3k-probe.c
+++ b/arch/mips/kernel/cpu-r3k-probe.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fpu-probe.h"
 
@@ -158,6 +159,8 @@ void cpu_probe(void)
cpu_set_fpu_opts(c);
else
cpu_set_nofpu_opts(c);
+
+   reserve_exception_space(0, 0x400);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..808b8b61ded1 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs 
*regs)
nmi_exit();
 }
 
-#define VECTORSPACING 0x100/* for EI/VI mode */
-
 unsign

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-07 Thread Maciej W. Rozycki

On Mon, 8 Mar 2021, Serge Semin wrote:

> > some of them are not R2 (SB1), others are. So best bet would be to
> > simply reserve the first 0x1000 bytes for every CPU and special handling
> > for the BMIPS case. Does this cover all cases ?
> 
> I can't say for sure whether it will cover all the cases/platforms. I
> visually analysed all the
> board_{nmi_handler,ejtag_handler,ebase,cache_error}_setup callbacks
> implementation in MIPS arch to create the list above. Exception vectors or
> some other stuff can be setup in some other platform-specific manner. But at
> least reserving a memory below PAGE_SIZE would get the situation partly back
> to before the memory below the kernel stopped being reserved. Hopefully
> one page will be enough for the platforms, which relied on that rule. The
> rest or them sooner or later will manifest itself as it has happened with
> Broadcom.

 I think reserving up to 64KiB might be a bit excessive on one hand while 
having the size of the reservation depend on configured PAGE_SIZE could be 
too unpredictable on the other.  I think 4KiB is a good compromise and I'd 
leave anything else for platform maintainers to sort out.

  Maciej

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-07 Thread Serge Semin

On Sun, Mar 07, 2021 at 10:20:01PM +0100, Thomas Bogendoerfer wrote:
> On Sun, Mar 07, 2021 at 11:06:12PM +0300, Serge Semin wrote:
> > > +
> > > + if (cpu_has_mips_r2_r6)
> > > + reserve_exception_space(0, 0x400);
> > 
> > Are you sure it shouldn't be (!cpu_has_mips_r2_r6)?. What I see here
> > contradicts to what is said in Changelog v2.
> 
> d'oh, of course it has to be !cpu_has_mips_r2_r6.
> 
> > Anyway regarding the problem in general. AFAICS the next code uses the
> > lowest memory to place some specific exception handlers:
> > board_cache_error_setup pointer:
> >   arch/mips/mm/c-r4k.c: r4k_cache_error_setup() - SiByte CPUs: CPU_SB1, 
> > CPU_SB1A (up to 0x180)
> >   arch/mips/mm/c-octeon.c: octeon_cache_error_setup() - Cavium CPU: 
> > CPU_CAVIUM_OCTEON (up to 0x180)
> > board_nmi_handler_setup pointer:
> >   arch/mips/kernel/smp-bmips.c: bmips_nmi_handler_setup() - Broadcom CPU: 
> > CPU_BMIPS (up to 0x400)
> >   arch/mips/loongson2ef/common/init.c: mips_nmi_setup() - Loongson 2E CPU: 
> > MACH_LOONGSON2EF (up to 0x400)
> >   arch/mips/loongson64/init.c: mips_nmi_setup() - Loongson 64 CPU: 
> > MACH_LOONGSON64 (up to 0x400, VEIC:0xB00)
> >   arch/mips/mti-malta/malta-init.c: mips_nmi_setup() - Malta CPU: 
> > MIPS_MALTA (up to 0x400, VEIC: 0xB00)
> >   arch/mips/pistachio/init.c: mips_nmi_setup() - Pistachio CPU: 
> > MACH_PISTACHIO (up to 0x400, VEIC: 0xB00)
> > board_ejtag_handler_setup:
> >   arch/mips/mti-malta/malta-init.c: mips_ejtag_setup() - Malta CPU: 
> > MIPS_MALTA (up to 0x380, VEIC: 0xa80)
> >   arch/mips/pistachio/init.c: mips_ejtag_setup() - Pistachio CPU: 
> > MACH_PISTACHIO (up to 0x380, VEIC: 0xa80)
> > bmips_ebase_setup:
> >   arch/mips/kernel/smp-bmips.c: bmips_ebase_setup() - Broadcom CPU: 
> > CPU_BMIPS (up to 0x400 - NMI/reset, and 0x1000 - normal)
> > plat_mem_setup:
> >   arch/mips/bmips/setup.c: bcm63xx_fixup_cpu1() - Broadcom CPU: CPU_BMIPS 
> > (up to 0x220)
> >   
> > 
> > Are you sure all of them have "cpu_has_mips_r2_r6" macro returning
> > true (false) in order to safely use the lowest region in accordance
> > with the conditional statement you've added?
> 

> some of them are not R2 (SB1), others are. So best bet would be to
> simply reserve the first 0x1000 bytes for every CPU and special handling
> for the BMIPS case. Does this cover all cases ?

I can't say for sure whether it will cover all the cases/platforms. I
visually analysed all the
board_{nmi_handler,ejtag_handler,ebase,cache_error}_setup callbacks
implementation in MIPS arch to create the list above. Exception vectors or
some other stuff can be setup in some other platform-specific manner. But at
least reserving a memory below PAGE_SIZE would get the situation partly back
to before the memory below the kernel stopped being reserved. Hopefully
one page will be enough for the platforms, which relied on that rule. The
rest or them sooner or later will manifest itself as it has happened with
Broadcom.

-Sergey

> 
> Thomas.
> 
> -- 
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea.[ RFC1925, 2.3 ]

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-07 Thread Maciej W. Rozycki

On Sun, 7 Mar 2021, Thomas Bogendoerfer wrote:

> > Are you sure all of them have "cpu_has_mips_r2_r6" macro returning
> > true (false) in order to safely use the lowest region in accordance
> > with the conditional statement you've added?
> 
> some of them are not R2 (SB1), others are. 

 For the record Malta is just about anything from MIPS IV up (though the 
QED RM5261 and RM7061 core cards have been quite rare).

  Maciej

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-07 Thread Thomas Bogendoerfer

On Sun, Mar 07, 2021 at 11:06:12PM +0300, Serge Semin wrote:
> > +
> > +   if (cpu_has_mips_r2_r6)
> > +   reserve_exception_space(0, 0x400);
> 
> Are you sure it shouldn't be (!cpu_has_mips_r2_r6)?. What I see here
> contradicts to what is said in Changelog v2.

d'oh, of course it has to be !cpu_has_mips_r2_r6.

> Anyway regarding the problem in general. AFAICS the next code uses the
> lowest memory to place some specific exception handlers:
> board_cache_error_setup pointer:
>   arch/mips/mm/c-r4k.c: r4k_cache_error_setup() - SiByte CPUs: CPU_SB1, 
> CPU_SB1A (up to 0x180)
>   arch/mips/mm/c-octeon.c: octeon_cache_error_setup() - Cavium CPU: 
> CPU_CAVIUM_OCTEON (up to 0x180)
> board_nmi_handler_setup pointer:
>   arch/mips/kernel/smp-bmips.c: bmips_nmi_handler_setup() - Broadcom CPU: 
> CPU_BMIPS (up to 0x400)
>   arch/mips/loongson2ef/common/init.c: mips_nmi_setup() - Loongson 2E CPU: 
> MACH_LOONGSON2EF (up to 0x400)
>   arch/mips/loongson64/init.c: mips_nmi_setup() - Loongson 64 CPU: 
> MACH_LOONGSON64 (up to 0x400, VEIC:0xB00)
>   arch/mips/mti-malta/malta-init.c: mips_nmi_setup() - Malta CPU: MIPS_MALTA 
> (up to 0x400, VEIC: 0xB00)
>   arch/mips/pistachio/init.c: mips_nmi_setup() - Pistachio CPU: 
> MACH_PISTACHIO (up to 0x400, VEIC: 0xB00)
> board_ejtag_handler_setup:
>   arch/mips/mti-malta/malta-init.c: mips_ejtag_setup() - Malta CPU: 
> MIPS_MALTA (up to 0x380, VEIC: 0xa80)
>   arch/mips/pistachio/init.c: mips_ejtag_setup() - Pistachio CPU: 
> MACH_PISTACHIO (up to 0x380, VEIC: 0xa80)
> bmips_ebase_setup:
>   arch/mips/kernel/smp-bmips.c: bmips_ebase_setup() - Broadcom CPU: CPU_BMIPS 
> (up to 0x400 - NMI/reset, and 0x1000 - normal)
> plat_mem_setup:
>   arch/mips/bmips/setup.c: bcm63xx_fixup_cpu1() - Broadcom CPU: CPU_BMIPS (up 
> to 0x220)
>   
> 
> Are you sure all of them have "cpu_has_mips_r2_r6" macro returning
> true (false) in order to safely use the lowest region in accordance
> with the conditional statement you've added?

some of them are not R2 (SB1), others are. So best bet would be to
simply reserve the first 0x1000 bytes for every CPU and special handling
for the BMIPS case. Does this cover all cases ?

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-07 Thread Serge Semin

Hi Thomas.
I thought we'd discuss it in v1, but since you've sent v2 please see
my comment below.

On Sat, Mar 06, 2021 at 09:29:09AM +0100, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. So we reserve it already in cpu_probe() for the CPUs
> where this is fixed. For CPU with an ebase config register allocation
> of exception space will be done in trap_init().
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 
> ---
> Changes in v2:
>  - do only memblock reservation in reserve_exception_space()
>  - reserve 0..0x400 for all CPUs without ebase register and
>to addtional reserve_exception_space for BMIPS CPUs
> 
>  arch/mips/include/asm/traps.h|  3 +++
>  arch/mips/kernel/cpu-probe.c |  7 +++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/traps.c | 10 +-
>  4 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..b710e76c9c65 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
>  extern void (*board_cache_error_setup)(void);
>  
>  extern int register_nmi_notifier(struct notifier_block *nb);
> +extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
>  extern char except_vec_nmi[];
>  
> +#define VECTORSPACING 0x100  /* for EI/VI mode */
> +
>  #define nmi_notifier(fn, pri)
> \
>  ({   \
>   static struct notifier_block fn##_nb = {\
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 9a89637b4ecf..b565bc4b900d 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "fpu-probe.h"
> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   c->cputype = CPU_BMIPS3300;
>   __cpu_name[cpu] = "Broadcom BMIPS3300";
>   set_elf_platform(cpu, "bmips3300");
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   break;
>   case PRID_IMP_BMIPS43XX: {
>   int rev = c->processor_id & PRID_REV_MASK;
> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS4380";
>   set_elf_platform(cpu, "bmips4380");
>   c->options |= MIPS_CPU_RIXI;
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   } else {
>   c->cputype = CPU_BMIPS4350;
>   __cpu_name[cpu] = "Broadcom BMIPS4350";
> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS5000";
>   set_elf_platform(cpu, "bmips5000");
>   c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
> + reserve_exception_space(0x1000, VECTORSPACING * 64);
>   break;
>   }
>  }
> @@ -2133,6 +2137,9 @@ void cpu_probe(void)
>   if (cpu == 0)
>   __ua_limit = ~((1ull << cpu_vmbits) - 1);
>  #endif

> +
> + if (cpu_has_mips_r2_r6)
> + reserve_exception_space(0, 0x400);

Are you sure it shouldn't be (!cpu_has_mips_r2_r6)?. What I see here
contradicts to what is said in Changelog v2.

Anyway regarding the problem in general. AFAICS the next code uses the
lowest memory to place some specific exception handlers:
board_cache_error_setup pointer:
  arch/mips/mm/c-r4k.c: r4k_cache_error_setup() - SiByte CPUs: CPU_SB1, 
CPU_SB1A (up to 0x180)
  arch/mips/mm/c-octeon.c: octeon_cache_error_setup() - Cavium CPU: 
CPU_CAVIUM_OCTEON (up to 0x180)
board_nmi_handler_setup pointer:
  arch/mips/kernel/smp-bmips.c:

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-06 Thread Mike Rapoport

On Sat, Mar 06, 2021 at 09:29:09AM +0100, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. So we reserve it already in cpu_probe() for the CPUs
> where this is fixed. For CPU with an ebase config register allocation
> of exception space will be done in trap_init().
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 

Acked-by: Mike Rapoport 

> ---
> Changes in v2:
>  - do only memblock reservation in reserve_exception_space()
>  - reserve 0..0x400 for all CPUs without ebase register and
>to addtional reserve_exception_space for BMIPS CPUs
> 
>  arch/mips/include/asm/traps.h|  3 +++
>  arch/mips/kernel/cpu-probe.c |  7 +++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/traps.c | 10 +-
>  4 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..b710e76c9c65 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
>  extern void (*board_cache_error_setup)(void);
>  
>  extern int register_nmi_notifier(struct notifier_block *nb);
> +extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
>  extern char except_vec_nmi[];
>  
> +#define VECTORSPACING 0x100  /* for EI/VI mode */
> +
>  #define nmi_notifier(fn, pri)
> \
>  ({   \
>   static struct notifier_block fn##_nb = {\
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 9a89637b4ecf..b565bc4b900d 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "fpu-probe.h"
> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   c->cputype = CPU_BMIPS3300;
>   __cpu_name[cpu] = "Broadcom BMIPS3300";
>   set_elf_platform(cpu, "bmips3300");
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   break;
>   case PRID_IMP_BMIPS43XX: {
>   int rev = c->processor_id & PRID_REV_MASK;
> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS4380";
>   set_elf_platform(cpu, "bmips4380");
>   c->options |= MIPS_CPU_RIXI;
> + reserve_exception_space(0x400, VECTORSPACING * 64);
>   } else {
>   c->cputype = CPU_BMIPS4350;
>   __cpu_name[cpu] = "Broadcom BMIPS4350";
> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS5000";
>   set_elf_platform(cpu, "bmips5000");
>   c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
> + reserve_exception_space(0x1000, VECTORSPACING * 64);
>   break;
>   }
>  }
> @@ -2133,6 +2137,9 @@ void cpu_probe(void)
>   if (cpu == 0)
>   __ua_limit = ~((1ull << cpu_vmbits) - 1);
>  #endif
> +
> + if (cpu_has_mips_r2_r6)
> + reserve_exception_space(0, 0x400);
>  }
>  
>  void cpu_report(void)
> diff --git a/arch/mips/kernel/cpu-r3k-probe.c 
> b/arch/mips/kernel/cpu-r3k-probe.c
> index abdbbe8c5a43..af654771918c 100644
> --- a/arch/mips/kernel/cpu-r3k-probe.c
> +++ b/arch/mips/kernel/cpu-r3k-probe.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "fpu-probe.h"
>  
> @@ -158,6 +159,8 @@ void cpu_probe(void)
>   cpu_set_fpu_opts(c);
>   else
>   cpu_set_nofpu_opts(c);
> +
> + reserve_exception_space(0, 0x400);
>  }
>  
>  void cpu_report(void)
> diff --git a/arch/mips/kernel/traps.c b/arch/mi

Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-06 Thread Florian Fainelli




On 3/6/2021 12:29 AM, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. So we reserve it already in cpu_probe() for the CPUs
> where this is fixed. For CPU with an ebase config register allocation
> of exception space will be done in trap_init().
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.

I made a typo on Sergey's name in my original version here.

> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
> kernel_end")
> Reported-by: Kamal Dasu 
> Debugged-by: Serge Semin 
> Signed-off-by: Thomas Bogendoerfer 

Tested-by: Florian Fainelli 

Thanks!
-- 
Florian

[PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-06 Thread Thomas Bogendoerfer

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. So we reserve it already in cpu_probe() for the CPUs
where this is fixed. For CPU with an ebase config register allocation
of exception space will be done in trap_init().

Huge thanks to Serget for analysing and proposing a solution to this
issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
kernel_end")
Reported-by: Kamal Dasu 
Debugged-by: Serge Semin 
Signed-off-by: Thomas Bogendoerfer 
---
Changes in v2:
 - do only memblock reservation in reserve_exception_space()
 - reserve 0..0x400 for all CPUs without ebase register and
   to addtional reserve_exception_space for BMIPS CPUs

 arch/mips/include/asm/traps.h|  3 +++
 arch/mips/kernel/cpu-probe.c |  7 +++
 arch/mips/kernel/cpu-r3k-probe.c |  3 +++
 arch/mips/kernel/traps.c | 10 +-
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..b710e76c9c65 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void);
 extern void (*board_cache_error_setup)(void);
 
 extern int register_nmi_notifier(struct notifier_block *nb);
+extern void reserve_exception_space(phys_addr_t addr, unsigned long size);
 extern char except_vec_nmi[];
 
+#define VECTORSPACING 0x100/* for EI/VI mode */
+
 #define nmi_notifier(fn, pri)  \
 ({ \
static struct notifier_block fn##_nb = {\
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9a89637b4ecf..b565bc4b900d 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "fpu-probe.h"
@@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
c->cputype = CPU_BMIPS3300;
__cpu_name[cpu] = "Broadcom BMIPS3300";
set_elf_platform(cpu, "bmips3300");
+   reserve_exception_space(0x400, VECTORSPACING * 64);
break;
case PRID_IMP_BMIPS43XX: {
int rev = c->processor_id & PRID_REV_MASK;
@@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS4380";
set_elf_platform(cpu, "bmips4380");
c->options |= MIPS_CPU_RIXI;
+   reserve_exception_space(0x400, VECTORSPACING * 64);
} else {
c->cputype = CPU_BMIPS4350;
__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS5000";
set_elf_platform(cpu, "bmips5000");
c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+   reserve_exception_space(0x1000, VECTORSPACING * 64);
break;
}
 }
@@ -2133,6 +2137,9 @@ void cpu_probe(void)
if (cpu == 0)
__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+   if (cpu_has_mips_r2_r6)
+   reserve_exception_space(0, 0x400);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c
index abdbbe8c5a43..af654771918c 100644
--- a/arch/mips/kernel/cpu-r3k-probe.c
+++ b/arch/mips/kernel/cpu-r3k-probe.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fpu-probe.h"
 
@@ -158,6 +159,8 @@ void cpu_probe(void)
cpu_set_fpu_opts(c);
else
cpu_set_nofpu_opts(c);
+
+   reserve_exception_space(0, 0x400);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..808b8b61ded1 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs 
*regs)
nmi_exit();
 }
 
-#define VECTORSPACING 0x100/* for EI/VI mode */
-
 unsigned lon

[PATCH v4] mm/userfaultfd: fix memory corruption due to writeprotect

2021-03-04 Thread Nadav Amit

From: Nadav Amit 

Userfaultfd self-test fails occasionally, indicating a memory
corruption.

Analyzing this problem indicates that there is a real bug since
mmap_lock is only taken for read in mwriteprotect_range() and defers
flushes, and since there is insufficient consideration of concurrent
deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from
the TLBs in wp_page_copy(), this flush takes place after the copy has
already been performed, and therefore changes of the page are possible
between the time of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses
a problem. Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural
PTE permission: when userfaultfd_writeprotect() unprotects memory
region, it unintentionally *clears* the RW-bit if it was already set.
Note that this unprotecting a PTE that is not write-protected is a valid
use-case: the userfaultfd monitor might ask to unprotect a region that
holds both write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0cpu1cpu2

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
[ page-fault ]
...
wp_page_copy()
 cow_user_page()
  [ copy page ]
[ write to old
  page ]
...
 set_pte_at_notify()

A similar scenario can happen:

cpu0cpu1cpu2cpu3

[ Writable PTE
  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
userfaultfd_writeprotect()
[ write-unprotect ]
[ deferred TLB flush]
[ page-fault ]
wp_page_copy()
 cow_user_page()
 [ copy page ]
 ...[ write to page ]
set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became
apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification")
which made wp_page_copy() more likely to take place, specifically if
page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Minchan Kim 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: sta...@vger.kernel.org # 5.9+
Suggested-by: Yu Zhao 
Reviewed-by: Peter Xu 
Tested-by: Peter Xu 
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit 

---
v3->v4:
* Fix the "Fixes" tag for real [Peter Xu]
* Reviewed-by, suggested-by tags [Peter Xu]
* Adding unlikely() [Peter Xu]

v2->v3:
* Do not acquire mmap_lock for write, flush conditionally instead [Yu]
* Change the fixes tag to the patch that made the race apparent [Yu]
* Removing patch to avoid write-protect on uffd unprotect. More
  comprehensive solution to follow (and avoid the TLB flush as well).
---
 mm/memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 9e8576a83147..79253cb3bcd5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3092,6 +3092,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
return handle_userfault(vmf, VM_UFFD_WP);
}
 
+   /*
+* Userfaultfd write-protect can defer flushes. Ensure the TLB
+* is flushed in this case before copying.
+*/
+   if (unlikely(userfaultfd_wp(vmf->vma) &&
+mm_tlb_flush_pending(vmf->vma->vm_mm)))
+   flush_tlb_page(vmf->vma, vmf->address

Re: [PATCH RESEND v3] mm/userfaultfd: fix memory corruption due to writeprotect

2021-03-03 Thread Nadav Amit



> On Mar 3, 2021, at 11:03 AM, Peter Xu  wrote:
> 
> On Wed, Mar 03, 2021 at 01:57:02AM -0800, Nadav Amit wrote:
>> From: Nadav Amit 
>> 
>> Userfaultfd self-test fails occasionally, indicating a memory
>> corruption.
> 
> It's failing very constantly now for me after I got it run on a 40 cores
> system...  While indeed not easy to fail on my laptop.
> 

It fails rather constantly for me, but since nobody else reproduced it,
I was afraid to say otherwise ;-)

> 
>> Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
>> Signed-off-by: Nadav Amit 
>> 
>> ---
>> v2->v3:
>> * Do not acquire mmap_lock for write, flush conditionally instead [Yu]
>> * Change the fixes tag to the patch that made the race apparent [Yu]
> 
> Did you forget about this one?  It would still be good to point to 
> 09854ba94c6a
> just to show that 5.7/5.8 stable branches shouldn't need this patch as they're
> not prone to the tlb data curruption.  Maybe also cc stable with 5.9+?

The fixes tag is wrong, as you say. I will fix it and cc stable with 5.9+.

> 
>> * Removing patch to avoid write-protect on uffd unprotect. More
>>  comprehensive solution to follow (and avoid the TLB flush as well).
>> ---
>> mm/memory.c | 7 +++
>> 1 file changed, 7 insertions(+)
>> 
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 9e8576a83147..06da04f98936 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3092,6 +3092,13 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>  return handle_userfault(vmf, VM_UFFD_WP);
>>  }
>> 
>> +/*
>> + * Userfaultfd write-protect can defer flushes. Ensure the TLB
>> + * is flushed in this case before copying.
>> + */
>> +if (userfaultfd_wp(vmf->vma) && mm_tlb_flush_pending(vmf->vma->vm_mm))
>> +flush_tlb_page(vmf->vma, vmf->address);
>> +
>>  vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
>>  if (!vmf->page) {
>>  /*
>> --
>> 2.25.1
>> 
> 
> Thanks for being consistent on fixing this problem.
> 
> Maybe it's even better to put that into a "unlikely" to reduce the affect of
> normal do_wp_page as much as possible?  But I'll leave it to others.
> 
> If with the fixes tag modified:
> 
> Reviewed-by: Peter Xu 
> Tested-by: Peter Xu 

Thanks, I will send v4 later today.

Regards,
Nadav



signature.asc
Description: Message signed with OpenPGP

Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption

2021-03-03 Thread Maciej W. Rozycki

On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:

> >  What's up with the R3k (the usual trigger for me) here?
> 
> I've moved r3k cpu_probe() to it's own file and when moving ebase
> reservation to cpu_probe(), I need to add it there as well. So just
> a mechanic step, I've missed.

 Ah, right, I didn't notice the split.  Thanks for taking care of it!

  Maciej

Re: [PATCH] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-03 Thread Florian Fainelli

On 3/3/21 1:14 PM, Serge Semin wrote:
> Hello  Thomas,
> Thanks for the patch. My comments are below.
> 
> On Wed, Mar 03, 2021 at 07:57:13PM +0100, Thomas Bogendoerfer wrote:
>> BMIPS is one of the few platforms that do change the exception base.
>> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
>> with kernel_end") we started seeing BMIPS boards fail to boot with the
>> built-in FDT being corrupted.
>>
>> Before the cited commit, early allocations would be in the [kernel_end,
>> RAM_END] range, but after commit they would be within [RAM_START +
>> PAGE_SIZE, RAM_END].
>>
>> The custom exception base handler that is installed by
>> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
>> memory region allocated by unflatten_and_copy_device_tree() thus
>> corrupting the FDT used by the kernel.
>>
>> To fix this, we need to perform an early reservation of the custom
>> exception space. So we reserve it already in cpu_probe() for the CPUs
>> where this is fixed. For CPU with an ebase config register allocation
>> of exception space will be done in trap_init().
>>
>> Huge thanks to Serget for analysing and proposing a solution to this
>> issue.
>>
> 
>> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations 
>> with kernel_end")
> 
> Fixes tag is used twice.
> 
>> Debugged-by: Serge Semin 
>> Reported-by: Kamal Dasu 
> 
> I'd switch these tags order. First it was reported, then the
> problem was debugged. I suppose it would be also nice to add
> Florian under the second Reported-by tag if he doesn't mind. I haven't
> seen any Kamal' email message, but a report posted by Florian only.

Kamal reported it to me privately and then I brought it publicly, still
wanted to give him credit.

> 
>> Signed-off-by: Thomas Bogendoerfer 
>> ---
>>  arch/mips/include/asm/traps.h|  4 
>>  arch/mips/kernel/cpu-probe.c |  7 +++
>>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>>  arch/mips/kernel/smp-bmips.c |  9 +
>>  arch/mips/kernel/traps.c | 31 ---
>>  5 files changed, 31 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
>> index 6aa8f126a43d..d74d829e1655 100644
>> --- a/arch/mips/include/asm/traps.h
>> +++ b/arch/mips/include/asm/traps.h
>> @@ -24,7 +24,11 @@ extern void (*board_ebase_setup)(void);
>>  extern void (*board_cache_error_setup)(void);
>>  
>>  extern int register_nmi_notifier(struct notifier_block *nb);
>> +extern void reserve_exception_space(unsigned long addr, unsigned long size);
>>  extern char except_vec_nmi[];
>> +extern unsigned long ebase_size;
>> +
>> +#define VECTORSPACING 0x100 /* for EI/VI mode */
>>  
>>  #define nmi_notifier(fn, pri)   
>> \
>>  ({  \
>> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
>> index 9a89637b4ecf..effc45cbb351 100644
>> --- a/arch/mips/kernel/cpu-probe.c
>> +++ b/arch/mips/kernel/cpu-probe.c
>> @@ -26,6 +26,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  
>>  #include "fpu-probe.h"
>> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct 
>> cpuinfo_mips *c, unsigned int cpu)
>>  c->cputype = CPU_BMIPS3300;
>>  __cpu_name[cpu] = "Broadcom BMIPS3300";
>>  set_elf_platform(cpu, "bmips3300");
>> +reserve_exception_space(0x8400, VECTORSPACING * 64);
>>  break;
>>  case PRID_IMP_BMIPS43XX: {
>>  int rev = c->processor_id & PRID_REV_MASK;
>> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct 
>> cpuinfo_mips *c, unsigned int cpu)
>>  __cpu_name[cpu] = "Broadcom BMIPS4380";
>>  set_elf_platform(cpu, "bmips4380");
>>  c->options |= MIPS_CPU_RIXI;
>> +reserve_exception_space(0x8400, VECTORSPACING * 64);
>>  } else {
>>  c->cputype = CPU_BMIPS4350;
>>  __cpu_name[cpu] = "Broadcom BMIPS4350";
>> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct 
>> cpuinfo_mips *c, unsigned int cpu)
>>  __cpu_name[cpu] = "Broadcom BMIPS5000";
>>  set_elf_platform(cpu, "bmips5000");
>>  c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
>> +reserve_exception_space(0x80001000, VECTORSPACING * 64);
>>  break;
>>  }
>>  }
>> @@ -2133,6 +2137,9 @@ void cpu_probe(void)
>>  if (cpu == 0)
>>  __ua_limit = ~((1ull << cpu_vmbits) - 1);
>>  #endif
>> +
>> +if (ebase_size == 0 && !cpu_has_mips_r2_r6)
>> +reserve_exception_space(CAC_BASE, 0x400);
>>  }
>>  
>>  void cpu_report(void)
>> diff --git a/arch/mips/kernel/cpu-r3k-probe.c 
>> b/arch/mips/kernel/cpu-r3k-probe.c
>> index abdbbe8c5a43..6e3f4c17b810 100644
>>

Re: [PATCH] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-03 Thread Serge Semin

Hello  Thomas,
Thanks for the patch. My comments are below.

On Wed, Mar 03, 2021 at 07:57:13PM +0100, Thomas Bogendoerfer wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception space. So we reserve it already in cpu_probe() for the CPUs
> where this is fixed. For CPU with an ebase config register allocation
> of exception space will be done in trap_init().
> 
> Huge thanks to Serget for analysing and proposing a solution to this
> issue.
> 

> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations 
> with kernel_end")

Fixes tag is used twice.

> Debugged-by: Serge Semin 
> Reported-by: Kamal Dasu 

I'd switch these tags order. First it was reported, then the
problem was debugged. I suppose it would be also nice to add
Florian under the second Reported-by tag if he doesn't mind. I haven't
seen any Kamal' email message, but a report posted by Florian only.

> Signed-off-by: Thomas Bogendoerfer 
> ---
>  arch/mips/include/asm/traps.h|  4 
>  arch/mips/kernel/cpu-probe.c |  7 +++
>  arch/mips/kernel/cpu-r3k-probe.c |  3 +++
>  arch/mips/kernel/smp-bmips.c |  9 +
>  arch/mips/kernel/traps.c | 31 ---
>  5 files changed, 31 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..d74d829e1655 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -24,7 +24,11 @@ extern void (*board_ebase_setup)(void);
>  extern void (*board_cache_error_setup)(void);
>  
>  extern int register_nmi_notifier(struct notifier_block *nb);
> +extern void reserve_exception_space(unsigned long addr, unsigned long size);
>  extern char except_vec_nmi[];
> +extern unsigned long ebase_size;
> +
> +#define VECTORSPACING 0x100  /* for EI/VI mode */
>  
>  #define nmi_notifier(fn, pri)
> \
>  ({   \
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 9a89637b4ecf..effc45cbb351 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "fpu-probe.h"
> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   c->cputype = CPU_BMIPS3300;
>   __cpu_name[cpu] = "Broadcom BMIPS3300";
>   set_elf_platform(cpu, "bmips3300");
> + reserve_exception_space(0x8400, VECTORSPACING * 64);
>   break;
>   case PRID_IMP_BMIPS43XX: {
>   int rev = c->processor_id & PRID_REV_MASK;
> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS4380";
>   set_elf_platform(cpu, "bmips4380");
>   c->options |= MIPS_CPU_RIXI;
> + reserve_exception_space(0x8400, VECTORSPACING * 64);
>   } else {
>   c->cputype = CPU_BMIPS4350;
>   __cpu_name[cpu] = "Broadcom BMIPS4350";
> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct 
> cpuinfo_mips *c, unsigned int cpu)
>   __cpu_name[cpu] = "Broadcom BMIPS5000";
>   set_elf_platform(cpu, "bmips5000");
>   c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
> + reserve_exception_space(0x80001000, VECTORSPACING * 64);
>   break;
>   }
>  }
> @@ -2133,6 +2137,9 @@ void cpu_probe(void)
>   if (cpu == 0)
>   __ua_limit = ~((1ull << cpu_vmbits) - 1);
>  #endif
> +
> + if (ebase_size == 0 && !cpu_has_mips_r2_r6)
> + reserve_exception_space(CAC_BASE, 0x400);
>  }
>  
>  void cpu_report(void)
> diff --git a/arch/mips/kernel/cpu-r3k-probe.c 
> b/arch/mips/kernel/cpu-r3k-probe.c
> index abdbbe8c5a43..6e3f4c17b810 100644
> --- a/arch/mips/kernel/cpu-r3k-probe.c
> +++ b/arch/mips/kernel/cpu-r3k-probe.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "fpu-probe.h"
>  
> @@ -158,6 +159,8 @@ void cpu_probe(void)
>

Re: [PATCH RESEND v3] mm/userfaultfd: fix memory corruption due to writeprotect

2021-03-03 Thread Peter Xu

On Wed, Mar 03, 2021 at 01:57:02AM -0800, Nadav Amit wrote:
> From: Nadav Amit 
> 
> Userfaultfd self-test fails occasionally, indicating a memory
> corruption.

It's failing very constantly now for me after I got it run on a 40 cores
system...  While indeed not easy to fail on my laptop.

[...]

> Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
> Signed-off-by: Nadav Amit 
> 
> ---
> v2->v3:
> * Do not acquire mmap_lock for write, flush conditionally instead [Yu]
> * Change the fixes tag to the patch that made the race apparent [Yu]

Did you forget about this one?  It would still be good to point to 09854ba94c6a
just to show that 5.7/5.8 stable branches shouldn't need this patch as they're
not prone to the tlb data curruption.  Maybe also cc stable with 5.9+?

> * Removing patch to avoid write-protect on uffd unprotect. More
>   comprehensive solution to follow (and avoid the TLB flush as well).
> ---
>  mm/memory.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 9e8576a83147..06da04f98936 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3092,6 +3092,13 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>   return handle_userfault(vmf, VM_UFFD_WP);
>   }
>  
> + /*
> +  * Userfaultfd write-protect can defer flushes. Ensure the TLB
> +  * is flushed in this case before copying.
> +  */
> + if (userfaultfd_wp(vmf->vma) && mm_tlb_flush_pending(vmf->vma->vm_mm))
> + flush_tlb_page(vmf->vma, vmf->address);
> +
>   vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
>   if (!vmf->page) {
>   /*
> -- 
> 2.25.1
> 

Thanks for being consistent on fixing this problem.

Maybe it's even better to put that into a "unlikely" to reduce the affect of
normal do_wp_page as much as possible?  But I'll leave it to others.

If with the fixes tag modified:

Reviewed-by: Peter Xu 
Tested-by: Peter Xu 

Thanks,

-- 
Peter Xu

[PATCH] MIPS: kernel: Reserve exception base early to prevent corruption

2021-03-03 Thread Thomas Bogendoerfer

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception space. So we reserve it already in cpu_probe() for the CPUs
where this is fixed. For CPU with an ebase config register allocation
of exception space will be done in trap_init().

Huge thanks to Serget for analysing and proposing a solution to this
issue.

Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with 
kernel_end")
Debugged-by: Serge Semin 
Reported-by: Kamal Dasu 
Signed-off-by: Thomas Bogendoerfer 
---
 arch/mips/include/asm/traps.h|  4 
 arch/mips/kernel/cpu-probe.c |  7 +++
 arch/mips/kernel/cpu-r3k-probe.c |  3 +++
 arch/mips/kernel/smp-bmips.c |  9 +
 arch/mips/kernel/traps.c | 31 ---
 5 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..d74d829e1655 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -24,7 +24,11 @@ extern void (*board_ebase_setup)(void);
 extern void (*board_cache_error_setup)(void);
 
 extern int register_nmi_notifier(struct notifier_block *nb);
+extern void reserve_exception_space(unsigned long addr, unsigned long size);
 extern char except_vec_nmi[];
+extern unsigned long ebase_size;
+
+#define VECTORSPACING 0x100/* for EI/VI mode */
 
 #define nmi_notifier(fn, pri)  \
 ({ \
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9a89637b4ecf..effc45cbb351 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "fpu-probe.h"
@@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
c->cputype = CPU_BMIPS3300;
__cpu_name[cpu] = "Broadcom BMIPS3300";
set_elf_platform(cpu, "bmips3300");
+   reserve_exception_space(0x8400, VECTORSPACING * 64);
break;
case PRID_IMP_BMIPS43XX: {
int rev = c->processor_id & PRID_REV_MASK;
@@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS4380";
set_elf_platform(cpu, "bmips4380");
c->options |= MIPS_CPU_RIXI;
+   reserve_exception_space(0x8400, VECTORSPACING * 64);
} else {
c->cputype = CPU_BMIPS4350;
__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips 
*c, unsigned int cpu)
__cpu_name[cpu] = "Broadcom BMIPS5000";
set_elf_platform(cpu, "bmips5000");
c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+   reserve_exception_space(0x80001000, VECTORSPACING * 64);
break;
}
 }
@@ -2133,6 +2137,9 @@ void cpu_probe(void)
if (cpu == 0)
__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+   if (ebase_size == 0 && !cpu_has_mips_r2_r6)
+   reserve_exception_space(CAC_BASE, 0x400);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c
index abdbbe8c5a43..6e3f4c17b810 100644
--- a/arch/mips/kernel/cpu-r3k-probe.c
+++ b/arch/mips/kernel/cpu-r3k-probe.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fpu-probe.h"
 
@@ -158,6 +159,8 @@ void cpu_probe(void)
cpu_set_fpu_opts(c);
else
cpu_set_nofpu_opts(c);
+
+   reserve_exception_space(CAC_BASE, 0x400);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index b6ef5f7312cf..ad3f2282a65a 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -528,10 +528,6 @@ static void bmips_set_reset_vec(int cpu, u32 val)
 
 void bmips_ebase_setup(void)
 {
-   unsigned long new_ebase = ebase;
-
-   BUG_ON(ebase != CKSEG0);
-
switch (current_cpu_type()) {
case CPU_BMIPS4350:

Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption

2021-03-03 Thread Thomas Bogendoerfer

On Wed, Mar 03, 2021 at 06:45:52PM +0100, Maciej W. Rozycki wrote:
> On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:
> 
> > perfect, I only forgot about R3k... I'll submit a formal patch submission
> > later today.
> 
>  What's up with the R3k (the usual trigger for me) here?

I've moved r3k cpu_probe() to it's own file and when moving ebase
reservation to cpu_probe(), I need to add it there as well. So just
a mechanic step, I've missed.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 4002 matches

Mail list logo