[PATCH 5.11 012/210] selinux: fix cond_list corruption when changing booleans
From: Ondrej Mosnacek commit d8f5f0ea5b86300390b026b6c6e7836b7150814a upstream. Currently, duplicate_policydb_cond_list() first copies the whole conditional avtab and then tries to link to the correct entries in cond_dup_av_list() using avtab_search(). However, since the conditional avtab may contain multiple entries with the same key, this approach often fails to find the right entry, potentially leading to wrong rules being activated/deactivated when booleans are changed. To fix this, instead start with an empty conditional avtab and add the individual entries one-by-one while building the new av_lists. This approach leads to the correct result, since each entry is present in the av_lists exactly once. The issue can be reproduced with Fedora policy as follows: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True # setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off ftpd_connect_db=off ftpd_full_access=off On fixed kernels, the sesearch output is the same after the setsebool command: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True While on the broken kernels, it will be different: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True While there, also simplify the computation of nslots. This changes the nslots values for nrules 2 or 3 to just two slots instead of 4, which makes the sequence more consistent. Cc: sta...@vger.kernel.org Fixes: c7c556f1e81b ("selinux: refactor changing booleans") Signed-off-by: Ondrej Mosnacek Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman --- security/selinux/ss/avtab.c | 86 +++--- security/selinux/ss/avtab.h |2 security/selinux/ss/conditional.c | 12 ++--- 3 files changed, 32 insertions(+), 68 deletions(-) --- a/security/selinux/ss/avtab.c +++ b/security/selinux/ss/avtab.c @@ -308,24 +308,10 @@ void avtab_init(struct avtab *h) h->mask = 0; } -int avtab_alloc(struct avtab *h, u32 nrules) +static int avtab_alloc_common(struct avtab *h, u32 nslot) { - u32 shift = 0; - u32 work = nrules; - u32 nslot; - - if (nrules == 0) - goto avtab_alloc_out; - - while (work) { - work = work >> 1; - shift++; - } - if (shift > 2) - shift = shift - 2; - nslot = 1 << shift; - if (nslot > MAX_AVTAB_HASH_BUCKETS) - nslot = MAX_AVTAB_HASH_BUCKETS; + if (!nslot) + return 0; h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL); if (!h->htable) @@ -333,59 +319,37 @@ int avtab_alloc(struct avtab *h, u32 nru h->nslot = nslot; h->mask = nslot - 1; - -avtab_alloc_out: - pr_debug("SELinux: %d avtab hash slots, %d rules.\n", - h->nslot, nrules); return 0; } -int avtab_duplicate(struct avtab *new, struct avtab *orig) +int avtab_alloc(struct avtab *h, u32 nrules) { - int i; - struct avtab_node *node, *tmp, *tail; - - memset(new, 0, sizeof(*new)); + int rc; + u32 nslot = 0; - new->htable = kvcalloc(orig->nslot, sizeof(void *), GFP_KERNEL); - if (!new->htable) - return -ENOMEM; - new->nslot = orig->nslot; - new->mask = orig->mask; - - for (i = 0; i < orig->nslot; i++) { - tail = NULL; - for (node = orig->htable[i]; node; node = node->next) { - tmp = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL); - if (!tmp) - goto error; -
[PATCH 5.10 011/188] selinux: fix cond_list corruption when changing booleans
From: Ondrej Mosnacek commit d8f5f0ea5b86300390b026b6c6e7836b7150814a upstream. Currently, duplicate_policydb_cond_list() first copies the whole conditional avtab and then tries to link to the correct entries in cond_dup_av_list() using avtab_search(). However, since the conditional avtab may contain multiple entries with the same key, this approach often fails to find the right entry, potentially leading to wrong rules being activated/deactivated when booleans are changed. To fix this, instead start with an empty conditional avtab and add the individual entries one-by-one while building the new av_lists. This approach leads to the correct result, since each entry is present in the av_lists exactly once. The issue can be reproduced with Fedora policy as follows: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True # setsebool ftpd_anon_write=off ftpd_connect_all_unreserved=off ftpd_connect_db=off ftpd_full_access=off On fixed kernels, the sesearch output is the same after the setsebool command: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t public_content_rw_t:dir { add_name create link remove_name rename reparent rmdir setattr unlink watch watch_reads write }; [ ftpd_anon_write ]:True While on the broken kernels, it will be different: # sesearch -s ftpd_t -t public_content_rw_t -c dir -p create -A allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True allow ftpd_t non_security_file_type:dir { add_name create getattr ioctl link lock open read remove_name rename reparent rmdir search setattr unlink watch watch_reads write }; [ ftpd_full_access ]:True While there, also simplify the computation of nslots. This changes the nslots values for nrules 2 or 3 to just two slots instead of 4, which makes the sequence more consistent. Cc: sta...@vger.kernel.org Fixes: c7c556f1e81b ("selinux: refactor changing booleans") Signed-off-by: Ondrej Mosnacek Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman --- security/selinux/ss/avtab.c | 86 +++--- security/selinux/ss/avtab.h |2 security/selinux/ss/conditional.c | 12 ++--- 3 files changed, 32 insertions(+), 68 deletions(-) --- a/security/selinux/ss/avtab.c +++ b/security/selinux/ss/avtab.c @@ -308,24 +308,10 @@ void avtab_init(struct avtab *h) h->mask = 0; } -int avtab_alloc(struct avtab *h, u32 nrules) +static int avtab_alloc_common(struct avtab *h, u32 nslot) { - u32 shift = 0; - u32 work = nrules; - u32 nslot; - - if (nrules == 0) - goto avtab_alloc_out; - - while (work) { - work = work >> 1; - shift++; - } - if (shift > 2) - shift = shift - 2; - nslot = 1 << shift; - if (nslot > MAX_AVTAB_HASH_BUCKETS) - nslot = MAX_AVTAB_HASH_BUCKETS; + if (!nslot) + return 0; h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL); if (!h->htable) @@ -333,59 +319,37 @@ int avtab_alloc(struct avtab *h, u32 nru h->nslot = nslot; h->mask = nslot - 1; - -avtab_alloc_out: - pr_debug("SELinux: %d avtab hash slots, %d rules.\n", - h->nslot, nrules); return 0; } -int avtab_duplicate(struct avtab *new, struct avtab *orig) +int avtab_alloc(struct avtab *h, u32 nrules) { - int i; - struct avtab_node *node, *tmp, *tail; - - memset(new, 0, sizeof(*new)); + int rc; + u32 nslot = 0; - new->htable = kvcalloc(orig->nslot, sizeof(void *), GFP_KERNEL); - if (!new->htable) - return -ENOMEM; - new->nslot = orig->nslot; - new->mask = orig->mask; - - for (i = 0; i < orig->nslot; i++) { - tail = NULL; - for (node = orig->htable[i]; node; node = node->next) { - tmp = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL); - if (!tmp) - goto error; -
Re: [PATCH v2] mm: page_poison: print page info when corruption is caught
On 4/8/21 1:08 AM, Sergei Trofimovich wrote: > When page_poison detects page corruption it's useful to see who > freed a page recently to have a guess where write-after-free > corruption happens. > > After this change corruption report has extra page data. > Example report from real corruption (includes only page_pwner part): > > pagealloc: memory corruption > e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60 > 0..` > e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20 > ...`... > ... > CPU: 1 PID: 220402 Comm: cc1plus Not tainted > 5.12.0-rc5-00107-g9720c6f59ecf #245 > Hardware name: hp server rx3600, BIOS 04.03 04/08/2008 > ... > Call Trace: > [] show_stack+0x90/0xc0 > [] dump_stack+0x150/0x1c0 > [] __kernel_unpoison_pages+0x410/0x440 > [] get_page_from_freelist+0x1460/0x2ca0 > [] __alloc_pages_nodemask+0x3c0/0x660 > [] alloc_pages_vma+0xb0/0x500 > [] __handle_mm_fault+0x1230/0x1fe0 > [] handle_mm_fault+0x310/0x4e0 > [] ia64_do_page_fault+0x1f0/0xb80 > [] ia64_leave_kernel+0x0/0x270 > page_owner tracks the page as freed > page allocated via order 0, migratetype Movable, > gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts > 8173444098740 > __reset_page_owner+0x40/0x200 > free_pcp_prepare+0x4d0/0x600 > free_unref_page+0x20/0x1c0 > __put_page+0x110/0x1a0 > migrate_pages+0x16d0/0x1dc0 > compact_zone+0xfc0/0x1aa0 > proactive_compact_node+0xd0/0x1e0 > kcompactd+0x550/0x600 > kthread+0x2c0/0x2e0 > call_payload+0x50/0x80 > > Here we can see that page was freed by page migration but something > managed to write to it afterwards. > > CC: Vlastimil Babka > CC: Andrew Morton > CC: linux...@kvack.org > Signed-off-by: Sergei Trofimovich Acked-by: Vlastimil Babka > --- > Change since v1: use more generic 'dump_page()' suggested by Vlastimil > Should supersede existing > mm-page_poison-print-page-owner-info-when-corruption-is-caught.patch > > mm/page_poison.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/mm/page_poison.c b/mm/page_poison.c > index 65cdf844c8ad..df03126f3b2b 100644 > --- a/mm/page_poison.c > +++ b/mm/page_poison.c > @@ -2,6 +2,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char > b) > return error && !(error & (error - 1)); > } > > -static void check_poison_mem(unsigned char *mem, size_t bytes) > +static void check_poison_mem(struct page *page, unsigned char *mem, size_t > bytes) > { > static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10); > unsigned char *start; > @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t > bytes) > print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start, > end - start + 1, 1); > dump_stack(); > + dump_page(page, "pagealloc: corrupted page details"); > } > > static void unpoison_page(struct page *page) > @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page) >* that is freed to buddy. Thus no extra check is done to >* see if a page was poisoned. >*/ > - check_poison_mem(addr, PAGE_SIZE); > + check_poison_mem(page, addr, PAGE_SIZE); > kunmap_atomic(addr); > } > >
[PATCH v2] mm: page_poison: print page info when corruption is caught
When page_poison detects page corruption it's useful to see who freed a page recently to have a guess where write-after-free corruption happens. After this change corruption report has extra page data. Example report from real corruption (includes only page_pwner part): pagealloc: memory corruption e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60 0..` e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20 ...`... ... CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf #245 Hardware name: hp server rx3600, BIOS 04.03 04/08/2008 ... Call Trace: [] show_stack+0x90/0xc0 [] dump_stack+0x150/0x1c0 [] __kernel_unpoison_pages+0x410/0x440 [] get_page_from_freelist+0x1460/0x2ca0 [] __alloc_pages_nodemask+0x3c0/0x660 [] alloc_pages_vma+0xb0/0x500 [] __handle_mm_fault+0x1230/0x1fe0 [] handle_mm_fault+0x310/0x4e0 [] ia64_do_page_fault+0x1f0/0xb80 [] ia64_leave_kernel+0x0/0x270 page_owner tracks the page as freed page allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 8173444098740 __reset_page_owner+0x40/0x200 free_pcp_prepare+0x4d0/0x600 free_unref_page+0x20/0x1c0 __put_page+0x110/0x1a0 migrate_pages+0x16d0/0x1dc0 compact_zone+0xfc0/0x1aa0 proactive_compact_node+0xd0/0x1e0 kcompactd+0x550/0x600 kthread+0x2c0/0x2e0 call_payload+0x50/0x80 Here we can see that page was freed by page migration but something managed to write to it afterwards. CC: Vlastimil Babka CC: Andrew Morton CC: linux...@kvack.org Signed-off-by: Sergei Trofimovich --- Change since v1: use more generic 'dump_page()' suggested by Vlastimil Should supersede existing mm-page_poison-print-page-owner-info-when-corruption-is-caught.patch mm/page_poison.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/page_poison.c b/mm/page_poison.c index 65cdf844c8ad..df03126f3b2b 100644 --- a/mm/page_poison.c +++ b/mm/page_poison.c @@ -2,6 +2,7 @@ #include #include #include +#include #include #include #include @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b) return error && !(error & (error - 1)); } -static void check_poison_mem(unsigned char *mem, size_t bytes) +static void check_poison_mem(struct page *page, unsigned char *mem, size_t bytes) { static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10); unsigned char *start; @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes) print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start, end - start + 1, 1); dump_stack(); + dump_page(page, "pagealloc: corrupted page details"); } static void unpoison_page(struct page *page) @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page) * that is freed to buddy. Thus no extra check is done to * see if a page was poisoned. */ - check_poison_mem(addr, PAGE_SIZE); + check_poison_mem(page, addr, PAGE_SIZE); kunmap_atomic(addr); } -- 2.31.1
Re: [PATCH] mm: page_poison: print page owner info when corruption is caught
On Wed, Apr 07, 2021 at 02:15:50PM +0200, Vlastimil Babka wrote: > On 4/4/21 4:17 PM, Sergei Trofimovich wrote: > > When page_poison detects page corruption it's useful to see who > > freed a page recently to have a guess where write-after-free > > corruption happens. > > > > After this change corruption report has extra page_owner data. > > Example report from real corruption: > > > > pagealloc: memory corruption > > e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60 > > e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20 > > ... > > CPU: 1 PID: 220402 Comm: cc1plus Not tainted > > 5.12.0-rc5-00107-g9720c6f59ecf #245 > > Hardware name: hp server rx3600, BIOS 04.03 04/08/2008 > > ... > > Call Trace: > > [] show_stack+0x90/0xc0 > > [] dump_stack+0x150/0x1c0 > > [] __kernel_unpoison_pages+0x410/0x440 > > [] get_page_from_freelist+0x1460/0x2ca0 > > [] __alloc_pages_nodemask+0x3c0/0x660 > > [] alloc_pages_vma+0xb0/0x500 > > [] __handle_mm_fault+0x1230/0x1fe0 > > [] handle_mm_fault+0x310/0x4e0 > > [] ia64_do_page_fault+0x1f0/0xb80 > > [] ia64_leave_kernel+0x0/0x270 > > page_owner tracks the page as freed > > page allocated via order 0, migratetype Movable, > > gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts > > 8173444098740 > > __reset_page_owner+0x40/0x200 > > free_pcp_prepare+0x4d0/0x600 > > free_unref_page+0x20/0x1c0 > > __put_page+0x110/0x1a0 > > migrate_pages+0x16d0/0x1dc0 > > compact_zone+0xfc0/0x1aa0 > > proactive_compact_node+0xd0/0x1e0 > > kcompactd+0x550/0x600 > > kthread+0x2c0/0x2e0 > > call_payload+0x50/0x80 > > > > Here we can see that page was freed by page migration but something > > managed to write to it afterwards. > > > > CC: Andrew Morton > > CC: linux...@kvack.org > > Signed-off-by: Sergei Trofimovich > > --- > > mm/page_poison.c | 6 -- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/mm/page_poison.c b/mm/page_poison.c > > index 65cdf844c8ad..ef2a1eab13d7 100644 > > --- a/mm/page_poison.c > > +++ b/mm/page_poison.c > > @@ -4,6 +4,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned > > char b) > > return error && !(error & (error - 1)); > > } > > > > -static void check_poison_mem(unsigned char *mem, size_t bytes) > > +static void check_poison_mem(struct page *page, unsigned char *mem, size_t > > bytes) > > { > > static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10); > > unsigned char *start; > > @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t > > bytes) > > print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start, > > end - start + 1, 1); > > dump_stack(); > > + dump_page_owner(page); > > OK but why not a full dump_page()? Oh, I did not know it existed! Looks even better. Will send a v2 with dump_page(). > > } > > > > static void unpoison_page(struct page *page) > > @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page) > > * that is freed to buddy. Thus no extra check is done to > > * see if a page was poisoned. > > */ > > - check_poison_mem(addr, PAGE_SIZE); > > + check_poison_mem(page, addr, PAGE_SIZE); > > kunmap_atomic(addr); > > } > > > > > -- Sergei
Re: [PATCH] mm: page_poison: print page owner info when corruption is caught
On 4/4/21 4:17 PM, Sergei Trofimovich wrote: > When page_poison detects page corruption it's useful to see who > freed a page recently to have a guess where write-after-free > corruption happens. > > After this change corruption report has extra page_owner data. > Example report from real corruption: > > pagealloc: memory corruption > e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60 > e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20 > ... > CPU: 1 PID: 220402 Comm: cc1plus Not tainted > 5.12.0-rc5-00107-g9720c6f59ecf #245 > Hardware name: hp server rx3600, BIOS 04.03 04/08/2008 > ... > Call Trace: > [] show_stack+0x90/0xc0 > [] dump_stack+0x150/0x1c0 > [] __kernel_unpoison_pages+0x410/0x440 > [] get_page_from_freelist+0x1460/0x2ca0 > [] __alloc_pages_nodemask+0x3c0/0x660 > [] alloc_pages_vma+0xb0/0x500 > [] __handle_mm_fault+0x1230/0x1fe0 > [] handle_mm_fault+0x310/0x4e0 > [] ia64_do_page_fault+0x1f0/0xb80 > [] ia64_leave_kernel+0x0/0x270 > page_owner tracks the page as freed > page allocated via order 0, migratetype Movable, > gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts > 8173444098740 > __reset_page_owner+0x40/0x200 > free_pcp_prepare+0x4d0/0x600 > free_unref_page+0x20/0x1c0 > __put_page+0x110/0x1a0 > migrate_pages+0x16d0/0x1dc0 > compact_zone+0xfc0/0x1aa0 > proactive_compact_node+0xd0/0x1e0 > kcompactd+0x550/0x600 > kthread+0x2c0/0x2e0 > call_payload+0x50/0x80 > > Here we can see that page was freed by page migration but something > managed to write to it afterwards. > > CC: Andrew Morton > CC: linux...@kvack.org > Signed-off-by: Sergei Trofimovich > --- > mm/page_poison.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/mm/page_poison.c b/mm/page_poison.c > index 65cdf844c8ad..ef2a1eab13d7 100644 > --- a/mm/page_poison.c > +++ b/mm/page_poison.c > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char > b) > return error && !(error & (error - 1)); > } > > -static void check_poison_mem(unsigned char *mem, size_t bytes) > +static void check_poison_mem(struct page *page, unsigned char *mem, size_t > bytes) > { > static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10); > unsigned char *start; > @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t > bytes) > print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start, > end - start + 1, 1); > dump_stack(); > + dump_page_owner(page); OK but why not a full dump_page()? > } > > static void unpoison_page(struct page *page) > @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page) >* that is freed to buddy. Thus no extra check is done to >* see if a page was poisoned. >*/ > - check_poison_mem(addr, PAGE_SIZE); > + check_poison_mem(page, addr, PAGE_SIZE); > kunmap_atomic(addr); > } > >
[PATCH 5.11 088/152] drm/amdkfd: dqm fence memory corruption
From: Qu Huang commit e92049ae4548ba09e53eaa9c8f6964b07ea274c9 upstream. Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent. Signed-off-by: Qu Huang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling Signed-off-by: Alex Deucher Cc: sta...@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c |2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c |6 +++--- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c |2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c|2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c|2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h |8 7 files changed, 12 insertions(+), 12 deletions(-) --- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c @@ -155,7 +155,7 @@ static int dbgdev_diq_submit_ib(struct k /* Wait till CP writes sync code: */ status = amdkfd_fence_wait_timeout( - (unsigned int *) rm_state, + rm_state, QUEUESTATE__ACTIVE, 1500); kfd_gtt_sa_free(dbgdev->dev, mem_obj); --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1167,7 +1167,7 @@ static int start_cpsch(struct device_que if (retval) goto fail_allocate_vidmem; - dqm->fence_addr = dqm->fence_mem->cpu_ptr; + dqm->fence_addr = (uint64_t *)dqm->fence_mem->cpu_ptr; dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr; init_interrupts(dqm); @@ -1340,8 +1340,8 @@ out: return retval; } -int amdkfd_fence_wait_timeout(unsigned int *fence_addr, - unsigned int fence_value, +int amdkfd_fence_wait_timeout(uint64_t *fence_addr, + uint64_t fence_value, unsigned int timeout_ms) { unsigned long end_jiffies = msecs_to_jiffies(timeout_ms) + jiffies; --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -192,7 +192,7 @@ struct device_queue_manager { uint16_tvmid_pasid[VMID_NUM]; uint64_tpipelines_addr; uint64_tfence_gpu_addr; - unsigned int*fence_addr; + uint64_t*fence_addr; struct kfd_mem_obj *fence_mem; boolactive_runlist; int sched_policy; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -347,7 +347,7 @@ fail_create_runlist_ib: } int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address, - uint32_t fence_value) + uint64_t fence_value) { uint32_t *buffer, size; int retval = 0; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -283,7 +283,7 @@ static int pm_unmap_queues_v9(struct pac } static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer, - uint64_t fence_address, uint32_t fence_value) + uint64_t fence_address, uint64_t fence_value) { struct pm4_mes_query_status *packet; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c @@ -263,7 +263,7 @@ static int pm_unmap_queues_vi(struct pac } static int pm_query_status_vi(struct packet_manager *pm, uint32_t *buffer, - uint64_t fence_address, uint32_t fence_value) + uint64_t fence_address, uint64_t fence_value) { struct pm4_mes_query_status *packet; --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1003,8 +1003,8 @@ int pqm_get_wave_state(struct process_qu u32 *ctl_stack_used_size, u32 *save_area_used_size); -int amdkfd_fence_wait_timeout(unsigned int *fence_addr, - unsigned int fence_value, +int amdkfd_fence_wait_timeout(uint64_t *fence_addr, + uint64_t fence_value,
[PATCH 5.10 073/126] drm/amdkfd: dqm fence memory corruption
From: Qu Huang commit e92049ae4548ba09e53eaa9c8f6964b07ea274c9 upstream. Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent. Signed-off-by: Qu Huang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling Signed-off-by: Alex Deucher Cc: sta...@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c |2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c |6 +++--- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c |2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c|2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c|2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h |8 7 files changed, 12 insertions(+), 12 deletions(-) --- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c @@ -155,7 +155,7 @@ static int dbgdev_diq_submit_ib(struct k /* Wait till CP writes sync code: */ status = amdkfd_fence_wait_timeout( - (unsigned int *) rm_state, + rm_state, QUEUESTATE__ACTIVE, 1500); kfd_gtt_sa_free(dbgdev->dev, mem_obj); --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1167,7 +1167,7 @@ static int start_cpsch(struct device_que if (retval) goto fail_allocate_vidmem; - dqm->fence_addr = dqm->fence_mem->cpu_ptr; + dqm->fence_addr = (uint64_t *)dqm->fence_mem->cpu_ptr; dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr; init_interrupts(dqm); @@ -1340,8 +1340,8 @@ out: return retval; } -int amdkfd_fence_wait_timeout(unsigned int *fence_addr, - unsigned int fence_value, +int amdkfd_fence_wait_timeout(uint64_t *fence_addr, + uint64_t fence_value, unsigned int timeout_ms) { unsigned long end_jiffies = msecs_to_jiffies(timeout_ms) + jiffies; --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -192,7 +192,7 @@ struct device_queue_manager { uint16_tvmid_pasid[VMID_NUM]; uint64_tpipelines_addr; uint64_tfence_gpu_addr; - unsigned int*fence_addr; + uint64_t*fence_addr; struct kfd_mem_obj *fence_mem; boolactive_runlist; int sched_policy; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -345,7 +345,7 @@ fail_create_runlist_ib: } int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address, - uint32_t fence_value) + uint64_t fence_value) { uint32_t *buffer, size; int retval = 0; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -283,7 +283,7 @@ static int pm_unmap_queues_v9(struct pac } static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer, - uint64_t fence_address, uint32_t fence_value) + uint64_t fence_address, uint64_t fence_value) { struct pm4_mes_query_status *packet; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c @@ -263,7 +263,7 @@ static int pm_unmap_queues_vi(struct pac } static int pm_query_status_vi(struct packet_manager *pm, uint32_t *buffer, - uint64_t fence_address, uint32_t fence_value) + uint64_t fence_address, uint64_t fence_value) { struct pm4_mes_query_status *packet; --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1006,8 +1006,8 @@ int pqm_get_wave_state(struct process_qu u32 *ctl_stack_used_size, u32 *save_area_used_size); -int amdkfd_fence_wait_timeout(unsigned int *fence_addr, - unsigned int fence_value, +int amdkfd_fence_wait_timeout(uint64_t *fence_addr, + uint64_t fence_value,
[PATCH] mm: page_poison: print page owner info when corruption is caught
When page_poison detects page corruption it's useful to see who freed a page recently to have a guess where write-after-free corruption happens. After this change corruption report has extra page_owner data. Example report from real corruption: pagealloc: memory corruption e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60 e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20 ... CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf #245 Hardware name: hp server rx3600, BIOS 04.03 04/08/2008 ... Call Trace: [] show_stack+0x90/0xc0 [] dump_stack+0x150/0x1c0 [] __kernel_unpoison_pages+0x410/0x440 [] get_page_from_freelist+0x1460/0x2ca0 [] __alloc_pages_nodemask+0x3c0/0x660 [] alloc_pages_vma+0xb0/0x500 [] __handle_mm_fault+0x1230/0x1fe0 [] handle_mm_fault+0x310/0x4e0 [] ia64_do_page_fault+0x1f0/0xb80 [] ia64_leave_kernel+0x0/0x270 page_owner tracks the page as freed page allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 8173444098740 __reset_page_owner+0x40/0x200 free_pcp_prepare+0x4d0/0x600 free_unref_page+0x20/0x1c0 __put_page+0x110/0x1a0 migrate_pages+0x16d0/0x1dc0 compact_zone+0xfc0/0x1aa0 proactive_compact_node+0xd0/0x1e0 kcompactd+0x550/0x600 kthread+0x2c0/0x2e0 call_payload+0x50/0x80 Here we can see that page was freed by page migration but something managed to write to it afterwards. CC: Andrew Morton CC: linux...@kvack.org Signed-off-by: Sergei Trofimovich --- mm/page_poison.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/page_poison.c b/mm/page_poison.c index 65cdf844c8ad..ef2a1eab13d7 100644 --- a/mm/page_poison.c +++ b/mm/page_poison.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b) return error && !(error & (error - 1)); } -static void check_poison_mem(unsigned char *mem, size_t bytes) +static void check_poison_mem(struct page *page, unsigned char *mem, size_t bytes) { static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10); unsigned char *start; @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes) print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start, end - start + 1, 1); dump_stack(); + dump_page_owner(page); } static void unpoison_page(struct page *page) @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page) * that is freed to buddy. Thus no extra check is done to * see if a page was poisoned. */ - check_poison_mem(addr, PAGE_SIZE); + check_poison_mem(page, addr, PAGE_SIZE); kunmap_atomic(addr); } -- 2.31.1
[PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect
From: Nadav Amit Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally *clears* the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0cpu1cpu2 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-*unprotect* ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ *clear* “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0cpu1cpu2cpu3 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ...[ write to page ] set_pte_at_notify() This race exists since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit Suggested-by: Yu Zhao Reviewed-by: Peter Xu Tested-by: Peter Xu Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Pavel Emelyanov Cc: Mike Kravetz Cc: Mike Rapoport Cc: Minchan Kim Cc: Will Deacon Cc: Peter Zijlstra Cc: [5.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 8 1 file changed, 8 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 656d90a75cf8..fe6e92de9bec 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2825,6 +2825,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; + /* +* Userfaultfd write-protect can defer flushes. Ensure the TLB +* is flushed in this case before copying. +*/ + if (unlikely(userfaultfd_wp(vmf->vma) && +mm_tlb_flush_pending(vmf->vma->vm_mm))) + flush_tlb_page(vmf->vma, vmf->address); + vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); if (!vmf->page) { /* -- 2.31.0.291.g576ba9dcdaf-goog
[PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect
From: Nadav Amit Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally *clears* the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0cpu1cpu2 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-*unprotect* ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ *clear* “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0cpu1cpu2cpu3 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ...[ write to page ] set_pte_at_notify() This race exists since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit Suggested-by: Yu Zhao Reviewed-by: Peter Xu Tested-by: Peter Xu Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Pavel Emelyanov Cc: Mike Kravetz Cc: Mike Rapoport Cc: Minchan Kim Cc: Will Deacon Cc: Peter Zijlstra Cc: [5.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 8 1 file changed, 8 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 14470ceaf3f2..3f33651a2a39 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2810,6 +2810,14 @@ static int do_wp_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; + /* +* Userfaultfd write-protect can defer flushes. Ensure the TLB +* is flushed in this case before copying. +*/ + if (unlikely(userfaultfd_wp(vmf->vma) && +mm_tlb_flush_pending(vmf->vma->vm_mm))) + flush_tlb_page(vmf->vma, vmf->address); + vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); if (!vmf->page) { /* -- 2.31.0.291.g576ba9dcdaf-goog
Re: [PATCH] drm/amdkfd: dqm fence memory corruption
Am 2021-03-26 um 5:38 a.m. schrieb Qu Huang: > On 2021/1/28 5:50, Felix Kuehling wrote: >> Am 2021-01-27 um 7:33 a.m. schrieb Qu Huang: >>> Amdgpu driver uses 4-byte data type as DQM fence memory, >>> and transmits GPU address of fence memory to microcode >>> through query status PM4 message. However, query status >>> PM4 message definition and microcode processing are all >>> processed according to 8 bytes. Fence memory only allocates >>> 4 bytes of memory, but microcode does write 8 bytes of memory, >>> so there is a memory corruption. >> >> Thank you for pointing out that discrepancy. That's a good catch! >> >> I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer. >> We should probably also fix up the query_status and >> amdkfd_fence_wait_timeout function interfaces to use a 64 bit fence >> values everywhere to be consistent. >> >> Regards, >> Felix > Hi Felix, Thanks for your advice, please check v2 at > https://lore.kernel.org/patchwork/patch/1372584/ Thank you for the reminder. I somehow missed your v2 patch on the mailing list. I have reviewed and applied it to amd-staging-drm-next now. Regards, Felix > Thanks, > Qu. >> >> >>> >>> Signed-off-by: Qu Huang >>> --- >>> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c >>> index e686ce2..8b38d0c 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c >>> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct >>> device_queue_manager *dqm) >>> pr_debug("Allocating fence memory\n"); >>> /* allocate fence memory on the gart */ >>> - retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr), >>> + retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t), >>> &dqm->fence_mem); >>> if (retval) >
Re: [PATCH] drm/amdkfd: dqm fence memory corruption
On 2021/1/28 5:50, Felix Kuehling wrote: Am 2021-01-27 um 7:33 a.m. schrieb Qu Huang: Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Thank you for pointing out that discrepancy. That's a good catch! I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer. We should probably also fix up the query_status and amdkfd_fence_wait_timeout function interfaces to use a 64 bit fence values everywhere to be consistent. Regards, Felix Hi Felix, Thanks for your advice, please check v2 at https://lore.kernel.org/patchwork/patch/1372584/ Thanks, Qu. Signed-off-by: Qu Huang --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index e686ce2..8b38d0c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm) pr_debug("Allocating fence memory\n"); /* allocate fence memory on the gart */ - retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr), + retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t), &dqm->fence_mem); if (retval)
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
On Wed, Mar 24, 2021 at 11:02:47PM +0300, Dmitry Osipenko wrote: > 24.03.2021 22:57, Minchan Kim пишет: > > On Wed, Mar 24, 2021 at 10:49:58PM +0300, Dmitry Osipenko wrote: > >> 24.03.2021 22:43, Dmitry Osipenko пишет: > >>> 24.03.2021 22:20, Minchan Kim пишет: > static int __init cma_sysfs_init(void) > { > -int i = 0; > +struct kobject *cma_kobj_root; > +struct cma_kobject *cma_kobj; > struct cma *cma; > +unsigned int i; > >>> > while (--i >= 0) { > >>> > >>> Do you realize that this doesn't work anymore? > >>> > cma = &cma_areas[i]; > -kobject_put(&cma->stat->kobj); > -} > > -kfree(cma_stats); > -kobject_put(cma_kobj); > +kobject_put(&cma->cma_kobj->kobj); > +kfree(cma->cma_kobj); > >>> > >>> Freeing a null pointer? > >>> > +cma->cma_kobj = NULL; > +} > +kobject_put(cma_kobj_root); > >>> > >> > >> Please try to simulate the errors and check that error path is working > >> properly in the next version. > >> > >> Alternatively, we could remove the cma_kobj_release entirely, like Greg > >> suggested previously, and then don't care about cleaning up at all. > > > > Does he suggested it to remove cma_kobj_release?(Initially, I did but > > was rejected from Greg) > > > > Alright, I haven't followed the previous threads fully and only saw the > reply where he suggested to removed it. No problem. I just posted it new version. Hopefully, it tastes good for you. ;-) Thanks for the review!
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
24.03.2021 22:57, Minchan Kim пишет: > On Wed, Mar 24, 2021 at 10:49:58PM +0300, Dmitry Osipenko wrote: >> 24.03.2021 22:43, Dmitry Osipenko пишет: >>> 24.03.2021 22:20, Minchan Kim пишет: static int __init cma_sysfs_init(void) { - int i = 0; + struct kobject *cma_kobj_root; + struct cma_kobject *cma_kobj; struct cma *cma; + unsigned int i; >>> while (--i >= 0) { >>> >>> Do you realize that this doesn't work anymore? >>> cma = &cma_areas[i]; - kobject_put(&cma->stat->kobj); - } - kfree(cma_stats); - kobject_put(cma_kobj); + kobject_put(&cma->cma_kobj->kobj); + kfree(cma->cma_kobj); >>> >>> Freeing a null pointer? >>> + cma->cma_kobj = NULL; + } + kobject_put(cma_kobj_root); >>> >> >> Please try to simulate the errors and check that error path is working >> properly in the next version. >> >> Alternatively, we could remove the cma_kobj_release entirely, like Greg >> suggested previously, and then don't care about cleaning up at all. > > Does he suggested it to remove cma_kobj_release?(Initially, I did but > was rejected from Greg) > Alright, I haven't followed the previous threads fully and only saw the reply where he suggested to removed it.
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
On Wed, Mar 24, 2021 at 10:49:58PM +0300, Dmitry Osipenko wrote: > 24.03.2021 22:43, Dmitry Osipenko пишет: > > 24.03.2021 22:20, Minchan Kim пишет: > >> static int __init cma_sysfs_init(void) > >> { > >> - int i = 0; > >> + struct kobject *cma_kobj_root; > >> + struct cma_kobject *cma_kobj; > >>struct cma *cma; > >> + unsigned int i; > > > >>while (--i >= 0) { > > > > Do you realize that this doesn't work anymore? > > > >>cma = &cma_areas[i]; > >> - kobject_put(&cma->stat->kobj); > >> - } > >> > >> - kfree(cma_stats); > >> - kobject_put(cma_kobj); > >> + kobject_put(&cma->cma_kobj->kobj); > >> + kfree(cma->cma_kobj); > > > > Freeing a null pointer? > > > >> + cma->cma_kobj = NULL; > >> + } > >> + kobject_put(cma_kobj_root); > > > > Please try to simulate the errors and check that error path is working > properly in the next version. > > Alternatively, we could remove the cma_kobj_release entirely, like Greg > suggested previously, and then don't care about cleaning up at all. Does he suggested it to remove cma_kobj_release?(Initially, I did but was rejected from Greg)
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
On Wed, Mar 24, 2021 at 08:53:26PM +0100, David Hildenbrand wrote: > On 24.03.21 20:45, John Hubbard wrote: > > On 3/24/21 12:20 PM, Minchan Kim wrote: > > > struct cma_stat's lifespan for cma_sysfs is different with > > > struct cma because kobject for sysfs requires dynamic object > > > while CMA is static object[1]. When CMA is initialized, > > > it couldn't use slab to allocate cma_stat since slab was not > > > initialized yet. Thus, it allocates the dynamic object > > > in subsys_initcall. > > > > > > However, the cma allocation can happens before subsys_initcall > > > then, it goes crash. > > > > > > Dmitry reported[2]: > > > > > > .. > > > [1.226190] [] (cma_sysfs_alloc_pages_count) from > > > [] (cma_alloc+0x153/0x274) > > > [1.226720] [] (cma_alloc) from [] > > > (__alloc_from_contiguous+0x37/0x8c) > > > [1.227272] [] (__alloc_from_contiguous) from [] > > > (atomic_pool_init+0x7b/0x126) > > > [1.233596] [] (atomic_pool_init) from [] > > > (do_one_initcall+0x45/0x1e4) > > > [1.234188] [] (do_one_initcall) from [] > > > (kernel_init_freeable+0x157/0x1a6) > > > [1.234741] [] (kernel_init_freeable) from [] > > > (kernel_init+0xd/0xe0) > > > [1.235289] [] (kernel_init) from [] > > > (ret_from_fork+0x11/0x1c) > > > > > > This patch moves those statistic fields of cma_stat into struct cma > > > and introduces cma_kobject wrapper to follow kobject's rule. > > > > > > At the same time, it fixes other routines based on suggestions[3][4]. > > > > > > [1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/ > > > [2] > > > https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/ > > > [3] > > > https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/ > > > [4] > > > https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/ > > > > > > Reported-by: Dmitry Osipenko > > > Tested-by: Dmitry Osipenko > > > Suggested-by: Dmitry Osipenko > > > Suggested-by: John Hubbard > > > Suggested-by: Matthew Wilcox > > > Signed-off-by: Minchan Kim > > > --- > > > I belive it's worth to have separate patch rather than replacing > > > original patch. It will also help to merge without conflict > > > since we already filed other patch based on it. > > > Strictly speaking, separating fix part and readbility part > > > in this patch would be better but it's gray to separate them > > > since most code in this patch was done while we were fixing > > > the bug. Since we don't release it yet, I hope it will work. > > > Otherwise, I can send a replacement patch inclucing all of > > > changes happend until now with gathering SoB. > > > > If we still have a choice, we should not merge a patch that has a known > > serious problem, such as a crash. That's only done if the broken problematic > > patch has already been committed to a tree that doesn't allow rebasing, > > such as of course the main linux.git. > > > > Here, I *think* it's just in linux-next and mmotm, so we still are allowed > > to fix the original patch. > > Yes, that's what we should do in case it's not upstream yet. Clean resend + > re-apply. Okay, let me replace the original one including all other patches.
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
On 24.03.21 20:45, John Hubbard wrote: On 3/24/21 12:20 PM, Minchan Kim wrote: struct cma_stat's lifespan for cma_sysfs is different with struct cma because kobject for sysfs requires dynamic object while CMA is static object[1]. When CMA is initialized, it couldn't use slab to allocate cma_stat since slab was not initialized yet. Thus, it allocates the dynamic object in subsys_initcall. However, the cma allocation can happens before subsys_initcall then, it goes crash. Dmitry reported[2]: .. [1.226190] [] (cma_sysfs_alloc_pages_count) from [] (cma_alloc+0x153/0x274) [1.226720] [] (cma_alloc) from [] (__alloc_from_contiguous+0x37/0x8c) [1.227272] [] (__alloc_from_contiguous) from [] (atomic_pool_init+0x7b/0x126) [1.233596] [] (atomic_pool_init) from [] (do_one_initcall+0x45/0x1e4) [1.234188] [] (do_one_initcall) from [] (kernel_init_freeable+0x157/0x1a6) [1.234741] [] (kernel_init_freeable) from [] (kernel_init+0xd/0xe0) [1.235289] [] (kernel_init) from [] (ret_from_fork+0x11/0x1c) This patch moves those statistic fields of cma_stat into struct cma and introduces cma_kobject wrapper to follow kobject's rule. At the same time, it fixes other routines based on suggestions[3][4]. [1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/ [2] https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/ [3] https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/ [4] https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/ Reported-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Suggested-by: Dmitry Osipenko Suggested-by: John Hubbard Suggested-by: Matthew Wilcox Signed-off-by: Minchan Kim --- I belive it's worth to have separate patch rather than replacing original patch. It will also help to merge without conflict since we already filed other patch based on it. Strictly speaking, separating fix part and readbility part in this patch would be better but it's gray to separate them since most code in this patch was done while we were fixing the bug. Since we don't release it yet, I hope it will work. Otherwise, I can send a replacement patch inclucing all of changes happend until now with gathering SoB. If we still have a choice, we should not merge a patch that has a known serious problem, such as a crash. That's only done if the broken problematic patch has already been committed to a tree that doesn't allow rebasing, such as of course the main linux.git. Here, I *think* it's just in linux-next and mmotm, so we still are allowed to fix the original patch. Yes, that's what we should do in case it's not upstream yet. Clean resend + re-apply. -- Thanks, David / dhildenb
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
24.03.2021 22:43, Dmitry Osipenko пишет: > 24.03.2021 22:20, Minchan Kim пишет: >> static int __init cma_sysfs_init(void) >> { >> -int i = 0; >> +struct kobject *cma_kobj_root; >> +struct cma_kobject *cma_kobj; >> struct cma *cma; >> +unsigned int i; > >> while (--i >= 0) { > > Do you realize that this doesn't work anymore? > >> cma = &cma_areas[i]; >> -kobject_put(&cma->stat->kobj); >> -} >> >> -kfree(cma_stats); >> -kobject_put(cma_kobj); >> +kobject_put(&cma->cma_kobj->kobj); >> +kfree(cma->cma_kobj); > > Freeing a null pointer? > >> +cma->cma_kobj = NULL; >> +} >> +kobject_put(cma_kobj_root); > Please try to simulate the errors and check that error path is working properly in the next version. Alternatively, we could remove the cma_kobj_release entirely, like Greg suggested previously, and then don't care about cleaning up at all.
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
On Wed, Mar 24, 2021 at 10:43:49PM +0300, Dmitry Osipenko wrote: > 24.03.2021 22:20, Minchan Kim пишет: > > static int __init cma_sysfs_init(void) > > { > > - int i = 0; > > + struct kobject *cma_kobj_root; > > + struct cma_kobject *cma_kobj; > > struct cma *cma; > > + unsigned int i; > > > while (--i >= 0) { > > Do you realize that this doesn't work anymore? > > > cma = &cma_areas[i]; > > - kobject_put(&cma->stat->kobj); > > - } > > > > - kfree(cma_stats); > > - kobject_put(cma_kobj); > > + kobject_put(&cma->cma_kobj->kobj); > > + kfree(cma->cma_kobj); > > Freeing a null pointer? Need coffee. diff --git a/mm/cma_sysfs.c b/mm/cma_sysfs.c index a670a80aad6f..73463be08df7 100644 --- a/mm/cma_sysfs.c +++ b/mm/cma_sysfs.c @@ -79,8 +79,7 @@ static int __init cma_sysfs_init(void) struct kobject *cma_kobj_root; struct cma_kobject *cma_kobj; struct cma *cma; - unsigned int i; - int err; + int i, err; cma_kobj_root = kobject_create_and_add("cma", mm_kobj); if (!cma_kobj_root) @@ -108,10 +107,7 @@ static int __init cma_sysfs_init(void) out: while (--i >= 0) { cma = &cma_areas[i]; - kobject_put(&cma->cma_kobj->kobj); - kfree(cma->cma_kobj); - cma->cma_kobj = NULL; } kobject_put(cma_kobj_root);
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
On 3/24/21 12:20 PM, Minchan Kim wrote: struct cma_stat's lifespan for cma_sysfs is different with struct cma because kobject for sysfs requires dynamic object while CMA is static object[1]. When CMA is initialized, it couldn't use slab to allocate cma_stat since slab was not initialized yet. Thus, it allocates the dynamic object in subsys_initcall. However, the cma allocation can happens before subsys_initcall then, it goes crash. Dmitry reported[2]: .. [1.226190] [] (cma_sysfs_alloc_pages_count) from [] (cma_alloc+0x153/0x274) [1.226720] [] (cma_alloc) from [] (__alloc_from_contiguous+0x37/0x8c) [1.227272] [] (__alloc_from_contiguous) from [] (atomic_pool_init+0x7b/0x126) [1.233596] [] (atomic_pool_init) from [] (do_one_initcall+0x45/0x1e4) [1.234188] [] (do_one_initcall) from [] (kernel_init_freeable+0x157/0x1a6) [1.234741] [] (kernel_init_freeable) from [] (kernel_init+0xd/0xe0) [1.235289] [] (kernel_init) from [] (ret_from_fork+0x11/0x1c) This patch moves those statistic fields of cma_stat into struct cma and introduces cma_kobject wrapper to follow kobject's rule. At the same time, it fixes other routines based on suggestions[3][4]. [1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/ [2] https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/ [3] https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/ [4] https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/ Reported-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Suggested-by: Dmitry Osipenko Suggested-by: John Hubbard Suggested-by: Matthew Wilcox Signed-off-by: Minchan Kim --- I belive it's worth to have separate patch rather than replacing original patch. It will also help to merge without conflict since we already filed other patch based on it. Strictly speaking, separating fix part and readbility part in this patch would be better but it's gray to separate them since most code in this patch was done while we were fixing the bug. Since we don't release it yet, I hope it will work. Otherwise, I can send a replacement patch inclucing all of changes happend until now with gathering SoB. If we still have a choice, we should not merge a patch that has a known serious problem, such as a crash. That's only done if the broken problematic patch has already been committed to a tree that doesn't allow rebasing, such as of course the main linux.git. Here, I *think* it's just in linux-next and mmotm, so we still are allowed to fix the original patch. thanks, -- John Hubbard NVIDIA
Re: [PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
24.03.2021 22:20, Minchan Kim пишет: > static int __init cma_sysfs_init(void) > { > - int i = 0; > + struct kobject *cma_kobj_root; > + struct cma_kobject *cma_kobj; > struct cma *cma; > + unsigned int i; > while (--i >= 0) { Do you realize that this doesn't work anymore? > cma = &cma_areas[i]; > - kobject_put(&cma->stat->kobj); > - } > > - kfree(cma_stats); > - kobject_put(cma_kobj); > + kobject_put(&cma->cma_kobj->kobj); > + kfree(cma->cma_kobj); Freeing a null pointer? > + cma->cma_kobj = NULL; > + } > + kobject_put(cma_kobj_root);
[PATCH] mm: cma: fix corruption cma_sysfs_alloc_pages_count
struct cma_stat's lifespan for cma_sysfs is different with struct cma because kobject for sysfs requires dynamic object while CMA is static object[1]. When CMA is initialized, it couldn't use slab to allocate cma_stat since slab was not initialized yet. Thus, it allocates the dynamic object in subsys_initcall. However, the cma allocation can happens before subsys_initcall then, it goes crash. Dmitry reported[2]: .. [1.226190] [] (cma_sysfs_alloc_pages_count) from [] (cma_alloc+0x153/0x274) [1.226720] [] (cma_alloc) from [] (__alloc_from_contiguous+0x37/0x8c) [1.227272] [] (__alloc_from_contiguous) from [] (atomic_pool_init+0x7b/0x126) [1.233596] [] (atomic_pool_init) from [] (do_one_initcall+0x45/0x1e4) [1.234188] [] (do_one_initcall) from [] (kernel_init_freeable+0x157/0x1a6) [1.234741] [] (kernel_init_freeable) from [] (kernel_init+0xd/0xe0) [1.235289] [] (kernel_init) from [] (ret_from_fork+0x11/0x1c) This patch moves those statistic fields of cma_stat into struct cma and introduces cma_kobject wrapper to follow kobject's rule. At the same time, it fixes other routines based on suggestions[3][4]. [1] https://lore.kernel.org/linux-mm/ycoamxqt6dzkc...@kroah.com/ [2] https://lore.kernel.org/linux-mm/fead70a2-4330-79ff-e79a-d8511eab1...@gmail.com/ [3] https://lore.kernel.org/linux-mm/20210323195050.2577017-1-minc...@kernel.org/ [4] https://lore.kernel.org/linux-mm/20210324010547.4134370-1-minc...@kernel.org/ Reported-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Suggested-by: Dmitry Osipenko Suggested-by: John Hubbard Suggested-by: Matthew Wilcox Signed-off-by: Minchan Kim --- I belive it's worth to have separate patch rather than replacing original patch. It will also help to merge without conflict since we already filed other patch based on it. Strictly speaking, separating fix part and readbility part in this patch would be better but it's gray to separate them since most code in this patch was done while we were fixing the bug. Since we don't release it yet, I hope it will work. Otherwise, I can send a replacement patch inclucing all of changes happend until now with gathering SoB. mm/cma.c | 4 +-- mm/cma.h | 25 +++--- mm/cma_sysfs.c | 88 -- 3 files changed, 65 insertions(+), 52 deletions(-) diff --git a/mm/cma.c b/mm/cma.c index 90e27458ddb7..08c45157911a 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -509,11 +509,11 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, out: if (page) { count_vm_event(CMA_ALLOC_SUCCESS); - cma_sysfs_alloc_pages_count(cma, count); + cma_sysfs_account_success_pages(cma, count); } else { count_vm_event(CMA_ALLOC_FAIL); if (cma) - cma_sysfs_fail_pages_count(cma, count); + cma_sysfs_account_fail_pages(cma, count); } return page; diff --git a/mm/cma.h b/mm/cma.h index 95d1aa2d808a..37b9b7858c8e 100644 --- a/mm/cma.h +++ b/mm/cma.h @@ -5,12 +5,8 @@ #include #include -struct cma_stat { - spinlock_t lock; - /* the number of CMA page successful allocations */ - unsigned long nr_pages_succeeded; - /* the number of CMA page allocation failures */ - unsigned long nr_pages_failed; +struct cma_kobject { + struct cma *cma; struct kobject kobj; }; @@ -27,7 +23,12 @@ struct cma { #endif char name[CMA_MAX_NAME]; #ifdef CONFIG_CMA_SYSFS - struct cma_stat *stat; + /* the number of CMA page successful allocations */ + atomic64_t nr_pages_succeeded; + /* the number of CMA page allocation failures */ + atomic64_t nr_pages_failed; + /* kobject requires dynamic object */ + struct cma_kobject *cma_kobj; #endif }; @@ -40,10 +41,12 @@ static inline unsigned long cma_bitmap_maxno(struct cma *cma) } #ifdef CONFIG_CMA_SYSFS -void cma_sysfs_alloc_pages_count(struct cma *cma, size_t count); -void cma_sysfs_fail_pages_count(struct cma *cma, size_t count); +void cma_sysfs_account_success_pages(struct cma *cma, unsigned long nr_pages); +void cma_sysfs_account_fail_pages(struct cma *cma, unsigned long nr_pages); #else -static inline void cma_sysfs_alloc_pages_count(struct cma *cma, size_t count) {}; -static inline void cma_sysfs_fail_pages_count(struct cma *cma, size_t count) {}; +static inline void cma_sysfs_account_success_pages(struct cma *cma, + unsigned long nr_pages) {}; +static inline void cma_sysfs_account_fail_pages(struct cma *cma, + unsigned long nr_pages) {}; #endif #endif diff --git a/mm/cma_sysfs.c b/mm/cma_sysfs.c index 3134b2b3a96d..a670a80aad6f 100644 --- a/mm/cma_sysfs.c +++ b/mm/cma_sysfs.c @@ -11,50 +11,54 @@ #include "cma.h" -static struct cma_stat *cma_stats; - -void cma_sysfs
Re: [PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption
On 3/22/2021 8:00 AM, Greg Kroah-Hartman wrote: > On Mon, Mar 22, 2021 at 07:44:05PM +0530, Naresh Kamboju wrote: >> On Mon, 22 Mar 2021 at 18:14, Greg Kroah-Hartman >> wrote: >>> >>> From: Thomas Bogendoerfer >>> >>> [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ] >>> >>> BMIPS is one of the few platforms that do change the exception base. >>> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations >>> with kernel_end") we started seeing BMIPS boards fail to boot with the >>> built-in FDT being corrupted. >>> >>> Before the cited commit, early allocations would be in the [kernel_end, >>> RAM_END] range, but after commit they would be within [RAM_START + >>> PAGE_SIZE, RAM_END]. >>> >>> The custom exception base handler that is installed by >>> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the >>> memory region allocated by unflatten_and_copy_device_tree() thus >>> corrupting the FDT used by the kernel. >>> >>> To fix this, we need to perform an early reservation of the custom >>> exception space. Additional we reserve the first 4k (1k for R3k) for >>> either normal exception vector space (legacy CPUs) or special vectors >>> like cache exceptions. >>> >>> Huge thanks to Serge for analysing and proposing a solution to this >>> issue. >>> >>> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with >>> kernel_end") >>> Reported-by: Kamal Dasu >>> Debugged-by: Serge Semin >>> Acked-by: Mike Rapoport >>> Tested-by: Florian Fainelli >>> Reviewed-by: Serge Semin >>> Signed-off-by: Thomas Bogendoerfer >>> Signed-off-by: Sasha Levin >>> --- >>> arch/mips/include/asm/traps.h| 3 +++ >>> arch/mips/kernel/cpu-probe.c | 6 ++ >>> arch/mips/kernel/cpu-r3k-probe.c | 3 +++ >>> arch/mips/kernel/traps.c | 10 +- >> >> mipc tinyconfig and allnoconfig builds failed on stable-rc 5.10 branch >> >> make --silent --keep-going --jobs=8 >> O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=mips >> CROSS_COMPILE=mips-linux-gnu- 'CC=sccache mips-linux-gnu-gcc' >> 'HOSTCC=sccache gcc' >> WARNING: modpost: vmlinux.o(.text+0x6a88): Section mismatch in >> reference from the function reserve_exception_space() to the function >> .meminit.text:memblock_reserve() >> The function reserve_exception_space() references >> the function __meminit memblock_reserve(). >> This is often because reserve_exception_space lacks a __meminit >> annotation or the annotation of memblock_reserve is wrong. >> >> FATAL: modpost: Section mismatches detected. >> Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them. >> make[2]: *** [/builds/linux/scripts/Makefile.modpost:59: >> vmlinux.symvers] Error 1 >> >> Here is the list of build failed, >> - gcc-8-allnoconfig >> - gcc-8-tinyconfig >> - gcc-9-allnoconfig >> - gcc-9-tinyconfig >> - gcc-10-allnoconfig >> - gcc-10-tinyconfig >> - clang-10-tinyconfig >> - clang-10-allnoconfig >> - clang-11-allnoconfig >> - clang-11-tinyconfig >> - clang-12-tinyconfig >> - clang-12-allnoconfig >> >> Reported-by: Naresh Kamboju >> >> link: >> https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/1117167411#L142 >> >> steps to reproduce: >> --- >> # TuxMake is a command line tool and Python library that provides >> # portable and repeatable Linux kernel builds across a variety of >> # architectures, toolchains, kernel configurations, and make targets. >> # >> # TuxMake supports the concept of runtimes. >> # See https://docs.tuxmake.org/runtimes/, for that to work it requires >> # that you install podman or docker on your system. >> # >> # To install tuxmake on your system globally: >> # sudo pip3 install -U tuxmake >> # >> # See https://docs.tuxmake.org/ for complete documentation. >> >> >> tuxmake --runtime podman --target-arch mips --toolchain gcc-10 >> --kconfig tinyconfig >> > > Thanks for letting me know, I'll go drop this and push out a new -rc... 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") is only present in v5.11 AFAICT, so not applicable for v5.10. I did not observe this problem on v5.10. -- Florian
Re: [PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption
On Mon, Mar 22, 2021 at 07:44:05PM +0530, Naresh Kamboju wrote: > On Mon, 22 Mar 2021 at 18:14, Greg Kroah-Hartman > wrote: > > > > From: Thomas Bogendoerfer > > > > [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ] > > > > BMIPS is one of the few platforms that do change the exception base. > > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > > with kernel_end") we started seeing BMIPS boards fail to boot with the > > built-in FDT being corrupted. > > > > Before the cited commit, early allocations would be in the [kernel_end, > > RAM_END] range, but after commit they would be within [RAM_START + > > PAGE_SIZE, RAM_END]. > > > > The custom exception base handler that is installed by > > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > > memory region allocated by unflatten_and_copy_device_tree() thus > > corrupting the FDT used by the kernel. > > > > To fix this, we need to perform an early reservation of the custom > > exception space. Additional we reserve the first 4k (1k for R3k) for > > either normal exception vector space (legacy CPUs) or special vectors > > like cache exceptions. > > > > Huge thanks to Serge for analysing and proposing a solution to this > > issue. > > > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > > kernel_end") > > Reported-by: Kamal Dasu > > Debugged-by: Serge Semin > > Acked-by: Mike Rapoport > > Tested-by: Florian Fainelli > > Reviewed-by: Serge Semin > > Signed-off-by: Thomas Bogendoerfer > > Signed-off-by: Sasha Levin > > --- > > arch/mips/include/asm/traps.h| 3 +++ > > arch/mips/kernel/cpu-probe.c | 6 ++ > > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > > arch/mips/kernel/traps.c | 10 +- > > mipc tinyconfig and allnoconfig builds failed on stable-rc 5.10 branch > > make --silent --keep-going --jobs=8 > O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=mips > CROSS_COMPILE=mips-linux-gnu- 'CC=sccache mips-linux-gnu-gcc' > 'HOSTCC=sccache gcc' > WARNING: modpost: vmlinux.o(.text+0x6a88): Section mismatch in > reference from the function reserve_exception_space() to the function > .meminit.text:memblock_reserve() > The function reserve_exception_space() references > the function __meminit memblock_reserve(). > This is often because reserve_exception_space lacks a __meminit > annotation or the annotation of memblock_reserve is wrong. > > FATAL: modpost: Section mismatches detected. > Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them. > make[2]: *** [/builds/linux/scripts/Makefile.modpost:59: > vmlinux.symvers] Error 1 > > Here is the list of build failed, > - gcc-8-allnoconfig > - gcc-8-tinyconfig > - gcc-9-allnoconfig > - gcc-9-tinyconfig > - gcc-10-allnoconfig > - gcc-10-tinyconfig > - clang-10-tinyconfig > - clang-10-allnoconfig > - clang-11-allnoconfig > - clang-11-tinyconfig > - clang-12-tinyconfig > - clang-12-allnoconfig > > Reported-by: Naresh Kamboju > > link: > https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/1117167411#L142 > > steps to reproduce: > --- > # TuxMake is a command line tool and Python library that provides > # portable and repeatable Linux kernel builds across a variety of > # architectures, toolchains, kernel configurations, and make targets. > # > # TuxMake supports the concept of runtimes. > # See https://docs.tuxmake.org/runtimes/, for that to work it requires > # that you install podman or docker on your system. > # > # To install tuxmake on your system globally: > # sudo pip3 install -U tuxmake > # > # See https://docs.tuxmake.org/ for complete documentation. > > > tuxmake --runtime podman --target-arch mips --toolchain gcc-10 > --kconfig tinyconfig > Thanks for letting me know, I'll go drop this and push out a new -rc... greg k-h
Re: [PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption
On Mon, 22 Mar 2021 at 18:14, Greg Kroah-Hartman wrote: > > From: Thomas Bogendoerfer > > [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ] > > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. Additional we reserve the first 4k (1k for R3k) for > either normal exception vector space (legacy CPUs) or special vectors > like cache exceptions. > > Huge thanks to Serge for analysing and proposing a solution to this > issue. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Acked-by: Mike Rapoport > Tested-by: Florian Fainelli > Reviewed-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer > Signed-off-by: Sasha Levin > --- > arch/mips/include/asm/traps.h| 3 +++ > arch/mips/kernel/cpu-probe.c | 6 ++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/traps.c | 10 +- mipc tinyconfig and allnoconfig builds failed on stable-rc 5.10 branch make --silent --keep-going --jobs=8 O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=mips CROSS_COMPILE=mips-linux-gnu- 'CC=sccache mips-linux-gnu-gcc' 'HOSTCC=sccache gcc' WARNING: modpost: vmlinux.o(.text+0x6a88): Section mismatch in reference from the function reserve_exception_space() to the function .meminit.text:memblock_reserve() The function reserve_exception_space() references the function __meminit memblock_reserve(). This is often because reserve_exception_space lacks a __meminit annotation or the annotation of memblock_reserve is wrong. FATAL: modpost: Section mismatches detected. Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them. make[2]: *** [/builds/linux/scripts/Makefile.modpost:59: vmlinux.symvers] Error 1 Here is the list of build failed, - gcc-8-allnoconfig - gcc-8-tinyconfig - gcc-9-allnoconfig - gcc-9-tinyconfig - gcc-10-allnoconfig - gcc-10-tinyconfig - clang-10-tinyconfig - clang-10-allnoconfig - clang-11-allnoconfig - clang-11-tinyconfig - clang-12-tinyconfig - clang-12-allnoconfig Reported-by: Naresh Kamboju link: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/1117167411#L142 steps to reproduce: --- # TuxMake is a command line tool and Python library that provides # portable and repeatable Linux kernel builds across a variety of # architectures, toolchains, kernel configurations, and make targets. # # TuxMake supports the concept of runtimes. # See https://docs.tuxmake.org/runtimes/, for that to work it requires # that you install podman or docker on your system. # # To install tuxmake on your system globally: # sudo pip3 install -U tuxmake # # See https://docs.tuxmake.org/ for complete documentation. tuxmake --runtime podman --target-arch mips --toolchain gcc-10 --kconfig tinyconfig -- Linaro LKFT https://lkft.linaro.org
[PATCH 4.14 34/43] PCI: rpadlpar: Fix potential drc_name corruption in store functions
From: Tyrel Datwyler commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: sta...@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -39,12 +39,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -70,12 +69,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc)
[PATCH 4.9 17/25] PCI: rpadlpar: Fix potential drc_name corruption in store functions
From: Tyrel Datwyler commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: sta...@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -39,12 +39,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -70,12 +69,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc)
[PATCH 4.4 10/14] PCI: rpadlpar: Fix potential drc_name corruption in store functions
From: Tyrel Datwyler commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: sta...@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -39,12 +39,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -70,12 +69,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc)
[PATCH 4.19 33/43] PCI: rpadlpar: Fix potential drc_name corruption in store functions
From: Tyrel Datwyler commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: sta...@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc)
[PATCH 5.4 48/60] PCI: rpadlpar: Fix potential drc_name corruption in store functions
From: Tyrel Datwyler commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: sta...@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc)
[PATCH 5.10 137/157] PCI: rpadlpar: Fix potential drc_name corruption in store functions
From: Tyrel Datwyler commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: sta...@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc)
[PATCH 5.10 103/157] MIPS: kernel: Reserve exception base early to prevent corruption
From: Thomas Bogendoerfer [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ] BMIPS is one of the few platforms that do change the exception base. After commit 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") we started seeing BMIPS boards fail to boot with the built-in FDT being corrupted. Before the cited commit, early allocations would be in the [kernel_end, RAM_END] range, but after commit they would be within [RAM_START + PAGE_SIZE, RAM_END]. The custom exception base handler that is installed by bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the memory region allocated by unflatten_and_copy_device_tree() thus corrupting the FDT used by the kernel. To fix this, we need to perform an early reservation of the custom exception space. Additional we reserve the first 4k (1k for R3k) for either normal exception vector space (legacy CPUs) or special vectors like cache exceptions. Huge thanks to Serge for analysing and proposing a solution to this issue. Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") Reported-by: Kamal Dasu Debugged-by: Serge Semin Acked-by: Mike Rapoport Tested-by: Florian Fainelli Reviewed-by: Serge Semin Signed-off-by: Thomas Bogendoerfer Signed-off-by: Sasha Levin --- arch/mips/include/asm/traps.h| 3 +++ arch/mips/kernel/cpu-probe.c | 6 ++ arch/mips/kernel/cpu-r3k-probe.c | 3 +++ arch/mips/kernel/traps.c | 10 +- 4 files changed, 17 insertions(+), 5 deletions(-) diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h index 6a0864bb604d..9038b91e2d8c 100644 --- a/arch/mips/include/asm/traps.h +++ b/arch/mips/include/asm/traps.h @@ -24,6 +24,9 @@ extern void (*board_ebase_setup)(void); extern void (*board_cache_error_setup)(void); extern int register_nmi_notifier(struct notifier_block *nb); +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); + +#define VECTORSPACING 0x100/* for EI/VI mode */ #define nmi_notifier(fn, pri) \ ({ \ diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c index 31cb9199197c..21794db53c05 100644 --- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include "fpu-probe.h" @@ -1619,6 +1620,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) c->cputype = CPU_BMIPS3300; __cpu_name[cpu] = "Broadcom BMIPS3300"; set_elf_platform(cpu, "bmips3300"); + reserve_exception_space(0x400, VECTORSPACING * 64); break; case PRID_IMP_BMIPS43XX: { int rev = c->processor_id & PRID_REV_MASK; @@ -1629,6 +1631,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS4380"; set_elf_platform(cpu, "bmips4380"); c->options |= MIPS_CPU_RIXI; + reserve_exception_space(0x400, VECTORSPACING * 64); } else { c->cputype = CPU_BMIPS4350; __cpu_name[cpu] = "Broadcom BMIPS4350"; @@ -1645,6 +1648,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS5000"; set_elf_platform(cpu, "bmips5000"); c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; + reserve_exception_space(0x1000, VECTORSPACING * 64); break; } } @@ -2124,6 +2128,8 @@ void cpu_probe(void) if (cpu == 0) __ua_limit = ~((1ull << cpu_vmbits) - 1); #endif + + reserve_exception_space(0, 0x1000); } void cpu_report(void) diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c index abdbbe8c5a43..af654771918c 100644 --- a/arch/mips/kernel/cpu-r3k-probe.c +++ b/arch/mips/kernel/cpu-r3k-probe.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "fpu-probe.h" @@ -158,6 +159,8 @@ void cpu_probe(void) cpu_set_fpu_opts(c); else cpu_set_nofpu_opts(c); + + reserve_exception_space(0, 0x400); } void cpu_report(void) diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index e0352958e2f7..808b8b61ded1 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs) nmi_exit(); } -#define VECTORSPACING 0x100/* for EI/VI mode */ - unsigned long ebase; EXPORT_SYMBOL_GPL(ebase); unsigned long exception_handlers[32]; unsigned long vi_handlers[64]; +void reserve_exception_space(phys_addr_t addr, unsigned l
[PATCH 5.11 099/120] PCI: rpadlpar: Fix potential drc_name corruption in store functions
From: Tyrel Datwyler commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: sta...@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyr...@linux.ibm.com Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc)
[tip: irq/core] genirq/matrix: Prevent allocation counter corruption
The following commit has been merged into the irq/core branch of tip: Commit-ID: c93a5e20c3c2dabef8ea360a3d3f18c6f68233ab Gitweb: https://git.kernel.org/tip/c93a5e20c3c2dabef8ea360a3d3f18c6f68233ab Author:Vitaly Kuznetsov AuthorDate:Fri, 19 Mar 2021 12:18:23 +01:00 Committer: Thomas Gleixner CommitterDate: Fri, 19 Mar 2021 22:52:11 +01:00 genirq/matrix: Prevent allocation counter corruption When irq_matrix_free() is called for an unallocated vector the managed_allocated and total_allocated counters get out of sync with the real state of the matrix. Later, when the last interrupt is freed, these counters will underflow resulting in UINTMAX because the counters are unsigned. While this is certainly a problem of the calling code, this can be catched in the allocator by checking the allocation bit for the to be freed vector which simplifies debugging. An example of the problem described above: https://lore.kernel.org/lkml/20210318192819.636943...@linutronix.de/ Add the missing sanity check and emit a warning when it triggers. Suggested-by: Thomas Gleixner Signed-off-by: Vitaly Kuznetsov Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20210319111823.1105248-1-vkuzn...@redhat.com --- kernel/irq/matrix.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c index 6f8b1d1..578596e 100644 --- a/kernel/irq/matrix.c +++ b/kernel/irq/matrix.c @@ -422,7 +422,9 @@ void irq_matrix_free(struct irq_matrix *m, unsigned int cpu, if (WARN_ON_ONCE(bit < m->alloc_start || bit >= m->alloc_end)) return; - clear_bit(bit, cm->alloc_map); + if (WARN_ON_ONCE(!test_and_clear_bit(bit, cm->alloc_map))) + return; + cm->allocated--; if(managed) cm->managed_allocated--;
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
On Thu, Mar 18, 2021 at 01:56:05PM +0100, Vlastimil Babka wrote: > I was going to suggest adding a panic_on_taint parameter... but turns out it > was > already added last year! And various memory corruption detections already use > TAINT_BAD_PAGE, including SLUB. > If anything's missing an add_taint() it can be added, and with the parameter > you > should get what you want. Ah-ha! That works too. I hadn't seen that -- I wonder if I can wire some other hardening things up to that. (e.g. refactor BUG_ON_CORRUPTION finally.) -- Kees Cook
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
On 3/18/21 6:48 AM, Kees Cook wrote: > On Tue, Mar 09, 2021 at 07:18:32PM +0100, Vlastimil Babka wrote: >> On 3/9/21 7:14 PM, Georgi Djakov wrote: >> > Hi Vlastimil, >> > >> > Thanks for the comment! >> > >> > On 3/9/21 17:09, Vlastimil Babka wrote: >> >> On 3/9/21 2:47 PM, Georgi Djakov wrote: >> >>> Being able to stop the system immediately when a memory corruption >> >>> is detected is crucial to finding the source of it. This is very >> >>> useful when the memory can be inspected with kdump or other tools. >> >> >> >> Is this in some testing scenarios where you would also use e.g. >> >> panic_on_warn? >> >> We could hook to that. If not, we could introduce a new >> >> panic_on_memory_corruption that would apply also for debug_pagealloc and >> >> whatnot? >> > >> > I would prefer that we not tie it with panic_on_warn - there might be lots >> > of >> > new code in multiple subsystems, so hitting some WARNing while testing is >> > not >> > something unexpected. >> > >> > Introducing an additional panic_on_memory_corruption would work, but i >> > noticed >> > that we already have slub_debug and thought to re-use that. But indeed, >> > аdding >> > an option to panic in for example bad_page() sounds also useful, if that's >> > what >> > you suggest. >> >> Yes, that would be another example. >> Also CCing Kees for input, as besides the "kdump ASAP for debugging" case, I >> can >> imagine security hardening folks could be interested in the "somebody might >> have >> just failed to pwn the kernel, better panic than let them continue" angle. >> But >> I'm naive wrt security, so it might be a stupid idea :) > > I've really wanted such things, but Linus has been pretty adamant about > not wanting to provide new "panic" paths (or even BUG usage[1]). It > seems that panic_on_warn remains the way to get this behavior, > with the understanding that WARN should only be produced on > expected-to-be-impossible situations[1]. > > Hitting a WARN while testing should result in either finding and fixing > a real bug, or removing the WARN in favor of pr_warn(). :) I was going to suggest adding a panic_on_taint parameter... but turns out it was already added last year! And various memory corruption detections already use TAINT_BAD_PAGE, including SLUB. If anything's missing an add_taint() it can be added, and with the parameter you should get what you want. > -Kees > > [1] > https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on >
[PATCH rdma-next 0/6] Fix memory corruption in CM
From: Leon Romanovsky Hi, This series from Mark fixes long standing bug in CM migration logic, reported by Ryan [1]. Thanks [1] https://lore.kernel.org/linux-rdma/cafmmrnx9cg--nunzjfm8ywqfaetsmawv4eogkb3a0+hnjdt...@mail.gmail.com/ Mark Zhang (6): Revert "IB/cm: Mark stale CM id's whenever the mad agent was unregistered" IB/cm: Remove "mad_agent" parameter of ib_cancel_mad IB/cm: Remove "mad_agent" parameter of ib_modify_mad IB/cm: Clear all associated AV's ports when remove a cm device IB/cm: Add lock protection when access av/alt_av's port of a cm_id IB/cm: Initialize av before acquire the spin lock in cm_lap_handler drivers/infiniband/core/cm.c | 359 - drivers/infiniband/core/mad.c | 17 +- drivers/infiniband/core/sa_query.c | 4 +- include/rdma/ib_mad.h | 27 ++- 4 files changed, 222 insertions(+), 185 deletions(-) -- 2.30.2
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
On Tue, Mar 09, 2021 at 07:18:32PM +0100, Vlastimil Babka wrote: > On 3/9/21 7:14 PM, Georgi Djakov wrote: > > Hi Vlastimil, > > > > Thanks for the comment! > > > > On 3/9/21 17:09, Vlastimil Babka wrote: > >> On 3/9/21 2:47 PM, Georgi Djakov wrote: > >>> Being able to stop the system immediately when a memory corruption > >>> is detected is crucial to finding the source of it. This is very > >>> useful when the memory can be inspected with kdump or other tools. > >> > >> Is this in some testing scenarios where you would also use e.g. > >> panic_on_warn? > >> We could hook to that. If not, we could introduce a new > >> panic_on_memory_corruption that would apply also for debug_pagealloc and > >> whatnot? > > > > I would prefer that we not tie it with panic_on_warn - there might be lots > > of > > new code in multiple subsystems, so hitting some WARNing while testing is > > not > > something unexpected. > > > > Introducing an additional panic_on_memory_corruption would work, but i > > noticed > > that we already have slub_debug and thought to re-use that. But indeed, > > аdding > > an option to panic in for example bad_page() sounds also useful, if that's > > what > > you suggest. > > Yes, that would be another example. > Also CCing Kees for input, as besides the "kdump ASAP for debugging" case, I > can > imagine security hardening folks could be interested in the "somebody might > have > just failed to pwn the kernel, better panic than let them continue" angle. But > I'm naive wrt security, so it might be a stupid idea :) I've really wanted such things, but Linus has been pretty adamant about not wanting to provide new "panic" paths (or even BUG usage[1]). It seems that panic_on_warn remains the way to get this behavior, with the understanding that WARN should only be produced on expected-to-be-impossible situations[1]. Hitting a WARN while testing should result in either finding and fixing a real bug, or removing the WARN in favor of pr_warn(). :) -Kees [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on -- Kees Cook
Re: BUG: KFENCE: memory corruption in usb_get_device_descriptor
On Wed, Mar 17, 2021 at 04:56:15PM +0530, Naresh Kamboju wrote: > On Wed, 17 Mar 2021 at 15:34, Greg Kroah-Hartman > wrote: > > > > On Wed, Mar 17, 2021 at 02:28:40PM +0530, Naresh Kamboju wrote: > > > While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64 > > > Hikey device the following KFENCE bug was found. > > > > > > Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash. > > > kernel BUG log: > > > > What USB traffic are you having here? > > This is getting triggered while booting the device. > We are not running any traffic. Ah, so this is device probe time. > > And has this ever not triggered? > > No. > It was not triggered before. > Since CONFIG_KFENCE=y is added to our builds recently we are able to > reproduce always on recent builds. > > Steps to reproduce: > 1) Build arm64 kernel Image with this given config. > - tuxmake --runtime podman --target-arch arm64 --toolchain gcc-9 > --kconfig defconfig --kconfig-add > https://builds.tuxbuild.com/1pfztfszUNcDwOAyMrw2wPMKNfc/config > 2) Boot arm64 hikey hi6220 device > 3) While booting the device you will get to see this kernel BUG: > > [ 18.243075] BUG: KFENCE: memory corruption in > usb_get_device_descriptor+0x80/0xb0 > [ 18.813861] BUG: KFENCE: memory corruption in > __usbnet_read_cmd.isra.0+0xd0/0x1a0 There was a warning before this, from the hub code, when reading from this device as well. Perhaps this is just a side affect of the real memory corruption issue somewhere else? Bisection would be nice, but I'm placing odds on this always being an issue here in this driver code... thanks for the report. greg k-h
Re: [PATCH v2] rpadlpar: fix potential drc_name corruption in store functions
On Mon, 15 Mar 2021 15:48:21 -0600, Tyrel Datwyler wrote: > Both add_slot_store() and remove_slot_store() try to fix up the drc_name > copied from the store buffer by placing a NULL terminator at nbyte + 1 > or in place of a '\n' if present. However, the static buffer that we > copy the drc_name data into is not zeored and can contain anything past > the n-th byte. This is problematic if a '\n' byte appears in that buffer > after nbytes and the string copied into the store buffer was not NULL > terminated to start with as the strchr() search for a '\n' byte will mark > this incorrectly as the end of the drc_name string resulting in a drc_name > string that contains garbage data after the n-th byte. The following > debugging shows an example of the drmgr utility writing "PHB 4543" to > the add_slot sysfs attribute, but add_slot_store logging a corrupted > string value. > > [...] Applied to powerpc/fixes. [1/1] rpadlpar: fix potential drc_name corruption in store functions https://git.kernel.org/powerpc/c/cc7a0bb058b85ea03db87169c60c7cfdd5d34678 cheers
Re: BUG: KFENCE: memory corruption in usb_get_device_descriptor
On Wed, 17 Mar 2021 at 15:34, Greg Kroah-Hartman wrote: > > On Wed, Mar 17, 2021 at 02:28:40PM +0530, Naresh Kamboju wrote: > > While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64 > > Hikey device the following KFENCE bug was found. > > > > Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash. > > kernel BUG log: > > What USB traffic are you having here? This is getting triggered while booting the device. We are not running any traffic. > > And has this ever not triggered? No. It was not triggered before. Since CONFIG_KFENCE=y is added to our builds recently we are able to reproduce always on recent builds. Steps to reproduce: 1) Build arm64 kernel Image with this given config. - tuxmake --runtime podman --target-arch arm64 --toolchain gcc-9 --kconfig defconfig --kconfig-add https://builds.tuxbuild.com/1pfztfszUNcDwOAyMrw2wPMKNfc/config 2) Boot arm64 hikey hi6220 device 3) While booting the device you will get to see this kernel BUG: [ 18.243075] BUG: KFENCE: memory corruption in usb_get_device_descriptor+0x80/0xb0 [ 18.813861] BUG: KFENCE: memory corruption in __usbnet_read_cmd.isra.0+0xd0/0x1a0 link: https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v5.12-rc2-487-gf296bfd5cd04/testrun/4155170/suite/linux-log-parser/test/check-kernel-bug-2388200/log - Naresh
Re: BUG: KFENCE: memory corruption in usb_get_device_descriptor
On Wed, Mar 17, 2021 at 02:28:40PM +0530, Naresh Kamboju wrote: > While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64 > Hikey device the following KFENCE bug was found. > > Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash. > kernel BUG log: What USB traffic are you having here? And has this ever not triggered? thanks, greg k-h
BUG: KFENCE: memory corruption in usb_get_device_descriptor
While booting Linux mainline master 5.12.0-rc2 and 5.12.0-rc3 on arm64 Hikey device the following KFENCE bug was found. Recently, we have enabled CONFIG_KFENCE=y and started seeing this crash. kernel BUG log: [ 18.243075] BUG: KFENCE: memory corruption in usb_get_device_descriptor+0x80/0xb0 [ 18.243075] [ 18.253016] Corrupted memory at 0xbb4567e7 [ ! ! . . . . . . . . . . . . . . ] (in kfence-#118): [ 18.263817] usb_get_device_descriptor+0x80/0xb0 [ 18.268978] hub_port_init+0x3e8/0xb70 [ 18.273189] hub_event+0x578/0x1628 [ 18.277109] process_one_work+0x1c8/0x488 [ 18.281593] worker_thread+0x54/0x428 [ 18.285692] kthread+0x120/0x158 [ 18.289320] ret_from_fork+0x10/0x34 [ 18.293330] [ 18.295018] kfence-#118 [0xb55b54e8-0x1fc57965, size=18, cache=kmalloc-128] allocated by task 204: [ 18.306534] usb_get_device_descriptor+0x40/0xb0 [ 18.311693] hub_port_init+0x3e8/0xb70 [ 18.315900] hub_event+0x578/0x1628 [ 18.319819] process_one_work+0x1c8/0x488 [ 18.324301] worker_thread+0x54/0x428 [ 18.328397] kthread+0x120/0x158 [ 18.332024] ret_from_fork+0x10/0x34 root@hikey:~# [ 18.33603. /lava-2388200/environment 3] [ 18.338544] CPU: 7 PID: 204 Comm: kworker/7:2 Not tainted 5.12.0-rc2 #2 [ 18.345902] Hardware name: HiKey Development Board (DT) [ 18.351715] Workqueue: usb_hub_wq hub_event [ 18.356428] == . /lava[ 18.805771] == [ 18.813861] BUG: KFENCE: memory corruption in __usbnet_read_cmd.isra.0+0xd0/0x1a0 [ 18.813861] [ 18.823804] Corrupted memory at 0x7cedde53 [ ! ! ! . . . . . . . . . . . . . ] (in kfence-#121): [ 18.834603] __usbnet_read_cmd.isra.0+0xd0/0x1a0 [ 18.839765] usbnet_read_cmd+0x70/0xa8 [ 18.843965] asix_read_cmd+0x60/0xa0 [ 18.847981] ax88772a_hw_reset+0x148/0x468 [ 18.852570] ax88772_bind+0x1c8/0x310 [ 18.856683] usbnet_probe+0x29c/0x7d8 [ 18.860788] usb_probe_interface+0xe0/0x2c0 -[ 18.865236] really_probe+0xf0/0x4d8 [ 18.869016] driver_probe_device+0xfc/0x168 [ 18.873430] __device_attach_driver+0x94/0x120 [ 18.878116] bus_for_each_drv+0x80/0xd8 [ 18.882165] __device_attach+0xfc/0x180 [ 18.886214] device_initial_probe+0x1c/0x28 [ 18.890627] bus_probe_device+0xa4/0xb0 [ 18.894676] device_add+0x3a8/0x7e8 [ 18.898357] usb_set_configuration+0x488/0x8e8 [ 18.903044] usb_generic_driver_probe+0x58/0x98 [ 18.907823] usb_probe_device+0x44/0x108 [ 18.911964] really_probe+0xf0/0x4d8 2[ 18.924600] driver_probe_device+0xfc/0x168 [ 18.937379] __device_attach_driver+0x94/0x120 [ 18.950406] bus_for_each_drv+0x80/0xd8 [ 18.960383] __device_attach+0xfc/0x180 [ 18.969078] device_initial_probe+0x1c/0x28 3[ 18.977855] bus_probe_device+0xa4/0xb0 [ 18.986226] device_add+0x3a8/0x7e8 [ 18.994190] usb_new_device+0x1e0/0x590 [ 19.002475] hub_event+0x5ec/0x1628 [ 19.010352] process_one_work+0x1c8/0x488 [ 19.018792] worker_thread+0x54/0x428 [ 19.026921] kthread+0x120/0x158 [ 19.034614] ret_from_fork+0x10/0x34 8[ 19.042712] [ 19.048623] kfence-#121 [0x8a763b3c-0x8a763b3c, size=1, cache=kmalloc-128] allocated by task 204: [ 19.063612] __usbnet_read_cmd.isra.0+0x60/0x1a0 [ 19.072924] usbnet_read_cmd+0x70/0xa8 [ 19.081325] asix_read_cmd+0x60/0xa0 [ 19.089503] ax88772a_hw_reset+0x148/0x468 8[ 19.098163] ax88772_bind+0x1c8/0x310 [ 19.106312] usbnet_probe+0x29c/0x7d8 [ 19.114407] usb_probe_interface+0xe0/0x2c0 [ 19.122950] really_probe+0xf0/0x4d8 [ 19.130811] driver_probe_device+0xfc/0x168 [ 19.139273] __device_attach_driver+0x94/0x120 [ 19.148025] bus_for_each_drv+0x80/0xd8 [ 19.156148] __device_attach+0xfc/0x180 2[ 19.164287] device_initial_probe+0x1c/0x28 [ 19.172782] bus_probe_device+0xa4/0xb0 [ 19.180948] device_add+0x3a8/0x7e8 [ 19.188758] usb_set_configuration+0x488/0x8e8 [ 19.197455] usb_generic_driver_probe+0x58/0x98 [ 19.206120] usb_probe_device+0x44/0x108 [ 19.214175] really_probe+0xf0/0x4d8 0[ 19.221885] driver_probe_device+0xfc/0x168 [ 19.230202] __device_attach_driver+0x94/0x120 [ 19.238794] bus_for_each_drv+0x80/0xd8 [ 19.246780] __device_attach+0xfc/0x180 [ 19.254790] device_initial_probe+0x1c/0x28 [ 19.263145] bus_probe_device+0xa4/0xb0 [ 19.27] device_add+0x3a8/0x7e8 0[ 19.278682] usb_new_device+0x1e0/0x590 [ 19.286583] hub_event+0x5ec/0x1628 [ 19.294055] process_one_work+0x1c8/0x488 [ 19.302102] worker_thread+0x54/0x428 [ 19.309743] kthread+0x120/0x158 [ 19.316894] ret_from_fork+0x10/0x34 [ 19.324306] [ 19.329495] CPU: 7 PID: 204 Comm: kworker/7:2 Tainted: GB 5.12.0-rc2 #2 /[ 19.341360] Hardware name: HiKey Development Board (DT) [ 19.350439] Workqueue: usb_hub_wq hub_event Reported-by: Naresh Kamboju metadata: git branch: master git repo: https
[PATCH v2] rpadlpar: fix potential drc_name corruption in store functions
Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NULL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeored and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NULL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store logging a corrupted string value. [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NULL terminated when copied into the static drc_name buffer. Further, since the string is now NULL terminated the code only needs to change '\n' to '\0' when present. Signed-off-by: Tyrel Datwyler --- Changes in v2: * use strscpy instead of memcpy (suggested by mpe) drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c b/drivers/pci/hotplug/rpadlpar_sysfs.c index cdbfa5df3a51..dbfa0b55d31a 100644 --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct kobj_attribute *attr, if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj, if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc) -- 2.27.0
Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions
On 3/14/21 7:52 PM, Michael Ellerman wrote: > Tyrel Datwyler writes: >> On 3/13/21 1:17 AM, Michal Suchánek wrote: >>> On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote: Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NULL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeored and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NULL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store logging a corrupted string value. [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by NULL terminating the string when we copy it into our static buffer by coping nbytes + 1 of data from the store buffer. The code has >>> Why is it OK to copy nbytes + 1 and why is it expected that the buffer >>> contains a nul after the content? >> >> It is my understanding that the store function buffer is allocated as a >> zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user >> data >> into. Anything after nbytes would therefore be zeroed. > > I think that's true, but it would be nice if we didn't have to rely on > that obscure detail in order for this code to be correct & understandable. I think its a security guarantee, but I guess barring a comment that explicitly outlines the correctness it probably isn't obvious. > >>> Isn't it much saner to just nul terminate the string after copying? >> >> At the cost of an extra line of code, sure. > > Is there a reason we can't use strscpy()? That should deal with all the > corner cases around the string copy, and then all you have to do is look > for a newline and turn it into nul. Fine with me. I'll spin v2 with strscpy(). -Tyrel > > cheers >
[PATCH 5.11 301/306] mm/userfaultfd: fix memory corruption due to writeprotect
From: Greg Kroah-Hartman From: Nadav Amit commit 6ce64428d62026a10cb5d80138ff2f90cc21d367 upstream. Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally *clears* the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0cpu1cpu2 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-*unprotect* ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ *clear* “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0cpu1cpu2cpu3 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ...[ write to page ] set_pte_at_notify() This race exists since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit Suggested-by: Yu Zhao Reviewed-by: Peter Xu Tested-by: Peter Xu Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Pavel Emelyanov Cc: Mike Kravetz Cc: Mike Rapoport Cc: Minchan Kim Cc: Will Deacon Cc: Peter Zijlstra Cc: [5.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/memory.c |8 1 file changed, 8 insertions(+) --- a/mm/memory.c +++ b/mm/memory.c @@ -3092,6 +3092,14 @@ static vm_fault_t do_wp_page(struct vm_f return handle_userfault(vmf, VM_UFFD_WP); } + /* +* Userfaultfd write-protect can defer flushes. Ensure the TLB +* is flushed in this case before copying. +*/ + if (unlikely(userfaultfd_wp(vmf->vma) && +mm_tlb_flush_pending(vmf->vma->vm_mm))) + flush_tlb_page(vmf->vma, vmf->address); + vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); if (!vmf->page) { /*
[PATCH 5.10 283/290] mm/userfaultfd: fix memory corruption due to writeprotect
From: Greg Kroah-Hartman From: Nadav Amit commit 6ce64428d62026a10cb5d80138ff2f90cc21d367 upstream. Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally *clears* the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0cpu1cpu2 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-*unprotect* ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ *clear* “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0cpu1cpu2cpu3 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ...[ write to page ] set_pte_at_notify() This race exists since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Link: https://lkml.kernel.org/r/20210304095423.3825684-1-na...@vmware.com Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit Suggested-by: Yu Zhao Reviewed-by: Peter Xu Tested-by: Peter Xu Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Pavel Emelyanov Cc: Mike Kravetz Cc: Mike Rapoport Cc: Minchan Kim Cc: Will Deacon Cc: Peter Zijlstra Cc: [5.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/memory.c |8 1 file changed, 8 insertions(+) --- a/mm/memory.c +++ b/mm/memory.c @@ -3090,6 +3090,14 @@ static vm_fault_t do_wp_page(struct vm_f return handle_userfault(vmf, VM_UFFD_WP); } + /* +* Userfaultfd write-protect can defer flushes. Ensure the TLB +* is flushed in this case before copying. +*/ + if (unlikely(userfaultfd_wp(vmf->vma) && +mm_tlb_flush_pending(vmf->vma->vm_mm))) + flush_tlb_page(vmf->vma, vmf->address); + vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); if (!vmf->page) { /*
[PATCH 5.11 224/306] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
From: Greg Kroah-Hartman From: Dan Carpenter commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream. The "ie_len" is a value in the 1-255 range that comes from the user. We have to cap it to ensure that it's not too large or it could lead to memory corruption. Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 1") Signed-off-by: Dan Carpenter Cc: stable Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c |5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -791,6 +791,7 @@ int rtw_check_beacon_data(struct adapter p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, WLAN_EID_SSID, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p && ie_len > 0) { + ie_len = min_t(int, ie_len, sizeof(pbss_network->ssid.ssid)); memset(&pbss_network->ssid, 0, sizeof(struct ndis_802_11_ssid)); memcpy(pbss_network->ssid.ssid, p + 2, ie_len); pbss_network->ssid.ssid_length = ie_len; @@ -811,6 +812,7 @@ int rtw_check_beacon_data(struct adapter p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, WLAN_EID_SUPP_RATES, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX); memcpy(supportRate, p + 2, ie_len); supportRateNum = ie_len; } @@ -819,6 +821,8 @@ int rtw_check_beacon_data(struct adapter p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, WLAN_EID_EXT_SUPP_RATES, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, + NDIS_802_11_LENGTH_RATES_EX - supportRateNum); memcpy(supportRate + supportRateNum, p + 2, ie_len); supportRateNum += ie_len; } @@ -934,6 +938,7 @@ int rtw_check_beacon_data(struct adapter pht_cap->mcs.rx_mask[0] = 0xff; pht_cap->mcs.rx_mask[1] = 0x0; + ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap)); memcpy(&pmlmepriv->htpriv.ht_cap, p + 2, ie_len); }
[PATCH 5.10 228/290] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
From: Greg Kroah-Hartman From: Dan Carpenter commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream. The "ie_len" is a value in the 1-255 range that comes from the user. We have to cap it to ensure that it's not too large or it could lead to memory corruption. Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 1") Signed-off-by: Dan Carpenter Cc: stable Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c |5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -791,6 +791,7 @@ int rtw_check_beacon_data(struct adapter p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p && ie_len > 0) { + ie_len = min_t(int, ie_len, sizeof(pbss_network->ssid.ssid)); memset(&pbss_network->ssid, 0, sizeof(struct ndis_802_11_ssid)); memcpy(pbss_network->ssid.ssid, p + 2, ie_len); pbss_network->ssid.ssid_length = ie_len; @@ -811,6 +812,7 @@ int rtw_check_beacon_data(struct adapter p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX); memcpy(supportRate, p + 2, ie_len); supportRateNum = ie_len; } @@ -819,6 +821,8 @@ int rtw_check_beacon_data(struct adapter p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, + NDIS_802_11_LENGTH_RATES_EX - supportRateNum); memcpy(supportRate + supportRateNum, p + 2, ie_len); supportRateNum += ie_len; } @@ -934,6 +938,7 @@ int rtw_check_beacon_data(struct adapter pht_cap->mcs.rx_mask[0] = 0xff; pht_cap->mcs.rx_mask[1] = 0x0; + ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap)); memcpy(&pmlmepriv->htpriv.ht_cap, p + 2, ie_len); }
[PATCH 5.10 159/290] kasan: fix memory corruption in kasan_bitops_tags test
From: Greg Kroah-Hartman From: Andrey Konovalov [ Upstream commit e66e1799a76621003e5b04c9c057826a2152e103 ] Since the hardware tag-based KASAN mode might not have a redzone that comes after an allocated object (when kasan.mode=prod is enabled), the kasan_bitops_tags() test ends up corrupting the next object in memory. Change the test so it always accesses the redzone that lies within the allocated object's boundaries. Link: https://linux-review.googlesource.com/id/I67f51d1ee48f0a8d0fe2658c2a39e4879fe0832a Link: https://lkml.kernel.org/r/7d452ce4ae35bb1988d2c9244dfea56cf2cc9315.1610733117.git.andreyk...@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Reviewed-by: Alexander Potapenko Cc: Andrey Ryabinin Cc: Branislav Rankov Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Kevin Brodsky Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- lib/test_kasan.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/test_kasan.c b/lib/test_kasan.c index 662f862702fc..400507f1e5db 100644 --- a/lib/test_kasan.c +++ b/lib/test_kasan.c @@ -737,13 +737,13 @@ static void kasan_bitops_tags(struct kunit *test) return; } - /* Allocation size will be rounded to up granule size, which is 16. */ - bits = kzalloc(sizeof(*bits), GFP_KERNEL); + /* kmalloc-64 cache will be used and the last 16 bytes will be the redzone. */ + bits = kzalloc(48, GFP_KERNEL); KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bits); - /* Do the accesses past the 16 allocated bytes. */ - kasan_bitops_modify(test, BITS_PER_LONG, &bits[1]); - kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, &bits[1]); + /* Do the accesses past the 48 allocated bytes, but within the redone. */ + kasan_bitops_modify(test, BITS_PER_LONG, (void *)bits + 48); + kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, (void *)bits + 48); kfree(bits); } -- 2.30.1
[PATCH 5.11 156/306] kasan: fix memory corruption in kasan_bitops_tags test
From: Greg Kroah-Hartman From: Andrey Konovalov [ Upstream commit e66e1799a76621003e5b04c9c057826a2152e103 ] Since the hardware tag-based KASAN mode might not have a redzone that comes after an allocated object (when kasan.mode=prod is enabled), the kasan_bitops_tags() test ends up corrupting the next object in memory. Change the test so it always accesses the redzone that lies within the allocated object's boundaries. Link: https://linux-review.googlesource.com/id/I67f51d1ee48f0a8d0fe2658c2a39e4879fe0832a Link: https://lkml.kernel.org/r/7d452ce4ae35bb1988d2c9244dfea56cf2cc9315.1610733117.git.andreyk...@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Reviewed-by: Alexander Potapenko Cc: Andrey Ryabinin Cc: Branislav Rankov Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Kevin Brodsky Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- lib/test_kasan.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/test_kasan.c b/lib/test_kasan.c index 2947274cc2d3..5a2f104ca13f 100644 --- a/lib/test_kasan.c +++ b/lib/test_kasan.c @@ -737,13 +737,13 @@ static void kasan_bitops_tags(struct kunit *test) return; } - /* Allocation size will be rounded to up granule size, which is 16. */ - bits = kzalloc(sizeof(*bits), GFP_KERNEL); + /* kmalloc-64 cache will be used and the last 16 bytes will be the redzone. */ + bits = kzalloc(48, GFP_KERNEL); KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bits); - /* Do the accesses past the 16 allocated bytes. */ - kasan_bitops_modify(test, BITS_PER_LONG, &bits[1]); - kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, &bits[1]); + /* Do the accesses past the 48 allocated bytes, but within the redone. */ + kasan_bitops_modify(test, BITS_PER_LONG, (void *)bits + 48); + kasan_bitops_test_and_modify(test, BITS_PER_LONG + BITS_PER_BYTE, (void *)bits + 48); kfree(bits); } -- 2.30.1
[PATCH 5.10 134/290] udf: fix silent AED tagLocation corruption
From: Greg Kroah-Hartman From: Steven J. Magnani [ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org Signed-off-by: Steven J. Magnani Signed-off-by: Jan Kara Signed-off-by: Sasha Levin --- fs/udf/inode.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/udf/inode.c b/fs/udf/inode.c index bb89c3e43212..0dd2f93ac048 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -544,11 +544,14 @@ static int udf_do_extend_file(struct inode *inode, udf_write_aext(inode, last_pos, &last_ext->extLocation, last_ext->extLength, 1); + /* -* We've rewritten the last extent but there may be empty -* indirect extent after it - enter it. +* We've rewritten the last extent. If we are going to add +* more extents, we may need to enter possible following +* empty indirect extent. */ - udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); + if (new_block_bytes || prealloc_len) + udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); } /* Managed to do everything necessary? */ -- 2.30.1
[PATCH 5.4 130/168] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
From: Greg Kroah-Hartman From: Dan Carpenter commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream. The "ie_len" is a value in the 1-255 range that comes from the user. We have to cap it to ensure that it's not too large or it could lead to memory corruption. Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 1") Signed-off-by: Dan Carpenter Cc: stable Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c |5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -784,6 +784,7 @@ int rtw_check_beacon_data(struct adapter /* SSID */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, (pbss_network->ie_length - _BEACON_IE_OFFSET_)); if (p && ie_len > 0) { + ie_len = min_t(int, ie_len, sizeof(pbss_network->ssid.ssid)); memset(&pbss_network->ssid, 0, sizeof(struct ndis_802_11_ssid)); memcpy(pbss_network->ssid.ssid, (p + 2), ie_len); pbss_network->ssid.ssid_length = ie_len; @@ -802,6 +803,7 @@ int rtw_check_beacon_data(struct adapter /* get supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, (pbss_network->ie_length - _BEACON_IE_OFFSET_)); if (p) { + ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX); memcpy(supportRate, p + 2, ie_len); supportRateNum = ie_len; } @@ -809,6 +811,8 @@ int rtw_check_beacon_data(struct adapter /* get ext_supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, + NDIS_802_11_LENGTH_RATES_EX - supportRateNum); memcpy(supportRate + supportRateNum, p + 2, ie_len); supportRateNum += ie_len; } @@ -922,6 +926,7 @@ int rtw_check_beacon_data(struct adapter pht_cap->mcs.rx_mask[0] = 0xff; pht_cap->mcs.rx_mask[1] = 0x0; + ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap)); memcpy(&pmlmepriv->htpriv.ht_cap, p + 2, ie_len); }
[PATCH 4.19 092/120] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
From: Greg Kroah-Hartman From: Dan Carpenter commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream. The "ie_len" is a value in the 1-255 range that comes from the user. We have to cap it to ensure that it's not too large or it could lead to memory corruption. Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 1") Signed-off-by: Dan Carpenter Cc: stable Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c |5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -794,6 +794,7 @@ int rtw_check_beacon_data(struct adapter /* SSID */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, (pbss_network->ie_length - _BEACON_IE_OFFSET_)); if (p && ie_len > 0) { + ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid)); memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid)); memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len); pbss_network->Ssid.SsidLength = ie_len; @@ -812,6 +813,7 @@ int rtw_check_beacon_data(struct adapter /* get supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, (pbss_network->ie_length - _BEACON_IE_OFFSET_)); if (p) { + ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX); memcpy(supportRate, p + 2, ie_len); supportRateNum = ie_len; } @@ -819,6 +821,8 @@ int rtw_check_beacon_data(struct adapter /* get ext_supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, &ie_len, pbss_network->ie_length - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, + NDIS_802_11_LENGTH_RATES_EX - supportRateNum); memcpy(supportRate + supportRateNum, p + 2, ie_len); supportRateNum += ie_len; } @@ -932,6 +936,7 @@ int rtw_check_beacon_data(struct adapter pht_cap->mcs.rx_mask[0] = 0xff; pht_cap->mcs.rx_mask[1] = 0x0; + ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap)); memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len); }
[PATCH 5.4 064/168] udf: fix silent AED tagLocation corruption
From: Greg Kroah-Hartman From: Steven J. Magnani [ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org Signed-off-by: Steven J. Magnani Signed-off-by: Jan Kara Signed-off-by: Sasha Levin --- fs/udf/inode.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 97a192eb9949..507f8f910327 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -547,11 +547,14 @@ static int udf_do_extend_file(struct inode *inode, udf_write_aext(inode, last_pos, &last_ext->extLocation, last_ext->extLength, 1); + /* -* We've rewritten the last extent but there may be empty -* indirect extent after it - enter it. +* We've rewritten the last extent. If we are going to add +* more extents, we may need to enter possible following +* empty indirect extent. */ - udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); + if (new_block_bytes || prealloc_len) + udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); } /* Managed to do everything necessary? */ -- 2.30.1
[PATCH 4.19 043/120] udf: fix silent AED tagLocation corruption
From: Greg Kroah-Hartman From: Steven J. Magnani [ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org Signed-off-by: Steven J. Magnani Signed-off-by: Jan Kara Signed-off-by: Sasha Levin --- fs/udf/inode.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 3bf89a633836..f5500d2a3879 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -540,11 +540,14 @@ static int udf_do_extend_file(struct inode *inode, udf_write_aext(inode, last_pos, &last_ext->extLocation, last_ext->extLength, 1); + /* -* We've rewritten the last extent but there may be empty -* indirect extent after it - enter it. +* We've rewritten the last extent. If we are going to add +* more extents, we may need to enter possible following +* empty indirect extent. */ - udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); + if (new_block_bytes || prealloc_len) + udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); } /* Managed to do everything necessary? */ -- 2.30.1
[PATCH 5.11 121/306] MIPS: kernel: Reserve exception base early to prevent corruption
From: Greg Kroah-Hartman From: Thomas Bogendoerfer [ Upstream commit bd67b711bfaa02cf19e88aa2d9edae5c1c1d2739 ] BMIPS is one of the few platforms that do change the exception base. After commit 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") we started seeing BMIPS boards fail to boot with the built-in FDT being corrupted. Before the cited commit, early allocations would be in the [kernel_end, RAM_END] range, but after commit they would be within [RAM_START + PAGE_SIZE, RAM_END]. The custom exception base handler that is installed by bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the memory region allocated by unflatten_and_copy_device_tree() thus corrupting the FDT used by the kernel. To fix this, we need to perform an early reservation of the custom exception space. Additional we reserve the first 4k (1k for R3k) for either normal exception vector space (legacy CPUs) or special vectors like cache exceptions. Huge thanks to Serge for analysing and proposing a solution to this issue. Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") Reported-by: Kamal Dasu Debugged-by: Serge Semin Acked-by: Mike Rapoport Tested-by: Florian Fainelli Reviewed-by: Serge Semin Signed-off-by: Thomas Bogendoerfer Signed-off-by: Sasha Levin --- arch/mips/include/asm/traps.h| 3 +++ arch/mips/kernel/cpu-probe.c | 6 ++ arch/mips/kernel/cpu-r3k-probe.c | 3 +++ arch/mips/kernel/traps.c | 10 +- 4 files changed, 17 insertions(+), 5 deletions(-) diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h index 6a0864bb604d..9038b91e2d8c 100644 --- a/arch/mips/include/asm/traps.h +++ b/arch/mips/include/asm/traps.h @@ -24,6 +24,9 @@ extern void (*board_ebase_setup)(void); extern void (*board_cache_error_setup)(void); extern int register_nmi_notifier(struct notifier_block *nb); +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); + +#define VECTORSPACING 0x100/* for EI/VI mode */ #define nmi_notifier(fn, pri) \ ({ \ diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c index 31cb9199197c..21794db53c05 100644 --- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include "fpu-probe.h" @@ -1619,6 +1620,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) c->cputype = CPU_BMIPS3300; __cpu_name[cpu] = "Broadcom BMIPS3300"; set_elf_platform(cpu, "bmips3300"); + reserve_exception_space(0x400, VECTORSPACING * 64); break; case PRID_IMP_BMIPS43XX: { int rev = c->processor_id & PRID_REV_MASK; @@ -1629,6 +1631,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS4380"; set_elf_platform(cpu, "bmips4380"); c->options |= MIPS_CPU_RIXI; + reserve_exception_space(0x400, VECTORSPACING * 64); } else { c->cputype = CPU_BMIPS4350; __cpu_name[cpu] = "Broadcom BMIPS4350"; @@ -1645,6 +1648,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS5000"; set_elf_platform(cpu, "bmips5000"); c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; + reserve_exception_space(0x1000, VECTORSPACING * 64); break; } } @@ -2124,6 +2128,8 @@ void cpu_probe(void) if (cpu == 0) __ua_limit = ~((1ull << cpu_vmbits) - 1); #endif + + reserve_exception_space(0, 0x1000); } void cpu_report(void) diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c index abdbbe8c5a43..af654771918c 100644 --- a/arch/mips/kernel/cpu-r3k-probe.c +++ b/arch/mips/kernel/cpu-r3k-probe.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "fpu-probe.h" @@ -158,6 +159,8 @@ void cpu_probe(void) cpu_set_fpu_opts(c); else cpu_set_nofpu_opts(c); + + reserve_exception_space(0, 0x400); } void cpu_report(void) diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index e0352958e2f7..808b8b61ded1 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs) nmi_exit(); } -#define VECTORSPACING 0x100/* for EI/VI mode */ - unsigned long ebase; EXPORT_SYMBOL_GPL(ebase); unsigned long exception_handlers[32]; unsigned long vi_handlers[64]; +void reserve_exception_space(p
[PATCH 5.11 129/306] udf: fix silent AED tagLocation corruption
From: Greg Kroah-Hartman From: Steven J. Magnani [ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org Signed-off-by: Steven J. Magnani Signed-off-by: Jan Kara Signed-off-by: Sasha Levin --- fs/udf/inode.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/udf/inode.c b/fs/udf/inode.c index bb89c3e43212..0dd2f93ac048 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -544,11 +544,14 @@ static int udf_do_extend_file(struct inode *inode, udf_write_aext(inode, last_pos, &last_ext->extLocation, last_ext->extLength, 1); + /* -* We've rewritten the last extent but there may be empty -* indirect extent after it - enter it. +* We've rewritten the last extent. If we are going to add +* more extents, we may need to enter possible following +* empty indirect extent. */ - udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); + if (new_block_bytes || prealloc_len) + udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); } /* Managed to do everything necessary? */ -- 2.30.1
[PATCH 4.14 68/95] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
From: Greg Kroah-Hartman From: Dan Carpenter commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream. The "ie_len" is a value in the 1-255 range that comes from the user. We have to cap it to ensure that it's not too large or it could lead to memory corruption. Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 1") Signed-off-by: Dan Carpenter Cc: stable Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c |5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -912,6 +912,7 @@ int rtw_check_beacon_data(struct adapter /* SSID */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, (pbss_network->IELength - _BEACON_IE_OFFSET_)); if (p && ie_len > 0) { + ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid)); memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid)); memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len); pbss_network->Ssid.SsidLength = ie_len; @@ -930,6 +931,7 @@ int rtw_check_beacon_data(struct adapter /* get supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, (pbss_network->IELength - _BEACON_IE_OFFSET_)); if (p) { + ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX); memcpy(supportRate, p + 2, ie_len); supportRateNum = ie_len; } @@ -937,6 +939,8 @@ int rtw_check_beacon_data(struct adapter /* get ext_supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, &ie_len, pbss_network->IELength - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, + NDIS_802_11_LENGTH_RATES_EX - supportRateNum); memcpy(supportRate + supportRateNum, p + 2, ie_len); supportRateNum += ie_len; } @@ -1050,6 +1054,7 @@ int rtw_check_beacon_data(struct adapter pht_cap->mcs.rx_mask[0] = 0xff; pht_cap->mcs.rx_mask[1] = 0x0; + ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap)); memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len); }
[PATCH 4.14 32/95] udf: fix silent AED tagLocation corruption
From: Greg Kroah-Hartman From: Steven J. Magnani [ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org Signed-off-by: Steven J. Magnani Signed-off-by: Jan Kara Signed-off-by: Sasha Levin --- fs/udf/inode.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/udf/inode.c b/fs/udf/inode.c index dd57bd446340..e0e2bc19c929 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -540,11 +540,14 @@ static int udf_do_extend_file(struct inode *inode, udf_write_aext(inode, last_pos, &last_ext->extLocation, last_ext->extLength, 1); + /* -* We've rewritten the last extent but there may be empty -* indirect extent after it - enter it. +* We've rewritten the last extent. If we are going to add +* more extents, we may need to enter possible following +* empty indirect extent. */ - udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); + if (new_block_bytes || prealloc_len) + udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); } /* Managed to do everything necessary? */ -- 2.30.1
[PATCH 5.10 083/290] gpio: fix gpio-device list corruption
From: Greg Kroah-Hartman From: Johan Hovold commit cf25ef6b631c6fc6c0435fc91eba8734cca20511 upstream. Make sure to hold the gpio_lock when removing the gpio device from the gpio_devices list (when dropping the last reference) to avoid corrupting the list when there are concurrent accesses. Fixes: ff2b13592299 ("gpio: make the gpiochip a real device") Cc: sta...@vger.kernel.org # 4.6 Reviewed-by: Saravana Kannan Signed-off-by: Johan Hovold Signed-off-by: Bartosz Golaszewski [ johan: adjust context to 5.11 ] Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpiolib.c |4 1 file changed, 4 insertions(+) --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -468,8 +468,12 @@ EXPORT_SYMBOL_GPL(gpiochip_line_is_valid static void gpiodevice_release(struct device *dev) { struct gpio_device *gdev = dev_get_drvdata(dev); + unsigned long flags; + spin_lock_irqsave(&gpio_lock, flags); list_del(&gdev->list); + spin_unlock_irqrestore(&gpio_lock, flags); + ida_free(&gpio_ida, gdev->id); kfree_const(gdev->label); kfree(gdev->descs);
[PATCH 5.11 038/306] gpio: fix gpio-device list corruption
From: Greg Kroah-Hartman From: Johan Hovold commit cf25ef6b631c6fc6c0435fc91eba8734cca20511 upstream. Make sure to hold the gpio_lock when removing the gpio device from the gpio_devices list (when dropping the last reference) to avoid corrupting the list when there are concurrent accesses. Fixes: ff2b13592299 ("gpio: make the gpiochip a real device") Cc: sta...@vger.kernel.org # 4.6 Reviewed-by: Saravana Kannan Signed-off-by: Johan Hovold Signed-off-by: Bartosz Golaszewski [ johan: adjust context to 5.11 ] Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpiolib.c |4 1 file changed, 4 insertions(+) --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -469,8 +469,12 @@ EXPORT_SYMBOL_GPL(gpiochip_line_is_valid static void gpiodevice_release(struct device *dev) { struct gpio_device *gdev = dev_get_drvdata(dev); + unsigned long flags; + spin_lock_irqsave(&gpio_lock, flags); list_del(&gdev->list); + spin_unlock_irqrestore(&gpio_lock, flags); + ida_free(&gpio_ida, gdev->id); kfree_const(gdev->label); kfree(gdev->descs);
[PATCH 4.9 18/78] udf: fix silent AED tagLocation corruption
From: Greg Kroah-Hartman From: Steven J. Magnani [ Upstream commit 63c9e47a1642fc817654a1bc18a6ec4bbcc0f056 ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magn...@ieee.org Signed-off-by: Steven J. Magnani Signed-off-by: Jan Kara Signed-off-by: Sasha Levin --- fs/udf/inode.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 149baf5f3d19..50607673a6a9 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -548,11 +548,14 @@ static int udf_do_extend_file(struct inode *inode, udf_write_aext(inode, last_pos, &last_ext->extLocation, last_ext->extLength, 1); + /* -* We've rewritten the last extent but there may be empty -* indirect extent after it - enter it. +* We've rewritten the last extent. If we are going to add +* more extents, we may need to enter possible following +* empty indirect extent. */ - udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); + if (new_block_bytes || prealloc_len) + udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0); } /* Managed to do everything necessary? */ -- 2.30.1
[PATCH 4.9 47/78] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
From: Greg Kroah-Hartman From: Dan Carpenter commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream. The "ie_len" is a value in the 1-255 range that comes from the user. We have to cap it to ensure that it's not too large or it could lead to memory corruption. Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 1") Signed-off-by: Dan Carpenter Cc: stable Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c |5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -917,6 +917,7 @@ int rtw_check_beacon_data(struct adapter /* SSID */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, (pbss_network->IELength - _BEACON_IE_OFFSET_)); if (p && ie_len > 0) { + ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid)); memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid)); memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len); pbss_network->Ssid.SsidLength = ie_len; @@ -935,6 +936,7 @@ int rtw_check_beacon_data(struct adapter /* get supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, (pbss_network->IELength - _BEACON_IE_OFFSET_)); if (p) { + ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX); memcpy(supportRate, p + 2, ie_len); supportRateNum = ie_len; } @@ -942,6 +944,8 @@ int rtw_check_beacon_data(struct adapter /* get ext_supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, &ie_len, pbss_network->IELength - _BEACON_IE_OFFSET_); if (p) { + ie_len = min_t(int, ie_len, + NDIS_802_11_LENGTH_RATES_EX - supportRateNum); memcpy(supportRate + supportRateNum, p + 2, ie_len); supportRateNum += ie_len; } @@ -1057,6 +1061,7 @@ int rtw_check_beacon_data(struct adapter pht_cap->mcs.rx_mask[0] = 0xff; pht_cap->mcs.rx_mask[1] = 0x0; } + ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap)); memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len); }
[PATCH 4.4 46/75] staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
From: Greg Kroah-Hartman From: Dan Carpenter commit d4ac640322b06095128a5c45ba4a1e80929fe7f3 upstream. The "ie_len" is a value in the 1-255 range that comes from the user. We have to cap it to ensure that it's not too large or it could lead to memory corruption. Fixes: 9a7fe54ddc3a ("staging: r8188eu: Add source files for new driver - part 1") Signed-off-by: Dan Carpenter Cc: stable Link: https://lore.kernel.org/r/YEHyQCrFZKTXyT7J@mwanda Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c |5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -921,6 +921,7 @@ int rtw_check_beacon_data(struct adapter /* SSID */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SSID_IE_, &ie_len, (pbss_network->IELength - _BEACON_IE_OFFSET_)); if (p && ie_len > 0) { + ie_len = min_t(int, ie_len, sizeof(pbss_network->Ssid.Ssid)); memset(&pbss_network->Ssid, 0, sizeof(struct ndis_802_11_ssid)); memcpy(pbss_network->Ssid.Ssid, (p + 2), ie_len); pbss_network->Ssid.SsidLength = ie_len; @@ -939,6 +940,7 @@ int rtw_check_beacon_data(struct adapter /* get supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _SUPPORTEDRATES_IE_, &ie_len, (pbss_network->IELength - _BEACON_IE_OFFSET_)); if (p != NULL) { + ie_len = min_t(int, ie_len, NDIS_802_11_LENGTH_RATES_EX); memcpy(supportRate, p+2, ie_len); supportRateNum = ie_len; } @@ -946,6 +948,8 @@ int rtw_check_beacon_data(struct adapter /* get ext_supported rates */ p = rtw_get_ie(ie + _BEACON_IE_OFFSET_, _EXT_SUPPORTEDRATES_IE_, &ie_len, pbss_network->IELength - _BEACON_IE_OFFSET_); if (p != NULL) { + ie_len = min_t(int, ie_len, + NDIS_802_11_LENGTH_RATES_EX - supportRateNum); memcpy(supportRate+supportRateNum, p+2, ie_len); supportRateNum += ie_len; } @@ -1061,6 +1065,7 @@ int rtw_check_beacon_data(struct adapter pht_cap->supp_mcs_set[0] = 0xff; pht_cap->supp_mcs_set[1] = 0x0; } + ie_len = min_t(int, ie_len, sizeof(pmlmepriv->htpriv.ht_cap)); memcpy(&pmlmepriv->htpriv.ht_cap, p+2, ie_len); }
Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions
Tyrel Datwyler writes: > On 3/13/21 1:17 AM, Michal Suchánek wrote: >> On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote: >>> Both add_slot_store() and remove_slot_store() try to fix up the drc_name >>> copied from the store buffer by placing a NULL terminator at nbyte + 1 >>> or in place of a '\n' if present. However, the static buffer that we >>> copy the drc_name data into is not zeored and can contain anything past >>> the n-th byte. This is problematic if a '\n' byte appears in that buffer >>> after nbytes and the string copied into the store buffer was not NULL >>> terminated to start with as the strchr() search for a '\n' byte will mark >>> this incorrectly as the end of the drc_name string resulting in a drc_name >>> string that contains garbage data after the n-th byte. The following >>> debugging shows an example of the drmgr utility writing "PHB 4543" to >>> the add_slot sysfs attribute, but add_slot_store logging a corrupted >>> string value. >>> >>> [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 >>> [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 >>> >>> Fix this by NULL terminating the string when we copy it into our static >>> buffer by coping nbytes + 1 of data from the store buffer. The code has >> Why is it OK to copy nbytes + 1 and why is it expected that the buffer >> contains a nul after the content? > > It is my understanding that the store function buffer is allocated as a > zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user data > into. Anything after nbytes would therefore be zeroed. I think that's true, but it would be nice if we didn't have to rely on that obscure detail in order for this code to be correct & understandable. >> Isn't it much saner to just nul terminate the string after copying? > > At the cost of an extra line of code, sure. Is there a reason we can't use strscpy()? That should deal with all the corner cases around the string copy, and then all you have to do is look for a newline and turn it into nul. cheers
Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions
On 3/13/21 1:17 AM, Michal Suchánek wrote: > On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote: >> Both add_slot_store() and remove_slot_store() try to fix up the drc_name >> copied from the store buffer by placing a NULL terminator at nbyte + 1 >> or in place of a '\n' if present. However, the static buffer that we >> copy the drc_name data into is not zeored and can contain anything past >> the n-th byte. This is problematic if a '\n' byte appears in that buffer >> after nbytes and the string copied into the store buffer was not NULL >> terminated to start with as the strchr() search for a '\n' byte will mark >> this incorrectly as the end of the drc_name string resulting in a drc_name >> string that contains garbage data after the n-th byte. The following >> debugging shows an example of the drmgr utility writing "PHB 4543" to >> the add_slot sysfs attribute, but add_slot_store logging a corrupted >> string value. >> >> [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 >> [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 >> >> Fix this by NULL terminating the string when we copy it into our static >> buffer by coping nbytes + 1 of data from the store buffer. The code has > Why is it OK to copy nbytes + 1 and why is it expected that the buffer > contains a nul after the content? It is my understanding that the store function buffer is allocated as a zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user data into. Anything after nbytes would therefore be zeroed. > > Isn't it much saner to just nul terminate the string after copying? At the cost of an extra line of code, sure. -Tyrel > > diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c > b/drivers/pci/hotplug/rpadlpar_sysfs.c > index cdbfa5df3a51..cfbad67447da 100644 > --- a/drivers/pci/hotplug/rpadlpar_sysfs.c > +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c > @@ -35,11 +35,11 @@ static ssize_t add_slot_store(struct kobject *kobj, > struct kobj_attribute *attr, > return 0; > > memcpy(drc_name, buf, nbytes); > + &drc_name[nbytes] = '\0'; > > end = strchr(drc_name, '\n'); > - if (!end) > - end = &drc_name[nbytes]; > - *end = '\0'; > + if (end) > + *end = '\0'; > > rc = dlpar_add_slot(drc_name); > if (rc) > @@ -66,11 +66,11 @@ static ssize_t remove_slot_store(struct kobject *kobj, > return 0; > > memcpy(drc_name, buf, nbytes); > + &drc_name[nbytes] = '\0'; > > end = strchr(drc_name, '\n'); > - if (!end) > - end = &drc_name[nbytes]; > - *end = '\0'; > + if (end) > + *end = '\0'; > > rc = dlpar_remove_slot(drc_name); > if (rc) > > Thanks > > Michal > >> already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store >> buffer is guaranteed to be zeroed beyond the nth-byte of data copied >> from the user. Further, since the string is now NULL terminated the code >> only needs to change '\n' to '\0' when present. >> >> Signed-off-by: Tyrel Datwyler >> --- >> drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ >> 1 file changed, 6 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c >> b/drivers/pci/hotplug/rpadlpar_sysfs.c >> index cdbfa5df3a51..375087921284 100644 >> --- a/drivers/pci/hotplug/rpadlpar_sysfs.c >> +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c >> @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, >> struct kobj_attribute *attr, >> if (nbytes >= MAX_DRC_NAME_LEN) >> return 0; >> >> -memcpy(drc_name, buf, nbytes); >> +memcpy(drc_name, buf, nbytes + 1); >> >> end = strchr(drc_name, '\n'); >> -if (!end) >> -end = &drc_name[nbytes]; >> -*end = '\0'; >> +if (end) >> +*end = '\0'; >> >> rc = dlpar_add_slot(drc_name); >> if (rc) >> @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj, >> if (nbytes >= MAX_DRC_NAME_LEN) >> return 0; >> >> -memcpy(drc_name, buf, nbytes); >> +memcpy(drc_name, buf, nbytes + 1); >> >> end = strchr(drc_name, '\n'); >> -if (!end) >> -end = &drc_name[nbytes]; >> -*end = '\0'; >> +if (end) >> +*end = '\0'; >> >> rc = dlpar_remove_slot(drc_name); >> if (rc) >> -- >> 2.27.0 >>
Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions
On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote: > Both add_slot_store() and remove_slot_store() try to fix up the drc_name > copied from the store buffer by placing a NULL terminator at nbyte + 1 > or in place of a '\n' if present. However, the static buffer that we > copy the drc_name data into is not zeored and can contain anything past > the n-th byte. This is problematic if a '\n' byte appears in that buffer > after nbytes and the string copied into the store buffer was not NULL > terminated to start with as the strchr() search for a '\n' byte will mark > this incorrectly as the end of the drc_name string resulting in a drc_name > string that contains garbage data after the n-th byte. The following > debugging shows an example of the drmgr utility writing "PHB 4543" to > the add_slot sysfs attribute, but add_slot_store logging a corrupted > string value. > > [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 > [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 > > Fix this by NULL terminating the string when we copy it into our static > buffer by coping nbytes + 1 of data from the store buffer. The code has Why is it OK to copy nbytes + 1 and why is it expected that the buffer contains a nul after the content? Isn't it much saner to just nul terminate the string after copying? diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c b/drivers/pci/hotplug/rpadlpar_sysfs.c index cdbfa5df3a51..cfbad67447da 100644 --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -35,11 +35,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct kobj_attribute *attr, return 0; memcpy(drc_name, buf, nbytes); + &drc_name[nbytes] = '\0'; end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -66,11 +66,11 @@ static ssize_t remove_slot_store(struct kobject *kobj, return 0; memcpy(drc_name, buf, nbytes); + &drc_name[nbytes] = '\0'; end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc) Thanks Michal > already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store > buffer is guaranteed to be zeroed beyond the nth-byte of data copied > from the user. Further, since the string is now NULL terminated the code > only needs to change '\n' to '\0' when present. > > Signed-off-by: Tyrel Datwyler > --- > drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c > b/drivers/pci/hotplug/rpadlpar_sysfs.c > index cdbfa5df3a51..375087921284 100644 > --- a/drivers/pci/hotplug/rpadlpar_sysfs.c > +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c > @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, > struct kobj_attribute *attr, > if (nbytes >= MAX_DRC_NAME_LEN) > return 0; > > - memcpy(drc_name, buf, nbytes); > + memcpy(drc_name, buf, nbytes + 1); > > end = strchr(drc_name, '\n'); > - if (!end) > - end = &drc_name[nbytes]; > - *end = '\0'; > + if (end) > + *end = '\0'; > > rc = dlpar_add_slot(drc_name); > if (rc) > @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj, > if (nbytes >= MAX_DRC_NAME_LEN) > return 0; > > - memcpy(drc_name, buf, nbytes); > + memcpy(drc_name, buf, nbytes + 1); > > end = strchr(drc_name, '\n'); > - if (!end) > - end = &drc_name[nbytes]; > - *end = '\0'; > + if (end) > + *end = '\0'; > > rc = dlpar_remove_slot(drc_name); > if (rc) > -- > 2.27.0 >
[PATCH] rpadlpar: fix potential drc_name corruption in store functions
Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NULL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeored and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NULL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store logging a corrupted string value. [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by NULL terminating the string when we copy it into our static buffer by coping nbytes + 1 of data from the store buffer. The code has already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store buffer is guaranteed to be zeroed beyond the nth-byte of data copied from the user. Further, since the string is now NULL terminated the code only needs to change '\n' to '\0' when present. Signed-off-by: Tyrel Datwyler --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c b/drivers/pci/hotplug/rpadlpar_sysfs.c index cdbfa5df3a51..375087921284 100644 --- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct kobj_attribute *attr, if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + memcpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_add_slot(drc_name); if (rc) @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj, if (nbytes >= MAX_DRC_NAME_LEN) return 0; - memcpy(drc_name, buf, nbytes); + memcpy(drc_name, buf, nbytes + 1); end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0'; rc = dlpar_remove_slot(drc_name); if (rc) -- 2.27.0
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
On 3/9/21 7:14 PM, Georgi Djakov wrote: > Hi Vlastimil, > > Thanks for the comment! > > On 3/9/21 17:09, Vlastimil Babka wrote: >> On 3/9/21 2:47 PM, Georgi Djakov wrote: >>> Being able to stop the system immediately when a memory corruption >>> is detected is crucial to finding the source of it. This is very >>> useful when the memory can be inspected with kdump or other tools. >> >> Is this in some testing scenarios where you would also use e.g. >> panic_on_warn? >> We could hook to that. If not, we could introduce a new >> panic_on_memory_corruption that would apply also for debug_pagealloc and >> whatnot? > > I would prefer that we not tie it with panic_on_warn - there might be lots of > new code in multiple subsystems, so hitting some WARNing while testing is not > something unexpected. > > Introducing an additional panic_on_memory_corruption would work, but i noticed > that we already have slub_debug and thought to re-use that. But indeed, аdding > an option to panic in for example bad_page() sounds also useful, if that's > what > you suggest. Yes, that would be another example. Also CCing Kees for input, as besides the "kdump ASAP for debugging" case, I can imagine security hardening folks could be interested in the "somebody might have just failed to pwn the kernel, better panic than let them continue" angle. But I'm naive wrt security, so it might be a stupid idea :) Vlastimil > Thanks, > Georgi
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
Hi Vlastimil, Thanks for the comment! On 3/9/21 17:09, Vlastimil Babka wrote: On 3/9/21 2:47 PM, Georgi Djakov wrote: Being able to stop the system immediately when a memory corruption is detected is crucial to finding the source of it. This is very useful when the memory can be inspected with kdump or other tools. Is this in some testing scenarios where you would also use e.g. panic_on_warn? We could hook to that. If not, we could introduce a new panic_on_memory_corruption that would apply also for debug_pagealloc and whatnot? I would prefer that we not tie it with panic_on_warn - there might be lots of new code in multiple subsystems, so hitting some WARNing while testing is not something unexpected. Introducing an additional panic_on_memory_corruption would work, but i noticed that we already have slub_debug and thought to re-use that. But indeed, аdding an option to panic in for example bad_page() sounds also useful, if that's what you suggest. Thanks, Georgi
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
Hi Christoph, Thanks for the comments! On 3/9/21 16:56, Christoph Lameter wrote: On Tue, 9 Mar 2021, Georgi Djakov wrote: Being able to stop the system immediately when a memory corruption is detected is crucial to finding the source of it. This is very useful when the memory can be inspected with kdump or other tools. Hmmm ok. The idea is to be able to collect data right after the corruption is detected, otherwise more data might be corrupted and tracing becomes more difficult. static void restore_bytes(struct kmem_cache *s, char *message, u8 data, void *from, void *to) { + if (slub_debug & SLAB_CORRUPTION_PANIC) + panic("slab: object overwritten\n"); slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data); memset(from, data, to - from); } Why panic here? This should only be called late in the bug reporting when an error has already been printed. This is called by both slab_pad_check() and check_bytes_and_report(), so it seemed like a common place where i could put the panic(). I can move it to the caller functions instead, if that's preferred. Thanks, Georgi
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
On 3/9/21 2:47 PM, Georgi Djakov wrote: > Being able to stop the system immediately when a memory corruption > is detected is crucial to finding the source of it. This is very > useful when the memory can be inspected with kdump or other tools. Is this in some testing scenarios where you would also use e.g. panic_on_warn? We could hook to that. If not, we could introduce a new panic_on_memory_corruption that would apply also for debug_pagealloc and whatnot? > Let's add an option panic the kernel when slab debug catches an > object or list corruption. > > This new option is not enabled by default (yet), so it needs to be > enabled explicitly (for example by adding "slub_debug=FZPUC" to > the kernel command line). > > Signed-off-by: Georgi Djakov > --- > Documentation/vm/slub.rst | 1 + > include/linux/slab.h | 3 +++ > mm/slab.h | 2 +- > mm/slub.c | 9 + > 4 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst > index 03f294a638bd..32878c44f3de 100644 > --- a/Documentation/vm/slub.rst > +++ b/Documentation/vm/slub.rst > @@ -53,6 +53,7 @@ Possible debug options are:: > Z Red zoning > P Poisoning (object and padding) > U User tracking (free and alloc) > + C Panic on object corruption (enables > SLAB_CORRUPTION_PANIC) > T Trace (please only use on single slabs) > A Enable failslab filter mark for the cache > O Switch debugging off for caches that would have > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 0c97d788762c..ebff5e704d08 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -39,6 +39,9 @@ > #define SLAB_STORE_USER ((slab_flags_t __force)0x0001U) > /* Panic if kmem_cache_create() fails */ > #define SLAB_PANIC ((slab_flags_t __force)0x0004U) > +/* Panic if memory corruption is detected */ > +#define SLAB_CORRUPTION_PANIC((slab_flags_t __force)0x0008U) > + > /* > * SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS! > * > diff --git a/mm/slab.h b/mm/slab.h > index 120b1d0dfb6d..ae0079017fc6 100644 > --- a/mm/slab.h > +++ b/mm/slab.h > @@ -134,7 +134,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int > object_size, > #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) > #elif defined(CONFIG_SLUB_DEBUG) > #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \ > - SLAB_TRACE | SLAB_CONSISTENCY_CHECKS) > + SLAB_TRACE | SLAB_CONSISTENCY_CHECKS | > SLAB_CORRUPTION_PANIC) > #else > #define SLAB_DEBUG_FLAGS (0) > #endif > diff --git a/mm/slub.c b/mm/slub.c > index 077a019e4d7a..49351427f701 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -741,6 +741,8 @@ void object_err(struct kmem_cache *s, struct page *page, > { > slab_bug(s, "%s", reason); > print_trailer(s, page, object); > + if (slub_debug & SLAB_CORRUPTION_PANIC) > + panic(reason); > } > > static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page, > @@ -755,6 +757,8 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, > struct page *page, > slab_bug(s, "%s", buf); > print_page_info(page); > dump_stack(); > + if (slub_debug & SLAB_CORRUPTION_PANIC) > + panic("slab: slab error\n"); > } > > static void init_object(struct kmem_cache *s, void *object, u8 val) > @@ -776,6 +780,8 @@ static void init_object(struct kmem_cache *s, void > *object, u8 val) > static void restore_bytes(struct kmem_cache *s, char *message, u8 data, > void *from, void *to) > { > + if (slub_debug & SLAB_CORRUPTION_PANIC) > + panic("slab: object overwritten\n"); > slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data); > memset(from, data, to - from); > } > @@ -1319,6 +1325,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, > char **slabs, bool init) > case 'a': > *flags |= SLAB_FAILSLAB; > break; > + case 'c': > + *flags |= SLAB_CORRUPTION_PANIC; > + break; > case 'o': > /* >* Avoid enabling debugging on caches if its minimum >
Re: [PATCH] mm/slub: Add slub_debug option to panic on memory corruption
On Tue, 9 Mar 2021, Georgi Djakov wrote: > Being able to stop the system immediately when a memory corruption > is detected is crucial to finding the source of it. This is very > useful when the memory can be inspected with kdump or other tools. Hmmm ok. > static void restore_bytes(struct kmem_cache *s, char *message, u8 data, > void *from, void *to) > { > + if (slub_debug & SLAB_CORRUPTION_PANIC) > + panic("slab: object overwritten\n"); > slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data); > memset(from, data, to - from); > } Why panic here? This should only be called late in the bug reporting when an error has already been printed.
[PATCH] mm/slub: Add slub_debug option to panic on memory corruption
Being able to stop the system immediately when a memory corruption is detected is crucial to finding the source of it. This is very useful when the memory can be inspected with kdump or other tools. Let's add an option panic the kernel when slab debug catches an object or list corruption. This new option is not enabled by default (yet), so it needs to be enabled explicitly (for example by adding "slub_debug=FZPUC" to the kernel command line). Signed-off-by: Georgi Djakov --- Documentation/vm/slub.rst | 1 + include/linux/slab.h | 3 +++ mm/slab.h | 2 +- mm/slub.c | 9 + 4 files changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst index 03f294a638bd..32878c44f3de 100644 --- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -53,6 +53,7 @@ Possible debug options are:: Z Red zoning P Poisoning (object and padding) U User tracking (free and alloc) + C Panic on object corruption (enables SLAB_CORRUPTION_PANIC) T Trace (please only use on single slabs) A Enable failslab filter mark for the cache O Switch debugging off for caches that would have diff --git a/include/linux/slab.h b/include/linux/slab.h index 0c97d788762c..ebff5e704d08 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -39,6 +39,9 @@ #define SLAB_STORE_USER((slab_flags_t __force)0x0001U) /* Panic if kmem_cache_create() fails */ #define SLAB_PANIC ((slab_flags_t __force)0x0004U) +/* Panic if memory corruption is detected */ +#define SLAB_CORRUPTION_PANIC ((slab_flags_t __force)0x0008U) + /* * SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS! * diff --git a/mm/slab.h b/mm/slab.h index 120b1d0dfb6d..ae0079017fc6 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -134,7 +134,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size, #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) #elif defined(CONFIG_SLUB_DEBUG) #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \ - SLAB_TRACE | SLAB_CONSISTENCY_CHECKS) + SLAB_TRACE | SLAB_CONSISTENCY_CHECKS | SLAB_CORRUPTION_PANIC) #else #define SLAB_DEBUG_FLAGS (0) #endif diff --git a/mm/slub.c b/mm/slub.c index 077a019e4d7a..49351427f701 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -741,6 +741,8 @@ void object_err(struct kmem_cache *s, struct page *page, { slab_bug(s, "%s", reason); print_trailer(s, page, object); + if (slub_debug & SLAB_CORRUPTION_PANIC) + panic(reason); } static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page, @@ -755,6 +757,8 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page, slab_bug(s, "%s", buf); print_page_info(page); dump_stack(); + if (slub_debug & SLAB_CORRUPTION_PANIC) + panic("slab: slab error\n"); } static void init_object(struct kmem_cache *s, void *object, u8 val) @@ -776,6 +780,8 @@ static void init_object(struct kmem_cache *s, void *object, u8 val) static void restore_bytes(struct kmem_cache *s, char *message, u8 data, void *from, void *to) { + if (slub_debug & SLAB_CORRUPTION_PANIC) + panic("slab: object overwritten\n"); slab_fix(s, "Restoring 0x%p-0x%p=0x%x\n", from, to - 1, data); memset(from, data, to - from); } @@ -1319,6 +1325,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init) case 'a': *flags |= SLAB_FAILSLAB; break; + case 'c': + *flags |= SLAB_CORRUPTION_PANIC; + break; case 'o': /* * Avoid enabling debugging on caches if its minimum
Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption
On Mon, Mar 08, 2021 at 10:24:47AM +0100, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. Additional we reserve the first 4k (1k for R3k) for > either normal exception vector space (legacy CPUs) or special vectors > like cache exceptions. > > Huge thanks to Serge for analysing and proposing a solution to this > issue. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer > --- > Changes in v3: > - always reserve the first 4k for all CPUs (1k for R3k) > > Changes in v2: > - do only memblock reservation in reserve_exception_space() > - reserve 0..0x400 for all CPUs without ebase register and >to addtional reserve_exception_space for BMIPS CPUs > > arch/mips/include/asm/traps.h| 3 +++ > arch/mips/kernel/cpu-probe.c | 6 ++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/traps.c | 10 +- > 4 files changed, 17 insertions(+), 5 deletions(-) applied to mips-fixes. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea.[ RFC1925, 2.3 ]
Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption
Hi Thomas On Mon, Mar 08, 2021 at 10:24:47AM +0100, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. Additional we reserve the first 4k (1k for R3k) for > either normal exception vector space (legacy CPUs) or special vectors > like cache exceptions. > > Huge thanks to Serge for analysing and proposing a solution to this > issue. Reviewed-by: Serge Semin -Sergey > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer > --- > Changes in v3: > - always reserve the first 4k for all CPUs (1k for R3k) > > Changes in v2: > - do only memblock reservation in reserve_exception_space() > - reserve 0..0x400 for all CPUs without ebase register and >to addtional reserve_exception_space for BMIPS CPUs > > arch/mips/include/asm/traps.h| 3 +++ > arch/mips/kernel/cpu-probe.c | 6 ++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/traps.c | 10 +- > 4 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..b710e76c9c65 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void); > extern void (*board_cache_error_setup)(void); > > extern int register_nmi_notifier(struct notifier_block *nb); > +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); > extern char except_vec_nmi[]; > > +#define VECTORSPACING 0x100 /* for EI/VI mode */ > + > #define nmi_notifier(fn, pri) > \ > ({ \ > static struct notifier_block fn##_nb = {\ > diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c > index 9a89637b4ecf..b71892064f27 100644 > --- a/arch/mips/kernel/cpu-probe.c > +++ b/arch/mips/kernel/cpu-probe.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > > #include "fpu-probe.h" > @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > c->cputype = CPU_BMIPS3300; > __cpu_name[cpu] = "Broadcom BMIPS3300"; > set_elf_platform(cpu, "bmips3300"); > + reserve_exception_space(0x400, VECTORSPACING * 64); > break; > case PRID_IMP_BMIPS43XX: { > int rev = c->processor_id & PRID_REV_MASK; > @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS4380"; > set_elf_platform(cpu, "bmips4380"); > c->options |= MIPS_CPU_RIXI; > + reserve_exception_space(0x400, VECTORSPACING * 64); > } else { > c->cputype = CPU_BMIPS4350; > __cpu_name[cpu] = "Broadcom BMIPS4350"; > @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS5000"; > set_elf_platform(cpu, "bmips5000"); > c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; > + reserve_exception_space(0x1000, VECTORSPACING * 64); > break; > } > } > @@ -2133,6 +2137,8 @@ void cpu_probe(void) > if (cpu == 0) > __ua_limit = ~((1ull << cpu_vmbits) - 1); > #endif > + > + reserve_exception_space(0, 0x1000); > } > > void cpu_report(void) > diff --git a/arch/mips/kernel/cpu-r3k-probe.c > b/arch/mips/kernel/cpu-r3k-probe.c > index abdbbe8c5a43..af654771918c 100644 > --- a/arch/mips/kernel/cpu-r3k-probe.c > +++ b/arch/mips/kernel/cpu-r3k-probe.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include "fpu-probe.h" > > @@ -158,6 +159,8 @@ void cpu_probe(void) > cpu_set_fpu_opts(c); > else > cpu_set_nofpu_opts(c); > + > + reserve_exception_space(0, 0x400); > } > > void cpu_report(void) > diff --git a/a
Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption
On Mon, Mar 08, 2021 at 10:21:10AM -0800, Florian Fainelli wrote: > On 3/8/21 1:24 AM, Thomas Bogendoerfer wrote: > > BMIPS is one of the few platforms that do change the exception base. > > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > > with kernel_end") we started seeing BMIPS boards fail to boot with the > > built-in FDT being corrupted. > > > > Before the cited commit, early allocations would be in the [kernel_end, > > RAM_END] range, but after commit they would be within [RAM_START + > > PAGE_SIZE, RAM_END]. > > > > The custom exception base handler that is installed by > > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > > memory region allocated by unflatten_and_copy_device_tree() thus > > corrupting the FDT used by the kernel. > > > > To fix this, we need to perform an early reservation of the custom > > exception space. Additional we reserve the first 4k (1k for R3k) for > > either normal exception vector space (legacy CPUs) or special vectors > > like cache exceptions. > > > > Huge thanks to Serge for analysing and proposing a solution to this > > issue. > > > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > > kernel_end") > > Reported-by: Kamal Dasu > > Debugged-by: Serge Semin > > Signed-off-by: Thomas Bogendoerfer > > --- > > Changes in v3: > > - always reserve the first 4k for all CPUs (1k for R3k) > > > > Changes in v2: > > - do only memblock reservation in reserve_exception_space() > > - reserve 0..0x400 for all CPUs without ebase register and > >to addtional reserve_exception_space for BMIPS CPUs > > Thomas, do you mind CC'ing me for subsequent versions so you can get a > chance to have a Tested-by tag? Thank you! sure, I hope it's the last version ;-) > Tested-by: Florian Fainelli thank you. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea.[ RFC1925, 2.3 ]
Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption
On 3/8/21 1:24 AM, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. Additional we reserve the first 4k (1k for R3k) for > either normal exception vector space (legacy CPUs) or special vectors > like cache exceptions. > > Huge thanks to Serge for analysing and proposing a solution to this > issue. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer > --- > Changes in v3: > - always reserve the first 4k for all CPUs (1k for R3k) > > Changes in v2: > - do only memblock reservation in reserve_exception_space() > - reserve 0..0x400 for all CPUs without ebase register and >to addtional reserve_exception_space for BMIPS CPUs Thomas, do you mind CC'ing me for subsequent versions so you can get a chance to have a Tested-by tag? Thank you! Tested-by: Florian Fainelli -- Florian
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
On Sat, Mar 06, 2021 at 09:29:09AM +0100, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. So we reserve it already in cpu_probe() for the CPUs > where this is fixed. For CPU with an ebase config register allocation > of exception space will be done in trap_init(). > > Huge thanks to Serget for analysing and proposing a solution to this > issue. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer > --- > Changes in v2: > - do only memblock reservation in reserve_exception_space() > - reserve 0..0x400 for all CPUs without ebase register and >to addtional reserve_exception_space for BMIPS CPUs > > arch/mips/include/asm/traps.h| 3 +++ > arch/mips/kernel/cpu-probe.c | 7 +++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/traps.c | 10 +- > 4 files changed, 18 insertions(+), 5 deletions(-) Acked-by: Roman Gushchin Thanks! > > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..b710e76c9c65 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void); > extern void (*board_cache_error_setup)(void); > > extern int register_nmi_notifier(struct notifier_block *nb); > +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); > extern char except_vec_nmi[]; > > +#define VECTORSPACING 0x100 /* for EI/VI mode */ > + > #define nmi_notifier(fn, pri) > \ > ({ \ > static struct notifier_block fn##_nb = {\ > diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c > index 9a89637b4ecf..b565bc4b900d 100644 > --- a/arch/mips/kernel/cpu-probe.c > +++ b/arch/mips/kernel/cpu-probe.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > > #include "fpu-probe.h" > @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > c->cputype = CPU_BMIPS3300; > __cpu_name[cpu] = "Broadcom BMIPS3300"; > set_elf_platform(cpu, "bmips3300"); > + reserve_exception_space(0x400, VECTORSPACING * 64); > break; > case PRID_IMP_BMIPS43XX: { > int rev = c->processor_id & PRID_REV_MASK; > @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS4380"; > set_elf_platform(cpu, "bmips4380"); > c->options |= MIPS_CPU_RIXI; > + reserve_exception_space(0x400, VECTORSPACING * 64); > } else { > c->cputype = CPU_BMIPS4350; > __cpu_name[cpu] = "Broadcom BMIPS4350"; > @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS5000"; > set_elf_platform(cpu, "bmips5000"); > c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; > + reserve_exception_space(0x1000, VECTORSPACING * 64); > break; > } > } > @@ -2133,6 +2137,9 @@ void cpu_probe(void) > if (cpu == 0) > __ua_limit = ~((1ull << cpu_vmbits) - 1); > #endif > + > + if (cpu_has_mips_r2_r6) > + reserve_exception_space(0, 0x400); > } > > void cpu_report(void) > diff --git a/arch/mips/kernel/cpu-r3k-probe.c > b/arch/mips/kernel/cpu-r3k-probe.c > index abdbbe8c5a43..af654771918c 100644 > --- a/arch/mips/kernel/cpu-r3k-probe.c > +++ b/arch/mips/kernel/cpu-r3k-probe.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include "fpu-probe.h" > > @@ -158,6 +159,8 @@ void cpu_probe(void) > cpu_set_fpu_opts(c); > else > cpu_set_nofpu_opts(c); > + > + reserve_exception_space(0, 0x400); > } > > void cpu_report(void) > diff --git a/arch/mips/kernel/traps.c
Re: [PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption
On Mon, Mar 08, 2021 at 10:24:47AM +0100, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. Additional we reserve the first 4k (1k for R3k) for > either normal exception vector space (legacy CPUs) or special vectors > like cache exceptions. Just a side note, memblock always skips the first 4k when allocating memory, but explicitly reserving the exception space is The Right Thing To Do anyway :) > Huge thanks to Serge for analysing and proposing a solution to this > issue. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer Acked-by: Mike Rapoport > --- > Changes in v3: > - always reserve the first 4k for all CPUs (1k for R3k) > > Changes in v2: > - do only memblock reservation in reserve_exception_space() > - reserve 0..0x400 for all CPUs without ebase register and >to addtional reserve_exception_space for BMIPS CPUs > > arch/mips/include/asm/traps.h| 3 +++ > arch/mips/kernel/cpu-probe.c | 6 ++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/traps.c | 10 +- > 4 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..b710e76c9c65 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void); > extern void (*board_cache_error_setup)(void); > > extern int register_nmi_notifier(struct notifier_block *nb); > +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); > extern char except_vec_nmi[]; > > +#define VECTORSPACING 0x100 /* for EI/VI mode */ > + > #define nmi_notifier(fn, pri) > \ > ({ \ > static struct notifier_block fn##_nb = {\ > diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c > index 9a89637b4ecf..b71892064f27 100644 > --- a/arch/mips/kernel/cpu-probe.c > +++ b/arch/mips/kernel/cpu-probe.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > > #include "fpu-probe.h" > @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > c->cputype = CPU_BMIPS3300; > __cpu_name[cpu] = "Broadcom BMIPS3300"; > set_elf_platform(cpu, "bmips3300"); > + reserve_exception_space(0x400, VECTORSPACING * 64); > break; > case PRID_IMP_BMIPS43XX: { > int rev = c->processor_id & PRID_REV_MASK; > @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS4380"; > set_elf_platform(cpu, "bmips4380"); > c->options |= MIPS_CPU_RIXI; > + reserve_exception_space(0x400, VECTORSPACING * 64); > } else { > c->cputype = CPU_BMIPS4350; > __cpu_name[cpu] = "Broadcom BMIPS4350"; > @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS5000"; > set_elf_platform(cpu, "bmips5000"); > c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; > + reserve_exception_space(0x1000, VECTORSPACING * 64); > break; > } > } > @@ -2133,6 +2137,8 @@ void cpu_probe(void) > if (cpu == 0) > __ua_limit = ~((1ull << cpu_vmbits) - 1); > #endif > + > + reserve_exception_space(0, 0x1000); > } > > void cpu_report(void) > diff --git a/arch/mips/kernel/cpu-r3k-probe.c > b/arch/mips/kernel/cpu-r3k-probe.c > index abdbbe8c5a43..af654771918c 100644 > --- a/arch/mips/kernel/cpu-r3k-probe.c > +++ b/arch/mips/kernel/cpu-r3k-probe.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include "fpu-probe.h" > > @@ -158,6 +159,8 @@ void cpu_probe(void) > cpu_set_fpu_opts(c); > else
list_add corruption in __percpu_counter_init
Hello, since updating one of our compile cluster machines from kernel 4.19+105+deb10u9 to 5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1 we're hit by this bug every 1...2 days: list_add corruption. next->prev should be prev (889a9840), but was . (next=9c3dcaf2a310). [ cut here ] kernel BUG at lib/list_debug.c:25! invalid opcode: [#1] SMP PTI CPU: 9 PID: 3281 Comm: python3 Not tainted 5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1 Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.00.0059.082320111421 08/23/2011 RIP: 0010:__list_add_valid.cold.0+0x12/0x28 Code: 80 46 32 88 48 89 ef e8 62 3e 01 00 48 c7 c0 f0 ff ff ff e9 bb 1e bf ff 48 89 d1 48 c7 c7 f8 46 32 88 48 89 c2 e8 02 2a ff ff <0f> 0b 48 89 c1 4c 89 c6 48 c7 c7 50 47 32 88 e8 ee 29 ff ff 0f 0b RSP: 0018:bb0862ba3910 EFLAGS: 00010046 RAX: 0075 RBX: 9c3dcaf2a268 RCX: RDX: RSI: 9c46ffb18a00 RDI: 9c46ffb18a00 RBP: 9c3dcaf2a278 R08: R09: c000dfff R10: 0001 R11: bb0862ba3720 R12: 9c3dcaf2a310 R13: 889a9840 R14: 0002 R15: 2800 FS: 7f4bfc65d740() GS:9c46ffb0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 01f5b100 CR3: 0013298c4001 CR4: 000206e0 Call Trace: __percpu_counter_init+0x64/0xa0 blkg_rwstat_init+0x3c/0xb0 throtl_pd_alloc+0x63/0x230 blkg_alloc+0x134/0x180 blkg_create+0x232/0x320 ? kmem_cache_alloc+0x30c/0x420 bio_associate_blkg_from_css+0x1cb/0x2c0 bio_associate_blkg+0x20/0x70 ? ktime_get+0x3e/0xa0 linear_map+0x50/0x90 [dm_mod] __map_bio+0x3a/0x130 [dm_mod] __split_and_process_non_flush+0x190/0x1e0 [dm_mod] dm_submit_bio+0x152/0x3b0 [dm_mod] submit_bio_noacct+0xfb/0x410 ? iomap_page_mkwrite_actor+0x70/0x70 submit_bio+0x43/0x190 iomap_readahead+0xb5/0x190 read_pages+0x8e/0x270 page_cache_ra_unbounded+0x1a2/0x220 generic_file_buffered_read+0x1a9/0x9b0 xfs_file_buffered_aio_read+0x44/0xb0 [xfs] xfs_file_read_iter+0x6e/0xd0 [xfs] new_sync_read+0x118/0x1a0 vfs_read+0xf1/0x180 ksys_read+0x59/0xd0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 and another one: list_add corruption. next->prev should be prev (8bba9840), but was . (next=9b12c9ce1b10). [ cut here ] kernel BUG at lib/list_debug.c:25! invalid opcode: [#1] SMP PTI CPU: 1 PID: 11618 Comm: qtdeclarative-n Not tainted 5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1 Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.00.0059.082320111421 08/23/2011 RIP: 0010:__list_add_valid.cold.0+0x12/0x28 Code: 80 46 52 8b 48 89 ef e8 62 3e 01 00 48 c7 c0 f0 ff ff ff e9 bb 1e bf ff 48 89 d1 48 c7 c7 f8 46 52 8b 48 89 c2 e8 02 2a ff ff <0f> 0b 48 89 c1 4c 89 c6 48 c7 c7 50 RSP: 0018:bab42038b580 EFLAGS: 00010046 RAX: 0075 RBX: 9b1629c18268 RCX: RDX: RSI: 9b1bbfa18a00 RDI: 9b1bbfa18a00 RBP: 9b1629c18278 R08: R09: c000dfff R10: 0001 R11: bab42038b390 R12: 9b12c9ce1b10 R13: 8bba9840 R14: 0006 R15: 2800 FS: 7f9f93fb2740() GS:9b1bbfa0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f9f9385dbc0 CR3: 00082c1f8001 CR4: 000206e0 Call Trace: __percpu_counter_init+0x64/0xa0 blkg_rwstat_init+0x3c/0xb0 throtl_pd_alloc+0x63/0x230 blkg_alloc+0x134/0x180 blkg_create+0x232/0x320 ? find_busiest_group+0x41/0x360 bio_associate_blkg_from_css+0x1cb/0x2c0 bio_associate_blkg+0x20/0x70 ? ktime_get+0x3e/0xa0 linear_map+0x50/0x90 [dm_mod] __map_bio+0x3a/0x130 [dm_mod] __split_and_process_non_flush+0x190/0x1e0 [dm_mod] dm_submit_bio+0x152/0x3b0 [dm_mod] submit_bio_noacct+0xfb/0x410 submit_bio+0x43/0x190 ? bio_add_page+0x62/0x90 _xfs_buf_ioapply+0x2af/0x410 [xfs] [...] and a 3rd one without xfs involved list_add corruption. next->prev should be prev (a4ba9840), but was . (next=95071f379310). [ cut here ] kernel BUG at lib/list_debug.c:25! invalid opcode: [#1] SMP PTI CPU: 21 PID: 12894 Comm: vgs Not tainted 5.10.0-0.bpo.3-amd64 #1 Debian 5.10.13-1~bpo10+1 Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.00.0059.082320111421 08/23/2011 RIP: 0010:__list_add_valid.cold.0+0x12/0x28 Code: 80 46 52 a4 48 89 ef e8 62 3e 01 00 48 c7 c0 f0 ff ff ff e9 bb 1e bf ff 48 89 d1 48 c7 c7 f8 46 52 a4 48 89 c2 e8 02 2a ff ff <0f> 0b 48 89 c1 4c 89 c6 48 c7 c7 50 47 52 a4 e8 ee 29 ff ff 0f 0b RSP: 0018:ad41c741fa80 EFLAGS: 00010046 RAX: 0075 RBX: 95071f379268 RCX: RDX: RSI: 950dffc98a00 RDI: 950dffc98a00 RBP: 95071f379278 R08: 00
[PATCH v3] MIPS: kernel: Reserve exception base early to prevent corruption
BMIPS is one of the few platforms that do change the exception base. After commit 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") we started seeing BMIPS boards fail to boot with the built-in FDT being corrupted. Before the cited commit, early allocations would be in the [kernel_end, RAM_END] range, but after commit they would be within [RAM_START + PAGE_SIZE, RAM_END]. The custom exception base handler that is installed by bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the memory region allocated by unflatten_and_copy_device_tree() thus corrupting the FDT used by the kernel. To fix this, we need to perform an early reservation of the custom exception space. Additional we reserve the first 4k (1k for R3k) for either normal exception vector space (legacy CPUs) or special vectors like cache exceptions. Huge thanks to Serge for analysing and proposing a solution to this issue. Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") Reported-by: Kamal Dasu Debugged-by: Serge Semin Signed-off-by: Thomas Bogendoerfer --- Changes in v3: - always reserve the first 4k for all CPUs (1k for R3k) Changes in v2: - do only memblock reservation in reserve_exception_space() - reserve 0..0x400 for all CPUs without ebase register and to addtional reserve_exception_space for BMIPS CPUs arch/mips/include/asm/traps.h| 3 +++ arch/mips/kernel/cpu-probe.c | 6 ++ arch/mips/kernel/cpu-r3k-probe.c | 3 +++ arch/mips/kernel/traps.c | 10 +- 4 files changed, 17 insertions(+), 5 deletions(-) diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h index 6aa8f126a43d..b710e76c9c65 100644 --- a/arch/mips/include/asm/traps.h +++ b/arch/mips/include/asm/traps.h @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void); extern void (*board_cache_error_setup)(void); extern int register_nmi_notifier(struct notifier_block *nb); +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); extern char except_vec_nmi[]; +#define VECTORSPACING 0x100/* for EI/VI mode */ + #define nmi_notifier(fn, pri) \ ({ \ static struct notifier_block fn##_nb = {\ diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c index 9a89637b4ecf..b71892064f27 100644 --- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include "fpu-probe.h" @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) c->cputype = CPU_BMIPS3300; __cpu_name[cpu] = "Broadcom BMIPS3300"; set_elf_platform(cpu, "bmips3300"); + reserve_exception_space(0x400, VECTORSPACING * 64); break; case PRID_IMP_BMIPS43XX: { int rev = c->processor_id & PRID_REV_MASK; @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS4380"; set_elf_platform(cpu, "bmips4380"); c->options |= MIPS_CPU_RIXI; + reserve_exception_space(0x400, VECTORSPACING * 64); } else { c->cputype = CPU_BMIPS4350; __cpu_name[cpu] = "Broadcom BMIPS4350"; @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS5000"; set_elf_platform(cpu, "bmips5000"); c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; + reserve_exception_space(0x1000, VECTORSPACING * 64); break; } } @@ -2133,6 +2137,8 @@ void cpu_probe(void) if (cpu == 0) __ua_limit = ~((1ull << cpu_vmbits) - 1); #endif + + reserve_exception_space(0, 0x1000); } void cpu_report(void) diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c index abdbbe8c5a43..af654771918c 100644 --- a/arch/mips/kernel/cpu-r3k-probe.c +++ b/arch/mips/kernel/cpu-r3k-probe.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "fpu-probe.h" @@ -158,6 +159,8 @@ void cpu_probe(void) cpu_set_fpu_opts(c); else cpu_set_nofpu_opts(c); + + reserve_exception_space(0, 0x400); } void cpu_report(void) diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index e0352958e2f7..808b8b61ded1 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs) nmi_exit(); } -#define VECTORSPACING 0x100/* for EI/VI mode */ - unsign
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
On Mon, 8 Mar 2021, Serge Semin wrote: > > some of them are not R2 (SB1), others are. So best bet would be to > > simply reserve the first 0x1000 bytes for every CPU and special handling > > for the BMIPS case. Does this cover all cases ? > > I can't say for sure whether it will cover all the cases/platforms. I > visually analysed all the > board_{nmi_handler,ejtag_handler,ebase,cache_error}_setup callbacks > implementation in MIPS arch to create the list above. Exception vectors or > some other stuff can be setup in some other platform-specific manner. But at > least reserving a memory below PAGE_SIZE would get the situation partly back > to before the memory below the kernel stopped being reserved. Hopefully > one page will be enough for the platforms, which relied on that rule. The > rest or them sooner or later will manifest itself as it has happened with > Broadcom. I think reserving up to 64KiB might be a bit excessive on one hand while having the size of the reservation depend on configured PAGE_SIZE could be too unpredictable on the other. I think 4KiB is a good compromise and I'd leave anything else for platform maintainers to sort out. Maciej
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
On Sun, Mar 07, 2021 at 10:20:01PM +0100, Thomas Bogendoerfer wrote: > On Sun, Mar 07, 2021 at 11:06:12PM +0300, Serge Semin wrote: > > > + > > > + if (cpu_has_mips_r2_r6) > > > + reserve_exception_space(0, 0x400); > > > > Are you sure it shouldn't be (!cpu_has_mips_r2_r6)?. What I see here > > contradicts to what is said in Changelog v2. > > d'oh, of course it has to be !cpu_has_mips_r2_r6. > > > Anyway regarding the problem in general. AFAICS the next code uses the > > lowest memory to place some specific exception handlers: > > board_cache_error_setup pointer: > > arch/mips/mm/c-r4k.c: r4k_cache_error_setup() - SiByte CPUs: CPU_SB1, > > CPU_SB1A (up to 0x180) > > arch/mips/mm/c-octeon.c: octeon_cache_error_setup() - Cavium CPU: > > CPU_CAVIUM_OCTEON (up to 0x180) > > board_nmi_handler_setup pointer: > > arch/mips/kernel/smp-bmips.c: bmips_nmi_handler_setup() - Broadcom CPU: > > CPU_BMIPS (up to 0x400) > > arch/mips/loongson2ef/common/init.c: mips_nmi_setup() - Loongson 2E CPU: > > MACH_LOONGSON2EF (up to 0x400) > > arch/mips/loongson64/init.c: mips_nmi_setup() - Loongson 64 CPU: > > MACH_LOONGSON64 (up to 0x400, VEIC:0xB00) > > arch/mips/mti-malta/malta-init.c: mips_nmi_setup() - Malta CPU: > > MIPS_MALTA (up to 0x400, VEIC: 0xB00) > > arch/mips/pistachio/init.c: mips_nmi_setup() - Pistachio CPU: > > MACH_PISTACHIO (up to 0x400, VEIC: 0xB00) > > board_ejtag_handler_setup: > > arch/mips/mti-malta/malta-init.c: mips_ejtag_setup() - Malta CPU: > > MIPS_MALTA (up to 0x380, VEIC: 0xa80) > > arch/mips/pistachio/init.c: mips_ejtag_setup() - Pistachio CPU: > > MACH_PISTACHIO (up to 0x380, VEIC: 0xa80) > > bmips_ebase_setup: > > arch/mips/kernel/smp-bmips.c: bmips_ebase_setup() - Broadcom CPU: > > CPU_BMIPS (up to 0x400 - NMI/reset, and 0x1000 - normal) > > plat_mem_setup: > > arch/mips/bmips/setup.c: bcm63xx_fixup_cpu1() - Broadcom CPU: CPU_BMIPS > > (up to 0x220) > > > > > > Are you sure all of them have "cpu_has_mips_r2_r6" macro returning > > true (false) in order to safely use the lowest region in accordance > > with the conditional statement you've added? > > some of them are not R2 (SB1), others are. So best bet would be to > simply reserve the first 0x1000 bytes for every CPU and special handling > for the BMIPS case. Does this cover all cases ? I can't say for sure whether it will cover all the cases/platforms. I visually analysed all the board_{nmi_handler,ejtag_handler,ebase,cache_error}_setup callbacks implementation in MIPS arch to create the list above. Exception vectors or some other stuff can be setup in some other platform-specific manner. But at least reserving a memory below PAGE_SIZE would get the situation partly back to before the memory below the kernel stopped being reserved. Hopefully one page will be enough for the platforms, which relied on that rule. The rest or them sooner or later will manifest itself as it has happened with Broadcom. -Sergey > > Thomas. > > -- > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a > good idea.[ RFC1925, 2.3 ]
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
On Sun, 7 Mar 2021, Thomas Bogendoerfer wrote: > > Are you sure all of them have "cpu_has_mips_r2_r6" macro returning > > true (false) in order to safely use the lowest region in accordance > > with the conditional statement you've added? > > some of them are not R2 (SB1), others are. For the record Malta is just about anything from MIPS IV up (though the QED RM5261 and RM7061 core cards have been quite rare). Maciej
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
On Sun, Mar 07, 2021 at 11:06:12PM +0300, Serge Semin wrote: > > + > > + if (cpu_has_mips_r2_r6) > > + reserve_exception_space(0, 0x400); > > Are you sure it shouldn't be (!cpu_has_mips_r2_r6)?. What I see here > contradicts to what is said in Changelog v2. d'oh, of course it has to be !cpu_has_mips_r2_r6. > Anyway regarding the problem in general. AFAICS the next code uses the > lowest memory to place some specific exception handlers: > board_cache_error_setup pointer: > arch/mips/mm/c-r4k.c: r4k_cache_error_setup() - SiByte CPUs: CPU_SB1, > CPU_SB1A (up to 0x180) > arch/mips/mm/c-octeon.c: octeon_cache_error_setup() - Cavium CPU: > CPU_CAVIUM_OCTEON (up to 0x180) > board_nmi_handler_setup pointer: > arch/mips/kernel/smp-bmips.c: bmips_nmi_handler_setup() - Broadcom CPU: > CPU_BMIPS (up to 0x400) > arch/mips/loongson2ef/common/init.c: mips_nmi_setup() - Loongson 2E CPU: > MACH_LOONGSON2EF (up to 0x400) > arch/mips/loongson64/init.c: mips_nmi_setup() - Loongson 64 CPU: > MACH_LOONGSON64 (up to 0x400, VEIC:0xB00) > arch/mips/mti-malta/malta-init.c: mips_nmi_setup() - Malta CPU: MIPS_MALTA > (up to 0x400, VEIC: 0xB00) > arch/mips/pistachio/init.c: mips_nmi_setup() - Pistachio CPU: > MACH_PISTACHIO (up to 0x400, VEIC: 0xB00) > board_ejtag_handler_setup: > arch/mips/mti-malta/malta-init.c: mips_ejtag_setup() - Malta CPU: > MIPS_MALTA (up to 0x380, VEIC: 0xa80) > arch/mips/pistachio/init.c: mips_ejtag_setup() - Pistachio CPU: > MACH_PISTACHIO (up to 0x380, VEIC: 0xa80) > bmips_ebase_setup: > arch/mips/kernel/smp-bmips.c: bmips_ebase_setup() - Broadcom CPU: CPU_BMIPS > (up to 0x400 - NMI/reset, and 0x1000 - normal) > plat_mem_setup: > arch/mips/bmips/setup.c: bcm63xx_fixup_cpu1() - Broadcom CPU: CPU_BMIPS (up > to 0x220) > > > Are you sure all of them have "cpu_has_mips_r2_r6" macro returning > true (false) in order to safely use the lowest region in accordance > with the conditional statement you've added? some of them are not R2 (SB1), others are. So best bet would be to simply reserve the first 0x1000 bytes for every CPU and special handling for the BMIPS case. Does this cover all cases ? Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea.[ RFC1925, 2.3 ]
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
Hi Thomas. I thought we'd discuss it in v1, but since you've sent v2 please see my comment below. On Sat, Mar 06, 2021 at 09:29:09AM +0100, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. So we reserve it already in cpu_probe() for the CPUs > where this is fixed. For CPU with an ebase config register allocation > of exception space will be done in trap_init(). > > Huge thanks to Serget for analysing and proposing a solution to this > issue. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer > --- > Changes in v2: > - do only memblock reservation in reserve_exception_space() > - reserve 0..0x400 for all CPUs without ebase register and >to addtional reserve_exception_space for BMIPS CPUs > > arch/mips/include/asm/traps.h| 3 +++ > arch/mips/kernel/cpu-probe.c | 7 +++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/traps.c | 10 +- > 4 files changed, 18 insertions(+), 5 deletions(-) > > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..b710e76c9c65 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void); > extern void (*board_cache_error_setup)(void); > > extern int register_nmi_notifier(struct notifier_block *nb); > +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); > extern char except_vec_nmi[]; > > +#define VECTORSPACING 0x100 /* for EI/VI mode */ > + > #define nmi_notifier(fn, pri) > \ > ({ \ > static struct notifier_block fn##_nb = {\ > diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c > index 9a89637b4ecf..b565bc4b900d 100644 > --- a/arch/mips/kernel/cpu-probe.c > +++ b/arch/mips/kernel/cpu-probe.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > > #include "fpu-probe.h" > @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > c->cputype = CPU_BMIPS3300; > __cpu_name[cpu] = "Broadcom BMIPS3300"; > set_elf_platform(cpu, "bmips3300"); > + reserve_exception_space(0x400, VECTORSPACING * 64); > break; > case PRID_IMP_BMIPS43XX: { > int rev = c->processor_id & PRID_REV_MASK; > @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS4380"; > set_elf_platform(cpu, "bmips4380"); > c->options |= MIPS_CPU_RIXI; > + reserve_exception_space(0x400, VECTORSPACING * 64); > } else { > c->cputype = CPU_BMIPS4350; > __cpu_name[cpu] = "Broadcom BMIPS4350"; > @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS5000"; > set_elf_platform(cpu, "bmips5000"); > c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; > + reserve_exception_space(0x1000, VECTORSPACING * 64); > break; > } > } > @@ -2133,6 +2137,9 @@ void cpu_probe(void) > if (cpu == 0) > __ua_limit = ~((1ull << cpu_vmbits) - 1); > #endif > + > + if (cpu_has_mips_r2_r6) > + reserve_exception_space(0, 0x400); Are you sure it shouldn't be (!cpu_has_mips_r2_r6)?. What I see here contradicts to what is said in Changelog v2. Anyway regarding the problem in general. AFAICS the next code uses the lowest memory to place some specific exception handlers: board_cache_error_setup pointer: arch/mips/mm/c-r4k.c: r4k_cache_error_setup() - SiByte CPUs: CPU_SB1, CPU_SB1A (up to 0x180) arch/mips/mm/c-octeon.c: octeon_cache_error_setup() - Cavium CPU: CPU_CAVIUM_OCTEON (up to 0x180) board_nmi_handler_setup pointer: arch/mips/kernel/smp-bmips.c:
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
On Sat, Mar 06, 2021 at 09:29:09AM +0100, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. So we reserve it already in cpu_probe() for the CPUs > where this is fixed. For CPU with an ebase config register allocation > of exception space will be done in trap_init(). > > Huge thanks to Serget for analysing and proposing a solution to this > issue. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer Acked-by: Mike Rapoport > --- > Changes in v2: > - do only memblock reservation in reserve_exception_space() > - reserve 0..0x400 for all CPUs without ebase register and >to addtional reserve_exception_space for BMIPS CPUs > > arch/mips/include/asm/traps.h| 3 +++ > arch/mips/kernel/cpu-probe.c | 7 +++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/traps.c | 10 +- > 4 files changed, 18 insertions(+), 5 deletions(-) > > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..b710e76c9c65 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void); > extern void (*board_cache_error_setup)(void); > > extern int register_nmi_notifier(struct notifier_block *nb); > +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); > extern char except_vec_nmi[]; > > +#define VECTORSPACING 0x100 /* for EI/VI mode */ > + > #define nmi_notifier(fn, pri) > \ > ({ \ > static struct notifier_block fn##_nb = {\ > diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c > index 9a89637b4ecf..b565bc4b900d 100644 > --- a/arch/mips/kernel/cpu-probe.c > +++ b/arch/mips/kernel/cpu-probe.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > > #include "fpu-probe.h" > @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > c->cputype = CPU_BMIPS3300; > __cpu_name[cpu] = "Broadcom BMIPS3300"; > set_elf_platform(cpu, "bmips3300"); > + reserve_exception_space(0x400, VECTORSPACING * 64); > break; > case PRID_IMP_BMIPS43XX: { > int rev = c->processor_id & PRID_REV_MASK; > @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS4380"; > set_elf_platform(cpu, "bmips4380"); > c->options |= MIPS_CPU_RIXI; > + reserve_exception_space(0x400, VECTORSPACING * 64); > } else { > c->cputype = CPU_BMIPS4350; > __cpu_name[cpu] = "Broadcom BMIPS4350"; > @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS5000"; > set_elf_platform(cpu, "bmips5000"); > c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; > + reserve_exception_space(0x1000, VECTORSPACING * 64); > break; > } > } > @@ -2133,6 +2137,9 @@ void cpu_probe(void) > if (cpu == 0) > __ua_limit = ~((1ull << cpu_vmbits) - 1); > #endif > + > + if (cpu_has_mips_r2_r6) > + reserve_exception_space(0, 0x400); > } > > void cpu_report(void) > diff --git a/arch/mips/kernel/cpu-r3k-probe.c > b/arch/mips/kernel/cpu-r3k-probe.c > index abdbbe8c5a43..af654771918c 100644 > --- a/arch/mips/kernel/cpu-r3k-probe.c > +++ b/arch/mips/kernel/cpu-r3k-probe.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include "fpu-probe.h" > > @@ -158,6 +159,8 @@ void cpu_probe(void) > cpu_set_fpu_opts(c); > else > cpu_set_nofpu_opts(c); > + > + reserve_exception_space(0, 0x400); > } > > void cpu_report(void) > diff --git a/arch/mips/kernel/traps.c b/arch/mi
Re: [PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
On 3/6/2021 12:29 AM, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. So we reserve it already in cpu_probe() for the CPUs > where this is fixed. For CPU with an ebase config register allocation > of exception space will be done in trap_init(). > > Huge thanks to Serget for analysing and proposing a solution to this > issue. I made a typo on Sergey's name in my original version here. > > Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with > kernel_end") > Reported-by: Kamal Dasu > Debugged-by: Serge Semin > Signed-off-by: Thomas Bogendoerfer Tested-by: Florian Fainelli Thanks! -- Florian
[PATCH v2] MIPS: kernel: Reserve exception base early to prevent corruption
BMIPS is one of the few platforms that do change the exception base. After commit 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") we started seeing BMIPS boards fail to boot with the built-in FDT being corrupted. Before the cited commit, early allocations would be in the [kernel_end, RAM_END] range, but after commit they would be within [RAM_START + PAGE_SIZE, RAM_END]. The custom exception base handler that is installed by bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the memory region allocated by unflatten_and_copy_device_tree() thus corrupting the FDT used by the kernel. To fix this, we need to perform an early reservation of the custom exception space. So we reserve it already in cpu_probe() for the CPUs where this is fixed. For CPU with an ebase config register allocation of exception space will be done in trap_init(). Huge thanks to Serget for analysing and proposing a solution to this issue. Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") Reported-by: Kamal Dasu Debugged-by: Serge Semin Signed-off-by: Thomas Bogendoerfer --- Changes in v2: - do only memblock reservation in reserve_exception_space() - reserve 0..0x400 for all CPUs without ebase register and to addtional reserve_exception_space for BMIPS CPUs arch/mips/include/asm/traps.h| 3 +++ arch/mips/kernel/cpu-probe.c | 7 +++ arch/mips/kernel/cpu-r3k-probe.c | 3 +++ arch/mips/kernel/traps.c | 10 +- 4 files changed, 18 insertions(+), 5 deletions(-) diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h index 6aa8f126a43d..b710e76c9c65 100644 --- a/arch/mips/include/asm/traps.h +++ b/arch/mips/include/asm/traps.h @@ -24,8 +24,11 @@ extern void (*board_ebase_setup)(void); extern void (*board_cache_error_setup)(void); extern int register_nmi_notifier(struct notifier_block *nb); +extern void reserve_exception_space(phys_addr_t addr, unsigned long size); extern char except_vec_nmi[]; +#define VECTORSPACING 0x100/* for EI/VI mode */ + #define nmi_notifier(fn, pri) \ ({ \ static struct notifier_block fn##_nb = {\ diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c index 9a89637b4ecf..b565bc4b900d 100644 --- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include "fpu-probe.h" @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) c->cputype = CPU_BMIPS3300; __cpu_name[cpu] = "Broadcom BMIPS3300"; set_elf_platform(cpu, "bmips3300"); + reserve_exception_space(0x400, VECTORSPACING * 64); break; case PRID_IMP_BMIPS43XX: { int rev = c->processor_id & PRID_REV_MASK; @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS4380"; set_elf_platform(cpu, "bmips4380"); c->options |= MIPS_CPU_RIXI; + reserve_exception_space(0x400, VECTORSPACING * 64); } else { c->cputype = CPU_BMIPS4350; __cpu_name[cpu] = "Broadcom BMIPS4350"; @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS5000"; set_elf_platform(cpu, "bmips5000"); c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; + reserve_exception_space(0x1000, VECTORSPACING * 64); break; } } @@ -2133,6 +2137,9 @@ void cpu_probe(void) if (cpu == 0) __ua_limit = ~((1ull << cpu_vmbits) - 1); #endif + + if (cpu_has_mips_r2_r6) + reserve_exception_space(0, 0x400); } void cpu_report(void) diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c index abdbbe8c5a43..af654771918c 100644 --- a/arch/mips/kernel/cpu-r3k-probe.c +++ b/arch/mips/kernel/cpu-r3k-probe.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "fpu-probe.h" @@ -158,6 +159,8 @@ void cpu_probe(void) cpu_set_fpu_opts(c); else cpu_set_nofpu_opts(c); + + reserve_exception_space(0, 0x400); } void cpu_report(void) diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index e0352958e2f7..808b8b61ded1 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2009,13 +2009,16 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs) nmi_exit(); } -#define VECTORSPACING 0x100/* for EI/VI mode */ - unsigned lon
[PATCH v4] mm/userfaultfd: fix memory corruption due to writeprotect
From: Nadav Amit Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally *clears* the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0cpu1cpu2 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-*unprotect* ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ *clear* “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0cpu1cpu2cpu3 [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ...[ write to page ] set_pte_at_notify() This race exists since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Cc: Andrea Arcangeli Cc: Andy Lutomirski Cc: Pavel Emelyanov Cc: Mike Kravetz Cc: Mike Rapoport Cc: Minchan Kim Cc: Will Deacon Cc: Peter Zijlstra Cc: sta...@vger.kernel.org # 5.9+ Suggested-by: Yu Zhao Reviewed-by: Peter Xu Tested-by: Peter Xu Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit --- v3->v4: * Fix the "Fixes" tag for real [Peter Xu] * Reviewed-by, suggested-by tags [Peter Xu] * Adding unlikely() [Peter Xu] v2->v3: * Do not acquire mmap_lock for write, flush conditionally instead [Yu] * Change the fixes tag to the patch that made the race apparent [Yu] * Removing patch to avoid write-protect on uffd unprotect. More comprehensive solution to follow (and avoid the TLB flush as well). --- mm/memory.c | 8 1 file changed, 8 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 9e8576a83147..79253cb3bcd5 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3092,6 +3092,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) return handle_userfault(vmf, VM_UFFD_WP); } + /* +* Userfaultfd write-protect can defer flushes. Ensure the TLB +* is flushed in this case before copying. +*/ + if (unlikely(userfaultfd_wp(vmf->vma) && +mm_tlb_flush_pending(vmf->vma->vm_mm))) + flush_tlb_page(vmf->vma, vmf->address
Re: [PATCH RESEND v3] mm/userfaultfd: fix memory corruption due to writeprotect
> On Mar 3, 2021, at 11:03 AM, Peter Xu wrote: > > On Wed, Mar 03, 2021 at 01:57:02AM -0800, Nadav Amit wrote: >> From: Nadav Amit >> >> Userfaultfd self-test fails occasionally, indicating a memory >> corruption. > > It's failing very constantly now for me after I got it run on a 40 cores > system... While indeed not easy to fail on my laptop. > It fails rather constantly for me, but since nobody else reproduced it, I was afraid to say otherwise ;-) > >> Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit") >> Signed-off-by: Nadav Amit >> >> --- >> v2->v3: >> * Do not acquire mmap_lock for write, flush conditionally instead [Yu] >> * Change the fixes tag to the patch that made the race apparent [Yu] > > Did you forget about this one? It would still be good to point to > 09854ba94c6a > just to show that 5.7/5.8 stable branches shouldn't need this patch as they're > not prone to the tlb data curruption. Maybe also cc stable with 5.9+? The fixes tag is wrong, as you say. I will fix it and cc stable with 5.9+. > >> * Removing patch to avoid write-protect on uffd unprotect. More >> comprehensive solution to follow (and avoid the TLB flush as well). >> --- >> mm/memory.c | 7 +++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/mm/memory.c b/mm/memory.c >> index 9e8576a83147..06da04f98936 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -3092,6 +3092,13 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) >> return handle_userfault(vmf, VM_UFFD_WP); >> } >> >> +/* >> + * Userfaultfd write-protect can defer flushes. Ensure the TLB >> + * is flushed in this case before copying. >> + */ >> +if (userfaultfd_wp(vmf->vma) && mm_tlb_flush_pending(vmf->vma->vm_mm)) >> +flush_tlb_page(vmf->vma, vmf->address); >> + >> vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); >> if (!vmf->page) { >> /* >> -- >> 2.25.1 >> > > Thanks for being consistent on fixing this problem. > > Maybe it's even better to put that into a "unlikely" to reduce the affect of > normal do_wp_page as much as possible? But I'll leave it to others. > > If with the fixes tag modified: > > Reviewed-by: Peter Xu > Tested-by: Peter Xu Thanks, I will send v4 later today. Regards, Nadav signature.asc Description: Message signed with OpenPGP
Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote: > > What's up with the R3k (the usual trigger for me) here? > > I've moved r3k cpu_probe() to it's own file and when moving ebase > reservation to cpu_probe(), I need to add it there as well. So just > a mechanic step, I've missed. Ah, right, I didn't notice the split. Thanks for taking care of it! Maciej
Re: [PATCH] MIPS: kernel: Reserve exception base early to prevent corruption
On 3/3/21 1:14 PM, Serge Semin wrote: > Hello Thomas, > Thanks for the patch. My comments are below. > > On Wed, Mar 03, 2021 at 07:57:13PM +0100, Thomas Bogendoerfer wrote: >> BMIPS is one of the few platforms that do change the exception base. >> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations >> with kernel_end") we started seeing BMIPS boards fail to boot with the >> built-in FDT being corrupted. >> >> Before the cited commit, early allocations would be in the [kernel_end, >> RAM_END] range, but after commit they would be within [RAM_START + >> PAGE_SIZE, RAM_END]. >> >> The custom exception base handler that is installed by >> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the >> memory region allocated by unflatten_and_copy_device_tree() thus >> corrupting the FDT used by the kernel. >> >> To fix this, we need to perform an early reservation of the custom >> exception space. So we reserve it already in cpu_probe() for the CPUs >> where this is fixed. For CPU with an ebase config register allocation >> of exception space will be done in trap_init(). >> >> Huge thanks to Serget for analysing and proposing a solution to this >> issue. >> > >> Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations >> with kernel_end") > > Fixes tag is used twice. > >> Debugged-by: Serge Semin >> Reported-by: Kamal Dasu > > I'd switch these tags order. First it was reported, then the > problem was debugged. I suppose it would be also nice to add > Florian under the second Reported-by tag if he doesn't mind. I haven't > seen any Kamal' email message, but a report posted by Florian only. Kamal reported it to me privately and then I brought it publicly, still wanted to give him credit. > >> Signed-off-by: Thomas Bogendoerfer >> --- >> arch/mips/include/asm/traps.h| 4 >> arch/mips/kernel/cpu-probe.c | 7 +++ >> arch/mips/kernel/cpu-r3k-probe.c | 3 +++ >> arch/mips/kernel/smp-bmips.c | 9 + >> arch/mips/kernel/traps.c | 31 --- >> 5 files changed, 31 insertions(+), 23 deletions(-) >> >> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h >> index 6aa8f126a43d..d74d829e1655 100644 >> --- a/arch/mips/include/asm/traps.h >> +++ b/arch/mips/include/asm/traps.h >> @@ -24,7 +24,11 @@ extern void (*board_ebase_setup)(void); >> extern void (*board_cache_error_setup)(void); >> >> extern int register_nmi_notifier(struct notifier_block *nb); >> +extern void reserve_exception_space(unsigned long addr, unsigned long size); >> extern char except_vec_nmi[]; >> +extern unsigned long ebase_size; >> + >> +#define VECTORSPACING 0x100 /* for EI/VI mode */ >> >> #define nmi_notifier(fn, pri) >> \ >> ({ \ >> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c >> index 9a89637b4ecf..effc45cbb351 100644 >> --- a/arch/mips/kernel/cpu-probe.c >> +++ b/arch/mips/kernel/cpu-probe.c >> @@ -26,6 +26,7 @@ >> #include >> #include >> #include >> +#include >> #include >> >> #include "fpu-probe.h" >> @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct >> cpuinfo_mips *c, unsigned int cpu) >> c->cputype = CPU_BMIPS3300; >> __cpu_name[cpu] = "Broadcom BMIPS3300"; >> set_elf_platform(cpu, "bmips3300"); >> +reserve_exception_space(0x8400, VECTORSPACING * 64); >> break; >> case PRID_IMP_BMIPS43XX: { >> int rev = c->processor_id & PRID_REV_MASK; >> @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct >> cpuinfo_mips *c, unsigned int cpu) >> __cpu_name[cpu] = "Broadcom BMIPS4380"; >> set_elf_platform(cpu, "bmips4380"); >> c->options |= MIPS_CPU_RIXI; >> +reserve_exception_space(0x8400, VECTORSPACING * 64); >> } else { >> c->cputype = CPU_BMIPS4350; >> __cpu_name[cpu] = "Broadcom BMIPS4350"; >> @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct >> cpuinfo_mips *c, unsigned int cpu) >> __cpu_name[cpu] = "Broadcom BMIPS5000"; >> set_elf_platform(cpu, "bmips5000"); >> c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; >> +reserve_exception_space(0x80001000, VECTORSPACING * 64); >> break; >> } >> } >> @@ -2133,6 +2137,9 @@ void cpu_probe(void) >> if (cpu == 0) >> __ua_limit = ~((1ull << cpu_vmbits) - 1); >> #endif >> + >> +if (ebase_size == 0 && !cpu_has_mips_r2_r6) >> +reserve_exception_space(CAC_BASE, 0x400); >> } >> >> void cpu_report(void) >> diff --git a/arch/mips/kernel/cpu-r3k-probe.c >> b/arch/mips/kernel/cpu-r3k-probe.c >> index abdbbe8c5a43..6e3f4c17b810 100644 >>
Re: [PATCH] MIPS: kernel: Reserve exception base early to prevent corruption
Hello Thomas, Thanks for the patch. My comments are below. On Wed, Mar 03, 2021 at 07:57:13PM +0100, Thomas Bogendoerfer wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception space. So we reserve it already in cpu_probe() for the CPUs > where this is fixed. For CPU with an ebase config register allocation > of exception space will be done in trap_init(). > > Huge thanks to Serget for analysing and proposing a solution to this > issue. > > Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") Fixes tag is used twice. > Debugged-by: Serge Semin > Reported-by: Kamal Dasu I'd switch these tags order. First it was reported, then the problem was debugged. I suppose it would be also nice to add Florian under the second Reported-by tag if he doesn't mind. I haven't seen any Kamal' email message, but a report posted by Florian only. > Signed-off-by: Thomas Bogendoerfer > --- > arch/mips/include/asm/traps.h| 4 > arch/mips/kernel/cpu-probe.c | 7 +++ > arch/mips/kernel/cpu-r3k-probe.c | 3 +++ > arch/mips/kernel/smp-bmips.c | 9 + > arch/mips/kernel/traps.c | 31 --- > 5 files changed, 31 insertions(+), 23 deletions(-) > > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..d74d829e1655 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -24,7 +24,11 @@ extern void (*board_ebase_setup)(void); > extern void (*board_cache_error_setup)(void); > > extern int register_nmi_notifier(struct notifier_block *nb); > +extern void reserve_exception_space(unsigned long addr, unsigned long size); > extern char except_vec_nmi[]; > +extern unsigned long ebase_size; > + > +#define VECTORSPACING 0x100 /* for EI/VI mode */ > > #define nmi_notifier(fn, pri) > \ > ({ \ > diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c > index 9a89637b4ecf..effc45cbb351 100644 > --- a/arch/mips/kernel/cpu-probe.c > +++ b/arch/mips/kernel/cpu-probe.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > > #include "fpu-probe.h" > @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > c->cputype = CPU_BMIPS3300; > __cpu_name[cpu] = "Broadcom BMIPS3300"; > set_elf_platform(cpu, "bmips3300"); > + reserve_exception_space(0x8400, VECTORSPACING * 64); > break; > case PRID_IMP_BMIPS43XX: { > int rev = c->processor_id & PRID_REV_MASK; > @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS4380"; > set_elf_platform(cpu, "bmips4380"); > c->options |= MIPS_CPU_RIXI; > + reserve_exception_space(0x8400, VECTORSPACING * 64); > } else { > c->cputype = CPU_BMIPS4350; > __cpu_name[cpu] = "Broadcom BMIPS4350"; > @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct > cpuinfo_mips *c, unsigned int cpu) > __cpu_name[cpu] = "Broadcom BMIPS5000"; > set_elf_platform(cpu, "bmips5000"); > c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; > + reserve_exception_space(0x80001000, VECTORSPACING * 64); > break; > } > } > @@ -2133,6 +2137,9 @@ void cpu_probe(void) > if (cpu == 0) > __ua_limit = ~((1ull << cpu_vmbits) - 1); > #endif > + > + if (ebase_size == 0 && !cpu_has_mips_r2_r6) > + reserve_exception_space(CAC_BASE, 0x400); > } > > void cpu_report(void) > diff --git a/arch/mips/kernel/cpu-r3k-probe.c > b/arch/mips/kernel/cpu-r3k-probe.c > index abdbbe8c5a43..6e3f4c17b810 100644 > --- a/arch/mips/kernel/cpu-r3k-probe.c > +++ b/arch/mips/kernel/cpu-r3k-probe.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include "fpu-probe.h" > > @@ -158,6 +159,8 @@ void cpu_probe(void) >
Re: [PATCH RESEND v3] mm/userfaultfd: fix memory corruption due to writeprotect
On Wed, Mar 03, 2021 at 01:57:02AM -0800, Nadav Amit wrote: > From: Nadav Amit > > Userfaultfd self-test fails occasionally, indicating a memory > corruption. It's failing very constantly now for me after I got it run on a 40 cores system... While indeed not easy to fail on my laptop. [...] > Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit") > Signed-off-by: Nadav Amit > > --- > v2->v3: > * Do not acquire mmap_lock for write, flush conditionally instead [Yu] > * Change the fixes tag to the patch that made the race apparent [Yu] Did you forget about this one? It would still be good to point to 09854ba94c6a just to show that 5.7/5.8 stable branches shouldn't need this patch as they're not prone to the tlb data curruption. Maybe also cc stable with 5.9+? > * Removing patch to avoid write-protect on uffd unprotect. More > comprehensive solution to follow (and avoid the TLB flush as well). > --- > mm/memory.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/mm/memory.c b/mm/memory.c > index 9e8576a83147..06da04f98936 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3092,6 +3092,13 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) > return handle_userfault(vmf, VM_UFFD_WP); > } > > + /* > + * Userfaultfd write-protect can defer flushes. Ensure the TLB > + * is flushed in this case before copying. > + */ > + if (userfaultfd_wp(vmf->vma) && mm_tlb_flush_pending(vmf->vma->vm_mm)) > + flush_tlb_page(vmf->vma, vmf->address); > + > vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); > if (!vmf->page) { > /* > -- > 2.25.1 > Thanks for being consistent on fixing this problem. Maybe it's even better to put that into a "unlikely" to reduce the affect of normal do_wp_page as much as possible? But I'll leave it to others. If with the fixes tag modified: Reviewed-by: Peter Xu Tested-by: Peter Xu Thanks, -- Peter Xu
[PATCH] MIPS: kernel: Reserve exception base early to prevent corruption
BMIPS is one of the few platforms that do change the exception base. After commit 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") we started seeing BMIPS boards fail to boot with the built-in FDT being corrupted. Before the cited commit, early allocations would be in the [kernel_end, RAM_END] range, but after commit they would be within [RAM_START + PAGE_SIZE, RAM_END]. The custom exception base handler that is installed by bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the memory region allocated by unflatten_and_copy_device_tree() thus corrupting the FDT used by the kernel. To fix this, we need to perform an early reservation of the custom exception space. So we reserve it already in cpu_probe() for the CPUs where this is fixed. For CPU with an ebase config register allocation of exception space will be done in trap_init(). Huge thanks to Serget for analysing and proposing a solution to this issue. Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") Debugged-by: Serge Semin Reported-by: Kamal Dasu Signed-off-by: Thomas Bogendoerfer --- arch/mips/include/asm/traps.h| 4 arch/mips/kernel/cpu-probe.c | 7 +++ arch/mips/kernel/cpu-r3k-probe.c | 3 +++ arch/mips/kernel/smp-bmips.c | 9 + arch/mips/kernel/traps.c | 31 --- 5 files changed, 31 insertions(+), 23 deletions(-) diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h index 6aa8f126a43d..d74d829e1655 100644 --- a/arch/mips/include/asm/traps.h +++ b/arch/mips/include/asm/traps.h @@ -24,7 +24,11 @@ extern void (*board_ebase_setup)(void); extern void (*board_cache_error_setup)(void); extern int register_nmi_notifier(struct notifier_block *nb); +extern void reserve_exception_space(unsigned long addr, unsigned long size); extern char except_vec_nmi[]; +extern unsigned long ebase_size; + +#define VECTORSPACING 0x100/* for EI/VI mode */ #define nmi_notifier(fn, pri) \ ({ \ diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c index 9a89637b4ecf..effc45cbb351 100644 --- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include "fpu-probe.h" @@ -1628,6 +1629,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) c->cputype = CPU_BMIPS3300; __cpu_name[cpu] = "Broadcom BMIPS3300"; set_elf_platform(cpu, "bmips3300"); + reserve_exception_space(0x8400, VECTORSPACING * 64); break; case PRID_IMP_BMIPS43XX: { int rev = c->processor_id & PRID_REV_MASK; @@ -1638,6 +1640,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS4380"; set_elf_platform(cpu, "bmips4380"); c->options |= MIPS_CPU_RIXI; + reserve_exception_space(0x8400, VECTORSPACING * 64); } else { c->cputype = CPU_BMIPS4350; __cpu_name[cpu] = "Broadcom BMIPS4350"; @@ -1654,6 +1657,7 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu) __cpu_name[cpu] = "Broadcom BMIPS5000"; set_elf_platform(cpu, "bmips5000"); c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI; + reserve_exception_space(0x80001000, VECTORSPACING * 64); break; } } @@ -2133,6 +2137,9 @@ void cpu_probe(void) if (cpu == 0) __ua_limit = ~((1ull << cpu_vmbits) - 1); #endif + + if (ebase_size == 0 && !cpu_has_mips_r2_r6) + reserve_exception_space(CAC_BASE, 0x400); } void cpu_report(void) diff --git a/arch/mips/kernel/cpu-r3k-probe.c b/arch/mips/kernel/cpu-r3k-probe.c index abdbbe8c5a43..6e3f4c17b810 100644 --- a/arch/mips/kernel/cpu-r3k-probe.c +++ b/arch/mips/kernel/cpu-r3k-probe.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "fpu-probe.h" @@ -158,6 +159,8 @@ void cpu_probe(void) cpu_set_fpu_opts(c); else cpu_set_nofpu_opts(c); + + reserve_exception_space(CAC_BASE, 0x400); } void cpu_report(void) diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c index b6ef5f7312cf..ad3f2282a65a 100644 --- a/arch/mips/kernel/smp-bmips.c +++ b/arch/mips/kernel/smp-bmips.c @@ -528,10 +528,6 @@ static void bmips_set_reset_vec(int cpu, u32 val) void bmips_ebase_setup(void) { - unsigned long new_ebase = ebase; - - BUG_ON(ebase != CKSEG0); - switch (current_cpu_type()) { case CPU_BMIPS4350:
Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
On Wed, Mar 03, 2021 at 06:45:52PM +0100, Maciej W. Rozycki wrote: > On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote: > > > perfect, I only forgot about R3k... I'll submit a formal patch submission > > later today. > > What's up with the R3k (the usual trigger for me) here? I've moved r3k cpu_probe() to it's own file and when moving ebase reservation to cpu_probe(), I need to add it there as well. So just a mechanic step, I've missed. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea.[ RFC1925, 2.3 ]