Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
Konstantin, Andrew, On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote: > On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler wrote: > [...] > > This should fix it up. > > From: Konstantin Khlebnikov > Subject: radix-tree: fix oops after radix_tree_iter_retry > > Helper radix_tree_iter_retry() resets next_index to the current index. In > following radix_tree_next_slot current chunk size becomes zero. This > isn't checked and it tries to dereference null pointer in slot. > > Tagged iterator is fine because retry happens only at slot 0 where tag > bitmask in iter->tags is filled with single bit. > > Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") > Signed-off-by: Konstantin Khlebnikov > Cc: Matthew Wilcox > Cc: Hugh Dickins > Cc: Ohad Ben-Cohen > Cc: Jeremiah Mahler > Cc: > Signed-off-by: Andrew Morton > --- > > include/linux/radix-tree.h |6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff -puN > include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > include/linux/radix-tree.h > --- > a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > +++ a/include/linux/radix-tree.h > @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi > * @iter:pointer to radix tree iterator > * Returns: current chunk size > */ > -static __always_inline unsigned > +static __always_inline long > radix_tree_chunk_size(struct radix_tree_iter *iter) > { > return iter->next_index - iter->index; > @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct > return slot + offset + 1; > } > } else { > - unsigned size = radix_tree_chunk_size(iter) - 1; > + long size = radix_tree_chunk_size(iter); > > - while (size--) { > + while (--size > 0) { > slot++; > iter->index++; > if (likely(*slot)) > _ > Fix is still working great after a couple days. Tested-by: Jeremiah Mahler -- - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
Konstantin, On Sun, Feb 07, 2016 at 11:27:53AM +0300, Konstantin Khlebnikov wrote: > On Sat, Feb 6, 2016 at 9:18 PM, Jeremiah Mahler wrote: [...] > >> -static __always_inline unsigned > >> +static __always_inline long > >> radix_tree_chunk_size(struct radix_tree_iter *iter) > >> { > >> return iter->next_index - iter->index; > >> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct > >> return slot + offset + 1; > >> } > >> } else { > >> - unsigned size = radix_tree_chunk_size(iter) - 1; > >> + long size = radix_tree_chunk_size(iter); > >> > >> - while (size--) { > >> + while (--size > 0) { > >> slot++; > >> iter->index++; > >> if (likely(*slot)) > >> _ > >> > > > > I have applied this patch to my kernel and so far the bug has not > > come back. Thanks for the quick fix. > > > > Although I don't quite understand how this fixes the slot==NULL problem. > > Unless I am missing something, it looks like the while loop will be > > executed the same number of times but the size variable will no > > longer go negative as it did before. > > That's simple. Slot is dereferenced after checking remaining size. > Old version checked only for != 0. After iter-retry size is zero and > afrer "- 1" it overlaps into positive range. In new version it's signed and > checked for > 0. > OK, I get it now. Thanks for the explanation. -- - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
On Sat, Feb 6, 2016 at 9:18 PM, Jeremiah Mahler wrote: > Andrew, > > On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote: >> On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler wrote: >> > [...] >> > unable to handle kernel NULL pointer dereference >> >> This should fix it up. >> > [...] >> >> include/linux/radix-tree.h |6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff -puN >> include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry >> include/linux/radix-tree.h >> --- >> a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry >> +++ a/include/linux/radix-tree.h >> @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi >> * @iter:pointer to radix tree iterator >> * Returns: current chunk size >> */ >> -static __always_inline unsigned >> +static __always_inline long >> radix_tree_chunk_size(struct radix_tree_iter *iter) >> { >> return iter->next_index - iter->index; >> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct >> return slot + offset + 1; >> } >> } else { >> - unsigned size = radix_tree_chunk_size(iter) - 1; >> + long size = radix_tree_chunk_size(iter); >> >> - while (size--) { >> + while (--size > 0) { >> slot++; >> iter->index++; >> if (likely(*slot)) >> _ >> > > I have applied this patch to my kernel and so far the bug has not > come back. Thanks for the quick fix. > > Although I don't quite understand how this fixes the slot==NULL problem. > Unless I am missing something, it looks like the while loop will be > executed the same number of times but the size variable will no > longer go negative as it did before. That's simple. Slot is dereferenced after checking remaining size. Old version checked only for != 0. After iter-retry size is zero and afrer "- 1" it overlaps into positive range. In new version it's signed and checked for > 0. > > -- > - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
On Sat, Feb 6, 2016 at 9:18 PM, Jeremiah Mahlerwrote: > Andrew, > > On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote: >> On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler wrote: >> > [...] >> > unable to handle kernel NULL pointer dereference >> >> This should fix it up. >> > [...] >> >> include/linux/radix-tree.h |6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff -puN >> include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry >> include/linux/radix-tree.h >> --- >> a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry >> +++ a/include/linux/radix-tree.h >> @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi >> * @iter:pointer to radix tree iterator >> * Returns: current chunk size >> */ >> -static __always_inline unsigned >> +static __always_inline long >> radix_tree_chunk_size(struct radix_tree_iter *iter) >> { >> return iter->next_index - iter->index; >> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct >> return slot + offset + 1; >> } >> } else { >> - unsigned size = radix_tree_chunk_size(iter) - 1; >> + long size = radix_tree_chunk_size(iter); >> >> - while (size--) { >> + while (--size > 0) { >> slot++; >> iter->index++; >> if (likely(*slot)) >> _ >> > > I have applied this patch to my kernel and so far the bug has not > come back. Thanks for the quick fix. > > Although I don't quite understand how this fixes the slot==NULL problem. > Unless I am missing something, it looks like the while loop will be > executed the same number of times but the size variable will no > longer go negative as it did before. That's simple. Slot is dereferenced after checking remaining size. Old version checked only for != 0. After iter-retry size is zero and afrer "- 1" it overlaps into positive range. In new version it's signed and checked for > 0. > > -- > - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
Konstantin, Andrew, On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote: > On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahlerwrote: > [...] > > This should fix it up. > > From: Konstantin Khlebnikov > Subject: radix-tree: fix oops after radix_tree_iter_retry > > Helper radix_tree_iter_retry() resets next_index to the current index. In > following radix_tree_next_slot current chunk size becomes zero. This > isn't checked and it tries to dereference null pointer in slot. > > Tagged iterator is fine because retry happens only at slot 0 where tag > bitmask in iter->tags is filled with single bit. > > Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") > Signed-off-by: Konstantin Khlebnikov > Cc: Matthew Wilcox > Cc: Hugh Dickins > Cc: Ohad Ben-Cohen > Cc: Jeremiah Mahler > Cc: > Signed-off-by: Andrew Morton > --- > > include/linux/radix-tree.h |6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff -puN > include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > include/linux/radix-tree.h > --- > a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > +++ a/include/linux/radix-tree.h > @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi > * @iter:pointer to radix tree iterator > * Returns: current chunk size > */ > -static __always_inline unsigned > +static __always_inline long > radix_tree_chunk_size(struct radix_tree_iter *iter) > { > return iter->next_index - iter->index; > @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct > return slot + offset + 1; > } > } else { > - unsigned size = radix_tree_chunk_size(iter) - 1; > + long size = radix_tree_chunk_size(iter); > > - while (size--) { > + while (--size > 0) { > slot++; > iter->index++; > if (likely(*slot)) > _ > Fix is still working great after a couple days. Tested-by: Jeremiah Mahler -- - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
Konstantin, On Sun, Feb 07, 2016 at 11:27:53AM +0300, Konstantin Khlebnikov wrote: > On Sat, Feb 6, 2016 at 9:18 PM, Jeremiah Mahlerwrote: [...] > >> -static __always_inline unsigned > >> +static __always_inline long > >> radix_tree_chunk_size(struct radix_tree_iter *iter) > >> { > >> return iter->next_index - iter->index; > >> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct > >> return slot + offset + 1; > >> } > >> } else { > >> - unsigned size = radix_tree_chunk_size(iter) - 1; > >> + long size = radix_tree_chunk_size(iter); > >> > >> - while (size--) { > >> + while (--size > 0) { > >> slot++; > >> iter->index++; > >> if (likely(*slot)) > >> _ > >> > > > > I have applied this patch to my kernel and so far the bug has not > > come back. Thanks for the quick fix. > > > > Although I don't quite understand how this fixes the slot==NULL problem. > > Unless I am missing something, it looks like the while loop will be > > executed the same number of times but the size variable will no > > longer go negative as it did before. > > That's simple. Slot is dereferenced after checking remaining size. > Old version checked only for != 0. After iter-retry size is zero and > afrer "- 1" it overlaps into positive range. In new version it's signed and > checked for > 0. > OK, I get it now. Thanks for the explanation. -- - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
Andrew, On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote: > On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler wrote: > [...] > > unable to handle kernel NULL pointer dereference > > This should fix it up. > [...] > > include/linux/radix-tree.h |6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff -puN > include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > include/linux/radix-tree.h > --- > a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > +++ a/include/linux/radix-tree.h > @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi > * @iter:pointer to radix tree iterator > * Returns: current chunk size > */ > -static __always_inline unsigned > +static __always_inline long > radix_tree_chunk_size(struct radix_tree_iter *iter) > { > return iter->next_index - iter->index; > @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct > return slot + offset + 1; > } > } else { > - unsigned size = radix_tree_chunk_size(iter) - 1; > + long size = radix_tree_chunk_size(iter); > > - while (size--) { > + while (--size > 0) { > slot++; > iter->index++; > if (likely(*slot)) > _ > I have applied this patch to my kernel and so far the bug has not come back. Thanks for the quick fix. Although I don't quite understand how this fixes the slot==NULL problem. Unless I am missing something, it looks like the while loop will be executed the same number of times but the size variable will no longer go negative as it did before. -- - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
Andrew, On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote: > On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahlerwrote: > [...] > > unable to handle kernel NULL pointer dereference > > This should fix it up. > [...] > > include/linux/radix-tree.h |6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff -puN > include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > include/linux/radix-tree.h > --- > a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry > +++ a/include/linux/radix-tree.h > @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi > * @iter:pointer to radix tree iterator > * Returns: current chunk size > */ > -static __always_inline unsigned > +static __always_inline long > radix_tree_chunk_size(struct radix_tree_iter *iter) > { > return iter->next_index - iter->index; > @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct > return slot + offset + 1; > } > } else { > - unsigned size = radix_tree_chunk_size(iter) - 1; > + long size = radix_tree_chunk_size(iter); > > - while (size--) { > + while (--size > 0) { > slot++; > iter->index++; > if (likely(*slot)) > _ > I have applied this patch to my kernel and so far the bug has not come back. Thanks for the quick fix. Although I don't quite understand how this fixes the slot==NULL problem. Unless I am missing something, it looks like the while loop will be executed the same number of times but the size variable will no longer go negative as it did before. -- - Jeremiah Mahler
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler wrote: > On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have > experienced several system hangs. I usually notice it first when > my browser (Chrome) stops responding but then other programs will stop > responding as well. The only fix is a reboot. It is sporadic but it > will usually occur once a day. > > In the logs there will be a > > unable to handle kernel NULL pointer dereference This should fix it up. From: Konstantin Khlebnikov Subject: radix-tree: fix oops after radix_tree_iter_retry Helper radix_tree_iter_retry() resets next_index to the current index. In following radix_tree_next_slot current chunk size becomes zero. This isn't checked and it tries to dereference null pointer in slot. Tagged iterator is fine because retry happens only at slot 0 where tag bitmask in iter->tags is filled with single bit. Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Signed-off-by: Konstantin Khlebnikov Cc: Matthew Wilcox Cc: Hugh Dickins Cc: Ohad Ben-Cohen Cc: Jeremiah Mahler Cc: Signed-off-by: Andrew Morton --- include/linux/radix-tree.h |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff -puN include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry include/linux/radix-tree.h --- a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry +++ a/include/linux/radix-tree.h @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi * @iter: pointer to radix tree iterator * Returns:current chunk size */ -static __always_inline unsigned +static __always_inline long radix_tree_chunk_size(struct radix_tree_iter *iter) { return iter->next_index - iter->index; @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct return slot + offset + 1; } } else { - unsigned size = radix_tree_chunk_size(iter) - 1; + long size = radix_tree_chunk_size(iter); - while (size--) { + while (--size > 0) { slot++; iter->index++; if (likely(*slot)) _
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler wrote: > all, > > On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have > experienced several system hangs. I usually notice it first when > my browser (Chrome) stops responding but then other programs will stop > responding as well. The only fix is a reboot. It is sporadic but it > will usually occur once a day. > > In the logs there will be a > > unable to handle kernel NULL pointer dereference > > message related to filemap_map_pages+0x10d/0x290 (below). > > > ... > [51985.993033] BUG: unable to handle kernel NULL pointer dereference at > 0008 > [51985.993087] IP: [] filemap_map_pages+0x10d/0x290 > [51985.993123] PGD 2c772067 PUD 0 > [51985.993144] Oops: [#1] SMP > [51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats > cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 > x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul > crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support > jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel > snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic > i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm > psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 > snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core > wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button > intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4 > [51985.993591] ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci > sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore > usb_common thermal > [51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted > 4.5.0-rc2-next-20160203+ #11 > [51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) > 09/11/2012 > [51985.993760] task: 88004bb04dc0 ti: 88002a2f8000 task.ti: > 88002a2f8000 > [51985.993804] RIP: 0010:[] [] > filemap_map_pages+0x10d/0x290 > [51985.993845] RSP: :88002a2fbdf8 EFLAGS: 00010202 > [51985.993874] RAX: 0007fff8 RBX: 0001 RCX: > 0003 > [51985.993911] RDX: RSI: ea5bdd1c RDI: > ea5bdd00 > [51985.993948] RBP: 8800beff4220 R08: 007f R09: > > [51985.993985] R10: R11: 8800a39382b8 R12: > 8801182b9440 > [51985.994023] R13: 88002a2fbe90 R14: 8800be568d80 R15: > 0008 > [51985.994061] FS: 7f3e20276a40() GS:88011e30() > knlGS: > [51985.994103] CS: 0010 DS: ES: CR0: 80050033 > [51985.994134] CR2: 0008 CR3: be6eb000 CR4: > 001406e0 > [51985.994172] Stack: > [51985.994184] 8800beff4228 7f3e0ae63000 0001 > > [51985.994229] 0001 7f3e0ae63000 8800be568d80 > 0054 > [51985.994273] 88000318 88003584c318 8800840076c0 > 8117b073 > [51985.994318] Call Trace: > [51985.994336] [] ? handle_mm_fault+0x13b3/0x1790 > [51985.994370] [] ? up_write+0x21/0x30 > [51985.994400] [] ? __do_page_fault+0x192/0x410 > [51985.994434] [] ? page_fault+0x28/0x30 > [51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 > 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> > 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d > [51985.994656] RIP [] filemap_map_pages+0x10d/0x290 > [51985.994692] RSP > [51985.994711] CR2: 0008 > > ... > > Referring again to the RIP line from the trace. > > [51985.994656] RIP [] filemap_map_pages+0x10d/0x290 > > jeri@hudson:~/linux-next$ gdb vmlinux > (gdb) list *0x8114a19d > 0x8114a19d is in filemap_map_pages > (include/linux/radix-tree.h:465). > 460 unsigned size = radix_tree_chunk_size(iter) - 1; > 461 > 462 while (size--) { > 463 slot++; > 464 iter->index++; > 465 if (likely(*slot)) > 466 return slot; > 467 if (flags & RADIX_TREE_ITER_CONTIG) { > 468 /* forbid switching to the next chunk */ > 469 iter->next_index = 0; > (gdb) > > Assuming I traced the addresses correctly, this indicates that the > fault is triggered when the value in the slot pointer is accessed. > Perhaps slot is being incremented beyond its valid range? That's super helpful, thanks. The faulting address was 0x0008, so radix_tree_next_slot() was called with slot==NULL. And looking at it, I don't see how this code can work at all: :
[REGRESSION] mm: filemap_map_pages NULL pointer dereference
all, On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have experienced several system hangs. I usually notice it first when my browser (Chrome) stops responding but then other programs will stop responding as well. The only fix is a reboot. It is sporadic but it will usually occur once a day. In the logs there will be a unable to handle kernel NULL pointer dereference message related to filemap_map_pages+0x10d/0x290 (below). ... [51985.993033] BUG: unable to handle kernel NULL pointer dereference at 0008 [51985.993087] IP: [] filemap_map_pages+0x10d/0x290 [51985.993123] PGD 2c772067 PUD 0 [51985.993144] Oops: [#1] SMP [51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4 [51985.993591] ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore usb_common thermal [51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted 4.5.0-rc2-next-20160203+ #11 [51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) 09/11/2012 [51985.993760] task: 88004bb04dc0 ti: 88002a2f8000 task.ti: 88002a2f8000 [51985.993804] RIP: 0010:[] [] filemap_map_pages+0x10d/0x290 [51985.993845] RSP: :88002a2fbdf8 EFLAGS: 00010202 [51985.993874] RAX: 0007fff8 RBX: 0001 RCX: 0003 [51985.993911] RDX: RSI: ea5bdd1c RDI: ea5bdd00 [51985.993948] RBP: 8800beff4220 R08: 007f R09: [51985.993985] R10: R11: 8800a39382b8 R12: 8801182b9440 [51985.994023] R13: 88002a2fbe90 R14: 8800be568d80 R15: 0008 [51985.994061] FS: 7f3e20276a40() GS:88011e30() knlGS: [51985.994103] CS: 0010 DS: ES: CR0: 80050033 [51985.994134] CR2: 0008 CR3: be6eb000 CR4: 001406e0 [51985.994172] Stack: [51985.994184] 8800beff4228 7f3e0ae63000 0001 [51985.994229] 0001 7f3e0ae63000 8800be568d80 0054 [51985.994273] 88000318 88003584c318 8800840076c0 8117b073 [51985.994318] Call Trace: [51985.994336] [] ? handle_mm_fault+0x13b3/0x1790 [51985.994370] [] ? up_write+0x21/0x30 [51985.994400] [] ? __do_page_fault+0x192/0x410 [51985.994434] [] ? page_fault+0x28/0x30 [51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d [51985.994656] RIP [] filemap_map_pages+0x10d/0x290 [51985.994692] RSP [51985.994711] CR2: 0008 [51986.002154] ---[ end trace da60309b42c1da53 ]--- [52006.988971] INFO: rcu_sched self-detected stall on CPU [52006.988978] 3-...: (5249 ticks this GP) idle=765/141/0 softirq=931903/931903 fqs=5247 [52006.988980] (t=5250 jiffies g=821290 c=821289 q=1968) [52006.988982] Task dump for CPU 3: [52006.988985] CompositorTileW R running task0 22999 1425 0x0108 [52006.988988] 81851580 81148a21 88011e397540 81851580 [52006.988991] 880117daa180 810c5e19 00983e0c [52006.988993] 810cfd7e 0092 0092 003b9aca [52006.988995] Call Trace: [52006.988996][] ? rcu_dump_cpu_stacks+0x71/0x8a [52006.989004] [] ? rcu_check_callbacks+0x6e9/0x790 [52006.989007] [] ? timekeeping_update+0xee/0x150 [52006.989009] [] ? tick_sched_handle.isra.14+0x50/0x50 [52006.989011] [] ? update_process_times+0x32/0x60 [52006.989013] [] ? tick_sched_handle.isra.14+0x20/0x50 [52006.989014] [] ? tick_sched_timer+0x38/0x70 [52006.989016] [] ? __hrtimer_run_queues+0xec/0x230 [52006.989017] [] ? hrtimer_interrupt+0x9a/0x1a0 [52006.989020] [] ? smp_apic_timer_interrupt+0x39/0x50 [52006.989022] [] ? apic_timer_interrupt+0x82/0x90 [52006.989023][] ? delay_tsc+0x25/0x50 [52006.989028] [] ? do_raw_spin_lock+0x86/0x150 [52006.989031] [] ? handle_mm_fault+0x4b5/0x1790 [52006.989033] [] ?
[REGRESSION] mm: filemap_map_pages NULL pointer dereference
all, On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have experienced several system hangs. I usually notice it first when my browser (Chrome) stops responding but then other programs will stop responding as well. The only fix is a reboot. It is sporadic but it will usually occur once a day. In the logs there will be a unable to handle kernel NULL pointer dereference message related to filemap_map_pages+0x10d/0x290 (below). ... [51985.993033] BUG: unable to handle kernel NULL pointer dereference at 0008 [51985.993087] IP: [] filemap_map_pages+0x10d/0x290 [51985.993123] PGD 2c772067 PUD 0 [51985.993144] Oops: [#1] SMP [51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4 [51985.993591] ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore usb_common thermal [51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted 4.5.0-rc2-next-20160203+ #11 [51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) 09/11/2012 [51985.993760] task: 88004bb04dc0 ti: 88002a2f8000 task.ti: 88002a2f8000 [51985.993804] RIP: 0010:[] [] filemap_map_pages+0x10d/0x290 [51985.993845] RSP: :88002a2fbdf8 EFLAGS: 00010202 [51985.993874] RAX: 0007fff8 RBX: 0001 RCX: 0003 [51985.993911] RDX: RSI: ea5bdd1c RDI: ea5bdd00 [51985.993948] RBP: 8800beff4220 R08: 007f R09: [51985.993985] R10: R11: 8800a39382b8 R12: 8801182b9440 [51985.994023] R13: 88002a2fbe90 R14: 8800be568d80 R15: 0008 [51985.994061] FS: 7f3e20276a40() GS:88011e30() knlGS: [51985.994103] CS: 0010 DS: ES: CR0: 80050033 [51985.994134] CR2: 0008 CR3: be6eb000 CR4: 001406e0 [51985.994172] Stack: [51985.994184] 8800beff4228 7f3e0ae63000 0001 [51985.994229] 0001 7f3e0ae63000 8800be568d80 0054 [51985.994273] 88000318 88003584c318 8800840076c0 8117b073 [51985.994318] Call Trace: [51985.994336] [] ? handle_mm_fault+0x13b3/0x1790 [51985.994370] [] ? up_write+0x21/0x30 [51985.994400] [] ? __do_page_fault+0x192/0x410 [51985.994434] [] ? page_fault+0x28/0x30 [51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d [51985.994656] RIP [] filemap_map_pages+0x10d/0x290 [51985.994692] RSP [51985.994711] CR2: 0008 [51986.002154] ---[ end trace da60309b42c1da53 ]--- [52006.988971] INFO: rcu_sched self-detected stall on CPU [52006.988978] 3-...: (5249 ticks this GP) idle=765/141/0 softirq=931903/931903 fqs=5247 [52006.988980] (t=5250 jiffies g=821290 c=821289 q=1968) [52006.988982] Task dump for CPU 3: [52006.988985] CompositorTileW R running task0 22999 1425 0x0108 [52006.988988] 81851580 81148a21 88011e397540 81851580 [52006.988991] 880117daa180 810c5e19 00983e0c [52006.988993] 810cfd7e 0092 0092 003b9aca [52006.988995] Call Trace: [52006.988996][] ? rcu_dump_cpu_stacks+0x71/0x8a [52006.989004] [] ? rcu_check_callbacks+0x6e9/0x790 [52006.989007] [] ? timekeeping_update+0xee/0x150 [52006.989009] [] ? tick_sched_handle.isra.14+0x50/0x50 [52006.989011] [] ? update_process_times+0x32/0x60 [52006.989013] [] ? tick_sched_handle.isra.14+0x20/0x50 [52006.989014] [] ? tick_sched_timer+0x38/0x70 [52006.989016] [] ? __hrtimer_run_queues+0xec/0x230 [52006.989017] [] ? hrtimer_interrupt+0x9a/0x1a0 [52006.989020] [] ? smp_apic_timer_interrupt+0x39/0x50 [52006.989022] [] ? apic_timer_interrupt+0x82/0x90 [52006.989023][] ? delay_tsc+0x25/0x50 [52006.989028] [] ? do_raw_spin_lock+0x86/0x150 [52006.989031] [] ? handle_mm_fault+0x4b5/0x1790 [52006.989033] [] ?
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahlerwrote: > all, > > On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have > experienced several system hangs. I usually notice it first when > my browser (Chrome) stops responding but then other programs will stop > responding as well. The only fix is a reboot. It is sporadic but it > will usually occur once a day. > > In the logs there will be a > > unable to handle kernel NULL pointer dereference > > message related to filemap_map_pages+0x10d/0x290 (below). > > > ... > [51985.993033] BUG: unable to handle kernel NULL pointer dereference at > 0008 > [51985.993087] IP: [] filemap_map_pages+0x10d/0x290 > [51985.993123] PGD 2c772067 PUD 0 > [51985.993144] Oops: [#1] SMP > [51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats > cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 > x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul > crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support > jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel > snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic > i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm > psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 > snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core > wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button > intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4 > [51985.993591] ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci > sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore > usb_common thermal > [51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted > 4.5.0-rc2-next-20160203+ #11 > [51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) > 09/11/2012 > [51985.993760] task: 88004bb04dc0 ti: 88002a2f8000 task.ti: > 88002a2f8000 > [51985.993804] RIP: 0010:[] [] > filemap_map_pages+0x10d/0x290 > [51985.993845] RSP: :88002a2fbdf8 EFLAGS: 00010202 > [51985.993874] RAX: 0007fff8 RBX: 0001 RCX: > 0003 > [51985.993911] RDX: RSI: ea5bdd1c RDI: > ea5bdd00 > [51985.993948] RBP: 8800beff4220 R08: 007f R09: > > [51985.993985] R10: R11: 8800a39382b8 R12: > 8801182b9440 > [51985.994023] R13: 88002a2fbe90 R14: 8800be568d80 R15: > 0008 > [51985.994061] FS: 7f3e20276a40() GS:88011e30() > knlGS: > [51985.994103] CS: 0010 DS: ES: CR0: 80050033 > [51985.994134] CR2: 0008 CR3: be6eb000 CR4: > 001406e0 > [51985.994172] Stack: > [51985.994184] 8800beff4228 7f3e0ae63000 0001 > > [51985.994229] 0001 7f3e0ae63000 8800be568d80 > 0054 > [51985.994273] 88000318 88003584c318 8800840076c0 > 8117b073 > [51985.994318] Call Trace: > [51985.994336] [] ? handle_mm_fault+0x13b3/0x1790 > [51985.994370] [] ? up_write+0x21/0x30 > [51985.994400] [] ? __do_page_fault+0x192/0x410 > [51985.994434] [] ? page_fault+0x28/0x30 > [51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 > 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> > 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d > [51985.994656] RIP [] filemap_map_pages+0x10d/0x290 > [51985.994692] RSP > [51985.994711] CR2: 0008 > > ... > > Referring again to the RIP line from the trace. > > [51985.994656] RIP [] filemap_map_pages+0x10d/0x290 > > jeri@hudson:~/linux-next$ gdb vmlinux > (gdb) list *0x8114a19d > 0x8114a19d is in filemap_map_pages > (include/linux/radix-tree.h:465). > 460 unsigned size = radix_tree_chunk_size(iter) - 1; > 461 > 462 while (size--) { > 463 slot++; > 464 iter->index++; > 465 if (likely(*slot)) > 466 return slot; > 467 if (flags & RADIX_TREE_ITER_CONTIG) { > 468 /* forbid switching to the next chunk */ > 469 iter->next_index = 0; > (gdb) > > Assuming I traced the addresses correctly, this indicates that the > fault is triggered when the value in the slot pointer is accessed. > Perhaps slot is being incremented beyond its valid range? That's super helpful, thanks. The faulting address was 0x0008, so radix_tree_next_slot() was called with slot==NULL. And looking at it, I don't see how this code can
Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahlerwrote: > On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have > experienced several system hangs. I usually notice it first when > my browser (Chrome) stops responding but then other programs will stop > responding as well. The only fix is a reboot. It is sporadic but it > will usually occur once a day. > > In the logs there will be a > > unable to handle kernel NULL pointer dereference This should fix it up. From: Konstantin Khlebnikov Subject: radix-tree: fix oops after radix_tree_iter_retry Helper radix_tree_iter_retry() resets next_index to the current index. In following radix_tree_next_slot current chunk size becomes zero. This isn't checked and it tries to dereference null pointer in slot. Tagged iterator is fine because retry happens only at slot 0 where tag bitmask in iter->tags is filled with single bit. Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Signed-off-by: Konstantin Khlebnikov Cc: Matthew Wilcox Cc: Hugh Dickins Cc: Ohad Ben-Cohen Cc: Jeremiah Mahler Cc: Signed-off-by: Andrew Morton --- include/linux/radix-tree.h |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff -puN include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry include/linux/radix-tree.h --- a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry +++ a/include/linux/radix-tree.h @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi * @iter: pointer to radix tree iterator * Returns:current chunk size */ -static __always_inline unsigned +static __always_inline long radix_tree_chunk_size(struct radix_tree_iter *iter) { return iter->next_index - iter->index; @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct return slot + offset + 1; } } else { - unsigned size = radix_tree_chunk_size(iter) - 1; + long size = radix_tree_chunk_size(iter); - while (size--) { + while (--size > 0) { slot++; iter->index++; if (likely(*slot)) _