Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-07 Thread Jeremiah Mahler
Konstantin, Andrew,

On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote:
> On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler  wrote:
> 
[...]
> 
> This should fix it up.
> 
> From: Konstantin Khlebnikov 
> Subject: radix-tree: fix oops after radix_tree_iter_retry
> 
> Helper radix_tree_iter_retry() resets next_index to the current index.  In
> following radix_tree_next_slot current chunk size becomes zero.  This
> isn't checked and it tries to dereference null pointer in slot.
> 
> Tagged iterator is fine because retry happens only at slot 0 where tag
> bitmask in iter->tags is filled with single bit.
> 
> Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
> Signed-off-by: Konstantin Khlebnikov 
> Cc: Matthew Wilcox 
> Cc: Hugh Dickins 
> Cc: Ohad Ben-Cohen 
> Cc: Jeremiah Mahler 
> Cc: 
> Signed-off-by: Andrew Morton 
> ---
> 
>  include/linux/radix-tree.h |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff -puN 
> include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry 
> include/linux/radix-tree.h
> --- 
> a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry
> +++ a/include/linux/radix-tree.h
> @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi
>   * @iter:pointer to radix tree iterator
>   * Returns:  current chunk size
>   */
> -static __always_inline unsigned
> +static __always_inline long
>  radix_tree_chunk_size(struct radix_tree_iter *iter)
>  {
>   return iter->next_index - iter->index;
> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct
>   return slot + offset + 1;
>   }
>   } else {
> - unsigned size = radix_tree_chunk_size(iter) - 1;
> + long size = radix_tree_chunk_size(iter);
>  
> - while (size--) {
> + while (--size > 0) {
>   slot++;
>   iter->index++;
>   if (likely(*slot))
> _
> 

Fix is still working great after a couple days.

Tested-by: Jeremiah Mahler 

-- 
- Jeremiah Mahler


Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-07 Thread Jeremiah Mahler
Konstantin,

On Sun, Feb 07, 2016 at 11:27:53AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Feb 6, 2016 at 9:18 PM, Jeremiah Mahler  wrote:
[...]
> >> -static __always_inline unsigned
> >> +static __always_inline long
> >>  radix_tree_chunk_size(struct radix_tree_iter *iter)
> >>  {
> >>   return iter->next_index - iter->index;
> >> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct
> >>   return slot + offset + 1;
> >>   }
> >>   } else {
> >> - unsigned size = radix_tree_chunk_size(iter) - 1;
> >> + long size = radix_tree_chunk_size(iter);
> >>
> >> - while (size--) {
> >> + while (--size > 0) {
> >>   slot++;
> >>   iter->index++;
> >>   if (likely(*slot))
> >> _
> >>
> >
> > I have applied this patch to my kernel and so far the bug has not
> > come back.  Thanks for the quick fix.
> >
> > Although I don't quite understand how this fixes the slot==NULL problem.
> > Unless I am missing something, it looks like the while loop will be
> > executed the same number of times but the size variable will no
> > longer go negative as it did before.
> 
> That's simple. Slot is dereferenced after checking the remaining size.
> The old version checked only for != 0. After iter-retry the size is zero, and
> after "- 1" it wraps around into the positive range. In the new version it's
> signed and checked for > 0.
> 

OK, I get it now.  Thanks for the explanation.

-- 
- Jeremiah Mahler


Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-07 Thread Konstantin Khlebnikov
On Sat, Feb 6, 2016 at 9:18 PM, Jeremiah Mahler  wrote:
> Andrew,
>
> On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote:
>> On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler  wrote:
>>
> [...]
>> >   unable to handle kernel NULL pointer dereference
>>
>> This should fix it up.
>>
> [...]
>>
>>  include/linux/radix-tree.h |6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff -puN 
>> include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry 
>> include/linux/radix-tree.h
>> --- 
>> a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry
>> +++ a/include/linux/radix-tree.h
>> @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi
>>   * @iter:pointer to radix tree iterator
>>   * Returns:  current chunk size
>>   */
>> -static __always_inline unsigned
>> +static __always_inline long
>>  radix_tree_chunk_size(struct radix_tree_iter *iter)
>>  {
>>   return iter->next_index - iter->index;
>> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct
>>   return slot + offset + 1;
>>   }
>>   } else {
>> - unsigned size = radix_tree_chunk_size(iter) - 1;
>> + long size = radix_tree_chunk_size(iter);
>>
>> - while (size--) {
>> + while (--size > 0) {
>>   slot++;
>>   iter->index++;
>>   if (likely(*slot))
>> _
>>
>
> I have applied this patch to my kernel and so far the bug has not
> come back.  Thanks for the quick fix.
>
> Although I don't quite understand how this fixes the slot==NULL problem.
> Unless I am missing something, it looks like the while loop will be
> executed the same number of times but the size variable will no
> longer go negative as it did before.

That's simple. Slot is dereferenced after checking the remaining size.
The old version checked only for != 0. After iter-retry the size is zero, and
after "- 1" it wraps around into the positive range. In the new version it's
signed and checked for > 0.

>
> --
> - Jeremiah Mahler
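
The difference between the two loop conditions can be checked outside the
kernel. The following is a minimal user-space sketch, not the kernel code:
the helper names old_loop_iterations() and new_loop_iterations() and the cap
parameter are made up for illustration, and the cap exists only so that the
wrapped case terminates.

```c
#include <limits.h>

/*
 * Old logic: "unsigned size = chunk_size - 1; while (size--)".
 * Returns how many times the loop body would run, stopping at cap.
 */
static unsigned long old_loop_iterations(unsigned chunk_size, unsigned long cap)
{
	unsigned size = chunk_size - 1;	/* wraps to UINT_MAX when chunk_size is 0 */
	unsigned long n = 0;

	while (size--) {
		if (++n == cap)
			break;		/* illustration only: stop the runaway loop */
	}
	return n;
}

/*
 * New logic: "long size = chunk_size; while (--size > 0)".
 * Returns how many times the loop body would run, stopping at cap.
 */
static unsigned long new_loop_iterations(long chunk_size, unsigned long cap)
{
	long size = chunk_size;
	unsigned long n = 0;

	while (--size > 0) {
		if (++n == cap)
			break;
	}
	return n;
}
```

With a chunk size of 4 both versions run the body 3 times, which matches the
observation that the loop count is unchanged in the normal case. With a chunk
size of 0 the old version keeps going until the cap, while the new one does
not run at all.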


Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-06 Thread Jeremiah Mahler
Andrew,

On Fri, Feb 05, 2016 at 02:19:40PM -0800, Andrew Morton wrote:
> On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler  wrote:
> 
[...]
> >   unable to handle kernel NULL pointer dereference
> 
> This should fix it up.
> 
[...]
> 
>  include/linux/radix-tree.h |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff -puN 
> include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry 
> include/linux/radix-tree.h
> --- 
> a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry
> +++ a/include/linux/radix-tree.h
> @@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi
>   * @iter:pointer to radix tree iterator
>   * Returns:  current chunk size
>   */
> -static __always_inline unsigned
> +static __always_inline long
>  radix_tree_chunk_size(struct radix_tree_iter *iter)
>  {
>   return iter->next_index - iter->index;
> @@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct
>   return slot + offset + 1;
>   }
>   } else {
> - unsigned size = radix_tree_chunk_size(iter) - 1;
> + long size = radix_tree_chunk_size(iter);
>  
> - while (size--) {
> + while (--size > 0) {
>   slot++;
>   iter->index++;
>   if (likely(*slot))
> _
> 

I have applied this patch to my kernel and so far the bug has not
come back.  Thanks for the quick fix.

Although I don't quite understand how this fixes the slot==NULL problem.
Unless I am missing something, it looks like the while loop will be
executed the same number of times but the size variable will no
longer go negative as it did before.

-- 
- Jeremiah Mahler


Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-05 Thread Andrew Morton
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler  wrote:

> On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have
> experienced several system hangs.  I usually notice it first when
> my browser (Chrome) stops responding but then other programs will stop
> responding as well.  The only fix is a reboot.  It is sporadic but it
> will usually occur once a day.
> 
> In the logs there will be a
> 
>   unable to handle kernel NULL pointer dereference

This should fix it up.

From: Konstantin Khlebnikov 
Subject: radix-tree: fix oops after radix_tree_iter_retry

Helper radix_tree_iter_retry() resets next_index to the current index.  In
following radix_tree_next_slot current chunk size becomes zero.  This
isn't checked and it tries to dereference null pointer in slot.

Tagged iterator is fine because retry happens only at slot 0 where tag
bitmask in iter->tags is filled with single bit.

Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Signed-off-by: Konstantin Khlebnikov 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: Ohad Ben-Cohen 
Cc: Jeremiah Mahler 
Cc: 
Signed-off-by: Andrew Morton 
---

 include/linux/radix-tree.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff -puN include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry include/linux/radix-tree.h
--- a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry
+++ a/include/linux/radix-tree.h
@@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi
  * @iter:	pointer to radix tree iterator
  * Returns:	current chunk size
  */
-static __always_inline unsigned
+static __always_inline long
 radix_tree_chunk_size(struct radix_tree_iter *iter)
 {
 	return iter->next_index - iter->index;
@@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct
 			return slot + offset + 1;
 		}
 	} else {
-		unsigned size = radix_tree_chunk_size(iter) - 1;
+		long size = radix_tree_chunk_size(iter);
 
-		while (size--) {
+		while (--size > 0) {
 			slot++;
 			iter->index++;
 			if (likely(*slot))
_
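
For readers who want to see the sequence from the changelog in isolation,
here is a rough user-space sketch. It is not the kernel implementation:
struct demo_iter and the demo_*() helpers are hypothetical stand-ins that
keep only the two fields the changelog mentions, and the retry helper is
assumed to hand back a NULL slot, which is consistent with the slot==NULL
case discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical, simplified iterator with only the fields used here. */
struct demo_iter {
	unsigned long index;		/* index of the current slot */
	unsigned long next_index;	/* one past the last index of the chunk */
};

/* Retry as described in the changelog: next_index is reset to index. */
static void **demo_iter_retry(struct demo_iter *iter)
{
	iter->next_index = iter->index;
	return NULL;			/* assumption: no slot is handed back */
}

/* Signed chunk size, as in the patched helper. */
static long demo_chunk_size(const struct demo_iter *iter)
{
	return (long)(iter->next_index - iter->index);
}

/* Patched-style walk: slot is only touched while --size > 0 holds. */
static void **demo_next_slot(void **slot, struct demo_iter *iter)
{
	long size = demo_chunk_size(iter);

	while (--size > 0) {
		slot++;
		iter->index++;
		if (*slot)
			return slot;
	}
	return NULL;			/* end of chunk, nothing dereferenced */
}
```

After demo_iter_retry() the chunk size is 0, so demo_next_slot() returns
without ever incrementing or reading the NULL slot. With the old unsigned
"size - 1" computation the same call would have entered the loop.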



Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-05 Thread Andrew Morton
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler  wrote:

> all,
> 
> On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have
> experienced several system hangs.  I usually notice it first when
> my browser (Chrome) stops responding but then other programs will stop
> responding as well.  The only fix is a reboot.  It is sporadic but it
> will usually occur once a day.
> 
> In the logs there will be a
> 
>   unable to handle kernel NULL pointer dereference
> 
> message related to filemap_map_pages+0x10d/0x290 (below).
> 
> 
> ...
> [51985.993033] BUG: unable to handle kernel NULL pointer dereference at 
> 0008
> [51985.993087] IP: [] filemap_map_pages+0x10d/0x290
> [51985.993123] PGD 2c772067 PUD 0 
> [51985.993144] Oops:  [#1] SMP 
> [51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats 
> cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 
> x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul 
> crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support 
> jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel 
> snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic 
> i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm 
> psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 
> snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core 
> wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button 
> intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4
> [51985.993591]  ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci 
> sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore 
> usb_common thermal
> [51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted 
> 4.5.0-rc2-next-20160203+ #11
> [51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) 
> 09/11/2012
> [51985.993760] task: 88004bb04dc0 ti: 88002a2f8000 task.ti: 
> 88002a2f8000
> [51985.993804] RIP: 0010:[]  [] 
> filemap_map_pages+0x10d/0x290
> [51985.993845] RSP: :88002a2fbdf8  EFLAGS: 00010202
> [51985.993874] RAX: 0007fff8 RBX: 0001 RCX: 
> 0003
> [51985.993911] RDX:  RSI: ea5bdd1c RDI: 
> ea5bdd00
> [51985.993948] RBP: 8800beff4220 R08: 007f R09: 
> 
> [51985.993985] R10:  R11: 8800a39382b8 R12: 
> 8801182b9440
> [51985.994023] R13: 88002a2fbe90 R14: 8800be568d80 R15: 
> 0008
> [51985.994061] FS:  7f3e20276a40() GS:88011e30() 
> knlGS:
> [51985.994103] CS:  0010 DS:  ES:  CR0: 80050033
> [51985.994134] CR2: 0008 CR3: be6eb000 CR4: 
> 001406e0
> [51985.994172] Stack:
> [51985.994184]  8800beff4228 7f3e0ae63000 0001 
> 
> [51985.994229]  0001 7f3e0ae63000 8800be568d80 
> 0054
> [51985.994273]  88000318 88003584c318 8800840076c0 
> 8117b073
> [51985.994318] Call Trace:
> [51985.994336]  [] ? handle_mm_fault+0x13b3/0x1790
> [51985.994370]  [] ? up_write+0x21/0x30
> [51985.994400]  [] ? __do_page_fault+0x192/0x410
> [51985.994434]  [] ? page_fault+0x28/0x30
> [51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 
> 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> 
> 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d 
> [51985.994656] RIP  [] filemap_map_pages+0x10d/0x290
> [51985.994692]  RSP 
> [51985.994711] CR2: 0008
> 
> ...
>
> Referring again to the RIP line from the trace.
> 
> [51985.994656] RIP  [] filemap_map_pages+0x10d/0x290
> 
> jeri@hudson:~/linux-next$ gdb vmlinux
> (gdb) list *0x8114a19d
> 0x8114a19d is in filemap_map_pages
> (include/linux/radix-tree.h:465).
> 460   unsigned size = radix_tree_chunk_size(iter) - 1;
> 461   
> 462   while (size--) {
> 463   slot++;
> 464   iter->index++;
> 465   if (likely(*slot))
> 466   return slot;
> 467   if (flags & RADIX_TREE_ITER_CONTIG) {
> 468   /* forbid switching to the next chunk */
> 469   iter->next_index = 0;
> (gdb)
> 
> Assuming I traced the addresses correctly, this indicates that the
> fault is triggered when the value in the slot pointer is accessed.
> Perhaps slot is being incremented beyond its valid range?

That's super helpful, thanks.

The faulting address was 0x0008, so radix_tree_next_slot()
was called with slot==NULL.
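
One way to read the faulting address: the loop does slot++ before *slot, so
a NULL slot is advanced by one pointer-sized element before it is read. A
small sketch of that arithmetic, done on integers so that no pointer
arithmetic on NULL is involved (the helper name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Address that "slot++; *slot" would read, given the numeric value of the
 * slot pointer before the increment. slot has type void **, so one
 * increment moves it forward by sizeof(void *) bytes.
 */
static uintptr_t address_after_slot_increment(uintptr_t slot_addr)
{
	return slot_addr + sizeof(void *);
}
```

On a 64-bit build sizeof(void *) is 8, so a starting value of 0 gives 8, in
line with the CR2 value in the oops.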

And looking at it, I don't see how this code can work at all:


[REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-05 Thread Jeremiah Mahler
all,

On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have
experienced several system hangs.  I usually notice it first when
my browser (Chrome) stops responding but then other programs will stop
responding as well.  The only fix is a reboot.  It is sporadic but it
will usually occur once a day.

In the logs there will be a

  unable to handle kernel NULL pointer dereference

message related to filemap_map_pages+0x10d/0x290 (below).


...
[51985.993033] BUG: unable to handle kernel NULL pointer dereference at 
0008
[51985.993087] IP: [] filemap_map_pages+0x10d/0x290
[51985.993123] PGD 2c772067 PUD 0 
[51985.993144] Oops:  [#1] SMP 
[51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats 
cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul 
crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support jitterentropy_rng 
sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel snd_hda_codec_realtek 
aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic i2c_algo_bit lrw 
drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm psmouse 
snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 snd_hda_core 
sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core wmi 
thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button 
intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4
[51985.993591]  ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci 
sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore usb_common 
thermal
[51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted 
4.5.0-rc2-next-20160203+ #11
[51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) 
09/11/2012
[51985.993760] task: 88004bb04dc0 ti: 88002a2f8000 task.ti: 
88002a2f8000
[51985.993804] RIP: 0010:[]  [] 
filemap_map_pages+0x10d/0x290
[51985.993845] RSP: :88002a2fbdf8  EFLAGS: 00010202
[51985.993874] RAX: 0007fff8 RBX: 0001 RCX: 0003
[51985.993911] RDX:  RSI: ea5bdd1c RDI: ea5bdd00
[51985.993948] RBP: 8800beff4220 R08: 007f R09: 
[51985.993985] R10:  R11: 8800a39382b8 R12: 8801182b9440
[51985.994023] R13: 88002a2fbe90 R14: 8800be568d80 R15: 0008
[51985.994061] FS:  7f3e20276a40() GS:88011e30() 
knlGS:
[51985.994103] CS:  0010 DS:  ES:  CR0: 80050033
[51985.994134] CR2: 0008 CR3: be6eb000 CR4: 001406e0
[51985.994172] Stack:
[51985.994184]  8800beff4228 7f3e0ae63000 0001 

[51985.994229]  0001 7f3e0ae63000 8800be568d80 
0054
[51985.994273]  88000318 88003584c318 8800840076c0 
8117b073
[51985.994318] Call Trace:
[51985.994336]  [] ? handle_mm_fault+0x13b3/0x1790
[51985.994370]  [] ? up_write+0x21/0x30
[51985.994400]  [] ? __do_page_fault+0x192/0x410
[51985.994434]  [] ? page_fault+0x28/0x30
[51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 
83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> 83 
3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d 
[51985.994656] RIP  [] filemap_map_pages+0x10d/0x290
[51985.994692]  RSP 
[51985.994711] CR2: 0008
[51986.002154] ---[ end trace da60309b42c1da53 ]---
[52006.988971] INFO: rcu_sched self-detected stall on CPU
[52006.988978]  3-...: (5249 ticks this GP) idle=765/141/0 
softirq=931903/931903 fqs=5247 
[52006.988980]   (t=5250 jiffies g=821290 c=821289 q=1968)
[52006.988982] Task dump for CPU 3:
[52006.988985] CompositorTileW R  running task0 22999   1425 0x0108
[52006.988988]  81851580 81148a21 88011e397540 
81851580
[52006.988991]   880117daa180 810c5e19 
00983e0c
[52006.988993]  810cfd7e 0092 0092 
003b9aca
[52006.988995] Call Trace:
[52006.988996][] ? rcu_dump_cpu_stacks+0x71/0x8a
[52006.989004]  [] ? rcu_check_callbacks+0x6e9/0x790
[52006.989007]  [] ? timekeeping_update+0xee/0x150
[52006.989009]  [] ? tick_sched_handle.isra.14+0x50/0x50
[52006.989011]  [] ? update_process_times+0x32/0x60
[52006.989013]  [] ? tick_sched_handle.isra.14+0x20/0x50
[52006.989014]  [] ? tick_sched_timer+0x38/0x70
[52006.989016]  [] ? __hrtimer_run_queues+0xec/0x230
[52006.989017]  [] ? hrtimer_interrupt+0x9a/0x1a0
[52006.989020]  [] ? smp_apic_timer_interrupt+0x39/0x50
[52006.989022]  [] ? apic_timer_interrupt+0x82/0x90
[52006.989023][] ? delay_tsc+0x25/0x50
[52006.989028]  [] ? do_raw_spin_lock+0x86/0x150
[52006.989031]  [] ? handle_mm_fault+0x4b5/0x1790
[52006.989033]  [] ? 

Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-05 Thread Andrew Morton
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler  wrote:

> all,
> 
> On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have
> experienced several system hangs.  I usually notice it first when
> my browser (Chrome) stops responding but then other programs will stop
> responding as well.  The only fix is a reboot.  It is sporadic but it
> will usually occur once a day.
> 
> In the logs there will be a
> 
>   unable to handle kernel NULL pointer dereference
> 
> message related to filemap_map_pages+0x10d/0x290 (below).
> 
> 
> ...
> [51985.993033] BUG: unable to handle kernel NULL pointer dereference at 
> 0008
> [51985.993087] IP: [] filemap_map_pages+0x10d/0x290
> [51985.993123] PGD 2c772067 PUD 0 
> [51985.993144] Oops:  [#1] SMP 
> [51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats 
> cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 
> x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul 
> crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support 
> jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel 
> snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic 
> i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm 
> psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 
> snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core 
> wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button 
> intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4
> [51985.993591]  ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci 
> sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore 
> usb_common thermal
> [51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted 
> 4.5.0-rc2-next-20160203+ #11
> [51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) 
> 09/11/2012
> [51985.993760] task: 88004bb04dc0 ti: 88002a2f8000 task.ti: 
> 88002a2f8000
> [51985.993804] RIP: 0010:[]  [] 
> filemap_map_pages+0x10d/0x290
> [51985.993845] RSP: :88002a2fbdf8  EFLAGS: 00010202
> [51985.993874] RAX: 0007fff8 RBX: 0001 RCX: 
> 0003
> [51985.993911] RDX:  RSI: ea5bdd1c RDI: 
> ea5bdd00
> [51985.993948] RBP: 8800beff4220 R08: 007f R09: 
> 
> [51985.993985] R10:  R11: 8800a39382b8 R12: 
> 8801182b9440
> [51985.994023] R13: 88002a2fbe90 R14: 8800be568d80 R15: 
> 0008
> [51985.994061] FS:  7f3e20276a40() GS:88011e30() 
> knlGS:
> [51985.994103] CS:  0010 DS:  ES:  CR0: 80050033
> [51985.994134] CR2: 0008 CR3: be6eb000 CR4: 
> 001406e0
> [51985.994172] Stack:
> [51985.994184]  8800beff4228 7f3e0ae63000 0001 
> 
> [51985.994229]  0001 7f3e0ae63000 8800be568d80 
> 0054
> [51985.994273]  88000318 88003584c318 8800840076c0 
> 8117b073
> [51985.994318] Call Trace:
> [51985.994336]  [] ? handle_mm_fault+0x13b3/0x1790
> [51985.994370]  [] ? up_write+0x21/0x30
> [51985.994400]  [] ? __do_page_fault+0x192/0x410
> [51985.994434]  [] ? page_fault+0x28/0x30
> [51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 
> 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> 
> 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d 
> [51985.994656] RIP  [] filemap_map_pages+0x10d/0x290
> [51985.994692]  RSP 
> [51985.994711] CR2: 0008
> 
> ...
>
> Referring again to the RIP line from the trace.
> 
> [51985.994656] RIP  [] filemap_map_pages+0x10d/0x290
> 
> jeri@hudson:~/linux-next$ gdb vmlinux
> (gdb) list *0x8114a19d
> 0x8114a19d is in filemap_map_pages
> (include/linux/radix-tree.h:465).
> 460			unsigned size = radix_tree_chunk_size(iter) - 1;
> 461
> 462			while (size--) {
> 463				slot++;
> 464				iter->index++;
> 465				if (likely(*slot))
> 466					return slot;
> 467				if (flags & RADIX_TREE_ITER_CONTIG) {
> 468					/* forbid switching to the next chunk */
> 469					iter->next_index = 0;
> (gdb)
> 
> Assuming I traced the addresses correctly, this indicates that the
> fault is triggered when the value in the slot pointer is accessed.
> Perhaps slot is being incremented beyond its valid range?

That's super helpful, thanks.

The faulting address was 0x0008, so radix_tree_next_slot()
was called with slot==NULL.

And looking at it, I don't see how this code can 

Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

2016-02-05 Thread Andrew Morton
On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler  wrote:

> On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have
> experienced several system hangs.  I usually notice it first when
> my browser (Chrome) stops responding but then other programs will stop
> responding as well.  The only fix is a reboot.  It is sporadic but it
> will usually occur once a day.
> 
> In the logs there will be a
> 
>   unable to handle kernel NULL pointer dereference

This should fix it up.

From: Konstantin Khlebnikov 
Subject: radix-tree: fix oops after radix_tree_iter_retry

The helper radix_tree_iter_retry() resets next_index to the current
index.  In the following call to radix_tree_next_slot() the current chunk
size therefore becomes zero.  This isn't checked, so the loop tries to
dereference a NULL slot pointer.

The tagged iterator is fine because retry happens only at slot 0, where
the tag bitmask in iter->tags is filled with a single bit.

Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Signed-off-by: Konstantin Khlebnikov 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: Ohad Ben-Cohen 
Cc: Jeremiah Mahler 
Cc: 
Signed-off-by: Andrew Morton 
---

 include/linux/radix-tree.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff -puN include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry include/linux/radix-tree.h
--- a/include/linux/radix-tree.h~radix-tree-fix-oops-after-radix_tree_iter_retry
+++ a/include/linux/radix-tree.h
@@ -400,7 +400,7 @@ void **radix_tree_iter_retry(struct radi
  * @iter:  pointer to radix tree iterator
  * Returns:current chunk size
  */
-static __always_inline unsigned
+static __always_inline long
 radix_tree_chunk_size(struct radix_tree_iter *iter)
 {
 	return iter->next_index - iter->index;
@@ -434,9 +434,9 @@ radix_tree_next_slot(void **slot, struct
 			return slot + offset + 1;
 		}
 	} else {
-		unsigned size = radix_tree_chunk_size(iter) - 1;
+		long size = radix_tree_chunk_size(iter);
 
-		while (size--) {
+		while (--size > 0) {
 			slot++;
 			iter->index++;
 			if (likely(*slot))
_