Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Willy Tarreau
On Mon, Apr 10, 2017 at 10:33:56PM +0800, zhong jiang wrote:
> On 2017/4/10 22:13, Willy Tarreau wrote:
> > On Mon, Apr 10, 2017 at 10:06:59PM +0800, zhong jiang wrote:
> >> On 2017/4/10 20:48, Michal Hocko wrote:
> >>> On Mon 10-04-17 20:10:06, zhong jiang wrote:
> >>>> On 2017/4/10 16:56, Mel Gorman wrote:
> >>>>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> >>>>>> when runing the stabile docker cases in the vm.   The following issue 
> >>>>>> will come up.
> >>>>>>
> >>>>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >>>>>> [exception RIP: down_read_trylock+5]
> >>>>>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> >>>>>> RAX:   RBX: 88018ae858c1  RCX: 
> >>>>>> RDX:   RSI:   RDI: 0008
> >>>>>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> >>>>>> R10: 22cb  R11:   R12: 88018ae858c0
> >>>>>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> >>>>>> ORIG_RAX:   CS: 0010  SS: 
> >>>>> Post the full report including the kernel version and state whether any
> >>>>> additional patches to 3.10 are applied.
> >>>>>
> >>>>  Hi, Mel
> >>>>
> >>>> Our kernel from RHEL 7.2, Addtional patches all from upstream -- 
> >>>> include Bugfix and CVE.
> >>> I believe you should contact Redhat for the support. This is a) old
> >>> kernel and b) with other patches which might or might not be relevant.
> >>   Ok, regardless of the kernel version, we just discuss the situation in 
> >> theory.  if commit
> >>   624483f3ea8  ("mm: rmap: fix use-after-free in __put_anon_vma")  is not 
> >> exist. the issue
> >>  will trigger . Any thought.
> > But this commit was backported into 3.10.43, so stable kernel users are 
> > safe.
> >
> > Regards,
> > Willy
> >
> > .
>   yes,  you are sure that the commit can fix the issue.

No, I have absolutely no opinion on either the commit or the bug. What
I'm saying is that any up-to-date 3.10 contains the commit you mentioned,
so if that's the fix, you just need to ensure your kernel is up to date,
that's all.

Willy



Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread zhong jiang
On 2017/4/10 22:13, Willy Tarreau wrote:
> On Mon, Apr 10, 2017 at 10:06:59PM +0800, zhong jiang wrote:
>> On 2017/4/10 20:48, Michal Hocko wrote:
>>> On Mon 10-04-17 20:10:06, zhong jiang wrote:
>>>> On 2017/4/10 16:56, Mel Gorman wrote:
>>>>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
>>>>>> when runing the stabile docker cases in the vm.   The following issue 
>>>>>> will come up.
>>>>>>
>>>>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>>>>>> [exception RIP: down_read_trylock+5]
>>>>>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
>>>>>> RAX:   RBX: 88018ae858c1  RCX: 
>>>>>> RDX:   RSI:   RDI: 0008
>>>>>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
>>>>>> R10: 22cb  R11:   R12: 88018ae858c0
>>>>>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
>>>>>> ORIG_RAX:   CS: 0010  SS: 
>>>>> Post the full report including the kernel version and state whether any
>>>>> additional patches to 3.10 are applied.
>>>>>
>>>>  Hi, Mel
>>>>
>>>> Our kernel from RHEL 7.2, Addtional patches all from upstream -- 
>>>> include Bugfix and CVE.
>>> I believe you should contact Redhat for the support. This is a) old
>>> kernel and b) with other patches which might or might not be relevant.
>>   Ok, regardless of the kernel version, we just discuss the situation in 
>> theory.  if commit
>>   624483f3ea8  ("mm: rmap: fix use-after-free in __put_anon_vma")  is not 
>> exist. the issue
>>  will trigger . Any thought.
> But this commit was backported into 3.10.43, so stable kernel users are safe.
>
> Regards,
> Willy
>
> .
Yes. Are you sure that the commit can fix the issue?





Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread zhong jiang
On 2017/4/10 22:06, Mel Gorman wrote:
> On Mon, Apr 10, 2017 at 08:10:06PM +0800, zhong jiang wrote:
>> On 2017/4/10 16:56, Mel Gorman wrote:
>>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
>>>> when runing the stabile docker cases in the vm.   The following issue will 
>>>> come up.
>>>>
>>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>>>> [exception RIP: down_read_trylock+5]
>>>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
>>>> RAX:   RBX: 88018ae858c1  RCX: 
>>>> RDX:   RSI:   RDI: 0008
>>>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
>>>> R10: 22cb  R11:   R12: 88018ae858c0
>>>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
>>>> ORIG_RAX:   CS: 0010  SS: 
>>> Post the full report including the kernel version and state whether any
>>> additional patches to 3.10 are applied.
>>>
>>  Hi, Mel
>>
>> Our kernel from RHEL 7.2, Addtional patches all from upstream -- 
>> include Bugfix and CVE.
>>
>> Commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") 
>> exclude in
>> the RHEL 7.2. it looks seems to the issue. but I don't know how it triggered.
>> or it is not the correct fix.  Any suggestion? Thanks
>>
> I'm afraid you'll need to bring it up with RHEL support as it contains
> a number of backported patches from them that cannot be meaningfully
> evaluated outside of RedHat and they may have additional questions on the
> patches applied on top.
>
 Thanks



Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Willy Tarreau
On Mon, Apr 10, 2017 at 10:06:59PM +0800, zhong jiang wrote:
> On 2017/4/10 20:48, Michal Hocko wrote:
> > On Mon 10-04-17 20:10:06, zhong jiang wrote:
> >> On 2017/4/10 16:56, Mel Gorman wrote:
> >>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> >>>> when runing the stabile docker cases in the vm.   The following issue 
> >>>> will come up.
> >>>>
> >>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >>>> [exception RIP: down_read_trylock+5]
> >>>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> >>>> RAX:   RBX: 88018ae858c1  RCX: 
> >>>> RDX:   RSI:   RDI: 0008
> >>>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> >>>> R10: 22cb  R11:   R12: 88018ae858c0
> >>>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> >>>> ORIG_RAX:   CS: 0010  SS: 
> >>> Post the full report including the kernel version and state whether any
> >>> additional patches to 3.10 are applied.
> >>>
> >>  Hi, Mel
> >>
> >> Our kernel from RHEL 7.2, Addtional patches all from upstream -- 
> >> include Bugfix and CVE.
> > I believe you should contact Redhat for the support. This is a) old
> > kernel and b) with other patches which might or might not be relevant.
>   Ok, regardless of the kernel version, we just discuss the situation in 
> theory.  if commit
>   624483f3ea8  ("mm: rmap: fix use-after-free in __put_anon_vma")  is not 
> exist. the issue
>  will trigger . Any thought.

But this commit was backported into 3.10.43, so stable kernel users are safe.

Regards,
Willy


Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread zhong jiang
On 2017/4/10 20:48, Michal Hocko wrote:
> On Mon 10-04-17 20:10:06, zhong jiang wrote:
>> On 2017/4/10 16:56, Mel Gorman wrote:
>>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
>>>> when runing the stabile docker cases in the vm.   The following issue will 
>>>> come up.
>>>>
>>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>>>> [exception RIP: down_read_trylock+5]
>>>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
>>>> RAX:   RBX: 88018ae858c1  RCX: 
>>>> RDX:   RSI:   RDI: 0008
>>>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
>>>> R10: 22cb  R11:   R12: 88018ae858c0
>>>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
>>>> ORIG_RAX:   CS: 0010  SS: 
>>> Post the full report including the kernel version and state whether any
>>> additional patches to 3.10 are applied.
>>>
>>  Hi, Mel
>>
>> Our kernel from RHEL 7.2, Addtional patches all from upstream -- 
>> include Bugfix and CVE.
> I believe you should contact Redhat for the support. This is a) old
> kernel and b) with other patches which might or might not be relevant.
OK. Regardless of the kernel version, let's just discuss the situation in
theory. If commit 624483f3ea8 ("mm: rmap: fix use-after-free in
__put_anon_vma") is not present, the issue would be triggered. Any thoughts?

Thanks
zhongjiang 



Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Mel Gorman
On Mon, Apr 10, 2017 at 08:10:06PM +0800, zhong jiang wrote:
> On 2017/4/10 16:56, Mel Gorman wrote:
> > On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> >> when runing the stabile docker cases in the vm.   The following issue will 
> >> come up.
> >>
> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >> [exception RIP: down_read_trylock+5]
> >> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> >> RAX:   RBX: 88018ae858c1  RCX: 
> >> RDX:   RSI:   RDI: 0008
> >> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> >> R10: 22cb  R11:   R12: 88018ae858c0
> >> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> >> ORIG_RAX:   CS: 0010  SS: 
> > Post the full report including the kernel version and state whether any
> > additional patches to 3.10 are applied.
> >
>  Hi, Mel
>
> Our kernel from RHEL 7.2, Addtional patches all from upstream -- 
> include Bugfix and CVE.
> 
> Commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") exclude 
> in
> the RHEL 7.2. it looks seems to the issue. but I don't know how it triggered.
> or it is not the correct fix.  Any suggestion? Thanks
> 

I'm afraid you'll need to bring it up with RHEL support as it contains
a number of backported patches from them that cannot be meaningfully
evaluated outside of RedHat and they may have additional questions on the
patches applied on top.

-- 
Mel Gorman
SUSE Labs


Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Michal Hocko
On Mon 10-04-17 20:10:06, zhong jiang wrote:
> On 2017/4/10 16:56, Mel Gorman wrote:
> > On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> >> when runing the stabile docker cases in the vm.   The following issue will 
> >> come up.
> >>
> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >> [exception RIP: down_read_trylock+5]
> >> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> >> RAX:   RBX: 88018ae858c1  RCX: 
> >> RDX:   RSI:   RDI: 0008
> >> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> >> R10: 22cb  R11:   R12: 88018ae858c0
> >> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> >> ORIG_RAX:   CS: 0010  SS: 
> > Post the full report including the kernel version and state whether any
> > additional patches to 3.10 are applied.
> >
>  Hi, Mel
>
> Our kernel from RHEL 7.2, Addtional patches all from upstream -- 
> include Bugfix and CVE.

I believe you should contact Red Hat for support. This is a) an old
kernel and b) one with other patches which might or might not be relevant.
-- 
Michal Hocko
SUSE Labs


Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread zhong jiang
On 2017/4/10 16:56, Mel Gorman wrote:
> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
>> when runing the stabile docker cases in the vm.   The following issue will 
>> come up.
>>
>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>> [exception RIP: down_read_trylock+5]
>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
>> RAX:   RBX: 88018ae858c1  RCX: 
>> RDX:   RSI:   RDI: 0008
>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
>> R10: 22cb  R11:   R12: 88018ae858c0
>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
>> ORIG_RAX:   CS: 0010  SS: 
> Post the full report including the kernel version and state whether any
> additional patches to 3.10 are applied.
>
 Hi, Mel
   
Our kernel is from RHEL 7.2. The additional patches all come from upstream
and include bug fixes and CVE fixes.

Commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") is not
included in RHEL 7.2. It looks related to this issue, but I don't know how the
issue is triggered, or whether the commit is the correct fix. Any suggestions?
Thanks


Part of the dmesg output follows.

[59982.162223] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59985.261635] device-mapper: ioctl: remove_all left 8 open device(s)
[59986.492174] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[59987.445606] device-mapper: ioctl: remove_all left 8 open device(s)
[59987.625887] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59988.174600] device-mapper: ioctl: remove_all left 8 open device(s)
[59988.345667] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[59990.951713] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59991.025185] device vethd295793 entered promiscuous mode
[59991.025253] IPv6: ADDRCONF(NETDEV_UP): vethd295793: link is not ready
[59991.860817] IPv6: ADDRCONF(NETDEV_CHANGE): vethd295793: link becomes ready
[59991.860836] docker0: port 4(vethd295793) entered forwarding state
[59991.860840] docker0: port 4(vethd295793) entered forwarding state
[59992.704027] docker0: port 4(vethd295793) entered disabled state
[59992.724049] EXT4-fs (dm-9): mounted filesystem with ordered data mode. Opts: (null)
[59993.098341] docker0: port 4(vethd295793) entered disabled state
[59993.102583] device vethd295793 left promiscuous mode
[59993.102605] docker0: port 4(vethd295793) entered disabled state
[59995.109048] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[59995.229390] docker0: port 2(veth2ad76e2) entered disabled state
[59995.523997] docker0: port 2(veth2ad76e2) entered disabled state
[59995.528183] device veth2ad76e2 left promiscuous mode
[59995.528202] docker0: port 2(veth2ad76e2) entered disabled state
[59995.975559] device-mapper: ioctl: remove_all left 8 open device(s)
[59996.084575] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59996.660641] device-mapper: ioctl: remove_all left 7 open device(s)
[59997.109018] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[59998.360101] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60001.721429] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[60001.771433] device vethcca3b6a entered promiscuous mode
[60001.771643] IPv6: ADDRCONF(NETDEV_UP): vethcca3b6a: link is not ready
[60002.872102] IPv6: ADDRCONF(NETDEV_CHANGE): vethcca3b6a: link becomes ready
[60002.872124] docker0: port 2(vethcca3b6a) entered forwarding state
[60002.872130] docker0: port 2(vethcca3b6a) entered forwarding state
[60005.041654] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60005.597179] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60013.731728] [/usr/bin/os_rotate_and_save_log.sh]space of output directory is larger than 500M bytes,delete the oldest tar file messages-20170321181104-129.tar.bz2
[60016.243601] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60016.669594] device-mapper: ioctl: remove_all left 9 open device(s)
[60016.930232] EXT4-fs (dm-9): mounted filesystem with ordered data mode. Opts: (null)
[60017.918511] docker0: port 2(vethcca3b6a) entered forwarding state
[60022.197574] device-mapper: ioctl: remove_all left 8 open device(s)
[60022.575774] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[60023.288744] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60024.282579] device-mapper: ioctl: remove_all left 8 open device(s)
[60024.505905] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[60024.934311] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60025.168626] EXT4-fs (dm-8): 

Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Hillf Danton
On April 10, 2017 5:54 PM Xishi Qiu wrote:
> On 2017/4/10 17:37, Hillf Danton wrote:
>
> > On April 10, 2017 4:57 PM Xishi Qiu wrote:
> >> On 2017/4/10 14:42, Hillf Danton wrote:
> >>
> >>> On April 08, 2017 9:40 PM zhong Jiang wrote:
> >>>>
> >>>> when runing the stabile docker cases in the vm.   The following issue 
> >>>> will come up.
> >>>>
> >>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >>>> [exception RIP: down_read_trylock+5]
> >>>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> >>>> RAX:   RBX: 88018ae858c1  RCX: 
> >>>> RDX:   RSI:   RDI: 0008
> >>>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> >>>> R10: 22cb  R11:   R12: 88018ae858c0
> >>>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> >>>> ORIG_RAX:   CS: 0010  SS: 
> >>>> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
> >>>> #42 [8801b57ffc18] page_referenced at 811b26a7
> >>>> #43 [8801b57ffc90] shrink_active_list at 8118d634
> >>>> #44 [8801b57ffd48] balance_pgdat at 8118f088
> >>>> #45 [8801b57ffe20] kswapd at 8118f633
> >>>> #46 [8801b57ffec8] kthread at 810a795f
> >>>> #47 [8801b57fff50] ret_from_fork at 81665398
> >>>> crash> struct page.mapping ea0006903dc0
> >>>>   mapping = 0x88018ae858c1
> >>>> crash> struct anon_vma 0x88018ae858c0
> >>>> struct anon_vma {
> >>>>   root = 0x0,
> >>>>   rwsem = {
> >>>>     count = 0,
> >>>>     wait_lock = {
> >>>>       raw_lock = {
> >>>>         {
> >>>>           head_tail = 1,
> >>>>           tickets = {
> >>>>             head = 1,
> >>>>             tail = 0
> >>>>           }
> >>>>         }
> >>>>       }
> >>>>     },
> >>>>     wait_list = {
> >>>>       next = 0x0,
> >>>>       prev = 0x0
> >>>>     }
> >>>>   },
> >>>>   refcount = {
> >>>>     counter = 0
> >>>>   },
> >>>>   rb_root = {
> >>>>     rb_node = 0x0
> >>>>   }
> >>>> }
> >>>>
> >>>> This maks me wonder,  the anon_vma do not come from slab structure.
> >>>> and the content is abnormal. IMO,  At least anon_vma->root will not NULL.
> >>>> The issue can be reproduced every other week.
> >>>>
> >>> Check please if commit
> >>> 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma")
> >>> is included in the 3.10 you are running.
> >>>
> >> We missed this patch in RHEL 7.2
> >> Could you please give more details for how it triggered?
> >
> > Sorry, I could not.
> > I guess it is UAF as described in the log of that commit.
> > And if it works for you, we know how.
> >
> > Hillf
> >
>
> __put_anon_vma            |   page_lock_anon_vma_read
>   anon_vma_free(root)     |
>                           | root_anon_vma = ACCESS_ONCE(anon_vma->root)
>                           | down_read_trylock(&root_anon_vma->rwsem)
>   anon_vma_free(anon_vma) |
> 
> I find anon_vma was created by SLAB_DESTROY_BY_RCU, so it will not merge
> by other slabs, and free_slab() will not free it during 
> page_lock_anon_vma_read(),
> because it holds rcu_read_lock(), right?
> 
Frankly, I don't know. I am not an rmap expert like you.
It is quite probable that I made a wrong guess; sorry again.

> If root_anon_vma was reuse by someone, why "crash> struct anon_vma"
> shows almost zero?
> 
thank you very much
Hillf



Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Xishi Qiu
On 2017/4/10 17:37, Hillf Danton wrote:

> On April 10, 2017 4:57 PM Xishi Qiu wrote: 
>> On 2017/4/10 14:42, Hillf Danton wrote:
>>
>>> On April 08, 2017 9:40 PM zhong Jiang wrote:

>>>> when runing the stabile docker cases in the vm.   The following issue will 
>>>> come up.
>>>>
>>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>>>> [exception RIP: down_read_trylock+5]
>>>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
>>>> RAX:   RBX: 88018ae858c1  RCX: 
>>>> RDX:   RSI:   RDI: 0008
>>>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
>>>> R10: 22cb  R11:   R12: 88018ae858c0
>>>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
>>>> ORIG_RAX:   CS: 0010  SS: 
>>>> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
>>>> #42 [8801b57ffc18] page_referenced at 811b26a7
>>>> #43 [8801b57ffc90] shrink_active_list at 8118d634
>>>> #44 [8801b57ffd48] balance_pgdat at 8118f088
>>>> #45 [8801b57ffe20] kswapd at 8118f633
>>>> #46 [8801b57ffec8] kthread at 810a795f
>>>> #47 [8801b57fff50] ret_from_fork at 81665398
>>>> crash> struct page.mapping ea0006903dc0
>>>>   mapping = 0x88018ae858c1
>>>> crash> struct anon_vma 0x88018ae858c0
>>>> struct anon_vma {
>>>>   root = 0x0,
>>>>   rwsem = {
>>>>     count = 0,
>>>>     wait_lock = {
>>>>       raw_lock = {
>>>>         {
>>>>           head_tail = 1,
>>>>           tickets = {
>>>>             head = 1,
>>>>             tail = 0
>>>>           }
>>>>         }
>>>>       }
>>>>     },
>>>>     wait_list = {
>>>>       next = 0x0,
>>>>       prev = 0x0
>>>>     }
>>>>   },
>>>>   refcount = {
>>>>     counter = 0
>>>>   },
>>>>   rb_root = {
>>>>     rb_node = 0x0
>>>>   }
>>>> }
>>>>
>>>> This maks me wonder,  the anon_vma do not come from slab structure.
>>>> and the content is abnormal. IMO,  At least anon_vma->root will not NULL.
>>>> The issue can be reproduced every other week.

>>> Check please if commit
>>> 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma")
>>> is included in the 3.10 you are running.
>>>
>> We missed this patch in RHEL 7.2
>> Could you please give more details for how it triggered?
> 
> Sorry, I could not. 
> I guess it is UAF as described in the log of that commit.
> And if it works for you, we know how.
> 
> Hillf
> 

__put_anon_vma            |   page_lock_anon_vma_read
  anon_vma_free(root)     |
                          | root_anon_vma = ACCESS_ONCE(anon_vma->root)
                          | down_read_trylock(&root_anon_vma->rwsem)
  anon_vma_free(anon_vma) |

I see that the anon_vma cache is created with SLAB_DESTROY_BY_RCU, so it will
not be merged with other slab caches, and free_slab() will not release the
slab page while page_lock_anon_vma_read() is running, because the reader
holds rcu_read_lock(), right?

If root_anon_vma had been reused by someone else, why does "crash> struct
anon_vma" show almost all zeros?

Thanks,
Xishi Qiu
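
For readers following the interleaving above, here is a minimal user-space
sketch of why the order of the two frees in __put_anon_vma matters. It assumes,
going by the title and log of commit 624483f3ea8 as I understand them, that the
unfixed code frees the root before the child and that freeing the child still
looks at child->root. The names mock_anon_vma, mock_anon_vma_free,
put_root_first and put_child_first are hypothetical and exist only for this
illustration; this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's struct anon_vma. */
struct mock_anon_vma {
	struct mock_anon_vma *root;
	int refcount;
	bool freed;	/* set instead of calling free() so the model can inspect it */
};

/*
 * Models the free path looking at av->root while freeing av.
 * Returns false if the root had already been freed at that point,
 * i.e. the access would have been a use-after-free.
 */
static bool mock_anon_vma_free(struct mock_anon_vma *av)
{
	bool root_was_live = !av->root->freed;

	assert(!av->freed);	/* a double free would be a bug in the model itself */
	av->freed = true;
	return root_was_live;
}

/* Assumed ordering without the fix: root first, then the child. */
static bool put_root_first(struct mock_anon_vma *av)
{
	struct mock_anon_vma *root = av->root;
	bool ok = true;

	if (root != av && --root->refcount == 0)
		ok = mock_anon_vma_free(root) && ok;
	ok = mock_anon_vma_free(av) && ok;	/* touches av->root after it was freed */
	return ok;
}

/* Assumed ordering with the fix: child first, then the root. */
static bool put_child_first(struct mock_anon_vma *av)
{
	struct mock_anon_vma *root = av->root;
	bool ok = mock_anon_vma_free(av);

	if (root != av && --root->refcount == 0)
		ok = mock_anon_vma_free(root) && ok;
	return ok;
}
```

With a child whose root holds one last reference, put_root_first() reports a
stale access and put_child_first() does not. Separately, SLAB_DESTROY_BY_RCU
only delays returning the slab page to the page allocator until a grace period
has passed; an individual object may still be reused within the same cache
while a reader is inside rcu_read_lock().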

> 
> 
> 
> .
> 





Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Hillf Danton
On April 10, 2017 4:57 PM Xishi Qiu wrote: 
> On 2017/4/10 14:42, Hillf Danton wrote:
> 
> > On April 08, 2017 9:40 PM zhong Jiang wrote:
> >>
> >> when runing the stabile docker cases in the vm.   The following issue will 
> >> come up.
> >>
> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >> [exception RIP: down_read_trylock+5]
> >> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> >> RAX:   RBX: 88018ae858c1  RCX: 
> >> RDX:   RSI:   RDI: 0008
> >> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> >> R10: 22cb  R11:   R12: 88018ae858c0
> >> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> >> ORIG_RAX:   CS: 0010  SS: 
> >> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
> >> #42 [8801b57ffc18] page_referenced at 811b26a7
> >> #43 [8801b57ffc90] shrink_active_list at 8118d634
> >> #44 [8801b57ffd48] balance_pgdat at 8118f088
> >> #45 [8801b57ffe20] kswapd at 8118f633
> >> #46 [8801b57ffec8] kthread at 810a795f
> >> #47 [8801b57fff50] ret_from_fork at 81665398
> >> crash> struct page.mapping ea0006903dc0
> >>   mapping = 0x88018ae858c1
> >> crash> struct anon_vma 0x88018ae858c0
> >> struct anon_vma {
> >>   root = 0x0,
> >>   rwsem = {
> >> count = 0,
> >> wait_lock = {
> >>   raw_lock = {
> >> {
> >>   head_tail = 1,
> >>   tickets = {
> >> head = 1,
> >> tail = 0
> >>   }
> >> }
> >>   }
> >> },
> >> wait_list = {
> >>   next = 0x0,
> >>   prev = 0x0
> >> }
> >>   },
> >>   refcount = {
> >> counter = 0
> >>   },
> >>   rb_root = {
> >> rb_node = 0x0
> >>   }
> >> }
> >>
> >> This maks me wonder,  the anon_vma do not come from slab structure.
> >> and the content is abnormal. IMO,  At least anon_vma->root will not NULL.
> >> The issue can be reproduced every other week.
> >>
> > Check please if commit
> > 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma")
> > is included in the 3.10 you are running.
> >
> We missed this patch in RHEL 7.2
> Could you please give more details for how it triggered?

Sorry, I cannot.
My guess is that it is the use-after-free described in the log of that commit.
If the commit fixes it for you, then we know how it was triggered.

Hillf





Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Xishi Qiu
On 2017/4/10 14:42, Hillf Danton wrote:

> On April 08, 2017 9:40 PM zhong Jiang wrote: 
>>
>> when runing the stabile docker cases in the vm.   The following issue will 
>> come up.
>>
>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>> [exception RIP: down_read_trylock+5]
>> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
>> RAX:   RBX: 88018ae858c1  RCX: 
>> RDX:   RSI:   RDI: 0008
>> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
>> R10: 22cb  R11:   R12: 88018ae858c0
>> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
>> ORIG_RAX:   CS: 0010  SS: 
>> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
>> #42 [8801b57ffc18] page_referenced at 811b26a7
>> #43 [8801b57ffc90] shrink_active_list at 8118d634
>> #44 [8801b57ffd48] balance_pgdat at 8118f088
>> #45 [8801b57ffe20] kswapd at 8118f633
>> #46 [8801b57ffec8] kthread at 810a795f
>> #47 [8801b57fff50] ret_from_fork at 81665398
>> crash> struct page.mapping ea0006903dc0
>>   mapping = 0x88018ae858c1
>> crash> struct anon_vma 0x88018ae858c0
>> struct anon_vma {
>>   root = 0x0,
>>   rwsem = {
>> count = 0,
>> wait_lock = {
>>   raw_lock = {
>> {
>>   head_tail = 1,
>>   tickets = {
>> head = 1,
>> tail = 0
>>   }
>> }
>>   }
>> },
>> wait_list = {
>>   next = 0x0,
>>   prev = 0x0
>> }
>>   },
>>   refcount = {
>> counter = 0
>>   },
>>   rb_root = {
>> rb_node = 0x0
>>   }
>> }
>>
>> This maks me wonder,  the anon_vma do not come from slab structure.
>> and the content is abnormal. IMO,  At least anon_vma->root will not NULL.
>> The issue can be reproduced every other week.
>>
> Check please if commit 
> 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma")
> is included in the 3.10 you are running.
> 

Hi Hillf,

This patch is missing from our RHEL 7.2 kernel.
Could you please give more details on how the issue is triggered?

Thanks,
Xishi Qiu

> btw, why not run the mainline?
> 
> Hillf
> 
> 
> 
> .
> 





Re: NULL pointer dereference in the kernel 3.10

2017-04-10 Thread Mel Gorman
On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> when runing the stabile docker cases in the vm.   The following issue will 
> come up.
> 
> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> [exception RIP: down_read_trylock+5]
> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> RAX:   RBX: 88018ae858c1  RCX: 
> RDX:   RSI:   RDI: 0008
> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> R10: 22cb  R11:   R12: 88018ae858c0
> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> ORIG_RAX:   CS: 0010  SS: 

Post the full report including the kernel version and state whether any
additional patches to 3.10 are applied.

-- 
Mel Gorman
SUSE Labs


Re: NULL pointer dereference in the kernel 3.10

2017-04-09 Thread Hillf Danton
On April 08, 2017 9:40 PM zhong Jiang wrote: 
> 
> when runing the stabile docker cases in the vm.   The following issue will 
> come up.
> 
> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> [exception RIP: down_read_trylock+5]
> RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
> RAX:   RBX: 88018ae858c1  RCX: 
> RDX:   RSI:   RDI: 0008
> RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
> R10: 22cb  R11:   R12: 88018ae858c0
> R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
> ORIG_RAX:   CS: 0010  SS: 
> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
> #42 [8801b57ffc18] page_referenced at 811b26a7
> #43 [8801b57ffc90] shrink_active_list at 8118d634
> #44 [8801b57ffd48] balance_pgdat at 8118f088
> #45 [8801b57ffe20] kswapd at 8118f633
> #46 [8801b57ffec8] kthread at 810a795f
> #47 [8801b57fff50] ret_from_fork at 81665398
> crash> struct page.mapping ea0006903dc0
>   mapping = 0x88018ae858c1
> crash> struct anon_vma 0x88018ae858c0
> struct anon_vma {
>   root = 0x0,
>   rwsem = {
> count = 0,
> wait_lock = {
>   raw_lock = {
> {
>   head_tail = 1,
>   tickets = {
> head = 1,
> tail = 0
>   }
> }
>   }
> },
> wait_list = {
>   next = 0x0,
>   prev = 0x0
> }
>   },
>   refcount = {
> counter = 0
>   },
>   rb_root = {
> rb_node = 0x0
>   }
> }
> 
> This maks me wonder,  the anon_vma do not come from slab structure.
> and the content is abnormal. IMO,  At least anon_vma->root will not NULL.
> The issue can be reproduced every other week.
> 
Please check whether commit
624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma")
is included in the 3.10 kernel you are running.

By the way, why not run mainline?

Hillf
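
A possible reading of the register dump quoted above: if anon_vma->root is NULL,
then &root->rwsem is simply the offset of rwsem inside struct anon_vma, which
would match the value 0008 shown in RDI and R14. The sketch below checks that
arithmetic with a mock layout that follows the field order printed by crash
(root, rwsem, refcount, rb_root). The struct definitions and the helper
rwsem_addr_for_root are assumptions made for illustration; the real RHEL 7.2
layout was not posted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the semaphore; only its position inside the parent matters here. */
struct mock_rw_semaphore {
	long count;
	int wait_lock;			/* stand-in for the ticket spinlock */
	void *wait_list_next;
	void *wait_list_prev;
};

/* Mock layout in the field order shown by "crash> struct anon_vma". */
struct mock_anon_vma {
	struct mock_anon_vma *root;
	struct mock_rw_semaphore rwsem;
	int refcount;
	void *rb_root;
};

/* rwsem directly follows the root pointer, so its offset is one pointer. */
static_assert(offsetof(struct mock_anon_vma, rwsem) == sizeof(void *),
	      "rwsem is expected right after root");

/* Address that &root->rwsem evaluates to for a given root address. */
static uintptr_t rwsem_addr_for_root(uintptr_t root)
{
	return root + offsetof(struct mock_anon_vma, rwsem);
}
```

On a 64-bit build rwsem_addr_for_root(0) is 8, so a NULL root would make
down_read_trylock() touch address 0x8 rather than 0x0.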




NULL pointer dereference in the kernel 3.10

2017-04-08 Thread zhong jiang
When running the Docker stability test cases in a VM, the following issue
comes up.

#40 [8801b57ffb30] async_page_fault at 8165c9f8
[exception RIP: down_read_trylock+5]
RIP: 810aca65  RSP: 8801b57ffbe8  RFLAGS: 00010202
RAX:   RBX: 88018ae858c1  RCX: 
RDX:   RSI:   RDI: 0008
RBP: 8801b57ffc10   R8: ea0006903de0   R9: 8800b3c61810
R10: 22cb  R11:   R12: 88018ae858c0
R13: ea0006903dc0  R14: 0008  R15: ea0006903dc0
ORIG_RAX:   CS: 0010  SS: 
#41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
#42 [8801b57ffc18] page_referenced at 811b26a7
#43 [8801b57ffc90] shrink_active_list at 8118d634
#44 [8801b57ffd48] balance_pgdat at 8118f088
#45 [8801b57ffe20] kswapd at 8118f633
#46 [8801b57ffec8] kthread at 810a795f
#47 [8801b57fff50] ret_from_fork at 81665398
crash> struct page.mapping ea0006903dc0
  mapping = 0x88018ae858c1
crash> struct anon_vma 0x88018ae858c0
struct anon_vma {
  root = 0x0,
  rwsem = {
    count = 0,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 1,
          tickets = {
            head = 1,
            tail = 0
          }
        }
      }
    },
    wait_list = {
      next = 0x0,
      prev = 0x0
    }
  },
  refcount = {
    counter = 0
  },
  rb_root = {
    rb_node = 0x0
  }
}

This makes me suspect that the anon_vma does not come from the slab, and its
content is abnormal. IMO, anon_vma->root should at least never be NULL.
The issue can be reproduced about every other week.

Any comments will be appreciated.

Thanks
zhongjiang
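
A note on the two addresses in the crash session above: page.mapping ends in
...c1 while the anon_vma is read at ...c0 because the kernel tags the mapping
pointer of an anonymous page with a low bit (PAGE_MAPPING_ANON). The sketch
below shows that decoding with the values exactly as they appear in the dump.
The helper name anon_vma_addr_from_mapping is hypothetical, and the code is a
user-space illustration rather than the kernel's own implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Low bit of page->mapping that marks an anonymous page. */
#define PAGE_MAPPING_ANON 0x1UL

/* Strip the tag bit to recover the address of the anon_vma. */
static uint64_t anon_vma_addr_from_mapping(uint64_t mapping)
{
	assert(mapping & PAGE_MAPPING_ANON);	/* only meaningful for anonymous pages */
	return mapping & ~(uint64_t)PAGE_MAPPING_ANON;
}
```

For the mapping value 0x88018ae858c1 this yields 0x88018ae858c0, the address
passed to "crash> struct anon_vma" and also the value in R12.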