Re: NULL pointer dereference in the kernel 3.10
On Mon, Apr 10, 2017 at 10:33:56PM +0800, zhong jiang wrote:
> On 2017/4/10 22:13, Willy Tarreau wrote:
> > On Mon, Apr 10, 2017 at 10:06:59PM +0800, zhong jiang wrote:
> >> On 2017/4/10 20:48, Michal Hocko wrote:
> >>> On Mon 10-04-17 20:10:06, zhong jiang wrote:
> >>>> On 2017/4/10 16:56, Mel Gorman wrote:
> >>>> > On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> >>>> >> when runing the stabile docker cases in the vm. The following issue
> >>>> >> will come up.
> >>>> >>
> >>>> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >>>> >>     [exception RIP: down_read_trylock+5]
> >>>> >>     RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202
> >>>> >>     RAX: RBX: 88018ae858c1 RCX:
> >>>> >>     RDX: RSI: RDI: 0008
> >>>> >>     RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810
> >>>> >>     R10: 22cb R11: R12: 88018ae858c0
> >>>> >>     R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0
> >>>> >>     ORIG_RAX: CS: 0010 SS:
> >>>> > Post the full report including the kernel version and state whether any
> >>>> > additional patches to 3.10 are applied.
> >>>> >
> >>>> Hi, Mel
> >>>>
> >>>> Our kernel from RHEL 7.2, Addtional patches all from upstream --
> >>>> include Bugfix and CVE.
> >>> I believe you should contact Redhat for the support. This is a) old
> >>> kernel and b) with other patches which might or might not be relevant.
> >> Ok, regardless of the kernel version, we just discuss the situation in
> >> theory. if commit
> >> 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") is not
> >> exist. the issue
> >> will trigger . Any thought.
> > But this commit was backported into 3.10.43, so stable kernel users are
> > safe.
> >
> > Regards,
> > Willy
> >
> > .
> yes, you are sure that the commit can fix the issue.

No, I have absolutely no opinion on either the commit or the bug. What
I'm saying is that any up-to-date 3.10 contains the commit you mentioned,
so if that's the fix, you just need to ensure your kernel is up to date,
that's all.

Willy
Re: NULL pointer dereference in the kernel 3.10
On 2017/4/10 22:13, Willy Tarreau wrote: > On Mon, Apr 10, 2017 at 10:06:59PM +0800, zhong jiang wrote: >> On 2017/4/10 20:48, Michal Hocko wrote: >>> On Mon 10-04-17 20:10:06, zhong jiang wrote: On 2017/4/10 16:56, Mel Gorman wrote: > On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote: >> when runing the stabile docker cases in the vm. The following issue >> will come up. >> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8 >> [exception RIP: down_read_trylock+5] >> RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 >> RAX: RBX: 88018ae858c1 RCX: >> RDX: RSI: RDI: 0008 >> RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 >> R10: 22cb R11: R12: 88018ae858c0 >> R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 >> ORIG_RAX: CS: 0010 SS: > Post the full report including the kernel version and state whether any > additional patches to 3.10 are applied. > Hi, Mel Our kernel from RHEL 7.2, Addtional patches all from upstream -- include Bugfix and CVE. >>> I believe you should contact Redhat for the support. This is a) old >>> kernel and b) with other patches which might or might not be relevant. >> Ok, regardless of the kernel version, we just discuss the situation in >> theory. if commit >> 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") is not >> exist. the issue >> will trigger . Any thought. > But this commit was backported into 3.10.43, so stable kernel users are safe. > > Regards, > Willy > > . yes, you are sure that the commit can fix the issue.
Re: NULL pointer dereference in the kernel 3.10
On 2017/4/10 22:06, Mel Gorman wrote: > On Mon, Apr 10, 2017 at 08:10:06PM +0800, zhong jiang wrote: >> On 2017/4/10 16:56, Mel Gorman wrote: >>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote: when runing the stabile docker cases in the vm. The following issue will come up. #40 [8801b57ffb30] async_page_fault at 8165c9f8 [exception RIP: down_read_trylock+5] RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 RAX: RBX: 88018ae858c1 RCX: RDX: RSI: RDI: 0008 RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 R10: 22cb R11: R12: 88018ae858c0 R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 ORIG_RAX: CS: 0010 SS: >>> Post the full report including the kernel version and state whether any >>> additional patches to 3.10 are applied. >>> >> Hi, Mel >> >> Our kernel from RHEL 7.2, Addtional patches all from upstream -- >> include Bugfix and CVE. >> >> Commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") >> exclude in >> the RHEL 7.2. it looks seems to the issue. but I don't know how it triggered. >> or it is not the correct fix. Any suggestion? Thanks >> > I'm afraid you'll need to bring it up with RHEL support as it contains > a number of backported patches from them that cannot be meaningfully > evaluated outside of RedHat and they may have additional questions on the > patches applied on top. > Thanks
Re: NULL pointer dereference in the kernel 3.10
On Mon, Apr 10, 2017 at 10:06:59PM +0800, zhong jiang wrote: > On 2017/4/10 20:48, Michal Hocko wrote: > > On Mon 10-04-17 20:10:06, zhong jiang wrote: > >> On 2017/4/10 16:56, Mel Gorman wrote: > >>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote: > when runing the stabile docker cases in the vm. The following issue > will come up. > > #40 [8801b57ffb30] async_page_fault at 8165c9f8 > [exception RIP: down_read_trylock+5] > RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 > RAX: RBX: 88018ae858c1 RCX: > RDX: RSI: RDI: 0008 > RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 > R10: 22cb R11: R12: 88018ae858c0 > R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 > ORIG_RAX: CS: 0010 SS: > >>> Post the full report including the kernel version and state whether any > >>> additional patches to 3.10 are applied. > >>> > >> Hi, Mel > >> > >> Our kernel from RHEL 7.2, Addtional patches all from upstream -- > >> include Bugfix and CVE. > > I believe you should contact Redhat for the support. This is a) old > > kernel and b) with other patches which might or might not be relevant. > Ok, regardless of the kernel version, we just discuss the situation in > theory. if commit > 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") is not > exist. the issue > will trigger . Any thought. But this commit was backported into 3.10.43, so stable kernel users are safe. Regards, Willy
Re: NULL pointer dereference in the kernel 3.10
On 2017/4/10 20:48, Michal Hocko wrote: > On Mon 10-04-17 20:10:06, zhong jiang wrote: >> On 2017/4/10 16:56, Mel Gorman wrote: >>> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote: when runing the stabile docker cases in the vm. The following issue will come up. #40 [8801b57ffb30] async_page_fault at 8165c9f8 [exception RIP: down_read_trylock+5] RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 RAX: RBX: 88018ae858c1 RCX: RDX: RSI: RDI: 0008 RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 R10: 22cb R11: R12: 88018ae858c0 R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 ORIG_RAX: CS: 0010 SS: >>> Post the full report including the kernel version and state whether any >>> additional patches to 3.10 are applied. >>> >> Hi, Mel >> >> Our kernel from RHEL 7.2, Addtional patches all from upstream -- >> include Bugfix and CVE. > I believe you should contact Redhat for the support. This is a) old > kernel and b) with other patches which might or might not be relevant. Ok, regardless of the kernel version, we just discuss the situation in theory. if commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") is not exist. the issue will trigger . Any thought. Thanks zhongjiang
Re: NULL pointer dereference in the kernel 3.10
On Mon, Apr 10, 2017 at 08:10:06PM +0800, zhong jiang wrote:
> On 2017/4/10 16:56, Mel Gorman wrote:
> > On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> >> when runing the stabile docker cases in the vm. The following issue will
> >> come up.
> >>
> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >>     [exception RIP: down_read_trylock+5]
> >>     RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202
> >>     RAX: RBX: 88018ae858c1 RCX:
> >>     RDX: RSI: RDI: 0008
> >>     RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810
> >>     R10: 22cb R11: R12: 88018ae858c0
> >>     R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0
> >>     ORIG_RAX: CS: 0010 SS:
> > Post the full report including the kernel version and state whether any
> > additional patches to 3.10 are applied.
> >
> Hi, Mel
>
> Our kernel from RHEL 7.2, Addtional patches all from upstream --
> include Bugfix and CVE.
>
> Commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") exclude
> in
> the RHEL 7.2. it looks seems to the issue. but I don't know how it triggered.
> or it is not the correct fix. Any suggestion? Thanks
>

I'm afraid you'll need to bring it up with RHEL support as it contains
a number of backported patches from them that cannot be meaningfully
evaluated outside of RedHat and they may have additional questions on the
patches applied on top.

--
Mel Gorman
SUSE Labs
Re: NULL pointer dereference in the kernel 3.10
On Mon 10-04-17 20:10:06, zhong jiang wrote:
> On 2017/4/10 16:56, Mel Gorman wrote:
> > On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> >> when runing the stabile docker cases in the vm. The following issue will
> >> come up.
> >>
> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8
> >>     [exception RIP: down_read_trylock+5]
> >>     RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202
> >>     RAX: RBX: 88018ae858c1 RCX:
> >>     RDX: RSI: RDI: 0008
> >>     RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810
> >>     R10: 22cb R11: R12: 88018ae858c0
> >>     R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0
> >>     ORIG_RAX: CS: 0010 SS:
> > Post the full report including the kernel version and state whether any
> > additional patches to 3.10 are applied.
> >
> Hi, Mel
>
> Our kernel from RHEL 7.2, Addtional patches all from upstream --
> include Bugfix and CVE.

I believe you should contact Redhat for the support. This is a) old
kernel and b) with other patches which might or might not be relevant.

--
Michal Hocko
SUSE Labs
Re: NULL pointer dereference in the kernel 3.10
On 2017/4/10 16:56, Mel Gorman wrote:
> On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
>> when runing the stabile docker cases in the vm. The following issue will
>> come up.
>>
>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>>     [exception RIP: down_read_trylock+5]
>>     RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202
>>     RAX: RBX: 88018ae858c1 RCX:
>>     RDX: RSI: RDI: 0008
>>     RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810
>>     R10: 22cb R11: R12: 88018ae858c0
>>     R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0
>>     ORIG_RAX: CS: 0010 SS:
> Post the full report including the kernel version and state whether any
> additional patches to 3.10 are applied.
>
Hi, Mel

Our kernel is from RHEL 7.2. The additional patches are all from upstream
and include bug fixes and CVE fixes.

Commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") is
not included in RHEL 7.2. It looks related to the issue, but I don't know
how it is triggered, or whether it is the correct fix. Any suggestions?

Thanks

Part of the dmesg output follows.

[59982.162223] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59985.261635] device-mapper: ioctl: remove_all left 8 open device(s)
[59986.492174] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[59987.445606] device-mapper: ioctl: remove_all left 8 open device(s)
[59987.625887] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59988.174600] device-mapper: ioctl: remove_all left 8 open device(s)
[59988.345667] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[59990.951713] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59991.025185] device vethd295793 entered promiscuous mode
[59991.025253] IPv6: ADDRCONF(NETDEV_UP): vethd295793: link is not ready
[59991.860817] IPv6: ADDRCONF(NETDEV_CHANGE): vethd295793: link becomes ready
[59991.860836] docker0: port 4(vethd295793) entered forwarding state
[59991.860840] docker0: port 4(vethd295793) entered forwarding state
[59992.704027] docker0: port 4(vethd295793) entered disabled state
[59992.724049] EXT4-fs (dm-9): mounted filesystem with ordered data mode. Opts: (null)
[59993.098341] docker0: port 4(vethd295793) entered disabled state
[59993.102583] device vethd295793 left promiscuous mode
[59993.102605] docker0: port 4(vethd295793) entered disabled state
[59995.109048] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[59995.229390] docker0: port 2(veth2ad76e2) entered disabled state
[59995.523997] docker0: port 2(veth2ad76e2) entered disabled state
[59995.528183] device veth2ad76e2 left promiscuous mode
[59995.528202] docker0: port 2(veth2ad76e2) entered disabled state
[59995.975559] device-mapper: ioctl: remove_all left 8 open device(s)
[59996.084575] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[59996.660641] device-mapper: ioctl: remove_all left 7 open device(s)
[59997.109018] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[59998.360101] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60001.721429] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
[60001.771433] device vethcca3b6a entered promiscuous mode
[60001.771643] IPv6: ADDRCONF(NETDEV_UP): vethcca3b6a: link is not ready
[60002.872102] IPv6: ADDRCONF(NETDEV_CHANGE): vethcca3b6a: link becomes ready
[60002.872124] docker0: port 2(vethcca3b6a) entered forwarding state
[60002.872130] docker0: port 2(vethcca3b6a) entered forwarding state
[60005.041654] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60005.597179] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60013.731728] [/usr/bin/os_rotate_and_save_log.sh]space of output directory is larger than 500M bytes,delete the oldest tar file messages-20170321181104-129.tar.bz2
[60016.243601] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60016.669594] device-mapper: ioctl: remove_all left 9 open device(s)
[60016.930232] EXT4-fs (dm-9): mounted filesystem with ordered data mode. Opts: (null)
[60017.918511] docker0: port 2(vethcca3b6a) entered forwarding state
[60022.197574] device-mapper: ioctl: remove_all left 8 open device(s)
[60022.575774] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[60023.288744] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60024.282579] device-mapper: ioctl: remove_all left 8 open device(s)
[60024.505905] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[60024.934311] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[60025.168626] EXT4-fs (dm-8):
Re: NULL pointer dereference in the kernel 3.10
On April 10, 2017 5:54 PM Xishi Qiu wrote: > On 2017/4/10 17:37, Hillf Danton wrote: > > > On April 10, 2017 4:57 PM Xishi Qiu wrote: > >> On 2017/4/10 14:42, Hillf Danton wrote: > >> > >>> On April 08, 2017 9:40 PM zhong Jiang wrote: > > when runing the stabile docker cases in the vm. The following issue > will come up. > > #40 [8801b57ffb30] async_page_fault at 8165c9f8 > [exception RIP: down_read_trylock+5] > RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 > RAX: RBX: 88018ae858c1 RCX: > RDX: RSI: RDI: 0008 > RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 > R10: 22cb R11: R12: 88018ae858c0 > R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 > ORIG_RAX: CS: 0010 SS: > #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c > #42 [8801b57ffc18] page_referenced at 811b26a7 > #43 [8801b57ffc90] shrink_active_list at 8118d634 > #44 [8801b57ffd48] balance_pgdat at 8118f088 > #45 [8801b57ffe20] kswapd at 8118f633 > #46 [8801b57ffec8] kthread at 810a795f > #47 [8801b57fff50] ret_from_fork at 81665398 > crash> struct page.mapping ea0006903dc0 > mapping = 0x88018ae858c1 > crash> struct anon_vma 0x88018ae858c0 > struct anon_vma { > root = 0x0, > rwsem = { > count = 0, > wait_lock = { > raw_lock = { > { > head_tail = 1, > tickets = { > head = 1, > tail = 0 > } > } > } > }, > wait_list = { > next = 0x0, > prev = 0x0 > } > }, > refcount = { > counter = 0 > }, > rb_root = { > rb_node = 0x0 > } > } > > This maks me wonder, the anon_vma do not come from slab structure. > and the content is abnormal. IMO, At least anon_vma->root will not NULL. > The issue can be reproduced every other week. > > >>> Check please if commit > >>> 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") > >>> is included in the 3.10 you are running. > >>> > >> We missed this patch in RHEL 7.2 > >> Could you please give more details for how it triggered? > > > > Sorry, I could not. > > I guess it is UAF as described in the log of that commit. > > And if it works for you, we know how. 
> > > > Hillf > > > > __put_anon_vma| page_lock_anon_vma_read > anon_vma_free(root) | > | root_anon_vma = ACCESS_ONCE(anon_vma->root) > | down_read_trylock(&root_anon_vma->rwsem) > anon_vma_free(anon_vma) | > > I find anon_vma was created by SLAB_DESTROY_BY_RCU, so it will not merge > by other slabs, and free_slab() will not free it during > page_lock_anon_vma_read(), > because it holds rcu_read_lock(), right? > Dunno frankly, Sir, you know, I am not an rmap expert like you. And pretty much probable I made a wrong guess, and sorry again. > If root_anon_vma was reuse by someone, why "crash> struct anon_vma" > shows almost zero? > thank you very much Hillf
Re: NULL pointer dereference in the kernel 3.10
On 2017/4/10 17:37, Hillf Danton wrote:
> On April 10, 2017 4:57 PM Xishi Qiu wrote:
>> On 2017/4/10 14:42, Hillf Danton wrote:
>>
>>> On April 08, 2017 9:40 PM zhong Jiang wrote:
>>>>
>>>> when runing the stabile docker cases in the vm. The following issue will
>>>> come up.
>>>>
>>>> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>>>>     [exception RIP: down_read_trylock+5]
>>>>     RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202
>>>>     RAX: RBX: 88018ae858c1 RCX:
>>>>     RDX: RSI: RDI: 0008
>>>>     RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810
>>>>     R10: 22cb R11: R12: 88018ae858c0
>>>>     R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0
>>>>     ORIG_RAX: CS: 0010 SS:
>>>> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
>>>> #42 [8801b57ffc18] page_referenced at 811b26a7
>>>> #43 [8801b57ffc90] shrink_active_list at 8118d634
>>>> #44 [8801b57ffd48] balance_pgdat at 8118f088
>>>> #45 [8801b57ffe20] kswapd at 8118f633
>>>> #46 [8801b57ffec8] kthread at 810a795f
>>>> #47 [8801b57fff50] ret_from_fork at 81665398
>>>> crash> struct page.mapping ea0006903dc0
>>>>   mapping = 0x88018ae858c1
>>>> crash> struct anon_vma 0x88018ae858c0
>>>> struct anon_vma {
>>>>   root = 0x0,
>>>>   rwsem = {
>>>>     count = 0,
>>>>     wait_lock = {
>>>>       raw_lock = {
>>>>         {
>>>>           head_tail = 1,
>>>>           tickets = {
>>>>             head = 1,
>>>>             tail = 0
>>>>           }
>>>>         }
>>>>       }
>>>>     },
>>>>     wait_list = {
>>>>       next = 0x0,
>>>>       prev = 0x0
>>>>     }
>>>>   },
>>>>   refcount = {
>>>>     counter = 0
>>>>   },
>>>>   rb_root = {
>>>>     rb_node = 0x0
>>>>   }
>>>> }
>>>>
>>>> This maks me wonder, the anon_vma do not come from slab structure.
>>>> and the content is abnormal. IMO, At least anon_vma->root will not NULL.
>>>> The issue can be reproduced every other week.
>>>>
>>> Check please if commit
>>> 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma")
>>> is included in the 3.10 you are running.
>>>
>> We missed this patch in RHEL 7.2
>> Could you please give more details for how it triggered?
>
> Sorry, I could not.
> I guess it is UAF as described in the log of that commit.
> And if it works for you, we know how.
>
> Hillf
>

__put_anon_vma          | page_lock_anon_vma_read
anon_vma_free(root)     |
                        | root_anon_vma = ACCESS_ONCE(anon_vma->root)
                        | down_read_trylock(&root_anon_vma->rwsem)
anon_vma_free(anon_vma) |

I see that the anon_vma cache is created with SLAB_DESTROY_BY_RCU, so it
will not be merged with other slabs, and free_slab() will not free it
during page_lock_anon_vma_read(), because that holds rcu_read_lock(),
right?

If root_anon_vma was reused by someone, why does "crash> struct anon_vma"
show almost all zeros?

Thanks,
Xishi Qiu
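
For readers trying to follow the interleaving quoted in this message, here is a toy userspace model of the two possible free orderings. It is only a sketch: the `toy_` names are hypothetical, nothing here is kernel code, RCU and the slab allocator are not modeled, and it makes no claim about what commit 624483f3ea8 actually changes. It only shows that when the root is released first, there is a window in which a still-live child points at a root that has already been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an anon_vma: a root pointer and a refcount. */
struct toy_anon_vma {
    struct toy_anon_vma *root;
    int refcount;
    bool freed;              /* set instead of releasing memory */
};

/* Mark the object freed so the model can be inspected afterwards. */
static void toy_free(struct toy_anon_vma *av)
{
    av->freed = true;
}

/*
 * Ordering from the diagram: drop (and possibly free) the root first,
 * then free the child. Returns true if, between the two frees, the
 * child was still live while its root was already freed.
 */
static bool toy_put_root_first(struct toy_anon_vma *av)
{
    assert(av != NULL && av->root != NULL);
    struct toy_anon_vma *root = av->root;

    if (root != av && --root->refcount == 0)
        toy_free(root);

    bool window = !av->freed && av->root->freed;
    toy_free(av);
    return window;
}

/* Reverse ordering: free the child first, then drop the root. */
static bool toy_put_child_first(struct toy_anon_vma *av)
{
    assert(av != NULL && av->root != NULL);
    struct toy_anon_vma *root = av->root;

    bool window = !av->freed && root->freed;
    toy_free(av);

    if (root != av && --root->refcount == 0)
        toy_free(root);
    return window;
}
```

With a root whose only reference is held by one child, `toy_put_root_first(&child)` reports the window and `toy_put_child_first(&child)` does not. Whether a reader such as page_lock_anon_vma_read() can actually land in that window on a given kernel is exactly the open question in this message.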
Re: NULL pointer dereference in the kernel 3.10
On April 10, 2017 4:57 PM Xishi Qiu wrote: > On 2017/4/10 14:42, Hillf Danton wrote: > > > On April 08, 2017 9:40 PM zhong Jiang wrote: > >> > >> when runing the stabile docker cases in the vm. The following issue will > >> come up. > >> > >> #40 [8801b57ffb30] async_page_fault at 8165c9f8 > >> [exception RIP: down_read_trylock+5] > >> RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 > >> RAX: RBX: 88018ae858c1 RCX: > >> RDX: RSI: RDI: 0008 > >> RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 > >> R10: 22cb R11: R12: 88018ae858c0 > >> R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 > >> ORIG_RAX: CS: 0010 SS: > >> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c > >> #42 [8801b57ffc18] page_referenced at 811b26a7 > >> #43 [8801b57ffc90] shrink_active_list at 8118d634 > >> #44 [8801b57ffd48] balance_pgdat at 8118f088 > >> #45 [8801b57ffe20] kswapd at 8118f633 > >> #46 [8801b57ffec8] kthread at 810a795f > >> #47 [8801b57fff50] ret_from_fork at 81665398 > >> crash> struct page.mapping ea0006903dc0 > >> mapping = 0x88018ae858c1 > >> crash> struct anon_vma 0x88018ae858c0 > >> struct anon_vma { > >> root = 0x0, > >> rwsem = { > >> count = 0, > >> wait_lock = { > >> raw_lock = { > >> { > >> head_tail = 1, > >> tickets = { > >> head = 1, > >> tail = 0 > >> } > >> } > >> } > >> }, > >> wait_list = { > >> next = 0x0, > >> prev = 0x0 > >> } > >> }, > >> refcount = { > >> counter = 0 > >> }, > >> rb_root = { > >> rb_node = 0x0 > >> } > >> } > >> > >> This maks me wonder, the anon_vma do not come from slab structure. > >> and the content is abnormal. IMO, At least anon_vma->root will not NULL. > >> The issue can be reproduced every other week. > >> > > Check please if commit > > 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") > > is included in the 3.10 you are running. > > > We missed this patch in RHEL 7.2 > Could you please give more details for how it triggered? Sorry, I could not. I guess it is UAF as described in the log of that commit. 
And if it works for you, we know how. Hillf
Re: NULL pointer dereference in the kernel 3.10
On 2017/4/10 14:42, Hillf Danton wrote: > On April 08, 2017 9:40 PM zhong Jiang wrote: >> >> when runing the stabile docker cases in the vm. The following issue will >> come up. >> >> #40 [8801b57ffb30] async_page_fault at 8165c9f8 >> [exception RIP: down_read_trylock+5] >> RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 >> RAX: RBX: 88018ae858c1 RCX: >> RDX: RSI: RDI: 0008 >> RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 >> R10: 22cb R11: R12: 88018ae858c0 >> R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 >> ORIG_RAX: CS: 0010 SS: >> #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c >> #42 [8801b57ffc18] page_referenced at 811b26a7 >> #43 [8801b57ffc90] shrink_active_list at 8118d634 >> #44 [8801b57ffd48] balance_pgdat at 8118f088 >> #45 [8801b57ffe20] kswapd at 8118f633 >> #46 [8801b57ffec8] kthread at 810a795f >> #47 [8801b57fff50] ret_from_fork at 81665398 >> crash> struct page.mapping ea0006903dc0 >> mapping = 0x88018ae858c1 >> crash> struct anon_vma 0x88018ae858c0 >> struct anon_vma { >> root = 0x0, >> rwsem = { >> count = 0, >> wait_lock = { >> raw_lock = { >> { >> head_tail = 1, >> tickets = { >> head = 1, >> tail = 0 >> } >> } >> } >> }, >> wait_list = { >> next = 0x0, >> prev = 0x0 >> } >> }, >> refcount = { >> counter = 0 >> }, >> rb_root = { >> rb_node = 0x0 >> } >> } >> >> This maks me wonder, the anon_vma do not come from slab structure. >> and the content is abnormal. IMO, At least anon_vma->root will not NULL. >> The issue can be reproduced every other week. >> > Check please if commit > 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") > is included in the 3.10 you are running. > Hi Hillf, We missed this patch in RHEL 7.2 Could you please give more details for how it triggered? Thanks, Xishi QIu > btw, why not run the mainline? > > Hillf > > > > . >
Re: NULL pointer dereference in the kernel 3.10
On Sat, Apr 08, 2017 at 09:39:42PM +0800, zhong jiang wrote:
> when runing the stabile docker cases in the vm. The following issue will
> come up.
>
> #40 [8801b57ffb30] async_page_fault at 8165c9f8
>     [exception RIP: down_read_trylock+5]
>     RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202
>     RAX: RBX: 88018ae858c1 RCX:
>     RDX: RSI: RDI: 0008
>     RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810
>     R10: 22cb R11: R12: 88018ae858c0
>     R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0
>     ORIG_RAX: CS: 0010 SS:

Post the full report including the kernel version and state whether any
additional patches to 3.10 are applied.

--
Mel Gorman
SUSE Labs
Re: NULL pointer dereference in the kernel 3.10
On April 08, 2017 9:40 PM zhong Jiang wrote: > > when runing the stabile docker cases in the vm. The following issue will > come up. > > #40 [8801b57ffb30] async_page_fault at 8165c9f8 > [exception RIP: down_read_trylock+5] > RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202 > RAX: RBX: 88018ae858c1 RCX: > RDX: RSI: RDI: 0008 > RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810 > R10: 22cb R11: R12: 88018ae858c0 > R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0 > ORIG_RAX: CS: 0010 SS: > #41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c > #42 [8801b57ffc18] page_referenced at 811b26a7 > #43 [8801b57ffc90] shrink_active_list at 8118d634 > #44 [8801b57ffd48] balance_pgdat at 8118f088 > #45 [8801b57ffe20] kswapd at 8118f633 > #46 [8801b57ffec8] kthread at 810a795f > #47 [8801b57fff50] ret_from_fork at 81665398 > crash> struct page.mapping ea0006903dc0 > mapping = 0x88018ae858c1 > crash> struct anon_vma 0x88018ae858c0 > struct anon_vma { > root = 0x0, > rwsem = { > count = 0, > wait_lock = { > raw_lock = { > { > head_tail = 1, > tickets = { > head = 1, > tail = 0 > } > } > } > }, > wait_list = { > next = 0x0, > prev = 0x0 > } > }, > refcount = { > counter = 0 > }, > rb_root = { > rb_node = 0x0 > } > } > > This maks me wonder, the anon_vma do not come from slab structure. > and the content is abnormal. IMO, At least anon_vma->root will not NULL. > The issue can be reproduced every other week. > Check please if commit 624483f3ea8 ("mm: rmap: fix use-after-free in __put_anon_vma") is included in the 3.10 you are running. btw, why not run the mainline? Hillf
NULL pointer dereference in the kernel 3.10
When running the stability docker cases in the VM, the following issue
comes up.

#40 [8801b57ffb30] async_page_fault at 8165c9f8
    [exception RIP: down_read_trylock+5]
    RIP: 810aca65 RSP: 8801b57ffbe8 RFLAGS: 00010202
    RAX: RBX: 88018ae858c1 RCX:
    RDX: RSI: RDI: 0008
    RBP: 8801b57ffc10 R8: ea0006903de0 R9: 8800b3c61810
    R10: 22cb R11: R12: 88018ae858c0
    R13: ea0006903dc0 R14: 0008 R15: ea0006903dc0
    ORIG_RAX: CS: 0010 SS:
#41 [8801b57ffbe8] page_lock_anon_vma_read at 811b241c
#42 [8801b57ffc18] page_referenced at 811b26a7
#43 [8801b57ffc90] shrink_active_list at 8118d634
#44 [8801b57ffd48] balance_pgdat at 8118f088
#45 [8801b57ffe20] kswapd at 8118f633
#46 [8801b57ffec8] kthread at 810a795f
#47 [8801b57fff50] ret_from_fork at 81665398
crash> struct page.mapping ea0006903dc0
  mapping = 0x88018ae858c1
crash> struct anon_vma 0x88018ae858c0
struct anon_vma {
  root = 0x0,
  rwsem = {
    count = 0,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 1,
          tickets = {
            head = 1,
            tail = 0
          }
        }
      }
    },
    wait_list = {
      next = 0x0,
      prev = 0x0
    }
  },
  refcount = {
    counter = 0
  },
  rb_root = {
    rb_node = 0x0
  }
}

This makes me wonder: the anon_vma does not seem to come from a slab
structure, and its content is abnormal. IMO, at least anon_vma->root
should not be NULL. The issue can be reproduced every other week.

Any comments will be appreciated.

Thanks
zhongjiang
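
As a reading aid for the dump above, here is a minimal sketch of how the faulting address can be derived from the reported values. It assumes that bit 0 of page->mapping is the anonymous-mapping tag and that `rwsem` directly follows the leading `root` pointer in struct anon_vma, which is the member order the crash output shows. The `mock_` types and helpers are illustrative stand-ins, not the kernel's definitions, and the addresses are used exactly as they appear in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins: only the member order mirrors the dump. */
struct mock_rwsem {
    long count;
};

struct mock_anon_vma {
    struct mock_anon_vma *root;   /* first member, offset 0 */
    struct mock_rwsem rwsem;      /* follows the root pointer */
};

/* Assumed tag bit marking page->mapping as an anon_vma pointer. */
#define MOCK_PAGE_MAPPING_ANON 1ULL

/* True if the mapping value carries the anon tag in bit 0. */
static int mock_mapping_is_anon(uint64_t mapping)
{
    return (mapping & MOCK_PAGE_MAPPING_ANON) != 0;
}

/* Strip the tag to recover the anon_vma address. */
static uint64_t mock_anon_vma_addr(uint64_t mapping)
{
    assert(mock_mapping_is_anon(mapping));
    return mapping - MOCK_PAGE_MAPPING_ANON;
}

/* Address of root->rwsem for a given root pointer value. */
static uint64_t mock_rwsem_addr(uint64_t root)
{
    return root + offsetof(struct mock_anon_vma, rwsem);
}
```

Under these assumptions, `mock_anon_vma_addr(0x88018ae858c1)` gives 0x88018ae858c0, which matches R12 and the address passed to `struct anon_vma`, and `mock_rwsem_addr(0)` gives the size of one pointer, 8 on x86-64, which matches RDI: 0008. That is consistent with down_read_trylock() being handed `&root->rwsem` for a NULL root.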