Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 22 Aug 2018 22:03:40 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 9:33 PM Nicholas Piggin wrote: > > > > I think it was quite well understood and fixed here, a145abf12c9 but > > again that was before I really started looking at it. > > You don't understand the problem. More fundamentally I think I didn't understand this fix, I think actually powerpc/radix does have a bug here. a145abf12c9 was really just a replacement for x86's hack of expanding the TLB invalidation range when freeing page table to capture page walk cache (powerpc/radix needs a different instruction so that didn't work for us). But I hadn't really looked at this fix closely rather Peter's follow up post about making powerpc page walk cache flushing design a generic concept. My point in this reply was more that my patches from the other month weren't a blundering issue to fix this bug without realising it, they were purely about avoiding the x86 TLB range expanding hack (that won't be needed if generic users all move over). > > All the x86 people thought WE ALREADY DID THAT. > > Because we had done this all correctly over a decade ago! > > Nobody realized that it had been screwed up by the powerpc code, and The powerpc/hash code is not screwed up though AFAIKS. You can't take arch specific code and slap a "generic" label on it, least of all the crazy powerpc/hash code, you of all people would agree with that :) > the commit you point to was believed to be a new *powerpc* only issue, > because the semantics on powerpc has changed because of the radix > tree. > > The semantics on x86 have never changed, they've always been the same. > So why would the x86 people react to powerpc doing something that x86 > had already always done. > > See? > > Nobody cared one whit about commit a145abf12c9, because it just > handles a new powerpc-specific case. > > > I don't really understand what the issue you have with powerpc here. > > powerpc hash has the page table flushing accessors which are just > > no-ops, it's the generic code that fails to call them properly. Surely > > there was no powerpc patch that removed those calls from generic code? > > Yes there was. > > Look where the generic code *came* from. > > It's powerpc code. > > See commit 267239116987 ("mm, powerpc: move the RCU page-table freeing > into generic code"). > > The powerpc code was made into the generic code, because the powerpc > code had to handle all those special RCU freeing things etc that > others didn't. > > It's just that when x86 was then switched over to use the "generic" > code, people didn't realize that the generic code didn't do the TLB > invalidations for page tables, because they hadn't been needed on > powerpc. Sure, there was a minor bug in the port. Not that it was a closely guarded secret that powerpc didn't flush page table pages, but it's a relatively subtle issue in complex code. That happens. > > So the powerpc code that was made generic, never really was. The new > "generic" code had a powerpc-specific quirk. > > That then very subtly broke x86, without the x86 people ever > realizing. Because the old simple non-RCU x86 code had never had that > issue, it just treated the leaf pages and the directory pages exactly > the same. > > See? > > And THAT is why I talk about the powerpc code. Because what is > "generic" code in 4.18 (and several releases before) oisn't actually > generic. > > And that's basically exactly the bug that the patches from PeterZ is > fixing. Making the "tlb_remove_table()" code always flush the tlb, the > way it should have when it was made generic. It just sounded like you were blaming correct powerpc/hash code for this. It's just a minor bug in taking that code into generic, not really a big deal, right? Or are you saying powerpc devs or code could be doing something better to play nicer with the rest of the archs? Honestly trying to improve things here, and encouraged by x86 and ARM looking to move over to a saner page walk cache tracking design and sharing more code with powerpc/radix. I would help with reviewing things or writing code or porting powerpc bits if I can. Thanks, Nick
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 22 Aug 2018 22:03:40 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 9:33 PM Nicholas Piggin wrote: > > > > I think it was quite well understood and fixed here, a145abf12c9 but > > again that was before I really started looking at it. > > You don't understand the problem. More fundamentally I think I didn't understand this fix, I think actually powerpc/radix does have a bug here. a145abf12c9 was really just a replacement for x86's hack of expanding the TLB invalidation range when freeing page table to capture page walk cache (powerpc/radix needs a different instruction so that didn't work for us). But I hadn't really looked at this fix closely rather Peter's follow up post about making powerpc page walk cache flushing design a generic concept. My point in this reply was more that my patches from the other month weren't a blundering issue to fix this bug without realising it, they were purely about avoiding the x86 TLB range expanding hack (that won't be needed if generic users all move over). > > All the x86 people thought WE ALREADY DID THAT. > > Because we had done this all correctly over a decade ago! > > Nobody realized that it had been screwed up by the powerpc code, and The powerpc/hash code is not screwed up though AFAIKS. You can't take arch specific code and slap a "generic" label on it, least of all the crazy powerpc/hash code, you of all people would agree with that :) > the commit you point to was believed to be a new *powerpc* only issue, > because the semantics on powerpc has changed because of the radix > tree. > > The semantics on x86 have never changed, they've always been the same. > So why would the x86 people react to powerpc doing something that x86 > had already always done. > > See? > > Nobody cared one whit about commit a145abf12c9, because it just > handles a new powerpc-specific case. > > > I don't really understand what the issue you have with powerpc here. > > powerpc hash has the page table flushing accessors which are just > > no-ops, it's the generic code that fails to call them properly. Surely > > there was no powerpc patch that removed those calls from generic code? > > Yes there was. > > Look where the generic code *came* from. > > It's powerpc code. > > See commit 267239116987 ("mm, powerpc: move the RCU page-table freeing > into generic code"). > > The powerpc code was made into the generic code, because the powerpc > code had to handle all those special RCU freeing things etc that > others didn't. > > It's just that when x86 was then switched over to use the "generic" > code, people didn't realize that the generic code didn't do the TLB > invalidations for page tables, because they hadn't been needed on > powerpc. Sure, there was a minor bug in the port. Not that it was a closely guarded secret that powerpc didn't flush page table pages, but it's a relatively subtle issue in complex code. That happens. > > So the powerpc code that was made generic, never really was. The new > "generic" code had a powerpc-specific quirk. > > That then very subtly broke x86, without the x86 people ever > realizing. Because the old simple non-RCU x86 code had never had that > issue, it just treated the leaf pages and the directory pages exactly > the same. > > See? > > And THAT is why I talk about the powerpc code. Because what is > "generic" code in 4.18 (and several releases before) oisn't actually > generic. > > And that's basically exactly the bug that the patches from PeterZ is > fixing. Making the "tlb_remove_table()" code always flush the tlb, the > way it should have when it was made generic. It just sounded like you were blaming correct powerpc/hash code for this. It's just a minor bug in taking that code into generic, not really a big deal, right? Or are you saying powerpc devs or code could be doing something better to play nicer with the rest of the archs? Honestly trying to improve things here, and encouraged by x86 and ARM looking to move over to a saner page walk cache tracking design and sharing more code with powerpc/radix. I would help with reviewing things or writing code or porting powerpc bits if I can. Thanks, Nick
Re: [PATCH 1/2] dt-bindings: dmaengine: add DT binding for UniPhier MIO DMAC
On 23 August 2018 at 10:48, Masahiro Yamada wrote: > Hi Jassi, > > > 2018-08-21 19:44 GMT+09:00 Jassi Brar : >> On 21 August 2018 at 15:17, Masahiro Yamada >> wrote: >>> (+CC Rob, DT, LKML) >>> >>> I forgot to CC this to DT community... >>> >>> >>> 2018-08-21 18:30 GMT+09:00 Masahiro Yamada : The MIO DMAC (Media IO DMA Controller) is used in UniPhier LD4, Pro4, and sLD8 SoCs. Signed-off-by: Masahiro Yamada --- .../devicetree/bindings/dma/uniphier-mio-dmac.txt | 28 ++ 1 file changed, 28 insertions(+) create mode 100644 Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt diff --git a/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt new file mode 100644 index 000..a9e969e --- /dev/null +++ b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt @@ -0,0 +1,28 @@ +UniPhier Media IO DMA controller + +This works as an external DMA engine for SD/eMMC controllers etc. +found in UniPhier LD4, Pro4, sLD8 SoCs. + +Required properties: +- compatible: should be "socionext,uniphier-mio-dmac". +- reg: offset and length of the register set for the device. +- interrupts: a list of interrupt specifiers associated with the DMA channels. +- clocks: a single clock specifier +- #dma-cells: should be <1>. The single cell represents the channel number. +- dma-channels: specify the number of the DMA channels. This should match to + the number of tuples in the interrupts property. + >> Can we not infer the number of channels from interrupt tuples? After >> all the driver assumes they are same. > > > It would be possible to count the number of tuples > in "interrupts". > > > > I know of_irq_count(), but I do not see any driver > in drivers/dma/ that calls it. > > > I guess the reason is that of_irq_count() is not exported, > so tristate drivers like this cannot use it. > > > I checked Documentation/devicetree/bindings/dma/, > and some controllers specify _redundant_ dma-channels property. > > fsl-mxs-dma.txt > renesas,rcar-dmac.txt > renesas,usb-dmac.txt > :) I am not sure "because others are doing it" is a good reason to introduce redundancy. > I also see counter-implementation. > > > bcm2835-dma.c hard-codes the number of channels in the driver. > tegra210-adma.c associates nr_channels with compatible string. > > > > I will wait for comments from the maintainers. > > If desired, I will export of_irq_count() > and use it from my driver. > If you don't want to leave too much footprint, you could do count = 0; while (of_irq_parse_one(dev, count, ) == 0) count++ of_irq_parse_one() is already exported. A good side-effect is you wouldn't have to hardcode the count in the driver (like bcm and tegra examples you quote). Having said that, I wouldn't lose sleep over it. So Cheers!
Re: [PATCH 1/2] dt-bindings: dmaengine: add DT binding for UniPhier MIO DMAC
On 23 August 2018 at 10:48, Masahiro Yamada wrote: > Hi Jassi, > > > 2018-08-21 19:44 GMT+09:00 Jassi Brar : >> On 21 August 2018 at 15:17, Masahiro Yamada >> wrote: >>> (+CC Rob, DT, LKML) >>> >>> I forgot to CC this to DT community... >>> >>> >>> 2018-08-21 18:30 GMT+09:00 Masahiro Yamada : The MIO DMAC (Media IO DMA Controller) is used in UniPhier LD4, Pro4, and sLD8 SoCs. Signed-off-by: Masahiro Yamada --- .../devicetree/bindings/dma/uniphier-mio-dmac.txt | 28 ++ 1 file changed, 28 insertions(+) create mode 100644 Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt diff --git a/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt new file mode 100644 index 000..a9e969e --- /dev/null +++ b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt @@ -0,0 +1,28 @@ +UniPhier Media IO DMA controller + +This works as an external DMA engine for SD/eMMC controllers etc. +found in UniPhier LD4, Pro4, sLD8 SoCs. + +Required properties: +- compatible: should be "socionext,uniphier-mio-dmac". +- reg: offset and length of the register set for the device. +- interrupts: a list of interrupt specifiers associated with the DMA channels. +- clocks: a single clock specifier +- #dma-cells: should be <1>. The single cell represents the channel number. +- dma-channels: specify the number of the DMA channels. This should match to + the number of tuples in the interrupts property. + >> Can we not infer the number of channels from interrupt tuples? After >> all the driver assumes they are same. > > > It would be possible to count the number of tuples > in "interrupts". > > > > I know of_irq_count(), but I do not see any driver > in drivers/dma/ that calls it. > > > I guess the reason is that of_irq_count() is not exported, > so tristate drivers like this cannot use it. > > > I checked Documentation/devicetree/bindings/dma/, > and some controllers specify _redundant_ dma-channels property. > > fsl-mxs-dma.txt > renesas,rcar-dmac.txt > renesas,usb-dmac.txt > :) I am not sure "because others are doing it" is a good reason to introduce redundancy. > I also see counter-implementation. > > > bcm2835-dma.c hard-codes the number of channels in the driver. > tegra210-adma.c associates nr_channels with compatible string. > > > > I will wait for comments from the maintainers. > > If desired, I will export of_irq_count() > and use it from my driver. > If you don't want to leave too much footprint, you could do count = 0; while (of_irq_parse_one(dev, count, ) == 0) count++ of_irq_parse_one() is already exported. A good side-effect is you wouldn't have to hardcode the count in the driver (like bcm and tegra examples you quote). Having said that, I wouldn't lose sleep over it. So Cheers!
Re: KASAN: null-ptr-deref Write in binder_update_page_range
Hi, On Wed, Aug 22, 2018 at 03:07:04PM +0900, Dae R. Jeong wrote: > Reporting the crash: KASAN: null-ptr-deref Write in binder_update_page_range > > This crash has been found in v4.18-rc3 using RaceFuzzer (a modified > version of Syzkaller), which we describe more at the end of this > report. > > Our analysis shows that the race occurs when invoking two syscalls > concurrently, mmap$binder() and ioctl$BINDER_WRITE_READ. More > specifically, since two code lines `alloc->vma = vma;` and > `alloc->vma_vm_mm = vma->vm_mm;` in binder_alloc_mmap_handler() is not > an atomic operation during mmap$binder() syscall, there is a time > window that `alloc->vma` is assigned but `alloc->vma_vm_mm` isn't > assigned. It causes the null pointer dereference in > binder_alloc_new_buf_locked() since it checks whether `alloc->vma` is > NULL, but it doesn't check that `alloc->vma_vm_mm` is NULL. More > details on the thread interleaving and the crash log are follows. > > > Thread interleaving: > CPU0 (binder_alloc_mmap_handler) CPU1 > (binder_alloc_new_buf_locked) > = = > // drivers/android/binder_alloc.c > // #L718 (v4.18-rc3) > alloc->vma = vma; > // > drivers/android/binder_alloc.c > // #L346 (v4.18-rc3) > if (alloc->vma == NULL) { > ... > // alloc->vma is not NULL > at this point > return ERR_PTR(-ESRCH); > } > ... > // #L438 > binder_update_page_range(alloc, > 0, > (void > *)PAGE_ALIGN((uintptr_t)buffer->data), > end_page_addr); > > // In > binder_update_page_range() #L218 > // But still alloc->vma_vm_mm > is NULL here > if (need_mm && > mmget_not_zero(alloc->vma_vm_mm)) > alloc->vma_vm_mm = vma->vm_mm; > > > Crash Log: > == > BUG: KASAN: null-ptr-deref in __atomic_add_unless > include/asm-generic/atomic-instrumented.h:89 [inline] > BUG: KASAN: null-ptr-deref in atomic_add_unless include/linux/atomic.h:533 > [inline] > BUG: KASAN: null-ptr-deref in mmget_not_zero include/linux/sched/mm.h:75 > [inline] > BUG: KASAN: null-ptr-deref in binder_update_page_range+0xece/0x18e0 > drivers/android/binder_alloc.c:218 > Write of size 4 at addr 0058 by task syz-executor0/11184 > > CPU: 1 PID: 11184 Comm: syz-executor0 Not tainted 4.18.0-rc3 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x16e/0x22c lib/dump_stack.c:113 > kasan_report_error mm/kasan/report.c:352 [inline] > kasan_report+0x163/0x380 mm/kasan/report.c:412 > check_memory_region_inline mm/kasan/kasan.c:260 [inline] > check_memory_region+0x140/0x1a0 mm/kasan/kasan.c:267 > kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278 > __atomic_add_unless include/asm-generic/atomic-instrumented.h:89 [inline] > atomic_add_unless include/linux/atomic.h:533 [inline] > mmget_not_zero include/linux/sched/mm.h:75 [inline] > binder_update_page_range+0xece/0x18e0 drivers/android/binder_alloc.c:218 > binder_alloc_new_buf_locked drivers/android/binder_alloc.c:443 [inline] > binder_alloc_new_buf+0x467/0xc30 drivers/android/binder_alloc.c:513 > binder_transaction+0x125b/0x4fb0 drivers/android/binder.c:2957 > binder_thread_write+0xc08/0x2770 drivers/android/binder.c:3528 > binder_ioctl_write_read.isra.39+0x24f/0x8e0 drivers/android/binder.c:4456 > binder_ioctl+0xa86/0xf34 drivers/android/binder.c:4596 > vfs_ioctl fs/ioctl.c:46 [inline] > do_vfs_ioctl+0x154/0xd40 fs/ioctl.c:686 > ksys_ioctl+0x94/0xb0 fs/ioctl.c:701 > __do_sys_ioctl fs/ioctl.c:708 [inline] > __se_sys_ioctl fs/ioctl.c:706 [inline] > __x64_sys_ioctl+0x43/0x50 fs/ioctl.c:706 > do_syscall_64+0x167/0x4b0 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x456469 > Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f > 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:7f575f268b28 EFLAGS: 0246 ORIG_RAX: 0010 > RAX: ffda RBX: 0072bfa0 RCX: 00456469 > RDX: 23c0 RSI: c0306201 RDI: 0016 > RBP:
Re: KASAN: null-ptr-deref Write in binder_update_page_range
Hi, On Wed, Aug 22, 2018 at 03:07:04PM +0900, Dae R. Jeong wrote: > Reporting the crash: KASAN: null-ptr-deref Write in binder_update_page_range > > This crash has been found in v4.18-rc3 using RaceFuzzer (a modified > version of Syzkaller), which we describe more at the end of this > report. > > Our analysis shows that the race occurs when invoking two syscalls > concurrently, mmap$binder() and ioctl$BINDER_WRITE_READ. More > specifically, since two code lines `alloc->vma = vma;` and > `alloc->vma_vm_mm = vma->vm_mm;` in binder_alloc_mmap_handler() is not > an atomic operation during mmap$binder() syscall, there is a time > window that `alloc->vma` is assigned but `alloc->vma_vm_mm` isn't > assigned. It causes the null pointer dereference in > binder_alloc_new_buf_locked() since it checks whether `alloc->vma` is > NULL, but it doesn't check that `alloc->vma_vm_mm` is NULL. More > details on the thread interleaving and the crash log are follows. > > > Thread interleaving: > CPU0 (binder_alloc_mmap_handler) CPU1 > (binder_alloc_new_buf_locked) > = = > // drivers/android/binder_alloc.c > // #L718 (v4.18-rc3) > alloc->vma = vma; > // > drivers/android/binder_alloc.c > // #L346 (v4.18-rc3) > if (alloc->vma == NULL) { > ... > // alloc->vma is not NULL > at this point > return ERR_PTR(-ESRCH); > } > ... > // #L438 > binder_update_page_range(alloc, > 0, > (void > *)PAGE_ALIGN((uintptr_t)buffer->data), > end_page_addr); > > // In > binder_update_page_range() #L218 > // But still alloc->vma_vm_mm > is NULL here > if (need_mm && > mmget_not_zero(alloc->vma_vm_mm)) > alloc->vma_vm_mm = vma->vm_mm; > > > Crash Log: > == > BUG: KASAN: null-ptr-deref in __atomic_add_unless > include/asm-generic/atomic-instrumented.h:89 [inline] > BUG: KASAN: null-ptr-deref in atomic_add_unless include/linux/atomic.h:533 > [inline] > BUG: KASAN: null-ptr-deref in mmget_not_zero include/linux/sched/mm.h:75 > [inline] > BUG: KASAN: null-ptr-deref in binder_update_page_range+0xece/0x18e0 > drivers/android/binder_alloc.c:218 > Write of size 4 at addr 0058 by task syz-executor0/11184 > > CPU: 1 PID: 11184 Comm: syz-executor0 Not tainted 4.18.0-rc3 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x16e/0x22c lib/dump_stack.c:113 > kasan_report_error mm/kasan/report.c:352 [inline] > kasan_report+0x163/0x380 mm/kasan/report.c:412 > check_memory_region_inline mm/kasan/kasan.c:260 [inline] > check_memory_region+0x140/0x1a0 mm/kasan/kasan.c:267 > kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278 > __atomic_add_unless include/asm-generic/atomic-instrumented.h:89 [inline] > atomic_add_unless include/linux/atomic.h:533 [inline] > mmget_not_zero include/linux/sched/mm.h:75 [inline] > binder_update_page_range+0xece/0x18e0 drivers/android/binder_alloc.c:218 > binder_alloc_new_buf_locked drivers/android/binder_alloc.c:443 [inline] > binder_alloc_new_buf+0x467/0xc30 drivers/android/binder_alloc.c:513 > binder_transaction+0x125b/0x4fb0 drivers/android/binder.c:2957 > binder_thread_write+0xc08/0x2770 drivers/android/binder.c:3528 > binder_ioctl_write_read.isra.39+0x24f/0x8e0 drivers/android/binder.c:4456 > binder_ioctl+0xa86/0xf34 drivers/android/binder.c:4596 > vfs_ioctl fs/ioctl.c:46 [inline] > do_vfs_ioctl+0x154/0xd40 fs/ioctl.c:686 > ksys_ioctl+0x94/0xb0 fs/ioctl.c:701 > __do_sys_ioctl fs/ioctl.c:708 [inline] > __se_sys_ioctl fs/ioctl.c:706 [inline] > __x64_sys_ioctl+0x43/0x50 fs/ioctl.c:706 > do_syscall_64+0x167/0x4b0 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x456469 > Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f > 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:7f575f268b28 EFLAGS: 0246 ORIG_RAX: 0010 > RAX: ffda RBX: 0072bfa0 RCX: 00456469 > RDX: 23c0 RSI: c0306201 RDI: 0016 > RBP:
[PATCH] mei: fix use-after-free in mei_cl_write
From: John Hubbard KASAN reports a use-after-free during startup, in mei_cl_write: BUG: KASAN: use-after-free in mei_cl_write+0x601/0x870 [mei] (drivers/misc/mei/client.c:1770) This is caused by commit 98e70866aacb ("mei: add support for variable length mei headers."), which changed the return value from len, to buf-size. That ends up using a stale buf pointer, because in some situations, the cb (callback) is deleted before it gets here. However, fortunately, len remains unchanged throughout the function (and I don't see anything else that would require re-reading buf->size either), so the fix is to simply revert the change, and return len, as before. CC: Tomas Winkler CC: Arnd Bergmann CC: Greg Kroah-Hartman Signed-off-by: John Hubbard --- drivers/misc/mei/client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 4ab6251d418e..ebdcf0b450e2 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -1767,7 +1767,7 @@ ssize_t mei_cl_write(struct mei_cl *cl, struct mei_cl_cb *cb) } } - rets = buf->size; + rets = len; err: cl_dbg(dev, cl, "rpm: autosuspend\n"); pm_runtime_mark_last_busy(dev->dev); -- 2.18.0
[PATCH] mei: fix use-after-free in mei_cl_write
From: John Hubbard KASAN reports a use-after-free during startup, in mei_cl_write: BUG: KASAN: use-after-free in mei_cl_write+0x601/0x870 [mei] (drivers/misc/mei/client.c:1770) This is caused by commit 98e70866aacb ("mei: add support for variable length mei headers."), which changed the return value from len, to buf-size. That ends up using a stale buf pointer, because in some situations, the cb (callback) is deleted before it gets here. However, fortunately, len remains unchanged throughout the function (and I don't see anything else that would require re-reading buf->size either), so the fix is to simply revert the change, and return len, as before. CC: Tomas Winkler CC: Arnd Bergmann CC: Greg Kroah-Hartman Signed-off-by: John Hubbard --- drivers/misc/mei/client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 4ab6251d418e..ebdcf0b450e2 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -1767,7 +1767,7 @@ ssize_t mei_cl_write(struct mei_cl *cl, struct mei_cl_cb *cb) } } - rets = buf->size; + rets = len; err: cl_dbg(dev, cl, "rpm: autosuspend\n"); pm_runtime_mark_last_busy(dev->dev); -- 2.18.0
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 2018-08-22 at 22:11 -0700, Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 9:54 PM Benjamin Herrenschmidt > wrote: > > > > > > So we do need a different flush instruction for the page tables vs. the > > normal TLB pages. > > Right. ARM wants it too. x86 is odd in that a regular "invlpg" already > invalidates all the internal tlb cache nodes. > > So the "new world order" is exactly that patch that PeterZ sent you, that > adds a > > + unsigned intfreed_tables : 1; > .../... > So instead, when you get to the actual "tlb_flush(tlb)", you do > exactly that - flush the tlb. And the mmu_gather structure shows you > how much you need to flush. If you see that "freed_tables" is set, > then you know that you need to also do the special instruction to > flush the inner level caches. The range continues to show the page > range. Yup. That looks like a generic version of the "need_flush_all" flag we have, which is fine by us. Just don't blame powerpc for all the historical crap :-) Cheers, Ben. > >Linus
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 2018-08-22 at 22:11 -0700, Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 9:54 PM Benjamin Herrenschmidt > wrote: > > > > > > So we do need a different flush instruction for the page tables vs. the > > normal TLB pages. > > Right. ARM wants it too. x86 is odd in that a regular "invlpg" already > invalidates all the internal tlb cache nodes. > > So the "new world order" is exactly that patch that PeterZ sent you, that > adds a > > + unsigned intfreed_tables : 1; > .../... > So instead, when you get to the actual "tlb_flush(tlb)", you do > exactly that - flush the tlb. And the mmu_gather structure shows you > how much you need to flush. If you see that "freed_tables" is set, > then you know that you need to also do the special instruction to > flush the inner level caches. The range continues to show the page > range. Yup. That looks like a generic version of the "need_flush_all" flag we have, which is fine by us. Just don't blame powerpc for all the historical crap :-) Cheers, Ben. > >Linus
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 10:11 PM Linus Torvalds wrote: > > So instead, when you get to the actual "tlb_flush(tlb)", you do > exactly that - flush the tlb. And the mmu_gather structure shows you > how much you need to flush. If you see that "freed_tables" is set, > then you know that you need to also do the special instruction to > flush the inner level caches. The range continues to show the page > range. Note that this obviously works fine for a hashed table model too - you just ignore the "freed_tables" bit entirely and continue to do whatever you always did. And we can ignore it on x86 too, because we just see the range, and we invalidate the range, and that will always invalidate the mid-level caching too. So the new bit is literally for arm and powerpc-radix (and maybe s390), but we want to make the actual VM interface truly generic and not have one set of code with five different behaviors (which we _currently_ kind of have with the whole in addition to all the HAVE_RCU_TABLE_FREE etc config options that modify how the code works. It would be good to also cut down on the millions of functions that each architecture can override, because Christ, it got very confusing at times to follow just how the code flowed from generic code to architecture-specific macros back to generic code and then arch-specific inline helper functions. It's a maze of underscores and "page" vs "table", and "flush" vs "remove" etc. But that "it would be good to really make everybody to use as much of the generic code as possible" and everybody have the same pattern, that's a future thing. But the whole "let's just add that "freed_tables" thing would be part of trying to get people to use the same overall pattern, even if some architectures might not care about that detail. Linus
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 10:11 PM Linus Torvalds wrote: > > So instead, when you get to the actual "tlb_flush(tlb)", you do > exactly that - flush the tlb. And the mmu_gather structure shows you > how much you need to flush. If you see that "freed_tables" is set, > then you know that you need to also do the special instruction to > flush the inner level caches. The range continues to show the page > range. Note that this obviously works fine for a hashed table model too - you just ignore the "freed_tables" bit entirely and continue to do whatever you always did. And we can ignore it on x86 too, because we just see the range, and we invalidate the range, and that will always invalidate the mid-level caching too. So the new bit is literally for arm and powerpc-radix (and maybe s390), but we want to make the actual VM interface truly generic and not have one set of code with five different behaviors (which we _currently_ kind of have with the whole in addition to all the HAVE_RCU_TABLE_FREE etc config options that modify how the code works. It would be good to also cut down on the millions of functions that each architecture can override, because Christ, it got very confusing at times to follow just how the code flowed from generic code to architecture-specific macros back to generic code and then arch-specific inline helper functions. It's a maze of underscores and "page" vs "table", and "flush" vs "remove" etc. But that "it would be good to really make everybody to use as much of the generic code as possible" and everybody have the same pattern, that's a future thing. But the whole "let's just add that "freed_tables" thing would be part of trying to get people to use the same overall pattern, even if some architectures might not care about that detail. Linus
Re: [PATCH 1/2] dt-bindings: dmaengine: add DT binding for UniPhier MIO DMAC
Hi Jassi, 2018-08-21 19:44 GMT+09:00 Jassi Brar : > On 21 August 2018 at 15:17, Masahiro Yamada > wrote: >> (+CC Rob, DT, LKML) >> >> I forgot to CC this to DT community... >> >> >> 2018-08-21 18:30 GMT+09:00 Masahiro Yamada : >>> The MIO DMAC (Media IO DMA Controller) is used in UniPhier LD4, >>> Pro4, and sLD8 SoCs. >>> >>> Signed-off-by: Masahiro Yamada >>> --- >>> >>> .../devicetree/bindings/dma/uniphier-mio-dmac.txt | 28 >>> ++ >>> 1 file changed, 28 insertions(+) >>> create mode 100644 >>> Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> >>> diff --git a/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> new file mode 100644 >>> index 000..a9e969e >>> --- /dev/null >>> +++ b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> @@ -0,0 +1,28 @@ >>> +UniPhier Media IO DMA controller >>> + >>> +This works as an external DMA engine for SD/eMMC controllers etc. >>> +found in UniPhier LD4, Pro4, sLD8 SoCs. >>> + >>> +Required properties: >>> +- compatible: should be "socionext,uniphier-mio-dmac". >>> +- reg: offset and length of the register set for the device. >>> +- interrupts: a list of interrupt specifiers associated with the DMA >>> channels. >>> +- clocks: a single clock specifier >>> +- #dma-cells: should be <1>. The single cell represents the channel number. >>> +- dma-channels: specify the number of the DMA channels. This should match >>> to >>> + the number of tuples in the interrupts property. >>> + > Can we not infer the number of channels from interrupt tuples? After > all the driver assumes they are same. It would be possible to count the number of tuples in "interrupts". I know of_irq_count(), but I do not see any driver in drivers/dma/ that calls it. I guess the reason is that of_irq_count() is not exported, so tristate drivers like this cannot use it. I checked Documentation/devicetree/bindings/dma/, and some controllers specify _redundant_ dma-channels property. fsl-mxs-dma.txt renesas,rcar-dmac.txt renesas,usb-dmac.txt I also see counter-implementation. bcm2835-dma.c hard-codes the number of channels in the driver. tegra210-adma.c associates nr_channels with compatible string. I will wait for comments from the maintainers. If desired, I will export of_irq_count() and use it from my driver. Thanks. -- Best Regards Masahiro Yamada
Re: [PATCH 1/2] dt-bindings: dmaengine: add DT binding for UniPhier MIO DMAC
Hi Jassi, 2018-08-21 19:44 GMT+09:00 Jassi Brar : > On 21 August 2018 at 15:17, Masahiro Yamada > wrote: >> (+CC Rob, DT, LKML) >> >> I forgot to CC this to DT community... >> >> >> 2018-08-21 18:30 GMT+09:00 Masahiro Yamada : >>> The MIO DMAC (Media IO DMA Controller) is used in UniPhier LD4, >>> Pro4, and sLD8 SoCs. >>> >>> Signed-off-by: Masahiro Yamada >>> --- >>> >>> .../devicetree/bindings/dma/uniphier-mio-dmac.txt | 28 >>> ++ >>> 1 file changed, 28 insertions(+) >>> create mode 100644 >>> Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> >>> diff --git a/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> new file mode 100644 >>> index 000..a9e969e >>> --- /dev/null >>> +++ b/Documentation/devicetree/bindings/dma/uniphier-mio-dmac.txt >>> @@ -0,0 +1,28 @@ >>> +UniPhier Media IO DMA controller >>> + >>> +This works as an external DMA engine for SD/eMMC controllers etc. >>> +found in UniPhier LD4, Pro4, sLD8 SoCs. >>> + >>> +Required properties: >>> +- compatible: should be "socionext,uniphier-mio-dmac". >>> +- reg: offset and length of the register set for the device. >>> +- interrupts: a list of interrupt specifiers associated with the DMA >>> channels. >>> +- clocks: a single clock specifier >>> +- #dma-cells: should be <1>. The single cell represents the channel number. >>> +- dma-channels: specify the number of the DMA channels. This should match >>> to >>> + the number of tuples in the interrupts property. >>> + > Can we not infer the number of channels from interrupt tuples? After > all the driver assumes they are same. It would be possible to count the number of tuples in "interrupts". I know of_irq_count(), but I do not see any driver in drivers/dma/ that calls it. I guess the reason is that of_irq_count() is not exported, so tristate drivers like this cannot use it. I checked Documentation/devicetree/bindings/dma/, and some controllers specify _redundant_ dma-channels property. fsl-mxs-dma.txt renesas,rcar-dmac.txt renesas,usb-dmac.txt I also see counter-implementation. bcm2835-dma.c hard-codes the number of channels in the driver. tegra210-adma.c associates nr_channels with compatible string. I will wait for comments from the maintainers. If desired, I will export of_irq_count() and use it from my driver. Thanks. -- Best Regards Masahiro Yamada
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, 22 Aug 2018 21:54:48 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 9:16 PM Nicholas Piggin wrote: > > > > On Wed, 22 Aug 2018 20:35:16 -0700 > > Linus Torvalds wrote: > > > > > > And yes, those lazy tlbs are all kernel threads, but they can still > > > speculatively load user addresses. > > > > So? > > > > If the arch does not shoot those all down after the user page tables > > are removed then it's buggy regardless of this short cut. > > Read the code. > > That shortcut frees the pages *WITHOUT* doing that TLB flush. I've read the code and I understand that. Hence I'm asking because I did't think the changelog matched the change. But possibly it was because I actually didn't read the changelog enough -- I guess it does say the TLB operation is the problem. I may have got side tracked by the word speculativee. > It just > does __tlb_remove_table(), which does *not* do that whole page > queueing so that we can batch flush and then free later. > > So the fast-case it's buggy, exactly because of the reason you state. Okay sure, thanks for confirming it for me. I would ask for changelog to be slightly expanded on, but maybe it's just my reading comprehension needs improvement... > > The only real problem I could see would be if a page walk cache still > > points to the freed table, then the table gets re-allocated and used > > elsewhere, and meanwhile a speculative access tries to load an entry > > from the page that is an invalid form of page table that might cause > > a machine check or something. That would be (u)arch specific, but if > > that's what we're concerned with here it's a different issue and needs > > to be documented as such. > > We've *seen* that case, we had exactly that when we were being > aggressive about trying to avoid flushes for the lazy mmu case > entirely, because "we can just flush when we activate the lazy mm > state instead". > > The logic was actually solid from a TLB case - who cares what garbage > TLB entries we might speculatively fill, when we're going to flush > them before they can possibly be used. > > It turns out that that logic is solid, but hits another problem: at > least some AMD CPU's will cause machine checks when the TLB entries > are inconsistent with the machine setup. That can't happen with a > *good* page table, but when you speculatively walk already freed > tables, you can get any garbage. Yeah that does make sense. > > I forget what the exact trigger was, but this was actually reported. > So you can't free page directory pages without flushing the tlb first > (to make that internal tlb node caches are flushed). > > So the logic for freeing leaf pages and freeing middle nodes has to be > exactly the same: you make the modification to the page table to > remove the node/leaf, you queue up the memory for freeing, you flush > the tlb, and you then free the page. > > That's the ordering that tlb_remove_page() has always honored, but > that tlb_remove_tabl() didn't. > > It honored it for the *normal* case, which is why it took so long to > notice that the TLB shootdown had been broken on x86 when it moved to > the "generic" code. The *normal* case does this all right, and batches > things up, and then when the batch fills up it does a > tlb_table_flush() which does the TLB flush and schedules the actual > freeing. > > But there were two cases that *didn't* do that. The special "I'm the > only thread" fast case, and the "oops I ran out of memory, so now I'll > fake it, and just synchronize with page twalkers manually, and then do > that special direct remove without flushing the tlb". > > NOTE! Jann triggered that one by > (a) forcing the machine low on memory > (b) force-poisoning the page tables immediately after free > > I suspect it's really hard to trigger under normal loads, exactly > because the *normal* case gets it right. It's literally the "oops, I > can't batch up because I ran out of memory" case that gets it wrong. > > (And the special single-thread + lazy or use_mm() case, but that's > going to be entirely impossible to trigger, because in practice it's > just a single thread, and you won't ever hit the magic timing needed > that frees the page in the single thread at exactly the same time that > some optimistic lazy mm on another cpu happens to speculatively load > that address). > > So the "atomic_read(_users)" case is likely entirely impossible to > trigger any sane way. But because Jann found the problem with the 'ran > out of memory" case, we started looking at the more theoretical cases > that matched the same kind of "no tlb flush before free" pattern. Thanks for giving that background. In that case I'm happy with this fix. Thanks, Nick
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, 22 Aug 2018 21:54:48 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 9:16 PM Nicholas Piggin wrote: > > > > On Wed, 22 Aug 2018 20:35:16 -0700 > > Linus Torvalds wrote: > > > > > > And yes, those lazy tlbs are all kernel threads, but they can still > > > speculatively load user addresses. > > > > So? > > > > If the arch does not shoot those all down after the user page tables > > are removed then it's buggy regardless of this short cut. > > Read the code. > > That shortcut frees the pages *WITHOUT* doing that TLB flush. I've read the code and I understand that. Hence I'm asking because I did't think the changelog matched the change. But possibly it was because I actually didn't read the changelog enough -- I guess it does say the TLB operation is the problem. I may have got side tracked by the word speculativee. > It just > does __tlb_remove_table(), which does *not* do that whole page > queueing so that we can batch flush and then free later. > > So the fast-case it's buggy, exactly because of the reason you state. Okay sure, thanks for confirming it for me. I would ask for changelog to be slightly expanded on, but maybe it's just my reading comprehension needs improvement... > > The only real problem I could see would be if a page walk cache still > > points to the freed table, then the table gets re-allocated and used > > elsewhere, and meanwhile a speculative access tries to load an entry > > from the page that is an invalid form of page table that might cause > > a machine check or something. That would be (u)arch specific, but if > > that's what we're concerned with here it's a different issue and needs > > to be documented as such. > > We've *seen* that case, we had exactly that when we were being > aggressive about trying to avoid flushes for the lazy mmu case > entirely, because "we can just flush when we activate the lazy mm > state instead". > > The logic was actually solid from a TLB case - who cares what garbage > TLB entries we might speculatively fill, when we're going to flush > them before they can possibly be used. > > It turns out that that logic is solid, but hits another problem: at > least some AMD CPU's will cause machine checks when the TLB entries > are inconsistent with the machine setup. That can't happen with a > *good* page table, but when you speculatively walk already freed > tables, you can get any garbage. Yeah that does make sense. > > I forget what the exact trigger was, but this was actually reported. > So you can't free page directory pages without flushing the tlb first > (to make that internal tlb node caches are flushed). > > So the logic for freeing leaf pages and freeing middle nodes has to be > exactly the same: you make the modification to the page table to > remove the node/leaf, you queue up the memory for freeing, you flush > the tlb, and you then free the page. > > That's the ordering that tlb_remove_page() has always honored, but > that tlb_remove_tabl() didn't. > > It honored it for the *normal* case, which is why it took so long to > notice that the TLB shootdown had been broken on x86 when it moved to > the "generic" code. The *normal* case does this all right, and batches > things up, and then when the batch fills up it does a > tlb_table_flush() which does the TLB flush and schedules the actual > freeing. > > But there were two cases that *didn't* do that. The special "I'm the > only thread" fast case, and the "oops I ran out of memory, so now I'll > fake it, and just synchronize with page twalkers manually, and then do > that special direct remove without flushing the tlb". > > NOTE! Jann triggered that one by > (a) forcing the machine low on memory > (b) force-poisoning the page tables immediately after free > > I suspect it's really hard to trigger under normal loads, exactly > because the *normal* case gets it right. It's literally the "oops, I > can't batch up because I ran out of memory" case that gets it wrong. > > (And the special single-thread + lazy or use_mm() case, but that's > going to be entirely impossible to trigger, because in practice it's > just a single thread, and you won't ever hit the magic timing needed > that frees the page in the single thread at exactly the same time that > some optimistic lazy mm on another cpu happens to speculatively load > that address). > > So the "atomic_read(_users)" case is likely entirely impossible to > trigger any sane way. But because Jann found the problem with the 'ran > out of memory" case, we started looking at the more theoretical cases > that matched the same kind of "no tlb flush before free" pattern. Thanks for giving that background. In that case I'm happy with this fix. Thanks, Nick
[PATCH] mm: respect arch_dup_mmap() return value
Commit d70f2a14b72a4 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") ignored the return value of arch_dup_mmap(). As a result, on x86, a failure to duplicate the LDT (e.g., due to memory allocation error), would leave the duplicated memory mapping in an inconsistent state. Fix by regarding the return value, as it was before the change. Fixes: d70f2a14b72a4 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") Cc: Andrew Morton Cc: sta...@vger.kernel.org Signed-off-by: Nadav Amit --- kernel/fork.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 1b27babc4c78..4527d1d331de 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -549,8 +549,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, goto out; } /* a new mm has just been created */ - arch_dup_mmap(oldmm, mm); - retval = 0; + retval = arch_dup_mmap(oldmm, mm); out: up_write(>mmap_sem); flush_tlb_mm(oldmm); -- 2.17.1
[PATCH] mm: respect arch_dup_mmap() return value
Commit d70f2a14b72a4 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") ignored the return value of arch_dup_mmap(). As a result, on x86, a failure to duplicate the LDT (e.g., due to memory allocation error), would leave the duplicated memory mapping in an inconsistent state. Fix by regarding the return value, as it was before the change. Fixes: d70f2a14b72a4 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") Cc: Andrew Morton Cc: sta...@vger.kernel.org Signed-off-by: Nadav Amit --- kernel/fork.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 1b27babc4c78..4527d1d331de 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -549,8 +549,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, goto out; } /* a new mm has just been created */ - arch_dup_mmap(oldmm, mm); - retval = 0; + retval = arch_dup_mmap(oldmm, mm); out: up_write(>mmap_sem); flush_tlb_mm(oldmm); -- 2.17.1
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 9:54 PM Benjamin Herrenschmidt wrote: > > > So we do need a different flush instruction for the page tables vs. the > normal TLB pages. Right. ARM wants it too. x86 is odd in that a regular "invlpg" already invalidates all the internal tlb cache nodes. So the "new world order" is exactly that patch that PeterZ sent you, that adds a + unsigned intfreed_tables : 1; to the 'struct mmu_gather', and then makes all those pte/pmd/pud/p4d_free_tlb() functions set that bit. So I'm referring to the email PeterZ sent you in this thread that said: Nick, Will is already looking at using this to remove the synchronous invalidation from __p*_free_tlb() for ARM, could you have a look to see if PowerPC-radix could benefit from that too? Basically, using a patch like the below, would give your tlb_flush() information on if tables were removed or not. then, in that model, you do *not* need to override these pte/pmd/pud/p4d_free_tlb() macros at all (well, you *can* if you want to, for doing games with the range modification, but let's sayt that you don't need that right now). So instead, when you get to the actual "tlb_flush(tlb)", you do exactly that - flush the tlb. And the mmu_gather structure shows you how much you need to flush. If you see that "freed_tables" is set, then you know that you need to also do the special instruction to flush the inner level caches. The range continues to show the page range. Linus
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 9:54 PM Benjamin Herrenschmidt wrote: > > > So we do need a different flush instruction for the page tables vs. the > normal TLB pages. Right. ARM wants it too. x86 is odd in that a regular "invlpg" already invalidates all the internal tlb cache nodes. So the "new world order" is exactly that patch that PeterZ sent you, that adds a + unsigned intfreed_tables : 1; to the 'struct mmu_gather', and then makes all those pte/pmd/pud/p4d_free_tlb() functions set that bit. So I'm referring to the email PeterZ sent you in this thread that said: Nick, Will is already looking at using this to remove the synchronous invalidation from __p*_free_tlb() for ARM, could you have a look to see if PowerPC-radix could benefit from that too? Basically, using a patch like the below, would give your tlb_flush() information on if tables were removed or not. then, in that model, you do *not* need to override these pte/pmd/pud/p4d_free_tlb() macros at all (well, you *can* if you want to, for doing games with the range modification, but let's sayt that you don't need that right now). So instead, when you get to the actual "tlb_flush(tlb)", you do exactly that - flush the tlb. And the mmu_gather structure shows you how much you need to flush. If you see that "freed_tables" is set, then you know that you need to also do the special instruction to flush the inner level caches. The range continues to show the page range. Linus
Re: [PATCH] iscsi-target: Don't use stack buffer for scatterlist
ccing Maurizio because he was working on the same issue. On 08/22/2018 12:37 PM, Laura Abbott wrote: > Fedora got a bug report of a crash with iSCSI: > > kernel BUG at include/linux/scatterlist.h:143! > ... > RIP: 0010:iscsit_do_crypto_hash_buf+0x154/0x180 [iscsi_target_mod] > ... > Call Trace: > ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod] > iscsit_get_rx_pdu+0x4cd/0xa90 [iscsi_target_mod] > ? native_sched_clock+0x3e/0xa0 > ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod] > iscsi_target_rx_thread+0x81/0xf0 [iscsi_target_mod] > kthread+0x120/0x140 > ? kthread_create_worker_on_cpu+0x70/0x70 > ret_from_fork+0x3a/0x50 > > This is a BUG_ON for using a stack buffer with a scatterlist. > There are two cases that trigger this bug. Switch to using a > dynamically allocated buffer for one case and do not assign > a NULL buffer in another case. > > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258 > Signed-off-by: Laura Abbott > --- > drivers/target/iscsi/iscsi_target.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target.c > b/drivers/target/iscsi/iscsi_target.c > index 8e223799347a..8b40f0e99e2c 100644 > --- a/drivers/target/iscsi/iscsi_target.c > +++ b/drivers/target/iscsi/iscsi_target.c > @@ -1419,7 +1419,8 @@ static void iscsit_do_crypto_hash_buf(struct > ahash_request *hash, > > sg_init_table(sg, ARRAY_SIZE(sg)); > sg_set_buf(sg, buf, payload_length); > - sg_set_buf(sg + 1, pad_bytes, padding); > + if (padding) > + sg_set_buf(sg + 1, pad_bytes, padding); > > ahash_request_set_crypt(hash, sg, data_crc, payload_length + padding); > > @@ -3913,10 +3914,14 @@ static bool iscsi_target_check_conn_state(struct > iscsi_conn *conn) > static void iscsit_get_rx_pdu(struct iscsi_conn *conn) > { > int ret; > - u8 buffer[ISCSI_HDR_LEN], opcode; > + u8 *buffer, opcode; > u32 checksum = 0, digest = 0; > struct kvec iov; > > + buffer = kmalloc_array(ISCSI_HDR_LEN, sizeof(*buffer), GFP_KERNEL); > + if (!buffer) > + return; > + > while (!kthread_should_stop()) { > /* >* Ensure that both TX and RX per connection kthreads > @@ -3985,6 +3990,8 @@ static void iscsit_get_rx_pdu(struct iscsi_conn *conn) > if (ret < 0) > return; > } You need to also change all the returns to breaks between the kmalloc above and kfree here. > + kfree(buffer); > } > > int iscsi_target_rx_thread(void *arg) >
Re: [PATCH] iscsi-target: Don't use stack buffer for scatterlist
ccing Maurizio because he was working on the same issue. On 08/22/2018 12:37 PM, Laura Abbott wrote: > Fedora got a bug report of a crash with iSCSI: > > kernel BUG at include/linux/scatterlist.h:143! > ... > RIP: 0010:iscsit_do_crypto_hash_buf+0x154/0x180 [iscsi_target_mod] > ... > Call Trace: > ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod] > iscsit_get_rx_pdu+0x4cd/0xa90 [iscsi_target_mod] > ? native_sched_clock+0x3e/0xa0 > ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod] > iscsi_target_rx_thread+0x81/0xf0 [iscsi_target_mod] > kthread+0x120/0x140 > ? kthread_create_worker_on_cpu+0x70/0x70 > ret_from_fork+0x3a/0x50 > > This is a BUG_ON for using a stack buffer with a scatterlist. > There are two cases that trigger this bug. Switch to using a > dynamically allocated buffer for one case and do not assign > a NULL buffer in another case. > > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258 > Signed-off-by: Laura Abbott > --- > drivers/target/iscsi/iscsi_target.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target.c > b/drivers/target/iscsi/iscsi_target.c > index 8e223799347a..8b40f0e99e2c 100644 > --- a/drivers/target/iscsi/iscsi_target.c > +++ b/drivers/target/iscsi/iscsi_target.c > @@ -1419,7 +1419,8 @@ static void iscsit_do_crypto_hash_buf(struct > ahash_request *hash, > > sg_init_table(sg, ARRAY_SIZE(sg)); > sg_set_buf(sg, buf, payload_length); > - sg_set_buf(sg + 1, pad_bytes, padding); > + if (padding) > + sg_set_buf(sg + 1, pad_bytes, padding); > > ahash_request_set_crypt(hash, sg, data_crc, payload_length + padding); > > @@ -3913,10 +3914,14 @@ static bool iscsi_target_check_conn_state(struct > iscsi_conn *conn) > static void iscsit_get_rx_pdu(struct iscsi_conn *conn) > { > int ret; > - u8 buffer[ISCSI_HDR_LEN], opcode; > + u8 *buffer, opcode; > u32 checksum = 0, digest = 0; > struct kvec iov; > > + buffer = kmalloc_array(ISCSI_HDR_LEN, sizeof(*buffer), GFP_KERNEL); > + if (!buffer) > + return; > + > while (!kthread_should_stop()) { > /* >* Ensure that both TX and RX per connection kthreads > @@ -3985,6 +3990,8 @@ static void iscsit_get_rx_pdu(struct iscsi_conn *conn) > if (ret < 0) > return; > } You need to also change all the returns to breaks between the kmalloc above and kfree here. > + kfree(buffer); > } > > int iscsi_target_rx_thread(void *arg) >
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 9:33 PM Nicholas Piggin wrote: > > I think it was quite well understood and fixed here, a145abf12c9 but > again that was before I really started looking at it. You don't understand the problem. All the x86 people thought WE ALREADY DID THAT. Because we had done this all correctly over a decade ago! Nobody realized that it had been screwed up by the powerpc code, and the commit you point to was believed to be a new *powerpc* only issue, because the semantics on powerpc has changed because of the radix tree. The semantics on x86 have never changed, they've always been the same. So why would the x86 people react to powerpc doing something that x86 had already always done. See? Nobody cared one whit about commit a145abf12c9, because it just handles a new powerpc-specific case. > I don't really understand what the issue you have with powerpc here. > powerpc hash has the page table flushing accessors which are just > no-ops, it's the generic code that fails to call them properly. Surely > there was no powerpc patch that removed those calls from generic code? Yes there was. Look where the generic code *came* from. It's powerpc code. See commit 267239116987 ("mm, powerpc: move the RCU page-table freeing into generic code"). The powerpc code was made into the generic code, because the powerpc code had to handle all those special RCU freeing things etc that others didn't. It's just that when x86 was then switched over to use the "generic" code, people didn't realize that the generic code didn't do the TLB invalidations for page tables, because they hadn't been needed on powerpc. So the powerpc code that was made generic, never really was. The new "generic" code had a powerpc-specific quirk. That then very subtly broke x86, without the x86 people ever realizing. Because the old simple non-RCU x86 code had never had that issue, it just treated the leaf pages and the directory pages exactly the same. See? And THAT is why I talk about the powerpc code. Because what is "generic" code in 4.18 (and several releases before) oisn't actually generic. And that's basically exactly the bug that the patches from PeterZ is fixing. Making the "tlb_remove_table()" code always flush the tlb, the way it should have when it was made generic. Linus
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 9:33 PM Nicholas Piggin wrote: > > I think it was quite well understood and fixed here, a145abf12c9 but > again that was before I really started looking at it. You don't understand the problem. All the x86 people thought WE ALREADY DID THAT. Because we had done this all correctly over a decade ago! Nobody realized that it had been screwed up by the powerpc code, and the commit you point to was believed to be a new *powerpc* only issue, because the semantics on powerpc has changed because of the radix tree. The semantics on x86 have never changed, they've always been the same. So why would the x86 people react to powerpc doing something that x86 had already always done. See? Nobody cared one whit about commit a145abf12c9, because it just handles a new powerpc-specific case. > I don't really understand what the issue you have with powerpc here. > powerpc hash has the page table flushing accessors which are just > no-ops, it's the generic code that fails to call them properly. Surely > there was no powerpc patch that removed those calls from generic code? Yes there was. Look where the generic code *came* from. It's powerpc code. See commit 267239116987 ("mm, powerpc: move the RCU page-table freeing into generic code"). The powerpc code was made into the generic code, because the powerpc code had to handle all those special RCU freeing things etc that others didn't. It's just that when x86 was then switched over to use the "generic" code, people didn't realize that the generic code didn't do the TLB invalidations for page tables, because they hadn't been needed on powerpc. So the powerpc code that was made generic, never really was. The new "generic" code had a powerpc-specific quirk. That then very subtly broke x86, without the x86 people ever realizing. Because the old simple non-RCU x86 code had never had that issue, it just treated the leaf pages and the directory pages exactly the same. See? And THAT is why I talk about the powerpc code. Because what is "generic" code in 4.18 (and several releases before) oisn't actually generic. And that's basically exactly the bug that the patches from PeterZ is fixing. Making the "tlb_remove_table()" code always flush the tlb, the way it should have when it was made generic. Linus
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, Aug 22, 2018 at 9:16 PM Nicholas Piggin wrote: > > On Wed, 22 Aug 2018 20:35:16 -0700 > Linus Torvalds wrote: > > > > And yes, those lazy tlbs are all kernel threads, but they can still > > speculatively load user addresses. > > So? > > If the arch does not shoot those all down after the user page tables > are removed then it's buggy regardless of this short cut. Read the code. That shortcut frees the pages *WITHOUT* doing that TLB flush. It just does __tlb_remove_table(), which does *not* do that whole page queueing so that we can batch flush and then free later. So the fast-case it's buggy, exactly because of the reason you state. The logic used to be that if you were the only cpu that had that tlb, and it was a mid-level table, it didn't need that synchronization at all. And that logic is simply wrong. Exactly because even if mm_users is 1, there can be things looking up TLB entries on other CPU's. Either because of a lazy mm and a hw walker with speculation, or because of use_mm() and a software walker. So the whole "free immediately" shortcut was bogus. You *always* need to queue, then flush the tlb, and then free. That said, that wasn't actually the bug that Jann found. He found the bug a few lines lower down, where the "I can't allocate memory for queueing" ended up *also* not flushing the TLB. > The only real problem I could see would be if a page walk cache still > points to the freed table, then the table gets re-allocated and used > elsewhere, and meanwhile a speculative access tries to load an entry > from the page that is an invalid form of page table that might cause > a machine check or something. That would be (u)arch specific, but if > that's what we're concerned with here it's a different issue and needs > to be documented as such. We've *seen* that case, we had exactly that when we were being aggressive about trying to avoid flushes for the lazy mmu case entirely, because "we can just flush when we activate the lazy mm state instead". The logic was actually solid from a TLB case - who cares what garbage TLB entries we might speculatively fill, when we're going to flush them before they can possibly be used. It turns out that that logic is solid, but hits another problem: at least some AMD CPU's will cause machine checks when the TLB entries are inconsistent with the machine setup. That can't happen with a *good* page table, but when you speculatively walk already freed tables, you can get any garbage. I forget what the exact trigger was, but this was actually reported. So you can't free page directory pages without flushing the tlb first (to make that internal tlb node caches are flushed). So the logic for freeing leaf pages and freeing middle nodes has to be exactly the same: you make the modification to the page table to remove the node/leaf, you queue up the memory for freeing, you flush the tlb, and you then free the page. That's the ordering that tlb_remove_page() has always honored, but that tlb_remove_tabl() didn't. It honored it for the *normal* case, which is why it took so long to notice that the TLB shootdown had been broken on x86 when it moved to the "generic" code. The *normal* case does this all right, and batches things up, and then when the batch fills up it does a tlb_table_flush() which does the TLB flush and schedules the actual freeing. But there were two cases that *didn't* do that. The special "I'm the only thread" fast case, and the "oops I ran out of memory, so now I'll fake it, and just synchronize with page twalkers manually, and then do that special direct remove without flushing the tlb". NOTE! Jann triggered that one by (a) forcing the machine low on memory (b) force-poisoning the page tables immediately after free I suspect it's really hard to trigger under normal loads, exactly because the *normal* case gets it right. It's literally the "oops, I can't batch up because I ran out of memory" case that gets it wrong. (And the special single-thread + lazy or use_mm() case, but that's going to be entirely impossible to trigger, because in practice it's just a single thread, and you won't ever hit the magic timing needed that frees the page in the single thread at exactly the same time that some optimistic lazy mm on another cpu happens to speculatively load that address). So the "atomic_read(_users)" case is likely entirely impossible to trigger any sane way. But because Jann found the problem with the 'ran out of memory" case, we started looking at the more theoretical cases that matched the same kind of "no tlb flush before free" pattern. Linus
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, Aug 22, 2018 at 9:16 PM Nicholas Piggin wrote: > > On Wed, 22 Aug 2018 20:35:16 -0700 > Linus Torvalds wrote: > > > > And yes, those lazy tlbs are all kernel threads, but they can still > > speculatively load user addresses. > > So? > > If the arch does not shoot those all down after the user page tables > are removed then it's buggy regardless of this short cut. Read the code. That shortcut frees the pages *WITHOUT* doing that TLB flush. It just does __tlb_remove_table(), which does *not* do that whole page queueing so that we can batch flush and then free later. So the fast-case it's buggy, exactly because of the reason you state. The logic used to be that if you were the only cpu that had that tlb, and it was a mid-level table, it didn't need that synchronization at all. And that logic is simply wrong. Exactly because even if mm_users is 1, there can be things looking up TLB entries on other CPU's. Either because of a lazy mm and a hw walker with speculation, or because of use_mm() and a software walker. So the whole "free immediately" shortcut was bogus. You *always* need to queue, then flush the tlb, and then free. That said, that wasn't actually the bug that Jann found. He found the bug a few lines lower down, where the "I can't allocate memory for queueing" ended up *also* not flushing the TLB. > The only real problem I could see would be if a page walk cache still > points to the freed table, then the table gets re-allocated and used > elsewhere, and meanwhile a speculative access tries to load an entry > from the page that is an invalid form of page table that might cause > a machine check or something. That would be (u)arch specific, but if > that's what we're concerned with here it's a different issue and needs > to be documented as such. We've *seen* that case, we had exactly that when we were being aggressive about trying to avoid flushes for the lazy mmu case entirely, because "we can just flush when we activate the lazy mm state instead". The logic was actually solid from a TLB case - who cares what garbage TLB entries we might speculatively fill, when we're going to flush them before they can possibly be used. It turns out that that logic is solid, but hits another problem: at least some AMD CPU's will cause machine checks when the TLB entries are inconsistent with the machine setup. That can't happen with a *good* page table, but when you speculatively walk already freed tables, you can get any garbage. I forget what the exact trigger was, but this was actually reported. So you can't free page directory pages without flushing the tlb first (to make that internal tlb node caches are flushed). So the logic for freeing leaf pages and freeing middle nodes has to be exactly the same: you make the modification to the page table to remove the node/leaf, you queue up the memory for freeing, you flush the tlb, and you then free the page. That's the ordering that tlb_remove_page() has always honored, but that tlb_remove_tabl() didn't. It honored it for the *normal* case, which is why it took so long to notice that the TLB shootdown had been broken on x86 when it moved to the "generic" code. The *normal* case does this all right, and batches things up, and then when the batch fills up it does a tlb_table_flush() which does the TLB flush and schedules the actual freeing. But there were two cases that *didn't* do that. The special "I'm the only thread" fast case, and the "oops I ran out of memory, so now I'll fake it, and just synchronize with page twalkers manually, and then do that special direct remove without flushing the tlb". NOTE! Jann triggered that one by (a) forcing the machine low on memory (b) force-poisoning the page tables immediately after free I suspect it's really hard to trigger under normal loads, exactly because the *normal* case gets it right. It's literally the "oops, I can't batch up because I ran out of memory" case that gets it wrong. (And the special single-thread + lazy or use_mm() case, but that's going to be entirely impossible to trigger, because in practice it's just a single thread, and you won't ever hit the magic timing needed that frees the page in the single thread at exactly the same time that some optimistic lazy mm on another cpu happens to speculatively load that address). So the "atomic_read(_users)" case is likely entirely impossible to trigger any sane way. But because Jann found the problem with the 'ran out of memory" case, we started looking at the more theoretical cases that matched the same kind of "no tlb flush before free" pattern. Linus
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 2018-08-22 at 20:59 -0700, Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 8:45 PM Nicholas Piggin wrote: > > > > powerpc/radix has no such issue, it already does this tracking. > > Yeah, I now realize that this was why you wanted to add that hacky > thing to the generic code, so that you can add the tlb_flush_pgtable() > call. > > I thought it was because powerpc had some special flush instruction > for it, and the regular tlb flush didn't do it. But no. It was because > the regular code had lost the tlb flush _entirely_, because powerpc > didn't want it. Heh :-) Well, back on hash we didn't (we do now with Radix) but I wouldn't blame us for the generic code being broken ... the RCU table freeing was in arch/powerpc at the time :-) I don't think it was us making it generic :) > > We were discussing this a couple of months ago, I wasn't aware of ARM's > > issue but I suggested x86 could go the same way as powerpc. > > The problem is that x86 _used_ to do this all correctly long long ago. > > And then we switched over to the "generic" table flushing (which > harkens back to the powerpc code). Yes, we wrote it the RCU stuff to solve the races with SW walking, which is completely orthogonal with HW walking & TLB content. We didn't do the move to generic code though ;-) > Which actually turned out to be not generic at all, and did not flush > the internal pages like x86 used to (back when x86 just used > tlb_remove_page for everything). Well, having RCU do the flushing is rather generic, it makes sense whenever there's somebody doing a SW walk *and* you don't have IPIs to synchronize your flushes (ie, anybody with HW TLB invalidation broadcast basically, so ARM and us). > So as a result, x86 had unintentionally lost the TLB flush we used to > have, because tlb_remove_table() had lost the tlb flushing because of > a powerpc quirk. This is a somewhat odd way of putting the "blame" :-) But yeah ok... > You then added it back as a hacky per-architecture hook (apparently > having realized that you never did it at all), which didn't fix the > unintentional lack of flushing on x86. > > So now we're going to do it right. No more "oh, powerpc didn't need > to flush because the hash tables weren't in the tlb at all" thing in > the generic code that then others need to work around. So we do need a different flush instruction for the page tables vs. the normal TLB pages. Cheers, Ben.
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 2018-08-22 at 20:59 -0700, Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 8:45 PM Nicholas Piggin wrote: > > > > powerpc/radix has no such issue, it already does this tracking. > > Yeah, I now realize that this was why you wanted to add that hacky > thing to the generic code, so that you can add the tlb_flush_pgtable() > call. > > I thought it was because powerpc had some special flush instruction > for it, and the regular tlb flush didn't do it. But no. It was because > the regular code had lost the tlb flush _entirely_, because powerpc > didn't want it. Heh :-) Well, back on hash we didn't (we do now with Radix) but I wouldn't blame us for the generic code being broken ... the RCU table freeing was in arch/powerpc at the time :-) I don't think it was us making it generic :) > > We were discussing this a couple of months ago, I wasn't aware of ARM's > > issue but I suggested x86 could go the same way as powerpc. > > The problem is that x86 _used_ to do this all correctly long long ago. > > And then we switched over to the "generic" table flushing (which > harkens back to the powerpc code). Yes, we wrote it the RCU stuff to solve the races with SW walking, which is completely orthogonal with HW walking & TLB content. We didn't do the move to generic code though ;-) > Which actually turned out to be not generic at all, and did not flush > the internal pages like x86 used to (back when x86 just used > tlb_remove_page for everything). Well, having RCU do the flushing is rather generic, it makes sense whenever there's somebody doing a SW walk *and* you don't have IPIs to synchronize your flushes (ie, anybody with HW TLB invalidation broadcast basically, so ARM and us). > So as a result, x86 had unintentionally lost the TLB flush we used to > have, because tlb_remove_table() had lost the tlb flushing because of > a powerpc quirk. This is a somewhat odd way of putting the "blame" :-) But yeah ok... > You then added it back as a hacky per-architecture hook (apparently > having realized that you never did it at all), which didn't fix the > unintentional lack of flushing on x86. > > So now we're going to do it right. No more "oh, powerpc didn't need > to flush because the hash tables weren't in the tlb at all" thing in > the generic code that then others need to work around. So we do need a different flush instruction for the page tables vs. the normal TLB pages. Cheers, Ben.
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 22 Aug 2018 20:59:46 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 8:45 PM Nicholas Piggin wrote: > > > > powerpc/radix has no such issue, it already does this tracking. > > Yeah, I now realize that this was why you wanted to add that hacky > thing to the generic code, so that you can add the tlb_flush_pgtable() > call. > > I thought it was because powerpc had some special flush instruction > for it, and the regular tlb flush didn't do it. But no. powerpc/radix does have a special instruction for it, that is why I posted the patch :) > It was because > the regular code had lost the tlb flush _entirely_, because powerpc > didn't want it. I think that was long before I started looking at the code. powerpc/hash hardware has no idea about the page tables so yeah they don't need it. > > > We were discussing this a couple of months ago, I wasn't aware of ARM's > > issue but I suggested x86 could go the same way as powerpc. > > The problem is that x86 _used_ to do this all correctly long long ago. > > And then we switched over to the "generic" table flushing (which > harkens back to the powerpc code). > > Which actually turned out to be not generic at all, and did not flush > the internal pages like x86 used to (back when x86 just used > tlb_remove_page for everything). > > So as a result, x86 had unintentionally lost the TLB flush we used to > have, because tlb_remove_table() had lost the tlb flushing because of > a powerpc quirk. > > You then added it back as a hacky per-architecture hook (apparently > having realized that you never did it at all), which didn't fix the I think it was quite well understood and fixed here, a145abf12c9 but again that was before I really started looking at it. The hooks I added recently are for a different reason, and it's actaully the opposite problem -- to work around the hacky generic code that x86 foisted on other archs. > unintentional lack of flushing on x86. > > So now we're going to do it right. No more "oh, powerpc didn't need > to flush because the hash tables weren't in the tlb at all" thing in > the generic code that then others need to work around. I don't really understand what the issue you have with powerpc here. powerpc hash has the page table flushing accessors which are just no-ops, it's the generic code that fails to call them properly. Surely there was no powerpc patch that removed those calls from generic code? powerpc/radix yes it does some arch specific things to do its page walk cache flushing, but it is a better design than the hacks x86 has in generic code, surely. I thought you basically agreed and thought x86 / generic code could move to that kind of model. Thanks, Nick
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 22 Aug 2018 20:59:46 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 8:45 PM Nicholas Piggin wrote: > > > > powerpc/radix has no such issue, it already does this tracking. > > Yeah, I now realize that this was why you wanted to add that hacky > thing to the generic code, so that you can add the tlb_flush_pgtable() > call. > > I thought it was because powerpc had some special flush instruction > for it, and the regular tlb flush didn't do it. But no. powerpc/radix does have a special instruction for it, that is why I posted the patch :) > It was because > the regular code had lost the tlb flush _entirely_, because powerpc > didn't want it. I think that was long before I started looking at the code. powerpc/hash hardware has no idea about the page tables so yeah they don't need it. > > > We were discussing this a couple of months ago, I wasn't aware of ARM's > > issue but I suggested x86 could go the same way as powerpc. > > The problem is that x86 _used_ to do this all correctly long long ago. > > And then we switched over to the "generic" table flushing (which > harkens back to the powerpc code). > > Which actually turned out to be not generic at all, and did not flush > the internal pages like x86 used to (back when x86 just used > tlb_remove_page for everything). > > So as a result, x86 had unintentionally lost the TLB flush we used to > have, because tlb_remove_table() had lost the tlb flushing because of > a powerpc quirk. > > You then added it back as a hacky per-architecture hook (apparently > having realized that you never did it at all), which didn't fix the I think it was quite well understood and fixed here, a145abf12c9 but again that was before I really started looking at it. The hooks I added recently are for a different reason, and it's actaully the opposite problem -- to work around the hacky generic code that x86 foisted on other archs. > unintentional lack of flushing on x86. > > So now we're going to do it right. No more "oh, powerpc didn't need > to flush because the hash tables weren't in the tlb at all" thing in > the generic code that then others need to work around. I don't really understand what the issue you have with powerpc here. powerpc hash has the page table flushing accessors which are just no-ops, it's the generic code that fails to call them properly. Surely there was no powerpc patch that removed those calls from generic code? powerpc/radix yes it does some arch specific things to do its page walk cache flushing, but it is a better design than the hacks x86 has in generic code, surely. I thought you basically agreed and thought x86 / generic code could move to that kind of model. Thanks, Nick
[GIT PULL 4/4] ARM: Device-tree updates
Business as usual -- the bulk of our changes are to devicetree files with new hardware support, new SoCs and platforms, and new board types. New SoCs/platforms: - Raspberry Pi Compute Module (CM1) and IO board - i.MX6SSL from NXP - Renesas RZ/N1D SoC (R9A06G032), Dual Cortex-A7 with Ethernet, CAN and PLC interfaces - TI AM654 SoC, Quad Cortex-A53, safety subsystem with Cortex-R5 controllers, communication and PRU subsystem and lots of other interfaces (PCIe, USB3, etc). New boards and systems: - Several Atmel at91-based boards from Laird - Marvell Armada388-based Helios4 board from SolidRun - Samsung Aires-based phones (s5pv210) - Allwinner A64-based Pinebook laptop In addition to the above, there's the usual amount of new devices described on existing platforms, fixes and tweaks and new minor variants of boards/platforms. The following changes since commit a8bba4bb81f33b520f2e9bee78cf2e845e16c9c3: Merge tag 'armsoc-defconfig' into HEAD are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-dt for you to fetch changes up to afd3e3dad6761ddf08119afe121bfbe096c0844b: Merge tag 'qcom-arm64-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into next/dt Adam Ford (6): ARM: dts: da850-evm: Enable SATA port ARM: dts: da850-evm: Enable LCD and backlight ARM: dts: am3517-evm: Remove unnessary PMIC parameters ARM: dts: am3517-som: Add WL127x Wifi ARM: dts: am3517-som: Add builtin Bluetooth ARM: dts: am3517-evm: Enable USB1 Host Alex Gonzalez (2): ARM: dts: imx6ul: Add DTS for ConnectCore 6UL System-On-Module (SOM) ARM: dts: imx6ul: Add DTS for ConnectCore 6UL SBC Express Alexander Kurz (6): ARM: dts: imx6qdl-wandboard: remove regulators bus node ARM: dts: imx6dl-riotboard: remove regulators bus node ARM: dts: imx6qdl-wandboard: enable USB OTG ARM: dts: imx6dl-riotboard: fix OTG regulator polarity ARM: dts: imx50: fix KPP pin definition typos ARM: dts: imx50: remove non-existent USB instances Alexandre Belloni (4): dt-bindings: arm: remove PMC bindings dt-bindings: clk: at91: Document all the PMC compatibles ARM: at91: fix USB clock detection handling ARM: dts: fix PMC compatible Alexandre Torgue (2): ARM: dts: stm32: remove gpio aliases for stm32mp157c ARM: dts: stm32: Reorder nodes in stm32mp157c-ed1 Amelie Delaunay (9): ARM: dts: stm32: update rtc st,syscfg property on stm32f429 ARM: dts: stm32: update rtc st,syscfg property on stm32f746 ARM: dts: stm32: add RTC support to stm32mp157c ARM: dts: stm32: enable RTC on stm32mp157c-ed1 ARM: dts: stm32: enable USB Host (USBH) EHCI controller on stm32mp157c-ev1 ARM: dts: stm32: add USB OTG HS support for stm32mp157c SoC ARM: dts: stm32: enable USB OTG HS on stm32mp157c-ev1 ARM: dts: stm32: add SPI support on stm32mp157c ARM: dts: stm32: add SPI1 support on stm32mp157c-ev1 Amit Kucheria (2): arm64: dts: msm8996: thermal: Initialise via DT and add second controller arm64: dts: sdm845: Add tsens nodes Anand Moon (1): ARM: dts: exynos: Add missing interrupts for pwm node on Exynos5 Andre Przywara (1): arm64: dts: allwinner: a64: Add PWM controllers Andrey Smirnov (13): ARM: dts: imx51-zii-rdu1: Make sure SD1_WP is low ARM: dts: imx51-zii-rdu1: Populate RAVE SP EEPROM nodes ARM: dts: imx6qdl-zii-rdu2: Populate RAVE SP EEPROM nodes ARM: dts: imx6qdl-zii-rdu2: Populate RAVE SP backlight node ARM: dts: imx6qdl-zii-rdu2: Populate RAVE SP power button node ARM: dts: imx51-babbage: Make use of pinctrl_usbh1reg ARM: dts: imx: Add ZII SCU2 Mezz board ARM: dts: imx: Add ZII SCU3 ESB ARM: dts: imx51-zii-scu3-esb: Add switch IRQ line pinumx config ARM: dts: imx51-zii-scu3-esb: Fix RAVE SP watchdog compatible string ARM: dts: vf610: Add ZII SSMB SPU3 board ARM: dts: vf610: Add ZII CFU1 board ARM: dts: vf610-zii-ssmb-spu3: Fix W=1 level warnings Andy Gross (2): Merge tag 'qcom-arm64-for-4.19' into arm64-for-4.19-2 Merge tag 'qcom-drivers-for-4.19' into arm64-for-4.19-2 Anson Huang (14): ARM: dts: imx: add cooling-cells for cpufreq cooling device ARM: dts: imx6sll: declare src module to be compatible to imx51's src ARM: dts: imx6sll-evk: enable usdhc3 slot ARM: dts: imx6: correct anatop regulators range ARM: dts: imx6sx: add ocram_s support ARM: dts: imx6ul: add GPIO clocks ARM: dts: imx6sll-evk: enable PWM1 for backlight driver ARM: dts: imx6sll-evk: correct lcd regulator GPIO pin ARM: dts: imx6sll-evk: enable SEIKO 43WVF1G lcdif panel ARM: dts: imx6sl-evk: add missing GPIO iomux setting ARM: dts:
[GIT PULL 4/4] ARM: Device-tree updates
Business as usual -- the bulk of our changes are to devicetree files with new hardware support, new SoCs and platforms, and new board types. New SoCs/platforms: - Raspberry Pi Compute Module (CM1) and IO board - i.MX6SSL from NXP - Renesas RZ/N1D SoC (R9A06G032), Dual Cortex-A7 with Ethernet, CAN and PLC interfaces - TI AM654 SoC, Quad Cortex-A53, safety subsystem with Cortex-R5 controllers, communication and PRU subsystem and lots of other interfaces (PCIe, USB3, etc). New boards and systems: - Several Atmel at91-based boards from Laird - Marvell Armada388-based Helios4 board from SolidRun - Samsung Aires-based phones (s5pv210) - Allwinner A64-based Pinebook laptop In addition to the above, there's the usual amount of new devices described on existing platforms, fixes and tweaks and new minor variants of boards/platforms. The following changes since commit a8bba4bb81f33b520f2e9bee78cf2e845e16c9c3: Merge tag 'armsoc-defconfig' into HEAD are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-dt for you to fetch changes up to afd3e3dad6761ddf08119afe121bfbe096c0844b: Merge tag 'qcom-arm64-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into next/dt Adam Ford (6): ARM: dts: da850-evm: Enable SATA port ARM: dts: da850-evm: Enable LCD and backlight ARM: dts: am3517-evm: Remove unnessary PMIC parameters ARM: dts: am3517-som: Add WL127x Wifi ARM: dts: am3517-som: Add builtin Bluetooth ARM: dts: am3517-evm: Enable USB1 Host Alex Gonzalez (2): ARM: dts: imx6ul: Add DTS for ConnectCore 6UL System-On-Module (SOM) ARM: dts: imx6ul: Add DTS for ConnectCore 6UL SBC Express Alexander Kurz (6): ARM: dts: imx6qdl-wandboard: remove regulators bus node ARM: dts: imx6dl-riotboard: remove regulators bus node ARM: dts: imx6qdl-wandboard: enable USB OTG ARM: dts: imx6dl-riotboard: fix OTG regulator polarity ARM: dts: imx50: fix KPP pin definition typos ARM: dts: imx50: remove non-existent USB instances Alexandre Belloni (4): dt-bindings: arm: remove PMC bindings dt-bindings: clk: at91: Document all the PMC compatibles ARM: at91: fix USB clock detection handling ARM: dts: fix PMC compatible Alexandre Torgue (2): ARM: dts: stm32: remove gpio aliases for stm32mp157c ARM: dts: stm32: Reorder nodes in stm32mp157c-ed1 Amelie Delaunay (9): ARM: dts: stm32: update rtc st,syscfg property on stm32f429 ARM: dts: stm32: update rtc st,syscfg property on stm32f746 ARM: dts: stm32: add RTC support to stm32mp157c ARM: dts: stm32: enable RTC on stm32mp157c-ed1 ARM: dts: stm32: enable USB Host (USBH) EHCI controller on stm32mp157c-ev1 ARM: dts: stm32: add USB OTG HS support for stm32mp157c SoC ARM: dts: stm32: enable USB OTG HS on stm32mp157c-ev1 ARM: dts: stm32: add SPI support on stm32mp157c ARM: dts: stm32: add SPI1 support on stm32mp157c-ev1 Amit Kucheria (2): arm64: dts: msm8996: thermal: Initialise via DT and add second controller arm64: dts: sdm845: Add tsens nodes Anand Moon (1): ARM: dts: exynos: Add missing interrupts for pwm node on Exynos5 Andre Przywara (1): arm64: dts: allwinner: a64: Add PWM controllers Andrey Smirnov (13): ARM: dts: imx51-zii-rdu1: Make sure SD1_WP is low ARM: dts: imx51-zii-rdu1: Populate RAVE SP EEPROM nodes ARM: dts: imx6qdl-zii-rdu2: Populate RAVE SP EEPROM nodes ARM: dts: imx6qdl-zii-rdu2: Populate RAVE SP backlight node ARM: dts: imx6qdl-zii-rdu2: Populate RAVE SP power button node ARM: dts: imx51-babbage: Make use of pinctrl_usbh1reg ARM: dts: imx: Add ZII SCU2 Mezz board ARM: dts: imx: Add ZII SCU3 ESB ARM: dts: imx51-zii-scu3-esb: Add switch IRQ line pinumx config ARM: dts: imx51-zii-scu3-esb: Fix RAVE SP watchdog compatible string ARM: dts: vf610: Add ZII SSMB SPU3 board ARM: dts: vf610: Add ZII CFU1 board ARM: dts: vf610-zii-ssmb-spu3: Fix W=1 level warnings Andy Gross (2): Merge tag 'qcom-arm64-for-4.19' into arm64-for-4.19-2 Merge tag 'qcom-drivers-for-4.19' into arm64-for-4.19-2 Anson Huang (14): ARM: dts: imx: add cooling-cells for cpufreq cooling device ARM: dts: imx6sll: declare src module to be compatible to imx51's src ARM: dts: imx6sll-evk: enable usdhc3 slot ARM: dts: imx6: correct anatop regulators range ARM: dts: imx6sx: add ocram_s support ARM: dts: imx6ul: add GPIO clocks ARM: dts: imx6sll-evk: enable PWM1 for backlight driver ARM: dts: imx6sll-evk: correct lcd regulator GPIO pin ARM: dts: imx6sll-evk: enable SEIKO 43WVF1G lcdif panel ARM: dts: imx6sl-evk: add missing GPIO iomux setting ARM: dts:
[GIT PULL 3/4] ARM: SoC defconfig updates
We keep these separate since some files are shared and conflict-prone, but there isn't really much to write about here. Some of the churnier pieces is for the Aspeed platforms, which did an overdue refresh of the defconfig, and enabled USB gadget and some drivers from there. Most of the rest are minor additions here and there to turn on drivers that are needed or useful on the various platforms. The following changes since commit 4344bb3ec602d8b613a2f6c71f84eb631468a432: Merge tag 'armsoc-drivers' into HEAD are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-defconfig for you to fetch changes up to 07d268f541dee9e58ded056fb6782942faa89a91: Merge tag 'v4.19-rockchip-defconfig64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/defconfig Abhishek Sahu (2): ARM: qcom_defconfig: Enable QCOM NAND related configs arm64: defconfig: Enable CONFIG_MTD_NAND_QCOM for IPQ8074 Adam Ford (1): ARM: davinci_all_defconfig: set CONFIG_BACKLIGHT_PWM=m Amelie Delaunay (1): ARM: multi_v7_defconfig: enable STM32 RTC Baruch Siach (2): ARM: mvebu_v7_defconfig: sync defconfig ARM: mvebu_v7_defconfig: enable SFP support Benjamin Herrenschmidt (2): arm: configs: Add USB gadget to Aspeed G4 defconfig arm: configs: Add USB gadget to Aspeed G5 defconfig David Lechner (2): ARM: davinci_all_defconfig: remove CONFIG_DAVINCI_RESET_CLOCKS ARM: davinci_all_defconfig: Enable Bluetooth Dien Pham (1): arm64: defconfig: Enable BD9571MWV regulator Enric Balletbo i Serra (1): arm64: defconfig: Enable more peripherals for Samsung Chromebook Plus. Florian Fainelli (2): Merge tag 'bcm2835-defconfig-next-2018-07-03' into defconfig/next Merge tag 'bcm2835-defconfig-64-next-2018-07-03' into defconfig-arm64/next Geert Uytterhoeven (5): ARM: shmobile: defconfig: Drop NET_VENDOR_=n ARM: shmobile: defconfig: Enable reset controller support ARM: shmobile: defconfig: Enable support for RZN1D-DB ARM: shmobile: defconfig: Disable /sbin/hotplug fork-bomb ARM: multi_v7_defconfig: Enable support for RZN1D-DB Gregory CLEMENT (1): ARM: multi_v7_defconfig: Add Marvell NAND controller support Hugues Fruchet (1): ARM: multi_v7_defconfig: enable STM32 DCMI media support Joel Stanley (4): ARM: config: aspeed: Update defconfig ARM: config: multi_v5: Refresh configuration ARM: config: multi_v5: Enable ASPEED drivers ARM: config: aspeed: Enable new FSI drivers Kishon Vijay Abraham I (1): ARM: configs: keystone: Enable CONFIG_MMC_SDHCI_OMAP Kunihiko Hayashi (1): ARM: multi_v7_defconfig: add CONFIG_UNIPHIER_THERMAL and CONFIG_SNI_AVE Leonard Crestez (1): ARM: imx_v6_v7_defconfig: Enable imx6qdl-sabreauto sensors Murali Karicheri (1): ARM: keystone: k2g: enable micrel and dp83867 phys Olof Johansson (15): Merge tag 'keystone_config_for_4.19' of git://git.kernel.org/.../ssantosh/linux-keystone into next/defconfig Merge tag 'arm-soc/for-4.19/defconfig' of https://github.com/Broadcom/stblinux into next/defconfig Merge tag 'arm-soc/for-4.19/defconfig-arm64' of https://github.com/Broadcom/stblinux into next/defconfig Merge tag 'davinci-for-v4.19/defconfig' of git://git.kernel.org/.../nsekhar/linux-davinci into next/defconfig Merge tag 'samsung-defconfig-4.19' of https://git.kernel.org/.../krzk/linux into next/defconfig Merge tag 'stm32-defconfig-for-v4.19-1' of git://git.kernel.org/.../atorgue/stm32 into next/defconfig Merge tag 'mvebu-defconfig-4.19-1' of git://git.infradead.org/linux-mvebu into next/defconfig Merge tag 'hisi-defconfig-for-4.19' of git://github.com/hisilicon/linux-hisi into next/defconfig Merge tag 'imx-defconfig-4.19' of git://git.kernel.org/.../shawnguo/linux into next/defconfig Merge tag 'renesas-arm64-defconfig-for-v4.19' of https://git.kernel.org/.../horms/renesas into next/defconfig Merge tag 'qcom-arm64-defconfig-for-4.19' of git://git.kernel.org/.../agross/linux into next/defconfig Merge tag 'qcom-defconfig-for-4.19' of git://git.kernel.org/.../agross/linux into next/defconfig Merge tag 'renesas-arm-defconfig-for-v4.19' of https://git.kernel.org/.../horms/renesas into next/defconfig Merge tag 'aspeed-4.19-defconfig' of git://git.kernel.org/.../joel/aspeed into next/defconfig Merge tag 'v4.19-rockchip-defconfig64-1' of git://git.kernel.org/.../mmind/linux-rockchip into next/defconfig Paweł Chmiel (3): ARM: s5pv210_defconfig: Run make savedefconfig ARM: s5pv210_defconfig: Enable drivers for Samsung Aries based phones ARM: s5pv210_defconfig: Enable options needed to boot typical Linux distro Pierre-Yves
[GIT PULL 3/4] ARM: SoC defconfig updates
We keep these separate since some files are shared and conflict-prone, but there isn't really much to write about here. Some of the churnier pieces is for the Aspeed platforms, which did an overdue refresh of the defconfig, and enabled USB gadget and some drivers from there. Most of the rest are minor additions here and there to turn on drivers that are needed or useful on the various platforms. The following changes since commit 4344bb3ec602d8b613a2f6c71f84eb631468a432: Merge tag 'armsoc-drivers' into HEAD are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-defconfig for you to fetch changes up to 07d268f541dee9e58ded056fb6782942faa89a91: Merge tag 'v4.19-rockchip-defconfig64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/defconfig Abhishek Sahu (2): ARM: qcom_defconfig: Enable QCOM NAND related configs arm64: defconfig: Enable CONFIG_MTD_NAND_QCOM for IPQ8074 Adam Ford (1): ARM: davinci_all_defconfig: set CONFIG_BACKLIGHT_PWM=m Amelie Delaunay (1): ARM: multi_v7_defconfig: enable STM32 RTC Baruch Siach (2): ARM: mvebu_v7_defconfig: sync defconfig ARM: mvebu_v7_defconfig: enable SFP support Benjamin Herrenschmidt (2): arm: configs: Add USB gadget to Aspeed G4 defconfig arm: configs: Add USB gadget to Aspeed G5 defconfig David Lechner (2): ARM: davinci_all_defconfig: remove CONFIG_DAVINCI_RESET_CLOCKS ARM: davinci_all_defconfig: Enable Bluetooth Dien Pham (1): arm64: defconfig: Enable BD9571MWV regulator Enric Balletbo i Serra (1): arm64: defconfig: Enable more peripherals for Samsung Chromebook Plus. Florian Fainelli (2): Merge tag 'bcm2835-defconfig-next-2018-07-03' into defconfig/next Merge tag 'bcm2835-defconfig-64-next-2018-07-03' into defconfig-arm64/next Geert Uytterhoeven (5): ARM: shmobile: defconfig: Drop NET_VENDOR_=n ARM: shmobile: defconfig: Enable reset controller support ARM: shmobile: defconfig: Enable support for RZN1D-DB ARM: shmobile: defconfig: Disable /sbin/hotplug fork-bomb ARM: multi_v7_defconfig: Enable support for RZN1D-DB Gregory CLEMENT (1): ARM: multi_v7_defconfig: Add Marvell NAND controller support Hugues Fruchet (1): ARM: multi_v7_defconfig: enable STM32 DCMI media support Joel Stanley (4): ARM: config: aspeed: Update defconfig ARM: config: multi_v5: Refresh configuration ARM: config: multi_v5: Enable ASPEED drivers ARM: config: aspeed: Enable new FSI drivers Kishon Vijay Abraham I (1): ARM: configs: keystone: Enable CONFIG_MMC_SDHCI_OMAP Kunihiko Hayashi (1): ARM: multi_v7_defconfig: add CONFIG_UNIPHIER_THERMAL and CONFIG_SNI_AVE Leonard Crestez (1): ARM: imx_v6_v7_defconfig: Enable imx6qdl-sabreauto sensors Murali Karicheri (1): ARM: keystone: k2g: enable micrel and dp83867 phys Olof Johansson (15): Merge tag 'keystone_config_for_4.19' of git://git.kernel.org/.../ssantosh/linux-keystone into next/defconfig Merge tag 'arm-soc/for-4.19/defconfig' of https://github.com/Broadcom/stblinux into next/defconfig Merge tag 'arm-soc/for-4.19/defconfig-arm64' of https://github.com/Broadcom/stblinux into next/defconfig Merge tag 'davinci-for-v4.19/defconfig' of git://git.kernel.org/.../nsekhar/linux-davinci into next/defconfig Merge tag 'samsung-defconfig-4.19' of https://git.kernel.org/.../krzk/linux into next/defconfig Merge tag 'stm32-defconfig-for-v4.19-1' of git://git.kernel.org/.../atorgue/stm32 into next/defconfig Merge tag 'mvebu-defconfig-4.19-1' of git://git.infradead.org/linux-mvebu into next/defconfig Merge tag 'hisi-defconfig-for-4.19' of git://github.com/hisilicon/linux-hisi into next/defconfig Merge tag 'imx-defconfig-4.19' of git://git.kernel.org/.../shawnguo/linux into next/defconfig Merge tag 'renesas-arm64-defconfig-for-v4.19' of https://git.kernel.org/.../horms/renesas into next/defconfig Merge tag 'qcom-arm64-defconfig-for-4.19' of git://git.kernel.org/.../agross/linux into next/defconfig Merge tag 'qcom-defconfig-for-4.19' of git://git.kernel.org/.../agross/linux into next/defconfig Merge tag 'renesas-arm-defconfig-for-v4.19' of https://git.kernel.org/.../horms/renesas into next/defconfig Merge tag 'aspeed-4.19-defconfig' of git://git.kernel.org/.../joel/aspeed into next/defconfig Merge tag 'v4.19-rockchip-defconfig64-1' of git://git.kernel.org/.../mmind/linux-rockchip into next/defconfig Paweł Chmiel (3): ARM: s5pv210_defconfig: Run make savedefconfig ARM: s5pv210_defconfig: Enable drivers for Samsung Aries based phones ARM: s5pv210_defconfig: Enable options needed to boot typical Linux distro Pierre-Yves
[GIT PULL 2/4] ARM: SoC driver updates
Some of the larger changes this merge window: - Removal of drivers for Exynos5440, a Samsung SoC that never saw widespread use. - Uniphier support for USB3 and SPI reset handling - Syste control and SRAM drivers and bindings for Allwinner platforms - Qualcomm AOSS (Always-on subsystem) reset controller drivers - Raspberry Pi hwmon driver for voltage - Mediatek pwrap (pmic) support for MT6797 SoC The following changes since commit 8928c21c542a61c7cf7e33794a84e774040fb718: Merge tag 'armsoc-soc' into HEAD are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-drivers for you to fetch changes up to 29ed45fff05899f6f39d05fe1c32b1bc51f8926b: Merge tag 'v4.18-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into next/drivers Argus Lin (4): dt-bindings: pwrap: mediatek: add pwrap support for MT6797 soc: mediatek: pwrap: fix cipher init setting error soc: mediatek: pwrap: add pwrap driver for mt6797 SoCs soc: mediatek: pwrap: add mt6351 driver for mt6797 SoCs Arnd Bergmann (2): tee: replace getnstimeofday64() with ktime_get_real_ts64() soc: ti: wkup_m3_ipc: mark PM functions as __maybe_unused Chen-Yu Tsai (2): dt-bindings: sram: Rename A64 SRAM controller compatible soc: sunxi: sram: Add updated compatible string for A64 system control Dan Carpenter (1): firmware: arm_scmi: remove some unnecessary checks Dave Gerlach (2): memory: ti-emif-sram: Add resume function to recopy sram code soc: ti: wkup_m3_ipc: Add wkup_m3_request_wake_src Dmitry Osipenko (1): memory: tegra: Correct driver probe order Doug Berger (1): soc: bcm: brcmstb: pm: Add support for newer rev B3.0 controllers Faiz Abbas (3): clk: ti: dra7: Add clkctrl clock data for the mcan clocks bus: ti-sysc: Add support for using ti-sysc for MCAN on dra76x bus: ti-sysc: Add support for software reset Florian Fainelli (2): soc: bcm: brcmstb: Add missing DDR MEMC compatible strings Merge tag 'bcm2835-drivers-next-2018-07-03' into drivers/next Horia Geantă (1): staging: fsl-dpaa2: eth: move generic FD defines to DPIO Icenowy Zheng (3): soc: sunxi: export a regmap for EMAC clock reg on A64 dt-bindings: add binding for the Allwinner A64 DE2 bus bus: add bus driver for accessing Allwinner A64 DE2 Joakim Tjernlund (1): soc: fsl: qe: gpio: Add qe_gpio_set_multiple Keerthy (1): soc: ti: wkup_m3_ipc: Add rtc_only with ddr in self refresh mode support Kees Cook (1): firmware: raspberrypi: Remove VLA usage Krzysztof Kozlowski (4): ata: ahci-platform: Remove support for Exynos5440 cpufreq: exynos: Remove support for Exynos5440 clk: samsung: Remove support for Exynos5440 usb: host: exynos: Remove support for Exynos5440 Kunihiko Hayashi (4): reset: simple: export reset_simple_ops to be referred from modules dt-bindings: reset: uniphier: add USB3 core reset support reset: uniphier: add USB3 core reset control reset: uniphier: add reset control support for SPI Leonard Crestez (2): soc: imx: gpc: Disable 6sl display power gating for ERR006287 soc: imx6qp: Use GENPD_FLAG_ALWAYS_ON for PU errata Li Yang (1): soc: fsl: cleanup Kconfig menu Maxime Ripard (2): drivers: soc: sunxi: Add support for the C1 SRAM region soc: sunxi: Add the A13, A23 and H3 system control compatibles Olof Johansson (14): Merge tag 'tee-drv-for-4.18' of git://git.linaro.org/people/jens.wiklander/linux-tee into next/drivers Merge tag 'soc_drivers_for_4.19' of git://git.kernel.org/.../ssantosh/linux-keystone into next/drivers Merge tag 'arm-soc/for-4.19/drivers' of https://github.com/Broadcom/stblinux into next/drivers Merge tag 'omap-for-v4.19/ti-sysc-v2-signed' of git://git.kernel.org/.../tmlind/linux-omap into next/drivers Merge tag 'tegra-for-4.19-memory' of git://git.kernel.org/.../tegra/linux into next/drivers Merge tag 'scmi-update-4.19' of git://git.kernel.org/.../sudeep.holla/linux into next/drivers Merge tag 'vexpress-update-4.19' of git://git.kernel.org/.../sudeep.holla/linux into next/drivers Merge tag 'reset-for-4.19' of git://git.pengutronix.de/git/pza/linux into next/drivers Merge tag 'imx-drivers-4.19' of git://git.kernel.org/.../shawnguo/linux into next/drivers Merge tag 'qcom-drivers-for-4.19' of git://git.kernel.org/.../agross/linux into next/drivers Merge tag 'sunxi-drivers-for-4.19' of https://git.kernel.org/.../sunxi/linux into next/drivers Merge tag 'soc-fsl-for-4.19' of git://git.kernel.org/.../leo/linux into next/drivers Merge tag 'samsung-drivers-exynos5440-4.19' of https://git.kernel.org/.../krzk/linux into
[GIT PULL 1/4] ARM: 32-bit SoC platform updates
Most of the SoC updates in this cycle are cleanups and moves to more modern infrastructure: - Davinci was moved to common clock framework - OMAP1-based Amstrad E3 "Superphone" saw a bunch of cleanups to the keyboard interface (bitbanged AT keyboard via GPIO). - Removal of some stale code for Renesas platforms - Power management improvements for i.MX6LL The following changes since commit 815f0ddb346c196018d4d8f8f55c12b83da1de3f: include/linux/compiler*.h: make compiler-*.h mutually exclusive are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-soc for you to fetch changes up to f0fc40aff6fee100ffbed8328a0df88f8aa75fce: ARM: uniphier: select RESET_CONTROLLER Andy Shevchenko (1): ARM: OMAP2+: reuse DEFINE_SHOW_ATTRIBUTE() macro Anson Huang (6): ARM: imx: add standby mode suspend for i.MX6SLL ARM: imx: add mem mode suspend for i.MX6SLL ARM: imx: add L2 page power control for GPC ARM: imx: add cpu idle support for i.MX6SLL ARM: imx: remove i.MX6SLL support in i.MX6SL cpu idle driver ARM: imx: enable bus auto clock gating function for i.mx6sll Arnd Bergmann (4): ARM: imx: fix i.MX6SLL build ARM: imx: call imx6sx_cpuidle_init() conditionally for 6sll soc: r9a06g032: don't build SMP files for non-SMP config ARM: shmobile: r8a7779: hide unused r8a7779_platform_cpu_kill Arvind Yadav (1): ARM: OMAP1: constify gpio_led Bartosz Golaszewski (13): clk: davinci: psc-da850: remove the 'davinci_nand.0" lookup clk: davinci: psc-dm365: use two lookup entries for the aemif clock clk: davinci: psc-dm644x: use two lookup entries for the aemif clock clk: davinci: psc-dm646x: use two lookup entries for the aemif clock clk: davinci: psc-da830: add a lookup entry for aemif clock ARM: davinci: omapl138-hawk: add aemif & nand support ARM: davinci: da850-evm: use aemif platform driver in legacy mode ARM: davinci: dm365-evm: use the ti-aemif soc driver ARM: davinci: dm644x-evm: use aemif platform driver ARM: davinci: da830-evm: use aemif platform driver ARM: davinci: dm646x-evm: use aemif platform driver ARM: davinci: mityomapl138: use aemif platform driver ARM: davinci: unduplicate aemif support Boris Brezillon (1): MAINTAINERS: Remove the AT91 clk driver entry Claudiu Beznea (3): ARM: at91: pm: Use ULP0 naming instead of slow clock ARM: at91: pm: add PMC fast startup registers defines ARM: at91: pm: configure wakeup sources for ULP1 mode Clément Peron (3): ARM: imx: remove inexistant EPIT timer init ARM: debug: Add iProc UART3 debug addresses ARM: debug: fix BCM2836 order entry Dave Gerlach (2): ARM: hwmod: RTC: Don't assume lock/unlock will be called with irq enabled ARM: OMAP2+: sleep33/43xx: Make sleep actions configurable David Lechner (21): ARM: davinci: pass clock as parameter to davinci_timer_init() ARM: davinci: da830: add new clock init using common clock framework ARM: davinci: da850: add new clock init using common clock framework ARM: davinci: dm355: add new clock init using common clock framework ARM: davinci: dm365: add new clock init using common clock framework ARM: davinci: dm644x: add new clock init using common clock framework ARM: davinci: dm646x: add new clock init using common clock framework ARM: davinci: da8xx: add new USB PHY clock init using common clock framework ARM: davinci: da8xx: add new sata_refclk init using common clock framework ARM: davinci: remove CONFIG_DAVINCI_RESET_CLOCKS ARM: davinci: switch to common clock framework ARM: davinci: da830: Remove legacy clock init ARM: davinci: da850: Remove legacy clock init ARM: davinci: dm355: Remove legacy clock init ARM: davinci: dm365: Remove legacy clock init ARM: davinci: dm644x: Remove legacy clock init ARM: davinci: dm646x: Remove legacy clock init ARM: davinci: da8xx: Remove legacy USB and SATA clock init ARM: davinci: remove legacy clocks ARM: davinci: add device tree support to timer ARM: davinci: da8xx-dt: switch to device tree clocks Denis Efremov (1): ARM: s3c24xx: Fix typo in guard macro of s3c2412.h Ethan Tuttle (1): ARM: mvebu: declare asm symbols as character arrays in pmsu.c Fabio Estevam (2): ARM: imx51: Configure M4IF to avoid visual artifacts ARM: mx5: Set the DBGEN bit in ARM_GPC register Geert Uytterhoeven (16): ARM: shmobile: r8a7790: Remove legacy SMP fallback code ARM: shmobile: r8a7790: Use common R-Car Gen2 machine definition ARM: shmobile: r8a7791: Remove legacy SMP fallback code ARM: shmobile: r8a7791: Use common R-Car Gen2 machine definition
[GIT PULL 0/4] ARM: SoC/platform updates
Hi Linus, Another medium-sized batch of ARM updates this release cycle. Not a large number of new platforms in this release, but there are a few -- from Renesas and TI in particular. 790 patches this time around, 173 authors. 26531 lines added and 13363 removed -- very close to the size of last release. A bit more detail in each pull request, but this is overall a "business as usual" release this time around. Please merge! Thanks, -Olof
[GIT PULL 2/4] ARM: SoC driver updates
Some of the larger changes this merge window: - Removal of drivers for Exynos5440, a Samsung SoC that never saw widespread use. - Uniphier support for USB3 and SPI reset handling - Syste control and SRAM drivers and bindings for Allwinner platforms - Qualcomm AOSS (Always-on subsystem) reset controller drivers - Raspberry Pi hwmon driver for voltage - Mediatek pwrap (pmic) support for MT6797 SoC The following changes since commit 8928c21c542a61c7cf7e33794a84e774040fb718: Merge tag 'armsoc-soc' into HEAD are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-drivers for you to fetch changes up to 29ed45fff05899f6f39d05fe1c32b1bc51f8926b: Merge tag 'v4.18-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into next/drivers Argus Lin (4): dt-bindings: pwrap: mediatek: add pwrap support for MT6797 soc: mediatek: pwrap: fix cipher init setting error soc: mediatek: pwrap: add pwrap driver for mt6797 SoCs soc: mediatek: pwrap: add mt6351 driver for mt6797 SoCs Arnd Bergmann (2): tee: replace getnstimeofday64() with ktime_get_real_ts64() soc: ti: wkup_m3_ipc: mark PM functions as __maybe_unused Chen-Yu Tsai (2): dt-bindings: sram: Rename A64 SRAM controller compatible soc: sunxi: sram: Add updated compatible string for A64 system control Dan Carpenter (1): firmware: arm_scmi: remove some unnecessary checks Dave Gerlach (2): memory: ti-emif-sram: Add resume function to recopy sram code soc: ti: wkup_m3_ipc: Add wkup_m3_request_wake_src Dmitry Osipenko (1): memory: tegra: Correct driver probe order Doug Berger (1): soc: bcm: brcmstb: pm: Add support for newer rev B3.0 controllers Faiz Abbas (3): clk: ti: dra7: Add clkctrl clock data for the mcan clocks bus: ti-sysc: Add support for using ti-sysc for MCAN on dra76x bus: ti-sysc: Add support for software reset Florian Fainelli (2): soc: bcm: brcmstb: Add missing DDR MEMC compatible strings Merge tag 'bcm2835-drivers-next-2018-07-03' into drivers/next Horia Geantă (1): staging: fsl-dpaa2: eth: move generic FD defines to DPIO Icenowy Zheng (3): soc: sunxi: export a regmap for EMAC clock reg on A64 dt-bindings: add binding for the Allwinner A64 DE2 bus bus: add bus driver for accessing Allwinner A64 DE2 Joakim Tjernlund (1): soc: fsl: qe: gpio: Add qe_gpio_set_multiple Keerthy (1): soc: ti: wkup_m3_ipc: Add rtc_only with ddr in self refresh mode support Kees Cook (1): firmware: raspberrypi: Remove VLA usage Krzysztof Kozlowski (4): ata: ahci-platform: Remove support for Exynos5440 cpufreq: exynos: Remove support for Exynos5440 clk: samsung: Remove support for Exynos5440 usb: host: exynos: Remove support for Exynos5440 Kunihiko Hayashi (4): reset: simple: export reset_simple_ops to be referred from modules dt-bindings: reset: uniphier: add USB3 core reset support reset: uniphier: add USB3 core reset control reset: uniphier: add reset control support for SPI Leonard Crestez (2): soc: imx: gpc: Disable 6sl display power gating for ERR006287 soc: imx6qp: Use GENPD_FLAG_ALWAYS_ON for PU errata Li Yang (1): soc: fsl: cleanup Kconfig menu Maxime Ripard (2): drivers: soc: sunxi: Add support for the C1 SRAM region soc: sunxi: Add the A13, A23 and H3 system control compatibles Olof Johansson (14): Merge tag 'tee-drv-for-4.18' of git://git.linaro.org/people/jens.wiklander/linux-tee into next/drivers Merge tag 'soc_drivers_for_4.19' of git://git.kernel.org/.../ssantosh/linux-keystone into next/drivers Merge tag 'arm-soc/for-4.19/drivers' of https://github.com/Broadcom/stblinux into next/drivers Merge tag 'omap-for-v4.19/ti-sysc-v2-signed' of git://git.kernel.org/.../tmlind/linux-omap into next/drivers Merge tag 'tegra-for-4.19-memory' of git://git.kernel.org/.../tegra/linux into next/drivers Merge tag 'scmi-update-4.19' of git://git.kernel.org/.../sudeep.holla/linux into next/drivers Merge tag 'vexpress-update-4.19' of git://git.kernel.org/.../sudeep.holla/linux into next/drivers Merge tag 'reset-for-4.19' of git://git.pengutronix.de/git/pza/linux into next/drivers Merge tag 'imx-drivers-4.19' of git://git.kernel.org/.../shawnguo/linux into next/drivers Merge tag 'qcom-drivers-for-4.19' of git://git.kernel.org/.../agross/linux into next/drivers Merge tag 'sunxi-drivers-for-4.19' of https://git.kernel.org/.../sunxi/linux into next/drivers Merge tag 'soc-fsl-for-4.19' of git://git.kernel.org/.../leo/linux into next/drivers Merge tag 'samsung-drivers-exynos5440-4.19' of https://git.kernel.org/.../krzk/linux into
[GIT PULL 1/4] ARM: 32-bit SoC platform updates
Most of the SoC updates in this cycle are cleanups and moves to more modern infrastructure: - Davinci was moved to common clock framework - OMAP1-based Amstrad E3 "Superphone" saw a bunch of cleanups to the keyboard interface (bitbanged AT keyboard via GPIO). - Removal of some stale code for Renesas platforms - Power management improvements for i.MX6LL The following changes since commit 815f0ddb346c196018d4d8f8f55c12b83da1de3f: include/linux/compiler*.h: make compiler-*.h mutually exclusive are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/armsoc-soc for you to fetch changes up to f0fc40aff6fee100ffbed8328a0df88f8aa75fce: ARM: uniphier: select RESET_CONTROLLER Andy Shevchenko (1): ARM: OMAP2+: reuse DEFINE_SHOW_ATTRIBUTE() macro Anson Huang (6): ARM: imx: add standby mode suspend for i.MX6SLL ARM: imx: add mem mode suspend for i.MX6SLL ARM: imx: add L2 page power control for GPC ARM: imx: add cpu idle support for i.MX6SLL ARM: imx: remove i.MX6SLL support in i.MX6SL cpu idle driver ARM: imx: enable bus auto clock gating function for i.mx6sll Arnd Bergmann (4): ARM: imx: fix i.MX6SLL build ARM: imx: call imx6sx_cpuidle_init() conditionally for 6sll soc: r9a06g032: don't build SMP files for non-SMP config ARM: shmobile: r8a7779: hide unused r8a7779_platform_cpu_kill Arvind Yadav (1): ARM: OMAP1: constify gpio_led Bartosz Golaszewski (13): clk: davinci: psc-da850: remove the 'davinci_nand.0" lookup clk: davinci: psc-dm365: use two lookup entries for the aemif clock clk: davinci: psc-dm644x: use two lookup entries for the aemif clock clk: davinci: psc-dm646x: use two lookup entries for the aemif clock clk: davinci: psc-da830: add a lookup entry for aemif clock ARM: davinci: omapl138-hawk: add aemif & nand support ARM: davinci: da850-evm: use aemif platform driver in legacy mode ARM: davinci: dm365-evm: use the ti-aemif soc driver ARM: davinci: dm644x-evm: use aemif platform driver ARM: davinci: da830-evm: use aemif platform driver ARM: davinci: dm646x-evm: use aemif platform driver ARM: davinci: mityomapl138: use aemif platform driver ARM: davinci: unduplicate aemif support Boris Brezillon (1): MAINTAINERS: Remove the AT91 clk driver entry Claudiu Beznea (3): ARM: at91: pm: Use ULP0 naming instead of slow clock ARM: at91: pm: add PMC fast startup registers defines ARM: at91: pm: configure wakeup sources for ULP1 mode Clément Peron (3): ARM: imx: remove inexistant EPIT timer init ARM: debug: Add iProc UART3 debug addresses ARM: debug: fix BCM2836 order entry Dave Gerlach (2): ARM: hwmod: RTC: Don't assume lock/unlock will be called with irq enabled ARM: OMAP2+: sleep33/43xx: Make sleep actions configurable David Lechner (21): ARM: davinci: pass clock as parameter to davinci_timer_init() ARM: davinci: da830: add new clock init using common clock framework ARM: davinci: da850: add new clock init using common clock framework ARM: davinci: dm355: add new clock init using common clock framework ARM: davinci: dm365: add new clock init using common clock framework ARM: davinci: dm644x: add new clock init using common clock framework ARM: davinci: dm646x: add new clock init using common clock framework ARM: davinci: da8xx: add new USB PHY clock init using common clock framework ARM: davinci: da8xx: add new sata_refclk init using common clock framework ARM: davinci: remove CONFIG_DAVINCI_RESET_CLOCKS ARM: davinci: switch to common clock framework ARM: davinci: da830: Remove legacy clock init ARM: davinci: da850: Remove legacy clock init ARM: davinci: dm355: Remove legacy clock init ARM: davinci: dm365: Remove legacy clock init ARM: davinci: dm644x: Remove legacy clock init ARM: davinci: dm646x: Remove legacy clock init ARM: davinci: da8xx: Remove legacy USB and SATA clock init ARM: davinci: remove legacy clocks ARM: davinci: add device tree support to timer ARM: davinci: da8xx-dt: switch to device tree clocks Denis Efremov (1): ARM: s3c24xx: Fix typo in guard macro of s3c2412.h Ethan Tuttle (1): ARM: mvebu: declare asm symbols as character arrays in pmsu.c Fabio Estevam (2): ARM: imx51: Configure M4IF to avoid visual artifacts ARM: mx5: Set the DBGEN bit in ARM_GPC register Geert Uytterhoeven (16): ARM: shmobile: r8a7790: Remove legacy SMP fallback code ARM: shmobile: r8a7790: Use common R-Car Gen2 machine definition ARM: shmobile: r8a7791: Remove legacy SMP fallback code ARM: shmobile: r8a7791: Use common R-Car Gen2 machine definition
[GIT PULL 0/4] ARM: SoC/platform updates
Hi Linus, Another medium-sized batch of ARM updates this release cycle. Not a large number of new platforms in this release, but there are a few -- from Renesas and TI in particular. 790 patches this time around, 173 authors. 26531 lines added and 13363 removed -- very close to the size of last release. A bit more detail in each pull request, but this is overall a "business as usual" release this time around. Please merge! Thanks, -Olof
Re: [RFC PATCH 3/5] RISC-V: Add cpu_operatios structure
On 8/22/18 8:54 PM, Anup Patel wrote: On Wed, Aug 22, 2018 at 11:33 AM, Christoph Hellwig wrote: On Tue, Aug 21, 2018 at 10:34:38PM +0530, Anup Patel wrote: The cpu_operations is certainly required because SOC vendors will add vendor-specific mechanism to selectively bringing-up CPUs/HARTs instead of all CPUs entering Linux kernel simultaneously. In fact, we might also end-up having CPU ON/OFF operations in SBI. Your forgot an essential part in your analysis: Right now we only have one single way to deal with cpu on/offlining, and that is the dummy WFI kind. Once other ways show up we can build proper infrastructure, but until then this is just a white elephant as we have no idea how these abstractions will look like. And my hope is that we'll just see new SBI calls, in which case we'll just need SBI and dummy version and can avoid all the indirect calls. IMHO, rather than waiting for new CPU ON/OFF methods to come-up we can keep the cpu_operations ready. Also, we are not re-inventing anything here which we might have to discard later because cpu_operations are already tried and hardened for Linux ARM64. I agree with you that in long-term SBI-based CPU ON/OFF will be widely used. Most likely we will have at-least two CPU ON/OFF methods: 1. Existing lottery based spinning 2. New SBI calls Regards, Anup I am fine with either keeping the cpu_ops infrastructure for now or reintroducing again along with better smp enablement methods. Anyways, there were concerns about all existing booting method (all cpu thrown to Linux at the same time). I was thinking to adopt spin table boot method for RISC-V as well. I can drop this patch now and reintroduce with spin table boot method. Any thoughts ? Regards, Atish
Re: [RFC PATCH 3/5] RISC-V: Add cpu_operatios structure
On 8/22/18 8:54 PM, Anup Patel wrote: On Wed, Aug 22, 2018 at 11:33 AM, Christoph Hellwig wrote: On Tue, Aug 21, 2018 at 10:34:38PM +0530, Anup Patel wrote: The cpu_operations is certainly required because SOC vendors will add vendor-specific mechanism to selectively bringing-up CPUs/HARTs instead of all CPUs entering Linux kernel simultaneously. In fact, we might also end-up having CPU ON/OFF operations in SBI. Your forgot an essential part in your analysis: Right now we only have one single way to deal with cpu on/offlining, and that is the dummy WFI kind. Once other ways show up we can build proper infrastructure, but until then this is just a white elephant as we have no idea how these abstractions will look like. And my hope is that we'll just see new SBI calls, in which case we'll just need SBI and dummy version and can avoid all the indirect calls. IMHO, rather than waiting for new CPU ON/OFF methods to come-up we can keep the cpu_operations ready. Also, we are not re-inventing anything here which we might have to discard later because cpu_operations are already tried and hardened for Linux ARM64. I agree with you that in long-term SBI-based CPU ON/OFF will be widely used. Most likely we will have at-least two CPU ON/OFF methods: 1. Existing lottery based spinning 2. New SBI calls Regards, Anup I am fine with either keeping the cpu_ops infrastructure for now or reintroducing again along with better smp enablement methods. Anyways, there were concerns about all existing booting method (all cpu thrown to Linux at the same time). I was thinking to adopt spin table boot method for RISC-V as well. I can drop this patch now and reintroduce with spin table boot method. Any thoughts ? Regards, Atish
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, 22 Aug 2018 20:35:16 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 8:31 PM Nicholas Piggin wrote: > > > > > > So that leaves speculative operations. I don't see where the problem is > > with those either -- this shortcut needs to ensure there are no other > > *non speculative* operations. mm_users is correct for that. > > No. Because mm_users doesn't contain any lazy tlb users. > > And yes, those lazy tlbs are all kernel threads, but they can still > speculatively load user addresses. So? If the arch does not shoot those all down after the user page tables are removed then it's buggy regardless of this short cut. The only real problem I could see would be if a page walk cache still points to the freed table, then the table gets re-allocated and used elsewhere, and meanwhile a speculative access tries to load an entry from the page that is an invalid form of page table that might cause a machine check or something. That would be (u)arch specific, but if that's what we're concerned with here it's a different issue and needs to be documented as such. I'll have a look at powerpc and see if we can cope with it. If so, I'll make it an arch specific opt-in short cut. Thanks, Nick
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, 22 Aug 2018 20:35:16 -0700 Linus Torvalds wrote: > On Wed, Aug 22, 2018 at 8:31 PM Nicholas Piggin wrote: > > > > > > So that leaves speculative operations. I don't see where the problem is > > with those either -- this shortcut needs to ensure there are no other > > *non speculative* operations. mm_users is correct for that. > > No. Because mm_users doesn't contain any lazy tlb users. > > And yes, those lazy tlbs are all kernel threads, but they can still > speculatively load user addresses. So? If the arch does not shoot those all down after the user page tables are removed then it's buggy regardless of this short cut. The only real problem I could see would be if a page walk cache still points to the freed table, then the table gets re-allocated and used elsewhere, and meanwhile a speculative access tries to load an entry from the page that is an invalid form of page table that might cause a machine check or something. That would be (u)arch specific, but if that's what we're concerned with here it's a different issue and needs to be documented as such. I'll have a look at powerpc and see if we can cope with it. If so, I'll make it an arch specific opt-in short cut. Thanks, Nick
Re: [lkp-robot] [ipc] 296ba26b66: BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
On Thu, 23 Aug 2018, kernel test robot wrote: FYI, we noticed the following commit (built with gcc-7): commit: 296ba26b6681b6e07ed419b3004647167cb17f61 ("ipc: drop ipc_lock()") https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master I suspect this is because that commit forgot to set EIDRM for the !ipc_valid_object() case. So the callers check IS_ERR(shm_lock()), which won't fail the op as it did before the commit. diff --git a/ipc/shm.c b/ipc/shm.c index b0eb3757ab89..4cd402e4cfeb 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -199,6 +199,7 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id) } ipc_unlock_object(ipcp); + ipcp = ERR_PTR(-EIDRM); err: rcu_read_unlock(); /*
Re: [lkp-robot] [ipc] 296ba26b66: BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
On Thu, 23 Aug 2018, kernel test robot wrote: FYI, we noticed the following commit (built with gcc-7): commit: 296ba26b6681b6e07ed419b3004647167cb17f61 ("ipc: drop ipc_lock()") https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master I suspect this is because that commit forgot to set EIDRM for the !ipc_valid_object() case. So the callers check IS_ERR(shm_lock()), which won't fail the op as it did before the commit. diff --git a/ipc/shm.c b/ipc/shm.c index b0eb3757ab89..4cd402e4cfeb 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -199,6 +199,7 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id) } ipc_unlock_object(ipcp); + ipcp = ERR_PTR(-EIDRM); err: rcu_read_unlock(); /*
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 8:45 PM Nicholas Piggin wrote: > > powerpc/radix has no such issue, it already does this tracking. Yeah, I now realize that this was why you wanted to add that hacky thing to the generic code, so that you can add the tlb_flush_pgtable() call. I thought it was because powerpc had some special flush instruction for it, and the regular tlb flush didn't do it. But no. It was because the regular code had lost the tlb flush _entirely_, because powerpc didn't want it. > We were discussing this a couple of months ago, I wasn't aware of ARM's > issue but I suggested x86 could go the same way as powerpc. The problem is that x86 _used_ to do this all correctly long long ago. And then we switched over to the "generic" table flushing (which harkens back to the powerpc code). Which actually turned out to be not generic at all, and did not flush the internal pages like x86 used to (back when x86 just used tlb_remove_page for everything). So as a result, x86 had unintentionally lost the TLB flush we used to have, because tlb_remove_table() had lost the tlb flushing because of a powerpc quirk. You then added it back as a hacky per-architecture hook (apparently having realized that you never did it at all), which didn't fix the unintentional lack of flushing on x86. So now we're going to do it right. No more "oh, powerpc didn't need to flush because the hash tables weren't in the tlb at all" thing in the generic code that then others need to work around. Linus
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, Aug 22, 2018 at 8:45 PM Nicholas Piggin wrote: > > powerpc/radix has no such issue, it already does this tracking. Yeah, I now realize that this was why you wanted to add that hacky thing to the generic code, so that you can add the tlb_flush_pgtable() call. I thought it was because powerpc had some special flush instruction for it, and the regular tlb flush didn't do it. But no. It was because the regular code had lost the tlb flush _entirely_, because powerpc didn't want it. > We were discussing this a couple of months ago, I wasn't aware of ARM's > issue but I suggested x86 could go the same way as powerpc. The problem is that x86 _used_ to do this all correctly long long ago. And then we switched over to the "generic" table flushing (which harkens back to the powerpc code). Which actually turned out to be not generic at all, and did not flush the internal pages like x86 used to (back when x86 just used tlb_remove_page for everything). So as a result, x86 had unintentionally lost the TLB flush we used to have, because tlb_remove_table() had lost the tlb flushing because of a powerpc quirk. You then added it back as a hacky per-architecture hook (apparently having realized that you never did it at all), which didn't fix the unintentional lack of flushing on x86. So now we're going to do it right. No more "oh, powerpc didn't need to flush because the hash tables weren't in the tlb at all" thing in the generic code that then others need to work around. Linus
mv: cannot stat 'drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mx_ops.o': No such file or directory
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 815f0ddb346c196018d4d8f8f55c12b83da1de3f commit: 0fbe9a245c60bedebb6dd329966f463bb724450a microblaze: add endianness options to LDFLAGS instead of LD date: 4 weeks ago config: microblaze-allmodconfig (attached as .config) compiler: microblaze-linux-gcc (GCC) 8.1.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 0fbe9a245c60bedebb6dd329966f463bb724450a # save the attached .config to linux build tree GCC_VERSION=8.1.0 make.cross ARCH=microblaze All errors (new ones prefixed by >>): microblaze-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ops.o: compiled for a little endian system and target is big endian microblaze-linux-ld: failed to merge target specific data of file drivers/net/wireless/intel/iwlwifi/mvm/ops.o microblaze-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mc_ops.o: compiled for a little endian system and target is big endian microblaze-linux-ld: failed to merge target specific data of file drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mc_ops.o >> mv: cannot stat 'drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mx_ops.o': No >> such file or directory --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference
On 08/22/18 at 06:23pm, Dave Young wrote: > On 08/21/18 at 03:39pm, Ard Biesheuvel wrote: > > On 9 August 2018 at 11:13, Dave Young wrote: > > > On 08/09/18 at 09:33am, Mike Galbraith wrote: > > >> On Thu, 2018-08-09 at 12:21 +0800, Dave Young wrote: > > >> > Hi Mike, > > >> > > > >> > Thanks for the patch! > > >> > On 08/08/18 at 04:03pm, Mike Galbraith wrote: > > >> > > When booting with efi=noruntime, we call efi_runtime_map_copy() while > > >> > > loading the kdump kernel, and trip over a NULL efi.memmap.map. Avoid > > >> > > that and a useless allocation when the only mapping we can use (1:1) > > >> > > is not available. > > >> > > > >> > At first glance, efi_get_runtime_map_size should return 0 in case > > >> > noruntime. > > >> > > >> What efi does internally at unmap time is to leave everything except > > >> efi.mmap.map untouched, setting it to NULL and turning off EFI_MEMMAP, > > >> rendering efi.mmap.map accessors useless/unsafe without first checking > > >> EFI_MEMMAP. > > > > > > Probably the x86 efi_init should reset nr_map to zero in case runtime is > > > disabled. But let's see how Ard thinks about this and cc linux-efi. > > > > > > As for efi_get_runtime_map_size, it was introduced for x86 kexec use. > > > for copying runtime maps, so I think it is reasonable this function > > > return zero in case no runtime. > > > > > > > I don't see the patch in the context so I cannot comment in great detail. > > The patch is below: > https://lore.kernel.org/lkml/1533737025.4936.3.ca...@gmx.de > > > > > In any case, it is better to decouple EFI_MEMMAP from EFI_RUNTIME > > dependencies. On x86, one may imply the other, but this is not > > generally the case. > > > > That means that efi_get_runtime_map_size() should probably check the > > EFI_RUNTIME flag, and return 0 if it is cleared. Perhaps there are > > other places where EFI_MEMMAP flag checks are missing, but I consider > > that a separate issue. > > Yes, I also agree with to check EFI_RUNTIME_SERVICES. There is no point for > efi_get_runtime_map_size to return a value other than 0 in case > EFI_RUNTIME_SERVICES > is not set ie. via efi=noruntime > > Is below patch acceptable? The copy function can be changed to return > an error in case map size == 0, but that can be done later along with > the caller size cleanups in kexec code Forgot to add Mike's reported-by tag.. Mike, since we are going this way, I'm working on a kexec code cleanup, but it needs careful testing so still need some time. Can you help test below efi fix and provide you tested-by if it works? > --- > > efi: check EFI_RUNTIME_SERVICES flag in runtime map copying code > > Mike reported a kexec_file_load NULL pointer dereference bug like below: > [5.878262] BUG: unable to handle kernel NULL pointer dereference at > > [5.879868] PGD 80013c1f1067 P4D 80013c1f1067 PUD 13aea7067 PMD 0 > [5.881225] Oops: [#1] SMP PTI > [5.882068] Modules linked in: > [5.882851] CPU: 0 PID: 394 Comm: kexec Kdump: loaded Not tainted > 4.17.0-rc2+ #648 > [5.884333] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 > 02/06/2015 > [5.885843] RIP: 0010:memcpy_erms+0x6/0x10 > [5.886789] RSP: 0018:c958bd00 EFLAGS: 00010246 > [5.887899] RAX: 880138e050b0 RBX: 000980b0 RCX: > 0ba0 > [5.889304] RDX: 0ba0 RSI: RDI: > 880138e050b0 > [5.890977] RBP: 880138e04000 R08: 0017 R09: > 0002 > [5.892524] R10: 00099000 R11: 52d0 R12: > 39400200 > [5.893967] R13: 880138e05000 R14: 0ba0 R15: > c9a4d000 > [5.895378] FS: 7f167c9e6740() GS:88013fc0() > knlGS: > [5.896953] CS: 0010 DS: ES: CR0: 80050033 > [5.898143] CR2: CR3: 00013c3ec002 CR4: > 001606f0 > [5.899542] DR0: DR1: DR2: > > [5.900962] DR3: DR6: fffe0ff0 DR7: > 0400 > [5.902552] Call Trace: > [5.903267] efi_runtime_map_copy+0x28/0x30 > [5.904956] bzImage64_load+0x59d/0x736 > [5.906881] ? arch_kexec_kernel_image_load+0x6d/0x70 > [5.908243] ? __se_sys_kexec_file_load+0x24b/0x750 > [5.909352] ? _cond_resched+0x19/0x30 > [5.910286] ? do_syscall_64+0x65/0x180 > [5.911229] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [5.912365] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 > 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 > a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 > [5.916235] RIP: memcpy_erms+0x6/0x10 RSP: c958bd00 > [5.917507] CR2: > [5.918762] ---[ end trace 5cf4c4b3b93d7fdd ]--- > > Changing
mv: cannot stat 'drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mx_ops.o': No such file or directory
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 815f0ddb346c196018d4d8f8f55c12b83da1de3f commit: 0fbe9a245c60bedebb6dd329966f463bb724450a microblaze: add endianness options to LDFLAGS instead of LD date: 4 weeks ago config: microblaze-allmodconfig (attached as .config) compiler: microblaze-linux-gcc (GCC) 8.1.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 0fbe9a245c60bedebb6dd329966f463bb724450a # save the attached .config to linux build tree GCC_VERSION=8.1.0 make.cross ARCH=microblaze All errors (new ones prefixed by >>): microblaze-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ops.o: compiled for a little endian system and target is big endian microblaze-linux-ld: failed to merge target specific data of file drivers/net/wireless/intel/iwlwifi/mvm/ops.o microblaze-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mc_ops.o: compiled for a little endian system and target is big endian microblaze-linux-ld: failed to merge target specific data of file drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mc_ops.o >> mv: cannot stat 'drivers/net/wireless/intel/iwlwifi/mvm/.tmp_mx_ops.o': No >> such file or directory --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] x86, kdump: Fix efi=noruntime NULL pointer dereference
On 08/22/18 at 06:23pm, Dave Young wrote: > On 08/21/18 at 03:39pm, Ard Biesheuvel wrote: > > On 9 August 2018 at 11:13, Dave Young wrote: > > > On 08/09/18 at 09:33am, Mike Galbraith wrote: > > >> On Thu, 2018-08-09 at 12:21 +0800, Dave Young wrote: > > >> > Hi Mike, > > >> > > > >> > Thanks for the patch! > > >> > On 08/08/18 at 04:03pm, Mike Galbraith wrote: > > >> > > When booting with efi=noruntime, we call efi_runtime_map_copy() while > > >> > > loading the kdump kernel, and trip over a NULL efi.memmap.map. Avoid > > >> > > that and a useless allocation when the only mapping we can use (1:1) > > >> > > is not available. > > >> > > > >> > At first glance, efi_get_runtime_map_size should return 0 in case > > >> > noruntime. > > >> > > >> What efi does internally at unmap time is to leave everything except > > >> efi.mmap.map untouched, setting it to NULL and turning off EFI_MEMMAP, > > >> rendering efi.mmap.map accessors useless/unsafe without first checking > > >> EFI_MEMMAP. > > > > > > Probably the x86 efi_init should reset nr_map to zero in case runtime is > > > disabled. But let's see how Ard thinks about this and cc linux-efi. > > > > > > As for efi_get_runtime_map_size, it was introduced for x86 kexec use. > > > for copying runtime maps, so I think it is reasonable this function > > > return zero in case no runtime. > > > > > > > I don't see the patch in the context so I cannot comment in great detail. > > The patch is below: > https://lore.kernel.org/lkml/1533737025.4936.3.ca...@gmx.de > > > > > In any case, it is better to decouple EFI_MEMMAP from EFI_RUNTIME > > dependencies. On x86, one may imply the other, but this is not > > generally the case. > > > > That means that efi_get_runtime_map_size() should probably check the > > EFI_RUNTIME flag, and return 0 if it is cleared. Perhaps there are > > other places where EFI_MEMMAP flag checks are missing, but I consider > > that a separate issue. > > Yes, I also agree with to check EFI_RUNTIME_SERVICES. There is no point for > efi_get_runtime_map_size to return a value other than 0 in case > EFI_RUNTIME_SERVICES > is not set ie. via efi=noruntime > > Is below patch acceptable? The copy function can be changed to return > an error in case map size == 0, but that can be done later along with > the caller size cleanups in kexec code Forgot to add Mike's reported-by tag.. Mike, since we are going this way, I'm working on a kexec code cleanup, but it needs careful testing so still need some time. Can you help test below efi fix and provide you tested-by if it works? > --- > > efi: check EFI_RUNTIME_SERVICES flag in runtime map copying code > > Mike reported a kexec_file_load NULL pointer dereference bug like below: > [5.878262] BUG: unable to handle kernel NULL pointer dereference at > > [5.879868] PGD 80013c1f1067 P4D 80013c1f1067 PUD 13aea7067 PMD 0 > [5.881225] Oops: [#1] SMP PTI > [5.882068] Modules linked in: > [5.882851] CPU: 0 PID: 394 Comm: kexec Kdump: loaded Not tainted > 4.17.0-rc2+ #648 > [5.884333] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 > 02/06/2015 > [5.885843] RIP: 0010:memcpy_erms+0x6/0x10 > [5.886789] RSP: 0018:c958bd00 EFLAGS: 00010246 > [5.887899] RAX: 880138e050b0 RBX: 000980b0 RCX: > 0ba0 > [5.889304] RDX: 0ba0 RSI: RDI: > 880138e050b0 > [5.890977] RBP: 880138e04000 R08: 0017 R09: > 0002 > [5.892524] R10: 00099000 R11: 52d0 R12: > 39400200 > [5.893967] R13: 880138e05000 R14: 0ba0 R15: > c9a4d000 > [5.895378] FS: 7f167c9e6740() GS:88013fc0() > knlGS: > [5.896953] CS: 0010 DS: ES: CR0: 80050033 > [5.898143] CR2: CR3: 00013c3ec002 CR4: > 001606f0 > [5.899542] DR0: DR1: DR2: > > [5.900962] DR3: DR6: fffe0ff0 DR7: > 0400 > [5.902552] Call Trace: > [5.903267] efi_runtime_map_copy+0x28/0x30 > [5.904956] bzImage64_load+0x59d/0x736 > [5.906881] ? arch_kexec_kernel_image_load+0x6d/0x70 > [5.908243] ? __se_sys_kexec_file_load+0x24b/0x750 > [5.909352] ? _cond_resched+0x19/0x30 > [5.910286] ? do_syscall_64+0x65/0x180 > [5.911229] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [5.912365] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 > 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 > a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 > [5.916235] RIP: memcpy_erms+0x6/0x10 RSP: c958bd00 > [5.917507] CR2: > [5.918762] ---[ end trace 5cf4c4b3b93d7fdd ]--- > > Changing
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 22 Aug 2018 17:55:27 +0200 Peter Zijlstra wrote: > On Wed, Aug 22, 2018 at 05:30:15PM +0200, Peter Zijlstra wrote: > > ARM > > which later used this put an explicit TLB invalidate in their > > __p*_free_tlb() functions, and PowerPC-radix followed that example. > > > +/* > > + * If we want tlb_remove_table() to imply TLB invalidates. > > + */ > > +static inline void tlb_table_invalidate(struct mmu_gather *tlb) > > +{ > > +#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE > > + /* > > +* Invalidate page-table caches used by hardware walkers. Then we still > > +* need to RCU-sched wait while freeing the pages because software > > +* walkers can still be in-flight. > > +*/ > > + __tlb_flush_mmu_tlbonly(tlb); > > +#endif > > +} > > > Nick, Will is already looking at using this to remove the synchronous > invalidation from __p*_free_tlb() for ARM, could you have a look to see > if PowerPC-radix could benefit from that too? powerpc/radix has no such issue, it already does this tracking. We were discussing this a couple of months ago, I wasn't aware of ARM's issue but I suggested x86 could go the same way as powerpc. Would have been good to get some feedback on some of the proposed approaches there. Because it's not just pwc tracking but if you do this you don't need those silly hacks in generic code to expand the TLB address range either. So powerpc has no fundamental problem with making this stuff generic. If you need a fix for x86 and ARM for this merge window though, I would suggest just copying what powerpc already has. Next time we can consider consolidating them all into generic code. Thanks, Nick > > Basically, using a patch like the below, would give your tlb_flush() > information on if tables were removed or not. > > --- > > --- a/include/asm-generic/tlb.h > +++ b/include/asm-generic/tlb.h > @@ -96,12 +96,22 @@ struct mmu_gather { > #endif > unsigned long start; > unsigned long end; > - /* we are in the middle of an operation to clear > - * a full mm and can make some optimizations */ > - unsigned intfullmm : 1, > - /* we have performed an operation which > - * requires a complete flush of the tlb */ > - need_flush_all : 1; > + /* > + * we are in the middle of an operation to clear > + * a full mm and can make some optimizations > + */ > + unsigned intfullmm : 1; > + > + /* > + * we have performed an operation which > + * requires a complete flush of the tlb > + */ > + unsigned intneed_flush_all : 1; > + > + /* > + * we have removed page directories > + */ > + unsigned intfreed_tables : 1; > > struct mmu_gather_batch *active; > struct mmu_gather_batch local; > @@ -136,6 +146,7 @@ static inline void __tlb_reset_range(str > tlb->start = TASK_SIZE; > tlb->end = 0; > } > + tlb->freed_tables = 0; > } > > static inline void tlb_remove_page_size(struct mmu_gather *tlb, > @@ -269,6 +280,7 @@ static inline void tlb_remove_check_page > #define pte_free_tlb(tlb, ptep, address) \ > do {\ > __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __pte_free_tlb(tlb, ptep, address); \ > } while (0) > #endif > @@ -276,7 +288,8 @@ static inline void tlb_remove_check_page > #ifndef pmd_free_tlb > #define pmd_free_tlb(tlb, pmdp, address) \ > do {\ > - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __pmd_free_tlb(tlb, pmdp, address); \ > } while (0) > #endif > @@ -286,6 +299,7 @@ static inline void tlb_remove_check_page > #define pud_free_tlb(tlb, pudp, address) \ > do {\ > __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __pud_free_tlb(tlb, pudp, address); \ > } while (0) > #endif > @@ -295,7 +309,8 @@ static inline void tlb_remove_check_page > #ifndef p4d_free_tlb > #define p4d_free_tlb(tlb, pudp, address) \ > do {\ > - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __p4d_free_tlb(tlb, pudp, address); \ > } while (0) > #endif
Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
On Wed, 22 Aug 2018 17:55:27 +0200 Peter Zijlstra wrote: > On Wed, Aug 22, 2018 at 05:30:15PM +0200, Peter Zijlstra wrote: > > ARM > > which later used this put an explicit TLB invalidate in their > > __p*_free_tlb() functions, and PowerPC-radix followed that example. > > > +/* > > + * If we want tlb_remove_table() to imply TLB invalidates. > > + */ > > +static inline void tlb_table_invalidate(struct mmu_gather *tlb) > > +{ > > +#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE > > + /* > > +* Invalidate page-table caches used by hardware walkers. Then we still > > +* need to RCU-sched wait while freeing the pages because software > > +* walkers can still be in-flight. > > +*/ > > + __tlb_flush_mmu_tlbonly(tlb); > > +#endif > > +} > > > Nick, Will is already looking at using this to remove the synchronous > invalidation from __p*_free_tlb() for ARM, could you have a look to see > if PowerPC-radix could benefit from that too? powerpc/radix has no such issue, it already does this tracking. We were discussing this a couple of months ago, I wasn't aware of ARM's issue but I suggested x86 could go the same way as powerpc. Would have been good to get some feedback on some of the proposed approaches there. Because it's not just pwc tracking but if you do this you don't need those silly hacks in generic code to expand the TLB address range either. So powerpc has no fundamental problem with making this stuff generic. If you need a fix for x86 and ARM for this merge window though, I would suggest just copying what powerpc already has. Next time we can consider consolidating them all into generic code. Thanks, Nick > > Basically, using a patch like the below, would give your tlb_flush() > information on if tables were removed or not. > > --- > > --- a/include/asm-generic/tlb.h > +++ b/include/asm-generic/tlb.h > @@ -96,12 +96,22 @@ struct mmu_gather { > #endif > unsigned long start; > unsigned long end; > - /* we are in the middle of an operation to clear > - * a full mm and can make some optimizations */ > - unsigned intfullmm : 1, > - /* we have performed an operation which > - * requires a complete flush of the tlb */ > - need_flush_all : 1; > + /* > + * we are in the middle of an operation to clear > + * a full mm and can make some optimizations > + */ > + unsigned intfullmm : 1; > + > + /* > + * we have performed an operation which > + * requires a complete flush of the tlb > + */ > + unsigned intneed_flush_all : 1; > + > + /* > + * we have removed page directories > + */ > + unsigned intfreed_tables : 1; > > struct mmu_gather_batch *active; > struct mmu_gather_batch local; > @@ -136,6 +146,7 @@ static inline void __tlb_reset_range(str > tlb->start = TASK_SIZE; > tlb->end = 0; > } > + tlb->freed_tables = 0; > } > > static inline void tlb_remove_page_size(struct mmu_gather *tlb, > @@ -269,6 +280,7 @@ static inline void tlb_remove_check_page > #define pte_free_tlb(tlb, ptep, address) \ > do {\ > __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __pte_free_tlb(tlb, ptep, address); \ > } while (0) > #endif > @@ -276,7 +288,8 @@ static inline void tlb_remove_check_page > #ifndef pmd_free_tlb > #define pmd_free_tlb(tlb, pmdp, address) \ > do {\ > - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __pmd_free_tlb(tlb, pmdp, address); \ > } while (0) > #endif > @@ -286,6 +299,7 @@ static inline void tlb_remove_check_page > #define pud_free_tlb(tlb, pudp, address) \ > do {\ > __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __pud_free_tlb(tlb, pudp, address); \ > } while (0) > #endif > @@ -295,7 +309,8 @@ static inline void tlb_remove_check_page > #ifndef p4d_free_tlb > #define p4d_free_tlb(tlb, pudp, address) \ > do {\ > - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ > + tlb->freed_tables = 1; \ > __p4d_free_tlb(tlb, pudp, address); \ > } while (0) > #endif
arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error: linux/crc32poly.h: No such file or directory
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 815f0ddb346c196018d4d8f8f55c12b83da1de3f commit: faa16bc404d72a5afb857c924c83a5f691f83386 lib: Use existing define with polynomial date: 4 weeks ago config: powerpc-skiroot_defconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout faa16bc404d72a5afb857c924c83a5f691f83386 # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=powerpc All errors (new ones prefixed by >>): In file included from arch/powerpc/boot/../../../lib/decompress_unxz.c:233:0, from arch/powerpc/boot/decompress.c:42: >> arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error: >> linux/crc32poly.h: No such file or directory #include ^~~ compilation terminated. vim +18 arch/powerpc/boot/../../../lib/xz/xz_crc32.c > 18 #include 19 #include "xz_private.h" 20 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error: linux/crc32poly.h: No such file or directory
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 815f0ddb346c196018d4d8f8f55c12b83da1de3f commit: faa16bc404d72a5afb857c924c83a5f691f83386 lib: Use existing define with polynomial date: 4 weeks ago config: powerpc-skiroot_defconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout faa16bc404d72a5afb857c924c83a5f691f83386 # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=powerpc All errors (new ones prefixed by >>): In file included from arch/powerpc/boot/../../../lib/decompress_unxz.c:233:0, from arch/powerpc/boot/decompress.c:42: >> arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error: >> linux/crc32poly.h: No such file or directory #include ^~~ compilation terminated. vim +18 arch/powerpc/boot/../../../lib/xz/xz_crc32.c > 18 #include 19 #include "xz_private.h" 20 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, Aug 22, 2018 at 8:35 PM Linus Torvalds wrote: > > No. Because mm_users doesn't contain any lazy tlb users. .. or, as it turns out, the use_mm() case either, which can do gup_fast(). Oh well. Linus
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, Aug 22, 2018 at 8:35 PM Linus Torvalds wrote: > > No. Because mm_users doesn't contain any lazy tlb users. .. or, as it turns out, the use_mm() case either, which can do gup_fast(). Oh well. Linus
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, Aug 22, 2018 at 8:31 PM Nicholas Piggin wrote: > > > So that leaves speculative operations. I don't see where the problem is > with those either -- this shortcut needs to ensure there are no other > *non speculative* operations. mm_users is correct for that. No. Because mm_users doesn't contain any lazy tlb users. And yes, those lazy tlbs are all kernel threads, but they can still speculatively load user addresses. Linus
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, Aug 22, 2018 at 8:31 PM Nicholas Piggin wrote: > > > So that leaves speculative operations. I don't see where the problem is > with those either -- this shortcut needs to ensure there are no other > *non speculative* operations. mm_users is correct for that. No. Because mm_users doesn't contain any lazy tlb users. And yes, those lazy tlbs are all kernel threads, but they can still speculatively load user addresses. Linus
linux-next: Tree for Aug 23
Hi all, Please do not add any v4.20 material to your linux-next included branches until after v4.19-rc1 has been released. Changes since 20180822: New tree: at91-fixes The kbuild tree gained a build failure so I used the version from next-20180822. Non-merge commits (relative to Linus' tree): 1609 1899 files changed, 59754 insertions(+), 27656 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 287 trees (counting Linus' and 65 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (899fbc33fd77 Merge tag 'platform-drivers-x86-v4.19-1' of git://git.infradead.org/linux-platform-drivers-x86) Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging kbuild-current/fixes (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse) Merging arc-current/for-curr (fce0d0affdc5 ARC: cleanup show_faulting_vma()) Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+) Merging arm64-fixes/for-next/fixes (5ad356eabc47 arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()) Merging m68k-current/for-linus (71a896687b85 m68k/defconfig: Update defconfigs for v4.18-rc6) Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global invalidations when removing copro) Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (78c452fe7068 Merge branch 'net_sched-fixes') Merging bpf/master (9b2e0388bec8 bpf: sockmap: write_space events need to be passed to TCP handler) Merging ipsec/master (25432eba9cd8 openvswitch: meter: Fix setting meter id for new entries) Merging netfilter/master (9c86336c15db ip6_vti: fix a null pointer deference when destroy vti6 tunnel) Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set) Merging wireless-drivers/master (299b6365a3b7 brcmfmac: fix regression in parsing NVRAM for multiple devices) Merging mac80211/master (8a54d8fc160e cfg80211: remove division by size of sizeof(struct ieee80211_wmm_rule)) Merging rdma-fixes/for-rc (845b397a7771 IB/ucm: fix UCM link error) Merging sound-current/for-linus (8a328ac1f9eb ALSA: hda/realtek - Fix HP Headset Mic can't record) Merging sound-asoc-fixes/for-linus (ba095ffc9324 Merge branch 'asoc-4.18' into asoc-linus) Merging regmap-fixes/for-linus (94710cac0ef4 Linux 4.18) Merging regulator-fixes/for-linus (84d77c5ab32d Merge branch 'regulator-4.18' into regulator-linus) Merging spi-fixes/for-linus (8a7c14551b9b Merge branch 'spi-4.18' into spi-linus) Merging pci-current/for-linus (44bda4b7d26e PCI: Fix is_added/is_busmaster race condition) Merging driver-core.current/driver-core-linus (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse) Merging tty.current/tty-linus (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse) Merging usb.current/usb-linus (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux
linux-next: Tree for Aug 23
Hi all, Please do not add any v4.20 material to your linux-next included branches until after v4.19-rc1 has been released. Changes since 20180822: New tree: at91-fixes The kbuild tree gained a build failure so I used the version from next-20180822. Non-merge commits (relative to Linus' tree): 1609 1899 files changed, 59754 insertions(+), 27656 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 287 trees (counting Linus' and 65 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (899fbc33fd77 Merge tag 'platform-drivers-x86-v4.19-1' of git://git.infradead.org/linux-platform-drivers-x86) Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging kbuild-current/fixes (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse) Merging arc-current/for-curr (fce0d0affdc5 ARC: cleanup show_faulting_vma()) Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+) Merging arm64-fixes/for-next/fixes (5ad356eabc47 arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()) Merging m68k-current/for-linus (71a896687b85 m68k/defconfig: Update defconfigs for v4.18-rc6) Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global invalidations when removing copro) Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (78c452fe7068 Merge branch 'net_sched-fixes') Merging bpf/master (9b2e0388bec8 bpf: sockmap: write_space events need to be passed to TCP handler) Merging ipsec/master (25432eba9cd8 openvswitch: meter: Fix setting meter id for new entries) Merging netfilter/master (9c86336c15db ip6_vti: fix a null pointer deference when destroy vti6 tunnel) Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set) Merging wireless-drivers/master (299b6365a3b7 brcmfmac: fix regression in parsing NVRAM for multiple devices) Merging mac80211/master (8a54d8fc160e cfg80211: remove division by size of sizeof(struct ieee80211_wmm_rule)) Merging rdma-fixes/for-rc (845b397a7771 IB/ucm: fix UCM link error) Merging sound-current/for-linus (8a328ac1f9eb ALSA: hda/realtek - Fix HP Headset Mic can't record) Merging sound-asoc-fixes/for-linus (ba095ffc9324 Merge branch 'asoc-4.18' into asoc-linus) Merging regmap-fixes/for-linus (94710cac0ef4 Linux 4.18) Merging regulator-fixes/for-linus (84d77c5ab32d Merge branch 'regulator-4.18' into regulator-linus) Merging spi-fixes/for-linus (8a7c14551b9b Merge branch 'spi-4.18' into spi-linus) Merging pci-current/for-linus (44bda4b7d26e PCI: Fix is_added/is_busmaster race condition) Merging driver-core.current/driver-core-linus (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse) Merging tty.current/tty-linus (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse) Merging usb.current/usb-linus (ad1d69735878 Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, 22 Aug 2018 17:30:14 +0200 Peter Zijlstra wrote: > Will noted that only checking mm_users is incorrect; we should also > check mm_count in order to cover CPUs that have a lazy reference to > this mm (and could do speculative TLB operations). Why is that incorrect? This shortcut has nothing to do with no TLBs -- not sure about x86, but other CPUs can certainly have remaining TLBs here, speculative operations or not (even if they don't have an mm_count ref they can have TLBs here). So that leaves speculative operations. I don't see where the problem is with those either -- this shortcut needs to ensure there are no other *non speculative* operations. mm_users is correct for that. If there is a speculation security problem here it should be carefully documented otherwise it's going to be re-introduced... I actually have a patch to extend this optimisation further that I'm going to send out again today. It's nice to avoid the double handling of the pages. Thanks, Nick > > If removing this turns out to be a performance issue, we can > re-instate a more complete check, but in tlb_table_flush() eliding the > call_rcu_sched(). > > Cc: sta...@kernel.org > Cc: Nicholas Piggin > Cc: David Miller > Cc: Will Deacon > Cc: Martin Schwidefsky > Cc: Michael Ellerman > Fixes: 267239116987 ("mm, powerpc: move the RCU page-table freeing into > generic code") > Reported-by: Will Deacon > Signed-off-by: Peter Zijlstra (Intel) > --- > mm/memory.c |9 - > 1 file changed, 9 deletions(-) > > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -375,15 +375,6 @@ void tlb_remove_table(struct mmu_gather > { > struct mmu_table_batch **batch = >batch; > > - /* > - * When there's less then two users of this mm there cannot be a > - * concurrent page-table walk. > - */ > - if (atomic_read(>mm->mm_users) < 2) { > - __tlb_remove_table(table); > - return; > - } > - > if (*batch == NULL) { > *batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | > __GFP_NOWARN); > if (*batch == NULL) { > >
Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
On Wed, 22 Aug 2018 17:30:14 +0200 Peter Zijlstra wrote: > Will noted that only checking mm_users is incorrect; we should also > check mm_count in order to cover CPUs that have a lazy reference to > this mm (and could do speculative TLB operations). Why is that incorrect? This shortcut has nothing to do with no TLBs -- not sure about x86, but other CPUs can certainly have remaining TLBs here, speculative operations or not (even if they don't have an mm_count ref they can have TLBs here). So that leaves speculative operations. I don't see where the problem is with those either -- this shortcut needs to ensure there are no other *non speculative* operations. mm_users is correct for that. If there is a speculation security problem here it should be carefully documented otherwise it's going to be re-introduced... I actually have a patch to extend this optimisation further that I'm going to send out again today. It's nice to avoid the double handling of the pages. Thanks, Nick > > If removing this turns out to be a performance issue, we can > re-instate a more complete check, but in tlb_table_flush() eliding the > call_rcu_sched(). > > Cc: sta...@kernel.org > Cc: Nicholas Piggin > Cc: David Miller > Cc: Will Deacon > Cc: Martin Schwidefsky > Cc: Michael Ellerman > Fixes: 267239116987 ("mm, powerpc: move the RCU page-table freeing into > generic code") > Reported-by: Will Deacon > Signed-off-by: Peter Zijlstra (Intel) > --- > mm/memory.c |9 - > 1 file changed, 9 deletions(-) > > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -375,15 +375,6 @@ void tlb_remove_table(struct mmu_gather > { > struct mmu_table_batch **batch = >batch; > > - /* > - * When there's less then two users of this mm there cannot be a > - * concurrent page-table walk. > - */ > - if (atomic_read(>mm->mm_users) < 2) { > - __tlb_remove_table(table); > - return; > - } > - > if (*batch == NULL) { > *batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | > __GFP_NOWARN); > if (*batch == NULL) { > >
Re: KASAN: use-after-free Write in irq_bypass_register_consumer
On Sat, May 26, 2018 at 11:24:01AM +0200, Dmitry Vyukov wrote: > On Sun, May 13, 2018 at 8:21 AM, Eric Biggers wrote: > > On Thu, Apr 05, 2018 at 08:15:24PM -0700, Eric Biggers wrote: > >> On Mon, Jan 29, 2018 at 01:29:48PM +0800, Tianyu Lan wrote: > >> > > >> > > >> > On 1/27/2018 7:27 AM, Eric Biggers wrote: > >> > > On Sat, Dec 16, 2017 at 04:37:02PM +0800, Lan, Tianyu wrote: > >> > > > The root cause is that kvm_irqfd_assign() and kvm_irqfd_deassign() > >> > > > can't > >> > > > be run in parallel. Some data structure(e.g, irqfd->consumer) will be > >> > > > crashed because irqfd may be freed in deassign path before they are > >> > > > used > >> > > > in assign path. The other data maybe used in deassign path before > >> > > > initialization. Syzbot test hit such case. Add mutx between > >> > > > kvm_irqfd_assign() and kvm_irqfd_deassign() can fix such issue. Will > >> > > > send patch to fix it. > >> > > > > >> > > > On 12/16/2017 12:53 PM, Tianyu Lan wrote: > >> > > > > I reproduced the issue. Will have a look. > >> > > > > > >> > > > > -- Best regards Tianyu Lan 2017-12-15 18:14 GMT+08:00 syzbot > >> > > > > : > >> > > > > > syzkaller has found reproducer for the following crash on > >> > > > > > 82bcf1def3b5f1251177ad47c44f7e17af039b4b > >> > > > > > git://git.cmpxchg.org/linux-mmots.git/master > >> > > > > > compiler: gcc (GCC) 7.1.1 20170620 > >> > > > > > .config is attached > >> > > > > > Raw console output is attached. > >> > > > > > C reproducer is attached > >> > > > > > syzkaller reproducer is attached. Seehttps://goo.gl/kgGztJ > >> > > > > > for information about syzkaller reproducers > >> > > > > > > >> > > > > > > >> > > > > > == > >> > > > > > BUG: KASAN: use-after-free in __list_add include/linux/list.h:64 > >> > > > > > [inline] > >> > > > > > BUG: KASAN: use-after-free in list_add include/linux/list.h:79 > >> > > > > > [inline] > >> > > > > > BUG: KASAN: use-after-free in > >> > > > > > irq_bypass_register_consumer+0x4b4/0x500 > >> > > > > > virt/lib/irqbypass.c:217 > >> > > > > > Write of size 8 at addr 8801cdf51180 by task > >> > > > > > syzkaller436086/15031 > >> > > > > > > >> > > > > > CPU: 1 PID: 15031 Comm: syzkaller436086 Not tainted > >> > > > > > 4.15.0-rc2-mm1+ #39 > >> > > > > > Hardware name: Google Google Compute Engine/Google Compute > >> > > > > > Engine, BIOS > >> > > > > > Google 01/01/2011 > >> > > > > > Call Trace: > >> > > > > >__dump_stack lib/dump_stack.c:17 [inline] > >> > > > > >dump_stack+0x194/0x257 lib/dump_stack.c:53 > >> > > > > >print_address_description+0x73/0x250 mm/kasan/report.c:252 > >> > > > > >kasan_report_error mm/kasan/report.c:351 [inline] > >> > > > > >kasan_report+0x25b/0x340 mm/kasan/report.c:409 > >> > > > > >__asan_report_store8_noabort+0x17/0x20 mm/kasan/report.c:435 > >> > > > > >__list_add include/linux/list.h:64 [inline] > >> > > > > >list_add include/linux/list.h:79 [inline] > >> > > > > >irq_bypass_register_consumer+0x4b4/0x500 > >> > > > > > virt/lib/irqbypass.c:217 > >> > > > > >kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:417 > >> > > > > > [inline] > >> > > > > >kvm_irqfd+0x137f/0x1d50 > >> > > > > > arch/x86/kvm/../../../virt/kvm/eventfd.c:572 > >> > > > > >kvm_vm_ioctl+0x1079/0x1c40 > >> > > > > > arch/x86/kvm/../../../virt/kvm/kvm_main.c:2992 > >> > > > > >vfs_ioctl fs/ioctl.c:46 [inline] > >> > > > > >do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:686 > >> > > > > >SYSC_ioctl fs/ioctl.c:701 [inline] > >> > > > > >SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 > >> > > > > >entry_SYSCALL_64_fastpath+0x1f/0x96 > >> > > > > > RIP: 0033:0x44d379 > >> > > > > > RSP: 002b:7fc5ff9a9d08 EFLAGS: 0246 ORIG_RAX: > >> > > > > > 0010 > >> > > > > > RAX: ffda RBX: 7fc5ff9aa700 RCX: 0044d379 > >> > > > > > RDX: 20080fe0 RSI: 4020ae76 RDI: 0005 > >> > > > > > RBP: 007ff900 R08: 7fc5ff9aa700 R09: 7fc5ff9aa700 > >> > > > > > R10: 7fc5ff9aa700 R11: 0246 R12: > >> > > > > > R13: 007ff8ff R14: 7fc5ff9aa9c0 R15: > >> > > > > > > >> > > > > > Allocated by task 15031: > >> > > > > >save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > >> > > > > >set_track mm/kasan/kasan.c:459 [inline] > >> > > > > >kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 > >> > > > > >kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3614 > >> > > > > >kmalloc include/linux/slab.h:516 [inline] > >> > > > > >kzalloc include/linux/slab.h:705 [inline] > >> > > > > >kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:296 > >> > > > > > [inline] > >> > > > > >kvm_irqfd+0x16c/0x1d50 > >> > > > > > arch/x86/kvm/../../../virt/kvm/eventfd.c:572 > >> > > > > >kvm_vm_ioctl+0x1079/0x1c40 > >> > > > > > arch/x86/kvm/../../../virt/kvm/kvm_main.c:2992 > >> > > > > >
Re: KASAN: use-after-free Write in irq_bypass_register_consumer
On Sat, May 26, 2018 at 11:24:01AM +0200, Dmitry Vyukov wrote: > On Sun, May 13, 2018 at 8:21 AM, Eric Biggers wrote: > > On Thu, Apr 05, 2018 at 08:15:24PM -0700, Eric Biggers wrote: > >> On Mon, Jan 29, 2018 at 01:29:48PM +0800, Tianyu Lan wrote: > >> > > >> > > >> > On 1/27/2018 7:27 AM, Eric Biggers wrote: > >> > > On Sat, Dec 16, 2017 at 04:37:02PM +0800, Lan, Tianyu wrote: > >> > > > The root cause is that kvm_irqfd_assign() and kvm_irqfd_deassign() > >> > > > can't > >> > > > be run in parallel. Some data structure(e.g, irqfd->consumer) will be > >> > > > crashed because irqfd may be freed in deassign path before they are > >> > > > used > >> > > > in assign path. The other data maybe used in deassign path before > >> > > > initialization. Syzbot test hit such case. Add mutx between > >> > > > kvm_irqfd_assign() and kvm_irqfd_deassign() can fix such issue. Will > >> > > > send patch to fix it. > >> > > > > >> > > > On 12/16/2017 12:53 PM, Tianyu Lan wrote: > >> > > > > I reproduced the issue. Will have a look. > >> > > > > > >> > > > > -- Best regards Tianyu Lan 2017-12-15 18:14 GMT+08:00 syzbot > >> > > > > : > >> > > > > > syzkaller has found reproducer for the following crash on > >> > > > > > 82bcf1def3b5f1251177ad47c44f7e17af039b4b > >> > > > > > git://git.cmpxchg.org/linux-mmots.git/master > >> > > > > > compiler: gcc (GCC) 7.1.1 20170620 > >> > > > > > .config is attached > >> > > > > > Raw console output is attached. > >> > > > > > C reproducer is attached > >> > > > > > syzkaller reproducer is attached. Seehttps://goo.gl/kgGztJ > >> > > > > > for information about syzkaller reproducers > >> > > > > > > >> > > > > > > >> > > > > > == > >> > > > > > BUG: KASAN: use-after-free in __list_add include/linux/list.h:64 > >> > > > > > [inline] > >> > > > > > BUG: KASAN: use-after-free in list_add include/linux/list.h:79 > >> > > > > > [inline] > >> > > > > > BUG: KASAN: use-after-free in > >> > > > > > irq_bypass_register_consumer+0x4b4/0x500 > >> > > > > > virt/lib/irqbypass.c:217 > >> > > > > > Write of size 8 at addr 8801cdf51180 by task > >> > > > > > syzkaller436086/15031 > >> > > > > > > >> > > > > > CPU: 1 PID: 15031 Comm: syzkaller436086 Not tainted > >> > > > > > 4.15.0-rc2-mm1+ #39 > >> > > > > > Hardware name: Google Google Compute Engine/Google Compute > >> > > > > > Engine, BIOS > >> > > > > > Google 01/01/2011 > >> > > > > > Call Trace: > >> > > > > >__dump_stack lib/dump_stack.c:17 [inline] > >> > > > > >dump_stack+0x194/0x257 lib/dump_stack.c:53 > >> > > > > >print_address_description+0x73/0x250 mm/kasan/report.c:252 > >> > > > > >kasan_report_error mm/kasan/report.c:351 [inline] > >> > > > > >kasan_report+0x25b/0x340 mm/kasan/report.c:409 > >> > > > > >__asan_report_store8_noabort+0x17/0x20 mm/kasan/report.c:435 > >> > > > > >__list_add include/linux/list.h:64 [inline] > >> > > > > >list_add include/linux/list.h:79 [inline] > >> > > > > >irq_bypass_register_consumer+0x4b4/0x500 > >> > > > > > virt/lib/irqbypass.c:217 > >> > > > > >kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:417 > >> > > > > > [inline] > >> > > > > >kvm_irqfd+0x137f/0x1d50 > >> > > > > > arch/x86/kvm/../../../virt/kvm/eventfd.c:572 > >> > > > > >kvm_vm_ioctl+0x1079/0x1c40 > >> > > > > > arch/x86/kvm/../../../virt/kvm/kvm_main.c:2992 > >> > > > > >vfs_ioctl fs/ioctl.c:46 [inline] > >> > > > > >do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:686 > >> > > > > >SYSC_ioctl fs/ioctl.c:701 [inline] > >> > > > > >SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 > >> > > > > >entry_SYSCALL_64_fastpath+0x1f/0x96 > >> > > > > > RIP: 0033:0x44d379 > >> > > > > > RSP: 002b:7fc5ff9a9d08 EFLAGS: 0246 ORIG_RAX: > >> > > > > > 0010 > >> > > > > > RAX: ffda RBX: 7fc5ff9aa700 RCX: 0044d379 > >> > > > > > RDX: 20080fe0 RSI: 4020ae76 RDI: 0005 > >> > > > > > RBP: 007ff900 R08: 7fc5ff9aa700 R09: 7fc5ff9aa700 > >> > > > > > R10: 7fc5ff9aa700 R11: 0246 R12: > >> > > > > > R13: 007ff8ff R14: 7fc5ff9aa9c0 R15: > >> > > > > > > >> > > > > > Allocated by task 15031: > >> > > > > >save_stack+0x43/0xd0 mm/kasan/kasan.c:447 > >> > > > > >set_track mm/kasan/kasan.c:459 [inline] > >> > > > > >kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 > >> > > > > >kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3614 > >> > > > > >kmalloc include/linux/slab.h:516 [inline] > >> > > > > >kzalloc include/linux/slab.h:705 [inline] > >> > > > > >kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:296 > >> > > > > > [inline] > >> > > > > >kvm_irqfd+0x16c/0x1d50 > >> > > > > > arch/x86/kvm/../../../virt/kvm/eventfd.c:572 > >> > > > > >kvm_vm_ioctl+0x1079/0x1c40 > >> > > > > > arch/x86/kvm/../../../virt/kvm/kvm_main.c:2992 > >> > > > > >
Compliment of the day to you Dear Friend.
Compliment of the day to you Dear Friend. Dear Friend. I am Mrs. Amina Kadi. am sending this brief letter to solicit your partnership to transfer $5.5 million US Dollars. I shall send you more information and procedures when I receive positive response from you. Mrs. Amina Kadi
Compliment of the day to you Dear Friend.
Compliment of the day to you Dear Friend. Dear Friend. I am Mrs. Amina Kadi. am sending this brief letter to solicit your partnership to transfer $5.5 million US Dollars. I shall send you more information and procedures when I receive positive response from you. Mrs. Amina Kadi
Re: [PATCH 1/9] of: add helper to lookup compatible child node
Hi Johan, I love your patch! Yet something to improve: [auto build test ERROR on robh/for-next] [also build test ERROR on v4.18 next-20180822] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Johan-Hovold/of-fix-compatible-child-node-lookups/20180823-074211 base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next config: mips-decstation_defconfig (attached as .config) compiler: mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=mips All errors (new ones prefixed by >>): In file included from include/linux/irqdomain.h:35:0, from arch/mips/include/asm/irq.h:14, from include/linux/irq.h:23, from include/asm-generic/hardirq.h:13, from arch/mips/include/asm/hardirq.h:16, from include/linux/hardirq.h:9, from include/linux/interrupt.h:11, from arch/mips/dec/ecc-berr.c:16: >> include/linux/of.h:637:28: error: 'of_get_compatible_child' defined but not >> used [-Werror=unused-function] static struct device_node *of_get_compatible_child(const struct device_node *parent, ^~~ cc1: all warnings being treated as errors vim +/of_get_compatible_child +637 include/linux/of.h 636 > 637 static struct device_node *of_get_compatible_child(const struct > device_node *parent, 638 const char *compatible) 639 { 640 return NULL; 641 } 642 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 1/9] of: add helper to lookup compatible child node
Hi Johan, I love your patch! Yet something to improve: [auto build test ERROR on robh/for-next] [also build test ERROR on v4.18 next-20180822] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Johan-Hovold/of-fix-compatible-child-node-lookups/20180823-074211 base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next config: mips-decstation_defconfig (attached as .config) compiler: mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=mips All errors (new ones prefixed by >>): In file included from include/linux/irqdomain.h:35:0, from arch/mips/include/asm/irq.h:14, from include/linux/irq.h:23, from include/asm-generic/hardirq.h:13, from arch/mips/include/asm/hardirq.h:16, from include/linux/hardirq.h:9, from include/linux/interrupt.h:11, from arch/mips/dec/ecc-berr.c:16: >> include/linux/of.h:637:28: error: 'of_get_compatible_child' defined but not >> used [-Werror=unused-function] static struct device_node *of_get_compatible_child(const struct device_node *parent, ^~~ cc1: all warnings being treated as errors vim +/of_get_compatible_child +637 include/linux/of.h 636 > 637 static struct device_node *of_get_compatible_child(const struct > device_node *parent, 638 const char *compatible) 639 { 640 return NULL; 641 } 642 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[lkp-robot] [x86] 19efe000d3: PANIC:double_fault
FYI, we noticed the following commit (built with gcc-5): commit: 19efe000d3258032d9a1dfb25313a092f9454da0 ("x86: Remap the IRQ stack so it has guard pages") https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git x86/guard_pages in testcase: trinity with following parameters: runtime: 300s test-description: Trinity is a linux system call fuzz tester. test-url: http://codemonkey.org.uk/projects/trinity/ on test machine: qemu-system-x86_64 -enable-kvm -cpu IvyBridge -m 420M caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +--+++ | | 0d997f71d5 | 19efe000d3 | +--+++ | boot_successes | 6 | 0 | | boot_failures| 5 | 32 | | invoked_oom-killer:gfp_mask=0x | 5 || | Mem-Info | 5 || | Out_of_memory:Kill_process | 3 || | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 3 || | RIP:__put_user_4 | 1 || | PANIC:double_fault | 0 | 32 | | RIP:trace_hardirqs_off_thunk | 0 | 32 | | Kernel_panic-not_syncing:Machine_halted | 0 | 32 | | WARNING:kernel_stack | 0 | 32 | +--+++ [0.004000] memory used by lock dependency info: 7871 kB [0.004000] per task-struct memory footprint: 2688 bytes [0.004000] ACPI: Core revision 20180531 [0.004000] clocksource: hpet: mask: 0x max_cycles: 0x, max_idle_ns: 19112604467 ns [0.004000] hpet clockevent registered [0.004000] PANIC: double fault, error_code: 0x0 [0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc4-00074-g19efe000 #2 [0.004000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [0.004000] RIP: 0010:trace_hardirqs_off_thunk+0xb/0x1c [0.004000] Code: 5f 5d c3 55 48 89 e5 57 56 52 51 50 41 50 41 51 41 52 41 53 48 8b 7d 08 e8 f5 4e 2b 00 eb 34 55 48 89 e5 57 56 52 51 50 41 50 <41> 51 41 52 41 53 48 8b 7d 08 e8 eb 52 2b 00 eb 18 55 48 89 e5 57 [0.004000] RSP: :c900 EFLAGS: 00010087 [0.004000] RAX: 82800a97 RBX: 0001 RCX: 82800a97 [0.004000] RDX: RSI: 82800f68 RDI: 83678c68 [0.004000] RBP: c930 R08: R09: [0.004000] R10: R11: R12: [0.004000] R13: R14: R15: [0.004000] FS: () GS:88001340() knlGS: [0.004000] CS: 0010 DS: ES: CR0: 80050033 [0.004000] CR2: c8f8 CR3: 0366e000 CR4: 000406b0 [0.004000] Call Trace: [0.004000] Kernel panic - not syncing: Machine halted. [0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc4-00074-g19efe000 #2 [0.004000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [0.004000] Call Trace: [0.004000] <#DF> [0.004000] dump_stack+0x9b/0xe7 [0.004000] panic+0x1ad/0x325 [0.004000] ? refcount_error_report+0x267/0x267 [0.004000] df_debug+0x32/0x32 [0.004000] do_double_fault+0x1b3/0x1c7 [0.004000] double_fault+0x23/0x30 [0.004000] RIP: 0010:trace_hardirqs_off_thunk+0xb/0x1c [0.004000] Code: 5f 5d c3 55 48 89 e5 57 56 52 51 50 41 50 41 51 41 52 41 53 48 8b 7d 08 e8 f5 4e 2b 00 eb 34 55 48 89 e5 57 56 52 51 50 41 50 <41> 51 41 52 41 53 48 8b 7d 08 e8 eb 52 2b 00 eb 18 55 48 89 e5 57 [0.004000] RSP: :c900 EFLAGS: 00010087 [0.004000] RAX: 82800a97 RBX: 0001 RCX: 82800a97 [0.004000] RDX: RSI: 82800f68 RDI: 83678c68 [0.004000] RBP: c930 R08: R09: [0.004000] R10: R11: R12: [0.004000] R13: R14: R15: [0.004000] ? native_iret+0x7/0x7 [0.004000] ? async_page_fault+0x8/0x30 [0.004000] WARNING: kernel stack regs at (ptrval)
[lkp-robot] [x86] 19efe000d3: PANIC:double_fault
FYI, we noticed the following commit (built with gcc-5): commit: 19efe000d3258032d9a1dfb25313a092f9454da0 ("x86: Remap the IRQ stack so it has guard pages") https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git x86/guard_pages in testcase: trinity with following parameters: runtime: 300s test-description: Trinity is a linux system call fuzz tester. test-url: http://codemonkey.org.uk/projects/trinity/ on test machine: qemu-system-x86_64 -enable-kvm -cpu IvyBridge -m 420M caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +--+++ | | 0d997f71d5 | 19efe000d3 | +--+++ | boot_successes | 6 | 0 | | boot_failures| 5 | 32 | | invoked_oom-killer:gfp_mask=0x | 5 || | Mem-Info | 5 || | Out_of_memory:Kill_process | 3 || | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 3 || | RIP:__put_user_4 | 1 || | PANIC:double_fault | 0 | 32 | | RIP:trace_hardirqs_off_thunk | 0 | 32 | | Kernel_panic-not_syncing:Machine_halted | 0 | 32 | | WARNING:kernel_stack | 0 | 32 | +--+++ [0.004000] memory used by lock dependency info: 7871 kB [0.004000] per task-struct memory footprint: 2688 bytes [0.004000] ACPI: Core revision 20180531 [0.004000] clocksource: hpet: mask: 0x max_cycles: 0x, max_idle_ns: 19112604467 ns [0.004000] hpet clockevent registered [0.004000] PANIC: double fault, error_code: 0x0 [0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc4-00074-g19efe000 #2 [0.004000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [0.004000] RIP: 0010:trace_hardirqs_off_thunk+0xb/0x1c [0.004000] Code: 5f 5d c3 55 48 89 e5 57 56 52 51 50 41 50 41 51 41 52 41 53 48 8b 7d 08 e8 f5 4e 2b 00 eb 34 55 48 89 e5 57 56 52 51 50 41 50 <41> 51 41 52 41 53 48 8b 7d 08 e8 eb 52 2b 00 eb 18 55 48 89 e5 57 [0.004000] RSP: :c900 EFLAGS: 00010087 [0.004000] RAX: 82800a97 RBX: 0001 RCX: 82800a97 [0.004000] RDX: RSI: 82800f68 RDI: 83678c68 [0.004000] RBP: c930 R08: R09: [0.004000] R10: R11: R12: [0.004000] R13: R14: R15: [0.004000] FS: () GS:88001340() knlGS: [0.004000] CS: 0010 DS: ES: CR0: 80050033 [0.004000] CR2: c8f8 CR3: 0366e000 CR4: 000406b0 [0.004000] Call Trace: [0.004000] Kernel panic - not syncing: Machine halted. [0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc4-00074-g19efe000 #2 [0.004000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [0.004000] Call Trace: [0.004000] <#DF> [0.004000] dump_stack+0x9b/0xe7 [0.004000] panic+0x1ad/0x325 [0.004000] ? refcount_error_report+0x267/0x267 [0.004000] df_debug+0x32/0x32 [0.004000] do_double_fault+0x1b3/0x1c7 [0.004000] double_fault+0x23/0x30 [0.004000] RIP: 0010:trace_hardirqs_off_thunk+0xb/0x1c [0.004000] Code: 5f 5d c3 55 48 89 e5 57 56 52 51 50 41 50 41 51 41 52 41 53 48 8b 7d 08 e8 f5 4e 2b 00 eb 34 55 48 89 e5 57 56 52 51 50 41 50 <41> 51 41 52 41 53 48 8b 7d 08 e8 eb 52 2b 00 eb 18 55 48 89 e5 57 [0.004000] RSP: :c900 EFLAGS: 00010087 [0.004000] RAX: 82800a97 RBX: 0001 RCX: 82800a97 [0.004000] RDX: RSI: 82800f68 RDI: 83678c68 [0.004000] RBP: c930 R08: R09: [0.004000] R10: R11: R12: [0.004000] R13: R14: R15: [0.004000] ? native_iret+0x7/0x7 [0.004000] ? async_page_fault+0x8/0x30 [0.004000] WARNING: kernel stack regs at (ptrval)
[PATCH] cpufreq: ti-cpufreq: Only register platform_device when supported
Currently the ti-cpufreq driver blindly registers a 'ti-cpufreq' to force the driver to probe on any platforms where the driver is built in. However, this should only happen on platforms that actually can make use of the driver. There is already functionality in place to match the SoC compatible so let's factor this out into a separate call and make sure we find a match before creating the ti-cpufreq driver device. Signed-off-by: Dave Gerlach --- drivers/cpufreq/ti-cpufreq.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index 3f0e2a14895a..541fdcf17b57 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -201,19 +201,31 @@ static const struct of_device_id ti_cpufreq_of_match[] = { {}, }; +static const struct of_device_id *ti_cpufreq_match_node(void) +{ + struct device_node *np; + const struct of_device_id *match; + + np = of_find_node_by_path("/"); + match = of_match_node(ti_cpufreq_of_match, np); + of_node_put(np); + + if (!match) + return NULL; + else + return match; +} + static int ti_cpufreq_probe(struct platform_device *pdev) { u32 version[VERSION_COUNT]; - struct device_node *np; const struct of_device_id *match; struct opp_table *ti_opp_table; struct ti_cpufreq_data *opp_data; const char * const reg_names[] = {"vdd", "vbb"}; int ret; - np = of_find_node_by_path("/"); - match = of_match_node(ti_cpufreq_of_match, np); - of_node_put(np); + match = ti_cpufreq_match_node(); if (!match) return -ENODEV; @@ -290,7 +302,10 @@ static int ti_cpufreq_probe(struct platform_device *pdev) static int ti_cpufreq_init(void) { - platform_device_register_simple("ti-cpufreq", -1, NULL, 0); + /* Check to ensure we are on a compatible platform */ + if (ti_cpufreq_match_node()) + platform_device_register_simple("ti-cpufreq", -1, NULL, 0); + return 0; } module_init(ti_cpufreq_init); -- 2.16.1
[PATCH] cpufreq: ti-cpufreq: Only register platform_device when supported
Currently the ti-cpufreq driver blindly registers a 'ti-cpufreq' to force the driver to probe on any platforms where the driver is built in. However, this should only happen on platforms that actually can make use of the driver. There is already functionality in place to match the SoC compatible so let's factor this out into a separate call and make sure we find a match before creating the ti-cpufreq driver device. Signed-off-by: Dave Gerlach --- drivers/cpufreq/ti-cpufreq.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index 3f0e2a14895a..541fdcf17b57 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -201,19 +201,31 @@ static const struct of_device_id ti_cpufreq_of_match[] = { {}, }; +static const struct of_device_id *ti_cpufreq_match_node(void) +{ + struct device_node *np; + const struct of_device_id *match; + + np = of_find_node_by_path("/"); + match = of_match_node(ti_cpufreq_of_match, np); + of_node_put(np); + + if (!match) + return NULL; + else + return match; +} + static int ti_cpufreq_probe(struct platform_device *pdev) { u32 version[VERSION_COUNT]; - struct device_node *np; const struct of_device_id *match; struct opp_table *ti_opp_table; struct ti_cpufreq_data *opp_data; const char * const reg_names[] = {"vdd", "vbb"}; int ret; - np = of_find_node_by_path("/"); - match = of_match_node(ti_cpufreq_of_match, np); - of_node_put(np); + match = ti_cpufreq_match_node(); if (!match) return -ENODEV; @@ -290,7 +302,10 @@ static int ti_cpufreq_probe(struct platform_device *pdev) static int ti_cpufreq_init(void) { - platform_device_register_simple("ti-cpufreq", -1, NULL, 0); + /* Check to ensure we are on a compatible platform */ + if (ti_cpufreq_match_node()) + platform_device_register_simple("ti-cpufreq", -1, NULL, 0); + return 0; } module_init(ti_cpufreq_init); -- 2.16.1
Re: [PATCH] include/linux/compiler*.h: make compiler-*.h mutually exclusive
Hi Nick 2018-08-23 8:37 GMT+09:00 Nick Desaulniers : > Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6") > recently exposed a brittle part of the build for supporting non-gcc > compilers. > > Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and > __GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't > added compiler specific checks for __clang__ or __INTEL_COMPILER. > > This is brittle, as they happened to get compatibility by posing as a > certain version of GCC. This broke when upgrading the minimal version > of GCC required to build the kernel, to a version above what ICC and > Clang claim to be. > > Rather than always including compiler-gcc.h then undefining or > redefining macros in compiler-intel.h or compiler-clang.h, let's > separate out the compiler specific macro definitions into mutually > exclusive headers, do more proper compiler detection, and keep shared > definitions in compiler_types.h. > > Reported-by: Masahiro Yamada > Suggested-by: Eli Friedman > Suggested-by: Joe Perches > Signed-off-by: Nick Desaulniers > --- > arch/arm/kernel/asm-offsets.c | 13 +- > drivers/gpu/drm/i915/i915_utils.h | 2 +- > drivers/watchdog/kempld_wdt.c | 5 - > include/linux/compiler-clang.h| 20 ++- > include/linux/compiler-gcc.h | 88 --- > include/linux/compiler-intel.h| 13 +- > include/linux/compiler_types.h| 238 ++ > mm/ksm.c | 4 +- > mm/migrate.c | 3 +- > 9 files changed, 133 insertions(+), 253 deletions(-) If I build ARM linux with Clang, I see the following error. arch/arm/firmware/trusted_foundations.c:34:13: error: variable has incomplete type 'void' static void __naked tf_generic_smc(u32 type, u32 arg1, u32 arg2) ^ arch/arm/firmware/trusted_foundations.c:34:20: error: expected ';' after top level declarator static void __naked tf_generic_smc(u32 type, u32 arg1, u32 arg2) ^ ; arch/arm/firmware/trusted_foundations.c:77:25: error: use of undeclared identifier 'trusted_foundations_ops' register_firmware_ops(_foundations_ops); ^ 3 errors generated. scripts/Makefile.build:307: recipe for target 'arch/arm/firmware/trusted_foundations.o' failed make[2]: *** [arch/arm/firmware/trusted_foundations.o] Error 1 Makefile:1057: recipe for target 'arch/arm/firmware' failed make[1]: *** [arch/arm/firmware] Error 2 make[1]: *** Waiting for unfinished jobs __naked is defined in It was previous included by all compilers, but now it is only by _true_ GCC. Even if I move the __naked definition to , I see a different error. arch/arm/firmware/trusted_foundations.c:47:10: error: parameter references not allowed in naked functions : "r" (type), "r" (arg1), "r" (arg2) ^ arch/arm/firmware/trusted_foundations.c:34:13: note: attribute is here static void __naked tf_generic_smc(u32 type, u32 arg1, u32 arg2) ^ ./include/linux/compiler_types.h:277:33: note: expanded from macro '__naked' #define __naked __attribute__((naked)) ^ I do not know what a correct fix is. -- Best Regards Masahiro Yamada
Re: [PATCH] include/linux/compiler*.h: make compiler-*.h mutually exclusive
Hi Nick 2018-08-23 8:37 GMT+09:00 Nick Desaulniers : > Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6") > recently exposed a brittle part of the build for supporting non-gcc > compilers. > > Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and > __GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't > added compiler specific checks for __clang__ or __INTEL_COMPILER. > > This is brittle, as they happened to get compatibility by posing as a > certain version of GCC. This broke when upgrading the minimal version > of GCC required to build the kernel, to a version above what ICC and > Clang claim to be. > > Rather than always including compiler-gcc.h then undefining or > redefining macros in compiler-intel.h or compiler-clang.h, let's > separate out the compiler specific macro definitions into mutually > exclusive headers, do more proper compiler detection, and keep shared > definitions in compiler_types.h. > > Reported-by: Masahiro Yamada > Suggested-by: Eli Friedman > Suggested-by: Joe Perches > Signed-off-by: Nick Desaulniers > --- > arch/arm/kernel/asm-offsets.c | 13 +- > drivers/gpu/drm/i915/i915_utils.h | 2 +- > drivers/watchdog/kempld_wdt.c | 5 - > include/linux/compiler-clang.h| 20 ++- > include/linux/compiler-gcc.h | 88 --- > include/linux/compiler-intel.h| 13 +- > include/linux/compiler_types.h| 238 ++ > mm/ksm.c | 4 +- > mm/migrate.c | 3 +- > 9 files changed, 133 insertions(+), 253 deletions(-) If I build ARM linux with Clang, I see the following error. arch/arm/firmware/trusted_foundations.c:34:13: error: variable has incomplete type 'void' static void __naked tf_generic_smc(u32 type, u32 arg1, u32 arg2) ^ arch/arm/firmware/trusted_foundations.c:34:20: error: expected ';' after top level declarator static void __naked tf_generic_smc(u32 type, u32 arg1, u32 arg2) ^ ; arch/arm/firmware/trusted_foundations.c:77:25: error: use of undeclared identifier 'trusted_foundations_ops' register_firmware_ops(_foundations_ops); ^ 3 errors generated. scripts/Makefile.build:307: recipe for target 'arch/arm/firmware/trusted_foundations.o' failed make[2]: *** [arch/arm/firmware/trusted_foundations.o] Error 1 Makefile:1057: recipe for target 'arch/arm/firmware' failed make[1]: *** [arch/arm/firmware] Error 2 make[1]: *** Waiting for unfinished jobs __naked is defined in It was previous included by all compilers, but now it is only by _true_ GCC. Even if I move the __naked definition to , I see a different error. arch/arm/firmware/trusted_foundations.c:47:10: error: parameter references not allowed in naked functions : "r" (type), "r" (arg1), "r" (arg2) ^ arch/arm/firmware/trusted_foundations.c:34:13: note: attribute is here static void __naked tf_generic_smc(u32 type, u32 arg1, u32 arg2) ^ ./include/linux/compiler_types.h:277:33: note: expanded from macro '__naked' #define __naked __attribute__((naked)) ^ I do not know what a correct fix is. -- Best Regards Masahiro Yamada
Re: [PATCH] irq: ingenic: Drop dependency on MACH_INGENIC
Hi Paul, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on tip/irq/core] [also build test WARNING on v4.18 next-20180822] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Paul-Cercueil/irq-ingenic-Drop-dependency-on-MACH_INGENIC/20180822-083429 config: mips-allmodconfig (attached as .config) compiler: mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=mips All warnings (new ones prefixed by >>): In file included from drivers//irqchip/irq-ingenic.c:30:0: >> arch/mips/include/asm/mach-jz4740/irq.h:63:0: warning: "NR_IRQS" redefined #define NR_IRQS (JZ4740_IRQ_ADC_BASE + 6) In file included from arch/mips/include/asm/irq.h:18:0, from include/linux/irq.h:23, from include/asm-generic/hardirq.h:13, from arch/mips/include/asm/hardirq.h:16, from include/linux/hardirq.h:9, from include/linux/interrupt.h:11, from drivers//irqchip/irq-ingenic.c:19: arch/mips/include/asm/mach-generic/irq.h:12:0: note: this is the location of the previous definition #define NR_IRQS 128 vim +/NR_IRQS +63 arch/mips/include/asm/mach-jz4740/irq.h 9869848d Lars-Peter Clausen 2010-07-17 62 9869848d Lars-Peter Clausen 2010-07-17 @63 #define NR_IRQS (JZ4740_IRQ_ADC_BASE + 6) 9869848d Lars-Peter Clausen 2010-07-17 64 :: The code at line 63 was first introduced by commit :: 9869848d12601cdddf097a36aebe0b10dc5d177b MIPS: JZ4740: Add IRQ handler code :: TO: Lars-Peter Clausen :: CC: Ralf Baechle --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] irq: ingenic: Drop dependency on MACH_INGENIC
Hi Paul, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on tip/irq/core] [also build test WARNING on v4.18 next-20180822] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Paul-Cercueil/irq-ingenic-Drop-dependency-on-MACH_INGENIC/20180822-083429 config: mips-allmodconfig (attached as .config) compiler: mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=mips All warnings (new ones prefixed by >>): In file included from drivers//irqchip/irq-ingenic.c:30:0: >> arch/mips/include/asm/mach-jz4740/irq.h:63:0: warning: "NR_IRQS" redefined #define NR_IRQS (JZ4740_IRQ_ADC_BASE + 6) In file included from arch/mips/include/asm/irq.h:18:0, from include/linux/irq.h:23, from include/asm-generic/hardirq.h:13, from arch/mips/include/asm/hardirq.h:16, from include/linux/hardirq.h:9, from include/linux/interrupt.h:11, from drivers//irqchip/irq-ingenic.c:19: arch/mips/include/asm/mach-generic/irq.h:12:0: note: this is the location of the previous definition #define NR_IRQS 128 vim +/NR_IRQS +63 arch/mips/include/asm/mach-jz4740/irq.h 9869848d Lars-Peter Clausen 2010-07-17 62 9869848d Lars-Peter Clausen 2010-07-17 @63 #define NR_IRQS (JZ4740_IRQ_ADC_BASE + 6) 9869848d Lars-Peter Clausen 2010-07-17 64 :: The code at line 63 was first introduced by commit :: 9869848d12601cdddf097a36aebe0b10dc5d177b MIPS: JZ4740: Add IRQ handler code :: TO: Lars-Peter Clausen :: CC: Ralf Baechle --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[PATCH] scsi/virio_scsi.c: do not call virtscsi_remove_vqs() in virtscsi_init() to avoid crash bug
If some error happened before find_vqs, error branch will goto virtscsi_remove_vqs to free vqs. Actually the vqs have not been allocated successfully, so this will cause wild-pointer-free problem. So virtscsi_remove_vqs could be deleted as no error will happen after find_vqs. Signed-off-by: Jun Piao --- drivers/scsi/virtio_scsi.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 1c72db9..da0fd74 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -833,8 +833,6 @@ static int virtscsi_init(struct virtio_device *vdev, kfree(names); kfree(callbacks); kfree(vqs); - if (err) - virtscsi_remove_vqs(vdev); return err; } --
[PATCH] scsi/virio_scsi.c: do not call virtscsi_remove_vqs() in virtscsi_init() to avoid crash bug
If some error happened before find_vqs, error branch will goto virtscsi_remove_vqs to free vqs. Actually the vqs have not been allocated successfully, so this will cause wild-pointer-free problem. So virtscsi_remove_vqs could be deleted as no error will happen after find_vqs. Signed-off-by: Jun Piao --- drivers/scsi/virtio_scsi.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 1c72db9..da0fd74 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -833,8 +833,6 @@ static int virtscsi_init(struct virtio_device *vdev, kfree(names); kfree(callbacks); kfree(vqs); - if (err) - virtscsi_remove_vqs(vdev); return err; } --
[GIT PULL] tracing: gcov and fixes to no kprobes on notrace functions
Linus, Masami found an off by one bug in the code that keeps "notrace" functions from being traced by kprobes. During my testing, I found that there's places that we may want to add kprobes to notrace, thus we may end up changing this code before 4.19 is released. The history behind this change is that we found that adding kprobes to various notrace functions caused the kernel to crashed. We took the safe route and decided not to allow kprobes to trace any notrace function. But because notrace is added to functions that just cause weird side effects to the function tracer, but are still safe, preventing kprobes for all notrace functions may be too much of a big hammer. One such place is __schedule() is marked notrace, to keep function tracer from doing strange recursive loops when it gets traced with NEED_RESCHED set. With this change, one can not add kprobes to the scheduler. Masami also added code to use gcov on ftrace. Please pull the latest trace-v4.19-1 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git trace-v4.19-1 Tag SHA1: 808ce93b6ef6d4949eee7c48ab5e002142367393 Head SHA1: 9161a864ff88e800de50494da095af19832e9583 Masami Hiramatsu (2): tracing: Allow gcov profiling on only ftrace subsystem tracing/kprobes: Fix to check notrace function with correct range kernel/trace/Kconfig| 12 kernel/trace/Makefile | 5 + kernel/trace/trace_kprobe.c | 9 - 3 files changed, 25 insertions(+), 1 deletion(-) --- diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index fd6754b88f87..aec44d838283 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -774,6 +774,18 @@ config TRACING_EVENTS_GPIO help Enable tracing events for gpio subsystem +config GCOV_PROFILE_FTRACE + bool "Enable GCOV profiling on ftrace subsystem" + depends on GCOV_KERNEL + help + Enable GCOV profiling on ftrace subsystem for checking + which functions/lines are tested. + + If unsure, say N. + + Note that on a kernel compiled with this config, ftrace will + run significantly slower. + endif # FTRACE endif # TRACING_SUPPORT diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index 98d53b39a8ee..f81dadbc7c4a 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -23,6 +23,11 @@ ifdef CONFIG_TRACING_BRANCHES KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING endif +# for GCOV coverage profiling +ifdef CONFIG_GCOV_PROFILE_FTRACE +GCOV_PROFILE := y +endif + CFLAGS_trace_benchmark.o := -I$(src) CFLAGS_trace_events_filter.o := -I$(src) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 65a4157af851..ad384b31fe01 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -513,7 +513,14 @@ static bool within_notrace_func(struct trace_kprobe *tk) if (!addr || !kallsyms_lookup_size_offset(addr, , )) return false; - return !ftrace_location_range(addr - offset, addr - offset + size); + /* Get the entry address of the target function */ + addr -= offset; + + /* +* Since ftrace_location_range() does inclusive range check, we need +* to subtract 1 byte from the end address. +*/ + return !ftrace_location_range(addr, addr + size - 1); } #else #define within_notrace_func(tk)(false)
[GIT PULL] tracing: gcov and fixes to no kprobes on notrace functions
Linus, Masami found an off by one bug in the code that keeps "notrace" functions from being traced by kprobes. During my testing, I found that there's places that we may want to add kprobes to notrace, thus we may end up changing this code before 4.19 is released. The history behind this change is that we found that adding kprobes to various notrace functions caused the kernel to crashed. We took the safe route and decided not to allow kprobes to trace any notrace function. But because notrace is added to functions that just cause weird side effects to the function tracer, but are still safe, preventing kprobes for all notrace functions may be too much of a big hammer. One such place is __schedule() is marked notrace, to keep function tracer from doing strange recursive loops when it gets traced with NEED_RESCHED set. With this change, one can not add kprobes to the scheduler. Masami also added code to use gcov on ftrace. Please pull the latest trace-v4.19-1 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git trace-v4.19-1 Tag SHA1: 808ce93b6ef6d4949eee7c48ab5e002142367393 Head SHA1: 9161a864ff88e800de50494da095af19832e9583 Masami Hiramatsu (2): tracing: Allow gcov profiling on only ftrace subsystem tracing/kprobes: Fix to check notrace function with correct range kernel/trace/Kconfig| 12 kernel/trace/Makefile | 5 + kernel/trace/trace_kprobe.c | 9 - 3 files changed, 25 insertions(+), 1 deletion(-) --- diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index fd6754b88f87..aec44d838283 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -774,6 +774,18 @@ config TRACING_EVENTS_GPIO help Enable tracing events for gpio subsystem +config GCOV_PROFILE_FTRACE + bool "Enable GCOV profiling on ftrace subsystem" + depends on GCOV_KERNEL + help + Enable GCOV profiling on ftrace subsystem for checking + which functions/lines are tested. + + If unsure, say N. + + Note that on a kernel compiled with this config, ftrace will + run significantly slower. + endif # FTRACE endif # TRACING_SUPPORT diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index 98d53b39a8ee..f81dadbc7c4a 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -23,6 +23,11 @@ ifdef CONFIG_TRACING_BRANCHES KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING endif +# for GCOV coverage profiling +ifdef CONFIG_GCOV_PROFILE_FTRACE +GCOV_PROFILE := y +endif + CFLAGS_trace_benchmark.o := -I$(src) CFLAGS_trace_events_filter.o := -I$(src) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 65a4157af851..ad384b31fe01 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -513,7 +513,14 @@ static bool within_notrace_func(struct trace_kprobe *tk) if (!addr || !kallsyms_lookup_size_offset(addr, , )) return false; - return !ftrace_location_range(addr - offset, addr - offset + size); + /* Get the entry address of the target function */ + addr -= offset; + + /* +* Since ftrace_location_range() does inclusive range check, we need +* to subtract 1 byte from the end address. +*/ + return !ftrace_location_range(addr, addr + size - 1); } #else #define within_notrace_func(tk)(false)
Re: linux-next: build warning after merge of the apparmor tree
On 08/22/2018 05:20 PM, Stephen Rothwell wrote: > Hi John, > > After merging the apparmor tree, today's linux-next build (x86_64 > allmodconfig) produced this warning: > > security/apparmor/policy_unpack.c: In function 'unpack_dfa': > security/apparmor/policy_unpack.c:426:1: warning: label 'fail' defined but > not used [-Wunused-label] > fail: > ^~~~ > > Introduced by commit > > fb5841091f28 ("apparmor: remove no-op permission check in policy_unpack") > sorry fixed
Re: linux-next: build warning after merge of the apparmor tree
On 08/22/2018 05:20 PM, Stephen Rothwell wrote: > Hi John, > > After merging the apparmor tree, today's linux-next build (x86_64 > allmodconfig) produced this warning: > > security/apparmor/policy_unpack.c: In function 'unpack_dfa': > security/apparmor/policy_unpack.c:426:1: warning: label 'fail' defined but > not used [-Wunused-label] > fail: > ^~~~ > > Introduced by commit > > fb5841091f28 ("apparmor: remove no-op permission check in policy_unpack") > sorry fixed
Re: Overview of performance improvements of recent SMB3 compounding patches
We also have stat() that drops from 6 to 2 roundtrips. For metadata I think the only remaining low hanging fruit is readdir(). Currently in cifs.ko scanning a directory takes a minimum of 4 roundtrips : open query and get data query and get the error STATUS_NO_MORE_FILES close For small-ish directories (the common case I expect will be that almost all directories will fit in one reply of data) we can cut this down to just a single roundtrip by using the compound : open; query; query; close If the second query fails with the error above we know we are done and we finished in a single roundtrip. The drawback is for the rare cases where the directory did not fit in the reply, then we will have to throw all the data away and re-start from scratch using the old loop thus making all large directory scans even slower than they are right now, well just one roundtrip slower but still. (Technically, the protocol does support re-opening a directory and resuming at a specific FileIndex but it is not implemented server-side.) I think it will still be worth it if we can get the common case to drop from 4 to 1 roundtrip. It will involve touching cifs_readdir() which is hairy code but if someone would want to tackle this I think there is opportunity to make a huge impact on directory listing performance. regards ronnie sahlberg - Original Message - > From: "Steve French" > To: "CIFS" , "samba-technical" > , "LKML" > , "linux-fsdevel" > , "ronnie sahlberg" > > Sent: Thursday, 23 August, 2018 5:28:01 AM > Subject: Re: Overview of performance improvements of recent SMB3 compounding > patches > > Continuing the experiments with Ronnie's patches show additional > promising performance results from other common scenarios: > > Very good news that the number of roundtrips (request/response pairs > to the server) has dropped so substantially. Reducing latency, and > allowing the server to more efficiently process the requests leads to > much better performance for these common operations: > > - rename goes from 9 request/response pairs to 5 ("mv /mnt/file /mnt/file1") > - hardlink goes from 8 to only 3 (!) ("ln /mnt/file1 /mnt/file2") > - symlink (with mfsymlinks enabled) goes from 11 to 9 ("ln -s > /mnt/file1 /mnt/file3") > - touch (existing file) 6 down to 4 > > In current kernel we benefit from compounding now on stafs ("stat -f > /mnt"), and in the earlier note I described the improvements in > create, unlink, mkdir and rmdir which were also awesome. > > This is very exciting. > > On Tue, Aug 21, 2018 at 1:24 PM Steve French wrote: > > > > In experiments today with Ronnie's most recent compounding patches I > > see the expected significant improvements in create/mkdir/unlink/rmdir > > operations over SMB3 mounts (tests were to Samba but would be similar > > to all modern servers). See below: > > > > "touch /mnt/file" goes from 6 request/response pairs to 4 with > > Ronnie's compounding patches > > "rm /mnt/file" from 5 to 2 request/response pairs > > "mkdir /mnt/newdir" 6 pairs to 3 pairs > > "rmdir /mnt/newdir" 6 pairs down to 2 pairs > > > > Good job Ronnie! > > -- > > Thanks, > > > > Steve > > > > -- > Thanks, > > Steve >
Re: Overview of performance improvements of recent SMB3 compounding patches
We also have stat() that drops from 6 to 2 roundtrips. For metadata I think the only remaining low hanging fruit is readdir(). Currently in cifs.ko scanning a directory takes a minimum of 4 roundtrips : open query and get data query and get the error STATUS_NO_MORE_FILES close For small-ish directories (the common case I expect will be that almost all directories will fit in one reply of data) we can cut this down to just a single roundtrip by using the compound : open; query; query; close If the second query fails with the error above we know we are done and we finished in a single roundtrip. The drawback is for the rare cases where the directory did not fit in the reply, then we will have to throw all the data away and re-start from scratch using the old loop thus making all large directory scans even slower than they are right now, well just one roundtrip slower but still. (Technically, the protocol does support re-opening a directory and resuming at a specific FileIndex but it is not implemented server-side.) I think it will still be worth it if we can get the common case to drop from 4 to 1 roundtrip. It will involve touching cifs_readdir() which is hairy code but if someone would want to tackle this I think there is opportunity to make a huge impact on directory listing performance. regards ronnie sahlberg - Original Message - > From: "Steve French" > To: "CIFS" , "samba-technical" > , "LKML" > , "linux-fsdevel" > , "ronnie sahlberg" > > Sent: Thursday, 23 August, 2018 5:28:01 AM > Subject: Re: Overview of performance improvements of recent SMB3 compounding > patches > > Continuing the experiments with Ronnie's patches show additional > promising performance results from other common scenarios: > > Very good news that the number of roundtrips (request/response pairs > to the server) has dropped so substantially. Reducing latency, and > allowing the server to more efficiently process the requests leads to > much better performance for these common operations: > > - rename goes from 9 request/response pairs to 5 ("mv /mnt/file /mnt/file1") > - hardlink goes from 8 to only 3 (!) ("ln /mnt/file1 /mnt/file2") > - symlink (with mfsymlinks enabled) goes from 11 to 9 ("ln -s > /mnt/file1 /mnt/file3") > - touch (existing file) 6 down to 4 > > In current kernel we benefit from compounding now on stafs ("stat -f > /mnt"), and in the earlier note I described the improvements in > create, unlink, mkdir and rmdir which were also awesome. > > This is very exciting. > > On Tue, Aug 21, 2018 at 1:24 PM Steve French wrote: > > > > In experiments today with Ronnie's most recent compounding patches I > > see the expected significant improvements in create/mkdir/unlink/rmdir > > operations over SMB3 mounts (tests were to Samba but would be similar > > to all modern servers). See below: > > > > "touch /mnt/file" goes from 6 request/response pairs to 4 with > > Ronnie's compounding patches > > "rm /mnt/file" from 5 to 2 request/response pairs > > "mkdir /mnt/newdir" 6 pairs to 3 pairs > > "rmdir /mnt/newdir" 6 pairs down to 2 pairs > > > > Good job Ronnie! > > -- > > Thanks, > > > > Steve > > > > -- > Thanks, > > Steve >
Re: [PATCH v5 0/4] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory.
Hi all, No reply for 2 weeks, any comments? Thanks, Chao Fan On Tue, Aug 07, 2018 at 02:49:56PM +0800, Chao Fan wrote: >***Background: >People reported that kaslr may randomly chooses some positions >which are located in movable memory regions. This will break memory >hotplug feature and make the memory can't be removed. > >***Solutions: >There should be a method to limit kaslr to choosing immovable memory >regions, so there are 2 solutions: >1) Add a kernel parameter to specify the memory regions. >2) Get the information of memory hotremove, then kaslr will know the > right regions. >In method 2, information about memory hot remove is in ACPI >tables, which will be parsed after 'start_kernel', kaslr can't get >the information. >In method 1, users should know the regions address and specify in >kernel parameter. > >In the earliest time, I tried to dig ACPI tabls to solve this problem. >But I didn't splite the code in 'compressed/' and ACPI code, so the patch >is hard to follow so refused by community. >Somebody suggest to add a kernel parameter to specify the >immovable memory so that limit kaslr in these regions. Then I make >a patchset. After several versions, Ingo gave a suggestion: >https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html >Follow Ingo's suggestion, imitate the ACPI code to parse the acpi >tables, so that the kaslr can get necessary memory information in >ACPI tables. >Since I think ACPI code is independent part, so copy the codes >and functions to 'compressed/' directory, so that kaslr won't >influence the initialization of ACPI. > >PATCH 1/4 Reuse the head file of linux/acpi.h, and copy a fcuntion from > ACPI code. >PATCH 2/4 Functions to parse ACPI code. >PATCH 3/4 If 'CONFIG_MEMORY_HOTREMOVE' specified, walk all nodes and > store the information of immovable memory regions. >PATCH 4/4 According to the immovable memory regions, filter the > immovable regions which KASLR can choose. > >***Test results: > - I did a very simple test, and it can get the memory information in > bios and efi KVM guest machine, and put it by early printk. But no > more tests, so it's with RFC tag. > >v1->v2: > - Simplify some code. >Follow Baoquan He's suggestion: > - Reuse the head file of acpi code. > >v2->v3: > - Test in more conditions, so remove the 'RFC' tag. > - Change some comments. > >v3->v4: >Follow Thomas Gleixner's suggetsion: > - Put the whole efi related function into #define CONFIG_EFI and return > false in the other stub. > - Simplify two functions in head file. > >v4->v5: >Follow Dou Liyang's suggestion: > - Add more comments about some functions based on kernel code. > - Change some typo in comments. > - Clean useless variable. > - Add check for the boundary of array. > - Add check for 'movable_node' parameter > >Any comments will be welcome. > > >Chao Fan (4): > x86/boot: Add acpitb.h to help parse acpi tables > x86/boot: Add acpitb.c to parse acpi tables > x86/boot/KASLR: Walk srat tables to filter immovable memory > x86/boot/KASLR: Limit kaslr to choosing the immovable memory > > arch/x86/boot/compressed/Makefile | 4 + > arch/x86/boot/compressed/acpitb.c | 272 ++ > arch/x86/boot/compressed/acpitb.h | 7 + > arch/x86/boot/compressed/kaslr.c | 125 -- > 4 files changed, 397 insertions(+), 11 deletions(-) > create mode 100644 arch/x86/boot/compressed/acpitb.c > create mode 100644 arch/x86/boot/compressed/acpitb.h > >-- >2.17.1 >
Re: [PATCH v5 0/4] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory.
Hi all, No reply for 2 weeks, any comments? Thanks, Chao Fan On Tue, Aug 07, 2018 at 02:49:56PM +0800, Chao Fan wrote: >***Background: >People reported that kaslr may randomly chooses some positions >which are located in movable memory regions. This will break memory >hotplug feature and make the memory can't be removed. > >***Solutions: >There should be a method to limit kaslr to choosing immovable memory >regions, so there are 2 solutions: >1) Add a kernel parameter to specify the memory regions. >2) Get the information of memory hotremove, then kaslr will know the > right regions. >In method 2, information about memory hot remove is in ACPI >tables, which will be parsed after 'start_kernel', kaslr can't get >the information. >In method 1, users should know the regions address and specify in >kernel parameter. > >In the earliest time, I tried to dig ACPI tabls to solve this problem. >But I didn't splite the code in 'compressed/' and ACPI code, so the patch >is hard to follow so refused by community. >Somebody suggest to add a kernel parameter to specify the >immovable memory so that limit kaslr in these regions. Then I make >a patchset. After several versions, Ingo gave a suggestion: >https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html >Follow Ingo's suggestion, imitate the ACPI code to parse the acpi >tables, so that the kaslr can get necessary memory information in >ACPI tables. >Since I think ACPI code is independent part, so copy the codes >and functions to 'compressed/' directory, so that kaslr won't >influence the initialization of ACPI. > >PATCH 1/4 Reuse the head file of linux/acpi.h, and copy a fcuntion from > ACPI code. >PATCH 2/4 Functions to parse ACPI code. >PATCH 3/4 If 'CONFIG_MEMORY_HOTREMOVE' specified, walk all nodes and > store the information of immovable memory regions. >PATCH 4/4 According to the immovable memory regions, filter the > immovable regions which KASLR can choose. > >***Test results: > - I did a very simple test, and it can get the memory information in > bios and efi KVM guest machine, and put it by early printk. But no > more tests, so it's with RFC tag. > >v1->v2: > - Simplify some code. >Follow Baoquan He's suggestion: > - Reuse the head file of acpi code. > >v2->v3: > - Test in more conditions, so remove the 'RFC' tag. > - Change some comments. > >v3->v4: >Follow Thomas Gleixner's suggetsion: > - Put the whole efi related function into #define CONFIG_EFI and return > false in the other stub. > - Simplify two functions in head file. > >v4->v5: >Follow Dou Liyang's suggestion: > - Add more comments about some functions based on kernel code. > - Change some typo in comments. > - Clean useless variable. > - Add check for the boundary of array. > - Add check for 'movable_node' parameter > >Any comments will be welcome. > > >Chao Fan (4): > x86/boot: Add acpitb.h to help parse acpi tables > x86/boot: Add acpitb.c to parse acpi tables > x86/boot/KASLR: Walk srat tables to filter immovable memory > x86/boot/KASLR: Limit kaslr to choosing the immovable memory > > arch/x86/boot/compressed/Makefile | 4 + > arch/x86/boot/compressed/acpitb.c | 272 ++ > arch/x86/boot/compressed/acpitb.h | 7 + > arch/x86/boot/compressed/kaslr.c | 125 -- > 4 files changed, 397 insertions(+), 11 deletions(-) > create mode 100644 arch/x86/boot/compressed/acpitb.c > create mode 100644 arch/x86/boot/compressed/acpitb.h > >-- >2.17.1 >
Re: [BUGFIX PATCH] tracing/kprobes: Fix to check notrace function with correct range
On Thu, 23 Aug 2018 10:18:31 +0900 Masami Hiramatsu wrote: > On Wed, 22 Aug 2018 08:58:09 -0400 > Steven Rostedt wrote: > > > On Tue, 21 Aug 2018 09:42:49 -0400 > > Steven Rostedt wrote: > > > > > On Tue, 21 Aug 2018 22:04:57 +0900 > > > Masami Hiramatsu wrote: > > > > > > > Fix within_notrace_func() to check notrace function correctly. > > > > > > > > Since the ftrace_location_range(start, end) function checks > > > > the range inclusively (start <= ftrace-loc <= end), the end > > > > address must not include the entry address of next function. > > > > > > > > However, within_notrace_func() uses kallsyms_lookup_size_offset() > > > > to get the function size and calculate the end address from > > > > adding the size to the entry address. This means the end address > > > > is the entry address of the next function. > > > > > > > > In the result, within_notrace_func() fails to find notrace > > > > function if the next function of the target function is > > > > ftraced. > > > > > > > > Let's subtract 1 from the end address so that ftrace_location_range() > > > > can check it correctly. > > > > > > > > Fixes: commit 45408c4f9250 ("tracing: kprobes: Prohibit probing on > > > > notrace function") > > > > Signed-off-by: Masami Hiramatsu > > > > Reported-by: Michael Rodin > > > > --- > > > > > > > > > > Applied. Thanks Masami! I'll start testing this and send it upstream > > > when it's finished. > > > > > > > Hmm, this fix shows the extent of not tracing functions with notrace > > attched much deeper. For one thing, we can't hook kprobes to the > > __schedule() function (which is now what all the main schedule > > functions call). One of my tests used this function to test kprobes and > > it failed. > > > > I'll push this to Linus, but I'm wondering if we want to perhaps add a > > white list of functions marked "notrace" but still kprobes can trace? > > Yes, that is what I'm thinking about. Maybe most of notrace function is > safe for kprobes (like __schedule()). And if it is really critical, those > functions must be marked by NOKPROBE_SYMBOL(). > Correct. This notrace was added because we want to not trace any function call between preempt_schedule() and where NEED_RESCHED is cleared. Because the function tracer does a preempt_enable_notrace() which triggers the preempt_schedule() call again, and we can end up in a recursive loop. But that is fixed, but by tracing __schedule() we do a song and dance of calling preempt_schedule_notrace and makes the trace output weird. See commit 499d79559ffe4. -- Steve
Re: [BUGFIX PATCH] tracing/kprobes: Fix to check notrace function with correct range
On Thu, 23 Aug 2018 10:18:31 +0900 Masami Hiramatsu wrote: > On Wed, 22 Aug 2018 08:58:09 -0400 > Steven Rostedt wrote: > > > On Tue, 21 Aug 2018 09:42:49 -0400 > > Steven Rostedt wrote: > > > > > On Tue, 21 Aug 2018 22:04:57 +0900 > > > Masami Hiramatsu wrote: > > > > > > > Fix within_notrace_func() to check notrace function correctly. > > > > > > > > Since the ftrace_location_range(start, end) function checks > > > > the range inclusively (start <= ftrace-loc <= end), the end > > > > address must not include the entry address of next function. > > > > > > > > However, within_notrace_func() uses kallsyms_lookup_size_offset() > > > > to get the function size and calculate the end address from > > > > adding the size to the entry address. This means the end address > > > > is the entry address of the next function. > > > > > > > > In the result, within_notrace_func() fails to find notrace > > > > function if the next function of the target function is > > > > ftraced. > > > > > > > > Let's subtract 1 from the end address so that ftrace_location_range() > > > > can check it correctly. > > > > > > > > Fixes: commit 45408c4f9250 ("tracing: kprobes: Prohibit probing on > > > > notrace function") > > > > Signed-off-by: Masami Hiramatsu > > > > Reported-by: Michael Rodin > > > > --- > > > > > > > > > > Applied. Thanks Masami! I'll start testing this and send it upstream > > > when it's finished. > > > > > > > Hmm, this fix shows the extent of not tracing functions with notrace > > attched much deeper. For one thing, we can't hook kprobes to the > > __schedule() function (which is now what all the main schedule > > functions call). One of my tests used this function to test kprobes and > > it failed. > > > > I'll push this to Linus, but I'm wondering if we want to perhaps add a > > white list of functions marked "notrace" but still kprobes can trace? > > Yes, that is what I'm thinking about. Maybe most of notrace function is > safe for kprobes (like __schedule()). And if it is really critical, those > functions must be marked by NOKPROBE_SYMBOL(). > Correct. This notrace was added because we want to not trace any function call between preempt_schedule() and where NEED_RESCHED is cleared. Because the function tracer does a preempt_enable_notrace() which triggers the preempt_schedule() call again, and we can end up in a recursive loop. But that is fixed, but by tracing __schedule() we do a song and dance of calling preempt_schedule_notrace and makes the trace output weird. See commit 499d79559ffe4. -- Steve
Re: [RFC PATCH 1/2] x86: WARN() when uaccess helpers fault on kernel addresses
On Wed, Aug 22, 2018 at 5:55 PM, Jann Horn wrote: > On Thu, Aug 23, 2018 at 2:28 AM Andy Lutomirski wrote: >> >> On Wed, Aug 22, 2018 at 4:53 PM, Jann Horn wrote: >> > On Tue, Aug 7, 2018 at 4:55 AM Andy Lutomirski wrote: >> >> > On Aug 6, 2018, at 6:22 PM, Jann Horn wrote: >> >> > There have been multiple kernel vulnerabilities that permitted >> >> > userspace to >> >> > pass completely unchecked pointers through to userspace accessors: >> >> > >> >> > - the waitid() bug - commit 96ca579a1ecc ("waitid(): Add missing >> >> > access_ok() checks") >> >> > - the sg/bsg read/write APIs >> >> > - the infiniband read/write APIs >> >> > >> >> > These don't happen all that often, but when they do happen, it is hard >> >> > to >> >> > test for them properly; and it is probably also hard to discover them >> >> > with >> >> > fuzzing. Even when an unmapped kernel address is supplied to such buggy >> >> > code, it just returns -EFAULT instead of doing a proper BUG() or at >> >> > least >> >> > WARN(). >> >> > >> >> > This patch attempts to make such misbehaving code a bit more visible by >> >> > WARN()ing in the pagefault handler code when a userspace accessor causes >> >> > #PF on a kernel address and the current context isn't whitelisted. >> >> >> >> I like this a lot, and, in fact, I once wrote a patch to do something >> >> similar. It was before the fancy extable code, though, so it was a mess. >> >> Here are some thoughts: >> >> >> >> - It should be three patches. One patch to add the _UA annotations, one >> >> to improve the info passes to the handlers, and one to change behavior. >> >> >> >> - You should pass the vector, the error code, and the address to the >> >> handler. >> > >> > I'm polishing the patch a bit, and I've noticed that to plumb the >> > error code and address through properly, I might need significantly >> > more code churn because of kprobes - I want to make sure I'm not going >> > down the completely wrong path here. >> > >> > I'm extending fixup_exception() to take two extra args "unsigned long >> > error_code, unsigned long fault_addr". Most callers of >> > fixup_exception() are fairly straightforward, but >> > kprobe_fault_handler() has a dozen callchains from different exception >> > handlers, and most of them are coming via notify_die(). >> >> KILL IT WITH FIRE >> >> More seriously, kill kprobe_exceptions_notify() and just fold the >> contents into do_general_protection(). This notifier chain crap is >> incomprehensible. I would love to see notify_die() removed entirely, >> and crap like this is just more reason to want it gone. > > This isn't just do_general_protection() though, that's just one > example. As far as I can tell, similar stuff happens everywhere where > notify_die() is used - #DF, #BR, #BP, #MF and so on. True. But there aren't actually that many places in the kernel that use die notifiers at all. Here's the complete list, excluding non-x86 arch-specific ones, annotated a bit: arch/x86/kernel/kgdb.c:retval = register_die_notifier(_notifier); arch/x86/kernel/kgdb.c:unregister_die_notifier(_notifier); arch/x86/kernel/kgdb.c:unregister_die_notifier(_notifier); OK, maybe not totally crazy for kgdb. arch/x86/mm/kasan_init_64.c:register_die_notifier(_die_notifier); Should be in traps.c directly. arch/x86/mm/kmmio.c:return register_die_notifier(_die); arch/x86/mm/kmmio.c:unregister_die_notifier(_die); Should probably be in traps.c directly. drivers/bus/brcmstb_gisb.c:register_die_notifier(_die_notifier); I don't know WTF this is, but it is certainly garbage if anyone ever tries to build this on x86. drivers/dma/sh/shdmac.c:int err = register_die_notifier(_dmae_nmi_notifier); drivers/dma/sh/shdmac.c:unregister_die_notifier(_dmae_nmi_notifier); SH-specific, I think. Not really sure. drivers/firmware/google/gsmi.c:register_die_notifier(_die_notifier); drivers/firmware/google/gsmi.c:unregister_die_notifier(_die_notifier); This is actually an *oops* notifier. That's totally reasonable, but it should be a separate OOPS notification chain. drivers/hv/vmbus_drv.c:register_die_notifier(_die_block); drivers/hv/vmbus_drv.c:unregister_die_notifier(_die_block); Appears to *want* to be an OOPS notifier, but it appears to be rather buggy and to actually catch everything. drivers/misc/sgi-xp/xpc_main.c: (void)unregister_die_notifier(_die_notifier); drivers/misc/sgi-xp/xpc_main.c:ret = register_die_notifier(_die_notifier); drivers/misc/sgi-xp/xpc_main.c: (void)unregister_die_notifier(_die_notifier); Haven't looked. kernel/events/hw_breakpoint.c:return register_die_notifier(_breakpoint_exceptions_nb); This is an utter abomination and needs to be killed. The #DB code is gnarly enough without this particular indirection. I want to kill it on x86. I have a big #DB cleanup series. Maybe I'll do it. kernel/events/uprobes.c:return register_die_notifier(_exception_nb); For
Re: [RFC PATCH 1/2] x86: WARN() when uaccess helpers fault on kernel addresses
On Wed, Aug 22, 2018 at 5:55 PM, Jann Horn wrote: > On Thu, Aug 23, 2018 at 2:28 AM Andy Lutomirski wrote: >> >> On Wed, Aug 22, 2018 at 4:53 PM, Jann Horn wrote: >> > On Tue, Aug 7, 2018 at 4:55 AM Andy Lutomirski wrote: >> >> > On Aug 6, 2018, at 6:22 PM, Jann Horn wrote: >> >> > There have been multiple kernel vulnerabilities that permitted >> >> > userspace to >> >> > pass completely unchecked pointers through to userspace accessors: >> >> > >> >> > - the waitid() bug - commit 96ca579a1ecc ("waitid(): Add missing >> >> > access_ok() checks") >> >> > - the sg/bsg read/write APIs >> >> > - the infiniband read/write APIs >> >> > >> >> > These don't happen all that often, but when they do happen, it is hard >> >> > to >> >> > test for them properly; and it is probably also hard to discover them >> >> > with >> >> > fuzzing. Even when an unmapped kernel address is supplied to such buggy >> >> > code, it just returns -EFAULT instead of doing a proper BUG() or at >> >> > least >> >> > WARN(). >> >> > >> >> > This patch attempts to make such misbehaving code a bit more visible by >> >> > WARN()ing in the pagefault handler code when a userspace accessor causes >> >> > #PF on a kernel address and the current context isn't whitelisted. >> >> >> >> I like this a lot, and, in fact, I once wrote a patch to do something >> >> similar. It was before the fancy extable code, though, so it was a mess. >> >> Here are some thoughts: >> >> >> >> - It should be three patches. One patch to add the _UA annotations, one >> >> to improve the info passes to the handlers, and one to change behavior. >> >> >> >> - You should pass the vector, the error code, and the address to the >> >> handler. >> > >> > I'm polishing the patch a bit, and I've noticed that to plumb the >> > error code and address through properly, I might need significantly >> > more code churn because of kprobes - I want to make sure I'm not going >> > down the completely wrong path here. >> > >> > I'm extending fixup_exception() to take two extra args "unsigned long >> > error_code, unsigned long fault_addr". Most callers of >> > fixup_exception() are fairly straightforward, but >> > kprobe_fault_handler() has a dozen callchains from different exception >> > handlers, and most of them are coming via notify_die(). >> >> KILL IT WITH FIRE >> >> More seriously, kill kprobe_exceptions_notify() and just fold the >> contents into do_general_protection(). This notifier chain crap is >> incomprehensible. I would love to see notify_die() removed entirely, >> and crap like this is just more reason to want it gone. > > This isn't just do_general_protection() though, that's just one > example. As far as I can tell, similar stuff happens everywhere where > notify_die() is used - #DF, #BR, #BP, #MF and so on. True. But there aren't actually that many places in the kernel that use die notifiers at all. Here's the complete list, excluding non-x86 arch-specific ones, annotated a bit: arch/x86/kernel/kgdb.c:retval = register_die_notifier(_notifier); arch/x86/kernel/kgdb.c:unregister_die_notifier(_notifier); arch/x86/kernel/kgdb.c:unregister_die_notifier(_notifier); OK, maybe not totally crazy for kgdb. arch/x86/mm/kasan_init_64.c:register_die_notifier(_die_notifier); Should be in traps.c directly. arch/x86/mm/kmmio.c:return register_die_notifier(_die); arch/x86/mm/kmmio.c:unregister_die_notifier(_die); Should probably be in traps.c directly. drivers/bus/brcmstb_gisb.c:register_die_notifier(_die_notifier); I don't know WTF this is, but it is certainly garbage if anyone ever tries to build this on x86. drivers/dma/sh/shdmac.c:int err = register_die_notifier(_dmae_nmi_notifier); drivers/dma/sh/shdmac.c:unregister_die_notifier(_dmae_nmi_notifier); SH-specific, I think. Not really sure. drivers/firmware/google/gsmi.c:register_die_notifier(_die_notifier); drivers/firmware/google/gsmi.c:unregister_die_notifier(_die_notifier); This is actually an *oops* notifier. That's totally reasonable, but it should be a separate OOPS notification chain. drivers/hv/vmbus_drv.c:register_die_notifier(_die_block); drivers/hv/vmbus_drv.c:unregister_die_notifier(_die_block); Appears to *want* to be an OOPS notifier, but it appears to be rather buggy and to actually catch everything. drivers/misc/sgi-xp/xpc_main.c: (void)unregister_die_notifier(_die_notifier); drivers/misc/sgi-xp/xpc_main.c:ret = register_die_notifier(_die_notifier); drivers/misc/sgi-xp/xpc_main.c: (void)unregister_die_notifier(_die_notifier); Haven't looked. kernel/events/hw_breakpoint.c:return register_die_notifier(_breakpoint_exceptions_nb); This is an utter abomination and needs to be killed. The #DB code is gnarly enough without this particular indirection. I want to kill it on x86. I have a big #DB cleanup series. Maybe I'll do it. kernel/events/uprobes.c:return register_die_notifier(_exception_nb); For
Re: [BUGFIX PATCH] tracing/kprobes: Fix to check notrace function with correct range
On Wed, 22 Aug 2018 08:58:09 -0400 Steven Rostedt wrote: > On Tue, 21 Aug 2018 09:42:49 -0400 > Steven Rostedt wrote: > > > On Tue, 21 Aug 2018 22:04:57 +0900 > > Masami Hiramatsu wrote: > > > > > Fix within_notrace_func() to check notrace function correctly. > > > > > > Since the ftrace_location_range(start, end) function checks > > > the range inclusively (start <= ftrace-loc <= end), the end > > > address must not include the entry address of next function. > > > > > > However, within_notrace_func() uses kallsyms_lookup_size_offset() > > > to get the function size and calculate the end address from > > > adding the size to the entry address. This means the end address > > > is the entry address of the next function. > > > > > > In the result, within_notrace_func() fails to find notrace > > > function if the next function of the target function is > > > ftraced. > > > > > > Let's subtract 1 from the end address so that ftrace_location_range() > > > can check it correctly. > > > > > > Fixes: commit 45408c4f9250 ("tracing: kprobes: Prohibit probing on > > > notrace function") > > > Signed-off-by: Masami Hiramatsu > > > Reported-by: Michael Rodin > > > --- > > > > > > > Applied. Thanks Masami! I'll start testing this and send it upstream > > when it's finished. > > > > Hmm, this fix shows the extent of not tracing functions with notrace > attched much deeper. For one thing, we can't hook kprobes to the > __schedule() function (which is now what all the main schedule > functions call). One of my tests used this function to test kprobes and > it failed. > > I'll push this to Linus, but I'm wondering if we want to perhaps add a > white list of functions marked "notrace" but still kprobes can trace? Yes, that is what I'm thinking about. Maybe most of notrace function is safe for kprobes (like __schedule()). And if it is really critical, those functions must be marked by NOKPROBE_SYMBOL(). Thank you, -- Masami Hiramatsu
Re: [BUGFIX PATCH] tracing/kprobes: Fix to check notrace function with correct range
On Wed, 22 Aug 2018 08:58:09 -0400 Steven Rostedt wrote: > On Tue, 21 Aug 2018 09:42:49 -0400 > Steven Rostedt wrote: > > > On Tue, 21 Aug 2018 22:04:57 +0900 > > Masami Hiramatsu wrote: > > > > > Fix within_notrace_func() to check notrace function correctly. > > > > > > Since the ftrace_location_range(start, end) function checks > > > the range inclusively (start <= ftrace-loc <= end), the end > > > address must not include the entry address of next function. > > > > > > However, within_notrace_func() uses kallsyms_lookup_size_offset() > > > to get the function size and calculate the end address from > > > adding the size to the entry address. This means the end address > > > is the entry address of the next function. > > > > > > In the result, within_notrace_func() fails to find notrace > > > function if the next function of the target function is > > > ftraced. > > > > > > Let's subtract 1 from the end address so that ftrace_location_range() > > > can check it correctly. > > > > > > Fixes: commit 45408c4f9250 ("tracing: kprobes: Prohibit probing on > > > notrace function") > > > Signed-off-by: Masami Hiramatsu > > > Reported-by: Michael Rodin > > > --- > > > > > > > Applied. Thanks Masami! I'll start testing this and send it upstream > > when it's finished. > > > > Hmm, this fix shows the extent of not tracing functions with notrace > attched much deeper. For one thing, we can't hook kprobes to the > __schedule() function (which is now what all the main schedule > functions call). One of my tests used this function to test kprobes and > it failed. > > I'll push this to Linus, but I'm wondering if we want to perhaps add a > white list of functions marked "notrace" but still kprobes can trace? Yes, that is what I'm thinking about. Maybe most of notrace function is safe for kprobes (like __schedule()). And if it is really critical, those functions must be marked by NOKPROBE_SYMBOL(). Thank you, -- Masami Hiramatsu
Re: [PATCH] include/linux/compiler*.h: make compiler-*.h mutually exclusive
On Wed, Aug 22, 2018 at 6:10 PM Dominique Martinet wrote: > > I'm not building linux directly, but BPF programs that indirectly uses > clang with bcc Oh, ok. I _can_ test the basic build (just not get a working link and a kernel) by just forcing it, so I guess I'll call that testing enough. I'll push it out. It's just Nick's patch, but with the extra barrier parenthesis removed. Linus
Re: [PATCH] include/linux/compiler*.h: make compiler-*.h mutually exclusive
On Wed, Aug 22, 2018 at 6:10 PM Dominique Martinet wrote: > > I'm not building linux directly, but BPF programs that indirectly uses > clang with bcc Oh, ok. I _can_ test the basic build (just not get a working link and a kernel) by just forcing it, so I guess I'll call that testing enough. I'll push it out. It's just Nick's patch, but with the extra barrier parenthesis removed. Linus
Re: [PATCH] include/linux/compiler*.h: make compiler-*.h mutually exclusive
Linus Torvalds wrote on Wed, Aug 22, 2018: > I've fixed that manually, but when I tried to test it I just hit the > > arch/x86/Makefile:179: *** Compiler lacks asm-goto support.. Stop. > > error. > > Do you have some experimental clang build with asm goto support? What > version? Or is it just that you're building ARM, not x86? I'm not building linux directly, but BPF programs that indirectly uses clang with bcc On fedora you can install bcc-tools and run e.g. /usr/share/bcc/tools/execsnoop that will compile something at runtime. If the package isn't available for your distro you can just install bcc and run the script from their repo[1] [1] https://raw.githubusercontent.com/iovisor/bcc/master/tools/execsnoop.py Thanks, -- Dominique
Re: [PATCH] include/linux/compiler*.h: make compiler-*.h mutually exclusive
Linus Torvalds wrote on Wed, Aug 22, 2018: > I've fixed that manually, but when I tried to test it I just hit the > > arch/x86/Makefile:179: *** Compiler lacks asm-goto support.. Stop. > > error. > > Do you have some experimental clang build with asm goto support? What > version? Or is it just that you're building ARM, not x86? I'm not building linux directly, but BPF programs that indirectly uses clang with bcc On fedora you can install bcc-tools and run e.g. /usr/share/bcc/tools/execsnoop that will compile something at runtime. If the package isn't available for your distro you can just install bcc and run the script from their repo[1] [1] https://raw.githubusercontent.com/iovisor/bcc/master/tools/execsnoop.py Thanks, -- Dominique
Re: please revert commit ce8556cca6 "kbuild: verify that $DEPMOD is installed" introduced in v4.18.4.
On 08/22/2018 05:39 PM, Dmitry Torokhov wrote: > On Wed, Aug 22, 2018 at 4:35 PM Randy Dunlap wrote: >> >> On 08/22/2018 11:53 AM, H. Nikolaus Schaller wrote: >>> This patch requires that /sbin/depmod is installed and installable on >>> the build host. >>> >>> But not all build hosts for cross compiling Linux are Linux systems >>> and are able to provide a working port of depmod, especially at the >>> file patch /sbin/depmod. >>> >>> I use, for example, a Darwin system to cross compile Linux and I run >>> depmod -a on the embedded system once, after installing a new Linux >>> kernel there. >>> >>> I have no problem with seeing a warning, but aborting the build process >>> is IMHO a bad idea since the previous behaviour didn't harm many people >>> as far as I see. Probably 99% of people compiling Linux kernels do that >>> on Linux and 99% of those have depmod installed for optimal operation of >>> their build host. So IMHO printing the warning is good enough. >> >> Thanks for the report and sorry about the problem. >> >> I'm OK with changing the error to a warning. >> Does the patch below work for you? > > Why would one want a warning on a host that never runs "make modules_install"? > Can this check be only done when we actually try to install modules? So Nikolaus: how do you provoke this problem that you are reporting? It's not just a theoretical problem, is it? The way that I read the top-level Makefile, this check for $DEPMOD only happens when you run "make modules_install". -- ~Randy
Re: please revert commit ce8556cca6 "kbuild: verify that $DEPMOD is installed" introduced in v4.18.4.
On 08/22/2018 05:39 PM, Dmitry Torokhov wrote: > On Wed, Aug 22, 2018 at 4:35 PM Randy Dunlap wrote: >> >> On 08/22/2018 11:53 AM, H. Nikolaus Schaller wrote: >>> This patch requires that /sbin/depmod is installed and installable on >>> the build host. >>> >>> But not all build hosts for cross compiling Linux are Linux systems >>> and are able to provide a working port of depmod, especially at the >>> file patch /sbin/depmod. >>> >>> I use, for example, a Darwin system to cross compile Linux and I run >>> depmod -a on the embedded system once, after installing a new Linux >>> kernel there. >>> >>> I have no problem with seeing a warning, but aborting the build process >>> is IMHO a bad idea since the previous behaviour didn't harm many people >>> as far as I see. Probably 99% of people compiling Linux kernels do that >>> on Linux and 99% of those have depmod installed for optimal operation of >>> their build host. So IMHO printing the warning is good enough. >> >> Thanks for the report and sorry about the problem. >> >> I'm OK with changing the error to a warning. >> Does the patch below work for you? > > Why would one want a warning on a host that never runs "make modules_install"? > Can this check be only done when we actually try to install modules? So Nikolaus: how do you provoke this problem that you are reporting? It's not just a theoretical problem, is it? The way that I read the top-level Makefile, this check for $DEPMOD only happens when you run "make modules_install". -- ~Randy