Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Tue, 22 Oct 2019 22:24:01 +0200 Peter Zijlstra wrote: > On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote: > > On Fri, 18 Oct 2019 09:35:40 +0200 > > Peter Zijlstra wrote: > > > > > Now that set_all_modules_text_*() is gone, nothing depends on the > > > relation between ->state = COMING and the protection state anymore. > > > This enables moving the protection changes later, such that the COMING > > > notifier callbacks can more easily modify the text. > > > > > > Signed-off-by: Peter Zijlstra (Intel) > > > Cc: Jessica Yu > > > --- > > > > This triggered the following bug: > > > > > The trace_event_define_fields_() is defined in > > include/trace/trace_events.h and is an init function called by the > > trace_events event_create_dir() via the module notifier: > > MODULE_STATE_COMING > > The below seems to cure it; and seems to generate identical > events/*/format output (for my .config, with the exception of ID). > > It has just one section mismatch report that I'm too tired to look at > just now. > > I'm not particularly proud of the "__function__" hack, but it works :/ I > couldn't come up with anything else for [uk]probes which seem to have > dynamic fields and if we're having it then syscall_enter can also make > use of it, the syscall_metadata crud was going to be ugly otherwise. > > (also, win on LOC) > > After applying this series and this patch I triggered this: [ 1397.281889] BUG: kernel NULL pointer dereference, address: 0001 [ 1397.288896] #PF: supervisor read access in kernel mode [ 1397.294062] #PF: error_code(0x) - not-present page [ 1397.299192] PGD 0 P4D 0 [ 1397.301728] Oops: [#1] PREEMPT SMP PTI [ 1397.305908] CPU: 7 PID: 4252 Comm: ftracetest Not tainted 5.4.0-rc3-test+ #132 [ 1397.313114] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 [ 1397.322056] RIP: 0010:event_create_dir+0x26a/0x520 [ 1397.326841] Code: ff ff 5a 85 c0 75 37 44 03 7b 10 48 83 c3 20 4c 8b 13 4d 85 d2 0f 84 66 fe ff ff b9 0d 00 00 00 4c 89 d6 4c 89 f7 48 8b 53 08 a6 0f 97 c1 80 d9 00 84 c9 75 a5 48 89 ef e8 b2 d4 a3 00 85 c0 [ 1397.345558] RSP: 0018:c9a63d18 EFLAGS: 00010202 [ 1397.350775] RAX: RBX: c9a63d80 RCX: 000d [ 1397.357893] RDX: 811ccfac RSI: 0001 RDI: 8207c68a [ 1397.365006] RBP: 888114b1c750 R08: R09: 8881147561b0 [ 1397.372119] R10: 0001 R11: 0001 R12: 8880b1d80528 [ 1397.379243] R13: 88811189aeb0 R14: 8207c68a R15: 811d2d22 [ 1397.386365] FS: 7f2567213740() GS:88811a9c() knlGS: [ 1397.394437] CS: 0010 DS: ES: CR0: 80050033 [ 1397.400174] CR2: 0001 CR3: b1f06005 CR4: 001606e0 [ 1397.407297] Call Trace: [ 1397.409753] trace_add_event_call+0x6c/0xb0 [ 1397.413938] trace_probe_register_event_call+0x22/0x50 [ 1397.419071] trace_kprobe_create+0x65c/0xa20 [ 1397.423340] ? argv_split+0x99/0x130 [ 1397.426913] ? __kmalloc+0x1d4/0x2c0 [ 1397.430485] ? trace_kprobe_create+0xa20/0xa20 [ 1397.434922] ? trace_kprobe_create+0xa20/0xa20 [ 1397.439361] create_or_delete_trace_kprobe+0xd/0x30 [ 1397.444237] trace_run_command+0x72/0x90 [ 1397.448158] trace_parse_run_command+0xaf/0x131 [ 1397.452684] vfs_write+0xa5/0x1a0 [ 1397.455996] ksys_write+0x5c/0xd0 [ 1397.459312] do_syscall_64+0x48/0x120 [ 1397.462971] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1397.468017] RIP: 0033:0x7f2567303ff8 By running tools/selftests/ftrace/ftracetest Crashed here: [33] Kprobe dynamic event - adding and removing [PASS] [34] Kprobe dynamic event - busy event check[PASS] [35] Kprobe event with comm arguments [FAIL] [36] Kprobe event string type argumentclient_loop: -- Steve
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote: > @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh > > val = sym->st_value + rel[i].r_addend; > > + /* > + * .klp.rela.* sections should only contain module > + * related RELAs. All core-kernel RELAs should be in > + * normal .rela.* sections and be applied when loading > + * the patch module itself. > + */ > + WARN_ON_ONCE(klp && core_kernel_text(val)); > + This isn't quite true, we also use .klp.rela sections to access unexported vmlinux symbols. -- Josh
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote: > Now sadly that commit missed all the useful information, luckily I could > find the patch in my LKML folder, more sad, that thread still didn't > contain the actual useful information, for that I was directed to > github: > > https://github.com/dynup/kpatch/issues/580 > > Now, someone is owning me a beer for having to look at github for this. Deal. And you probably deserve a few more for fixing our crap. The github thing is supposed to be temporary, at least in theory we'll eventually have all klp patch module building code in the kernel tree. > That finally explained that what happens is that the RELA was trying to > fix up the paravirt indirect call to 'local_irq_disable', which > apply_paravirt() will have overwritten with 'CLI; NOP'. This then > obviously goes *bang*. > > This then raises a number of questions: > > 1) why is that RELA (that obviously does not depend on any module) > applied so late? Good question. The 'pv_ops' symbol is exported by the core kernel, so I can't see any reason why we'd need to apply that rela late. In theory, kpatch-build isn't supposed to convert that to a klp rela. Maybe something went wrong in the patch creation code. I'm also questioning why we even need to apply the parainstructions section late. Maybe we can remove that apply_paravirt() call altogether, along with .klp.arch.parainstruction sections. I'll need to look into it... > 2) why can't we unconditionally skip RELA's to paravirt sites? We could, but I don't think it's needed if we fix #1. > 3) Is there ever a possible module-dependent RELA to a paravirt / > alternative site? Good question... > Now, for 1), I would propose '.klp.rela.${mod}' sections only contain > RELAs that depend on symbols in ${mod} (or modules in general). That was already the goal, but we've apparently failed at that. > We can fix up RELAs that depend on core kernel early without problems. > Let them be in the normal .rela sections and be fixed up on loading > the patch-module as per usual. If such symbols aren't exported, then they still need to be in .klp.rela.vmlinux sections, since normal relas won't work. > This should also deal with 2, paravirt should always have RELAs into the > core kernel. > > Then for 3) we only have alternatives left, and I _think_ it unlikely to > be the case, but I'll have to have a hard look at that. I'm not sure about alternatives, but maybe we can enforce such limitations with tooling and/or kernel checks. -- Josh
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote: > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote: > > On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote: > > > > Doesn't livepatch code also need to be modified? We have: > > > > Urgh bah.. I was too focussed on the other klp borkage :/ But yes, > > arm64/ftrace and klp are the only two users of that function (outside of > > module.c) and Mark was already writing a patch for arm64. > > > > Means these last two patches need to wait a little until we've fixed > > those. > > So the other KLP issue: > > peterz: ad klp, apply_relocate_add() and text_poke()... what > about apply_alternatives() and apply_paravirt()? They call > text_poke_early(), which was ok with module_disable/enable_ro(), but >now it is not, I suppose. See arch_klp_init_object_loaded() in > arch/x86/kernel/livepatch.c. > > mbenes: hurm, I don't see why we would need to do > apply_alternatives() / apply_paravirt() later, why can't we do that >the moment we load the module > > mbenes: that is, those things _never_ change after boot > > peterz: as jpoimboe explained. See commit > d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation. > > Now sadly that commit missed all the useful information, luckily I could > find the patch in my LKML folder, more sad, that thread still didn't > contain the actual useful information, for that I was directed to > github: > > https://github.com/dynup/kpatch/issues/580 > > Now, someone is owning me a beer for having to look at github for this. > > That finally explained that what happens is that the RELA was trying to > fix up the paravirt indirect call to 'local_irq_disable', which > apply_paravirt() will have overwritten with 'CLI; NOP'. This then > obviously goes *bang*. > > This then raises a number of questions: > > 1) why is that RELA (that obviously does not depend on any module) > applied so late? > > 2) why can't we unconditionally skip RELA's to paravirt sites? > > 3) Is there ever a possible module-dependent RELA to a paravirt / > alternative site? > > > Now, for 1), I would propose '.klp.rela.${mod}' sections only contain > RELAs that depend on symbols in ${mod} (or modules in general). We can > fix up RELAs that depend on core kernel early without problems. Let them > be in the normal .rela sections and be fixed up on loading the > patch-module as per usual. > > This should also deal with 2, paravirt should always have RELAs into the > core kernel. > > Then for 3) we only have alternatives left, and I _think_ it unlikely to > be the case, but I'll have to have a hard look at that. > > Hmmm ? Something like so on top, I suppose. --- --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -131,7 +131,8 @@ static int __apply_relocate_add(Elf64_Sh unsigned int symindex, unsigned int relsec, struct module *me, - void *(*write)(void *addr, const void *val, size_t size)) + void *(*write)(void *addr, const void *val, size_t size), + bool klp) { unsigned int i; Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr; @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh val = sym->st_value + rel[i].r_addend; + /* +* .klp.rela.* sections should only contain module +* related RELAs. All core-kernel RELAs should be in +* normal .rela.* sections and be applied when loading +* the patch module itself. +*/ + WARN_ON_ONCE(klp && core_kernel_text(val)); + switch (ELF64_R_TYPE(rel[i].r_info)) { case R_X86_64_NONE: break; @@ -223,7 +232,7 @@ int apply_relocate_add(Elf64_Shdr *sechd unsigned int relsec, struct module *me) { - return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy); + return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy, false); } int klp_apply_relocate_add(Elf64_Shdr *sechdrs, @@ -234,7 +243,7 @@ int klp_apply_relocate_add(Elf64_Shdr *s { int ret; - ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke); + ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke, true); if (!ret) text_poke_sync();
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote: > On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote: > > Doesn't livepatch code also need to be modified? We have: > > Urgh bah.. I was too focussed on the other klp borkage :/ But yes, > arm64/ftrace and klp are the only two users of that function (outside of > module.c) and Mark was already writing a patch for arm64. > > Means these last two patches need to wait a little until we've fixed > those. So the other KLP issue: peterz: ad klp, apply_relocate_add() and text_poke()... what about apply_alternatives() and apply_paravirt()? They call text_poke_early(), which was ok with module_disable/enable_ro(), but now it is not, I suppose. See arch_klp_init_object_loaded() in arch/x86/kernel/livepatch.c. mbenes: hurm, I don't see why we would need to do apply_alternatives() / apply_paravirt() later, why can't we do that the moment we load the module mbenes: that is, those things _never_ change after boot peterz: as jpoimboe explained. See commit d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation. Now sadly that commit missed all the useful information, luckily I could find the patch in my LKML folder, more sad, that thread still didn't contain the actual useful information, for that I was directed to github: https://github.com/dynup/kpatch/issues/580 Now, someone is owning me a beer for having to look at github for this. That finally explained that what happens is that the RELA was trying to fix up the paravirt indirect call to 'local_irq_disable', which apply_paravirt() will have overwritten with 'CLI; NOP'. This then obviously goes *bang*. This then raises a number of questions: 1) why is that RELA (that obviously does not depend on any module) applied so late? 2) why can't we unconditionally skip RELA's to paravirt sites? 3) Is there ever a possible module-dependent RELA to a paravirt / alternative site? Now, for 1), I would propose '.klp.rela.${mod}' sections only contain RELAs that depend on symbols in ${mod} (or modules in general). We can fix up RELAs that depend on core kernel early without problems. Let them be in the normal .rela sections and be fixed up on loading the patch-module as per usual. This should also deal with 2, paravirt should always have RELAs into the core kernel. Then for 3) we only have alternatives left, and I _think_ it unlikely to be the case, but I'll have to have a hard look at that. Hmmm ?
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Tue, Oct 22, 2019 at 04:40:23PM -0400, Steven Rostedt wrote: > On Tue, 22 Oct 2019 22:24:01 +0200 > Peter Zijlstra wrote: > > I'm not particularly proud of the "__function__" hack, but it works :/ I > > If anything, that should be defined as a macro: > > #define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__" > > And use that to test? Possibly, also, we should probably start with a character that C doesn't allow in typenames, like '$'. That way we can have a much shorter string and still be certain it will never conflict; "$func" comes to mind. > > couldn't come up with anything else for [uk]probes which seem to have > > dynamic fields and if we're having it then syscall_enter can also make > > use of it, the syscall_metadata crud was going to be ugly otherwise. > > > > (also, win on LOC) > > I'm more worried about text/data bloat. But if anything, we may be able > to deal with that later. We use almost the exact same data the function would've used, except we don't have the actual function. I just don't see how it can be more.
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Tue, 22 Oct 2019 22:24:01 +0200 Peter Zijlstra wrote: > The below seems to cure it; and seems to generate identical > events/*/format output (for my .config, with the exception of ID). > > It has just one section mismatch report that I'm too tired to look at > just now. Thanks, I'll try to take a look at it tomorrow. > > I'm not particularly proud of the "__function__" hack, but it works :/ I If anything, that should be defined as a macro: #define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__" And use that to test? > couldn't come up with anything else for [uk]probes which seem to have > dynamic fields and if we're having it then syscall_enter can also make > use of it, the syscall_metadata crud was going to be ugly otherwise. > > (also, win on LOC) I'm more worried about text/data bloat. But if anything, we may be able to deal with that later. -- Steve
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote: > On Fri, 18 Oct 2019 09:35:40 +0200 > Peter Zijlstra wrote: > > > Now that set_all_modules_text_*() is gone, nothing depends on the > > relation between ->state = COMING and the protection state anymore. > > This enables moving the protection changes later, such that the COMING > > notifier callbacks can more easily modify the text. > > > > Signed-off-by: Peter Zijlstra (Intel) > > Cc: Jessica Yu > > --- > > This triggered the following bug: > > The trace_event_define_fields_() is defined in > include/trace/trace_events.h and is an init function called by the > trace_events event_create_dir() via the module notifier: > MODULE_STATE_COMING The below seems to cure it; and seems to generate identical events/*/format output (for my .config, with the exception of ID). It has just one section mismatch report that I'm too tired to look at just now. I'm not particularly proud of the "__function__" hack, but it works :/ I couldn't come up with anything else for [uk]probes which seem to have dynamic fields and if we're having it then syscall_enter can also make use of it, the syscall_metadata crud was going to be ugly otherwise. (also, win on LOC) --- fs/xfs/xfs_trace.h | 4 +- include/linux/trace_events.h | 16 ++- include/trace/events/filemap.h | 2 +- include/trace/trace_events.h | 64 - kernel/trace/trace.h | 31 ++-- kernel/trace/trace_entries.h | 66 +++-- kernel/trace/trace_events.c| 20 +++- kernel/trace/trace_export.c| 106 +++-- kernel/trace/trace_kprobe.c| 12 - kernel/trace/trace_syscalls.c | 48 +++ kernel/trace/trace_uprobe.c| 6 ++- 11 files changed, 162 insertions(+), 213 deletions(-) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index eaae275ed430..53c5485cf2a1 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class, TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_ino_t, ino) - __field(void *, leaf); - __field(int, pos); + __field(void *, leaf) + __field(int, pos) __field(xfs_fileoff_t, startoff) __field(xfs_fsblock_t, startblock) __field(xfs_filblks_t, blockcount) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 30a8cdcfd4a4..30782e78b91d 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -187,6 +187,20 @@ enum trace_reg { struct trace_event_call; +struct trace_event_fields { + const char *type; + union { + struct { + const char *name; + const int size; + const int align; + const int is_signed; + const int filter_type; + }; + int (*define_fields)(struct trace_event_call *); + }; +}; + struct trace_event_class { const char *system; void*probe; @@ -195,7 +209,7 @@ struct trace_event_class { #endif int (*reg)(struct trace_event_call *event, enum trace_reg type, void *data); - int (*define_fields)(struct trace_event_call *); + struct trace_event_fields *fields_array; struct list_head*(*get_fields)(struct trace_event_call *); struct list_headfields; int (*raw_init)(struct trace_event_call *); diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h index ee05db7ee8d2..796053e162d2 100644 --- a/include/trace/events/filemap.h +++ b/include/trace/events/filemap.h @@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err, TP_ARGS(file, old), TP_STRUCT__entry( - __field(struct file *, file); + __field(struct file *, file) __field(unsigned long, i_ino) __field(dev_t, s_dev) __field(errseq_t, old) diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h index 4ecdfe2e3580..ca1d2e745a3f 100644 --- a/include/trace/trace_events.h +++ b/include/trace/trace_events.h @@ -394,22 +394,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \ #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) #undef __field_ext -#define __field_ext(type, item, filter_type) \ - ret = trace_define_field(event_call, #type, #item, \ -offsetof(typeof(field), item), \ -sizeof(field.item),\ -
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Tue, Oct 22, 2019 at 01:31:16PM +0200, Heiko Carstens wrote: > On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote: > > On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote: > > > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote: > > > > > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only > > > available on Power and x86, and Power does not have STRICT_MODULE_RWX, > > > the below should be sufficient. > > > > > > Completely untested... > > > > And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even > > less tested s390 bits included below. > > > > Heiko, apologies if I completely wrecked it. > > > > The purpose is to remove module_disable_ro()/module_enable_ro() from > > livepatch/core.c such that: > > > > - nothing relies on where in the module loading path module text goes RX. > > - nothing ever has writable text > > Given that Steven reported a crash, I assume I can wait until you > repost a new version of the series, which also includes s390 bits? His crash is somewhat orthogonal, but yes, I will repost once sorted.
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote: > On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote: > > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote: > > > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only > > available on Power and x86, and Power does not have STRICT_MODULE_RWX, > > the below should be sufficient. > > > > Completely untested... > > And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even > less tested s390 bits included below. > > Heiko, apologies if I completely wrecked it. > > The purpose is to remove module_disable_ro()/module_enable_ro() from > livepatch/core.c such that: > > - nothing relies on where in the module loading path module text goes RX. > - nothing ever has writable text Given that Steven reported a crash, I assume I can wait until you repost a new version of the series, which also includes s390 bits?
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Fri, 18 Oct 2019 09:35:40 +0200 Peter Zijlstra wrote: > Now that set_all_modules_text_*() is gone, nothing depends on the > relation between ->state = COMING and the protection state anymore. > This enables moving the protection changes later, such that the COMING > notifier callbacks can more easily modify the text. > > Signed-off-by: Peter Zijlstra (Intel) > Cc: Jessica Yu > --- This triggered the following bug: BUG: unable to handle page fault for address: a01501f1 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0011) - permissions violation PGD 2a16067 P4D 2a16067 PUD 2a17063 PMD c230c067 PTE 8000c4d74063 Oops: 0011 [#1] PREEMPT SMP KASAN PTI CPU: 2 PID: 638 Comm: systemd-udevd Not tainted 5.4.0-rc3-test+ #98 ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver ACPI Warning: SystemIO range 0x0530-0x053F conflicts with OpRegion 0x0500-0x0563 (\GPIO) (20190816/utaddress-213) ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver ACPI Warning: SystemIO range 0x0500-0x052F conflicts with OpRegion 0x0500-0x0563 (\GPIO) (20190816/utaddress-213) ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver lpc_ich: Resource conflict(s) found affecting gpio_ich Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:trace_event_define_fields_i2c_result+0x0/0x86 [i2c_core] Code: 27 6a 00 48 c7 c2 60 34 13 a0 45 31 c9 48 89 df 41 b8 02 00 00 00 b9 12 00 00 00 48 c7 c6 a0 33 13 a0 e8 02 ec 14 e1 5a 5b c3 <53> 48 c7 c6 20 33 13 a0 b9 08 00 00 00 41 0 6a 00 41 RSP: 0018:8880cba07950 EFLAGS: 00010246 RAX: a01501f1 RBX: a013da40 RCX: 812a147c RDX: dc00 RSI: 0008 RDI: a013da40 RBP: a0142be0 R08: ed1017fde1ab R09: ed1017fde1ab R10: ed1017fde1aa R11: 8880bfef0d57 R12: 8880cc22a000 R13: a013da50 R14: a0137aa8 R15: 8880cd372c60 FS: 7f062a48f940() GS:8880d468() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: a01501f1 CR3: cb632003 CR4: 001606e0 Call Trace: event_create_dir+0x358/0x7b0 trace_module_notify+0x20b/0x240 notifier_call_chain+0x6d/0xa0 blocking_notifier_call_chain+0x5e/0x80 load_module+0x39a5/0x3d80 ? module_frob_arch_sections+0x20/0x20 ? vfs_read+0xcc/0x1b0 ? kernel_read+0x95/0xb0 ? kernel_read_file+0x187/0x310 ? find_held_lock+0xac/0xd0 ? syscall_trace_enter+0x369/0x590 ? __do_sys_finit_module+0x11a/0x1b0 __do_sys_finit_module+0x11a/0x1b0 ? __ia32_sys_init_module+0x40/0x40 ? trace_hardirqs_on+0x2e/0x120 ? ktime_get_coarse_real_ts64+0x6c/0xf0 ? syscall_trace_enter+0x233/0x590 ? do_syscall_64+0x14/0x1a0 do_syscall_64+0x68/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Attached config, but it seems to be triggered with modules that have trace events defined in them. The trace_event_define_fields_() is defined in include/trace/trace_events.h and is an init function called by the trace_events event_create_dir() via the module notifier: MODULE_STATE_COMING -- Steve config.gz Description: application/gzip
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote: > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote: > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only > available on Power and x86, and Power does not have STRICT_MODULE_RWX, > the below should be sufficient. > > Completely untested... And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even less tested s390 bits included below. Heiko, apologies if I completely wrecked it. The purpose is to remove module_disable_ro()/module_enable_ro() from livepatch/core.c such that: - nothing relies on where in the module loading path module text goes RX. - nothing ever has writable text > --- > arch/x86/kernel/module.c | 40 +--- > include/linux/livepatch.h | 7 +++ > kernel/livepatch/core.c | 14 ++ > 3 files changed, 50 insertions(+), 11 deletions(-) > > diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c > index d5c72cb877b3..76fa2c5f2d7b 100644 > --- a/arch/x86/kernel/module.c > +++ b/arch/x86/kernel/module.c > @@ -126,11 +126,12 @@ int apply_relocate(Elf32_Shdr *sechdrs, > return 0; > } > #else /*X86_64*/ > -int apply_relocate_add(Elf64_Shdr *sechdrs, > +int __apply_relocate_add(Elf64_Shdr *sechdrs, > const char *strtab, > unsigned int symindex, > unsigned int relsec, > -struct module *me) > +struct module *me, > +void *(*write)(void *addr, const void *val, size_t size)) > { > unsigned int i; > Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr; > @@ -162,19 +163,19 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, > case R_X86_64_64: > if (*(u64 *)loc != 0) > goto invalid_relocation; > - *(u64 *)loc = val; > + write(loc, &val, 8); > break; > case R_X86_64_32: > if (*(u32 *)loc != 0) > goto invalid_relocation; > - *(u32 *)loc = val; > + write(loc, &val, 4); > if (val != *(u32 *)loc) > goto overflow; > break; > case R_X86_64_32S: > if (*(s32 *)loc != 0) > goto invalid_relocation; > - *(s32 *)loc = val; > + write(loc, &val, 4); > if ((s64)val != *(s32 *)loc) > goto overflow; > break; > @@ -183,7 +184,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, > if (*(u32 *)loc != 0) > goto invalid_relocation; > val -= (u64)loc; > - *(u32 *)loc = val; > + write(loc, &val, 4); > #if 0 > if ((s64)val != *(s32 *)loc) > goto overflow; > @@ -193,7 +194,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, > if (*(u64 *)loc != 0) > goto invalid_relocation; > val -= (u64)loc; > - *(u64 *)loc = val; > + write(loc, &val, 8); > break; > default: > pr_err("%s: Unknown rela relocation: %llu\n", > @@ -215,6 +216,31 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, > me->name); > return -ENOEXEC; > } > + > +int apply_relocate_add(Elf64_Shdr *sechdrs, > +const char *strtab, > +unsigned int symindex, > +unsigned int relsec, > +struct module *me) > +{ > + return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, > memcpy); > +} > + > +int klp_apply_relocate_add(Elf64_Shdr *sechdrs, > +const char *strtab, > +unsigned int symindex, > +unsigned int relsec, > +struct module *me) > +{ > + int ret; > + > + ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, > text_poke); > + if (!ret) > + text_poke_sync(); > + > + return ret; > +} > + > #endif > > int module_finalize(const Elf_Ehdr *hdr, > diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h > index 273400814020..5b8c10871b70 100644 > --- a/include/linux/livepatch.h > +++ b/include/linux/livepatch.h > @@ -217,6 +217,13 @@ void *klp_shadow_get_or_alloc(void *obj, unsigned long > id, > void klp_shadow_free(void *obj, unsigned long id, klp_shadow_dtor_t dtor); > void klp_shadow_free_all(unsigned long id, klp_shadow_dtor_t dtor); > > + > +extern int klp_apply_relocate_add(Elf64_Shdr *sechdrs, > + const char *strtab, > +
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote: > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only > available on Power and x86, and Power does not have STRICT_MODULE_RWX, > the below should be sufficient. And... s390 also has HAVE_LIVEPATCH and STRICT_MODULE_RWX, so let me poke at that.
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote: > On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote: > > On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote: > > > Now that set_all_modules_text_*() is gone, nothing depends on the > > > relation between ->state = COMING and the protection state anymore. > > > This enables moving the protection changes later, such that the COMING > > > notifier callbacks can more easily modify the text. > > > > > > Signed-off-by: Peter Zijlstra (Intel) > > > Cc: Jessica Yu > > > --- > > > kernel/module.c |8 > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > --- a/kernel/module.c > > > +++ b/kernel/module.c > > > @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod > > > /* This relies on module_mutex for list integrity. */ > > > module_bug_finalize(info->hdr, info->sechdrs, mod); > > > > > > - module_enable_ro(mod, false); > > > - module_enable_nx(mod); > > > - module_enable_x(mod); > > > - > > > /* Mark state as coming so strong_try_module_get() ignores us, > > >* but kallsyms etc. can see us. */ > > > mod->state = MODULE_STATE_COMING; > > > @@ -3852,6 +3848,10 @@ static int load_module(struct load_info > > > if (err) > > > goto bug_cleanup; > > > > > > + module_enable_ro(mod, false); > > > + module_enable_nx(mod); > > > + module_enable_x(mod); > > > + > > > /* Module is ready to execute: parsing args may do that. */ > > > after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp, > > > -32768, 32767, mod, > > > > [ Sorry if this was already discussed, I still have a large backlog. ] > > > > Doesn't livepatch code also need to be modified? We have: > > Urgh bah.. I was too focussed on the other klp borkage :/ But yes, > arm64/ftrace and klp are the only two users of that function (outside of > module.c) and Mark was already writing a patch for arm64. > > Means these last two patches need to wait a little until we've fixed > those. So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only available on Power and x86, and Power does not have STRICT_MODULE_RWX, the below should be sufficient. Completely untested... --- arch/x86/kernel/module.c | 40 +--- include/linux/livepatch.h | 7 +++ kernel/livepatch/core.c | 14 ++ 3 files changed, 50 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index d5c72cb877b3..76fa2c5f2d7b 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -126,11 +126,12 @@ int apply_relocate(Elf32_Shdr *sechdrs, return 0; } #else /*X86_64*/ -int apply_relocate_add(Elf64_Shdr *sechdrs, +int __apply_relocate_add(Elf64_Shdr *sechdrs, const char *strtab, unsigned int symindex, unsigned int relsec, - struct module *me) + struct module *me, + void *(*write)(void *addr, const void *val, size_t size)) { unsigned int i; Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr; @@ -162,19 +163,19 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, case R_X86_64_64: if (*(u64 *)loc != 0) goto invalid_relocation; - *(u64 *)loc = val; + write(loc, &val, 8); break; case R_X86_64_32: if (*(u32 *)loc != 0) goto invalid_relocation; - *(u32 *)loc = val; + write(loc, &val, 4); if (val != *(u32 *)loc) goto overflow; break; case R_X86_64_32S: if (*(s32 *)loc != 0) goto invalid_relocation; - *(s32 *)loc = val; + write(loc, &val, 4); if ((s64)val != *(s32 *)loc) goto overflow; break; @@ -183,7 +184,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, if (*(u32 *)loc != 0) goto invalid_relocation; val -= (u64)loc; - *(u32 *)loc = val; + write(loc, &val, 4); #if 0 if ((s64)val != *(s32 *)loc) goto overflow; @@ -193,7 +194,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, if (*(u64 *)loc != 0) goto invalid_relocation; val -= (u64)loc; - *(u64 *)loc = val; + write(loc, &val, 8); break; default:
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote: > On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote: > > Now that set_all_modules_text_*() is gone, nothing depends on the > > relation between ->state = COMING and the protection state anymore. > > This enables moving the protection changes later, such that the COMING > > notifier callbacks can more easily modify the text. > > > > Signed-off-by: Peter Zijlstra (Intel) > > Cc: Jessica Yu > > --- > > kernel/module.c |8 > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > --- a/kernel/module.c > > +++ b/kernel/module.c > > @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod > > /* This relies on module_mutex for list integrity. */ > > module_bug_finalize(info->hdr, info->sechdrs, mod); > > > > - module_enable_ro(mod, false); > > - module_enable_nx(mod); > > - module_enable_x(mod); > > - > > /* Mark state as coming so strong_try_module_get() ignores us, > > * but kallsyms etc. can see us. */ > > mod->state = MODULE_STATE_COMING; > > @@ -3852,6 +3848,10 @@ static int load_module(struct load_info > > if (err) > > goto bug_cleanup; > > > > + module_enable_ro(mod, false); > > + module_enable_nx(mod); > > + module_enable_x(mod); > > + > > /* Module is ready to execute: parsing args may do that. */ > > after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp, > > -32768, 32767, mod, > > [ Sorry if this was already discussed, I still have a large backlog. ] > > Doesn't livepatch code also need to be modified? We have: Urgh bah.. I was too focussed on the other klp borkage :/ But yes, arm64/ftrace and klp are the only two users of that function (outside of module.c) and Mark was already writing a patch for arm64. Means these last two patches need to wait a little until we've fixed those.
Re: [PATCH v4 15/16] module: Move where we mark modules RO,X
On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote: > Now that set_all_modules_text_*() is gone, nothing depends on the > relation between ->state = COMING and the protection state anymore. > This enables moving the protection changes later, such that the COMING > notifier callbacks can more easily modify the text. > > Signed-off-by: Peter Zijlstra (Intel) > Cc: Jessica Yu > --- > kernel/module.c |8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod > /* This relies on module_mutex for list integrity. */ > module_bug_finalize(info->hdr, info->sechdrs, mod); > > - module_enable_ro(mod, false); > - module_enable_nx(mod); > - module_enable_x(mod); > - > /* Mark state as coming so strong_try_module_get() ignores us, >* but kallsyms etc. can see us. */ > mod->state = MODULE_STATE_COMING; > @@ -3852,6 +3848,10 @@ static int load_module(struct load_info > if (err) > goto bug_cleanup; > > + module_enable_ro(mod, false); > + module_enable_nx(mod); > + module_enable_x(mod); > + > /* Module is ready to execute: parsing args may do that. */ > after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp, > -32768, 32767, mod, [ Sorry if this was already discussed, I still have a large backlog. ] Doesn't livepatch code also need to be modified? We have: prepare_coming_module() klp_module_coming() klp_init_object_loaded() module_disable_ro() ... module_enable_ro() which is done right before the above patch does module_enable_ro(). We could remove the disable-RO from that case, though we'd still need it for another case (late module patching). -- Josh
[PATCH v4 15/16] module: Move where we mark modules RO,X
Now that set_all_modules_text_*() is gone, nothing depends on the relation between ->state = COMING and the protection state anymore. This enables moving the protection changes later, such that the COMING notifier callbacks can more easily modify the text. Signed-off-by: Peter Zijlstra (Intel) Cc: Jessica Yu --- kernel/module.c |8 1 file changed, 4 insertions(+), 4 deletions(-) --- a/kernel/module.c +++ b/kernel/module.c @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod /* This relies on module_mutex for list integrity. */ module_bug_finalize(info->hdr, info->sechdrs, mod); - module_enable_ro(mod, false); - module_enable_nx(mod); - module_enable_x(mod); - /* Mark state as coming so strong_try_module_get() ignores us, * but kallsyms etc. can see us. */ mod->state = MODULE_STATE_COMING; @@ -3852,6 +3848,10 @@ static int load_module(struct load_info if (err) goto bug_cleanup; + module_enable_ro(mod, false); + module_enable_nx(mod); + module_enable_x(mod); + /* Module is ready to execute: parsing args may do that. */ after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp, -32768, 32767, mod,