----- Original Message ----- > From: "Luis Lozano" <lloz...@chromium.org> > To: "Mathieu Desnoyers" <mathieu.desnoy...@efficios.com> > Cc: "Jakub Jelinek" <ja...@redhat.com>, "Alexander Holler" > <hol...@ahsoftware.de>, "Linus Torvalds" > <torva...@linux-foundation.org>, "Richard Henderson" <r...@twiddle.net>, > "Linux Kernel Mailing List" > <linux-kernel@vger.kernel.org>, "Will Deacon" <will.dea...@arm.com>, "Catalin > Marinas" <catalin.mari...@arm.com>, > "Peter Zijlstra" <pet...@infradead.org>, lttng-...@lists.lttng.org, "Nathan > Lynch" <nathan_ly...@mentor.com>, "Paul > E. McKenney" <paul...@linux.vnet.ibm.com>, "Andrew Morton" > <a...@linux-foundation.org>, "Bhaskar Janakiraman" > <bjanakira...@chromium.org>, "Han Shen" <shen...@chromium.org> > Sent: Friday, November 22, 2013 3:18:14 AM > Subject: Re: current_thread_info() not respecting program order with gcc 4.8.x > > On Thu, Nov 21, 2013 at 7:38 PM, Mathieu Desnoyers > <mathieu.desnoy...@efficios.com> wrote: > > > > ----- Original Message ----- > > > From: "Luis Lozano" <lloz...@google.com> > > > To: "Mathieu Desnoyers" <mathieu.desnoy...@efficios.com> > > > Cc: "Jakub Jelinek" <ja...@redhat.com>, "Alexander Holler" > > > <hol...@ahsoftware.de>, "Linus Torvalds" > > > <torva...@linux-foundation.org>, "Richard Henderson" <r...@twiddle.net>, > > > "Linux Kernel Mailing List" > > > <linux-kernel@vger.kernel.org>, "Will Deacon" <will.dea...@arm.com>, > > > "Catalin Marinas" <catalin.mari...@arm.com>, > > > "Peter Zijlstra" <pet...@infradead.org>, lttng-...@lists.lttng.org, > > > "Nathan Lynch" <nathan_ly...@mentor.com>, "Paul > > > E. McKenney" <paul...@linux.vnet.ibm.com>, "Andrew Morton" > > > <a...@linux-foundation.org>, "Bhaskar Janakiraman" > > > <bjanakira...@chromium.org>, "Han Shen" <shen...@chromium.org> > > > Sent: Thursday, November 21, 2013 9:36:27 PM > > > Subject: Re: current_thread_info() not respecting program order with gcc > > > 4.8.x > > > > > > Hi Mathieu, > > > > > > Yes the problem we were seeing with GCC bug 58854 is that an interrupt > > > handler was corrupting the stack of a routine which had an invalid > > > value of SP (not really on "top" of the stack) for part of the > > > routine. > > > But in the assembly code you sent, I don't see where sp is being > > > modified... or where the access to "below" sp is happening. > > > > The following instruction > > > > f1c: e50b1048 str r1, [fp, #-72] ; 0xffffffb8 > > > > appears in the assembly generated by gcc 4.8.2, but not in the one > > generated by 4.7.3, > > > > It is there in 4.7.3 just a few lines difference. > I don't see any meaningful difference between the 2 assembly snippets... > Just the instructions ordered a little different? > > Luis
(resending reply in plain text) Assembly generated by 4.7.3 vs 4.8.2 are much more different. The diff I've presented yesterday is between 4.8.2 and 4.8.2+barrier() at the end of the function error path. When I can get a cross compiler to work, I will send out more the complete assembly listings of the function. Thanks, Mathieu > > > > > which makes me wonder if it's good or not. With > > slightly different build output from the diff (probably a different config > > from Nathan), the first function instructions look like: > > > > 00000efc <lttng_event_reserve>: > > efc: e1a0c00d mov ip, sp > > f00: e92ddff0 push {r4, r5, r6, r7, r8, r9, sl, fp, > > ip, lr, pc} > > f04: e24cb004 sub fp, ip, #4 > > f08: e24dd03c sub sp, sp, #60 ; 0x3c > > f0c: e52de004 push {lr} ; (str lr, [sp, > > #-4]!) > > f10: ebfffffe bl 0 <__gnu_mcount_nc> > > f14: e5903000 ldr r3, [r0] > > f18: e1a04000 mov r4, r0 > > f1c: e1a0000d mov r0, sp > > f20: e5936030 ldr r6, [r3, #48] ; 0x30 > > f24: e3c03d7f bic r3, r0, #8128 ; 0x1fc0 > > f28: e3c3303f bic r3, r3, #63 ; 0x3f > > f2c: e50b104c str r1, [fp, #-76] ; 0xffffffb4 > > f30: e5932004 ldr r2, [r3, #4] > > f34: e2822001 add r2, r2, #1 > > f38: e5832004 str r2, [r3, #4] > > f3c: ebfffffe bl 0 <debug_smp_processor_id> > > f40: e59f2e44 ldr r2, [pc, #3652] ; 1d8c > > <lttng_event_reserve+0xe90> > > f44: e59f3e44 ldr r3, [pc, #3652] ; 1d90 > > <lttng_event_reserve+0xe94> > > f48: e7921100 ldr r1, [r2, r0, lsl #2] > > f4c: e1a05000 mov r5, r0 > > f50: e7932001 ldr r2, [r3, r1] > > > > My rusty ARM assembler knowledge analyzes this like: > > > > ip = sp (ip being the scratch register) > > push 11 registers * 4 bytes = 44 bytes onto the stack > > fp = ip - 4 > > sp = sp - 60 (sp = ip - 104) > > push {lr} bl > > 0 <__gnu_mcount_nc> (dynamically patched to a pop {lr}) > > [...] > > store r1 into mem location fp - 76 > > > > So yes, it does look like the -76 from fp is within the stack. > > > > > > > In GCC bug 58854 it was pretty clear that the restore of the sp > > > register in the epilogue is moved to somewhere close to the prologue > > > of the routine. > > > Are we missing some diffs from the assembly comparison? > > > > My next step is to setup my own 4.8.2 ARM cross-compiler. I've tried this > > morning, > > did not manage to get one working even after following 2 howtos, trying > > with Debian > > packages, etc. Nathan told me this diff was the full comparison between the > > two > > functions, but you'll understand that at this point I want to reproduce > > everything myself, because this is a _weird_ issue. > > > > But after a long day of debugging, it's time for some sleep. I will get > > back to this > > tomorrow, > > > > Thanks, > > > > Mathieu > > > > > > > > Thanks > > > > > > Luis > > > > > > > > > > > > > > > On Thu, Nov 21, 2013 at 5:57 PM, Mathieu Desnoyers > > > <mathieu.desnoy...@efficios.com> wrote: > > > > ----- Original Message ----- > > > >> From: "Jakub Jelinek" <ja...@redhat.com> > > > >> To: "Luis Lozano" <lloz...@google.com> > > > >> Cc: "Alexander Holler" <hol...@ahsoftware.de>, "Linus Torvalds" > > > >> <torva...@linux-foundation.org>, "Mathieu Desnoyers" > > > >> <mathieu.desnoy...@efficios.com>, "Richard Henderson" > > > >> <r...@twiddle.net>, > > > >> "Linux Kernel Mailing List" > > > >> <linux-kernel@vger.kernel.org>, "Will Deacon" <will.dea...@arm.com>, > > > >> "Catalin Marinas" <catalin.mari...@arm.com>, > > > >> "Peter Zijlstra" <pet...@infradead.org>, lttng-...@lists.lttng.org, > > > >> "Nathan Lynch" <nathan_ly...@mentor.com>, "Paul > > > >> E. McKenney" <paul...@linux.vnet.ibm.com>, "Andrew Morton" > > > >> <a...@linux-foundation.org>, "Bhaskar Janakiraman" > > > >> <bjanakira...@chromium.org>, "Han Shen" <shen...@chromium.org> > > > >> Sent: Thursday, November 21, 2013 7:39:04 PM > > > >> Subject: Re: current_thread_info() not respecting program order with > > > >> gcc > > > >> 4.8.x > > > >> > > > >> On Thu, Nov 21, 2013 at 03:45:35PM -0800, Luis Lozano wrote: > > > >> > I think we need a reproducer. Without this we may all be going on > > > >> > the > > > >> > wrong path. This whole conversation started on an *assumption* that > > > >> > some accesses were being reordered. > > > >> > > > > >> > evidence of the reorder or reproducer please? > > > >> > > > >> Yeah, if a compiler bug is suspected, can anybody please open > > > >> a bugreport in http://gcc.gnu.org/bugzilla/ with the preprocessed > > > >> source, > > > >> compiler version, flags and how it was configured and some hint in > > > >> which > > > >> function to look for what exactly? We don't necessarily need a > > > >> runtime > > > >> small reproducer, but if it can be shown in the assembly what has been > > > >> reordered and why you think it shouldn't, with the above mentioned > > > >> input > > > >> that ought to be sufficient. Thanks. > > > > > > > > OK OK, let me reply on list first so I can share the result of a full > > > > day > > > > of bug hunting. We're not there yet, but many options have been > > > > eliminated. > > > > > > > > The issue shows up in stress test, when tracing with lttng-modules > > > > 2.4-rc1, > > > > on ARM. It's been reproduced with a Linux kernel 3.12 so far, with > > > > lttng-modules > > > > compiled against that kernel. > > > > > > > > First, I asked Nathan to compile his kernel with gcc 4.7, and > > > > lttng-modules > > > > with gcc 4.8.x (and vice-versa). The problem only appears when > > > > lttng-modules > > > > are compiled with gcc 4.8.x. The compiler version used to compile the > > > > rest > > > > of the kernel does not matter. > > > > > > > > Then I looked at gcc 4.8 changelog for ARM, new feature: > > > > -fno-sched-pressure > > > > (sched pressure is there by default). Nathan tried compiling > > > > lttng-modules > > > > with > > > > -fno-sched-pressure. The problem still reproduces. > > > > > > > > Knowing that adding barrier() outside of preempt_disable()/enable() was > > > > fixing the issue, we tried identifying which code location was > > > > responsible > > > > for working around the issue. Skipping a long investigation, here is > > > > the > > > > executive summary: > > > > > > > > http://git.lttng.org/?p=lttng-modules.git;a=blob;f=lttng-ring-buffer-client.h;h=50c47b3bf49f6c2dd24e250cf1a9b97808cd8e27;hb=HEAD > > > > > > > > Has the following function. We identified that adding a barrier() as > > > > shown > > > > below > > > > works around the issue: > > > > > > > > static > > > > int lttng_event_reserve(struct lib_ring_buffer_ctx *ctx, > > > > uint32_t event_id) > > > > { > > > > struct lttng_channel *lttng_chan = > > > > channel_get_private(ctx->chan); > > > > int ret, cpu; > > > > > > > > cpu = lib_ring_buffer_get_cpu(&client_config); > > > > if (cpu < 0) > > > > return -EPERM; > > > > ctx->cpu = cpu; > > > > > > > > switch (lttng_chan->header_type) { > > > > case 1: /* compact */ > > > > if (event_id > 30) > > > > ctx->rflags |= LTTNG_RFLAG_EXTENDED; > > > > break; > > > > case 2: /* large */ > > > > if (event_id > 65534) > > > > ctx->rflags |= LTTNG_RFLAG_EXTENDED; > > > > break; > > > > default: > > > > WARN_ON_ONCE(1); > > > > } > > > > > > > > ret = lib_ring_buffer_reserve(&client_config, ctx); > > > > if (ret) > > > > goto put; > > > > lttng_write_event_header(&client_config, ctx, event_id); > > > > return 0; > > > > put: > > > > lib_ring_buffer_put_cpu(&client_config); > > > > ---------------> barrier() added here; > > > > <---------------------------- > > > > return ret; > > > > } > > > > > > > > Where barrier() is the usual asm volatile with "memory" clobber, > > > > nothing > > > > else. > > > > > > > > Nathan gave me the binary diff for the assembly generated for this > > > > function > > > > without > > > > the barrier and with the barrier: > > > > > > > > --- /tmp/lttng_event_reserve-4.8.2.dump 2013-11-21 11:14:14.536495079 > > > > -0600 > > > > +++ /tmp/lttng_event_reserve-with-barrier-4.8.2.dump 2013-11-21 > > > > 14:12:52.997355907 -0600 > > > > @@ -7,11 +7,11 @@ > > > > f10: ebfffffe bl 0 <__gnu_mcount_nc> > > > > f14: e5903000 ldr r3, [r0] > > > > f18: e1a04000 mov r4, r0 > > > > - f1c: e50b1048 str r1, [fp, #-72] ; 0xffffffb8 > > > > + f1c: e1a0000d mov r0, sp > > > > f20: e5936030 ldr r6, [r3, #48] ; 0x30 > > > > - f24: e1a0000d mov r0, sp > > > > - f28: e3c03d7f bic r3, r0, #8128 ; 0x1fc0 > > > > - f2c: e3c3303f bic r3, r3, #63 ; 0x3f > > > > + f24: e3c03d7f bic r3, r0, #8128 ; 0x1fc0 > > > > + f28: e3c3303f bic r3, r3, #63 ; 0x3f > > > > + f2c: e50b1048 str r1, [fp, #-72] ; 0xffffffb8 > > > > f30: e5932004 ldr r2, [r3, #4] > > > > f34: e2822001 add r2, r2, #1 > > > > f38: e5832004 str r2, [r3, #4] > > > > @@ -53,7 +53,7 @@ > > > > fc8: e3c3303f bic r3, r3, #63 ; 0x3f > > > > fcc: e5933000 ldr r3, [r3] > > > > fd0: e3130002 tst r3, #2 > > > > - fd4: 0a000000 beq fdc <lttng_event_reserve+0xe0> > > > > + fd4: 0a0002be beq 1ad4 > > > > <lttng_event_reserve+0xbd8> > > > > fd8: ebfffffe bl 0 <preempt_schedule> > > > > fdc: ea0002bc b 1ad4 > > > > <lttng_event_reserve+0xbd8> > > > > fe0: e3500000 cmp r0, #0 > > > > > > > > We tried disabling the ftrace function tracing to get mcount out of the > > > > way, > > > > and the problem still reproduces. > > > > > > > > I'm stopping here in terms of details about the disassembly, because I > > > > need to double check with Nathan that I get the right disassembly for > > > > the > > > > right > > > > cases. I also terribly need to setup a 4.8.2 ARM cross-compiler on my > > > > machine. > > > > > > > > I'm attaching Nathan's ARM configuration. > > > > > > > > It does look behave a bit like this bug pointed out by Luis: > > > > > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854 > > > > > > > > AFAIU (please correct me if I am wrong), ARM's interrupt handler run > > > > on top of the thread stack (?). If it's the case, then anything stored > > > > on the stack below "sp" could be overwritten by an interrupt handler. > > > > This would fit well the reproduction scenario for this bug: Nathan runs > > > > LTTng tracing of kmem_cache_free tracepoint with hackbench running. > > > > A race between a short window of stack use below sp and interrupt > > > > handlers > > > > would trigger with this kind of stress-test. > > > > > > > > Thoughts ? > > > > > > > > Thanks, > > > > > > > > Mathieu > > > > > > > > > > > > -- > > > > Mathieu Desnoyers > > > > EfficiOS Inc. > > > > http://www.efficios.com > > > > > > > > > > > > -- > > > > > > Luis A. Lozano | Software Engineer | lloz...@google.com | +1 > > > (408)431-5164 > > > > > > > -- > > Mathieu Desnoyers > > EfficiOS Inc. > > http://www.efficios.com > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/