On 2014/1/14 23:05, Robert Richter wrote:
> On 14.01.14 09:52:11, Weng Meiling wrote:
>> On 2014/1/13 16:45, Robert Richter wrote:
>>> On 20.12.13 15:49:01, Weng Meiling wrote:
>
>>>> The problem was once triggered on kernel 2.6.34, the main information:
>>>> <3>BUG: soft lockup - CPU#0 stuck for 60005ms! [opcontrol:8673]
>>>>
>>>> Pid: 8673, comm: opcontrol
>>>> =====================SOFTLOCKUP INFO BEGIN=======================
>>>> [CPU#0] the task [opcontrol] is not waiting for a lock,maybe a delay or deadcricle!
>>>> <6>opcontrol R<c> running <c> 0 8673 7603 0x00000002
>>>> locked:
>>>> bf0e1928 mutex 0 [<bf0de0d8>] oprofile_start+0x10/0x68 [oprofile]
>>>> bf0e1a24 mutex 0 [<bf0e07f0>] op_arm_start+0x10/0x48 [oprofile]
>>>> c0628020 &ctx->mutex 0 [<c00af85c>] perf_event_create_kernel_counter+0xa4/0x14c
>>>
>>> I rather suspect the code of perf_install_in_context() of 2.6.34 to
>>> cause the locking issue. There was a lot of rework in between there.
>>> Can you further explain the locking and why your fix should solve it?
>>>
>> Thanks for your answer!
>> The locking happens when the event's sample_period is small which leads
>> to cpu keeping printing the warning for the triggered unregistered event.
>> So the thread context can't be executed and trigger softlockup.
>> As you said below, the patch is not appropriate, and the patch just
>> prevents printing the warning and thus stays shorter in the interrupt
>> handler, it can't solve the problem. The problem was once triggered on
>> kernel 2.6.34, I'll try to trigger it in current kernel and resend a
>> correct patch.
>
> Weng,
>
> so an interrupt storm due to warning messages causes the lock.
>
> I was looking further at it and wrote a patch that enables the event
> after it was added to the perf_events list. This should fix spurious
> overflows and its warning messages. Could you reproduce the issue with
> a mainline kernel and then test with the patch below applied?
>
> Thanks,
>
> -Robert
>

It's my pleasure. But one more question, please see below.

> From: Robert Richter <r...@kernel.org>
> Date: Tue, 14 Jan 2014 15:19:54 +0100
> Subject: [PATCH] oprofile_perf
>
> Signed-off-by: Robert Richter <r...@kernel.org>
> ---
>  drivers/oprofile/oprofile_perf.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/oprofile/oprofile_perf.c b/drivers/oprofile/oprofile_perf.c
> index d5b2732..2b07c95 100644
> --- a/drivers/oprofile/oprofile_perf.c
> +++ b/drivers/oprofile/oprofile_perf.c
> @@ -38,6 +38,9 @@ static void op_overflow_handler(struct perf_event *event,
>  	int id;
>  	u32 cpu = smp_processor_id();
>
> +	/* sync perf_events with op_create_counter(): */
> +	smp_rmb();
> +
>  	for (id = 0; id < num_counters; ++id)
>  		if (per_cpu(perf_events, cpu)[id] == event)
>  			break;
> @@ -68,6 +71,7 @@ static void op_perf_setup(void)
>  		attr->config = counter_config[i].event;
>  		attr->sample_period = counter_config[i].count;
>  		attr->pinned = 1;
> +		attr->disabled = 1;
>  	}
>  }
>
> @@ -94,6 +98,11 @@ static int op_create_counter(int cpu, int event)
>
>  	per_cpu(perf_events, cpu)[event] = pevent;
>
> +	/* sync perf_events with overflow handler: */
> +	smp_wmb();
> +
> +	perf_event_enable(pevent);
> +

Should this step go before the check
"if (pevent->state != PERF_EVENT_STATE_ACTIVE)"? Because attr->disabled is
now set, pevent->state is not PERF_EVENT_STATE_ACTIVE after
perf_event_create_kernel_counter() returns.

>  	return 0;
>  }
>
> --
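
To make the ordering in the patch easier to reason about, here is a small userspace sketch of the "publish first, enable second" idea. It is only an analogy, not kernel code: struct event, slots[], fire() and publish_then_enable() are hypothetical names, and C11 release/acquire atomics stand in for the smp_wmb()/smp_rmb() pair used in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_COUNTERS 4          /* hypothetical table size */
#define EVENT_DISABLED     (-2) /* no overflow is delivered */
#define EVENT_UNREGISTERED (-1) /* overflow fired, but slot lookup failed */

/* Hypothetical stand-in for struct perf_event. */
struct event {
	atomic_bool enabled;
};

/* Stand-in for the per-CPU perf_events[] array. */
static _Atomic(struct event *) slots[NUM_COUNTERS];

static void event_init(struct event *ev)
{
	atomic_init(&ev->enabled, false);
}

/* Handler side: find the slot that owns this event. */
static int lookup_slot(struct event *ev)
{
	for (int id = 0; id < NUM_COUNTERS; ++id)
		if (atomic_load_explicit(&slots[id], memory_order_acquire) == ev)
			return id;
	return EVENT_UNREGISTERED;
}

/* Simulated overflow: a disabled event never reaches the handler. */
static int fire(struct event *ev)
{
	if (!atomic_load_explicit(&ev->enabled, memory_order_acquire))
		return EVENT_DISABLED;
	return lookup_slot(ev);
}

/* Setup side: publish the pointer first, enable the event second. */
static void publish_then_enable(struct event *ev, int slot)
{
	assert(slot >= 0 && slot < NUM_COUNTERS);
	atomic_store_explicit(&slots[slot], ev, memory_order_release);
	atomic_store_explicit(&ev->enabled, true, memory_order_release);
}
```

In this model an event that is enabled before it is stored in slots[] makes fire() return EVENT_UNREGISTERED, which corresponds to the "unregistered event" warning discussed above. With publish_then_enable() the handler can only run once the slot is already visible.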