Hi,

We are seeing a regression with our uncore perf driver (Marvell ThunderX2, an ARM64 server platform) on 5.3-rc1. After bisecting, this patch turned out to be the cause.
Test case: load the module and run perf with more than 4 events (we have 4 counters, so event multiplexing takes place beyond 4 events), then unload the module. With this test sequence, the system hangs (soft lockup) after 2 or 3 iterations. The same test runs for hours on 5.2.

while [ 1 ]
do
        rmmod thunderx2_pmu
        modprobe thunderx2_pmu
        perf stat -a -e \
                uncore_dmc_0/cnt_cycles/,\
                uncore_dmc_0/data_transfers/,\
                uncore_dmc_0/read_txns/,\
                uncore_dmc_0/config=0xE/,\
                uncore_dmc_0/write_txns/ sleep 1
        sleep 2
done

Was this patch tested adequately?

On Fri, Jun 28, 2019 at 3:18 AM Ian Rogers <irog...@google.com> wrote:
>
> group_index
>
> On Mon, Jun 24, 2019 at 12:55 AM Peter Zijlstra
> <pet...@infradead.org> wrote:
> >
> > On Fri, Jun 21, 2019 at 11:01:29AM -0700, Ian Rogers wrote:
> > > On Fri, Jun 21, 2019 at 1:24 AM Peter Zijlstra <pet...@infradead.org>
> > > wrote:
> > > >
> > > > On Sat, Jun 01, 2019 at 01:27:22AM -0700, Ian Rogers wrote:
> > > > > @@ -3325,6 +3331,15 @@ static int flexible_sched_in(struct perf_event *event, void *data)
> > > > >                 sid->can_add_hw = 0;
> > > > >         }
> > > > >
> > > > > +       /*
> > > > > +        * If the group wasn't scheduled then set that multiplexing is necessary
> > > > > +        * for the context. Note, this won't be set if the event wasn't
> > > > > +        * scheduled due to event_filter_match failing due to the earlier
> > > > > +        * return.
> > > > > +        */
> > > > > +       if (event->state == PERF_EVENT_STATE_INACTIVE)
> > > > > +               sid->ctx->rotate_necessary = 1;
> > > > > +
> > > > >         return 0;
> > > > > }
> > > >
> > > > That looked odd, which had me look harder at this function and
> > > > resulted in the below. Should we not terminate the context iteration
> > > > the moment one flexible thingy fails to schedule?
> > >
> > > If we knew all the events were hardware events then this would be
> > > true, but as there may be software events that always schedule, the
> > > continued iteration is necessary.
> > But this is the 'old' code, where this is guaranteed by the context.
> > That is, if this is a hardware context, there will only be software
> > events due to them being in a group with hardware events.
> >
> > If this is a software group, then we'll never fail to schedule and we'll
> > not get in this branch to begin with.
> >
> > Or am I now confused for having been staring at two different code-bases
> > at the same time?
>
> I believe you're right and I'd overlooked this. I think a more
> efficient version of this code is possible, which I can follow up
> with. There are 3 perf_event_contexts: potentially a sw and hw context
> within the task_struct, and one per-CPU in perf_cpu_context. With this
> change I'm focused on improving rotation of cgroup events that appear
> system wide within the per-CPU context. Without cgroup events, the
> system-wide events don't need to be iterated over during scheduling
> in. The branch that can set rotate_necessary will only be necessary
> for the task hardware events. For system-wide events, considered with
> cgroup mode scheduling in, the branch is needed as rotation may be
> necessary. It'd be possible to have variants of flexible_sched_in
> that behave differently in the task software event context versus the
> system-wide and task hardware contexts.
>
> I have a series of patches related to Kan Liang's cgroup
> perf_event_groups improvements. I'll mail these out and see if I can
> avoid the branch in the task software event context case.
>
> Thanks,
> Ian

Thanks,
Ganapat