On Mon, Oct 30, 2017 at 11:49:54PM +0100, Fengguang Wu wrote:
> On Mon, Oct 30, 2017 at 11:02:58AM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 30, 2017 at 07:27:36AM +0100, Fengguang Wu wrote:
> >
> > > [ 189.480568] perf: interrupt took too long (5132 > 4982), lowering
> > > kernel.perf_event_max_sample_rate to 38000
> > > [ 189.690660] perf: interrupt took too long (6582 > 6415), lowering
> > > kernel.perf_event_max_sample_rate to 30000
> > > [ 189.901706] perf: interrupt took too long (8268 > 8227), lowering
> > > kernel.perf_event_max_sample_rate to 24000
> > > [ 272.841032] perfevents: irq loop stuck!
> > > [ 272.841038] ------------[ cut here ]------------
> > > [ 272.841046] WARNING: CPU: 9 PID: 5377 at
> > > arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0
> >
> > So I've not seen this in a fair while; is this new in 4.14?
>
> It looks a pretty old error. Here is the dmesg for 4.12:
>
> [ 229.514000] Test Case
> count_global_group_cpu/mem-loads/_cpu/cache-references/_cpu/stalled-cycles-backend/_u
> PASS!
> [ 229.514002]
> [ 229.519591] Test Case
> count_global_group_cpu/mem-loads/_cpu/cache-references/_cpu/stalled-cycles-backend/_k
> PASS!
> [ 229.519594]
> [ 229.521742] ROUND : perf hardware event sample group test
> [ 229.521744]
> [ 229.689807] perfevents: irq loop stuck!
> [ 229.689807] ------------[ cut here ]------------
> [ 229.689809] WARNING: CPU: 4 PID: 23149 at
> arch/x86/events/intel/core.c:2114 intel_pmu_handle_irq+0x4a8/0x4c0
> [ 229.689828] CPU: 4 PID: 23149 Comm: perf Not tainted 4.12.0 #1
> [ 229.689829] Hardware name: Dell Inc. Studio XPS 8000/0X231R, BIOS A01
> 08/11/2009

Ok, that's a NHM client if my google skillz are any good.

Is there a specific workload that makes this happen more than any other?
That is, what should I attempt to reproduce?