Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-16 Thread Parag Warudkar
On Sat, Mar 16, 2013 at 1:56 PM, Linus Torvalds wrote:
> On Sat, Mar 16, 2013 at 9:11 AM, Parag Warudkar  wrote:
>>
>> This seems to trigger a WARN_ON during suspend/resume.
>
> Ugh, yes. It's practically harmless, but it's ugly and technically
> wrong (we're using wrmsr_on_cpu() on our current cpu, but in a context
> where using it on anything else would be horribly broken).
>
> I think the attached patch should fix it. UNTESTED!

Applied and that seems to have suppressed the warning.

Unrelated to this, it now dies in intel_pstate_timer_func doing what
seems to be a divide by zero.

Hopefully I will be able to capture the oops on camera next time.

Thanks,
Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-16 Thread Linus Torvalds
On Sat, Mar 16, 2013 at 9:11 AM, Parag Warudkar  wrote:
>
> This seems to trigger a WARN_ON during suspend/resume.

Ugh, yes. It's practically harmless, but it's ugly and technically
wrong (we're using wrmsr_on_cpu() on our current cpu, but in a context
where using it on anything else would be horribly broken).

I think the attached patch should fix it. UNTESTED!

  Linus


patch.diff
Description: Binary data
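
For readers following along, here is a small user-space model of the
distinction described above. It is only a sketch: the attached patch is
not reproduced in this archive, so nothing here should be read as its
content. Every name (wm_state, wm_wrmsr_on_cpu, wm_wrmsr_local) is
hypothetical, and the rule that the sanity check fires whenever
interrupts are disabled is an assumption made for illustration, not
something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WM_NR_CPUS 2            /* hypothetical model size */

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct wm_state {
    int      this_cpu;              /* CPU the caller is running on */
    bool     irqs_disabled;         /* true in the low-level resume path */
    int      warnings;              /* how many times the sanity check fired */
    uint64_t ds_area[WM_NR_CPUS];   /* stand-in for each CPU's DS_AREA MSR */
};

/*
 * Models a cross-CPU MSR write. ASSUMPTION: the sanity check fires whenever
 * interrupts are disabled, including when the target is the local CPU.
 */
static void wm_wrmsr_on_cpu(struct wm_state *s, int cpu, uint64_t val)
{
    assert(cpu >= 0 && cpu < WM_NR_CPUS);
    if (s->irqs_disabled)
        s->warnings++;
    s->ds_area[cpu] = val;
}

/* Models a purely local MSR write: no cross-call, so no check to trip. */
static void wm_wrmsr_local(struct wm_state *s, uint64_t val)
{
    assert(s->this_cpu >= 0 && s->this_cpu < WM_NR_CPUS);
    s->ds_area[s->this_cpu] = val;
}
```

In this model both paths leave the same value in the local slot; only
the cross-call path bumps the warning counter when interrupts are off,
which is the "practically harmless, but technically wrong" situation.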


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-16 Thread Parag Warudkar
On Fri, Mar 15, 2013 at 9:26 AM, Stephane Eranian  wrote:
>
> This patch fixes a kernel crash when using precise sampling (PEBS)
> after a suspend/resume. Turns out the CPU notifier code is not invoked
> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> by the kernel and keeps its power-on/resume value of 0, causing any PEBS
> measurement to crash when running on CPU0.
>
> The workaround is to add a hook in the actual resume code to restore
> the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
> the DS_AREA will be restored twice, but this is harmless.
>
> Reported-by: Linus Torvalds 
> Signed-off-by: Stephane Eranian 
> ---
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index 826054a..0e9bdd3 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -729,3 +729,11 @@ void intel_ds_init(void)
>  		}
>  	}
>  }
> +
> +void perf_restore_debug_store(void)
> +{
> +	if (!x86_pmu.bts && !x86_pmu.pebs)
> +		return;
> +
> +	init_debug_store_on_cpu(smp_processor_id());
> +}

This seems to trigger a WARN_ON during suspend/resume.

[ 1479.919313] WARNING: at kernel/smp.c:244 smp_call_function_single+0x11b/0x170()
[ 1479.919314] Hardware name: iMac12,1
[ 1479.919347] Modules linked in: nfsd auth_rpcgss nfs_acl lockd
sunrpc ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc
be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3
mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core
iscsi_tcp libiscsi_tcp libiscsi rfcomm bnep scsi_transport_iscsi
nls_utf8 hfsplus arc4 ath9k ath9k_common ath9k_hw ath mac80211
snd_hda_codec_hdmi snd_hda_codec_cirrus snd_hda_intel snd_hda_codec
snd_hwdep vhost_net uvcvideo cfg80211 btusb tun macvtap macvlan
snd_seq snd_seq_device snd_pcm coretemp bluetooth tg3 snd_page_alloc
kvm_intel videobuf2_vmalloc videobuf2_memops snd_timer videobuf2_core
snd crc32c_intel
[ 1479.919361]  kvm iTCO_wdt iTCO_vendor_support videodev rfkill ptp
media ghash_clmulni_intel joydev soundcore lpc_ich pcspkr applesmc
input_polldev mfd_core i2c_i801 microcode pps_core apple_bl
binfmt_misc uinput hid_logitech_dj usb_storage radeon i915 ttm
i2c_algo_bit drm_kms_helper drm firewire_ohci firewire_core crc_itu_t
i2c_core video
[ 1479.919364] Pid: 3246, comm: pm-suspend Not tainted 3.9.0-rc2+ #2
[ 1479.919364] Call Trace:
[ 1479.919370]  [8105e9df] warn_slowpath_common+0x7f/0xc0
[ 1479.919374]  [8131ed40] ? __rdmsr_on_cpu+0x50/0x50
[ 1479.919376]  [8105ea3a] warn_slowpath_null+0x1a/0x20
[ 1479.919377]  [810be0eb] smp_call_function_single+0x11b/0x170
[ 1479.919379]  [8131efd3] wrmsr_on_cpu+0x43/0x50
[ 1479.919382]  [81028d59] init_debug_store_on_cpu+0x39/0x40
[ 1479.919384]  [81029731] perf_restore_debug_store+0x21/0x30
[ 1479.919387]  [815363a5] restore_processor_state+0x225/0x230
[ 1479.919390]  [81036da7] acpi_suspend_lowlevel+0xf7/0x120
[ 1479.919393]  [8136075b] acpi_suspend_enter+0x3b/0xb7
[ 1479.919395]  [810a84ef] suspend_devices_and_enter+0x37f/0x430
[ 1479.919397]  [810a873a] pm_suspend+0x19a/0x230
[ 1479.919399]  [810a7577] state_store+0x87/0xf0
[ 1479.919402]  [812fba7f] kobj_attr_store+0xf/0x20
[ 1479.919405]  [81210078] sysfs_write_file+0xd8/0x150
[ 1479.919408]  [8119bccc] vfs_write+0xac/0x180
[ 1479.919410]  [8119c012] sys_write+0x52/0xa0
[ 1479.919412]  [8166369e] ? do_page_fault+0xe/0x10
[ 1479.919414]  [81667cd9] system_call_fastpath+0x16/0x1b
[ 1479.919415] ---[ end trace 2af7ebe5ffee87a9 ]---
[ 1479.919416] ACPI: Low-level resume complete

--Parag
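
As an aside, the failure mode in the patch description quoted at the
top of this message can be modelled in user space. The sketch below is
not kernel code: all names (ds_model, ds_model_resume, and so on) and
the buffer values are hypothetical. It assumes only what the
description says: every CPU's DS_AREA comes back as 0 after resume, the
CPU notifier path restores it on all CPUs except CPU0, and the new
resume hook runs on every CPU.

```c
#include <assert.h>
#include <stdint.h>

#define DS_NR_CPUS 4            /* hypothetical model size */

/* Hypothetical model of per-CPU DS_AREA state; not a kernel structure. */
struct ds_model {
    uint64_t msr[DS_NR_CPUS];     /* current DS_AREA value per CPU */
    uint64_t buffer[DS_NR_CPUS];  /* value the kernel wants there */
};

static void ds_model_init(struct ds_model *m)
{
    for (int cpu = 0; cpu < DS_NR_CPUS; cpu++) {
        m->buffer[cpu] = 0x1000u * (uint64_t)(cpu + 1);  /* arbitrary non-zero */
        m->msr[cpu] = m->buffer[cpu];
    }
}

/* Suspend/resume leaves every CPU's MSR at its power-on value of 0. */
static void ds_model_suspend(struct ds_model *m)
{
    for (int cpu = 0; cpu < DS_NR_CPUS; cpu++)
        m->msr[cpu] = 0;
}

/* Writes the wanted buffer address back into one CPU's MSR slot. */
static void ds_model_restore(struct ds_model *m, int cpu)
{
    assert(cpu >= 0 && cpu < DS_NR_CPUS);
    m->msr[cpu] = m->buffer[cpu];
}

/*
 * Resume: the notifier path covers CPUs 1..N-1 only. With the hook
 * enabled, every CPU is also restored from the resume path, so all but
 * CPU0 are restored twice.
 */
static void ds_model_resume(struct ds_model *m, int with_hook)
{
    for (int cpu = 0; cpu < DS_NR_CPUS; cpu++) {
        if (with_hook)
            ds_model_restore(m, cpu);   /* resume-path hook */
        if (cpu != 0)
            ds_model_restore(m, cpu);   /* CPU notifier path */
    }
}

/* Counts CPUs whose MSR is still 0, i.e. where PEBS would misbehave. */
static int ds_model_count_zero(const struct ds_model *m)
{
    int n = 0;

    for (int cpu = 0; cpu < DS_NR_CPUS; cpu++)
        if (m->msr[cpu] == 0)
            n++;
    return n;
}
```

Running a suspend followed by ds_model_resume(m, 0) leaves exactly one
stale slot (CPU0); with ds_model_resume(m, 1) none remain, and the
double restore on the other CPUs changes nothing.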


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Shuah Khan
On Fri, Mar 15, 2013 at 2:56 PM, Stephane Eranian  wrote:
> On Fri, Mar 15, 2013 at 9:53 PM, Shuah Khan  wrote:
>> On Fri, Mar 15, 2013 at 2:31 PM, Greg KH  wrote:
>>> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:

>>>> This patch fixes a kernel crash when using precise sampling (PEBS)
>>>> after a suspend/resume. Turns out the CPU notifier code is not invoked
>>>> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
>>>> by the kernel and keeps its power-on/resume value of 0, causing any PEBS
>>>> measurement to crash when running on CPU0.
>>>>
>>>> The workaround is to add a hook in the actual resume code to restore
>>>> the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
>>>> the DS_AREA will be restored twice, but this is harmless.
>>>>
>>>> Reported-by: Linus Torvalds 
>>>> Signed-off-by: Stephane Eranian 
>>>> ---
>>>
>>> Is this needed for the 3.8 or older kernels as well?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Just about to ask the same question. Patch applies to 3.8, 3.4, 3.2
>> and 3.5. But needs some massaging for 3.0. I have the kernels built,
>> haven't started testing yet.
>>
> Testing the patch is easy:
>
> # echo mem >/sys/power/state
> Then press the power button again; when you get control back, type:
>
> $ taskset -c 0 perf record -e cycles:pp my_test_program
>
> Note that this problem impacts only Intel processors after Core 2
> (PEBS enabled).

Thanks. Reproduced the problem on 3.8.3, 3.4.36, and 3.0.69. Tested
the patch on 3.4 and 3.8 and the problem is fixed. I had to re-cut the
patch for 3.0. Sending it to stable, tagged for 3.0.

Thanks,
-- Shuah


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Greg KH
On Fri, Mar 15, 2013 at 09:49:00PM +0100, Stephane Eranian wrote:
> On Fri, Mar 15, 2013 at 9:31 PM, Greg KH  wrote:
> >
> > On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
> > >
> > > This patch fixes a kernel crash when using precise sampling (PEBS)
> > > after a suspend/resume. Turns out the CPU notifier code is not invoked
> > > > on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> > > > by the kernel and keeps its power-on/resume value of 0, causing any PEBS
> > > > measurement to crash when running on CPU0.
> > > >
> > > > The workaround is to add a hook in the actual resume code to restore
> > > > the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
> > > > the DS_AREA will be restored twice, but this is harmless.
> > >
> > > Reported-by: Linus Torvalds 
> > > Signed-off-by: Stephane Eranian 
> > > ---
> >
> > Is this needed for the 3.8 or older kernels as well?
> >
> I suspect so. If CPU0 is not covered by the cpu notifiers
> then yes, the patch is needed.

Ok, thanks, I've queued it up now there.

greg k-h


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Stephane Eranian
On Fri, Mar 15, 2013 at 9:53 PM, Shuah Khan  wrote:
> On Fri, Mar 15, 2013 at 2:31 PM, Greg KH  wrote:
>> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
>>>
>>> This patch fixes a kernel crash when using precise sampling (PEBS)
>>> after a suspend/resume. Turns out the CPU notifier code is not invoked
>>> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
>>> by the kernel and keeps its power-on/resume value of 0, causing any PEBS
>>> measurement to crash when running on CPU0.
>>>
>>> The workaround is to add a hook in the actual resume code to restore
>>> the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
>>> the DS_AREA will be restored twice, but this is harmless.
>>>
>>> Reported-by: Linus Torvalds 
>>> Signed-off-by: Stephane Eranian 
>>> ---
>>
>> Is this needed for the 3.8 or older kernels as well?
>>
>> thanks,
>>
>> greg k-h
>
> Just about to ask the same question. Patch applies to 3.8, 3.4, 3.2
> and 3.5. But needs some massaging for 3.0. I have the kernels built,
> haven't started testing yet.
>
Testing the patch is easy:

# echo mem >/sys/power/state
Then press the power button again; when you get control back, type:

$ taskset -c 0 perf record -e cycles:pp my_test_program

Note that this problem impacts only Intel processors after Core 2
(PEBS enabled).


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Shuah Khan
On Fri, Mar 15, 2013 at 2:31 PM, Greg KH  wrote:
> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
>>
>> This patch fixes a kernel crash when using precise sampling (PEBS)
>> after a suspend/resume. Turns out the CPU notifier code is not invoked
>> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
>> by the kernel and keeps its power-on/resume value of 0, causing any PEBS
>> measurement to crash when running on CPU0.
>>
>> The workaround is to add a hook in the actual resume code to restore
>> the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
>> the DS_AREA will be restored twice, but this is harmless.
>>
>> Reported-by: Linus Torvalds 
>> Signed-off-by: Stephane Eranian 
>> ---
>
> Is this needed for the 3.8 or older kernels as well?
>
> thanks,
>
> greg k-h

Just about to ask the same question. Patch applies to 3.8, 3.4, 3.2
and 3.5. But needs some massaging for 3.0. I have the kernels built,
haven't started testing yet.

-- Shuah


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Thomas Gleixner
On Fri, 15 Mar 2013, Linus Torvalds wrote:

> On Fri, Mar 15, 2013 at 6:26 AM, Stephane Eranian  wrote:
> >
> > This patch fixes a kernel crash when using precise sampling (PEBS)
> > after a suspend/resume.
> 
> Yup, works. Applied.
> 
> Can we please get rid of the crazy CPU notifier crap from the perf
> code, and do this like we do most other wrmsr's etc? Doing
> 
> git grep "case CPU_" arch/x86/kernel/cpu
> 
> shows that the perf layer seems to be full of this kind of BS. This is
> all CPU state, it should be initialized by the regular CPU
> initialization code, not hooked up with some random callbacks.

It's on my list 


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Stephane Eranian
On Fri, Mar 15, 2013 at 9:31 PM, Greg KH  wrote:
>
> On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
> >
> > This patch fixes a kernel crash when using precise sampling (PEBS)
> > after a suspend/resume. Turns out the CPU notifier code is not invoked
> > on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> > by the kernel and keeps its power-on/resume value of 0, causing any PEBS
> > measurement to crash when running on CPU0.
> >
> > The workaround is to add a hook in the actual resume code to restore
> > the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
> > the DS_AREA will be restored twice, but this is harmless.
> >
> > Reported-by: Linus Torvalds 
> > Signed-off-by: Stephane Eranian 
> > ---
>
> Is this needed for the 3.8 or older kernels as well?
>
I suspect so. If CPU0 is not covered by the cpu notifiers
then yes, the patch is needed.

>
> thanks,
>
> greg k-h


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Greg KH
On Fri, Mar 15, 2013 at 02:26:07PM +0100, Stephane Eranian wrote:
> 
> This patch fixes a kernel crash when using precise sampling (PEBS)
> after a suspend/resume. Turns out the CPU notifier code is not invoked
> on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
> by the kernel and keeps its power-on/resume value of 0, causing any PEBS
> measurement to crash when running on CPU0.
> 
> The workaround is to add a hook in the actual resume code to restore
> the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
> the DS_AREA will be restored twice, but this is harmless.
> 
> Reported-by: Linus Torvalds 
> Signed-off-by: Stephane Eranian 
> ---

Is this needed for the 3.8 or older kernels as well?

thanks,

greg k-h


Re: [PATCH] perf,x86: fix kernel crash with PEBS/BTS after suspend/resume

2013-03-15 Thread Linus Torvalds
On Fri, Mar 15, 2013 at 6:26 AM, Stephane Eranian  wrote:
>
> This patch fixes a kernel crash when using precise sampling (PEBS)
> after a suspend/resume.

Yup, works. Applied.

Can we please get rid of the crazy CPU notifier crap from the perf
code, and do this like we do most other wrmsr's etc? Doing

git grep "case CPU_" arch/x86/kernel/cpu

shows that the perf layer seems to be full of this kind of BS. This is
all CPU state, it should be initialized by the regular CPU
initialization code, not hooked up with some random callbacks.

  Linus

