Hi Thomas,

Do you have any idea how to fix this issue?
If you do, please send me a patch.

Thanks,
Yasuaki Ishimatsu

On 01/24/2017 02:54 PM, Thomas Gleixner wrote:
On Tue, 24 Jan 2017, Yasuaki Ishimatsu wrote:
rapl_cpu_prepare() must be called after the logical package id of the CPU
is set by topology_update_package_map().

But when a hot-added CPU is onlined, rapl_cpu_prepare() is called before
the logical package id of the hot-added CPU is set. So cpu_to_rapl_pmu()
in rapl_cpu_prepare() finds the rapl_pmu of the wrong logical package id,
and rapl_cpu_prepare() initializes the wrong rapl_pmu.

After that, the logical package id of the hot-added CPU is set by
topology_update_package_map(). But rapl_cpu_prepare() never
initializes the pmu for that logical package id. So when
rapl_cpu_online() is called, cpu_to_rapl_pmu() returns NULL and
the following NULL pointer dereference occurs.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: rapl_cpu_online+0x8d/0xb0
  <snip>
  Call Trace:
   ? rapl_cpu_offline+0xc0/0xc0
   cpuhp_invoke_callback+0x8d/0x3f0
   cpuhp_up_callbacks+0x37/0xb0
   cpuhp_thread_fun+0xc9/0xe0
   smpboot_thread_fn+0x110/0x160
   kthread+0x101/0x140
   ? sort_range+0x30/0x30
   ? kthread_park+0x90/0x90
   ret_from_fork+0x25/0x30

This patch renames rapl_cpu_prepare() to rapl_cpu_starting() and moves
its hotplug state (cpuhp_state) so that rapl_cpu_starting() is called
after topology_update_package_map().
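To make the ordering problem concrete, here is a minimal userspace sketch in C. It is a hypothetical model, not kernel code. The names pkg_of[], pmus[], model_prepare(), model_update_package_map(), model_online() and simulate_hotadd() are all invented. They stand in for the logical package map, the per-package rapl_pmu array, rapl_cpu_prepare(), topology_update_package_map() and rapl_cpu_online(). The sketch assumes that the hot-added CPU reports package id 0 until the map is updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of the ordering problem; none of these
 * names are kernel APIs. pkg_of[] stands in for the logical package map,
 * pmus[] for the per-package rapl_pmu array.
 */
#define NR_PKGS 2
#define NR_CPUS 4

struct pmu_model {
	bool initialized;
};

static int pkg_of[NR_CPUS];             /* assumed 0 until the map is updated */
static struct pmu_model *pmus[NR_PKGS]; /* one PMU per logical package */

static struct pmu_model *cpu_to_pmu(int cpu)
{
	return pmus[pkg_of[cpu]];
}

/* Stand-in for the prepare step: set up the PMU of the CPU's current package id. */
static void model_prepare(int cpu)
{
	int pkg = pkg_of[cpu];

	assert(pkg >= 0 && pkg < NR_PKGS);
	if (!pmus[pkg]) {
		pmus[pkg] = calloc(1, sizeof(*pmus[pkg]));
		assert(pmus[pkg] != NULL);
		pmus[pkg]->initialized = true;
	}
}

/* Stand-in for topology_update_package_map(): record the real package id. */
static void model_update_package_map(int cpu, int pkg)
{
	pkg_of[cpu] = pkg;
}

/* Stand-in for the online step: false models the NULL pmu that crashes. */
static bool model_online(int cpu)
{
	return cpu_to_pmu(cpu) != NULL;
}

static void model_reset(void)
{
	for (int i = 0; i < NR_PKGS; i++) {
		free(pmus[i]);
		pmus[i] = NULL;
	}
	for (int i = 0; i < NR_CPUS; i++)
		pkg_of[i] = 0;
}

/*
 * Boot CPU 0 in package 0, then hot-add CPU 2 into package 1.
 * prepare_first == true reproduces the order described in the report.
 * Returns whether the hot-added CPU finds its PMU when coming online.
 */
static bool simulate_hotadd(bool prepare_first)
{
	model_reset();
	model_update_package_map(0, 0);
	model_prepare(0);

	if (prepare_first) {
		model_prepare(2);               /* sees the stale package id 0 */
		model_update_package_map(2, 1);
	} else {
		model_update_package_map(2, 1);
		model_prepare(2);               /* sees the real package id 1 */
	}
	return model_online(2);
}
```

simulate_hotadd(true) follows the order described in the report and returns false, which corresponds to the NULL pmu case. simulate_hotadd(false) updates the package map first and returns true. The model only captures ordering. It ignores the constraint raised in the reply: CPU hotplug STARTING callbacks run on the incoming CPU with interrupts disabled, so a callback that allocates memory cannot simply be moved into that section.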

Does not work. You cannot call that callback in the starting context. It
does allocations. This needs to be fixed in a different way. I'll have a look
tomorrow.

Thanks,

        tglx
