With Hyper-Threading turned on, num_online_cpus() reports the number of all
logical cores. What I found in testing is that only half of the cores receive
the MCE broadcast, so I assume only the physical cores get it. I have two
E5645 sockets onboard: num_online_cpus() returns 24, but only 12 cores enter
do_machine_check(). In my testing I used both EDAC error injection and a
hardware EDAC error injector.

cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores gives the ratio of
logical cores to physical cores, i.e. threads per core. In my case it is two.

Here is the Intel spec:
Processor Number   E5645
# of Cores         6
# of Threads       12

Ming

-----Original Message-----
From: Luck, Tony [mailto:tony.l...@intel.com] 
Sent: Friday, May 10, 2013 11:14 AM
To: Ming Lei; linux-kernel@vger.kernel.org
Cc: mche...@redhat.com; b...@alien8.de
Subject: RE: x86_mce: mce_start uses number of physical cores instead of logical cores

> +#if NR_CPUS > 1
> +     cpus /= cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores; 
> +#endif

Not entirely sure what you are trying to do here (apart from making "cpus"
be a smaller number).  What is the reasoning behind the right-hand side of this
expression?

Is this problem more related to how EDAC is injecting an error?  When I've used 
other methods (e.g. ACPI/EINJ) I end up with a machine check that is broadcast 
to all processors ... so "cpus = num_online_cpus()" is the correct[1] number of 
processors to wait for.

-Tony

[1] Andi may point me (again) to a fix to help deal with the case where Linux
has taken some CPUs offline. In that case this code is wrong, as the "offline"
CPUs will still show up for machine checks.  But there are troubling corner
cases with the fix.