Hi,

On 1/25/2024 1:19 PM, Luck, Tony wrote:
>>> The first patch adds PPIN (Protected Processor Inventory Number) field to
>>> the tracepoint.
>>>
>>> The second patch adds the microcode field (Microcode Revision) to the
>>> tracepoint.
>>
>> This is a lot of static information to add to *every* MCE.
> 
> 8 bytes for PPIN, 4 more for microcode.
> 
> Number of recoverable machine checks per system .... I hope the monthly rate
> should be countable on my fingers. If a system is getting more than that, then
> people should be looking at fixing the underlying problem.
> 
> Corrected errors are much more common. Though Linux takes action to limit the
> rate when storms occur. So maybe hundreds or small numbers of thousands of
> error trace records? Increase in trace buffer consumption still measured in
> Kbytes not Mbytes. Server systems that do machine check reporting now start at
> tens of GBytes memory.
> 
>> And where does it end? Stick full dmesg in the tracepoint too?
> 
> Seems like overkill.
> 
>> What is the real-life use case here?
> 
> Systems using rasdaemon to track errors will be able to track both of these
> (I assume that Naik has plans to update rasdaemon to capture and save these
> new fields).
> 
Yes, I do intend to submit a pull request to rasdaemon to parse and log these
new fields.

> PPIN is useful when talking to the CPU vendor about patterns of similar errors
> seen across a cluster.
> 
> MICROCODE - gives a fast path to root cause problems that have already
> been fixed in a microcode update.
> 
> -Tony
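
As a rough check on the overhead estimate above, here is a back-of-the-envelope
sketch. The field sizes (8 bytes for PPIN, 4 for microcode) are the ones quoted
in this thread; the record counts are hypothetical values picked only to span
"hundreds" to "small numbers of thousands", and the helper name is made up for
illustration.

```python
# Back-of-the-envelope estimate of the extra trace buffer space used by the
# two new tracepoint fields. Not a measurement, just arithmetic.

PPIN_BYTES = 8        # size quoted in the thread
MICROCODE_BYTES = 4   # size quoted in the thread


def extra_trace_bytes(records: int) -> int:
    """Extra bytes consumed by the two new fields across `records` trace records."""
    return records * (PPIN_BYTES + MICROCODE_BYTES)


if __name__ == "__main__":
    # Hypothetical record counts (assumption, not data from any real system).
    for records in (100, 1000, 5000):
        extra = extra_trace_bytes(records)
        print(f"{records:>5} records -> {extra} bytes ({extra / 1024:.1f} KiB)")
```

Even at the upper end of those assumed counts, the added consumption stays in
the Kbyte range, which matches the estimate quoted above.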

-- 
Thanks,
Avadhut Naik
