Re: Dummy query on processor failover

Tom Marchant Tue, 18 Dec 2018 07:45:28 -0800

On Tue, 18 Dec 2018 16:43:50 +0800, Timothy Sipples wrote:

>Radoslaw Skorupka:
>>Let's say a CPU returns false results like 2x2=5. How to recognize
>>the result is false?
>
>The IBM Z (and LinuxONE) system handles all that for you, and without
>operating system involvement. Nowadays, thanks to the wonders of
>microelectronic miniaturization, that's through intensive, thorough
>integrity checking at all critical instruction execution steps baked deep
>into every processor, and with tons of "transistor budget" spent on
>integrity checking and other RAS characteristics. The design philosophy is
>to push error handling as far down in the "stack" as possible, and that's
>what actually happens.
>
>Yes, z/OS has an amazing amount of wonderful error handling and recovery
>logic, but the design philosophy (and reality) is "never" to invoke it....
>
>Moreover, the system doesn't even necessarily bother notifying you that
>something happened that was automatically handled with aplomb....


Nearly 40 years ago, as an Amdahl SE and FE, I had numerous conversations
with customers about memory errors. These were single-bit errors that were
corrected by the Error Checking and Correction (ECC) circuitry, which could
correct any single-bit error and detect any double-bit error occurring
within the same doubleword.
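
For anyone who wants to see how "correct one bit, detect two" can work, here
is a minimal sketch in Python. It is not the Amdahl (or IBM) circuitry: it
assumes a textbook Hamming code plus one overall parity bit, and it protects
only 4 data bits instead of a full doubleword to keep it short. The function
names encode and decode are made up for the illustration.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SEC-DED codeword (a list of bits).

    Index 0 holds the overall parity bit; indexes 1..7 are the usual
    Hamming(7,4) positions, with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2             # makes the whole word even parity
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits; it is zero
    # for a valid codeword and equals the bad position after a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = 'ok'
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped. A zero syndrome here
        # means the overall parity bit itself (index 0) was the one hit.
        code[syndrome] ^= 1
        status = 'corrected'
    else:
        # Even overall parity with a non-zero syndrome: two bits flipped.
        return 'uncorrectable', None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

Flipping any one bit of encode(n) and passing the result to decode gives
back ('corrected', n); flipping any two bits gives ('uncorrectable', None).
Whether the hardware then tells anyone about the corrected case is a
separate reporting decision.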

The Amdahl design presented a machine check interruption to report
every one of these, and customers would notice that their IBM processors
didn't seem to have memory errors. My understanding was that the IBM
hardware at the time didn't report every single-bit error, but would only
report them after many errors had occurred.

-- 
Tom Marchant

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
