Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-16 Thread Stewart, Lee
And for anyone who has trouble IPLing CMS (or other OSs) after the trace
command, we found that doing a SYSTEM CLEAR and then the IPL cleared that issue.
Lee

Lee Stewart ● VM System Support ● Visa ● Phone:  6(750)4601 - +1-303-389-4601 ● 
lstew...@visa.com



--
For LINUX-390 subscribe / signoff / archive access instructions, send email to 
lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit http://wiki.linuxvm.org/



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-13 Thread Bill Bitner
Rick,
As we discussed on the phone, but for the benefit of others: my last append
forgot to point out that the TRACE needs to be done for all the virtual
processors in the virtual machine. Otherwise, only the base VMDBK is protected.

This can be done by using the "CPU" command:
CPU ALL CMD TRACE PROG 28 NOTERM NOPRINT RUN
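
For readers comparing the two forms of the mitigation, here is a minimal Python sketch that only assembles the command text. The CP command strings are taken from this thread; the helper names are hypothetical, and nothing here issues a command to CP.

```python
# Hypothetical helper names; only the CP command text comes from the thread.
BASE_TRACE = "TRACE PROG 28 NOTERM NOPRINT RUN"


def trace_on_base_cpu(trace: str = BASE_TRACE) -> str:
    """Form suggested first in the thread: covers only the base virtual CPU."""
    return f"CP {trace}"


def trace_on_all_cpus(trace: str = BASE_TRACE) -> str:
    """Wrap the same trace in CPU ALL CMD so every virtual CPU gets it."""
    return f"CPU ALL CMD {trace}"


if __name__ == "__main__":
    print(trace_on_base_cpu())
    print(trace_on_all_cpus())
```

The point of the second form is simply that the same TRACE text is prefixed with CPU ALL CMD, so it applies to each virtual processor rather than only the base one.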

I apologize for giving both of us a fright, and to those following along.

Regards,
Bill
___

Bill Bitner - z/VM Customer Focus and Care - 607-429-3286
bitn...@us.ibm.com
"Making systems practical and profitable for customers through
virtualization and its exploitation." - z/VM



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-13 Thread Rick Barlow
The TRACE PROG 28 circumvention did not work for us. Some servers failed
and could not reboot, most likely because the trace was active.
We are booting Linux back to the older kernel level that was not
experiencing the problem, 2.6.32-696.18.7.el6.s390x.
There seems to be something in the later kernel patches that makes
the servers susceptible to this problem.
I am also pushing to get at least one LPAR IPLed with VM65414 installed.

Rick Barlow

On Thu, Apr 12, 2018 at 10:53 AM, Bill Bitner wrote:

>
> 
>
> The real solution is to apply VM65414's PTF which is available. A
> mitigation would be to do a CP TRACE PROG 28 NOTERM NOPRINT RUN for the
> virtual machines through the console or if logged off and on, through a
> directory command statement.
>
> Thank you for understanding the communication challenges in this space. I'm
> sure you'll let us know if you need more info. :-)
>
> Regards,
> Bill
> 
> ___
>
> Bill Bitner - z/VM Customer Focus and Care - 607-429-3286
> bitn...@us.ibm.com
> "Making systems practical and profitable for customers through
> virtualization and its exploitation." - z/VM
>



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Terri C. Glowaniak
Thanks for the info Bill


Thanks,
Terri Glowaniak

Mainframe Systems Engineer
terri.glowan...@regions.com
(205) 261-6883 (W)



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Bill Bitner
Rick, Terri, Greg, and others: apologies, I should have given out the PTF
numbers and a little more background in my earlier append.

APAR VM65396 PTF UM34851 is the one that introduced the prog check problem.
We discovered the problem on February 26th and marked VM65396 in error on
March 1st, with VM65414 as the PE fix for it. The correction to that error is
included in APAR VM65414 PTF UM34853, which closed on March 23rd. For
other reasons, both of these APARs are Security/Integrity APARs, so see my
earlier append on the IBM Z Security portal.
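
As a rough illustration of how the two APARs relate, here is a minimal Python sketch. The APAR and PTF numbers come from the message above; the class and function names are hypothetical, and how the list of applied APARs is obtained on a given system is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Apar:
    number: str
    ptf: str
    note: str


# Numbers as given in the message above.
VM65396 = Apar("VM65396", "UM34851", "introduced the erroneous program check 28")
VM65414 = Apar("VM65414", "UM34853", "corrects the error introduced by VM65396")


def exposed_to_prog_28(applied_apars: Iterable[str]) -> bool:
    """True when VM65396 is applied without its correcting APAR VM65414."""
    applied = {name.strip().upper() for name in applied_apars}
    return VM65396.number in applied and VM65414.number not in applied


if __name__ == "__main__":
    for applied in (["VM65396"], ["VM65396", "VM65414"], []):
        print(applied, "->", exposed_to_prog_28(applied))
```

The check encodes only the rule stated in this thread: VM65396 on, VM65414 not on, is the combination to worry about.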

However, the functional problem, other than being introduced by VM65396, is
not connected to the security/integrity aspects of these APARs. The
functional problem is related to a virtual processor coming out of SIE for
a fast path operation, as opposed to the typical normal exit from SIE. (For
those unfamiliar with SIE, think of it as dispatching or running a virtual
processor.) Fast path exits are far more prevalent in a CMS environment
than in other guest operating systems, but they are possible there too. In
all cases, fast path exits are a subset, mostly a very small subset, of all
exits.

There is a scenario where taking the fast path exit can erroneously cause
z/VM to present the 028 program check. The error further involves the upper
part of a register not being cleared. If this register's upper half
contains zeroes, the error condition is not triggered, but over time
it appears the upper half changes and the program checks start appearing.
So from that perspective it's a timing- and workload-dependent problem. This
is a simplification, but I hope it's enough for you to appreciate the aspects
you need to be aware of.
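
To make the "upper half not cleared" point concrete, here is a toy Python model. It is purely illustrative and is not z/VM's logic: the function names are hypothetical, and the only thing taken from the description above is that a 64-bit register with a zero upper half is harmless while a stale nonzero upper half is not.

```python
MASK_32 = 0xFFFF_FFFF  # low 32 bits of a 64-bit register


def upper_half(reg64: int) -> int:
    """Return bits 0-31 (the high word) of a 64-bit register value."""
    return (reg64 >> 32) & MASK_32


def clear_upper_half(reg64: int) -> int:
    """Keep only the low word, as a correct code path would do before use."""
    return reg64 & MASK_32


def would_misbehave(reg64: int) -> bool:
    """Toy stand-in for the error condition: leftover data in the high word."""
    return upper_half(reg64) != 0


if __name__ == "__main__":
    fresh = 0x0000_0000_0000_0002   # upper half still zero: no problem seen
    stale = 0x0000_0001_0000_0002   # upper half changed over time by other work
    for reg in (fresh, stale, clear_upper_half(stale)):
        print(f"{reg:#018x} misbehaves: {would_misbehave(reg)}")
```

This is why the symptom can look timing- or workload-dependent: the low word is the same in both cases, and only the leftover high word differs.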

The real solution is to apply VM65414's PTF, which is available. A
mitigation would be to do a CP TRACE PROG 28 NOTERM NOPRINT RUN for the
virtual machines, either through the console or, if they are logged off and
on again, through a directory COMMAND statement.

Thank you for understanding the communication challenges in this space. I'm
sure you'll let us know if you need more info. :-)

Regards,
Bill
___

Bill Bitner - z/VM Customer Focus and Care - 607-429-3286
bitn...@us.ibm.com
"Making systems practical and profitable for customers through
virtualization and its exploitation." - z/VM



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Rick Barlow
Thanks Bill. I was a little suspicious of VM65414. However, VM65396 has
been on these LPARs since March (before I knew about the later PTF) and we
did not see any problems until Linux got patched this week. I guess I will
have to scramble to get another VM IPL. 8-\

Rick Barlow

On Thu, Apr 12, 2018 at 8:26 AM, Bill Bitner wrote:

> Rick, if you have VM65396 on, but not VM65414, that would be my guess.
> VM65414 corrected a problem introduced by VM65396 where a guest could
> erroneously be given a program check 28.
> 
> ___
>
> Bill Bitner - z/VM Customer Focus and Care - 607-429-3286
> bitn...@us.ibm.com
> "Making systems practical and profitable for customers through
> virtualization and its exploitation." - z/VM
>
>



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Bill Bitner
Greg,
For information on vulnerabilities, please see the IBM Z Security Portal.
Information on the portal can be found if you page down slightly on
https://www.ibm.com/it-infrastructure/z/capabilities/system-integrity

Regards,
Bill

___

Bill Bitner - z/VM Customer Focus and Care - 607-429-3286
bitn...@us.ibm.com
"Making systems practical and profitable for customers through
virtualization and its exploitation." - z/VM



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Martin Schwidefsky
On Thu, 12 Apr 2018 08:13:43 -0400
Rick Barlow wrote:

> We have started seeing a rash of errors on some of our Linux virtual
> servers. The message we see on the virtual machine console is "Unknown
> program exception: 0028 [#1] SMP". It appears to only affect servers that
> were recently patched to kernel level "2.6.32-696.23.1.el6.s390x #1 SMP Sat
> Feb 10 11:11:31 EST 2018". It does not affect all of our servers that were
> recently patched. I suspect that it might be related to the Spectre
> patches. I did a Google search and did not get any hits on the message. I
> expect our Linux team will contact Red Hat support.

Program check 28 is an ALET-specification exception. It is very unlikely that
Linux causes this exception. The only piece of code that uses the access
register mode is the clock_gettime() function in the vdso code, and then
only for the CPUCLOCK_VIRT clock source with a constant ALET.
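
For anyone wondering what a CPUCLOCK_VIRT clock id looks like from user space, here is a small Python sketch of the clock-id arithmetic. The constants and the encoding are reproduced from memory of the kernel's include/linux/posix-timers.h macros, so treat them as an assumption to verify against your kernel source. The choice of a per-thread clock for the calling thread (pid 0) is also an assumption made for the example, not something stated in this thread.

```python
# Assumed values, mirroring the kernel's posix-timers.h as recalled.
CPUCLOCK_PROF = 0
CPUCLOCK_VIRT = 1
CPUCLOCK_SCHED = 2
CPUCLOCK_PERTHREAD_MASK = 4
CPUCLOCK_CLOCK_MASK = 3


def make_process_cpuclock(pid: int, clock: int) -> int:
    """Encode a CPU-time clock id: inverted pid in the high bits, type in the low."""
    return (~pid << 3) | clock


def make_thread_cpuclock(tid: int, clock: int) -> int:
    """Same encoding with the per-thread bit set."""
    return make_process_cpuclock(tid, clock | CPUCLOCK_PERTHREAD_MASK)


def cpuclock_which(clock_id: int) -> int:
    """Extract the clock type (PROF, VIRT or SCHED) from an encoded id."""
    return clock_id & CPUCLOCK_CLOCK_MASK


if __name__ == "__main__":
    clock_id = make_thread_cpuclock(0, CPUCLOCK_VIRT)
    print("per-thread CPUCLOCK_VIRT id for the calling thread:", clock_id)
    print("is VIRT:", cpuclock_which(clock_id) == CPUCLOCK_VIRT)
```

The sketch only computes the id; it does not call clock_gettime(), and it says nothing about how the s390 vdso handles that id internally.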

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Terri C. Glowaniak
VM65414 is PE'd... can't find anything on it now. It looks like PTF UM34768
is the latest out to fix this, and it came out April 4?

I have VM65396 on, but have different Red Hat levels than Rick mentioned.


Thanks,
Terri Glowaniak

Mainframe Systems Engineer
terri.glowan...@regions.com
(205) 261-6883 (W)



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Greg Preddy

Bill,

Has the dust settled now on the z/VM Spectre fixes?  We put on VM65396
as well on a couple of systems then halted that until further fixes arrive.



Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Bill Bitner
Rick, if you have VM65396 on, but not VM65414, that would be my guess.
VM65414 corrected a problem introduced by VM65396 where a guest could
erroneously be given a program check 28.
___

Bill Bitner - z/VM Customer Focus and Care - 607-429-3286
bitn...@us.ibm.com
"Making systems practical and profitable for customers through
virtualization and its exploitation." - z/VM



RHEL 6.9 servers getting Unknown program exception: 0028 SMP

2018-04-12 Thread Rick Barlow
We have started seeing a rash of errors on some of our Linux virtual
servers. The message we see on the virtual machine console is "Unknown
program exception: 0028 [#1] SMP". It appears to only affect servers that
were recently patched to kernel level "2.6.32-696.23.1.el6.s390x #1 SMP Sat
Feb 10 11:11:31 EST 2018". It does not affect all of our servers that were
recently patched. I suspect that it might be related to the Spectre
patches. I did a Google search and did not get any hits on the message. I
expect our Linux team will contact Red Hat support.

Has anyone else seen this?

Thanks,
Rick Barlow
