Re: Is TESTCB a bad boy ?

2020-10-26 Thread Christopher Y. Blaicher
A cache line is 256 bytes.  If the code that modifies an instruction is in the 
same cache line as the instruction, then multiple cache line refreshes have to 
happen.
The cache line for the modified instruction has to be brought into the data 
cache and modified.  This causes the cache line in the instruction cache to be 
invalidated.  The cache line then has to be refreshed into the instruction 
cache and then the next instruction re-fetched.  Doing an EX causes no cache 
line refreshes and does not interfere with the pipeline.
As I said, EX has its own set of overheads, but they are a very small fraction 
of the cost of modifying an instruction.
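
A minimal sketch of the two alternatives to storing into the instruction stream. It is not taken from TESTCB or any other real macro expansion. The names FLAG, GO, MOVEIT, DONE, WORK, SRC and DEST are made up for illustration, and the usual R0-R15 register equates are assumed. The first fragment keeps the switch in a data byte instead of in a NOP mask; the second uses EX to supply a run-time length to an MVC that is never altered in storage.

```hlasm
*        Switch kept as data: nothing is stored into the code
         TM    FLAG,X'80'         Has the switch been set?
         BO    GO                 Yes, take the alternate path
         OI    FLAG,X'80'         No, set it for next time
*        ... first-time-only code would go here ...
GO       DS    0H
*
*        Run-time length: EX ORs the low byte of R1 into the length
*        field of its own copy of MOVEIT; storage is not changed
         BCTR  R1,0               R1 = length - 1 (length is 1-256)
         EX    R1,MOVEIT          Move that many bytes
         B     DONE               Skip over the EX target
MOVEIT   MVC   DEST(0),SRC        Executed only by way of EX
DONE     DS    0H
*
*        Hypothetical work area, assumed to be addressable, cleared
*        at initialization, and located away from the code
WORK     DSECT ,
FLAG     DS    X                  Switch byte, X'80' = set
SRC      DS    CL256              Source field
DEST     DS    CL256              Target field
```

The work area matters as much as the EX: if FLAG sat in the same cache line as the instructions, the OI would store into that line just as the MVI did.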

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Farley, Peter x23353
Sent: Monday, October 26, 2020 2:33 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Is TESTCB a bad boy ?

This message originated Externally. Use proper judgement and caution with 
attachments, links, or responses.


Isn't that true only if the dynamically built instruction isn't in the same 
cache line as the code that performs the build and EX?

I've seen examples where the "built" instruction was a non-reentrant location 
in the same CSECT and very near to the "build" instructions.  Not my code, but 
I have seen it.

Peter

-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Christopher Y. Blaicher
Sent: Monday, October 26, 2020 2:07 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Is TESTCB a bad boy ?

This will not have the same performance hit.
The problem with the MVI was the CPU had to 1) bring the cache line into the 
data cache; 2) apply the MVI data; 3) refresh the cache line in the instruction 
cache; and finally 4) execute the instruction.
Doing an EX skips steps 1, 2 and 3.  EX has its own set of overheads, but they 
are nowhere near the cost of all that cache activity.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Melvyn Maltz
Sent: Monday, October 26, 2020 11:16 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Is TESTCB a bad boy ?

In ancient times it was common practice to do this sort of thing...

SWITCH   NOP   GO
         MVI   SWITCH+1,X'F0'

I believe this clears the cache and causes severe performance hits.

In my research into TESTCB for the z390 Project I found that it dynamically 
builds the subject instruction for an EX.

Would this have the same performance hit?

Melvyn Maltz.



Re: Conditional MVCL macro?

2020-10-20 Thread Christopher Y. Blaicher
The first, base code, is just the following to get the overhead of loop control:

         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOPB    DS    0H
         L     R3,POOLADDR        GET FROM ADDRESS
         L     R4,TOADDR          GET TO ADDR
         BCT   R9,LOOPB           LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'BASE CODE'
         BAL   R14,TIMEOUT

The second case was just a move of 1K using four MVC instructions in a row, 
which is the fastest.
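
The post does not show the body of that second test, so the following is only a guess at its shape: four back-to-back 256-byte MVCs between the two address loads and the BCT of the base code. The displacements and the reuse of R3 and R4 are assumptions.

```hlasm
         L     R3,POOLADDR        GET FROM ADDRESS
         L     R4,TOADDR          GET TO ADDR
         MVC   0(256,R4),0(R3)    BYTES 0-255
         MVC   256(256,R4),256(R3) BYTES 256-511
         MVC   512(256,R4),512(R3) BYTES 512-767
         MVC   768(256,R4),768(R3) BYTES 768-1023
```

With a fixed length there is no loop control and no EX for a residual piece, which may be why this form edges out the $MVC macro at 1K.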

All the others are just $MVC macro vs MVCL instruction.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 4:40 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Thanks for sharing your test results, although I had trouble explaining the 
results of the first two tests, and maybe this is related to how the $MVC macro 
does its thing.

Anyway, if you throw out the first two tests, the $MVC technique appears to be 
250-300% more efficient than the MVCL technique with lengths between 4K and 64K. 
But with a length of 128K, the $MVC advantage drops to only about 60%.  My guess 
is that MVCL will eventually prove to be more efficient than $MVC with move 
lengths in excess of 256K.

I don't know if moving to/from the same storage locations makes any difference 
for this test, but I assume it was intentional, to control this as a variable.  
There are already enough unknowns!

Again, thanks for sharing!


Re: Conditional MVCL macro?

2020-10-20 Thread Christopher Y. Blaicher
There may be a hint to the reason for the jump in the explanation of MVCLE, 
programming note 3.

"The function of not processing more than approximately 4K bytes of either 
operand is intended to permit software polling of a flag that may be set by a 
program on another CPU during long operations."

If a similar process happens with MVCL at the 2K boundary, that could be the 
explanation.  I'm not a hardware guy, so just guessing.
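
For readers who have not used MVCLE, a minimal sketch of what that note means for the caller: the instruction may stop early with condition code 3, and the program simply branches back to resume. POOLADDR and TOADDR are the fields from the test code; the label MOVEMORE and the 128K length are made up for illustration, and AMODE 24 or 31 is assumed.

```hlasm
         L     R2,POOLADDR        Source address (pair R2,R3)
         L     R3,=A(131072)      Source length, 128K
         L     R4,TOADDR          Target address (pair R4,R5)
         LR    R5,R3              Target length = source length
MOVEMORE DS    0H
         MVCLE R4,R2,0            Pad X'00', unused here
         BC    1,MOVEMORE         CC3: stopped early, resume
*        CC0 here: move complete, R3 and R5 are zero
```

The registers are updated at each stop, so the loop carries no state of its own. MVCL, by contrast, never returns CC3 for an incomplete move; its interruptions are not visible to the program in this way.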

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Christopher Y. Blaicher
Sent: Tuesday, October 20, 2020 2:47 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

I just re-ran a test on our z15 machine and got interesting numbers.  The $MVC 
was reasonably linear from start to finish.  The MVCL has a big jump from 2K to 
4K, but was also reasonably linear outside of that jump.  It never caught up to 
the $MVC implementation.

TEST TYPE     CPU TIME USED
BASE CODE          0.003873
1K 4 MVC           0.171274
1K $MVC            0.183642
1K MVCL            0.345227
2K $MVC            0.357314
2K MVCL            0.509385
4K $MVC            0.704173
4K MVCL            2.790247
8K $MVC            1.426892
8K MVCL            5.480536
32K $MVC           5.835773
32K MVCL          21.734112
64K $MVC          12.278130
64K MVCL          43.380435
128K $MVC         54.570900
128K MVCL         86.739562

All the iterations used this basic set of instructions.
*
*        TEST 1K $MVC
*
         SPACE ,
         L     R9,REPEATCOUNT     DO IT 100,000 TIMES
         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOP1A   DS    0H
         L     R3,POOLADDR        GET FROM ADDRESS
         L     R4,TOADDR          GET TO ADDR
         L     R5,=A(1024)        MOVE 1K  BYTES
         $MVC  (R4),(R3),(R5)     MOVE IT
         AHI   R3,1024
         AHI   R4,1024
         BCT   R9,LOOP1A          LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'1K $MVC'
         BAL   R14,TIMEOUT
*
*        TEST 1K MVCL
*
         SPACE ,
         L     R9,REPEATCOUNT     DO IT 100,000 TIMES
         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOP2    DS    0H
         L     R2,POOLADDR        GET FROM ADDRESS
         L     R3,=F'1024'
         L     R4,TOADDR          GET TO ADDR
         L     R5,=F'1024'
         MVCL  R4,R2              MOVE IT
         BCT   R9,LOOP2           LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'1K MVCL'
         BAL   R14,TIMEOUT

The REPEATCOUNT value is 10,000,000.
Both POOLADDR and TOADDR areas are 256K in size, so they both should be on page 
boundaries.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 1:57 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Right.

I should have said "an interruptibility that is visible to the surrounding 
assembler instructions via the CC."

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Seymour J Metz
Sent: Tuesday, October 20, 2020 10:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

MVCL is, and always has been, interruptible.


Re: Conditional MVCL macro?

2020-10-20 Thread Christopher Y. Blaicher
We just got a z15 and I have not tested MVCL vs. an MVC loop on it, but on all 
prior machines an MVC loop beat an MVCL up to about 32K.  Over 32K, MVCL is the 
way to go.  In our environment we are rarely moving more than 32K.  We built a 
$MVC macro with 3 parameters (destination, source and length) and use that.
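
The $MVC source was not posted, so the following is only a sketch of how such a three-parameter MVC-loop macro might be written; it is not the actual macro. It assumes the operand form used in the test code, registers in parentheses, and it assumes the generated code has a base register for its branches and the EX target. The internal labels are built from &SYSNDX.

```hlasm
         MACRO
&LABEL   $MVC  &TO,&FROM,&LEN
.*       Hypothetical sketch of an MVC-loop macro.  Operands are
.*       registers in parentheses, e.g. $MVC (R4),(R3),(R5).
.*       All three registers are altered; none may be R0.
&LABEL   DS    0H
         LTR   &LEN(1),&LEN(1)    Anything to move?
         BNP   X&SYSNDX           No, done
L&SYSNDX DS    0H
         CHI   &LEN(1),256        256 bytes or fewer left?
         BNH   T&SYSNDX           Yes, finish with one EX
         MVC   0(256,&TO(1)),0(&FROM(1)) Move a full 256
         LA    &TO(1),256(,&TO(1)) Advance target
         LA    &FROM(1),256(,&FROM(1)) Advance source
         AHI   &LEN(1),-256       Count down the length
         B     L&SYSNDX           Go around again
T&SYSNDX DS    0H
         BCTR  &LEN(1),0          Length - 1 for the EX
         EX    &LEN(1),M&SYSNDX   Move the last 1-256 bytes
         B     X&SYSNDX           Skip over the EX target
M&SYSNDX MVC   0(0,&TO(1)),0(&FROM(1)) Only by way of EX
X&SYSNDX DS    0H
         MEND
```

A production version would probably unroll several MVCs per pass to cut the loop overhead, which is roughly what the COBOL compiler output further down this thread does for short moves.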

FYI - MVCL is a micro-code (milli-code, call it what you want) instruction.  
There is a hefty startup and end cost to micro-code instructions.  MVCL only 
really gets going when it can use the internal move page function.  For that, 
it has to be moving whole pages and they have to be page aligned.  CLCL and 
similar instructions suffer, or at least used to suffer, the same type of 
startup costs.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 12:40 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Really interesting thread to start the day with!

Our experience has been that MVC loops are typically faster up to a point, 
that point being about 30-40 instructions in the pipeline, and, as mentioned, 
this seemed very processor dependent. However, when the source and target 
operands both happen to be aligned on a page boundary, the opportunity exists 
for the async data mover to kick in if a move long is being used.  I think this 
applied to both MVCL and MVCLE, but I'm not sure. So ideally a macro would 
utilize both MVCs and MVCL/E.

More grist for the mill!

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

For a 4000-byte move between two fields of the same size, the COBOL compiler 
with OPT(2) generates:

         LAY   R10,5072(,R9)      FROM
         LA    R7,1072(,R9)       TO
         MVC   0(256,R7),0(R10)
         MVC   256(256,R7),256(R10)
         MVC   512(256,R7),512(R10)
         MVC   768(256,R7),768(R10)
         MVC   1024(256,R7),1024(R10)
         MVC   1280(256,R7),1280(R10)
         MVC   1536(256,R7),1536(R10)
         MVC   1792(256,R7),1792(R10)
         MVC   2048(256,R7),2048(R10)
         MVC   2304(256,R7),2304(R10)
         MVC   2560(256,R7),2560(R10)
         MVC   2816(256,R7),2816(R10)
         MVC   3072(256,R7),3072(R10)
         MVC   3328(256,R7),3328(R10)
         MVC   3584(256,R7),3584(R10)
         MVC   3840(160,R7),3840(R10)

However, for 5000 bytes it generates:

         LAY   R7,6072(,R9)
         LA    R10,0(,R7)
         LA    R7,1072(,R9)
         LHI   R11,0x13           19 PASSES X 256 = 4864 BYTES
L0128    EQU   *
         MVC   0(256,R7),0(R10)
         LA    R10,256(,R10)
         LA    R7,256(,R7)
         BRCT  R11,L0128
         MVC   0(136,R7),0(R10)   REMAINING 136 BYTES

And yes, the change occurred at 4097 bytes.



-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement 
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data 
that was already or would soon anyway be in cache an MVC loop was generally 
faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did 
not suggest it.)

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length moves.


Re: old code failing

2019-07-24 Thread Christopher Y. Blaicher
Maybe the system put the TIOT entry in the extended TIOT.  Look up EXTRACT and 
GETDSAB macros.

Chris Blaicher
Technical Architect
Syncsort, Inc.


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of retired mainframer
Sent: Wednesday, July 24, 2019 11:42 AM
To: MVS List Server 2 
Subject: Re: old code failing

Can you show us the code that does the looking?

> -Original Message-
> From: IBM Mainframe Assembler List  
> On Behalf Of Richard Kuebbing
> Sent: Wednesday, July 24, 2019 8:24 AM
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Subject: old code failing
> 
> A subroutine written long ago appears to be failing.  It looks for a 
> DDname in the TIOT.  Did the format of the TIOT change at some point?  If 
> yes, at what release?
> 
> Does it matter if AMODE is 24 or 31?
> 
> It might have something to do with the fact that the DD is a proc 
> override instead of straight JCL or in a proc.


Re: Basic question on Procesors/Instruction set

2019-03-20 Thread Christopher Y. Blaicher
Look in the HLASM Programmer's Guide.  There is an assembler option to tell it 
the level of machine you want it to validate the instructions against.  You 
probably want the ZS-6 option: MACHINE(ZS-6) or OPTABLE(ZS6).


Chris Blaicher
Technical Architect
Syncsort, Inc.

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Frank M. Ramaekers
Sent: Wednesday, March 20, 2019 5:42 PM
To: MVS List Server 2 
Subject: Basic question on Procesors/Instruction set

Where does one find a table of instructions a particular processor is capable 
of?   (I have a zBC12 and get a SPEC EXCEPT on a SRST/Search string)
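
One thing worth checking before suspecting the machine level: SRST itself recognizes a specification exception when bits 32-55 of general register 0 are not all zeros, on any processor that has the instruction. Whether that is the cause here is only a guess. A minimal sketch of the register setup follows; the names BUFFER, SCAN and NOTFOUND are hypothetical and the comma is just an example search character.

```hlasm
         LHI   R0,C','            Char to find; bits 32-55 zero
         LA    R4,BUFFER          First byte to examine
         LA    R5,BUFFER+L'BUFFER First byte past the search area
SCAN     DS    0H
         SRST  R5,R4              R5 = end, R4 = start
         BC    1,SCAN             CC3: partial search, resume
         BC    2,NOTFOUND         CC2: character not in the area
*        CC1: R5 now addresses the matching byte
*        ... use the address in R5 ...
NOTFOUND DS    0H
*
BUFFER   DS    CL80               Hypothetical area to search
```

Loading the search character with IC into a register that still holds an address or a length is the classic way to leave stray bits in R0.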

Frank M. Ramaekers Jr.  | Mainframe Systems Analyst I | CIS Mainframe Services 
Unisys | (512)-387-3949 | Francis.Ramaekers at Unisys.Com



Re: Instruction/Data Cache Usage (was EQU *)

2018-08-01 Thread Christopher Y. Blaicher
Inline data is no more expensive than data in another page. In either case, the 
reference to the data requires a cache line load to the D-cache, but does not 
invalidate/disturb the I-cache.

A comment on the original EQU * part of this thread.  I prefer DS 0H to hold a 
label because you can't get burned by someone putting something in front of it 
that isn't halfword aligned.  I personally HATE it when people put a label on 
an instruction, especially if they are using long names in the label.  All the 
other op codes are in column 10 and then you have one out in right field.  Your 
eye can't just flow down the screen; now you have to go searching for the op 
code.  Also, putting a label on an instruction makes it harder to move the 
instruction.
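
A small illustration of how EQU * can burn you, with made-up labels NEXT1 and NEXT2 and a 3-byte constant standing in for whatever someone dropped in front of the label. It assumes the first B starts on a halfword boundary, as instructions do.

```hlasm
         B     NEXT1              Branch around the constant
         DC    C'ABC'             3 bytes: counter is now odd
NEXT1    EQU   *                  NEXT1 = odd address, unaligned
         LR    R1,R2              Aligned, 1 byte past NEXT1
*
         B     NEXT2              Same again, with DS 0H
         DC    C'ABC'             3 bytes: counter is now odd
NEXT2    DS    0H                 Aligns first, then labels
         LR    R1,R2              Starts exactly at NEXT2
```

In the first case the branch goes to an odd address, one byte in front of the LR. In the second, DS 0H forces the halfword alignment before the label is assigned, so NEXT2 and the instruction agree.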

My 2 cents.

Chris Blaicher
Technical Architect
Syncsort, Inc.

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Charles Mills
Sent: Wednesday, August 1, 2018 7:54 PM
To: MVS List Server 2 
Subject: Re: Instruction/Data Cache Usage (was EQU *)

My favorite (not!) is MODESET which generates (IIRC) in-line data and a branch 
around it and a LOAD from storage. I know it is nothing but it just annoyed me 
so much that I created my own that uses LHI and no branch.

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Keith Moe
Sent: Wednesday, August 1, 2018 3:58 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Instruction/Data Cache Usage (was EQU *)

"(working storage or stack storage)"

I interpret this to mean storage that is being ALTERED, not CONSTANTS.  I would 
think that duplicate unchanged cache lines in the instruction and data caches 
would not have the same SERIOUS penalty as altering data would.  But I am not a 
hardware engineer, nor do I know if this is true or not.  

I've noticed that IBM has been changing many of their macros to generate fewer 
inline constants with branches around them and use more literals (which can 
sometimes surprise you with unexpected addressability problems when the data 
suddenly move from being very local), presumably to reduce the double cache 
usage (with or without the move/copy penalty). But one of the most glaring 
mixtures of instructions and data that is (potentially) updated is the CVTEXIT 
and CVTBRET instructions.  Programs invoked via system linkage have Register 14 
pointing to CVTEXIT.  The CVT is in the read/write nucleus and is not even 
cache line aligned!


Re: BAKR Instruction

2018-05-29 Thread Christopher Y. Blaicher
A quick test of performance shows that BAKR/PR is about 14 times as expensive 
as STM/LM.

I would say that in initialization/termination code using BAKR/PR isn't 
going to hurt you, but I would totally avoid it in record level code.
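
For reference, a minimal sketch of the two entry/exit pairs being compared. This is not the test code, which was not posted. SUBA and SUBB are made-up routine names, and the second form assumes a leaf routine that leaves R13 pointing at the caller's standard save area.

```hlasm
*        Linkage stack: one instruction each way, but slower
SUBA     DS    0H
         BAKR  R14,0              Save status on the linkage stack
*        ... body of routine ...
         PR    ,                  Unstack, restore GRs 2-14, return
*
*        Save area: registers go into the caller's save area
SUBB     DS    0H
         STM   R14,R12,12(R13)    Save caller's R14 through R12
*        ... body of routine, R13 left unchanged ...
         LM    R14,R12,12(R13)    Restore caller's registers
         BR    R14                Return to caller
```

BAKR also stacks the access registers and PSW information, which is part of what the extra cost buys.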

Chris Blaicher
Technical Architect
Mainframe Development
P: 201-930-8234  |  M: 512-627-3803    
E: cblaic...@syncsort.com

Syncsort Incorporated 
2 Blue Hill Plaza #1563
Pearl River, NY 10965
www.syncsort.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Seymour J Metz
Sent: Tuesday, May 29, 2018 2:00 PM
To: MVS List Server 2 
Subject: Re: BAKR Instruction

*Any* choice of linkage conventions imposes a dependency between the caller and 
the callee.

BTW, this is an example of why I prefer to encapsulate such things in macros.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


From: IBM Mainframe Assembler List  on behalf 
of Peter Relson 
Sent: Monday, May 28, 2018 5:57 PM
To: ASSEMBLER-LIST@listserv.uga.edu
Subject: Re: BAKR Instruction

Some things to think about:
-- Recovery routines and retry points are tied to specific linkage stack levels 
in the general case. Use of BAKR as a linkage can complicate that.
-- BAKR/PR is slower than using a typical savearea linkage.
-- You might find that use of BAKR by the caller poses an unnecessary 
dependency between the caller and the callee. Consider the alternative of 
calling via BASR, and the callee deciding whether to save/restore regs via 
BAKR/PR or via STMG(+STAM)/LMG(+LAM).

Peter Relson
z/OS Core Technology Design