Re: CMSCALL return code

2006-12-05 Thread George Haddad
Looking at my yellow card, there MAY be a reason to use LA or L in
certain cases.
Both SR and XR set the condition code, while L and LA do not. So if one
wanted to preserve the CC for some reason, one could be justified in
coding the L or LA. Take a good look at the subsequent statements before
replacing those Load opcodes!
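
The condition-code point can be sketched with a toy model. This is only an illustration in Python, not an emulator: the class ToyCpu and the helper demo are made-up names, 31-bit addressing is assumed for LA, and only the CC behaviour of LTR, SR, XR and LA is modelled.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


class ToyCpu:
    """Sixteen 32-bit registers plus a condition code (CC). Illustration only."""

    def __init__(self):
        self.regs = [0] * 16
        self.cc = 0

    def ltr(self, r1, r2):
        # LOAD AND TEST sets CC: 0 zero, 1 negative, 2 positive.
        self.regs[r1] = self.regs[r2]
        signed = to_signed(self.regs[r1])
        self.cc = 0 if signed == 0 else (1 if signed < 0 else 2)

    def sr(self, r1, r2):
        # SUBTRACT sets CC: 0 zero, 1 negative, 2 positive, 3 overflow.
        result = to_signed(self.regs[r1]) - to_signed(self.regs[r2])
        self.regs[r1] = result & MASK32
        if not -(1 << 31) <= result < (1 << 31):
            self.cc = 3
        else:
            self.cc = 0 if result == 0 else (1 if result < 0 else 2)

    def xr(self, r1, r2):
        # EXCLUSIVE OR sets CC: 0 result zero, 1 result not zero.
        self.regs[r1] ^= self.regs[r2]
        self.cc = 0 if self.regs[r1] == 0 else 1

    def la(self, r1, address):
        # LOAD ADDRESS leaves CC unchanged (31-bit addressing assumed).
        self.regs[r1] = address & 0x7FFFFFFF


def demo():
    cpu = ToyCpu()
    cpu.regs[3] = 5
    cpu.ltr(3, 3)            # CC becomes 2 (positive)
    cpu.la(15, 0)            # R15 is zeroed, CC is still 2
    cc_after_la = cpu.cc
    cpu.regs[15] = 12
    cpu.sr(15, 15)           # R15 is zeroed, but CC is overwritten with 0
    return cc_after_la, cpu.cc, cpu.regs[15]
```

In the sketch, a branch that tests the LTR result would still work after the LA but not after the SR, which is the case where keeping the Load opcode is justified.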

Schuh, Richard wrote:

IIRC, the times for SR and XR were the same on the Amdahl machines, at
least on the ones that came after the 470. They may have been the same
on the 470, as well.

-Original Message-
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Mike Walter
Sent: Monday, December 04, 2006 12:02 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: CMSCALL return code

Sheesh, this goes way back to my good old Assembler diaper days when
programmers really cared about performance instead of drag and drop
solutions.
Slightly off-topic: if I remember correctly, we argued intensely about
zeroing a GPR and the performance differences between: 


- SR R15,R15
- XR R15,R15
- LA R15,0 (not seriously considered by performance geeks)
- L R15,=F'0' (considered for use only by amateur programmers coming
from a BASIC or COBOL background and otherwise held in low esteem by
real programmers).  ;-)

IIRC, the actual performance difference between SR and XR depended more
on the specific processor model than on anything else.

Mike Walter
Hewitt Associates
Any opinions expressed herein are mine alone and do not necessarily
represent the opinions or policies of Hewitt Associates.



  


Re: CMSCALL return code

2006-12-04 Thread Schuh, Richard
True, and it is undoubtedly faster to use SR  R15,R15 than it is to use
LA  R15,0 to zero the register - there are no storage fetches and real
subtraction is not needed if the result can be predicted, as it can in
this case. However, the discussion had more to do with fetches of
boundary-aligned vs. non-aligned data. There was no mention of the
optimum speed for getting either a specific or an arbitrary value loaded
into a register. In this day of pipelined machines
 
This is sort of reminiscent of the good old days, programming in 7080
Autocoder. Boeing insisted that the programmers use a MOVE macro because
there were 26 different ways to move data from one storage location to
another. It was expected that most programmers would use either their
favorite way or the first one that popped into their heads if left on
their own. The macro chose the optimal way, depending on the operand
definitions.


From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Stanley Rarick
Sent: Friday, December 01, 2006 10:37 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: CMSCALL return code


For a return code, LA R15,value is *much* faster than a L - only one
storage fetch.

Schuh, Richard wrote:


I really would not have left it to chance, I would have defined a
word-aligned constant rather than using a literal. However, it might not
have been as chancy as it may seem. The literal pool is doubleword
aligned and boundary alignment may have been a factor in determining
where the literal resided. I would like to think that the 8-byte
multiples are put at the front, the 4-byters next, then the twos
followed by everybody else. In looking at an assembly listing, that
seems to be the sequence. The first two literals in the program are
=x'A00', the next =x'FF', etc. In the literal pool, all 4 byte
entries (there were no 8 byte literals) precede the two byte literals
and then come the ones of only 1 byte. Within each of these groups, the
literals appear in the order in which they were defined. There were no
long strings defined as literals in the particular listing.
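
The ordering described above can be sketched as follows. This is a guess at the layout rule based only on the listing described in this message (lengths that are multiples of 8 first, then multiples of 4, then multiples of 2, then the rest, each group in definition order). The function name layout_literal_pool and the sample literals are made up for illustration.

```python
def layout_literal_pool(literals, pool_start=0):
    """Place literals in a pool that starts on a doubleword boundary.

    literals: list of (text, length_in_bytes) in definition order.
    Returns a list of (text, offset) in pool order.
    """
    def group(length):
        if length % 8 == 0:
            return 0          # multiples of 8 go first
        if length % 4 == 0:
            return 1          # then multiples of 4
        if length % 2 == 0:
            return 2          # then multiples of 2
        return 3              # everything else goes last

    # sorted() is stable, so definition order is kept within each group.
    ordered = sorted(literals, key=lambda lit: group(lit[1]))

    placed = []
    offset = pool_start
    for text, length in ordered:
        placed.append((text, offset))
        offset += length
    return placed


# Hypothetical literals, in the order a program might define them.
SAMPLE = [("=X'0A00'", 2), ("=X'FF'", 1), ("=F'4'", 4), ("=H'8'", 2)]
POOL = layout_literal_pool(SAMPLE)
```

With the pool itself doubleword aligned, this ordering puts every 4-byte entry on a fullword boundary and every 2-byte entry on a halfword boundary without any padding, which would explain why the literal was "not as chancy" as it looked.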

-Original Message-
From: The IBM z/VM Operating System
[mailto:[EMAIL PROTECTED] On
Behalf Of Don Russell
Sent: Tuesday, November 21, 2006 3:46 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: CMSCALL return code

Schuh, Richard wrote:
  

I agree, it does seem non-intuitive. The initial SR R15,R15 was
undoubtedly preparing for a default rc of zero. How the non-zero rc
gets put into the register later is largely a matter of taste. In this
case I probably would have chosen L R15,=X'...' - a habit learned, when
machines were slower, based on the knowledge that they were mostly
optimized for the LOAD instruction vs. any other way of putting data
from memory into a register.
  



If your habit was to use L Rx,=X'...' you were probably lucky: in the
old days the =X literal would not necessarily be word-aligned, causing
two fetches to load the register or, in the days when alignment really
mattered, a program exception.

Better to use L R15,=A(X'...') if alignment is a concern and you want to
use literals.

Then the literal IS aligned on a fullword boundary.
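
The alignment difference can be pictured with a small sketch. It assumes the usual rule that an X-type constant is placed at the next byte while an A-type constant is bumped up to a fullword boundary; the starting location 0x1001 and the function names are made up for illustration.

```python
def align_up(location, boundary):
    """Round location up to the next multiple of boundary."""
    return (location + boundary - 1) // boundary * boundary


def place_constant(location, boundary):
    """Return the address at which a constant with this alignment starts."""
    return align_up(location, boundary)


def crosses_fullword(start, length):
    """True when the field spans more than one aligned 4-byte word."""
    return start // 4 != (start + length - 1) // 4


LOCATION = 0x1001                          # hypothetical odd location counter
X_START = place_constant(LOCATION, 1)      # X-type: byte alignment
A_START = place_constant(LOCATION, 4)      # A-type: fullword alignment
```

In this sketch the 4-byte X-type constant starts at 0x1001 and straddles two words, while the A-type constant starts at 0x1004 and sits in one.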

The initial SR 15,15 is unlikely to be setting the default return
code... it's clearing the register preparing for the different option
bytes to be OR'd in. I agree the macro could (should?) have generated a
single L instruction instead, but then what nits would we have to
discuss? :-)

  



Re: CMSCALL return code

2006-12-04 Thread Mike Walter
Sheesh, this goes way back to my good old Assembler diaper days when 
programmers really cared about performance instead of drag and drop 
solutions.
Slightly off-topic: if I remember correctly, we argued intensely about 
zeroing a GPR and the performance differences between: 

- SR R15,R15
- XR R15,R15
- LA R15,0 (not seriously considered by performance geeks)
- L R15,=F'0' (considered for use only by amateur programmers coming from 
a BASIC or COBOL background and otherwise held in low esteem by real 
programmers).  ;-)

IIRC, the actual performance difference between SR and XR depended more
on the specific processor model than on anything else.

Mike Walter 
Hewitt Associates 
Any opinions expressed herein are mine alone and do not necessarily 
represent the opinions or policies of Hewitt Associates.




The information contained in this e-mail and any accompanying documents may 
contain information that is confidential or otherwise protected from 
disclosure. If you are not the intended recipient of this message, or if this 
message has been addressed to you in error, please immediately alert the sender 
by reply e-mail and then delete this message, including any attachments. Any 
dissemination, distribution

Re: CMSCALL return code

2006-12-04 Thread Ray Mullins
There is a new option now, especially with non-zero codes:

LHI  R15,4

No storage fetch.
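
The options mentioned so far differ mainly in which values they can load. A rough sketch of that choice is below; pick_load is a made-up helper, the preference order is an assumption rather than performance advice, and note that the SR form changes the condition code while the others do not.

```python
def pick_load(value):
    """Return one possible instruction that puts value into R15."""
    if value == 0:
        return "SR    R15,R15"            # no operand fetch, but sets CC
    if 0 < value <= 4095:
        return f"LA    R15,{value}"       # fits a 12-bit displacement
    if -32768 <= value <= 32767:
        return f"LHI   R15,{value}"       # signed 16-bit immediate
    return f"L     R15,=F'{value}'"       # anything else: fullword literal
```

For example, pick_load(4) gives an LA, pick_load(-8) gives an LHI, and a value such as 100000 falls through to the literal, which is the only form here that needs a storage fetch for the operand.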

The subject of instruction timings comes up now and then on IBM-MAIN and
ASSEMBLER-LIST.  I point y'all to the archives of both lists.  With the new
z/Architecture pipelines and caches, instruction placement that at first
seems illogical may actually be better.  A hypothetical illustration:


L    R4,RECPTR     Load address of pointer
AHI  R6,1          Add 1 to counter
AHI  R8,(-8)       Some other strange counter
CLI  16(R4),X'40'
JE   GOHERE

The z/Architecture processor will execute the two AHI instructions while the
base/displacement calculation and storage access for the L instruction are
occurring, because it knows that R4 isn't affected by those instructions.
By the time the CLI is reached, R4 will contain the address, which avoids
the delay that might occur if you code

AHI  R6,1          Add 1 to counter
AHI  R8,(-8)       Some other strange counter
L    R4,RECPTR     Load address of pointer
CLI  16(R4),X'40'
JE   GOHERE

In this case, there might be a delay at the CLI.
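
The two orderings can be compared with a toy model. Everything here is an assumption made for illustration: a strictly in-order machine issuing one instruction per cycle, a made-up load latency of 3 cycles, and a stall whenever an instruction needs a register that is not ready yet. A real z/Architecture pipeline is far more complicated than this.

```python
def stall_cycles(program, load_latency=3):
    """Count stall cycles in a toy in-order pipeline.

    program: list of (mnemonic, dest_register_or_None, source_registers).
    """
    ready_at = {}      # register -> first cycle at which its value is usable
    cycle = 0
    stalls = 0
    for mnemonic, dest, sources in program:
        earliest = max([ready_at.get(reg, 0) for reg in sources], default=0)
        if earliest > cycle:
            stalls += earliest - cycle     # wait for the operand
            cycle = earliest
        latency = load_latency if mnemonic == "L" else 1
        if dest is not None:
            ready_at[dest] = cycle + latency
        cycle += 1
    return stalls


# The load is issued first, so the AHIs overlap its latency.
LOAD_EARLY = [
    ("L", "R4", ()),
    ("AHI", "R6", ("R6",)),
    ("AHI", "R8", ("R8",)),
    ("CLI", None, ("R4",)),
    ("JE", None, ()),
]

# The load is issued right before the CLI that needs R4.
LOAD_LATE = [
    ("AHI", "R6", ("R6",)),
    ("AHI", "R8", ("R8",)),
    ("L", "R4", ()),
    ("CLI", None, ("R4",)),
    ("JE", None, ()),
]
```

Under these made-up numbers the first ordering never waits, while the second waits at the CLI until the load completes. YMMV on real hardware.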

Speaking of branches, there's been an interesting discussion recently about
the branch-prediction logic in z/Architecture, which is why I demonstrate
with the RI (or is it IR? I can never remember) instruction.

Later,
Ray


Re: CMSCALL return code

2006-12-04 Thread Schuh, Richard
IIRC, the times for SR and XR were the same on the Amdahl machines, at
least on the ones that came after the 470. They may have been the same
on the 470, as well.


Re: CMSCALL return code

2006-12-04 Thread Schuh, Richard
Pipelining a machine and adding caches does throw a monkey wrench into
the discussion. Add interrupts and you really have a mess. That is one
reason why the performance guys like to preface every sentence with
YMMV or It depends :-)  


Re: CMSCALL return code

2006-12-04 Thread Alan Altmark
On Monday, 12/04/2006 at 02:33 PST, Schuh, Richard [EMAIL PROTECTED] 
wrote:
 Pipelining a machine and adding caches does throw a monkey wrench into
 the discussion. Add interrupts and you really have a mess. That is one
 reason why the performance guys like to preface every sentence with
 YMMV or It depends :-)

...and is why, unless we're in a performance-sensitive area of code, we 
avoid spending time worrying about the speed of a particular instruction 
(laden or unladen).  You get into endless arguments that are religious in 
nature and based on 30-year-old machine designs.  :-)

Alan Altmark
z/VM Development
IBM Endicott


Re: CMSCALL return code

2006-12-04 Thread pfa
Another way to clear the register (not really recommended but it works :-)
   SRL   R15,32
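
Why the shift works can be sketched in a few lines. The model assumes a 32-bit register and that only the low six bits of the second operand are used as the shift amount; the function name srl is simply borrowed from the mnemonic.

```python
def srl(value, shift_operand):
    """Model SHIFT RIGHT SINGLE LOGICAL on a 32-bit register value."""
    amount = shift_operand & 0x3F          # only the low six bits count
    return (value & 0xFFFFFFFF) >> amount  # zeros are shifted in on the left
```

A shift of 32 pushes every bit out of the register, so srl(0xDEADBEEF, 32) is zero. A shift operand of 64 has low six bits of zero, so in this model it would leave the register unchanged.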






Re: CMSCALL return code

2006-12-04 Thread Mike Walter
Ha!!  We only *WISH* we were based on a 30-year old machine design!  ;-)
Reminds me of a joke: last night I was out with ... oh, let's not go 
there. 
Is it really only Monday?

Mike Walter





Re: CMSCALL return code

2006-12-04 Thread Schuh, Richard
Are you wishing that you were 30 years younger? 



Re: CMSCALL return code

2006-12-04 Thread Rich Greenberg
On: Mon, Dec 04, 2006 at 02:24:39PM -0800,Schuh, Richard Wrote:

} IIRC, the times for SR and XR were the same on the Amdahl machines, at
} least on the ones that came after the 470. They may have been the same
} on the 470, as well.

I do recall that on the Amdahl 470s, there was one pair of instructions
which were often executed together in many places in VM/370, where the
order of the two mattered a lot for execution time.  They could go
in any order, but the IBM code usually executed them in the wrong order
for the 470s.  A large fraction of the Amdahl mods to run VM/370 took a
sequence like:

   xx .
   yy .

and changed that to:

   yy .
   xx .

to squeeze out a few more microseconds of performance.

I don't recall what difference (if any) the instruction order made on
blue hardware.

-- 
Rich Greenberg  N Ft Myers, FL, USA richgr atsign panix.com  + 1 239 543 1353
Eastern time.  N6LRT  I speak for myself  my dogs only.VM'er since CP-67
Canines:Val, Red, Shasta  Casey (RIP), Red  Zero, Siberians  Owner:Chinook-L
Retired at the beach Asst Owner:Sibernet-L


Re: CMSCALL return code

2006-12-01 Thread Stanley Rarick
For a return code, LA R15,value is *much* faster than a L - only one 
storage fetch.
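
The reason LA needs no operand fetch is that the value is carried in the instruction itself as the 12-bit displacement, whereas L carries only the address of the operand. A minimal sketch in Python that assembles the two RX-format instructions by hand; the helper name encode_rx, the base register 12 and the displacement X'100' for the literal are assumptions chosen for illustration.

```python
def encode_rx(opcode, r1, x2, b2, d2):
    """Assemble a 4-byte RX-format instruction: op | R1,X2 | B2,D2 (12 bits)."""
    if not 0 <= d2 <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits (0..4095)")
    for reg in (r1, x2, b2):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers are 0..15")
    return bytes([opcode, (r1 << 4) | x2, (b2 << 4) | (d2 >> 8), d2 & 0xFF])


LA = 0x41  # LOAD ADDRESS
L = 0x58   # LOAD

# LA R15,8 : no base, no index; the value 8 travels inside the instruction.
la_r15_8 = encode_rx(LA, 15, 0, 0, 8)
print(la_r15_8.hex().upper())      # 41F00008

# L R15,=F'8' : the instruction holds only the literal's address, here a
# hypothetical base register 12 plus displacement X'100'; the fullword
# itself must still be fetched from storage.
l_r15_lit = encode_rx(L, 15, 0, 12, 0x100)
print(l_r15_lit.hex().upper())     # 58F0C100
```

The same encoding shows the limit of the trick: with no base or index register, LA can only produce values from 0 to 4095.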



CMSCALL return code

2006-11-22 Thread Richard Corak

If your habit was to use L Rx,=X'...' you were probably lucky: in the old
days the =X literal would not necessarily be word-aligned, causing
two fetches to load the register, or, in the days when alignment really
mattered... a program exception.


Not true.  Assemblers going back to F (anything before?) have always
ordered literals by alignment.  Any 8-byte-multiple literal will
be aligned on an 8-byte boundary, regardless of how defined.
Then all 4-byte-multiple literals not a multiple of 8, then all
2-byte-multiple not x4 or x8, then all 1-byte-multiple literals.

So, a literal =X'12345678' or =X'12,34,56,78', etc., will be 4-byte aligned.

Richard Corak 


Re: CMSCALL return code

2006-11-22 Thread Schuh, Richard
Wasn't the 360 assembler before F the Basic Assembler Language - the
origin of the BAL acronym? It was designed so that it could be used on
small machines where memory was at a premium. The scheme of sorting the
literal pool according to size probably originated with it. In the early
360s, the Byte Oriented Operand feature was an extra cost option.
Without it, a program check was the result if you violated the alignment
requirements.

Before that, there was 7080 Autocoder; however, the concept of a word
was somewhat different. IIRC, you could set a register's length (1-256
characters - it was before the word "byte" had been invented) or load
from a beginning address to a word mark in memory. Since starting
address and size were arbitrary, there was no alignment requirement
other than that word marks had to be in a 4 or a 9 position. If you
chose to load to a word mark and there was none in memory, the load
instruction would happily keep wrapping the register until the
instruction timed out.

The 7080 was a character machine that did decimal arithmetic. Its
contemporary was the 7094 which was strictly 36-bit word oriented.
Alignment in it was not an option.



Re: CMSCALL return code

2006-11-21 Thread Schuh, Richard
I really would not have left it to chance; I would have defined a
word-aligned constant rather than using a literal. However, it might not
have been as chancy as it may seem. The literal pool is doubleword
aligned and boundary alignment may have been a factor in determining
where the literal resided. I would like to think that the 8-byte
multiples are put at the front, the 4-byters next, then the twos
followed by everybody else. In looking at an assembly listing, that
seems to be the sequence. The first two literals in the program are
=x'A00', the next =x'FF', etc. In the literal pool, all 4 byte
entries (there were no 8 byte literals) precede the two byte literals
and then come the ones of only 1 byte. Within each of these groups, the
literals appear in the order in which they were defined. There were no
long strings defined as literals in the particular listing.  

-Original Message-
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Don Russell
Sent: Tuesday, November 21, 2006 3:46 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: CMSCALL return code

Schuh, Richard wrote:
 I agree, it does seem non-intuitive. The initial SR   R15,R15 was
 undoubtedly preparing for a default rc of zero. How the non-zero rc
 gets put into the register later is largely a matter of taste. In this
 case I probably would have chosen L   R15,=X'...' - a habit learned, when
 machines were slower, based on the knowledge that they were mostly
 optimized for the LOAD instruction vs. any other way of putting data
 from memory into a register.
   

If your habit was to use L Rx,=X'...' you were probably lucky: in the old
days the =X literal would not necessarily be word-aligned, causing
two fetches to load the register, or, in the days when alignment really
mattered... a program exception.

Better to use L R15,=A(X'...') if alignment is a concern and you want to
use literals.

Then the literal IS aligned on a fullword boundary.

The initial SR 15,15 is unlikely to be setting the default return code...
it's clearing the register preparing for the different option bytes to
be OR'd in. I agree the macro could (should?) have generated a single L
instruction instead, but then what nits would we have to discuss? :-)


Re: CMSCALL return code

2006-11-21 Thread Shimon Lebowitz
On 21 Nov 2006 at 17:32, Schuh, Richard wrote:

  However, it might not
 have been as chancy as it may seem. The literal pool is doubleword
 aligned and boundary alignment may have been a factor in determining
 where the literal resided. I would like to think that the 8-byte
 multiples are put at the front, the 4-byters next, then the twos
 followed by everybody else. In looking at an assembly listing, that
 seems to be the sequence. 

I was going to say the same thing, but a bit more
definitively.

The following quotation is taken from the HLASM 5 ref manual, at
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/asmr1010/5.31.1?SHELF=DT=20040728153937


Each literal pool has five segments into which the literals are stored (a) in
the order that the literals are specified, and (b) according to their assembled
lengths, which, for each literal, is the total explicit or implied length, as
described below.

The first segment contains all literal constants whose assembled
lengths are a multiple of 16.

The second segment contains those whose assembled lengths are a multiple of 8,
but not of 16.

The third segment contains those whose assembled lengths are a multiple of 4,
but not a multiple of 8.

The fourth segment contains those whose assembled lengths are even, but not a
multiple of 4.

The fifth segment contains all the remaining literal constants whose assembled
lengths are odd.

Since each literal pool is aligned on a SECTALGN alignment, this
guarantees that all literals in the second segment are doubleword aligned;
in the third segment, fullword aligned; and, in the fourth, halfword
aligned.
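
The segment rule quoted above can be tried out with a small model. This is a minimal sketch in Python, not the assembler's actual algorithm: it assumes the pool starts on a 16-byte boundary, ignores the merging of duplicate literals, and uses made-up function names and sample literals.

```python
def segment_of(length):
    """Return the literal-pool segment (1..5) for an assembled length."""
    if length % 16 == 0:
        return 1
    if length % 8 == 0:
        return 2
    if length % 4 == 0:
        return 3
    if length % 2 == 0:
        return 4
    return 5


def lay_out_pool(literals, pool_start=0):
    """Assign a storage offset to each (name, length) literal.

    sorted() is stable, so literals keep their order of specification
    within each segment. pool_start is assumed to be a multiple of 16.
    """
    ordered = sorted(literals, key=lambda lit: segment_of(lit[1]))
    offsets = {}
    offset = pool_start
    for name, length in ordered:
        offsets[name] = offset
        offset += length
    return offsets


# Hypothetical literals, listed in the order the program specifies them.
literals = [
    ("=X'FF'", 1),
    ("=X'12345678'", 4),
    ("=X'0A00'", 2),
    ("=CL8'ABCDEFGH'", 8),
    ("=F'0'", 4),
]
for name, offset in lay_out_pool(literals).items():
    print(f"{offset:04X}  {name}")
```

In this model the 8-byte literal lands at offset 0, the two 4-byte literals at 8 and 12, the 2-byte literal at 16 and the 1-byte literal at 18, so =X'12345678' ends up fullword aligned even though it was written as a hex string.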



Don Russell also said:

 If your habit was to use L Rx,=X'...' you were probably lucky: in the old
 days the =X literal would not necessarily be word-aligned, causing
... problems ;-)

I believe the current literal pool alignment behavior has been
around for a pretty long time. I went now to look it up in the ref, 
but it is how I remember being taught in the old days of the 70s.

Shimon
-- 
**
Shimon Lebowitz            mailto:[EMAIL PROTECTED]
VM System Programmer       .
Israel Police National HQ. http://www.poboxes.com/shimonpgp
Jerusalem, Israel          phone: +972 2 530-9877  fax: 530-9308
**