Re: CMSCALL return code
Looking at my yellow card, there MAY be a reason to use LA or L in certain cases. It looks like both SR and XR set the condition code, while L and LA do not. So if one wanted to preserve the CC for some reason, one could be justified in coding the L or LA. So take a good look at the subsequent statements before replacing those Load opcodes!
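To make the condition-code caution concrete, here is a minimal sketch of a routine that zeroes R15 between a compare and the branch that tests it. It is an illustration only: the names CCDEMO, WORK and LIMIT and the return code 8 are hypothetical, and it assumes standard linkage (entry address in R15, save area address in R13).

```hlasm
* Sketch only: CCDEMO, WORK and LIMIT are hypothetical names.
CCDEMO   CSECT
R0       EQU   0
R12      EQU   12
R13      EQU   13
R14      EQU   14
R15      EQU   15
         STM   R14,R12,12(R13)     Save caller's registers
         LR    R12,R15             Entry address becomes the base
         USING CCDEMO,R12
         CLC   WORK,LIMIT          Compare sets the condition code
         LA    R15,0               RC=0; LA leaves the CC alone
         BNH   EXIT                Still tests the CLC result
         LA    R15,8               WORK is higher: RC=8
EXIT     L     R14,12(,R13)        Restore return address
         LM    R0,R12,20(R13)      Restore R0-R12, keep RC in R15
         BR    R14                 Return to caller
WORK     DC    F'5'
LIMIT    DC    F'10'
         END   CCDEMO
```

If the first LA were replaced by SR R15,R15 or XR R15,R15, the zero result would set the condition code to 0, and BNH would then branch regardless of what the CLC found.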
Re: CMSCALL return code
True, and it is undoubtedly faster to use SR R15,R15 than it is to use LA R15,0 to zero the register - there are no storage fetches and real subtraction is not needed if the result can be predicted, as it can in this case. However, the discussion had more to do with fetches of boundary-aligned vs. non-aligned data. There was no mention of the optimum speed for getting either a specific or an arbitrary value loaded into a register. In this day of pipelined machines ...

This is sort of reminiscent of the good old days, programming in 7080 Autocoder. Boeing insisted that the programmers use a MOVE macro because there were 26 different ways to move data from one storage location to another. It was expected that most programmers would use either their favorite way or the first one that popped into their heads if left on their own. The macro chose the optimal way, depending on the operand definitions.

From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Stanley Rarick
Sent: Friday, December 01, 2006 10:37 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: CMSCALL return code

For a return code, LA R15,value is *much* faster than a L - only one storage fetch.

Schuh, Richard wrote:

I really would not have left it to chance; I would have defined a word-aligned constant rather than using a literal. However, it might not have been as chancy as it may seem. The literal pool is doubleword aligned and boundary alignment may have been a factor in determining where the literal resided. I would like to think that the 8-byte multiples are put at the front, the 4-byters next, then the twos followed by everybody else. In looking at an assembly listing, that seems to be the sequence. The first two literals in the program are =x'A00', the next =x'FF', etc. In the literal pool, all 4-byte entries (there were no 8-byte literals) precede the two-byte literals and then come the ones of only 1 byte. Within each of these groups, the literals appear in the order in which they were defined. There were no long strings defined as literals in the particular listing.

-Original Message-
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Don Russell
Sent: Tuesday, November 21, 2006 3:46 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: CMSCALL return code

Schuh, Richard wrote:

I agree, it does seem non-intuitive. The initial SR R15,R15 was undoubtedly preparing for a default rc of zero. How the non-zero rc gets put into the register later is largely a matter of taste. In this case I probably would have chosen L R15,=X'...' - a habit learned, when machines were slower, based on the knowledge that they were mostly optimized for the LOAD instruction vs. any other way of putting data from memory into a register.

If your habit was to use L Rx,=X'...' you were probably lucky in the old days: the =X literal would not necessarily be word-aligned, causing two fetches to load the register, or, in the days when alignment really mattered... a program exception. Better to use L R15,=A(X'...') if alignment is a concern and you want to use literals. Then the literal IS aligned on a fullword boundary.

The initial SR 15,15 is unlikely to be setting the default return code... it's clearing the register preparing for the different option bytes to be OR'd in. I agree the macro could (should?) have generated a single L instruction instead, but then what nits would we have to discuss? :-)
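For the alignment point Don Russell raises, a minimal sketch of the three literal forms side by side may help. The CSECT name ALIGNLIT, the register choices and the value X'A00' (decimal 2560) are illustrative assumptions, and the routine assumes its entry address is in R15 on entry.

```hlasm
* Sketch only: ALIGNLIT and the values used are hypothetical.
ALIGNLIT CSECT
R1       EQU   1
R2       EQU   2
R3       EQU   3
R14      EQU   14
R15      EQU   15
         USING ALIGNLIT,R15        Assumes entry address is in R15
         L     R1,=X'00000A00'     Type X: no alignment implied
         L     R2,=A(X'A00')       Type A: fullword aligned
         L     R3,=F'2560'         Type F: fullword aligned
         BR    R14                 Return to caller
         LTORG ,                   Literal pool is built here
         END   ALIGNLIT
```

All three loads put the same value in the register. The difference is only in what the assembler promises: the A and F forms request fullword alignment by type, while the X form relies on where the literal happens to land in the pool. Note also that L always fetches four bytes, so a shorter literal such as =X'A00' would bring along whatever follows it in storage.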
Re: CMSCALL return code
Sheesh, this goes way back to my good old Assembler diaper days when programmers really cared about performance instead of drag and drop solutions. Slightly off-topic: if I remember correctly, we argued intensely about zeroing a GPR and the performance differences between:
- SR R15,R15
- XR R15,R15
- LA R15,0 (not seriously considered by performance geeks)
- L R15,=F'0' (considered for use only by amateur programmers coming from a BASIC or COBOL background and otherwise held in low esteem by real programmers). ;-)

IIRC, the actual performance difference between SR and XR depended more on specific processor models than anything else.

Mike Walter
Hewitt Associates
Any opinions expressed herein are mine alone and do not necessarily represent the opinions or policies of Hewitt Associates.
Re: CMSCALL return code
There is a new option now, especially with non-zero codes:

         LHI   R15,4

No storage fetch.

The subject of instruction timings comes up now and then on IBM-MAIN and ASSEMBLER-LIST. I point y'all to the archives of both lists.

With the new z/Architecture pipelines and caches, sometimes what seems at first to be illogical instruction placement may actually be better. Hypothetical illustration:

         L     R4,RECPTR           Load address of pointer
         AHI   R6,1                Add 1 to counter
         AHI   R8,(-8)             Some other strange counter
         CLI   16(R4),X'40'
         JE    GOHERE

The z/Architecture processor will execute the two AHI instructions while the base/displacement calculation and storage access for the L instruction is occurring, because it knows that R4 isn't affected by those instructions. By the time the CLI is hit, R4 will contain the address, and there is none of the delay that might occur if you code

         AHI   R6,1                Add 1 to counter
         AHI   R8,(-8)             Some other strange counter
         L     R4,RECPTR           Load address of pointer
         CLI   16(R4),X'40'
         JE    GOHERE

In this case, there might be a delay at the CLI.

Speaking of branches, there's been an interesting discussion recently about the branch-prediction logic in z/Architecture, which is why I demonstrate with the RI (or is it IR? I can never remember) instruction.

Later,
Ray
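Ray's LHI suggestion can be sketched as a complete, if trivial, routine. The name RCDEMO is hypothetical; the only assumption is that R14 holds the return address on entry.

```hlasm
* Sketch only: RCDEMO is a hypothetical name.
RCDEMO   CSECT
R14      EQU   14
R15      EQU   15
         LHI   R15,4               RC=4; value is in the instruction
         BR    R14                 Return to caller with RC in R15
         END   RCDEMO
```

LHI takes a signed 16-bit immediate and sign-extends it, so LHI R15,-1 would give X'FFFFFFFF', whereas LA R15,value with no base or index register can only form values from 0 to 4095. Neither instruction changes the condition code.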
Re: CMSCALL return code
IIRC, the times for SR and XR were the same on the Amdahl machines, at least on the ones that came after the 470. They may have been the same on the 470, as well.
Re: CMSCALL return code
Pipelining a machine and adding caches does throw a monkey wrench into the discussion. Add interrupts and you really have a mess. That is one reason why the performance guys like to preface every sentence with YMMV or It depends :-)
Re: CMSCALL return code
On Monday, 12/04/2006 at 02:33 PST, Schuh, Richard [EMAIL PROTECTED] wrote:

Pipelining a machine and adding caches does throw a monkey wrench into the discussion. Add interrupts and you really have a mess. That is one reason why the performance guys like to preface every sentence with YMMV or It depends :-)

...and is why, unless we're in a performance-sensitive area of code, we avoid spending time worrying about the speed of a particular instruction (laden or unladen). You get into endless arguments that are religious in nature and based on 30-year-old machine designs. :-)

Alan Altmark
z/VM Development
IBM Endicott
Re: CMSCALL return code
Another way to clear the register (not really recommended, but it works :-)

         SRL   R15,32
Re: CMSCALL return code
Ha!! We only *WISH* we were based on a 30-year old machine design! ;-) Reminds me of a joke: last night I was out with ... oh, let's not go there. Is it really only Monday?

Mike Walter
Re: CMSCALL return code
Are you wishing that you were 30 years younger?
Re: CMSCALL return code
On: Mon, Dec 04, 2006 at 02:24:39PM -0800, Schuh, Richard Wrote:

} IIRC, the times for SR and XR were the same on the Amdahl machines, at
} least on the ones that came after the 470. They may have been the same
} on the 470, as well.

I do recall that on the Amdahl 470s there was one pair of instructions, often executed together in many places in VM/370, where the order of the two mattered a lot for execution time. They could go in either order, but the IBM code usually executed them in the wrong order for the 470s. A large fraction of the Amdahl mods to run VM/370 took a sequence like:

    xx .
    yy .

and changed that to:

    yy .
    xx .

to squeeze out a few more microseconds of performance. I don't recall what difference (if any) the instruction order made on blue hardware.

--
Rich Greenberg  N Ft Myers, FL, USA  richgr atsign panix.com  + 1 239 543 1353
Eastern time.  N6LRT  I speak for myself & my dogs only.  VM'er since CP-67
Canines: Val, Red, Shasta & Casey (RIP), Red & Zero, Siberians  Owner: Chinook-L
Retired at the beach  Asst Owner: Sibernet-L
Re: CMSCALL return code
For a return code, LA R15,value is *much* faster than an L - only one storage fetch.

Schuh, Richard wrote:

I really would not have left it to chance; I would have defined a word-aligned constant rather than using a literal. However, it might not have been as chancy as it may seem. The literal pool is doubleword aligned, and boundary alignment may have been a factor in determining where the literal resided. I would like to think that the 8-byte multiples are put at the front, the 4-byters next, then the twos, followed by everybody else. In looking at an assembly listing, that seems to be the sequence. The first two literals in the program are =x'A00', the next =x'FF', etc. In the literal pool, all 4-byte entries (there were no 8-byte literals) precede the 2-byte literals, and then come the ones of only 1 byte. Within each of these groups, the literals appear in the order in which they were defined. There were no long strings defined as literals in the particular listing.
CMSCALL return code
If your habit was to use L Rx,=X'...' you were probably lucky in the old days; the =X literal would not necessarily be word-aligned, causing two fetches to load the register, or, in the days when alignment really mattered... a program exception.

Not true. Assemblers going back to F (anything before?) have always ordered literals by alignment. Any literal whose length is a multiple of 8 bytes will be aligned on an 8-byte boundary, regardless of how it is defined. Next come all literals whose length is a multiple of 4 but not of 8, then those whose length is a multiple of 2 but not of 4, and finally all the remaining literals.

So, a literal =X'12345678' or =X'12,34,56,78', etc., will be 4-byte aligned.

Richard Corak
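The ordering rule described above can be checked with a small simulation. This is only a sketch in Python, not how any assembler is actually implemented: the function names `alignment_class` and `layout_pool` and the sample literal list are made up for illustration, and the pool is assumed to start on a doubleword boundary (offset 0).

```python
def alignment_class(length: int) -> int:
    """Largest of 8, 4, 2, 1 that divides the literal's assembled length."""
    for size in (8, 4, 2):
        if length % size == 0:
            return size
    return 1


def layout_pool(literals):
    """Place (name, length) literals in a pool that starts at offset 0.

    Groups are emitted in the order 8, 4, 2, 1; within a group the
    definition order is kept. Returns a list of (name, offset, length).
    """
    placed = []
    offset = 0
    for size in (8, 4, 2, 1):
        for name, length in literals:
            if alignment_class(length) == size:
                placed.append((name, offset, length))
                offset += length
    return placed


# Hypothetical literals, listed in the order a program might define them.
literals = [
    ("=X'FF'", 1),
    ("=X'12345678'", 4),
    ("=X'A00'", 2),      # three hex digits assemble to two bytes
    ("=D'0'", 8),
    ("=F'0'", 4),
]

pool = layout_pool(literals)
for name, offset, length in pool:
    print(f"offset {offset:3d}  length {length}  {name}")
```

Because every group's total length is a multiple of the next group's unit, each literal lands on a boundary that matches its own group, which is why the 4-byte =X literal needs no explicit alignment.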
Re: CMSCALL return code
Wasn't the 360 assembler before F the Basic Assembler Language, the origin of the BAL acronym? It was designed so that it could be used on small machines where memory was at a premium. The scheme of sorting the literal pool according to size probably originated with it.

In the early 360s, the Byte Oriented Operand feature was an extra-cost option. Without it, a program check was the result if you violated the alignment requirements.

Before that, there was 7080 Autocoder; however, the concept of a word was somewhat different. IIRC, you could set a register's length (1-256 characters - it was before the word "byte" had been invented) or load from a beginning address to a word mark in memory. Since starting address and size were arbitrary, there was no alignment requirement other than that word marks had to be in a 4 or a 9 position. If you chose to load to a word mark and there was none in memory, the load instruction would happily keep wrapping the register until the instruction timed out. The 7080 was a character machine that did decimal arithmetic. Its contemporary was the 7094, which was strictly 36-bit word oriented. Alignment in it was not an option.

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Richard Corak
Sent: Wednesday, November 22, 2006 8:23 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: CMSCALL return code

If your habit was to use L Rx,=X'...' you were probably lucky in the old days; the =X literal would not necessarily be word-aligned, causing two fetches to load the register, or, in the days when alignment really mattered... a program exception.

Not true. Assemblers going back to F (anything before?) have always ordered literals by alignment. Any literal whose length is a multiple of 8 bytes will be aligned on an 8-byte boundary, regardless of how it is defined. Next come all literals whose length is a multiple of 4 but not of 8, then those whose length is a multiple of 2 but not of 4, and finally all the remaining literals.

So, a literal =X'12345678' or =X'12,34,56,78', etc., will be 4-byte aligned.

Richard Corak
Re: CMSCALL return code
I really would not have left it to chance; I would have defined a word-aligned constant rather than using a literal. However, it might not have been as chancy as it may seem. The literal pool is doubleword aligned, and boundary alignment may have been a factor in determining where the literal resided. I would like to think that the 8-byte multiples are put at the front, the 4-byters next, then the twos, followed by everybody else. In looking at an assembly listing, that seems to be the sequence. The first two literals in the program are =x'A00', the next =x'FF', etc. In the literal pool, all 4-byte entries (there were no 8-byte literals) precede the 2-byte literals, and then come the ones of only 1 byte. Within each of these groups, the literals appear in the order in which they were defined. There were no long strings defined as literals in the particular listing.

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Don Russell
Sent: Tuesday, November 21, 2006 3:46 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: CMSCALL return code

Schuh, Richard wrote:

I agree, it does seem non-intuitive. The initial SR R15,R15 was undoubtedly preparing for a default rc of zero. How the non-zero rc gets put into the register later is largely a matter of taste. In this case I probably would have chosen L R15,=X'...' - a habit learned, when machines were slower, based on the knowledge that they were mostly optimized for the LOAD instruction vs. any other way of putting data from memory into a register.

If your habit was to use L Rx,=X'...' you were probably lucky in the old days; the =X literal would not necessarily be word-aligned, causing two fetches to load the register, or, in the days when alignment really mattered... a program exception. Better to use L R15,=A(X'...') if alignment is a concern and you want to use literals. Then the literal IS aligned on a fullword boundary.

The initial SR 15,15 is unlikely to be setting the default return code... it's clearing the register preparing for the different option bytes to be OR'd in. I agree the macro could (should?) have generated a single L instruction instead, but then what nits would we have to discuss? :-)
Re: CMSCALL return code
On 21 Nov 2006 at 17:32, Schuh, Richard wrote:

However, it might not have been as chancy as it may seem. The literal pool is doubleword aligned and boundary alignment may have been a factor in determining where the literal resided. I would like to think that the 8-byte multiples are put at the front, the 4-byters next, then the twos followed by everybody else. In looking at an assembly listing, that seems to be the sequence.

I was going to say the same thing, but a bit more definitively. The following quotation is taken from the HLASM 5 ref manual, at
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/asmr1010/5.31.1?SHELF=DT=20040728153937

Each literal pool has five segments into which the literals are stored (a) in the order that the literals are specified, and (b) according to their assembled lengths, which, for each literal, is the total explicit or implied length, as described below.

The first segment contains all literal constants whose assembled lengths are a multiple of 16.
The second segment contains those whose assembled lengths are a multiple of 8, but not of 16.
The third segment contains those whose assembled lengths are a multiple of 4, but not a multiple of 8.
The fourth segment contains those whose assembled lengths are even, but not a multiple of 4.
The fifth segment contains all the remaining literal constants whose assembled lengths are odd.

Since each literal pool is aligned on a SECTALGN alignment, this guarantees that all literals in the second segment are doubleword aligned; in the third segment, fullword aligned; and, in the fourth, halfword aligned.

Don Russell also said:

If your habit was to use L Rx,=X'...' you were probably lucky in the old days; the =X literal would not necessarily be word-aligned, causing ... problems ;-)

I believe the current literal pool alignment behavior has been around for a pretty long time. I went now to look it up in the ref, but it is how I remember being taught in the old days of the 70s.

Shimon

--
Shimon Lebowitz  mailto:[EMAIL PROTECTED]
VM System Programmer . Israel Police National HQ.
http://www.poboxes.com/shimonpgp
Jerusalem, Israel  phone: +972 2 530-9877  fax: 530-9308
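The five-segment rule quoted from the manual can be restated as a short classification routine. The following is a hedged sketch in Python that only mirrors the quoted wording; the names `pool_segment` and `order_literals` and the sample literals are hypothetical and are not part of HLASM.

```python
def pool_segment(assembled_length: int) -> int:
    """Return the literal-pool segment (1 to 5) for a given assembled length."""
    if assembled_length <= 0:
        raise ValueError("assembled length must be positive")
    if assembled_length % 16 == 0:
        return 1          # multiple of 16
    if assembled_length % 8 == 0:
        return 2          # multiple of 8, but not of 16
    if assembled_length % 4 == 0:
        return 3          # multiple of 4, but not of 8
    if assembled_length % 2 == 0:
        return 4          # even, but not a multiple of 4
    return 5              # odd


def order_literals(literals):
    """Order (name, length) pairs by segment.

    sorted() is stable, so literals in the same segment keep the order
    in which they were specified, as the quotation requires.
    """
    return sorted(literals, key=lambda lit: pool_segment(lit[1]))


# Hypothetical literals in the order a program might specify them.
sample = [
    ("=X'A00'", 2),
    ("=X'FF'", 1),
    ("=F'0'", 4),
    ("=X'12345678'", 4),
    ("=CL16' '", 16),
]

for name, length in order_literals(sample):
    print(pool_segment(length), length, name)
```

Under this reading, L R15,=X'12345678' picks up a third-segment literal, so it is fullword aligned without resorting to =A(X'...').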