Re: Is there a significant performance penalty for non-aligned operands?

2014-06-04 Thread Binyamin Dissen
On Mon, 2 Jun 2014 11:45:38 -0700 Ed Jaffe edja...@phoenixsoftware.com
wrote:

:NIL and OIL are not required on z196 and higher machines. NI and OI now 
:do the necessary serialization.

Do I understand correctly?

NI/OI do the same serialization that CS does? That two OI's issued on two
processors will guarantee the result of both bits set?

--
Binyamin Dissen bdis...@dissensoftware.com
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel


Should you use the mailblocks package and expect a response from me,
you should preauthorize the dissensoftware.com domain.

I very rarely bother responding to challenge/response systems,
especially those from irresponsible companies.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a significant performance penalty for non-aligned operands?

2014-06-04 Thread Bob Rutledge

Binyamin Dissen wrote:

On Mon, 2 Jun 2014 11:45:38 -0700 Ed Jaffe edja...@phoenixsoftware.com
wrote:

:NIL and OIL are not required on z196 and higher machines. NI and OI now 
:do the necessary serialization.


Do I understand correctly?

NI/OI do the same serialization that CS does? That two OI's issued on two
processors will guarantee the result of both bits set?



For OR (OI, OIY), the first operand is one byte in length, and only one byte is
stored. When the interlocked-access facility 2 is installed, the update of the
first operand appears to be an interlocked-update reference as observed by
other CPUs and channel programs, and a specific-operand-serialization operation
is performed.

Facility bit 52.
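
A facility bit like this one can be tested in a stored facility list, where bits are numbered from the left (bit 0 is the high-order bit of the first byte). The following is a minimal C sketch of that bit test only. The function name and the idea of passing the list as a byte buffer are assumptions for illustration, not an IBM interface, and how the list is obtained is outside the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if facility bit `bit` is on in a
 * facility list of `len` bytes. Bits are numbered left to right, so
 * bit 0 is the 0x80 bit of byte 0 and bit 52 is the 0x08 bit of byte 6. */
int facility_bit_on(const uint8_t *list, size_t len, unsigned bit)
{
    size_t byte = bit / 8;
    if (byte >= len)
        return 0;                 /* bits beyond the stored list count as zero */
    return (list[byte] >> (7 - bit % 8)) & 1;
}
```

With a list in hand, facility_bit_on(list, len, 52) would answer whether the interlocked-access facility 2 described above is installed.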

Bob



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Skip Robinson
Early in my career I developed a habit when writing ASM code. In 'working 
storage', code DS/DC fields in this order unless some other structural 
sequence is required: double words first, then full words, half words, and 
finally character (C or X) fields of whatever length. Most macros generate 
their own alignment instructions. Always heed alignment warnings from the 
assembler. 
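
The same habit carries over to compiled languages. As a rough C analogy (not assembler, and the struct and function names are invented for illustration), declaring the widest-aligned members first and character data last lets the compiler insert less padding than an arbitrary order does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields in arbitrary order: the compiler inserts slack bytes so that
 * each member lands on its natural boundary. */
struct mixed {
    char    flag;
    double  dw;
    int16_t hw;
    int32_t fw;
    char    name[3];
};

/* Same fields, widest alignment first, character data last. */
struct ordered {
    double  dw;
    int32_t fw;
    int16_t hw;
    char    flag;
    char    name[3];
};

/* Sum of the member sizes, identical for both layouts. */
size_t payload_bytes(void)
{
    return sizeof(double) + sizeof(int32_t) + sizeof(int16_t) + 1 + 3;
}

/* Padding = total struct size minus the bytes actually used by members. */
size_t mixed_padding(void)   { return sizeof(struct mixed) - payload_bytes(); }
size_t ordered_padding(void) { return sizeof(struct ordered) - payload_bytes(); }
```

The exact sizes depend on the compiler and ABI, so the sketch compares the two layouts rather than asserting particular numbers.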

.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler 
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com



From:   Lloyd leful...@sbcglobal.net
To: IBM-MAIN@LISTSERV.UA.EDU, 
Date:   06/01/2014 06:12 PM
Subject:Re: Is there a significant performance penalty for 
non-aligned operands?
Sent by:IBM Mainframe Discussion List IBM-MAIN@LISTSERV.UA.EDU



On 6/1/2014 7:51 PM, Peter Relson wrote:
 I believe the answer is no, all other things being equal.

 But if one of those other things is that the non-aligned operand spans
 cache lines whereas an aligned one would not, then the answer could be a
 huge yes.

 Things such as doubleword (or quadword) consistency could not be relied
 upon, depending on the degree of non-alignment.

 Peter Relson
 z/OS Core Technology Design


There are times, though, when things HAVE to be on a doubleword or
quadword boundary for specific instructions, whether hardware
instructions or supervisor calls (not necessarily SVC).  What I usually
do then is define a variable larger by the appropriate number of bytes
and verify by shifting the address that I am aligned.

Lloyd




Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Jim Mulder
 Once upon a time IIRC there was a significant performance penalty for
 non-aligned operands (loading a fullword from an address not evenly
 divisible by four, etc.). Does that still exist for modern Z processors?
 (Once upon a time it didn't work at all, but that's AFH, to use an acronym I
 learned this week.)
 
 I have a string that looks like halfword, char, char, ..., halfword, char,
 char, ... . It will be accessed millions of times per day in a system exit.
 Would it be better to pack the halfwords immediately following the chars,
 or halfword-align them with a slack byte where necessary? The access will be
 read-only if that makes a difference.

  Response from hardware designer:

It depends on the system.  z10 and before, if an operand crossed a DW there
was a penalty.  On z196 and later, it is only if an operand crosses a cache
line.  Of course things could change again in the future.
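
To make the two cases concrete, here is a small C sketch of the address arithmetic for "does this operand cross a boundary". The function name is made up for illustration. The boundary would be 8 for a doubleword, or 256 for the cache line size quoted elsewhere in this thread. It shows only the arithmetic and says nothing about how large the penalty is on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the operand occupying [addr, addr+len) spans a multiple of
 * `boundary`, which must be a power of two. */
int crosses_boundary(uintptr_t addr, size_t len, uintptr_t boundary)
{
    assert(len > 0 && boundary != 0 && (boundary & (boundary - 1)) == 0);
    uintptr_t first = addr & ~(boundary - 1);             /* block holding the first byte */
    uintptr_t last  = (addr + len - 1) & ~(boundary - 1); /* block holding the last byte  */
    return first != last;
}
```

For example, a halfword at an address ending in x'07' crosses a doubleword boundary but not a 256-byte line, while one at an address ending in x'FF' crosses both.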

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Charles Mills
Pardon my ignorance: what exactly is a cache line?

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Jim Mulder
Sent: Monday, June 02, 2014 8:13 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a significant performance penalty for non-aligned
operands?

 Once upon a time IIRC there was a significant performance penalty for
 non-aligned operands (loading a fullword from an address not evenly
 divisible by four, etc.). Does that still exist for modern Z processors?
 (Once upon a time it didn't work at all, but that's AFH, to use an acronym I
 learned this week.)
 
 I have a string that looks like halfword, char, char, ..., halfword, char,
 char, ... . It will be accessed millions of times per day in a system exit.
 Would it be better to pack the halfwords immediately following the chars,
 or halfword-align them with a slack byte where necessary? The access will be
 read-only if that makes a difference.

  Response from hardware designer:

It depends on the system.  z10 and before, if an operand crossed a DW there
was a penalty.  On z196 and later, it is only if an operand crosses a cache
line.  Of course things could change again in the future.



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Lloyd

On 6/2/2014 11:01 AM, Skip Robinson wrote:

Early in my career I developed a habit when writing ASM code. In 'working
storage', code DS/DC fields in this order unless some other structural
sequence is required: double words first, then full words, half words, and
finally character (C or X) fields of whatever length. Most macros generate
their own alignment instructions. Always heed alignment warnings from the
assembler.

.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com




I do that also when I am allocating the storage area. Unfortunately,
some of the things that I code need to run with Language Environment and
need to use the LE stack.  The LE stack is NOT guaranteed to be
allocated on a doubleword boundary.  It is on a byte boundary basis, and
yes, I have seen it allocated at ...1 or ...7.  That type of code
is where I need to do as I proposed.


Lloyd



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Jim Mulder
 Pardon my ignorance: what exactly is a cache line?

 http://en.wikipedia.org/wiki/CPU_cache

 For purposes of this alignment discussion, the cache
line size on IBM mainframes has been 256 (x'100') bytes
for at least all of the z/Architecture machines, and
I think even for several generations before that. 

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread John Eells

Charles Mills wrote:

Pardon my ignorance: what exactly is a cache line?


The caches are divided into 256-byte increments called cache lines in 
current processors.


--
John Eells
z/OS Technical Marketing
IBM Poughkeepsie
ee...@us.ibm.com



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Shmuel Metz (Seymour J.)
In
ofe6b203a0.45aeb620-on85257cea.0082e0b5-85257cea.00831...@us.ibm.com,
on 06/01/2014
   at 07:51 PM, Peter Relson rel...@us.ibm.com said:

I believe the answer is no, all other things being equal.

What about the performance of Load versus ICM for nonaligned data?
 
-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 ISO position; see http://patriot.net/~shmuel/resume/brief.html 
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Paul Gilmartin
On Mon, 2 Jun 2014 11:47:17 -0400, John Eells wrote:

Charles Mills wrote:
 Pardon my ignorance: what exactly is a cache line?

The caches are divided into 256-byte increments called cache lines in
current processors.
 
(The exact number is probably subject to change.)

Is this also the width of the data path to main memory?  It would
seem sensible to avoid spending cycles fetching data that would
never be accessed.

Does HLASM and/or STORAGE provide for cache line alignment?

I've long wondered why the NIL and OIL macros are necessary to
lock access to bits within a byte, whereas the hardware performs
the equivalent function for bytes within a cache line.

-- gil



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread CM Poncelet

... and code CNOP 2,8 or 4,8 (not DS 0H or 0F) at the end of DSECTs

Skip Robinson wrote:

Early in my career I developed a habit when writing ASM code. In 'working 
storage', code DS/DC fields in this order unless some other structural 
sequence is required: double words first, then full words, half words, and 
finally character (C or X) fields of whatever length. Most macros generate 
their own alignment instructions. Always heed alignment warnings from the 
assembler. 


.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler 
SHARE MVS Program Co-Manager

626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com



From:   Lloyd leful...@sbcglobal.net
To: IBM-MAIN@LISTSERV.UA.EDU, 
Date:   06/01/2014 06:12 PM
Subject:Re: Is there a significant performance penalty for 
non-aligned operands?

Sent by:IBM Mainframe Discussion List IBM-MAIN@LISTSERV.UA.EDU



On 6/1/2014 7:51 PM, Peter Relson wrote:
 


I believe the answer is no, all other things being equal.

But if one of those other things is that the non-aligned operand spans
cache lines whereas an aligned one would not, then the answer could be a
huge yes.

Things such as doubleword (or quadword) consistency could not be relied
upon, depending on the degree of non-alignment.

Peter Relson
z/OS Core Technology Design


   

There are times, though, when things HAVE to be on a doubleword or
quadword boundary for specific instructions, whether hardware
instructions or supervisor calls (not necessarily SVC).  What I usually
do then is define a variable larger by the appropriate number of bytes
and verify by shifting the address that I am aligned.


Lloyd




 





Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Ed Jaffe

On 6/2/2014 8:01 AM, Skip Robinson wrote:

Early in my career I developed a habit when writing ASM code. In 'working
storage', code DS/DC fields in this order unless some other structural
sequence is required: double words first, then full words, half words, and
finally character (C or X) fields of whatever length.


These days, fields at the top of a data area (i.e., lowest positive 
addresses for base register coverage) tend to be those that must be 
accessed via instructions that support only 12-bit displacements, e.g. 
MVC. Fields referenced by instructions with 20-bit displacements (e.g., 
L can be replaced with LY, LLGT, LLGF, LGF, etc.) can appear much, much 
further away or even in a negative direction (though I have yet to 
leverage that) from the origin of the data area.
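
The ranges behind that layout choice are easy to state: a 12-bit displacement is unsigned (0 through 4095), while a 20-bit displacement is signed (-524288 through 524287). A minimal C sketch of the range checks follows. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 12-bit unsigned displacement: 0 through 4095. */
int fits_disp12(int32_t off)
{
    return off >= 0 && off <= 0xFFF;
}

/* 20-bit signed displacement: -524288 through 524287. */
int fits_disp20(int32_t off)
{
    return off >= -0x80000 && off <= 0x7FFFF;
}
```

A field at offset 4096 from the base therefore needs a long-displacement instruction, and a field at a negative offset can only be reached with one.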


--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Ed Jaffe

On 6/2/2014 9:06 AM, Paul Gilmartin wrote:

On Mon, 2 Jun 2014 11:47:17 -0400, John Eells wrote:


The caches are divided into 256-byte increments called cache lines in
current processors.


(The exact number is probably subject to change.)


Modern processors implement instructions a program can use to learn 
about the cache configuration on the machine, though I don't expect the 
line size to change in my lifetime.



Does HLASM and/or STORAGE provide for cache line alignment?


For a long, long time now...


I've long wondered why the NIL and OIL macros are necessary to
lock access to bits within a byte, whereas the hardware performs
the equivalent function for bytes within a cache line.


NIL and OIL are not required on z196 and higher machines. NI and OI now 
do the necessary serialization.
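
As a loose analogy in portable C11 (this is not z/Architecture code and not the source of any IBM macro), the difference is between setting bits with an explicit compare-and-swap retry loop and setting them with a single indivisible OR. Both function names below are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* OR `mask` into *word using an explicit compare-and-swap retry loop. */
void or_bits_cas(_Atomic uint32_t *word, uint32_t mask)
{
    uint32_t old = atomic_load(word);
    /* On failure, `old` is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(word, &old, old | mask))
        ;
}

/* OR `mask` into *byte as one indivisible read-modify-write. */
void or_bits_atomic(_Atomic uint8_t *byte, uint8_t mask)
{
    atomic_fetch_or(byte, mask);
}
```

In either form, two threads setting different bits in the same location cannot lose each other's update, which is the property being asked about for OI on two processors.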


--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-02 Thread Charles Mills
Thanks, John. My question was indeed more "what is?" than "how big?"

Charles
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of John Eells
Sent: Monday, June 02, 2014 8:47 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a significant performance penalty for non-aligned
operands?

Charles Mills wrote:
 Pardon my ignorance: what exactly is a cache line?

The caches are divided into 256-byte increments called cache lines in
current processors.



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-01 Thread John Gilmore
There is a performance penalty.  I have measured it for aligned and
unaligned signed halfword, i.e.,  signed binary fixed(15,0), in
compiled PL/I code and found that it is usually about 13%, which may
be trivial or important depending upon context.

More important in multiple-CP environments, I think, is that there are
contexts in which alignment|non-alignment determines whether an
operation is performed as an interlocked|non-interlocked update.  See
the discussion of the ASI and AGSI instructions on page 7-25 of the
current PrOp.

John Gilmore, Ashland, MA 01721 - USA



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-01 Thread Charles Mills
Thanks, John, exactly the sort of information I was looking for. Do you
recall what the hardware was?

I am leaning toward aligning it. If I can gain a couple of milliseconds of
CPU time per customer per day in return for coding + 1 & 0xfffe once, I
think that's worth it.

OTOH, it will make the whole structure a little bit bigger. There is an
absolute downside to using more ECSA, and it also carries a performance
penalty of its own: a byte of padding decreases the likelihood that the byte
you need is already in cache.
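
The trade-off can be put in numbers with a small C sketch. It rounds an offset up to the next even value (the same idea as adding 1 and masking off the low bit, written here with a full-width mask rather than 0xfffe) and totals the bytes for a run of halfword-plus-text entries with and without slack bytes. The function names and the array-of-lengths interface are assumptions for illustration, not the actual exit's data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Round an offset up to the next even (halfword-aligned) value. */
size_t align2(size_t off)
{
    return (off + 1) & ~(size_t)1;
}

/* Total bytes for `n` entries, each a 2-byte halfword followed by
 * text_len[i] character bytes. With `pad` nonzero, a slack byte is added
 * wherever the next halfword would otherwise start on an odd offset. */
size_t layout_size(const size_t *text_len, size_t n, int pad)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (pad)
            off = align2(off);   /* halfword starts on an even offset */
        off += 2 + text_len[i];
    }
    return off;
}
```

Calling layout_size twice on the same lengths, once with pad set and once without, gives the storage cost of alignment for a particular string.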

Re-emphasizing what I said, the references in question are read-only. The
structure is effectively write once, read many.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of John Gilmore
Sent: Sunday, June 01, 2014 6:19 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a significant performance penalty for non-aligned
operands?

There is a performance penalty.  I have measured it for aligned and
unaligned signed halfword, i.e.,  signed binary fixed(15,0), in compiled
PL/I code and found that it is usually about 13%, which may be trivial or
important depending upon context.

More important in multiple-CP environments, I think, is that there are
contexts in which alignment|non-alignment determines whether an operation is
performed as an interlocked|non-interlocked update.  See the discussion of
the ASI and AGSI instructions on page 7-25 of the current PrOp.



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-01 Thread Peter Relson
I believe the answer is no, all other things being equal.

But if one of those other things is that the non-aligned operand spans 
cache lines whereas an aligned one would not, then the answer could be a 
huge yes.

Things such as doubleword (or quadword) consistency could not be relied 
upon, depending on the degree of non-alignment.

Peter Relson
z/OS Core Technology Design



Re: Is there a significant performance penalty for non-aligned operands?

2014-06-01 Thread Lloyd

On 6/1/2014 7:51 PM, Peter Relson wrote:

I believe the answer is no, all other things being equal.

But if one of those other things is that the non-aligned operand spans
cache lines whereas an aligned one would not, then the answer could be a
huge yes.

Things such as doubleword (or quadword) consistency could not be relied
upon, depending on the degree of non-alignment.

Peter Relson
z/OS Core Technology Design


There are times, though, when things HAVE to be on a doubleword or
quadword boundary for specific instructions, whether hardware
instructions or supervisor calls (not necessarily SVC).  What I usually
do then is define a variable larger by the appropriate number of bytes
and verify by shifting the address that I am aligned.
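
In C terms, the over-allocate-and-align technique might look like the sketch below. It is one reading of the description above, not Lloyd's actual code: the function names are invented, and "verify by shifting" is interpreted as shifting the low-order bits out and back in, then comparing the result with the original address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the first address at or after `p` that is a multiple of `align`
 * (a power of two). The caller must have reserved at least
 * size + align - 1 bytes starting at `p`. */
void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t a = (uintptr_t)p;
    a = (a + (align - 1)) & ~(uintptr_t)(align - 1);
    return (void *)a;
}

/* Check alignment by shifting the low-order bits out and back in and
 * seeing whether the address survives unchanged. */
int is_aligned_by_shift(const void *p, unsigned log2_align)
{
    uintptr_t a = (uintptr_t)p;
    return ((a >> log2_align) << log2_align) == a;
}
```

For a 16-byte area that must sit on a quadword boundary, the raw area would be declared as 16 + 15 bytes and the aligned address taken with align_up(raw, 16).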


Lloyd
