Re: Fw: Dataspace versus common area above the bar

2014-01-24 Thread Ed Jaffe

On 1/24/2014 4:20 PM, John McKown wrote:

> On Fri, Jan 24, 2014 at 4:58 PM, Ed Jaffe wrote:
>
>> Perhaps JG's assertion is actually about "grande" instructions vs "normal"
>> instructions. Our benchmarks show grande instructions are ever-so-slightly
>> (<2%) slower than their non-grande counterparts. Example: L vs LG.
>>
>> Of course, the instruction path for the six-byte grande "LG" benchmark
>> code is 50% larger (in terms of space occupied, not instructions issued)
>> than its four-byte non-grande "L" counterpart, meaning more i-cache is
>> required to run it. So, perhaps that is to what this <2% difference is
>> attributable.
>>
>> Either way, it's something we consistently observe...
>
> In cases like this, it might be helpful to know the _exact_ machine you're
> running on, or if this is one of a number of different machine types. z9BC,
> z9EC, z10BC, z10EC, and so on.

The latest benchmarks were run on a full-speed zIIP engine of a zBC12.

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Fw: Dataspace versus common area above the bar

2014-01-24 Thread John McKown
On Fri, Jan 24, 2014 at 4:58 PM, Ed Jaffe wrote:

> On 1/21/2014 5:24 PM, Jim Mulder wrote:
>
>>
>>   From a hardware design engineer:
>> 
>> All hardware instructions perform at the same speed in 64-bit mode or
>> 31-bit mode.  I assume the AMODE(31) and AMODE(64) he is referring to
>> only affects the addressing mode, but the exact same instruction
>> sequences are used in both cases. If different code sequences are being
>> used, then all bets are off.  My first statement applies to the
>> exact same code sequence in 64-bit addressing mode versus 31-bit
>> addressing mode. A few millicoded instructions do have slightly
>> different path lengths depending on addressing mode, but even that
>> is not common.
>> 
>>
>
> Perhaps JG's assertion is actually about "grande" instructions vs "normal"
> instructions. Our benchmarks show grande instructions are ever-so-slightly
> (<2%) slower than their non-grande counterparts. Example: L vs LG.
>
> Of course, the instruction path for the six-byte grande "LG" benchmark
> code is 50% larger (in terms of space occupied, not instructions issued)
> than its four-byte non-grande "L" counterpart, meaning more i-cache is
> required to run it. So, perhaps that is to what this <2% difference is
> attributable.
>
> Either way, it's something we consistently observe...
>
> --
> Edward E Jaffe
>
>

In cases like this, it might be helpful to know the _exact_ machine you're
running on, or if this is one of a number of different machine types. z9BC,
z9EC, z10BC, z10EC, and so on.

-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! <><
John McKown



Re: Fw: Dataspace versus common area above the bar

2014-01-24 Thread Ed Jaffe

On 1/21/2014 5:24 PM, Jim Mulder wrote:

>   From a hardware design engineer:
>
> All hardware instructions perform at the same speed in 64-bit mode or
> 31-bit mode.  I assume the AMODE(31) and AMODE(64) he is referring to
> only affects the addressing mode, but the exact same instruction
> sequences are used in both cases. If different code sequences are being
> used, then all bets are off.  My first statement applies to the
> exact same code sequence in 64-bit addressing mode versus 31-bit
> addressing mode. A few millicoded instructions do have slightly
> different path lengths depending on addressing mode, but even that
> is not common.



Perhaps JG's assertion is actually about "grande" instructions vs 
"normal" instructions. Our benchmarks show grande instructions are 
ever-so-slightly (<2%) slower than their non-grande counterparts. 
Example: L vs LG.


Of course, the instruction path for the six-byte grande "LG" benchmark 
code is 50% larger (in terms of space occupied, not instructions issued) 
than its four-byte non-grande "L" counterpart, meaning more i-cache is 
required to run it. So, perhaps that is to what this <2% difference is 
attributable.


Either way, it's something we consistently observe...
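
For anyone who wants to check the footprint arithmetic, here is a minimal
sketch. It is not the benchmark code itself. The instruction lengths (four
bytes for L, six for LG) come from the text above; the count of 1,000 loads
and the 256-byte cache line size are assumptions made only for illustration.

```python
# Instruction lengths in bytes, as stated in the message above.
L_BYTES = 4    # L  (non-grande load)
LG_BYTES = 6   # LG (grande load)

# Assumed values, for illustration only.
LOAD_COUNT = 1000        # hypothetical number of back-to-back loads
CACHE_LINE_BYTES = 256   # assumed i-cache line size


def code_footprint(instruction_bytes, count):
    """Bytes of instruction text occupied by `count` instructions."""
    return instruction_bytes * count


def growth_percent(small, large):
    """How much larger `large` is than `small`, as a percentage."""
    return (large - small) / small * 100.0


def cache_lines(footprint_bytes, line_bytes=CACHE_LINE_BYTES):
    """Cache lines needed to hold the footprint (rounded up)."""
    return -(-footprint_bytes // line_bytes)


l_footprint = code_footprint(L_BYTES, LOAD_COUNT)
lg_footprint = code_footprint(LG_BYTES, LOAD_COUNT)

print(l_footprint, lg_footprint)                  # 4000 6000
print(growth_percent(l_footprint, lg_footprint))  # 50.0
print(cache_lines(l_footprint), cache_lines(lg_footprint))  # 16 24
```

The instruction count is the same in both cases; only the space occupied,
and so the number of cache lines touched, differs.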

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/



Re: Fw: Dataspace versus common area above the bar

2014-01-23 Thread Shmuel Metz (Seymour J.)
On 01/21/2014 at 07:44 PM, John Gilmore said:

>and it may be that what we have here is a misunderstanding of my
>language. 

Or it may be your ignorance.


>Let me begin with a little history.  On System/360 models above the
>model 30, L was faster than LH because they had  [at least]
>four-byte fetch widths and had to 'throw away' half of what they
>fetched for LH.

What was the 360/40, chopped liver?
 
-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 ISO position; see  
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)



Re: Fw: Dataspace versus common area above the bar

2014-01-21 Thread Kenneth Wilkerson
I have never found comparing instruction speeds to be a fair gauge of
performance. It's not the choice of instructions (unless the original
choices were very poor) that affects performance, but the algorithms. In
line with what has been pointed out, I have never seen any evidence that
converting an algorithm using data spaces and ALETs to one using 64-bit
instructions and shared memory objects would result in any measurable (2+%,
as an example) difference in performance. However, if the change afforded a
way to significantly reduce the working set size or a way to search less
frequently, this can often yield significant reductions in overhead.

Some things are very difficult to quantify. For example, there is
significant argument over the advantages of transactional memory versus
locks. On the surface, locking is more efficient but at a cost to
throughput. Transactional memory can use more cycles but improve throughput.
So how do you quantify this?

Almost 30 years ago, I developed a non-traditional storage manager that does
not use chains. As a result, it does not experience storage fragmentation.
Its path length varies only slightly from the 1st to the millionth call. As a
result, it outperforms chained storage managers that require locks by many
factors. And as the number of calls grows, the performance factor increases.

Again, I have never seen significant gains from using the same algorithms and
simply changing the instructions, whereas I have seen x-fold reductions in
overhead from improving algorithms.

Kenneth


-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Jim Mulder
Sent: Tuesday, January 21, 2014 7:25 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Fw: Dataspace versus common area above the bar

> 
> AMODE does not affect performance.  Can you explain which instructions
> you think are faster than some functional equivalent, and why you
> think they are faster?
> 
> 
> and it may be that what we have here is a misunderstanding of my
> language.  Let me begin with a little history.  On System/360 models
> above the model 30, L was faster than LH because they had  [at least]
> four-byte fetch widths and had to 'throw away' half of what they
> fetched for LH.
> 
> In my experience, and I have made many measurements, the same
> principle continues to apply mutatis mutandis today.
> 
> I, for example, have a pair of assembly-language glb-seeking binary
> search routines that search the same table of quadword elements.  One
> of these routines is AMODE(31) and one AMODE(64).  The table (the same
> assembled table is always used) contains 63 elements.  The usual 127
> searches are performed, each 256 times.  In the upshot the AMODE(64)
> routine is measurably, 2.1201%, faster.
> 
> I have performed similar tests using searches of ordered lists of
> 10(10)200 elements.  They are more addressing-intensive, and the
> superiority of the AMODE(64) routine increases almost linearly with
> table size, from 2.0897% for a list of 10 elements to 2.3311% for a
> list of 200 elements.
> 
> Now it may be that what you mean by "AMODE does not affect
> performance" is different from what I mean.  If so, I should be
> pleased to have you clarify the ways in which our uses of this word
> are different.

 From a hardware design engineer:

All hardware instructions perform at the same speed in 64-bit mode or 
31-bit mode.  I assume the AMODE(31) and AMODE(64) he is referring to
only affects the addressing mode, but the exact same instruction 
sequences are used in both cases. If different code sequences are being
used, then all bets are off.  My first statement applies to the 
exact same code sequence in 64-bit addressing mode versus 31-bit
addressing mode. A few millicoded instructions do have slightly 
different path lengths depending on addressing mode, but even that
is not common.


  If you can send me the listings of the exact code that you are
measuring, I might be able to analyze the difference that
you are measuring.

  There certainly have been cases over the years where
some processors required extra cycles to perform operand extension,
especially when it involves sign bit propagation.  For specific
instructions on a specific processor, I can ask the engineers if
that is the case (as long as it is a recent enough processor that
the engineers are still here).
 
Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY



Re: Fw: Dataspace versus common area above the bar

2014-01-21 Thread Jim Mulder
> 
> AMODE does not affect performance.  Can you explain which instructions
> you think are faster than some functional equivalent, and why you
> think they are faster?
> 
> 
> and it may be that what we have here is a misunderstanding of my
> language.  Let me begin with a little history.  On System/360 models
> above the model 30, L was faster than LH because they had  [at least]
> four-byte fetch widths and had to 'throw away' half of what they
> fetched for LH.
> 
> In my experience, and I have made many measurements, the same
> principle continues to apply mutatis mutandis today.
> 
> I, for example, have a pair of assembly-language glb-seeking binary
> search routines that search the same table of quadword elements.  One
> of these routines is AMODE(31) and one AMODE(64).  The table (the same
> assembled table is always used) contains 63 elements.  The usual 127
> searches are performed, each 256 times.  In the upshot the AMODE(64)
> routine is measurably, 2.1201%, faster.
> 
> I have performed similar tests using searches of ordered lists of
> 10(10)200 elements.  They are more addressing-intensive, and the
> superiority of the AMODE(64) routine increases almost linearly with
> table size, from 2.0897% for a list of 10 elements to 2.3311% for a
> list of 200 elements.
> 
> Now it may be that what you mean by "AMODE does not affect
> performance" is different from what I mean.  If so, I should be
> pleased to have you clarify the ways in which our uses of this word
> are different.

 From a hardware design engineer:

All hardware instructions perform at the same speed in 64-bit mode or 
31-bit mode.  I assume the AMODE(31) and AMODE(64) he is referring to
only affects the addressing mode, but the exact same instruction 
sequences are used in both cases. If different code sequences are being
used, then all bets are off.  My first statement applies to the 
exact same code sequence in 64-bit addressing mode versus 31-bit
addressing mode. A few millicoded instructions do have slightly 
different path lengths depending on addressing mode, but even that
is not common.


  If you can send me the listings of the exact code that you are
measuring, I might be able to analyze the difference that
you are measuring.

  There certainly have been cases over the years where
some processors required extra cycles to perform operand extension,
especially when it involves sign bit propagation.  For specific
instructions on a specific processor, I can ask the engineers if
that is the case (as long as it is a recent enough processor that
the engineers are still here).
 
Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY



Re: Fw: Dataspace versus common area above the bar

2014-01-21 Thread John Gilmore
Jim Mulder wrote:

> AMODE does not affect performance.  Can you explain which instructions
> you think are faster than some functional equivalent, and why you
> think they are faster?


and it may be that what we have here is a misunderstanding of my
language.  Let me begin with a little history.  On System/360 models
above the model 30, L was faster than LH because they had  [at least]
four-byte fetch widths and had to 'throw away' half of what they
fetched for LH.

In my experience, and I have made many measurements, the same
principle continues to apply mutatis mutandis today.

I, for example, have a pair of assembly-language glb-seeking binary
search routines that search the same table of quadword elements.  One
of these routines is AMODE(31) and one AMODE(64).  The table (the same
assembled table is always used) contains 63 elements.  The usual 127
searches are performed, each 256 times.  In the upshot the AMODE(64)
routine is measurably, 2.1201%, faster.

I have performed similar tests using searches of ordered lists of
10(10)200 elements.  They are more addressing-intensive, and the
superiority of the AMODE(64) routine increases almost linearly with
table size, from 2.0897% for a list of 10 elements to 2.3311% for a
list of 200 elements.

Now it may be that what you mean by "AMODE does not affect
performance" is different from what I mean.  If so, I should be
pleased to have you clarify the ways in which our uses of this word
are different.
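
For readers unfamiliar with the term, a glb-seeking (greatest lower bound)
binary search returns the largest table element that does not exceed the
search key. The sketch below shows the idea in Python. It is not the
assembly-language routines measured above, the key values are hypothetical,
and reading "the usual 127 searches" as 63 exact matches plus 64 keys that
fall between or outside the table entries is an assumption.

```python
def glb_index(table, key):
    """Index of the greatest element <= key in sorted `table`, or -1 if none."""
    lo, hi = 0, len(table)
    # Invariant: every element before lo is <= key,
    # every element at or after hi is > key.
    while lo < hi:
        mid = (lo + hi) // 2
        if table[mid] <= key:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1


# Hypothetical table of 63 ordered keys: 2, 4, ..., 126.
table = list(range(2, 128, 2))

# Assumed reading of "127 searches": keys 1..127 cover the 63 exact
# matches (even keys) and the 64 gaps around them (odd keys).
probes = list(range(1, 128))
results = [glb_index(table, key) for key in probes]

print(len(table), len(probes))                # 63 127
print(results[0], results[1], results[-1])    # -1 0 62
```

A key below the first element has no lower bound in the table, which this
sketch reports as -1.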

John Gilmore, Ashland, MA 01721 - USA



Re: Fw: Dataspace versus common area above the bar

2014-01-21 Thread Hank Oerlemans
Thank you, Peter Relson. My wife is glad she is not alone ...


Hank, PD Tools, IBM Australia


IBM Mainframe Discussion List  wrote on 
21/01/2014 10:19:56 AM:


> ...
>   I have seen Peter Relson type (fast) while he was 
> looking at me and carrying on a conversation.  For me, 
> that happens only in my dreams.  Pretty much the same
> way that fast skating and goal scoring happens only in my dreams. 
> 
> Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY
> 



Re: Fw: Dataspace versus common area above the bar

2014-01-20 Thread Jim Mulder
> Could additional potential CPU efficiencies be gained from creating the
> memory objects with 1MB large pages (or 2GB if they are REALLY big!)?

  1MB pages can reduce the number of TLB misses, along with the
cache effects of fetching the DAT table entries to resolve the
TLB misses.  Keep in mind that z/OS pages out 1MB pages
to Flash memory only (and not to page data sets), so 1MB pages are
pageable only if you have an EC12 processor with Flash memory.

 PAGEFRAMESIZE=1M can be specified on DSPSERV CREATE on
z/OS 2.1,  or z/OS 1.13 with the RSM Web-deliverable for
Flash memory support installed.
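
To give a rough feel for why larger pages reduce TLB pressure, here is a
minimal sketch that counts how many pages (and so how many distinct
translations) it takes to map an area with 4KB pages versus 1MB pages. The
2GB area size is a hypothetical example, and the sketch says nothing about
actual TLB capacity or miss rates on any particular processor.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def pages_needed(area_bytes, page_bytes):
    """Number of pages needed to back `area_bytes` (rounded up)."""
    return -(-area_bytes // page_bytes)


# Hypothetical area size, chosen only for illustration.
area = 2 * GB

small_pages = pages_needed(area, 4 * KB)
large_pages = pages_needed(area, 1 * MB)

print(small_pages)                 # 524288
print(large_pages)                 # 2048
print(small_pages // large_pages)  # 256
```

Each 1MB page covers the same storage as 256 4KB pages, so far fewer
translations are needed to touch the same amount of data.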


Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY



Re: Fw: Dataspace versus common area above the bar

2014-01-20 Thread Jim Mulder
> You also meant TARGET_VIEW=HIDDEN, not TAGET_V IEW=HIDDEN.

  Unfortunately, even with 34 years in the programming business,
I have never mastered typing skills. At my mother's
insistence, I did take a 6-week summer school class in
touch typing in 1974, but it didn't do me much good - I 
still have to look at the keyboard.  Fortunately, dump reading
involves mostly staring at the dump, staring at assembly listings,
and some amount of profanity, but not a lot of typing.

  I have seen Peter Relson type (fast) while he was 
looking at me and carrying on a conversation.  For me, 
that happens only in my dreams.  Pretty much the same
way that fast skating and goal scoring happens only in my dreams. 

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY



Re: Fw: Dataspace versus common area above the bar

2014-01-20 Thread Graham Harris
Could additional potential CPU efficiencies be gained from creating the
memory objects with 1MB large pages (or 2GB if they are REALLY big!)?


On 20 January 2014 22:40, John Gilmore  wrote:

> Jim.
>
> You also meant TARGET_VIEW=HIDDEN, not TAGET_V IEW=HIDDEN.
>
>
> John Gilmore, Ashland, MA 01721 - USA
>



Re: Fw: Dataspace versus common area above the bar

2014-01-20 Thread John Gilmore
Jim.

You also meant TARGET_VIEW=HIDDEN, not TAGET_V IEW=HIDDEN.


John Gilmore, Ashland, MA 01721 - USA



Fw: Dataspace versus common area above the bar

2014-01-20 Thread Jim Mulder
>> Memory objects are much more flexible than data spaces. Data spaces are
>> limited to 2GB. Memory objects are only limited by the auxiliary storage.
>> Memory objects can be guarded and can also be page protected. Data spaces
>> cannot. Code can execute in memory object but not in data spaces. I started
>> using memory objects 10 years ago and have nearly forgotten how to use a
>> data space.

>  Guard pages and protected pages can be created in data spaces
>using IARV64  with TAGET_VIEW=HIDDEN  and TARGET_VIEW=READONLY

 I meant IARVSERV, not IARV64 

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY
