Re: Millicode Instructions

2013-05-13 Thread Robert Ngan
MVCX sounds a bit like MVCOS with R0=0.
The PoOp says MVCOS may be significantly slower than MVC, but I would be
interested to see a comparison between it and an executed MVC, i.e., for use
in short(ish) variable-length moves.

Robert Ngan
CSC Financial Services Group



From:   Peurifoy, Richard L r-peuri...@neo.tamu.edu
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Date:   2013/04/17 10:49
Subject:Re: Millicode Instructions
Sent by:IBM Mainframe Assembler List ASSEMBLER-LIST@LISTSERV.UGA.EDU



 Some millicode instructions will outperform their PoOp-code counterparts
 because millicode has access to hardware features not available to
 ordinary code. For example, MVCL(E) has the ability to move data under
 certain conditions without loading it into cache. (You can't do that
 with looping MVC.) Millicode routines also have access to the MVCX
 instruction which performs a variable-length MVC -- something ordinary
 programs cannot do without using the EXecute instruction.

MVCX sounds like it would be useful for non-millicode; any idea why it
was not externalized? Is there a corresponding CLCX?

--
Richard


Re: Millicode Instructions

2013-05-13 Thread Ed Jaffe

On 4/17/2013 8:44 AM, Peurifoy, Richard L wrote:

Is there a corresponding CLCX?


I assume so, although I don't know for sure...

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-19 Thread Peter Relson
I would assume that 1 branch (the first option) is always faster than
2 branches (the second option). The branch prediction in the CPU
should figure out which execution path is most likely.

That is not a correct assumption, even if the branch prediction table of
the CPU was long enough to remember every branch ever taken for the life
of the power-on.
You would need to know if this code is even executed enough to stay in any
sort of table (let alone in cache).

I'm not saying which is faster, just that the assumption is incorrect. In
this land of pipelines and out of order execution (mixed with operating
system dispatches and redispatches), hard and fast rules are hard to come
by.

What is knowable is that the general approach of the machine is to look
ahead and to prefer that conditional branches not be taken.
And when looking ahead, it knows that an unconditional branch will be
taken so it can continue forward. So maybe it will be able to look further
forward in the fall-through case.
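
One way to act on that preference is to keep the expected path falling through
and branch only for the exception. A minimal sketch (the labels and the choice
of a return-code test are illustrative, not from this post):

         LTR   R15,R15            test the return code from the usual call
         BNZ   ERRPATH            rare case: branch out of the main line
*        common, expected processing continues here with no taken branch
*        ...
ERRPATH  DS    0H                 exceptional processing, kept out of line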

Peter Relson
z/OS Core Technology Design


Automatic reply: Good Performing Code (Was: Millicode Instructions)

2013-04-19 Thread Howell Meghan R
I will be out of the office on vacation starting from 6:30am CDT on Friday, 
April 19th.  I will not have access to email throughout this time.  I will 
return on Monday, April 29th at 6:30am.


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-19 Thread DAL POS Raphael
And according to Dr John, BCT/BCTG, BXLE/BXLEG are predicted to always branch.

Ciao,

--
Raphael Dal-Pos / z/OS Support
Generali France Assurances
DSIO - DIO - IT Infrastructure  Support
Saint Denis - Wilo W 03 B1 028 F
rdal...@generali.fr  +(33)1-58-38-59-67
  or mobile  +(33)6.24.33.20.87
--
MVS: Guilty, until proven innocent !! RDP 2009


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On
Behalf Of Peter Relson
Sent: Friday, April 19, 2013 2:39 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Good Performing Code (Was: Millicode Instructions)

I would assume that 1 branch (the first option) is always faster than
2 branches (the second option). The branch prediction in the CPU
should figure out which execution path is most likely.

That is not a correct assumption, even if the branch prediction table of
the CPU was long enough to remember every branch ever taken for the life
of the power-on.
You would need to know if this code is even executed enough to stay in any
sort of table (let alone in cache).

I'm not saying which is faster, just that the assumption is incorrect. In
this land of pipelines and out of order execution (mixed with operating
system dispatches and redispatches), hard and fast rules are hard to come
by.

What is knowable is that the general approach of the machine is to look
ahead and to prefer that conditional branches not be taken.
And when looking ahead, it knows that an unconditional branch will be
taken so it can continue forward. So maybe it will be able to look further
forward in the fall-through case.

Peter Relson
z/OS Core Technology Design


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-18 Thread Fred van der Windt
 I understand also that unconditional branches are faster than conditional 
 branches.  So, which is faster:

 BNZ   LABEL       Branch most frequent
 or:
 BZ    *+8         fall through most frequent
 B     LABEL       Unconditional

It might seem naïve but I would assume that 1 branch (the first option) is 
always faster than 2 branches (the second option). The branch prediction in the 
CPU should figure out which execution path is most likely.

Fred!


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-18 Thread Phil Smith III
John Ehrman wrote:
Performance concerns about individual instructions aren't worth much
effort. Things like operand alignment, data and instruction cache
retention, locality of reference, branch frequency etc. can have really
significant effects.

For sure. But for the pathologically curious, if you have z/VM source, look
at the main module for EXEC 2 (DMSEXE). It's full of lines like:
  SLR   R0,R8  (DO IT WHILE R3 SETTLES)

These must go back to what, 303x? 370 itself? Christopher J. Stephenson (Sir
Chris the EXECutor) wrote this code 30+ years ago, when that stuff DID
matter.

There were giants in those days...!

...phsiii


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-18 Thread Fred van der Windt
The two newest processors (z196 and zEC12) do out-of-order processing. Does 
that mean that we do not need to 'intermingle' instructions because the 
processor will do it for us?

Fred!

Sent from my new iPad

On Apr 18, 2013, at 17:05, Phil Smith III li...@akphs.com wrote:

 John Ehrman wrote:
 Performance concerns about individual instructions aren't worth much
 effort. Things like operand alignment, data and instruction cache
 retention, locality of reference, branch frequency etc. can have really
 significant effects.

 For sure. But for the pathologically curious, if you have z/VM source, look
 at the main module for EXEC 2 (DMSEXE). It's full of lines like:
  SLR   R0,R8  (DO IT WHILE R3 SETTLES)

 These must go back to what, 303x? 370 itself? Christopher J. Stephenson (Sir
 Chris the EXECutor) wrote this code 30+ years ago, when that stuff DID
 matter.

 There were giants in those days...!

 ...phsiii


Re: Millicode Instructions

2013-04-17 Thread David Stokes
Well, I for one don't go along with the obsession with saving a nanosecond here 
and there. In any case, as someone just pointed out,  a compiler can do far 
more optimization than one can manage by hand, and compiler writers spend large 
amounts of time determining optimal instruction sequences for certain 
operations and developing algorithms to compile an optimal solution for each 
piece of code. Modern software systems like Microsoft .Net can compile at 
run-time for the current hardware architecture. So much for TRT vs TRTE. Few 
active product developers have time to really learn all the endless new 
z/Architecture instructions, anyway, I suggest, and continually compare them 
and determine which would be optimal in this or that situation. (But maybe I'm 
just too lazy nowadays).

What rarely (if ever) gets addressed here is good programming practices (in a 
wider sense than how to load a base register or whatever), something you 
continually encounter on HLL forums, but rarely, somehow, in assembler. That 
for me means writing code that works correctly, is understandable, reflects in 
structure the logic of the problem, and can be easily modified and expanded in 
scope. Cryptically clever code is generally best avoided, much as it seems to 
appeal to certain kinds of programmers. Efficiency of individual small code 
sections is mostly pretty irrelevant unless at the center of a loop which is 
executed a vast number of times. Not so long ago it was suggested by one of the 
more august personalities here that I should not use a system macro for its 
intended purpose but rather some allegedly quicker set of instructions 
accessing the same data via control block pointers. However, since the code is 
executed once at start-up of a permanently active STC, the issue of speed was
not very relevant.

Good practices in my view would also exclude enormous code sections requiring 
numerous base registers (even if replaced by relative branches). Our coding 
standards never gave rise to a need for more than one code base register, 
although it's all baseless nowadays and uses 64 bit code and the odd ZS3 
instruction, even. In fact we recently implemented a pre-loader with the aim of 
loading different code versions for modern or older machines, but have seen no 
pressing need to use it yet for that purpose (it has other functions as well). 
Structured programming as small logical sections is something that can be 
practiced in assembler too. I am responsible for several products in use around 
the world in large IBM mainframe computer centers. They are all written in 
assembler (for various good reasons from the distant past, starting again today 
might change things of course). Although we occasionally hear a customer 
complain that we are using too much CPU, it is generally due to poor use of the
products' facilities and not to obvious weaknesses in the code.

Speed in a program often depends more on the architecture of the code than
on individual instructions. Running serially through long lists or tables to 
find stuff is a common cause of CPU hotspots. One solution is to use a hash 
table. Using methods like bubble-sort rather than say quicksort algorithms to 
sort data in storage makes the programming easy, but much, much slower. Of 
course in pure assembler these kind of things have to be programmed. (The nicer 
part of using HLL - in the wider world of Java, C#, C++ etc. anyway - is having 
large libraries of functions available to do such things). Misuse of system 
functions can cause issues too; some of our early I/O code caused problems by
issuing unnecessary PGSER RELEASE requests, for example.  Such things can be 
determined by suitable tools.

Btw, my last reply to one of your posts got caught by the reply-to issue, but I 
didn't feel a great need to post it again to the list. It wasn't my intention 
to reply to you personally.

DS




-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On
Behalf Of Scott Ford
Sent: Wednesday, April 17, 2013 12:55 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Millicode Instructions

Ed,

I want to ask a question, in this day/age and processing power is it really 
worth being concerned about Assembler instructions speed ? Unless there is some 
application that is very time sensitive, that I understand


Regards,

Scott J Ford
Software Engineer
http://www.identityforge.com/




 From: Ed Jaffe edja...@phoenixsoftware.com
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Sent: Tuesday, April 16, 2013 6:13 PM
Subject: Re: Millicode Instructions


On 4/16/2013 12:43 PM, Gibney, Dave wrote:
 I don't get to work at this level often, but I am always interested.
 How can Millicode be faster than the equivalent using the hardware
 instructions? As I understand Millicode, that is really all it is
 (using the hardware instructions) plus any overhead in context
 switching to the Millicode

Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Rob van der Heij
On 17 April 2013 07:34, Ed Jaffe edja...@phoenixsoftware.com wrote:

 On 4/16/2013 3:55 PM, Scott Ford wrote:

 I want to ask a question, in this day/age and processing power is it
 really worth
 being concerned about Assembler instructions speed ?


 I am not unbiased. My answer is exactly what one would expect from the
 CTO of a software company that has been authoring far-better-performing
 code since 1978. Am I proud of slides 67-74 in this SHARE presentation?
 https://share.confex.com/share/120/webprogram/Handout/Session13319/%28E%29JES%20Update_SHARE%20120.pdf
 You bet I am!


You may! :-)

I think that most of the people on the list realize that much of this type
of discussion is to hone your skills, understand what challenges your code
offers to the machine, and be able to diagnose issues with code fragments
where it is relevant. My experience is that those who don't appreciate the
low level concepts often don't see the big picture either.

Much of what I learn here while lurking provides background information for
when I need to address an issue in the code base that I inherited. I
recently found the code spending a lot of time in sequentially searching a
linked list, comparing each key with an EX of a CLC instruction (which I
understand is not a good idea anymore). Since only the search argument was
variable length, I could copy it to a fixed length field and do plain CLC
instead. While the big improvements are in the algorithms, some
understanding of the machine architecture is helpful when thinking about
those issues as well. I don't know which part helped most to write the next
release that added a lot of new function, reduced the size of code by 30%
and reduced CPU usage by factor of 5-10.
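
A sketch of that rework, assuming a 16-byte maximum key, blank padding, and
made-up labels (none of this is Rob's actual code; the Rn names assume the
usual register equates):

* Copy the variable-length search argument (R2 -> argument, R3 = its length)
* into a fixed, blank-padded field once, before entering the search loop.
         MVI   ARGFIX,C' '                  blank the first byte...
         MVC   ARGFIX+1(L'ARGFIX-1),ARGFIX  ...and propagate it
         LR    R4,R3
         BCTR  R4,0                         length minus one for EX
         EX    R4,ARGMVC                    move only the argument's real length
*        each element in the loop then needs only a plain CLC:
*        CLC   ARGFIX,KEYFLD
*
* constants and work areas, kept away from the instruction stream:
ARGMVC   MVC   ARGFIX(0),0(R2)              EX target
ARGFIX   DS    CL16                         fixed-length copy of the argument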

While it is true that CPUs have gotten faster, the volume of data we
operate on has often increased as well. And even when algorithms are O(N)
the volume of data can still surprise you. My favorite quote from an
application developer is: "Rob, we know this is not efficient. But it works
fine for 100,000 records. Why would it not work for 107 million?" (hint:
100K records took less than 2 minutes to run; the nightly batch took 27
hrs).

Rob


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread John Gilmore
Long ago G. H. Hardy, one of the great figures of 20th-century
mathematics, set out what he took to be the three most important
characteristics of successful contributors to any technical field.
They are

1) intellectual curiosity, the itch to know how things work,

2) craftsmanship, a commitment to doing the best job one knows how to do, and

3) a desire for recognition, fame, money, the esteem of one's
colleagues and the like.

Conspicuously absent from this list is any preoccupation with rules of
thumb and standard practices.

Algorithms are indeed important:  Linear search is linear-time;
binary search is logarithmic-time.  Details are important too, not
least because the cumulative effect of getting them wrong can swamp
the advantages that the choice of good algorithms confers.  Scale is
important.  Some problems are still inaccessible, others will remain
so when computers that operate at the frequencies of hard cosmic rays
become available.

Taste and experience are important.  Anyone who knows a little physics
can sit down and make a long list of the things that may affect the
path of, say, an artillery shell from muzzle to impact. Some of them
indeed need to be considered; but it turns out that the Newtonian
model of the parabolic path of a  mass point in a gravitational field
is usually sovereign.  The capacity to clear away intellectual clutter
is thus one of the chief marks of high talent, and the role for
programmers of low talent is diminishing rapidly.  (Yes, this is a
species of

Programming, like other engineering activities, is characterized, all
but defined, by the need to make tradeoffs among conflicting, finally
irreconcilable objectives; and few programmers are at all good at
this, mostly because 1) no institutional premium has been placed on
doing it well and 2) they have been poorly educated to do it.

Over time vendor groups and ISVs like EJ's will, I think, very largely
replace in-house programming staffs.  We shall have a situation much
like that which prevails in the legal profession today.  For scut
work, reviewing an employment contract or lease, say, the in-house
lawyer has his or her uses.  For real trouble and important advice, an
outside firm must be turned to.

John Gilmore, Ashland, MA 01721 - USA


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread DASDBILL2
Your elapsed times in going from 100,000 records to 107 million look like
linear scaling.  That's the best one can hope for.  "Working fine" means
running in less than two minutes.  It did work for 107 million records.  But
it didn't work "fine" because it took longer than two minutes.  I suppose this
developer also expects it to take less than two minutes to process 100 billion 
records.  The application developer needs to go to remedial multiplication 
class.  I learned how to multiply in the third grade.  Sixty years later I 
still remember how to multiply.  It's also important to know when, why, and 
what  to multiply. 


Bill Fairchild 
Franklin, TN 


- Original Message -
From: Rob van der Heij rvdh...@gmail.com 
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU 
Sent: Wednesday, April 17, 2013 3:12:09 AM 
Subject: Re: Good Performing Code (Was: Millicode Instructions) 

On 17 April 2013 07:34, Ed Jaffe edja...@phoenixsoftware.com wrote: 

 On 4/16/2013 3:55 PM, Scott Ford wrote: 
 
 I want to ask a question, in this day/age and processing power is it 
 really worth 
 being concerned about Assembler instructions speed ? 
 
 
 I am not unbiased. My answer is exactly what one would expect from the 
 CTO of a software company that has been authoring far-better-performing 
 code since 1978. Am I proud of slides 67-74 in this SHARE presentation? 
 https://share.confex.com/share/120/webprogram/Handout/Session13319/%28E%29JES%20Update_SHARE%20120.pdf
  
 You bet I am! 
 

You may! :-) 

I think that most of the people on the list realize that much of this type 
 of discussion is to hone your skills, understand what challenges your code
offers to the machine, and be able to diagnose issues with code fragments 
where it is relevant. My experience is that those who don't appreciate the 
low level concepts often don't see the big picture either. 

Much of what I learn here while lurking provides background information for 
when I need to address an issue in the code base that I inherited. I 
recently found the code spending a lot of time in sequentially searching a 
linked list, comparing each key with an EX of a CLC instruction (which I 
understand is not a good idea anymore). Since only the search argument was 
variable length, I could copy it to a fixed length field and do plain CLC 
instead. While the big improvements are in the algorithms, some 
understanding of the machine architecture is helpful when thinking about 
those issues as well. I don't know which part helped most to write the next 
release that added a lot of new function, reduced the size of code by 30% 
and reduced CPU usage by factor of 5-10. 

While it is true that CPUs have gotten faster, the volume of data we 
operate on has often increased as well. And even when algorithms are O(N) 
the volume of data can still surprise you. My favorite quote from an 
 application developer is: "Rob, we know this is not efficient. But it works
 fine for 100,000 records. Why would it not work for 107 million?" (hint:
 100K records took less than 2 minutes to run; the nightly batch took 27
 hrs).

Rob 


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Rob van der Heij
On 17 April 2013 15:14, DASDBILL2 dasdbi...@comcast.net wrote:

 Your elapsed times in going from 100,000 records to 107 million look
 like linear scaling.  That's the best one can hope for.  "Working fine"
 means running in less than two minutes.  It did work for 107 million
 records.  But it didn't work fine because it took longer than two
 minutes.  I suppose this developer also expects it to take less than two
 minutes to process 100 billion records.  The application developer needs to
 go to remedial multiplication class.  I learned how to multiply in the
 third grade.  Sixty years later I still remember how to multiply.  It's
 also important to know when, why, and what  to multiply.


Right. I like the case because it illustrates some of the issues. In this
particular case the application went from an intimate interaction between
z/OS and DB2 to a remote database on z/Linux. It's not even bad if you make
the round trip from the application through TCP/IP to the database, now and
then hit the disk, dispatch the virtual machine, and back to the
application, and all on average under 1 ms. That's less than 2 minutes for
100K, but 27 hrs for 100M...

I think it's not uncommon for people to have trouble absorbing several
orders of magnitude. Many mainframe folks have learned to do that
multiplication despite intuition. And some of us know how long it takes to
copy a 3390-3 and can do the math during the meeting already. I've been
involved in migration projects where people claimed it was pretty fast
but in reality would not even do 5% of the total migration in 48 hrs. It's
the same experience that makes me ask what about backup and D/R and
project managers blame the messenger for being negative...

Rob


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Gerhard Postpischil

On 4/17/2013 9:14 AM, DASDBILL2 wrote:

I learned how to multiply in the third grade.  Sixty years later I
still remember how to multiply.  It's also important to know when,
why, and what  to multiply.


Simple - you write the two numbers with the larger on the left. In the
next row, double the number on the left, and halve the number on the
right, discarding any fraction. Upon reaching 1 on the right, cross out
any row where the right number is even. Add the remaining rows on the left.
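
The same doubling-and-halving scheme can be written with shifts, which is
probably the SRL/SLL trick mentioned elsewhere in this thread. A sketch
(register choices are illustrative; TMLL is the z/Architecture test-under-mask
of the low-order bits):

* R2 = multiplicand, R3 = multiplier (both non-negative), R5 = product
         SLR   R5,R5              product starts at zero
MULTLOOP LTR   R3,R3              stop when the halved number reaches zero
         BZ    MULTDONE
         TMLL  R3,1               is the right-hand number odd?
         BZ    MULTEVEN           even row: cross it out (add nothing)
         ALR   R5,R2              odd row: add the left-hand number
MULTEVEN SLL   R2,1               double the left-hand number
         SRL   R3,1               halve the right-hand number, dropping the fraction
         B     MULTLOOP
MULTDONE DS    0H                 R5 now holds the product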

Gerhard Postpischil
Bradford, Vermont


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread DASDBILL2
I tried your algorithm with 13 multiplied by 81 and produced the correct 
answer.  This algorithm is undoubtedly how the microcode for the M (multiply 
fullword) instruction does its math. 



There are many paths to the end of one's journey, Grasshopper. 

Bill Fairchild 
Franklin, TN 

“Political language is designed to make lies sound truthful and murder 
acceptable, and to give the appearance of solidity to pure wind.” [George 
Orwell] 

- Original Message -
From: Gerhard Postpischil gerh...@valley.net 
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU 
Sent: Wednesday, April 17, 2013 10:19:22 AM 
Subject: Re: Good Performing Code (Was: Millicode Instructions)

On 4/17/2013 9:14 AM, DASDBILL2 wrote: 
 I learned how to multiply in the third grade.  Sixty years later I 
 still remember how to multiply.  It's also important to know when, 
 why, and what  to multiply. 

Simple - you write the two numbers with the larger on the left. In the 
next row, double the number on the left, and halve the number on the 
right, discarding any fraction. Upon reaching 1 on the right, cross out 
any row where the right number is even. Add the remaining rows on the left. 

Gerhard Postpischil 
Bradford, Vermont 


Automatic reply: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Capps, Joey
I am currently out of the office and unreachable until Thursday.
If you have a P1, Production Down issue with PowerExchange or UDR products 
please make a voice call to Informatica Support and open an SR.

Thanks,
Joey


Re: Millicode Instructions

2013-04-17 Thread Peurifoy, Richard L
 Some millicode instructions will outperform their PoOp-code counterparts
 because millicode has access to hardware features not available to
 ordinary code. For example, MVCL(E) has the ability to move data under
 certain conditions without loading it into cache. (You can't do that
 with looping MVC.) Millicode routines also have access to the MVCX
 instruction which performs a variable-length MVC -- something ordinary
 programs cannot do without using the EXecute instruction.

MVCX sounds like it would be useful for non-millicode; any idea why it
was not externalized? Is there a corresponding CLCX?

--
Richard


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Paul Gilmartin
On 2013-04-17, at 09:31, DASDBILL2 wrote:

 I tried your algorithm with 13 multiplied by 81 and produced the correct 
 answer.  This algorithm is undoubtedly how the microcode for the M (multiply 
 fullword) instruction does its math.

It has a lot to do with where the 1-bits are in the binary representation
of the multiplier, yes.

GIYF.  Wallace tree

PDP-6 et al. inspected two bits of the multiplier at each iteration
and mixed adds and subtracts to get a 2s complement product without
a restoring step.

 - Original Message -
 From: Gerhard Postpischil
 Sent: Wednesday, April 17, 2013 10:19:22 AM

 Simple - you write the two numbers with the larger on the left. In the
 next row, double the number on the left, and halve the number on the
 right, discarding any fraction. Upon reaching 1 on the right, cross out
 any row where the right number is even. Add the remaining rows on the left.

-- gil


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Scott Ford
Hey Gil,

I assume that type of math with bits is super fast ... I had a friend show me
similar techniques using SRL or SLL, but in my old age ... I forgot. Will have to
revisit

Scott ford
www.identityforge.com
from my IPAD

'Infinite wisdom through infinite means'


On Apr 17, 2013, at 1:03 PM, Paul Gilmartin paulgboul...@aim.com wrote:

 On 2013-04-17, at 09:31, DASDBILL2 wrote:

 I tried your algorithm with 13 multiplied by 81 and produced the correct 
 answer.  This algorithm is undoubtedly how the microcode for the M (multiply 
 fullword) instruction does its math.
 It has a lot to do with where the 1-bits are in the binary representation
 of the multiplier, yes.

 GIYF.  Wallace tree

 PDP-6 et al. inspected two bits of the multiplier at each iteration
 and mixed adds and subtracts to get a 2s complement product without
 a restoring step.

 - Original Message -
 From: Gerhard Postpischil
 Sent: Wednesday, April 17, 2013 10:19:22 AM

 Simple - you write the two numbers with the larger on the left. In the
 next row, double the number on the left, and halve the number on the
 right, discarding any fraction. Upon reaching 1 on the right, cross out
 any row where the right number is even. Add the remaining rows on the left.

 -- gil


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Paul Gilmartin
On 2013-04-17, at 11:26, Scott Ford wrote:

 I assume that type of math with bits is super fast ... I had a friend show me
 similar techniques using SRL or SLL, but in my old age ... I forgot. Will have
 to revisit

I expect it's hardwired in the Multiply instruction.  Or, you could
do it with a MACRO.  But we're already having that discussion.

Similar techniques are applicable to SQRT.

-- gil


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread John Ehrman
Performance concerns about individual instructions aren't worth much
effort. Things like operand alignment, data and instruction cache
retention, locality of reference, branch frequency etc. can have really
significant effects.

Remember that CPU speeds have increased much faster than memory speeds --
getting an operand from cache can take a cycle or two, but from memory can
take hundreds or thousands (try causing a page fault!).

Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Scott Ford
John,

For example, what are Assembler no-nos in performance? ... I am trying to put you
'on the spot',
I am a curious and responsible person, so I would like to know

Best Regards,

Scott ford
www.identityforge.com
from my IPAD

'Infinite wisdom through infinite means'


On Apr 17, 2013, at 2:08 PM, John Ehrman ehr...@us.ibm.com wrote:

 Performance concerns about individual instructions aren't worth much
 effort. Things like operand alignment, data and instruction cache
 retention, locality of reference, branch frequency etc. can have really
 significant effects.

 Remember that CPU speeds have increased much faster than memory speeds --
 getting an operand from cache can take a cycle or two, but from memory can
 take hundreds or thousands (try causing a page fault!).


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Scott Ford
Trying not to put you on the spot sorry...

Scott ford
www.identityforge.com
from my IPAD

'Infinite wisdom through infinite means'


On Apr 17, 2013, at 3:35 PM, Scott Ford scott_j_f...@yahoo.com wrote:

 John,

 For example, what are Assembler no nos in performance ...I am trying to put 
 you 'on the spot' ,
 I am curious and responsible person, so I would like to know

 Best Regards,

 Scott ford
 www.identityforge.com
 from my IPAD

 'Infinite wisdom through infinite means'


 On Apr 17, 2013, at 2:08 PM, John Ehrman ehr...@us.ibm.com wrote:

 Performance concerns about individual instructions aren't worth much
 effort. Things like operand alignment, data and instruction cache
 retention, locality of reference, branch frequency etc. can have really
 significant effects.

 Remember that CPU speeds have increased much faster than memory speeds --
 getting an operand from cache can take a cycle or two, but from memory can
 take hundreds or thousands (try causing a page fault!).


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Ed Jaffe

On 4/17/2013 12:35 PM, Scott Ford wrote:

For example, what are Assembler no nos in performance ...I am trying to put you 
'on the spot' ,
I am curious and responsible person, so I would like to know


One example would be having a data area with various fields that are
frequently updated by multiple, simultaneous units of work. The cache
thrashing will eat your lunch. Instead, spread out the data so that
each unit of work has its own cache line to play with.
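
A sketch of what that spreading can look like in a mapping, assuming a
256-byte cache line and made-up field names:

WORKSLOT DSECT                        one slot per unit of work
WSCOUNT  DS    FD                     hot counter owned by this unit of work
WSFLAGS  DS    X                      hot flag byte owned by this unit of work
         DS    XL(256-(*-WORKSLOT))   pad so adjacent slots never share a line
WORKSLTL EQU   *-WORKSLOT             slot length = one full cache line

For the padding to pay off, the storage holding the array of slots would
itself have to start on a cache-line boundary.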

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread John Ehrman
Scott Ford asked:
 what are Assembler no nos in performance ...

Here are some examples (from my session 12522 talk at SHARE in San
Francisco):

1.  Memory speed is very slow compared to CPU speed -- for example, use
immediate operands wherever possible
2.  Operand alignment can be very important (doubleword alignment if
possible!)
3.  Don't mix instructions and data -- keep them far apart
4.  Modifying instructions on the fly is performance poison
5.  Minimize Address Generation Interlock (you can put other unrelated
instructions between these two at little or no cost because the CPU has to
wait until the first Load completes before it can execute the second)
   L     1,Pointer
   L     2,0(,1)
6.  Arrange branches so the fall through path is most frequent
7.  Keep data references close in memory and time
8.  Keep instruction references close in memory and time
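
A small illustration of item 5, with a hypothetical unrelated instruction
slotted between the two dependent loads (WORKAREA is a made-up operand; the
only requirement is that the filler not depend on register 1):

   L     1,Pointer
   LA    15,WORKAREA   unrelated work rides along during the AGI delay
   L     2,0(,1)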

That's a start, anyway.
John Ehrman

Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Paul Gilmartin
On 2013-04-17, at 14:59, John Ehrman wrote:

 6.  Arrange branches so the fall through path is most frequent

I understand also that unconditional branches are faster than
conditional branches.  So, which is faster:

 BNZ   LABEL       Branch most frequent
or:
 BZ    *+8         fall through most frequent
 B     LABEL       Unconditional

?

-- gil


Re: Good Performing Code (Was: Millicode Instructions)

2013-04-17 Thread Ott, Jeff
Haven't seen a timings table since the early 90's, and rather than show some 
gusto and code up a test, my official guess is: neither.

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Paul Gilmartin
Sent: Wednesday, April 17, 2013 6:19 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Good Performing Code (Was: Millicode Instructions)

On 2013-04-17, at 14:59, John Ehrman wrote:

 6.  Arrange branches so the fall through path is most frequent

I understand also that unconditional branches are faster than conditional 
branches.  So, which is faster:

 BNZ   LABEL       Branch most frequent
or:
 BZ    *+8         fall through most frequent
 B     LABEL       Unconditional

?

-- gil


Millicode Instructions

2013-04-16 Thread Gibney, Dave
 -Original Message-
 From: IBM Mainframe Assembler List [mailto:ASSEMBLER-
 l...@listserv.uga.edu] On Behalf Of John Gilmore
 Sent: Tuesday, April 16, 2013 12:29 PM
 To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
 Subject: Re: TRTE and new instructions

 Peter Farley's points are interesting ones.  My numbers tell a very
 different tale, and I suspect that these differences turn on when such
 measurements are taken.

 The first appearances of new instructions, millicoded ones anyway, do
 often exhibit 'bad' performance; but this performance sometimes, even
 usually, improves rapidly.

 Working with millicoded instructions has taught me two important
 lessons: Their performance is a moving target, and early measurements
 of it are usually misleading.  It often improves significantly in the
 interval that would be required to replace them with alternative
 sequences.

I don't get to work at this level often, but I am always interested.
How can Millicode be faster than the equivalent using the hardware instructions?
As I understand Millicode, that is really all it is (using the hardware 
instructions) plus any overhead in context switching to the Millicode 
environment.

For the MVC/MVCL option, I can imagine a macro which generates an MVC loop, or 
unroll the loop into a sequence of MVC, or generate the MVCL depending on 
several criteria. I currently don't have the knowledge to determine the 
criteria and I would expect the criteria to change over time.


 John Gilmore, Ashland, MA 01721 - USA


Re: Millicode Instructions

2013-04-16 Thread John Gilmore
Dave Gibney wrote:

begin extract
How can Millicode be faster than the equivalent using the hardware instructions?
As I understand Millicode, that is really all it is (using the
hardware instructions) plus any overhead in context switching to the
Millicode environment.
/end extract

This is a common misunderstanding that has unfortunately been repeated
many times.  It is a radically misleading caricature.  Millicode makes
available many facilities not available in the HLASM.  It does not
make additional machine instructions available, but it does make its
own powerful facilities for specifying the path of control among them
available.

I have always felt some impatience with this view.  If it were at all
accurate it would make millicode, which goes back to the System/390,
unimportant, even dispensable; and, while IBM is not infallible, it is
deeply serious about its hardware investments.


GIYF.  To begin see (watch wrap)

http://ecc.marist.edu/conf2011/materials/SlegelSystemZ_APeekUnderTheHood_Slegel_MaristECC.pdf.

John Gilmore, Ashland, MA 01721 - USA


Re: Millicode Instructions

2013-04-16 Thread Ed Jaffe

On 4/16/2013 12:43 PM, Gibney, Dave wrote:

I don't get to work at this level often, but I am always interested.
How can Millicode be faster than the equivalent using the hardware
instructions? As I understand Millicode, that is really all it is
(using the hardware instructions) plus any overhead in context
switching to the Millicode environment. For the MVC/MVCL option, I
can imagine a macro which generates an MVC loop, or unroll the loop
into a sequence of MVC, or generate the MVCL depending on several
criteria. I currently don't have the knowledge to determine the
criteria and I would expect the criteria to change over time


Some millicode instructions will outperform their PoOp-code counterparts
because millicode has access to hardware features not available to
ordinary code. For example, MVCL(E) has the ability to move data under
certain conditions without loading it into cache. (You can't do that
with looping MVC.) Millicode routines also have access to the MVCX
instruction which performs a variable-length MVC -- something ordinary
programs cannot do without using the EXecute instruction.
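
For reference, the EXecute idiom alluded to here looks roughly like this
(labels and registers are illustrative):

* R4 = source address, R5 = target address, R6 = length (1 to 256)
         BCTR  R6,0               MVC wants the length minus one
         EX    R6,VARMVC          supply the length at run time
*        ...
VARMVC   MVC   0(0,R5),0(R4)      EX fills in the length byte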

Furthermore, a millicode instruction is perceived by the architecture as
a single instruction. This allows millicode to do things that cannot be
simulated in ordinary code. For example, it would be impossible to write
a simulation of the PLO instruction.

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/


Re: Millicode Instructions

2013-04-16 Thread Scott Ford
Ed,
 
I want to ask a question: in this day and age, with all this processing power, is
it really worth being concerned about Assembler instruction speed? Unless there
is some application that is very time sensitive, that I understand.
 
 
Regards,

Scott J Ford
Software Engineer
http://www.identityforge.com/
 
 


 From: Ed Jaffe edja...@phoenixsoftware.com
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU 
Sent: Tuesday, April 16, 2013 6:13 PM
Subject: Re: Millicode Instructions
  

On 4/16/2013 12:43 PM, Gibney, Dave wrote:
 I don't get to work at this level often, but I am always interested.
 How can Millicode be faster than the equivalent using the hardware
 instructions? As I understand Millicode, that is really all it is
 (using the hardware instructions) plus any overhead in context
 switching to the Millicode environment. For the MVC/MVCL option, I
 can imagine a macro which generates an MVC loop, or unroll the loop
 into a sequence of MVC, or generate the MVCL depending on several
 criteria. I currently don't have the knowledge to determine the
 criteria and I would expect the criteria to change over time

Some millicode instructions will outperform their PoOp-code counterparts
because millicode has access to hardware features not available to
ordinary code. For example, MVCL(E) has the ability to move data under
certain conditions without loading it into cache. (You can't do that
with looping MVC.) Millicode routines also have access to the MVCX
instruction which performs a variable-length MVC -- something ordinary
programs cannot do without using the EXecute instruction.

Furthermore, a millicode instruction is perceived by the architecture as
a single instruction. This allows millicode to do things that cannot be
simulated in ordinary code. For example, it would be impossible to write
a simulation of the PLO instruction.

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/


Re: Millicode Instructions

2013-04-16 Thread John McKown
For us, yes. We pay most of our software based on MSU usage. My boss says
that one MSU reduction will save us $13,000/yr. Is this huge? To us, yes.
We must constantly fight the management belief that Windows is better!
Cheaper! faster! If some company could do a conversion with a 1 year ROI,
they would go full blast without any other consideration being looked at.
On Apr 16, 2013 5:56 PM, Scott Ford scott_j_f...@yahoo.com wrote:

 Ed,

 I want to ask a question, in this day/age and processing power is it
 really worth
 being concerned about Assembler instructions speed ? Unless there is some
 application that is very time sensitive, that I understand


 Regards,

 Scott J Ford
 Software Engineer
 http://www.identityforge.com/



 
  From: Ed Jaffe edja...@phoenixsoftware.com
 To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
 Sent: Tuesday, April 16, 2013 6:13 PM
 Subject: Re: Millicode Instructions


 On 4/16/2013 12:43 PM, Gibney, Dave wrote:
  I don't get to work at this level often, but I am always interested.
  How can Millicode be faster than the equivalent using the hardware
  instructions? As I understand Millicode, that is really all it is
  (using the hardware instructions) plus any overhead in context
  switching to the Millicode environment. For the MVC/MVCL option, I
  can imagine a macro which generates an MVC loop, or unroll the loop
  into a sequence of MVC, or generate the MVCL depending on several
  criteria. I currently don't have the knowledge to determine the
  criteria and I would expect the criteria to change over time

 Some millicode instructions will outperform their PoOp-code counterparts
 because millicode has access to hardware features not available to
 ordinary code. For example, MVCL(E) has the ability to move data under
 certain conditions without loading it into cache. (You can't do that
 with looping MVC.) Millicode routines also have access to the MVCX
 instruction which performs a variable-length MVC -- something ordinary
 programs cannot do without using the EXecute instruction.

 Furthermore, a millicode instruction is perceived by the architecture as
 a single instruction. This allows millicode to do things that cannot be
 simulated in ordinary code. For example, it would be impossible to write
 a simulation of the PLO instruction.

 --
 Edward E Jaffe
 Phoenix Software International, Inc
 831 Parkview Drive North
 El Segundo, CA 90245
 http://www.phoenixsoftware.com/



Good Performing Code (Was: Millicode Instructions)

2013-04-16 Thread Ed Jaffe

On 4/16/2013 3:55 PM, Scott Ford wrote:

I want to ask a question, in this day/age and processing power is it really 
worth
being concerned about Assembler instructions speed ?


I am not unbiased. My answer is exactly what one would expect from the
CTO of a software company that has been authoring far-better-performing
code since 1978. Am I proud of slides 67-74 in this SHARE presentation?
https://share.confex.com/share/120/webprogram/Handout/Session13319/%28E%29JES%20Update_SHARE%20120.pdf
You bet I am!

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/