The following message is a courtesy copy of an article that has been posted to bit.listserv.ibm-main,alt.folklore.computers as well.
[EMAIL PROTECTED] (Tom Schmidt) writes:
> I've been running VM more off than on since PLC 5 and I'm certain that
> the behavior that I referenced WAS in VM... at some point. But if you
> & Lynn Wheeler say it isn't there now, I'll believe you (unless/until
> I can prove you wrong, of course).
>
> But I know back in the VM/HPO or (maybe) early VM/XA days it was true
> that VM put itself into a tiny loop while it waited for work. The
> loop was in a unique-to-VM PSW key so that the hardware monitor (the
> "speedometer") could tell the difference between work and wait.

there were a number of specific environment experiments done in that time frame ... for one reason or another. one of the first was for acp/tpf on 3081. acp/tpf didn't have multiprocessor support ... and there wasn't going to be a non-multiprocessor machine.

normally, to simulate a privileged instruction (one not handled by the microcode) ... interrupt into the vm kernel, do the simulation, and return to the virtual machine. this "overhead" tends to be constant from run to run ... directly part of doing work for the virtual machine. over the years, attempts were made to get this as small as possible and/or have it done directly in the hardware of the machine.

the other overhead is the cache/paging scenario ... fault for a page not in memory, and there is overhead to bring the page into memory. this is analogous to a cache miss ... the program appears to execute slower because of the latency to process the cache miss. this can be variable based on other activity going on in the real machine (for analogous reasons in both cache misses and page faults).

in the acp/tpf scenario ... if essentially just about the only workload was acp/tpf ... the 2nd 3081 processor would be idle. so there was a hack developed for things like SIOF emulation ... interrupt into the kernel, create an asynchronous task for the SIOF emulation, SIGP the other processor, and return to the acp/tpf virtual machine.
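the SIOF hack can be sketched as follows ... a minimal, hypothetical model (not actual VM kernel code; all names are made up), with a worker thread standing in for the SIGP'd second processor: the privileged-op intercept queues the CCW-translation work and returns to the guest immediately, while the otherwise-idle processor completes it asynchronously.

```python
# Sketch of offloading SIOF emulation to an otherwise-idle 2nd processor.
# The worker thread stands in for the SIGP'd 3081 CPU; the queue stands in
# for the asynchronous task handed to it.  Hypothetical, illustrative only.
import queue
import threading

work_queue = queue.Queue()   # "async tasks" signaled to the other processor
completed = []               # stands in for pending simulated I/O interrupts

def second_processor():
    """The otherwise-idle 2nd CPU: performs CCW translation asynchronously."""
    while True:
        ccws = work_queue.get()
        if ccws is None:                           # shutdown sentinel
            break
        translated = [op.upper() for op in ccws]   # placeholder for translation
        completed.append(translated)               # would present an I/O interrupt
        work_queue.task_done()

def simulate_siof(ccws):
    """Privileged-op intercept: queue the work, return to the guest at once."""
    work_queue.put(ccws)       # analogous to create-async-task + SIGP

worker = threading.Thread(target=second_processor, daemon=True)
worker.start()

simulate_siof(["read", "tic", "write"])   # guest issues SIOF; control returns
work_queue.join()                         # wait for the "other processor"
work_queue.put(None)
worker.join()
print(completed)                          # [['READ', 'TIC', 'WRITE']]
```

the point of the trade-off is visible even in the toy: the intercept path itself got longer (queueing, signaling, locking), but the expensive part runs concurrently with the guest when the second processor has nothing better to do.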
Creation of the asynchronous task, signaling the other processor, taking the interrupts, plus misc. multiprocessing locking/unlocking drove up total avg "overhead" by ten to fifteen percent. However, the SIOF and ccw translation offloaded to run asynchronously on the idle 3081 processor resulted in a net thruput benefit for the single acp/tpf scenario. The "problem" was that the implementation drove up the total avg "overhead" by ten to fifteen percent for every customer running VM on a multiprocessor ... even those where the other processors weren't idle.

For pure topic drift ... there is something analogous going on in the current environment with multi-core processors being introduced into the desktop/laptop (personal) computing environment.

Eventually, a 3081 with the 2nd processor removed was announced as the 3083 (for acp/tpf customers). Since the 3081 still had the cross-cache chatter 10 percent cycle slowdown scenario i've described for 370 multiprocessors ... they were able to run the single 3081 (aka 3083) processor nearly 15 percent faster. even later still, acp/tpf eventually supported multiprocessors.

"Active wait" was another such experiment ... where a specific hardware configuration and workload gained a couple percent if the system effectively polled for something to do.

from long ago and far away:

To: wheeler
DATE: 04/19/85 20:58:47

On 4/10/85 xxxxxx presented his latest results to management and others and I thought you might be interested to hear how we stack up against HPO. These are runs of VM/XA SF1 (which is Mig. Aid releases 3 and 4 rolled up into one package now), with about 2K LOC of enhancements to boost the performance. The enhancements include processor-local true runlists and "active wait", with a master-only runlist also. They also include a significant rework of the drum paging code and a rework of the SSKE code (for non-resident pages only?). And other things which I just forget now. All these things collectively saved a whole lot of execution time.
As a result, SF1 now can handle 80% of the number of CMS users that HPO can handle, whereas earlier it was only about 60% as many as HPO.

... snip ...

now, the HPO base they are referring to still had the 10-15 percent multiprocessor "penalty" that had been introduced for the acp/tpf environment. There was also a list of a dozen or so other carefully chosen workload and configuration items to try and weight the comparison in VM/XA SF1's favor (a truly trivial CMS workload, trivial paging activity, a homogeneous well-behaved CMS workload ... but lots of them). I don't remember the exact VM/XA SF1 processor-cycle trade-off for "active wait" vis-a-vis actually being in wait state (and VM/XA was a totally different implementation from VM/HPO).

The "active wait" was along the same lines as the "delayed queue drop" fix from the same era. There was a bug in identifying "idle" activity and dropping "idle" tasks from the active queue (decommitting resources) ... for some virtual machines under some circumstances. Rather than fix the bug ... whenever something was identified as idle, a couple-hundred-millisecond delay was introduced before the actual "queue drop" (for everybody). Under specific configuration & workload conditions, systems showed improved thruput with the "delayed queue drop" fix ... but it could make thruput in other environments worse. The actual solution should have been to fix the underlying bug ... rather than layering fixes on top.
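the "active wait" trade-off can be sketched in miniature ... a hypothetical model (all names invented, nothing to do with the actual VM/XA SF1 code): when the runlist is empty, the dispatcher polls for newly-ready work instead of loading a wait-state PSW. polling burns cycles, but avoids the wake-up latency of taking an interrupt out of wait state ... a win only when those cycles would have been idle anyway.

```python
# Sketch of an "active wait" dispatcher: spin polling for work rather than
# entering a true wait state.  Hypothetical, illustrative only.
import collections

runlist = collections.deque()

def active_wait_dispatch(new_work_check, max_polls=1_000_000):
    """Poll for work; give up after max_polls (i.e. take a real wait state)."""
    polls = 0
    while not runlist:
        runlist.extend(new_work_check())   # poll: any newly-ready tasks?
        polls += 1
        if polls >= max_polls:
            return None, polls             # nothing found: enter true wait state
    return runlist.popleft(), polls

# toy work source: work becomes ready on the third poll
ready_at = {3}
tick = 0
def check():
    global tick
    tick += 1
    return ["task-A"] if tick in ready_at else []

task, polls = active_wait_dispatch(check)
print(task, polls)   # task-A 3
```

as with the SIOF hack, whether this helps depends entirely on configuration and workload ... the polled cycles are pure overhead on a machine that has other work to run.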
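the "delayed queue drop" fix amounts to adding hysteresis in front of the drop decision ... a minimal, hypothetical sketch (names and the class are invented, not actual VM code): instead of decommitting a virtual machine's resources the instant it looks idle, wait a couple hundred milliseconds; if it wakes up within the window, the (possibly mistaken) idle determination never takes effect.

```python
# Sketch of "delayed queue drop": a grace period between detecting idle
# and actually dropping a task from the active queue.  Hypothetical,
# illustrative only; DROP_DELAY stands in for the couple-hundred-ms delay.
DROP_DELAY = 0.2   # seconds of grace before the actual queue drop

class VirtualMachine:
    def __init__(self, name):
        self.name = name
        self.in_queue = True       # on the active queue, resources committed
        self.idle_since = None     # start of the current idle grace period

    def mark_idle(self, now):
        self.idle_since = now      # start the grace period; don't drop yet

    def mark_active(self):
        self.idle_since = None     # woke up in time: cancel the pending drop

    def maybe_drop(self, now):
        """Drop from the active queue only after the delay has elapsed."""
        if self.idle_since is not None and now - self.idle_since >= DROP_DELAY:
            self.in_queue = False  # decommit resources (the queue drop)

vm = VirtualMachine("CMSUSER1")
vm.mark_idle(0.0)
vm.maybe_drop(0.05)    # 50 ms later: still within the grace period
print(vm.in_queue)     # True
vm.maybe_drop(0.25)    # 250 ms later: really idle, drop it
print(vm.in_queue)     # False
```

the sketch also shows why this was a workaround rather than a fix: every virtual machine pays the delay (resources stay committed an extra 200 ms), whether or not it was one of the cases the idle-detection bug misclassified.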

