m42tom-ibmm...@yahoo.com (Tom Marchant) writes:
> To look at it another way, cache exists because main storage is very
> slow compared to the processor speed.  Without cache, the processor
> would not be able to execute instructions nearly as fast as it could.
> Cache allows data from main storage to be kept very close to the
> processor in extremely fast memory, allowing the processor to execute
> instructions as fast as possible.

there have been observations that the latency of a cache miss (elapsed
time to retrieve data from main storage), measured in "processor
cycles", is on the order of 60s disk access latency ... measured in 60s
processor cycles. the effort in the 60s to improve throughput was to
have multitasking and/or multithreading ... being able to switch to
some other work ... while waiting for disk accesses.

a lot of work was done in this area starting in the 80s ... especially
for RISC processors, on out-of-order execution and speculative
execution ... allowing execution of other instructions (that had their
data in cache) ... while a "stalled" instruction was waiting on a
cache miss. The equivalent of the 60s approach: not simply trying to
make infinitely large storage as a countermeasure to serialized miss
latency, but using multiprogramming to be able to switch to something
else while waiting.

there was work on hyperthreading ... independent instruction streams
feeding common execution units ... so that while one instruction stream
was stalled (waiting on a cache miss), there could be instruction
execution from another, independent instruction stream ... basically
simulating multiple processors ... but w/o actually doubling all of the
hardware.

possibly one of the original hyperthreading efforts was for the 370/195
... although the effort never actually shipped. The 370/195 was
pipelined and allowed out-of-order execution ... but didn't have
speculative execution or branch prediction ... so a conditional branch
stalled the pipeline. Peak throughput of the 195 was approx. 10mips ...
but most codes only got 5mips because of an abundance of conditional
branches. The 195 hyperthreading
effort was to simulate multiprocessing with two instruction streams,
PSWs, registers, etc ... but not twice the hardware (instructions in
pipeline would have one flag bit indicating which instruction stream it
was associated with). Two (simulated multiprocessor, independent)
instruction streams ... each executing in the pipeline at 5mips (because
of stall waiting for conditional branches) ... would be able to keep the
execution units operating at effective throughput of 10mips.
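The 5mips/10mips arithmetic can be seen in a toy issue model (a sketch
with made-up numbers, not the actual 195 pipeline): each stream stalls
one cycle after every instruction (the conditional branch resolving),
and each cycle the shared execution units issue one instruction from
whichever stream is ready.

```python
def simulate(num_streams, cycles=10_000, stall_cycles=1):
    # toy model: after issuing an instruction a stream stalls for
    # stall_cycles (e.g. a conditional branch resolving); each cycle
    # the shared execution units issue from any one ready stream
    ready_at = [0] * num_streams  # cycle at which each stream can issue again
    issued = 0
    for cycle in range(cycles):
        for s in range(num_streams):
            if ready_at[s] <= cycle:
                issued += 1
                ready_at[s] = cycle + 1 + stall_cycles
                break  # one issue slot per cycle
    return issued / cycles

print(simulate(1))  # 0.5 instructions/cycle ... the "5mips" case
print(simulate(2))  # 1.0 instructions/cycle ... the "10mips" case
```

i.e. one stream leaves the execution units idle half the time; a second
independent stream (the "flag bit" distinguishing them) fills the idle
slots.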

One of the characteristics of the current and past several generations
of "x86" ("CISC") chips ... is that they are actually RISC chips ...
with a hardware layer translating the CISC instructions into RISC
micro-ops for actual execution. This has significantly closed the MIP
throughput gap between CISC and traditional RISC.

The current generation of chips have cache sizes larger than 60s
processor memory sizes. However, the relative performance penalty of a
cache miss today is about the same as that of a virtual memory page
fault in the 60s. Some applications today are tuned to maintain their
"working set" in cache (and minimize cache misses) ... in much the same
way that virtual memory apps in the 60s were tuned to maintain their
working sets in real storage (and minimize page faults).
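The working-set effect can be illustrated with a toy direct-mapped
cache model (the cache geometry and array size here are arbitrary
assumptions, not any particular chip): traversing an array in its
memory-layout order touches each cache line once, while traversing it
against the grain misses on every access.

```python
def cache_misses(addresses, num_lines=64, words_per_line=16):
    # tiny direct-mapped cache model: count misses over a sequence
    # of word addresses (geometry is an arbitrary assumption)
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        line = addr // words_per_line
        idx = line % num_lines
        if tags[idx] != line:
            tags[idx] = line
            misses += 1
    return misses

N = 256  # N x N array of words, row-major layout
row_order = [i * N + j for i in range(N) for j in range(N)]
col_order = [i * N + j for j in range(N) for i in range(N)]

print(cache_misses(row_order))  # 4096 ... one miss per 16-word line
print(cache_misses(col_order))  # 65536 ... every access misses
```

same data, same amount of work ... a 16x difference in misses purely
from keeping the working set cache-resident, which is the modern analog
of the 60s program re-organization for page faults.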

The science center ... some past posts
http://www.garlic.com/~lynn/subtopic.html#545tech

besides having done a lot of virtual memory work with (virtual machine)
cp67 in the 60s and early 70s ... also did a lot of work with
performance monitoring, performance simulation, and workload profiling
(some of which then evolved into capacity planning). The science center
also did a lot of paging algorithm and paging simulation work. One such
effort was full instruction tracing, fed into a paging simulator ...
which also had support for doing semi-automatic program re-organization
to optimize operation in a virtual memory environment. This was
eventually released as a product called VS/Repack in 1976. However,
even before it was
released, internally, a lot of products made extensive use of it to
improve their operations ... including a lot of OS/360 applications,
subsystems, and products making the transition to "virtual storage"
environment.

Some of the high-use, performance sensitive applications do something
similar today ... but from the standpoint of improved throughput in a
processor cache environment (i.e. today's cache has become the 60s real
storage for 60s virtual memory page fault systems)

For a totally different take ... there have been some current
high-throughput processors done w/o caches ... but with something like
128 "hyper-threads", aka 128 independent instruction streams (simulated
multiprocessors) ... so that while one thread's execution is stalled
waiting for data from memory ... the hardware is able to switch to some
thread that has an instruction with its data ready for execution (think
of it as hardware multiprogramming with hardware WAIT/POST and its own
dispatch/scheduling)
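The same toy issue model, scaled up, shows why so many threads are
needed (the 100-cycle latency is an illustrative assumption, not any
particular machine): with one thread the pipeline idles through the
whole memory latency; with more threads than latency cycles, some
thread always has its data ready.

```python
def busy_fraction(num_threads, mem_latency=100, cycles=100_000):
    # toy barrel-processor model: every instruction is a memory
    # reference that stalls its thread for mem_latency cycles;
    # each cycle the hardware issues from any one ready thread
    # (hardware dispatching instead of software multiprogramming)
    ready_at = [0] * num_threads
    busy = 0
    for cycle in range(cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1
                ready_at[t] = cycle + 1 + mem_latency
                break  # one issue per cycle
    return busy / cycles

print(busy_fraction(1))    # ~0.01 ... busy about 1% of cycles
print(busy_fraction(128))  # 1.0 ... enough threads to cover the latency
```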

-- 
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN