> On Jun 11, 2024, at 11:52 PM, ben via cctalk <cctalk@classiccmp.org> wrote:
>
> On 2024-06-10 10:18 a.m., Joshua Rice via cctalk wrote:
>> On 10/06/2024 05:54, dwight via cctalk wrote:
>>> No one is mentioning multiple processors on a single die and cache that is
>>> bigger than most systems of that times complete RAM.
>>> Clock speed was dealt with clever register reassignment, pipelining and
>>> prediction.
>>> Dwight
>> Pipelining has always been a double-edged sword. Splitting the instruction
>> cycle into smaller, faster chunks that can run simultaneously is a great
>> idea, but as pipelines deepen and instruction latency grows, failed branch
>> predictions and the pipeline flushes they trigger can badly drag down
>> real-world instructions per second. This is ultimately what made the
>> NetBurst architecture the dead end it became.
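The mispredict cost is easy to put rough numbers on. Here's a back-of-the-envelope model (my own sketch; all the numbers below are invented for illustration, not measured from any real chip):

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction, counting pipeline flushes.

    Each mispredicted branch wastes roughly one pipeline's worth of
    cycles (flush_penalty), amortized over all instructions.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# Assume 20% of instructions are branches and 5% of those mispredict.
# A deep NetBurst-style pipe (~20-cycle flush) vs. a shorter ~10-cycle one:
deep  = effective_cpi(1.0, 0.20, 0.05, 20)   # -> 1.2
short = effective_cpi(1.0, 0.20, 0.05, 10)   # -> 1.1
```

Doubling the flush penalty turns a 10% overhead into 20%; that per-flush cost growing with pipe depth is exactly why very deep pipes can lose in delivered IPS even while winning on clock rate.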
>
> The other gotcha with pipelining is that you have to have equal-size chunks.
> A 16-word register file seems to be the right size for a 16-bit ALU,
> 64 words for a 32-bit ALU, 256 words for a 64-bit ALU,
> as a guess.
Huh? There is no direct connection between word length, register count, and
pipeline length.
The natural pipeline length (for a given functional unit) is the number of
steps needed to do the work, given a step that can be completed in a single
clock cycle. That assumes a pipe that long is affordable; if not it gets
shorter. Not all functional units will have the same pipeline length.
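To make that concrete with a toy model (my numbers, purely illustrative): if a unit's total work is some number of gate delays and the clock period accommodates a fixed number of them, the natural depth is just the ceiling of the ratio, and it naturally differs per unit:

```python
import math

def pipe_depth(work_delays, delays_per_cycle):
    """Natural pipeline depth: total gate delays for the operation,
    divided by how many gate delays fit in one clock cycle."""
    return math.ceil(work_delays / delays_per_cycle)

# Invented figures: a multiply needing 12 gate delays and an add needing 5,
# with a cycle long enough for 4 gate delays of logic (plus latch overhead):
pipe_depth(12, 4)   # -> 3-stage multiplier
pipe_depth(5, 4)    # -> 2-stage adder
```

Nothing in this arithmetic involves the word length or the register count, which is the point above.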
The register count is a function of cost -- for the registers themselves and
for the scoreboard logic to sort out register conflicts. In modern designs
that cost is die area; in older machines it was modules or discrete
transistors. For example, in the CDC 6600, the registers (8 x 60-bit data,
8 x 18-bit address, 8 x 18-bit index/count) and their associated data path
controls to/from all the functional units take up an entire chassis, 750-ish
logic modules.
> You never see gate-level delays on a spec sheet.
> Our pipeline is X delays + N delays for a latch.
Gate-level delays are not something the machine's user needs to know. What is
interesting is the detailed behavior of the pipelines: whether a unit can
accept a new operation every cycle or only every N cycles (say, a multiplier
that accepts operands every 2 cycles); the latency in cycles from input to
output; and whether there are "bypass" data paths to reduce the delays from
input or output conflicts. Often these details are hard to pry out of the
manufacturer, and often they are not documented in the standard data sheets
or processor user manuals. But they are critical if you want to do work such
as building pipeline models to drive compiler optimizers.
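A pipeline model of that sort boils down to two numbers per functional unit: the latency and the initiation interval. A minimal sketch of how a compiler's list scheduler might consume them (unit names and figures are all hypothetical, not from any real machine):

```python
# Per-unit pipeline model: latency = cycles from input to output;
# interval = a new operation is accepted every N cycles.
PIPE = {
    "add": {"latency": 1, "interval": 1},
    "mul": {"latency": 3, "interval": 2},  # accepts operands every 2 cycles
}

def schedule(ops):
    """Greedy issue: each op waits for its operands to be ready and for
    its functional unit to accept a new operation.

    ops is a list of (dest, unit, sources) tuples in program order;
    returns the cycle at which each result becomes available.
    """
    ready = {}       # value name -> cycle its result is available
    unit_free = {}   # unit -> first cycle it accepts another op
    for dest, unit, srcs in ops:
        p = PIPE[unit]
        start = max([unit_free.get(unit, 0)] +
                    [ready.get(s, 0) for s in srcs])
        unit_free[unit] = start + p["interval"]
        ready[dest] = start + p["latency"]
    return ready

# t0 = a*b; t1 = c*d; t2 = t0+t1
schedule([("t0", "mul", ["a", "b"]),
          ("t1", "mul", ["c", "d"]),
          ("t2", "add", ["t0", "t1"])])
# -> {'t0': 3, 't1': 5, 't2': 6}
```

The second multiply stalls to cycle 2 on the multiplier's initiation interval, not on data; that distinction between structural and data hazards is exactly what the scheduler needs the undocumented numbers for.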
paul