Pipelining and Dec Jupiter thoughts....

Chris Zach via cctalk Thu, 06 May 2021 19:35:32 -0700

Sort of.  But while a lot of things happen in parallel, out of order, speculatively, etc., the 
programming model exposed by the hardware still is the C sequential model.  A whole lot of logic is 
needed to create that appearance, and in fact you can see that all the way back in the CDC 6600 
"scoreboard" and "stunt box".  Some processors occasionally relax the 
software-visible order, which tends to cause bugs, create marketing issues, or both -- Alpha comes 
to mind as an example.


Interesting to see this.

I've been reading a lot recently about the Jupiter/Dolphin project, and the more I read the more I understand why it just could not be done. At the time (and to an extent even now) the only way to really improve a system's performance was to pipeline the processor, and the PDP-10 instruction set just wasn't easy to do that with.

They had a great concept: an instruction fetch/decode system (IBOX), an execution engine (EBOX), the obligatory vector processor or FPU (HBOX), and of course the memory system (MBOX). Break the process up into steps and have the parts all work in parallel to boost performance.

Unfortunately they started to find way too many cases where an indirect instruction would be fetched that would be based on the AC, which was being changed by another instruction in the EBOX. This would blow out all the prefetched work in the pipe, forcing the IBOX to do a costly reload.
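
To make that hazard concrete, here is a rough Python sketch. It is a toy model, not a description of the actual Jupiter hardware: the Instr class, the count_flushes function, the depth parameter and the four-instruction program are all made up for illustration. It counts how often an instruction's address calculation reads an AC that an earlier instruction, still somewhere in the pipe, is going to write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical, heavily simplified view of one instruction."""
    name: str
    writes_ac: Optional[int] = None  # AC modified when the instruction executes
    index_ac: Optional[int] = None   # AC read while computing the effective address


def count_flushes(program, depth=3):
    """Count instructions whose address calculation depends on an AC
    written by one of the previous `depth` instructions.

    Assumption: those previous instructions are still in flight, so the
    prefetched work is stale and has to be thrown away and redone.
    """
    flushes = 0
    for i, ins in enumerate(program):
        if ins.index_ac is None:
            continue
        in_flight = program[max(0, i - depth):i]
        if any(prev.writes_ac == ins.index_ac for prev in in_flight):
            flushes += 1
    return flushes


# Illustrative PDP-10 style mnemonics; only the AC fields matter here.
program = [
    Instr("MOVEI 1,100", writes_ac=1),
    Instr("ADD 2,@0(1)", writes_ac=2, index_ac=1),  # indexes through AC1
    Instr("MOVE 3,200", writes_ac=3),
    Instr("JRST @0(2)", index_ac=2),                # indexes through AC2
]

print(count_flushes(program))  # prints 2
```

With depth=1 the same program reports only one hazard, since the JRST is then far enough behind the ADD. A real design could stall or forward the value instead of flushing, so treat the count as a way to see the dependency, not as a performance estimate.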

Likewise, branch prediction couldn't be done well because most branches and skips depended on the value in the AC, which was once again usually being modified in the EBOX down the pipe. As soon as it was modified the pipe had to be flushed and reloaded. It looks like they tried to put that logic into the IBOX to catch these issues, but that resulted in a flat processor that wasn't going to benefit from any parallelism, an endless series of bugs, and an IBOX that was pretty much running with its own EBOX.

It got worse when they realized that the extended memory segments in the 2060 architecture totally wrecked the concept of an instruction decoder/execution box. There were just too many places where an indirect instruction to another section, which was then based on the ACs, would result in the IBOX tossing the queue and invalidating the translation buffers. Increasing the translation buffer helped (I think that's one of the things they did on the final 2065 to make it faster) but they couldn't make it big enough and fast enough. I guess an indirect jump instruction based on comparing the AC to an indirect address pointing to an extended segment would be enough to make any decoder just cry.

It's sad to read; you can almost see them realizing it was doomed. The Foonly F1 was a screamer, but it was basically the KA10 instruction set and couldn't run extended memory segments like the 2060. And when they tried to do the same thing with the F4 it came out to be a little slower than a 2060. I used to think they put only one extended segment in the 2020 to cripple the box, but maybe they started running into the same problem and ran out of microcode space to try and address it.

Couple this with the fact that much of the 20 series programs were built in assembler (and why not, it was an amazing thing to program) and you just had too many programs with cool bespoke code that would totally trash a pipeline. Fixing compilers to order instructions properly could have worked, but since people just wrote in assembler it wasn't going to happen, and they weren't about to re-code their app to please the new scheduler God.

The VAX instruction set was a lot less beautiful, but could be pipelined more easily, especially with the dedicated MMU, so they took the people and pipelined the hell out of the 780, resulting in the nifty 8600/8650 and later the 8800s. DEC learned their lesson when they built Alpha, and even Intel realized that their instruction set needed to be pipelined for the Pentium Pro and above processors.

Ah well. I don't think it was evil marketing or VAX monsters that killed the KC10; it was simply the fact that the amazing instruction set couldn't be pipelined to make it more efficient for hardware, and the memory management system wasn't as efficient as the PDP-11/VAX MMU concept.
