One should note that Intel processors (and all modern processors) already do
a lot of code optimization. Out-of-order execution, branch prediction,
register renaming, etc. are all done to optimize code on the fly. The main
difference (in my limited understanding) between this and what Digital and
Transmeta did is that the Intel and AMD processors (which, ever since the
Pentium Pro/Pentium II generation, have been RISC cores with a translation
engine in front) are specifically designed to deal with IA32 code streams,
whereas the Digital/Transmeta approach is to optimize the code for a
non-native core (i.e., one without eax and all the headaches associated
with merged data/instruction memories).
Transmeta found that this was very hard to do. Digital did it as a feature
(and also had better performance in the Alphas to get away with it).
Between compiler optimization and processor optimizations, it's very hard to
do better.
Disclaimer: While I work for Intel, I'm not an architect, and most of my
computer-architecture knowledge comes from college classes (some time ago),
so don't take what I say as gospel (or even necessarily truthful).
.Geoff
-----Original Message-----
From: Kevin Lawton [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 08, 2000 3:38 PM
To: [EMAIL PROTECTED]
Subject: Re: Oddball idea on performance (FUTURE!!!)
Colin Davidson wrote:
>
> Hi All,
>
> Has anyone else noticed that Plex86 will have most of the necessary
> infrastructure to perform dynamic code optimization (like the Transmeta
> Crusoe or the HP Dynamo project)? Wouldn't it be neat if Guest OSes could
> run FASTER under Plex86 than they do stand-alone!
Yeah, this kind of thing was brought up some time ago. In general,
when you dynamically translate code (especially x86) performance
goes down, not up.
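
To put rough numbers on that, here is a minimal Python sketch of the
break-even arithmetic, assuming a simple "translate a block once, then run
it many times" model. The function name, the cost units, and the example
figures are all made up for illustration; none of them come from Plex86,
bochs, or any measurement.

```python
import math


def break_even_runs(translate_cost, native_per_run, translated_per_run):
    """Smallest number of executions of a code block at which
    translate-once-then-run is cheaper than running the original code.

    All costs are in the same arbitrary unit (say, cycles). Returns None
    when the translated code is not faster per run, since the one-time
    translation cost can then never be recovered.
    """
    saving = native_per_run - translated_per_run
    if saving <= 0:
        return None
    # DT wins when translate_cost + n * translated_per_run < n * native_per_run,
    # i.e. when n > translate_cost / saving.
    return math.floor(translate_cost / saving) + 1


# Hypothetical numbers: translating costs 1000 units, and each run drops
# from 10 units to 8 units afterwards.
print(break_even_runs(1000, 10, 8))
# Translated code that is slower per run never pays for itself.
print(break_even_runs(1000, 10, 12))
```

In this toy model, a block has to be hot enough to amortize its translation
cost, and that is before the translated code is even assumed to be faster
than the original.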
Look at Transmeta. According to web reviews, the performance of their
special hardware/software solution is lackluster compared
with the real iron from Intel or AMD. And we only have
software/software to draw from.
I looked into dynamic translation somewhat with bochs. I did a lot
of thinking about DT with respect to the x86 architecture. It's
quite a challenge. Methinks many people have fallen into this
tarpit, and will continue to.
Now, there may well be cases where DT code for virtualized ring0
guest code may be more efficient than SBE-controlled native code.
Will leave that for some day in the future, as you mention in
your subject line. I'm hoping that for ring3 code, we can allow
it to run without SBE control (by user option of course).
-Kevin