Wow, thanks for that excellent analysis of the problem.  On the point
about the data and OS calls: You could potentially get around that by
also converting the system-call machine code into byte-code, in
which case your program may end up emulating some of the OS (which may
fiddle with interrupts and other things you are usually not supposed to
touch).  In the extreme case you could convert the entire OS to byte
code.  However, it's not a very practical thing to do.  As you noted, it
would probably be better to try to figure out what to do with the system
calls...in which case you could probably use one of the higher-level
languages that do this sort of thing (convert a disk access system call
into a perl or python call for the same thing).   The many-to-one
problem would probably involve a lot of special cases...possibly too
many to actually be implementable.  Very fun to think about though.  :)
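
To make the "convert a system call into a python call" idea a bit more
concrete, here is a minimal sketch.  Everything in it is an assumption
for illustration: the table SYSCALL_SHIMS, the function emulate_syscall,
and the demo are hypothetical names, and a real translator would need
many more entries and much more care about arguments and return values.

```python
import os
import tempfile

# Hypothetical table: native system-call name -> Python callable with
# roughly the same meaning.  Only a few calls are covered on purpose.
SYSCALL_SHIMS = {
    "fstat": lambda fd: os.fstat(fd),
    "getpid": lambda: os.getpid(),
    "unlink": lambda path: os.unlink(path),
}


def emulate_syscall(name, *args):
    """Dispatch a recovered system call to its portable stand-in."""
    try:
        shim = SYSCALL_SHIMS[name]
    except KeyError:
        # This is the "special cases" problem: anything we have no
        # portable equivalent for has to be handled some other way.
        raise NotImplementedError(f"no portable shim for system call {name!r}")
    return shim(*args)


def demo():
    """Write five bytes to a temp file and ask the shim for its size."""
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        return emulate_syscall("fstat", f.fileno()).st_size
```

Anything that is not in the table (the interrupt fiddling, say) simply
raises NotImplementedError, which is where I suspect most of the real
work would end up.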

-Paul B  

> -----Original Message-----
> From: Bryan C. Warnock [mailto:[EMAIL PROTECTED]] 
> Sent: Thursday, January 03, 2002 11:11 PM
> To: Paul Baranowski
> Cc: [EMAIL PROTECTED]
> Subject: Re: Automatic porting with register-based VMs?
> 
> 
> On Thursday 03 January 2002 08:40 pm, Paul Baranowski wrote:
> > Hi -
> > I love what you guys are doing with Parrot.  I was just recently
> > wondering if it would be possible to transform a program compiled down
> > to machine language into byte-code, thereby automatically porting any
> > app to any other machine (at least any statically-compiled app).  Does
> > anyone see any technical problems in doing this?
> 
> Let's tackle the difficulties in order.  (I apologize if this will seem
> flippant.  Not the intention.)
> 
> First, we need to be able to read the program format: ELF, DWARF, .exe,
> what-have-you.  This is relatively clear-cut - a lot of metadata, your
> different segments, etc.  Some of the info may not be pertinent, some may
> not map well.  We'll assume we find a valid use for what we have, and that
> we can produce anything we need.  (After all, it had to have been
> producible at some point.)
> 
> Next, we need to understand the program inside the program - the opcode
> instructions.  This is mostly straightforward.  This code is this
> instruction, these two are its arguments (which are these), so on and so
> forth.  We should be able to figure out what operations are occurring on
> what data with relative ease.
> 
> Now, we have to figure out the semantics of those operations.  Some of them
> are quite obvious - when we divide an integer by an integer, we are going to
> get an integer back.  Some aren't.  When we make the system call "fstat",
> for instance, what exactly does that mean?
> 
> Plus, we have to figure out what the data means.  Is this a path?  Do we
> need to change the path delimiter?  Does the path refer to an absolute
> location - /home - or some effective location - the home directories?  It
> is largely up to the programmer to provide portable, consistent data, so
> we shall not address it.  Data is *always* the scourge of portability.
> 
> Let's assume that we've got some reasonable results from the above steps,
> and we're ready to port to our byte-code model.  Working in reverse order,
> we need to make sure that we can reproduce the semantics of the program.
> Does this mean that we need to call the system call "fstat", and use
> whatever it decides to return?  What if it doesn't exist?  What about
> interpreting the results so that the remaining semantics stay true to the
> original program's intent?  Does it mean that we need to convert to bytecode
> that retrieves the stat info, no matter how that may be, so that we may
> preserve the semantics directly?  What of information that may not be
> applicable from one platform to the next?
> 
> It's safe to assume that opcodes are not going to map one-to-one.  So we now
> need to take the opcode stream, with all its arguments and semantics, and
> map them somehow onto bytecode.  This part should be relatively
> straightforward, assuming that everything is mappable.  However, there are
> bound to be some sticky parts.  What about op-and-a-half ops?  (Bytecode
> that does slightly more than one native op, and slightly less than two.)
> Do argument sizes need to come into play?
> 
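
(Inline, on the mapping step: a minimal sketch of what a table-driven
translation might look like.  The native op names and the bytecode op
names here are made up for illustration; they are neither real x86 nor
real Parrot ops, and TRANSLATIONS and translate are hypothetical names.)

```python
# Hypothetical native op -> function returning a list of bytecode ops.
# One native op may expand to several bytecode ops (one-to-many).
TRANSLATIONS = {
    "mov": lambda dst, src: [("set", dst, src)],
    "add": lambda dst, src: [("add", dst, dst, src)],
    # No single bytecode op swaps two registers in this toy model,
    # so the swap goes through a scratch register named "tmp".
    "xchg": lambda a, b: [("set", "tmp", a), ("set", a, b), ("set", b, "tmp")],
}


def translate(native_program):
    """Map a list of (op, *args) tuples onto a flat list of bytecode ops."""
    bytecode = []
    for op, *args in native_program:
        if op not in TRANSLATIONS:
            # The "assuming that everything is mappable" caveat.
            raise ValueError(f"cannot map native op {op!r}")
        bytecode.extend(TRANSLATIONS[op](*args))
    return bytecode
```

Going the other way (several native ops collapsing into one bytecode op)
would need pattern matching over the op stream rather than a flat
lookup, which I have not attempted here.
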
> Now, you need to reassemble your mapped bytecode to a format the interpreter
> can read.  Maybe some fixup stuff.  Maybe just some packing things in there.
> Presto.  If you're successful, you've a portable program.
> 
> But let's think about this a little further.  If you're able to deconstruct
> problems 1, 2, and 3 for a particular platform, should you be able to
> construct the reverse for the same?  It shouldn't be an issue of one-to-many
> mappings not being invertible, because you don't need to find *the* original
> program, simply one of the many.  The problems are exactly as described for
> the virtual machine - it is, after all, a machine.
> 
> This is basically what machine emulation is - it's what allows me to play my
> old Commodore 64 favorites on my Linux box.  Or Amiga.  Or Atari 2600.  Or
> Atari ST.  An example of reducing many different frontends to a single
> backend.
> 
> Of course, solving it for bytecode requires a single solution, whereas
> solving it for multiple platforms requires multiple solutions.  But this is
> exactly what GCC does from its intermediate format to the eventual platform
> binary that it produces.
> 
> The obvious conclusion, then, is why limit yourself to an interpreted
> bytecode stream, when you can have native speed and still be portable?
> That, in turn, raises the question, why isn't it being done?
> 
> The obvious answer is that we do, but that we just take shortcuts.  Since
> steps one through four would be so much simpler if the platforms were
> similar (or identical), we create our own virtual platform, with its own
> program format and semantics - source code in a standardized language.
> 
> The other obvious answer is that we don't, because it's just too difficult,
> if not impossible, to do, and do well.  So is it worth the effort to pursue?
> 
> -- 
> Bryan C. Warnock
> [EMAIL PROTECTED]
> 
