On Fri, 25 May 2001 03:54:31 -0700, Willow Schlanger wrote:
>
>NOTE: I write very poorly when my sleep schedule is all messed up. It's,
>like, 2:40 AM here so my grammer will be a tad queer here. Also this is
>a tad off topic; it has to do with a subproject of plex86 (and an
>unofficial one, e.g. it will not be integrated into the main plex86
>sources until _after_ it's working and _if_ Kevin agrees to it as a
>plugin in the main distro).
>
>Wouldn't it be nifty if all Plex86 guests communicated via TCP/IP? Then
>one guest could be running on the same PC you're using; other guests
>could be running on another PC; still others could be running on a PC
>using MultiPlex86, a miniture Linux kernel that uses only legacy
>hardware, has no display or keyboard support (in the kernel), but
>provides just enough to run Plex86. Thus the plex86 client on that PC
>can do hardware I/O that would normally be reserved to the Linux kernel.
>That hardware I/O would be noted, but would "fall thru" to the real
>hardware unless its a certain device that must be emulated (e.g. we
>can't let it know about the actual network card since we're using it;
>unless of course it knows about an emulated version of the card).
>
>---
> WARNING THIS IS OFF TOPIC AND RAMBLING BUT TECHNICAL AND PERHAPS YOU'LL
>FIND IT INTERSTING
>
>The approach of Plex86 has been to look at code as it executes and to
>convert or emulate the things that can't run natively and to run (most
>of) the rest natively.
>
>Wouldn't it be nifty if we could provide some kind of "hint mechanism"
>that would take advantage of the target architecture.
>
>E.g. take a Windows EXE program. It's fairly predictable: we can scan it
>_before_ executing it, considering all routes. Only hardware I/O or
>advanced things would be decided at the time what to do.
>
>If we mapped a Windows EXE out like that, then we could find out which
>parts of the code refers to other parts of the code (like Sourcer does).
>
>Then we could actually "staticly translate" it so that we'd _see_ what's
>going to run at CPL 0 and _translate_ it, in advance, to code that we
>can actually run at CPL 0 (_after_ translation).
>
>Thus if a CPL 0 code reloads CR3, it could be replaced with a call to
>the WriteCR3 handler. The length of the call instruction could even be
>longer than the MOV CR3,EAX or whatever instruction. We'd just adjust
>all relative pointers that were detected.
>
>Another idea / application / example.
>
>We could also make it so that calls to the Windows kernel could be
>"trapped" similiar to the way Plex86 traps "bad" instructions. Those
>calls could, if the MultiPlex86 slave PC were runnig Windows, then cause
>the slave PC, which _actually_is_ running Windows, to make a
>corresponding call. The Slave PC would be running Windows
>_inside_Plex86_.
>
>The changes the slave PC makes to hardware would be noted by the slave
>PC but not told to the guest / host PC. Only the _end_effect_ ("output")
>of a Windows kernel function would be told to the host PC. That PC would
>then simulate those effects.
>
>IMHO, this is a way to clone Windows, module by module, without
>violating its license agreement, and without separating its components.
>
>For we could compare the changes made by a Windows function with the
>changes Wine would make (Wine would be running in a second guest on the
>host PC) and record, to a log file, any discrepencies (excluding things
>that have to do with the internal operation of Windows).
>
>To do this we'd map out the windows kernel functions and emulate
>executing them, like a web crawler. If there were a
>
>MOV EAX,5
>
>instruction that at that place we'd store the _old_ value of EAX so that
>we can walk BACKWORDS, undoing the modification.
>
>Each register's bit would contain a pointer to its source.
>
>All instructions would have 2 inputs: stdin, auxin and one output:
>stdout.
>
>For MOV EAX,5, stdin would be 5 and stdout would be EAX. There would be
>no auxin.
>
>Instructions can input flags, either CF, the low 8 bits of flags, or
>other masks of flags (e.g. see 'POPF').
>
>Now for ADD EAX,5, EAX would be the stdin because it's the _main_ input.
>5 is the auxin because its auxillary; it MODIFIES ADD EAX but does not
>fundamentally change what's being done. EAX is also the stdout.
>
>Thus we'd make a linked list of these "operator notes" which would have
>a pointer to an operator, the flag info, and the 3 pointers for stdin,
>stdout, and auxin.
>
>For rotations, auxin would be the rotation count.
>
>Now the "operator nodes" would point to eachother: every input that is
>not null, points to an output.
>
>Thus:
>
>a:
>DEC EAX
>JZ b
>mov EAX,5
>b:
>INC EAX
>INC EAX
>
>consider the above code.
>
>We wish to figure out what it does without having a human disassemble
>it.
>
>Have the machine do it for you.
>
>First, consider doing this: emulating all possible courses.
>
>Start off with EAX = unknown. Then we hit
>
>a: ; I will use ::EAX to refer to "what EAX was at a"
>DEC EAX ; EAX = ::EAX - 1
>JZ b
>
>We will assume that the JZ is not taken. This is possible because the
>range of ::EAX is 0..ffffffff. Any range, plus or minus a number, stays
>the unknown if it began unknown. Thus after subtracting one, the range
>of ::EAX (the possible values it can have) is the same.
>
>Now we hit the 'JZ b'. What we decide there, is if it is possible given
>the current state of things, that EAX is zero. We know (our program
>knows), in scanning the code, that ZF reflects the zeroness of EAX. And
>it knows that EAX is ::EAX - 1, and that ::EAX has a range of 0..fffffff
>(in the program it's stored like this: each _bit_ of eax points to a
>place (the output of something) unless it can be anything at all, in
>which case it's NULL, and a NULL pointer means "not restricted to a
>certain state (0 or 1) at the state being conisdered"); also store a
>range for _before_ the mask.
>
>Thus we have done:
>
>DEC EAX
>JZ b
>
>And we now put a restriction on what EAX can be. Now we know that EAX is
>1..ffffffff.
>
>Next we hit this:
>
>mov EAX,5
>INC EAX
>INC EAX
>
>And so there is an algorithm describing the operation of the function
>(if it does hardware I/O, that I/O is considered simply to be changing
>"registers" of that hardware. If that hardware does something in the
>background, ficticious registers reflect that fact).
>
>Thus we have a program that walks thru the code,
>
>DEC EAX
>JZ b
>mov EAX,5
>INC EAX
>INC EAX
>
>And we have the table entry made:
>
>if ::EAX-1 != 0 then
> EAX = 7
>
>Now we simplefy it doing arithmatic. ::EAX - 1 != 0 can be stated ::EAX
>!= 1.
>
>Input EAX 0,2..ffffffff
>Output EAX 5
>
>Thus we have reverse engineered a function. Before hand we have mapped
>the function to find out where we'll be executing etc. We also know what
>outputs of the function are actually used. FOr this example, let's say
>that we've determined that the only thing this function can change that
>actually gets used is EAX. Thus we don't worry about the flags that are
>output.
>
>BUT we're not done! The Input EAX range is 0..ffffffff and we have
>covered only 0,2..ffffffff! We need still to do 1.
>
>How do we do that? Well, as we were walking through the code, we were
>"taking notes" on the way things were, so we can undo things. Remember
>that operator map (linked lists)? Well when we walk up (back-up), we
>readjust that, including restrictions.
>
>Look at this:
>
>a:
>DEC EAX
>JZ b
>mov EAX,5
>b:
>INC EAX
>INC EAX
>d:
>
>We're now at 'd'. We know that the sflags (except CF which is ::CF) are
>the output of the last INC EAX, and we have EAX linked to be 7.
>
>When we back up, past the INC EAX, we "undo" the sflags and decrement
>EAX so it's now 6.
>
>Now we're at b.
>
>Go up still, and we're at the MOV EAX,5.
>
>The node corresponding to that (the 'cluster') _stores_ the pointer for
>all the bits of EAX that are overwritten (destroyed). Thus we walk back
>up past the MOV EAX,5 so that EAX is now ::EAX - 1. And the restrictions
>on EAX are: it can't be zero. The sflags (except CF) are now the output
>of the DEC EAX.
>
>Now if we encounter "INC EAX / INC EAX" we store in the operator linked
>list "+ 1 + 1" but that's only done so that we can back up. EAX's state
>is simplefied, so that its value would be, + 2.
>
>Anyways.
>
>Now we're at the 'JZ b' instruction.
>
>That instruction stores the restrictions on EAX (what they were as we
>were walking down the former time, when we there). Thus at the JZ b the
>restirctions EAX _are_, are, that EAX is nonzero. But when we walk up,
>we "undo" the restrictions (first).
>
>Thus the restrictions become what they were, which is, 0..ffffffff.
>
>Now we will continue walking up to the top, and then we're done, but we
>are CRAWLING. Thus we first must expire all possibilities.
>
>Thus before walking up to the first 'DEC EAX', we now switch to
>walk-down mode! Why? Because the 'JZ b' does contains a pointer to the
>next node, and yes, it's been taken, indicating we can walk but, EXCEPT
>it also has a "branch" next pointer. It's a second program counter
>target. _That_ one exists but hasn't been expired (taken).
>
>So we now take the JZ b.
>
>In taking it, we place these restrictions on EAX: it is zero. And EAX's
>linked list, if you traveerse it and print it would, is ::EAX - 1.
>
>If we were to simplefy it (don't simplefy it until you hit the end,
>however), then we'd see that ::EAX - 1 = 0 and thus, ::EAX = 0. But we
>don't know that yet; we just know that EAX is 0.
>
>And thus we're now at 'b'.
>
>EAX becomes 2.
>
>Now we're at the end point, so we simplefy and add an entry to our table
>of the outputs that get used (the flags don't get used, let's say; if
>they did, we'd note the output flags too):
>
>Input EAX 0,2..ffffffff 1
>Output EAX 5 2
>
>Next we walk up, doing the same thing, until we hit the starting point,
>or we encounter an untaken node target (a branch). But the whole thing's
>expired, so we'll hit the starting point! When you make it there, you're
>done traversing or recrusiving or whatever.
>
>And behold! That little table can be used as documentation to clone the
>function.
>
>Why clone a function? Because IMHO it can optimize things a hella lot,
>and simultaniously promote interoperability. If the function used
>Windows-specific things, we can have a compiler-like program convert the
>table to a Linux source file, and there's no trace of where it came
>from.
>
>Although you could only do that with the outermost Windows kernel
>functions.
>
>Now you'd do the same thing with the inner Windows kernel functions,
>too, but you would do that only to make a database of tables that can
>then be used to clone the highest level functions.
>
>Now the compiler - like program, which would take a table as input and
>output C source code, would make references to functions/macros
>_specific_ to the target architecture.
>
>Thus, this would be done for _interoperability_only_.
>
>If you want to clone Windows itself, you'd have to make a _superset_ OS
>(like NT, in principle) what would have the ouput source code used only
>for interoperability.
>
>The making of the tables could benifiet from Plex86.
>
>MultiPlex86 would be needed only for the lowest level function; namely,
>things that interact directly with hardware I/O.
>
>It had not occurred to me to begin making tables at the highest level
>and then work down, going down as much as necessary.
>
>SO! Want to know something cool?
>
>The IBM BIOS will have functions that input, say, AH = 1, 2, or 3.
>
>And they'll do this:
>
>IBMFunction:
>DEC AH
>JZ IBMHandlerForFunction1
>DEC AH
>JZ IBMHandlerForFunction2
>DEC AH
>JZ IBMHandlerForFunction3
>RET
>
>Know what the Award clone'd funciton will do? Something like this:
>
>AwardFunction:
>CMP AH,1
>JZ AwardHandlerForFunction1
>CMP AH,2
>JZ AwardHandlerForFunction2
>CMP AH,3
>JZ AwardHandlerForFunction3
>SUB AH,3
>RET
>
>That's not something that they actually do but it is the same IDEA. E.g.
>I saw something similiar to that.
>
>Nifty, huh? SO if you input 4, to the IBM function (an invalid value)
>you'll get out 1. And 5 will output 2.
>
>Award could have called the function with AH = 1, 2, 3, and they know
>what it does, and then said, "what if I call it with 4?"
>
>And they'd get get 2.
>
>Then they'd try all 256 values (actually, just enough to convince them
>it's a linear function, y = x - 3) and then they'd have a table, like
>so:
>
>InputAH 0 4 5 6
>OutputAH -3 1 2 3
>
>And so on, then they'd make a hypothesis (here, let's say they're
>analysing invalid function numbers to see what the output is):
>
>InputAH N
>OutputAH N - 3
>
>Thus, having simplefied the table, they'd then write their function,
>first handling things as expected and then, if it's invalid, subtracting
>3.
>
>Then they _own_the_copyright_ for a 100% compatible function.
>
>To be truly 100% compatible it'd have to be 100% identical (what about
>if someone reads a byte of the ROM CODE and compares it with the
>expected byte?) But that is 101% compatible not 100%, because _that_
>would be incompatible with future IBM ROM BIOS versions!
>
>Thus 100% compatbile is a tad loosely defined. It means, 'within
>reason'.
>
>To be truly 100% functionality compatible, they'd have to think about
>the fact that CF is not the same on output. It falls thru for IBM's but
>is the output of the SUB for Awards.
>
>I've actually seen code like the above. It's nifty to compare clones
>with the originals to think of a more systematic (e.g. a cheaper way
>that doesn't involve so many engineers).
>
>I think Award used a mapping program, though, and maybe it gave them
>some hints, but in a clean room you need to be able to prove beyond a
>reasonable dobt that you used something comparable to a true clean room.
>
>How can we, in developing Plex86, promote interoperability between
>technologies? I think 'Instrumentation' is a good start but I think we
>should add some of these ideas.
>
>Note that the program outputs a table for the IBM function and the award
>function, automatically! No calling necessary! And we can't use the
>target addresses of the jumps (don't tell any "humans". keep it private,
>so the _PROGRAM_ knows where to go next, but not the _human_).
>
>But if this is a public function we can use the table. We'd get this:
>
>IBMFunction
>::AH *
>AH ::AH - 3
>
>Along with info on the flags.
>
>Thus we'd get the same table, _automatically_.
>
>Some tables must be kept private, like I said, and used by the program
>only. E.g. they can't be told to the cleanroom guys.
>
>But some things, under supervision, could be input into the program that
>generates C source code fragments.
>
>It'd be:
>
> switch(AH) {
> case 1:
> Function2130841();
> break;
> case 2:
> Function2130842();
> break;
> case 3:
> Function2130843();
> break;
> default:
> AH -= 3;
> }
>
>and NOT this:
>
> if(!--AH)
> Function2130841();
> else
> if(!--AH)
> Function2130842();
> else
> if(!--AH)
> Function2130843();
>
>Some code can be used as-is after being approved. Some code WILL BE
>WRITTEN, even without telling the cleanroom guys about it, because, to
>get from "A to C" they'll realize they need a "C" and we'll say, "You're
>Hot!" (meaning they're on the right track) and they'll develop a "B" on
>their own (an internal thing). Maybe the thing being cloned used "B"
>between A and C, maybe it used X. Doesn't matter.
>
>The second output is an OPERATIONAL clone. The first, is a functionality
>clone. The second is thus a decompiled version and the first, a
>table-output-due-to-mapping version.
>
>The first is the kind that can be used as-is (if it is for a public
>function). The cleanroom guys will build up an architecture and the
>"dirty guys" (who operate the program described above) would follow what
>the cleanroom's architecture is and give their hot and cold responses
>based on how things are developing.
>
>Multiple cleanrooms could be used. Multiple levels could be used too (an
>intermediate cleanroom that might closely resemble the source
>architecutre. It'd _work_ and be good for internal use; e.g. study it to
>make a plan for the next level).
>
>Thus the above is just some thoughts I've got. It's where I'm headed. I
>want to make LGPL tools to promote interoperability between Linux and
>other systems. I hope that the tools will be so good that if Windows
>isn't cloned, it's not my fault (for not making all the tools that
>someone needs).
>
>BTW sometimes I see IBM doing something like an 'AND' where AMI and
>Award do a MOV; that has the same effect. Some fragments of code for AMI
>and award are identical.
>
>I classify BIOSes thus:
>1. IBM BIOS
>2. Award, AMI BIOS
>3. PhoenixBIOS
>
>Anyone know which of those are merged into one company?
>
>If we make a 100% compatible LGPL BIOS, would anyone use it (e.g. PC
>mfr's)?
>
>The PhoenixBIOS is very different from the others.
>
>All clone BIOSes are "piecemealed" together. Rather than having
>different source files linked together, it looks like they use DOS COPY
>or something, to copy all the modules together, and then fixup things
>via jumps.
>
>Thus it looks like they do,
>
>CALL 5000h
>
>to call a function in another source file. Then 5000h will have a JMP
>instruction that jumps to wherever that function ends up.
>
>Also Award uses a stupid assembler. it doesn't know that PUSH 0 can be
>encoded as a sign extended byte. Weird, huh?
>
>IMHO, I believe someone has developed what I have described above and I
>belive Microsoft uses it or something similiar. It fits the facts
>perfectly. In some cases MS buys companies. They bought Lattice, and
>renamed their compiler Microsoft C; but: they bought a DOS clone; MS
>loves COMPAQ who cloned an IBM BIOS... Pauld Allen (co-founder of MS)
>developed an emulator for the Amiga or for something.... seem fishy?
>Someone knows a LOT about cloning.
>
>Now COMPAQ probably used primative reverse engineering, with two
>separate, large, well-paid teams. They probably disassembled things. But
>this method could enhance cost effectiveness a lot even for the
>primative COMPAQ method.
>
>Because it reduces the amount of machine code that humans need to
>disassemble, even if you are to do that.
>
>Windows can be disassembled but it'd be cleaner to NOT disassemble it
>and ONLY used the better method I described above.
>
>In addition to promoting interoperability we could also privide for a
>new architecture, something that maps out a program, even a Mac or
>Windows program, _BEFORE_ you run it, and then converts it to executable
>code. It'd even handle self modified code as conditionals.
>
>The tables can be conveted to pseudo-code too, which is probably
>preferable.
>
>Later, all.... I'm hoping for feedback on these ideas.. feel free to
>pass this on to other newsgroups / people who might be interested...
>I've implemented only the first version of what i've described here.
>
>What my program does (that I actually wrote) is map out code based on an
>entry point.
>
>It stops when it hits an end of line.
>
>It doesn't back up like I described above because what was described
>above would be the _Second_ generation.
>
>Instead it rescans the entire thing from the entrypoint (I have one
>cluster for each opcode; beforehand I convert to clusters, considering
>each byte a canadite for an entrypoint, noting its lenght, targets,
>adjusting its relative targets, tec.)
>
>Anyway when it hits an untaken branch it takes that route.
>
>It assumes RETs are matched with CALLs, always where they aren't you
>have to manually help it out.
>
>When its done you have to look at all the end points.
>
>Some of them will be for, for example,
>
>JMP [1234+BX].
>
>Now when its done, each cluster contains a linked list of all
>instructions that can transfer to it. Thus blocks of code that have no
>entrypoint in them can be grouped together and commented as pseudo code.
>You have to manually see where flags get used or overwritten... my
>current generation is probably as clumbsy as compaq's.
>
>Anyway for the JMP [1234+BX] you have to manually look at the
>disassembled code, thus:
>
>.
>.
>.
>Line1: CMP BX,7
>Line2: JAE somewhere_over_the_rainbow
>Line3: ADD BX,BX
>Line4: JMP [1234+BX]
>
>Now when disassembling, you have to group things according to where
>there are entrypoints. Since we see that, so far, there are no
>entrypoints to Line3 or Line4, we know that the restrictions on BX is
>that it is equal to twice a number that is between 0 and 6 inclusive.
>Thus we have to manually insert code in the source that makes it so that
>when we run the program again, and it uses the data from the last run,
>when it hits that "endpoint" the special code says, "branch to
>[1234+2*0]" unless that's been branched to, then to "branch to
>[1234+2*1]", and so on.
>
>Doing this method I finally mapped out a VGA BIOS (not the Elpin one,
>don't worry) completely.
>
>But then what? Ta hell if I'm gunna MANUALLY walk thru that shit and
>take notes. I'd do it if I were _paid_ to, but I don't need to know
>precisely how a VGA BIOS works; I just want to perfect the _TOOLS_ to
>figure out how.
>
>Thus the current generation is VERY clumbsy. I wrote that part of it in
>a few days, maybe two days.
>
>But there's another part to it. It uses a table of opcodes that gets
>expanded from a script file I wrote that defines the x86 non-FP,MMX,etc.
>instruction set, not in terms of ins, outs, axioms, operators, but just
>in terms of machine code, instruction name, assembly language form.
>
>I can thus make disassembler, and crude assemblers, automatically.
>
>I posted the source for the script part as an attatchment to an earlier
>message.
>
>No one seemed to care about it.
>
>