From ChangeLog:

- "Kevin P. Lawton" <[EMAIL PROTECTED]>: Fri Jan 12 15:54:22 EST 2001

  More enhancements to dt-testbed/proto2, and more notes in the README.

From dt-testbed/proto2/README:

The workload before each guest branch instruction was previously only a NOP. I changed this to generate guest code with a small tight-loop section before each out-of-page branch. I chose a cascading add loop to make sure the CPU was kept busy and couldn't parallelize things (and thus compress the time spent in the loop). You can set the number of tight-loop iterations with 'DT_MicroLoops' in dt.h.

Some results:

                        Native      DT    factor (DT/Native)
  DT_MicroLoops==  5:     0.89     1.46    1.64
  DT_MicroLoops== 10:     2.02     2.50    1.24
  DT_MicroLoops==100:    17.62    18.08    1.03

This doesn't factor in dynamic (computed) branches, which are used in dense switch statements, C++ virtual functions (C function pointers), and (implicitly, using the stack) by the return() statement.

Anyway, you can see that as the real workload executed between overhead instructions (like out-of-page branches) increases, approaching more realistic code, the overhead factor improves significantly. Even low tight-loop iteration counts yield pretty good performance.

Dynamic branches should probably be looked at next...

-Kevin
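
For readers who want to picture the workload described in the README excerpt, here is a minimal C sketch of a cascading add loop and of the factor column. It is an illustration only: the actual testbed generates guest machine code, and the function names, the loop body, and the DT_MicroLoops value shown here are assumptions, not taken from dt-testbed/proto2.

```c
/* Hypothetical value for illustration; the real setting lives in dt.h. */
#define DT_MicroLoops 10

/*
 * Cascading add: every addition consumes the result of the previous
 * one, so the adds form a single serial dependency chain that the CPU
 * cannot overlap. 'volatile' keeps the compiler from folding the loop
 * into a constant.
 */
static unsigned cascading_add(unsigned iterations)
{
    volatile unsigned acc = 0;
    unsigned i;

    for (i = 0; i < iterations; i++) {
        acc = acc + 1;    /* depends on the previous iteration's result */
        acc = acc + acc;  /* depends on the add just above */
    }
    return acc;
}

/* Overhead factor as used in the results table: DT time / native time. */
static double dt_factor(double native_time, double dt_time)
{
    return dt_time / native_time;
}
```

Calling cascading_add(DT_MicroLoops) stands in for the work done before one out-of-page branch; dt_factor(2.02, 2.50) reproduces the 1.24 entry of the table to two decimal places.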