I ran an application benchmark on Viengoos. Specifically, the application is derived from the GCbench program. You can find it here:
http://cvs.savannah.gnu.org/viewvc/hurd-l4/benchmarks/GCbench.c?root=hurd&view=log

The benchmark takes 239.4 seconds to complete. During this time, it aggressively uses Viengoos' services. (Viengoos is implemented as a user-level server running on top of Pistachio.) I disabled all other threads so that the only two running threads were the application's main thread and Viengoos' service thread; thus, a call from the application should never block.

In Viengoos, I used l4_system_clock to read the time on receipt of a message. Just before Viengoos sends a reply, it reads the time again and records the difference in a per-method variable. The number of calls per method is also recorded. In the application, I instrumented the RPC stubs to do the same: just before l4_call is invoked, I call l4_system_clock; on return, I call it again and save the difference in a per-method variable. The number of calls per method is again recorded.

Below are the four most used system calls:

                     Time (ms)       % Time              us per call
                   User   Kernel    U    K    # Calls  User  Kernel  delta
  object discard  18,054  15,171    7%   6%   686,960  26.2    22.0    4.2
  object alloc       730     567    0%   0%    91,123   8.0     6.2    1.8
  cap copy           868     515    0%   0%    90,464   9.5     5.6    3.9
  folio alloc         30      27    0%   0%       712  43.1    37.8    5.3

I'd expect the time measured from user space minus the time measured in Viengoos to correspond to the RPC overhead.
On this machine (an AMD 1.2 GHz K7 Duron with a 64 KB L2 cache), ping-pong reports the following costs associated with inter-AS IPC:

  IPC ( 0 MRs): 627.01 cycles, 0.52us, 0.00 instrs
  IPC ( 4 MRs): 660.87 cycles, 0.55us, 0.00 instrs
  IPC ( 8 MRs): 670.11 cycles, 0.56us, 0.00 instrs
  IPC (12 MRs): 678.08 cycles, 0.56us, 0.00 instrs
  IPC (16 MRs): 675.67 cycles, 0.56us, 0.00 instrs
  IPC (20 MRs): 683.11 cycles, 0.57us, 0.00 instrs
  IPC (24 MRs): 691.04 cycles, 0.57us, 0.00 instrs
  IPC (28 MRs): 697.73 cycles, 0.58us, 0.00 instrs
  IPC (32 MRs): 697.39 cycles, 0.58us, 0.00 instrs
  IPC (36 MRs): 701.98 cycles, 0.58us, 0.00 instrs
  IPC (40 MRs): 714.57 cycles, 0.59us, 0.00 instrs
  IPC (44 MRs): 718.00 cycles, 0.60us, 0.00 instrs
  IPC (48 MRs): 720.20 cycles, 0.60us, 0.00 instrs
  IPC (52 MRs): 729.10 cycles, 0.60us, 0.00 instrs
  IPC (56 MRs): 736.47 cycles, 0.61us, 0.00 instrs
  IPC (60 MRs): 733.48 cycles, 0.61us, 0.00 instrs

Each invocation includes approximately 12 words of payload and each reply contains 2 words. This suggests a round-trip RPC overhead of about 1350 cycles, or 1.2 us. The measured 4.2 us delta corresponds to approximately 5000 cycles, which leaves about 3650 cycles unaccounted for. That seems a bit more than one can attribute simply to secondary-cache effects; then again, ping-pong really measures the very hot case, and I may be running with very cold caches. I hope someone can suggest how to figure out where these cycles are going, has a theory, or can confirm that these cycle counts are not, in fact, too high.

Thanks,
Neal
