On Thu, Apr 18, 2024 at 8:31 PM Gernot Heiser via Devel
<devel@sel4.systems> wrote:
>
> I’m not completely sure what you’re actually measuring here (actually I’m
> totally unsure what your setup really is, I can’t make sense of the above
> description in terms of what the seL4 system is doing).
> But measuring performance of a kernel inside a virtual machine is somewhat
> pointless at best.
>

I don't think I've got any way to run both QNX and seL4 on the same
hardware without virtualization. I wouldn't think it would make that
much of a difference here.

>
> In any case it seems clear that you’re *not* measuring raw seL4 IPC.
> Your quoted cost of a round-trip on seL4 is say 120ms for 10,000 round trips,
> or 12µs for one round trip. You don’t specify the clock rate, but that would
> be somewhere between 12,000 and 50,000 cycles, when a round-trip IPC on x86
> with MCS is <900 cycles

I'm running on a Core i7-960, which is 3.2GHz, so 12µs per round trip would
come to roughly 38,000 cycles.

>
> Note that using Send()/Recv()/Reply() is the wrong way to use seL4 IPC, see
> https://microkerneldude.org/2019/03/07/how-to-and-how-not-to-use-sel4-ipc/
> However, this will only be about a factor two slower than using the correct 
> Call()/ReplyWait() incantation.
>

Somehow I said Send()/Recv()/Reply() when I meant Call()/Recv()/Send()
(MCS of course doesn't even have a distinct Reply() function).

Here's the exact code for the benchmark loop I'm using on seL4:

#include <sel4/sel4.h>

/* Client: 10,000 round-trip Calls on the endpoint, each carrying a
 * 2-word message.  The Rust wrapper below times this whole loop. */
void c_test_client(seL4_CPtr endpoint)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 2);
    int j;
    for (j = 0; j < 10000; j++) {
        seL4_Call(endpoint, info);
    }
}

/* Server: receive a request on the endpoint, then send the reply
 * through the MCS reply object passed in as `reply`. */
void c_test_server(seL4_CPtr endpoint, seL4_CPtr reply)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 2);
    while (1) {
        seL4_Recv(endpoint, (void *)0, reply);
        seL4_Send(reply, info);
    }
}
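
For what it's worth, here's an untested sketch of the same server loop using
the combined reply+receive call the blog post refers to as ReplyWait; on MCS I
believe this is seL4_ReplyRecv, which takes the reply object as an extra
argument and hands back the reply while waiting for the next request in one
syscall:

void c_test_server_replyrecv(seL4_CPtr endpoint, seL4_CPtr reply)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 2);
    /* take the first request before entering the reply+receive loop */
    seL4_Recv(endpoint, (void *)0, reply);
    while (1) {
        /* reply to the previous caller and wait for the next one */
        seL4_ReplyRecv(endpoint, info, (void *)0, reply);
    }
}

The numbers I quoted were measured with the Recv()/Send() version above, not
with this variant.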

This is being called from a Rust wrapper that allocates the
endpoint/reply and reads the TSC; the inner part of it looks like
this:

// _rdtsc/__cpuid are unsafe intrinsics, so this excerpt sits in an unsafe
// context in the wrapper; the CPUID calls are there to serialise around the
// timed region.
let start = x86_64::_rdtsc();
x86_64::__cpuid(0);
c_test_client(endpoint.to_cap());
let end = x86_64::_rdtsc();
x86_64::__cpuid(0);
info!("test finished: {} {} {}", start, end, end - start);

I'm not quite sure what would be slowing it down. I would have thought
this would be measuring only the cost of a round trip IPC and nothing
more. Is there possibly some kind of scheduling issue adding overhead?

The QNX/Linux equivalent is:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <x86intrin.h>

int main()
{
    char buf[100];
    int f = open("/dev/null", O_WRONLY);
    int i;
    for (i = 0; i < 10; i++) {
        uint64_t start = __rdtsc();
        uint64_t end;
        int j;
        /* 10,000 100-byte writes to /dev/null per timed run */
        for (j = 0; j < 10000; j++) {
            if (write(f, buf, 100) != 100) {
                printf("cannot write\n");
                exit(1);
            }
        }
        end = __rdtsc();
        printf("cycles: %llu\n", (unsigned long long)(end - start));
    }
    return 0;
}
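
In case it's useful, here's a hypothetical helper (not part of the benchmark
as run) for turning a raw 10,000-iteration delta into per-call figures,
assuming the 3.2 GHz i7-960 above:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Convert a 10,000-iteration TSC delta into per-call cycles and
 * microseconds, assuming a 3.2 GHz clock (3,200 cycles per µs). */
static void report_per_call(uint64_t start, uint64_t end)
{
    uint64_t cycles = (end - start) / 10000;
    printf("per call: %" PRIu64 " cycles (~%.2f us)\n",
           cycles, (double)cycles / 3200.0);
}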