Hi Benoit (and Pekka), thanks. Here are some comments and questions:

--
Nadav Har'El
n...@scylladb.com

On Thu, Aug 25, 2016 at 1:07 PM, Benoit Canet <
benoit.canet.cont...@gmail.com> wrote:

> Enable "fast system calls" via the 'syscall' instruction on OSv. The
> instruction is used by Go programs on Linux/x86-64 for system calls.
>
> Signed-off-by: Pekka Enberg <penb...@scylladb.com>
> Signed-off-by: BenoƮt Canet <ben...@sylladb.com>
> ---
>  arch/x64/arch-setup.cc | 12 ++++++++++++
>  arch/x64/entry.S       | 20 ++++++++++++++++++++
>  arch/x64/msr.hh        |  3 +++
>  linux.cc               |  9 +++++++++
>  4 files changed, 44 insertions(+)
>
> diff --git a/arch/x64/arch-setup.cc b/arch/x64/arch-setup.cc
> index 5e76d82..520651d 100644
> --- a/arch/x64/arch-setup.cc
> +++ b/arch/x64/arch-setup.cc
> @@ -10,6 +10,7 @@
>  #include <osv/mempool.hh>
>  #include <osv/mmu.hh>
>  #include "processor.hh"
> +#include "processor-flags.h"
>  #include "msr.hh"
>  #include "xen.hh"
>  #include <osv/elf.hh>
> @@ -213,6 +214,16 @@ static inline void disable_pic()
>      XENPV_ALTERNATIVE({ processor::outb(0xff, 0x21);
> processor::outb(0xff, 0xa1); }, {});
>  }
>
> +extern "C" void syscall_entry(void);
> +
> +static void setup_syscall()
> +{
> +    processor::wrmsr(msr::IA32_STAR,  static_cast<uint64_t>(1*8) << 32);
>

What is this calculation "1*8" supposed to mean? can it get a name?


> +    processor::wrmsr(msr::IA32_LSTAR, reinterpret_cast<uint64_t>(
> syscall_entry));
> +    processor::wrmsr(msr::IA32_FMASK, 0);
>

I'm not an expert in this stuff enough to understand what FMASK was
supposed to solve, and why 0 is good enough for us.


> +    processor::wrmsr(msr::IA32_EFER,  processor::rdmsr(msr::IA32_EFER) |
> 0x01);
>

please define a name - e.g., IA32_EFER_SCE or msr_bits::IA32_EFER::SCE or
something, for this "0x1" (SCE = "system call extension").


> +}
> +
>  void arch_init_premain()
>  {
>      auto omb = *osv_multiboot_info;
> @@ -220,6 +231,7 @@ void arch_init_premain()
>         debug_early_u64("Error reading disk (real mode): ",
> static_cast<u64>(omb.disk_err));
>
>      disable_pic();
> +    setup_syscall();
>  }
>
>  #include "drivers/driver.hh"
> diff --git a/arch/x64/entry.S b/arch/x64/entry.S
> index b6f5abe..d3a864a 100644
> --- a/arch/x64/entry.S
> +++ b/arch/x64/entry.S
> @@ -159,3 +159,23 @@ call_signal_handler_thunk:
>          iretq
>          .cfi_endproc
>
> +.global syscall_entry
> +syscall_entry:
> +       # There is no ring transition and rflags are left unchanged. The
> only
> +       # thing we need to save is the rip which is stored in rcx by the
> syscall
> +       # instruction.
> +       push %rcx
> +       # FIXME: registers clobbered?

+       # FIXME: FPU state?
>

Please see Glauber's comments about these issues  in
https://groups.google.com/forum/#!msg/osv-dev/PW3bkaVCuMg/-bePROMbWWEJ
If you think Glauber's comments are wrong, please remove the FIXMEs. If
they are right, we better fix those FIXMEs and not commit
something which doesn't actually work.

I think the need to save FPU state is unlikely (because there will only be
dirty FPU state if the function calling the SYSCALL instruction is using
the FPU) but may be necessary for completeness. But it will be really
inefficient :-( Linux has it easier here, because it doesn't use FPU in the
kernel...


+       # FIXME: system call arguments?
> +       movq %r10, %rcx
> +       # rotate syscall arguments
> +       movq %r8,  %r9
> +       movq %rcx, %r8
> +       movq %rdx, %rcx
> +       movq %rsi, %rdx
> +       movq %rdi, %rsi
> +       movq %rax, %rdi
>

I wonder how this code behaves with gdb if you try to "backtrace" a crash
inside a system call. Maybe we need all sorts of ".cfi" pseudo-instructions
to solve this? But we can attend to this issue later (let's just not forget
to return to it).

+       call syscall_wrapper
> +       pop %rcx
> +       jmp *%rcx
>

This looks correct, although I guess the more "traditional" approach would
be to use SYSRET here instead of the pop and the jmp.



> diff --git a/arch/x64/msr.hh b/arch/x64/msr.hh
> index 154bba7..d77c75c 100644
> --- a/arch/x64/msr.hh
> +++ b/arch/x64/msr.hh
> @@ -58,6 +58,9 @@ enum class msr : uint32_t {
>
>      IA32_APIC_BASE = 0x0000001b,
>      IA32_EFER = 0xc0000080,
> +    IA32_STAR = 0xc0000081,
> +    IA32_LSTAR = 0xc0000082,
> +    IA32_FMASK = 0xc0000084,
>      IA32_FS_BASE = 0xc0000100,
>
>      KVM_WALL_CLOCK = 0x11,
> diff --git a/linux.cc b/linux.cc
> index bd82ca9..8a2a4a3 100644
> --- a/linux.cc
> +++ b/linux.cc
> @@ -291,3 +291,12 @@ long syscall(long number, ...)
>      return -1;
>  }
>  long __syscall(long number, ...)  __attribute__((alias("syscall")));
> +
> +extern "C" long syscall_wrapper(long number, ...)
> +{
> +    auto ret = syscall(number);
>
+    if (ret < 0) {
> +        return -errno;
>

I think there's a bug: The syscall instruction itself is not supposed to
modify errno. This code does. I think we need to restore errno here to what
it was before we called syscall().



> +    }
> +    return 0;
> +}
> --
> 2.7.4
>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to osv-dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to