Hi Benoit (and Pekka), thanks. Here are some comments and questions:
-- Nadav Har'El n...@scylladb.com On Thu, Aug 25, 2016 at 1:07 PM, Benoit Canet < benoit.canet.cont...@gmail.com> wrote: > Enable "fast system calls" via the 'syscall' instruction on OSv. The > instruction is used by Go programs on Linux/x86-64 for system calls. > > Signed-off-by: Pekka Enberg <penb...@scylladb.com> > Signed-off-by: BenoƮt Canet <ben...@sylladb.com> > --- > arch/x64/arch-setup.cc | 12 ++++++++++++ > arch/x64/entry.S | 20 ++++++++++++++++++++ > arch/x64/msr.hh | 3 +++ > linux.cc | 9 +++++++++ > 4 files changed, 44 insertions(+) > > diff --git a/arch/x64/arch-setup.cc b/arch/x64/arch-setup.cc > index 5e76d82..520651d 100644 > --- a/arch/x64/arch-setup.cc > +++ b/arch/x64/arch-setup.cc > @@ -10,6 +10,7 @@ > #include <osv/mempool.hh> > #include <osv/mmu.hh> > #include "processor.hh" > +#include "processor-flags.h" > #include "msr.hh" > #include "xen.hh" > #include <osv/elf.hh> > @@ -213,6 +214,16 @@ static inline void disable_pic() > XENPV_ALTERNATIVE({ processor::outb(0xff, 0x21); > processor::outb(0xff, 0xa1); }, {}); > } > > +extern "C" void syscall_entry(void); > + > +static void setup_syscall() > +{ > + processor::wrmsr(msr::IA32_STAR, static_cast<uint64_t>(1*8) << 32); > What is this calculation "1*8" supposed to mean? can it get a name? > + processor::wrmsr(msr::IA32_LSTAR, reinterpret_cast<uint64_t>( > syscall_entry)); > + processor::wrmsr(msr::IA32_FMASK, 0); > I'm not an expert in this stuff enough to understand what FMASK was supposed to solve, and why 0 is good enough for us. > + processor::wrmsr(msr::IA32_EFER, processor::rdmsr(msr::IA32_EFER) | > 0x01); > please define a name - e.g., IA32_EFER_SCE or msr_bits::IA32_EFER::SCE or something, for this "0x1" (SCE = "system call extension"). > +} > + > void arch_init_premain() > { > auto omb = *osv_multiboot_info; > @@ -220,6 +231,7 @@ void arch_init_premain() > debug_early_u64("Error reading disk (real mode): ", > static_cast<u64>(omb.disk_err)); > > disable_pic(); > + setup_syscall(); > } > > #include "drivers/driver.hh" > diff --git a/arch/x64/entry.S b/arch/x64/entry.S > index b6f5abe..d3a864a 100644 > --- a/arch/x64/entry.S > +++ b/arch/x64/entry.S > @@ -159,3 +159,23 @@ call_signal_handler_thunk: > iretq > .cfi_endproc > > +.global syscall_entry > +syscall_entry: > + # There is no ring transition and rflags are left unchanged. The > only > + # thing we need to save is the rip which is stored in rcx by the > syscall > + # instruction. > + push %rcx > + # FIXME: registers clobbered? + # FIXME: FPU state? > Please see Glauber's comments about these issues in https://groups.google.com/forum/#!msg/osv-dev/PW3bkaVCuMg/-bePROMbWWEJ If you think Glauber's comments are wrong, please remove the FIXMEs. If they are right, we better fix those FIXMEs and not commit something which doesn't actually work. I think the need to save FPU state is unlikely (because there will only be dirty FPU state if the function calling the SYSCALL instruction is using the FPU) but may be necessary for completeness. But it will be really inefficient :-( Linux has it easier here, because it doesn't use FPU in the kernel... + # FIXME: system call arguments? > + movq %r10, %rcx > + # rotate syscall arguments > + movq %r8, %r9 > + movq %rcx, %r8 > + movq %rdx, %rcx > + movq %rsi, %rdx > + movq %rdi, %rsi > + movq %rax, %rdi > I wonder how this code behaves with gdb if you try to "backtrace" a crash inside a system call. Maybe we need all sorts of ".cfi" pseudo-instructions to solve this? But we can attend to this issue later (let's just not forget to return to it). + call syscall_wrapper > + pop %rcx > + jmp *%rcx > This looks correct, although I guess the more "traditional" approach would be to use SYSRET here instead of the pop and the jmp. > diff --git a/arch/x64/msr.hh b/arch/x64/msr.hh > index 154bba7..d77c75c 100644 > --- a/arch/x64/msr.hh > +++ b/arch/x64/msr.hh > @@ -58,6 +58,9 @@ enum class msr : uint32_t { > > IA32_APIC_BASE = 0x0000001b, > IA32_EFER = 0xc0000080, > + IA32_STAR = 0xc0000081, > + IA32_LSTAR = 0xc0000082, > + IA32_FMASK = 0xc0000084, > IA32_FS_BASE = 0xc0000100, > > KVM_WALL_CLOCK = 0x11, > diff --git a/linux.cc b/linux.cc > index bd82ca9..8a2a4a3 100644 > --- a/linux.cc > +++ b/linux.cc > @@ -291,3 +291,12 @@ long syscall(long number, ...) > return -1; > } > long __syscall(long number, ...) __attribute__((alias("syscall"))); > + > +extern "C" long syscall_wrapper(long number, ...) > +{ > + auto ret = syscall(number); > + if (ret < 0) { > + return -errno; > I think there's a bug: The syscall instruction itself is not supposed to modify errno. This code does. I think we need to restore errno here to what it was before we called syscall(). > + } > + return 0; > +} > -- > 2.7.4 > > -- > You received this message because you are subscribed to the Google Groups > "OSv Development" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to osv-dev+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.