[osv-dev] [PATCH 8/8] lazy stack: new tracepoint for stack pre-faults

2022-08-30 Thread Waldemar Kozaczuk
This last patch of the series adds a new tracepoint - mmu_vm_stack_fault -
which, when enabled, allows one to see how a particular app triggers
stack page faults. The tracepoint captures the stack fault address,
the thread id, and the number of the page (0 being the 1st page).

Please note that this does not capture the 1st page of the stack (page_no 0),
as that one is pre-faulted by the parent thread that creates the new one.
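
For reference, a minimal sketch of how the tracepoint arguments are derived
(names follow the diff below; this is not additional code added by the patch):

    auto stack = sched::thread::current()->get_stack_info();
    uintptr_t stack_top = reinterpret_cast<uintptr_t>(stack.begin) + stack.size;
    // page_no 0 is the top-most (first) stack page, since the stack grows down
    unsigned page_no = (stack_top - addr) / 4096;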

./scripts/run.py -e /tests/tst-pipe.so --trace=mmu_vm_stack_fault 
--trace-backtrace -H
./scripts/trace.py extract && ./scripts/trace.py list -bl

0x816b7040 >init0 0.002215401 mmu_vm_stack_fault   
thread=32, addr=0x200ff9d0, page_no=1
  mmu::vm_fault(unsigned long, exception_frame*)
  page_fault
  ex_pf
  std_malloc(unsigned long, unsigned long)
  malloc
  operator new(unsigned long)
  do_main_thread(void*)
  std::_Function_handler::_M_invoke(std::_Any_data const&) __invoke_impl&>__invoke_r&> _M_invoke
  sched::thread::main()
  thread_main_c

...

0x816b7040 >init0 0.084799151 mmu_vm_stack_fault   
thread=32, addr=0x200f8440, page_no=8
  mmu::vm_fault(unsigned long, exception_frame*)
  page_fault
  ex_pf
  memory::page_pool::l1::alloc_page()
  untracked_alloc_page
  memory::alloc_page()
  std_malloc(unsigned long, unsigned long)
  malloc
  operator new(unsigned long)
  lookup
  sys_lstat

Signed-off-by: Waldemar Kozaczuk 
---
 core/mmu.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/core/mmu.cc b/core/mmu.cc
index e41de215..afab7cc9 100644
--- a/core/mmu.cc
+++ b/core/mmu.cc
@@ -1413,6 +1413,9 @@ bool access_fault(vma& vma, unsigned int error_code)
 TRACEPOINT(trace_mmu_vm_fault, "addr=%p, error_code=%x", uintptr_t, unsigned 
int);
 TRACEPOINT(trace_mmu_vm_fault_sigsegv, "addr=%p, error_code=%x, %s", 
uintptr_t, unsigned int, const char*);
 TRACEPOINT(trace_mmu_vm_fault_ret, "addr=%p, error_code=%x", uintptr_t, 
unsigned int);
+#if CONF_lazy_stack
+TRACEPOINT(trace_mmu_vm_stack_fault, "thread=%d, addr=%p, page_no=%d", 
unsigned int, uintptr_t, unsigned int);
+#endif
 
 static void vm_sigsegv(uintptr_t addr, exception_frame* ef)
 {
@@ -1438,6 +1441,14 @@ void vm_fault(uintptr_t addr, exception_frame* ef)
 trace_mmu_vm_fault_sigsegv(addr, ef->get_error(), "fast");
 return;
 }
+#if CONF_lazy_stack
+auto stack = sched::thread::current()->get_stack_info();
+void *v_addr = reinterpret_cast<void*>(addr);
+if (v_addr >= stack.begin && v_addr < stack.begin + stack.size) {
+trace_mmu_vm_stack_fault(sched::thread::current()->id(), addr,
+((u64)(stack.begin + stack.size - addr)) / 4096);
+}
+#endif
 addr = align_down(addr, mmu::page_size);
 WITH_LOCK(vma_list_mutex.for_read()) {
 auto vma = find_intersecting_vma(addr);
-- 
2.34.1



[osv-dev] [PATCH 7/8] lazy stack: activate lazy stack in pthreads

2022-08-30 Thread Waldemar Kozaczuk
This patch adds a new mmap flag - mmap_stack - that is used when mmapping
a stack while creating a new pthread. This new flag is only applied when
the build parameter CONF_lazy_stack is enabled.
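
For illustration, a hypothetical application-side snippet that exercises the
new path: mmap() with MAP_STACK now translates to mmu::mmap_stack, so with
CONF_lazy_stack enabled the stack can be populated on demand rather than up
front (the sizes and names below are made up for the example):

    #include <sys/mman.h>
    #include <pthread.h>
    #include <stddef.h>

    static void *worker(void *arg) { return arg; }

    int main() {
        size_t sz = 1 << 20;
        // mmap() a 1MB stack for pthread_attr_setstack() and tell the kernel
        // it is a stack via MAP_STACK
        void *stk = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setstack(&attr, stk, sz);
        pthread_t t;
        pthread_create(&t, &attr, worker, NULL);
        pthread_join(t, NULL);
        return 0;
    }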

Signed-off-by: Waldemar Kozaczuk 
---
 include/osv/mmu-defs.hh | 1 +
 libc/mman.cc| 7 +--
 libc/pthread.cc | 7 ++-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/osv/mmu-defs.hh b/include/osv/mmu-defs.hh
index 18edf441..694815f8 100644
--- a/include/osv/mmu-defs.hh
+++ b/include/osv/mmu-defs.hh
@@ -84,6 +84,7 @@ enum {
 mmap_small   = 1ul << 5,
 mmap_jvm_balloon = 1ul << 6,
 mmap_file= 1ul << 7,
+mmap_stack   = 1ul << 8,
 };
 
 enum {
diff --git a/libc/mman.cc b/libc/mman.cc
index 75a94eb0..115b0313 100644
--- a/libc/mman.cc
+++ b/libc/mman.cc
@@ -43,12 +43,7 @@ unsigned libc_flags_to_mmap(int flags)
 mmap_flags |= mmu::mmap_populate;
 }
 if (flags & MAP_STACK) {
-// OSv currently requires that stacks be pinned (see issue #143). So
-// if an application wants to mmap() a stack for pthread_attr_setstack
-// and did us the courtesy of telling this to ue (via MAP_STACK),
-// let's return the courtesy by returning pre-faulted memory.
-// FIXME: If issue #143 is fixed, this workaround should be removed.
-mmap_flags |= mmu::mmap_populate;
+mmap_flags |= mmu::mmap_stack;
 }
 if (flags & MAP_SHARED) {
 mmap_flags |= mmu::mmap_shared;
diff --git a/libc/pthread.cc b/libc/pthread.cc
index cda6cf90..de5979e8 100644
--- a/libc/pthread.cc
+++ b/libc/pthread.cc
@@ -141,7 +141,12 @@ namespace pthread_private {
 return {attr.stack_begin, attr.stack_size};
 }
 size_t size = attr.stack_size;
-void *addr = mmu::map_anon(nullptr, size, mmu::mmap_populate, 
mmu::perm_rw);
+#if CONF_lazy_stack
+unsigned stack_flags = mmu::mmap_stack;
+#else
+unsigned stack_flags = mmu::mmap_populate;
+#endif
+void *addr = mmu::map_anon(nullptr, size, stack_flags, mmu::perm_rw);
 mmu::mprotect(addr, attr.guard_size, 0);
 sched::thread::stack_info si{addr, size};
 si.deleter = free_stack;
-- 
2.34.1



[osv-dev] [PATCH 5/8] lazy stack: ensure next stack page conditionally if interrupts and preemption enabled

2022-08-30 Thread Waldemar Kozaczuk
This patch modifies the last set of call sites, where we do not know statically
whether interrupts or preemption are enabled. In those cases we dynamically
check that both preemption and interrupts are enabled and only then pre-fault
the stack.

Most of these places are in the tracepoint and sampler implementation.
This is obviously the most costly variant, but even here we are lucky:
it affects performance only when tracepoints are enabled, and even then
the code already saves the interrupt state using arch::irq_flag_notrace,
so checking whether interrupts are enabled is pretty cheap. With the sampler,
on the other hand, performance is only affected when starting or stopping
the sampler, which is quite rare.
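
The recurring pattern added by this patch, sketched out of context (names as
in the diff below):

    arch::irq_flag_notrace irq;
    irq.save();
    #if CONF_lazy_stack
    // Pre-fault the next stack page only when a page fault would still be
    // safe, i.e. both preemption and interrupts are enabled at this point.
    if (sched::preemptable() && irq.enabled()) {
        arch::ensure_next_stack_page();
    }
    #endif
    arch::irq_disable_notrace();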

Signed-off-by: Waldemar Kozaczuk 
---
 arch/aarch64/interrupt.cc |  5 +
 core/sampler.cc   | 31 +++
 core/trace.cc | 17 +
 include/osv/trace.hh  |  5 +
 4 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/arch/aarch64/interrupt.cc b/arch/aarch64/interrupt.cc
index e26e10ee..0b244e2e 100644
--- a/arch/aarch64/interrupt.cc
+++ b/arch/aarch64/interrupt.cc
@@ -31,6 +31,11 @@ void sgi_interrupt::send(sched::cpu* cpu)
 
 void sgi_interrupt::send_allbutself()
 {
+#if CONF_lazy_stack
+if (sched::preemptable() && arch::irq_enabled()) {
+arch::ensure_next_stack_page();
+}
+#endif
 gic::gic->send_sgi(gic::sgi_filter::SGI_TARGET_ALL_BUT_SELF,
0, get_id());
 }
diff --git a/core/sampler.cc b/core/sampler.cc
index 9e37ba4f..b9e2b05c 100644
--- a/core/sampler.cc
+++ b/core/sampler.cc
@@ -32,11 +32,16 @@ private:
 sched::timer_base _timer;
 bool _active;
 
-void rearm()
+void arm()
 {
 _timer.set(_config.period);
 }
 
+void rearm()
+{
+_timer.set_with_irq_disabled(_config.period);
+}
+
 public:
 cpu_sampler()
 : _timer(*this)
@@ -54,7 +59,11 @@ public:
 {
 assert(!_active);
 _active = true;
-rearm();
+if (arch::irq_enabled()) {
+arm();
+} else {
+rearm();
+}
 }
 
 void stop()
@@ -97,7 +106,11 @@ static void start_on_current()
 
 if (prev_active + 1 == _n_cpus) {
 _started = true;
-_controller.wake();
+if (arch::irq_enabled()) {
+_controller.wake();
+} else {
+_controller.wake_from_kernel_or_with_irq_disabled();
+}
 }
 }
 
@@ -110,7 +123,11 @@ static void stop_on_current()
 _sampler->stop();
 
 if (--_active_cpus == 0) {
-_controller.wake();
+if (arch::irq_enabled()) {
+_controller.wake();
+} else {
+_controller.wake_from_kernel_or_with_irq_disabled();
+}
 }
 }
 
@@ -170,6 +187,12 @@ void stop_sampler() throw()
 
 WITH_LOCK(migration_lock) {
 stop_sampler_ipi.send_allbutself();
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+sched::ensure_next_stack_page_if_preemptable();
+#endif
 stop_on_current();
 }
 
diff --git a/core/trace.cc b/core/trace.cc
index dc69c807..c9ed2bab 100644
--- a/core/trace.cc
+++ b/core/trace.cc
@@ -247,6 +247,13 @@ void tracepoint_base::update()
 WITH_LOCK(trace_control_lock) {
 bool empty;
 
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled());
+assert(sched::preemptable());
+#endif
+#if CONF_lazy_stack
+arch::ensure_next_stack_page();
+#endif
 WITH_LOCK(osv::rcu_read_lock) {
 auto& probes = *probes_ptr.read();
 
@@ -376,6 +383,11 @@ extern "C" void __cyg_profile_func_enter(void *this_fn, 
void *call_site)
 }
 arch::irq_flag_notrace irq;
 irq.save();
+#if CONF_lazy_stack
+if (sched::preemptable() && irq.enabled()) {
+arch::ensure_next_stack_page();
+}
+#endif
 arch::irq_disable_notrace();
 if (func_trace_nesting++ == 0) {
 trace_function_entry(this_fn, call_site);
@@ -391,6 +403,11 @@ extern "C" void __cyg_profile_func_exit(void *this_fn, 
void *call_site)
 }
 arch::irq_flag_notrace irq;
 irq.save();
+#if CONF_lazy_stack
+if (sched::preemptable() && irq.enabled()) {
+arch::ensure_next_stack_page();
+}
+#endif
 arch::irq_disable_notrace();
 if (func_trace_nesting++ == 0) {
 trace_function_exit(this_fn, call_site);
diff --git a/include/osv/trace.hh b/include/osv/trace.hh
index d735575c..01d72022 100644
--- a/include/osv/trace.hh
+++ b/include/osv/trace.hh
@@ -348,6 +348,11 @@ public:
 if (active) {
 arch::irq_flag_notrace irq;
 irq.save();
+#if CONF_lazy_stack
+if (sched::preemptable() && irq.enabled()) {
+arch::ensure_next_stack_page();
+}
+#endif
 arch::irq_disable_notrace();
 log(as);
 run_probes();
-- 
2.34.1


[osv-dev] [PATCH 6/8] lazy stack: prevent deadlock when taking vma_list_mutex for write

2022-08-30 Thread Waldemar Kozaczuk
This patch makes all functions in core/mmu.cc that take vma_list_mutex
for write pre-fault the stack two pages deep before taking the mutex.
This is necessary to prevent any follow-up stack faults down the call stack
after vma_list_mutex is taken for write, as those would lead to a deadlock
(experienced when testing one of the apps) in which the page fault handler
and the mmu::vm_fault() function try to take the same vma_list_mutex for read.
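
The resulting pattern at each writer of vma_list_mutex, sketched (the macro
definition is in the diff below):

    PREVENT_STACK_PAGE_FAULT  // with CONF_lazy_stack: arch::ensure_next_two_stack_pages()
    WITH_LOCK(vma_list_mutex.for_write()) {
        // Any stack growth needed by the code below has already happened
        // above, so the page-fault handler never needs vma_list_mutex while
        // we hold it for write.
    }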

Signed-off-by: Waldemar Kozaczuk 
---
 core/mmu.cc | 17 +
 1 file changed, 17 insertions(+)

diff --git a/core/mmu.cc b/core/mmu.cc
index 007d4331..e41de215 100644
--- a/core/mmu.cc
+++ b/core/mmu.cc
@@ -47,6 +47,17 @@ extern const char text_start[], text_end[];
 
 namespace mmu {
 
+#if CONF_lazy_stack
+// We need to ensure that lazy stack is populated deeply enough (2 pages)
+// for all the cases when the vma_list_mutex is taken for write to prevent
+// page faults triggered on stack. The page-fault handling logic would
+// attempt to take the same vma_list_mutex for read and end up with a deadlock.
+#define PREVENT_STACK_PAGE_FAULT \
+arch::ensure_next_two_stack_pages();
+#else
+#define PREVENT_STACK_PAGE_FAULT
+#endif
+
 struct vma_range_compare {
 bool operator()(const vma_range& a, const vma_range& b) {
 return a.start() < b.start();
@@ -1271,6 +1282,7 @@ static void nohugepage(void* addr, size_t length)
 
 error advise(void* addr, size_t size, int advice)
 {
+PREVENT_STACK_PAGE_FAULT
 WITH_LOCK(vma_list_mutex.for_write()) {
 if (!ismapped(addr, size)) {
 return make_error(ENOMEM);
@@ -1310,6 +1322,7 @@ void* map_anon(const void* addr, size_t size, unsigned 
flags, unsigned perm)
 size = align_up(size, mmu::page_size);
 auto start = reinterpret_cast(addr);
 auto* vma = new mmu::anon_vma(addr_range(start, start + size), perm, 
flags);
+PREVENT_STACK_PAGE_FAULT
 SCOPE_LOCK(vma_list_mutex.for_write());
 auto v = (void*) allocate(vma, start, size, search);
 if (flags & mmap_populate) {
@@ -1336,6 +1349,7 @@ void* map_file(const void* addr, size_t size, unsigned 
flags, unsigned perm,
 auto start = reinterpret_cast(addr);
 auto *vma = f->mmap(addr_range(start, start + size), flags | mmap_file, 
perm, offset).release();
 void *v;
+PREVENT_STACK_PAGE_FAULT
 WITH_LOCK(vma_list_mutex.for_write()) {
 v = (void*) allocate(vma, start, size, search);
 if (flags & mmap_populate) {
@@ -1708,6 +1722,7 @@ ulong map_jvm(unsigned char* jvm_addr, size_t size, 
size_t align, balloon_ptr b)
 
 auto* vma = new mmu::jvm_balloon_vma(jvm_addr, start, start + size, b, 
v->perm(), v->flags());
 
+PREVENT_STACK_PAGE_FAULT
 WITH_LOCK(vma_list_mutex.for_write()) {
 // This means that the mapping that we had before was a balloon mapping
 // that was laying around and wasn't updated to an anon mapping. If we
@@ -2014,6 +2029,7 @@ void free_initial_memory_range(uintptr_t addr, size_t 
size)
 
 error mprotect(const void *addr, size_t len, unsigned perm)
 {
+PREVENT_STACK_PAGE_FAULT
 SCOPE_LOCK(vma_list_mutex.for_write());
 
 if (!ismapped(addr, len)) {
@@ -2025,6 +2041,7 @@ error mprotect(const void *addr, size_t len, unsigned 
perm)
 
 error munmap(const void *addr, size_t length)
 {
+PREVENT_STACK_PAGE_FAULT
 SCOPE_LOCK(vma_list_mutex.for_write());
 
 length = align_up(length, mmu::page_size);
-- 
2.34.1



[osv-dev] [PATCH 4/8] lazy stack: ensure next stack page dynamically if preemption enabled

2022-08-30 Thread Waldemar Kozaczuk
This patch modifies a number of critical places, mostly in the scheduler
code, to dynamically pre-fault the stack if preemption is enabled. In all
these places we can be statically sure that interrupts are enabled,
but not so sure about preemption (maybe in the future we can prove it is
enabled at least in some of the cases and replace the conditional
ensure_next_stack_page_if_preemptable() with the cheaper single-instruction
ensure_next_stack_page()).

The three call sites before irq_lock is taken (interrupts are disabled)
include:
- cpu::schedule() before WITH_LOCK(irq_lock)
- thread::pin() before WITH_LOCK(irq_lock)
- thread::yield() before std::lock_guard guard(irq_lock)

The reasoning above goes like this: these methods were designed
with the intention of being used when interrupts are enabled, because otherwise
the act of using the WITH_LOCK or guard construct would imply that interrupts
get re-enabled after the block, i.e. the code does not care about restoring
interrupts to the proper state they were in before. If that had been the
intention, these methods would use irq_save_lock_type.

Two other call sites in the scheduler are:
- the timer_base destructor, before calling timer_base::cancel(), which
disables interrupts
- thread::wake_with_from_mutex(), only called from waiter::wake()
These follow the reasoning that interrupts are enabled most of the time,
except in a few places in the scheduler and interrupt handlers. We also assume
the mutex is not used when interrupts are disabled, and based on my analysis
of all the code that disables/enables interrupts that seems to be the case.
In any case we put invariants in these two places to verify that interrupts
are indeed enabled.

The last call site, in abort(), assumes that calling irq_disable() implies
that interrupts were enabled before, so we only need to check the preemption status.
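
The helper these call sites rely on, condensed from the diff below, together
with one typical call site (cpu::schedule()):

    #if CONF_lazy_stack
    inline void ensure_next_stack_page_if_preemptable() {
        if (!preemptable()) {
            return;  // preemption already disabled - do not risk a page fault
        }
        arch::ensure_next_stack_page();
    }
    #endif

    void cpu::schedule()
    {
    #if CONF_lazy_stack
        // interrupts are known to be enabled here; preemption is checked dynamically
        sched::ensure_next_stack_page_if_preemptable();
    #endif
        WITH_LOCK(irq_lock) {
            // reschedule_from_interrupt(...)
        }
    }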

Signed-off-by: Waldemar Kozaczuk 
---
 core/sched.cc| 24 
 include/osv/sched.hh | 17 +
 runtime.cc   |  3 +++
 3 files changed, 44 insertions(+)

diff --git a/core/sched.cc b/core/sched.cc
index 1e109694..65842ff3 100644
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -224,6 +224,9 @@ void thread::cputime_estimator_get(
 // scheduler on a different CPU would be disastrous.
 void cpu::schedule()
 {
+#if CONF_lazy_stack
+sched::ensure_next_stack_page_if_preemptable();
+#endif
 WITH_LOCK(irq_lock) {
 #ifdef __aarch64__
 reschedule_from_interrupt(sched::cpu::current(), false, thyst);
@@ -566,6 +569,9 @@ void thread::pin(cpu *target_cpu)
 t.wake();
 }, sched::thread::attr().pin(source_cpu)));
 wakeme->start();
+#if CONF_lazy_stack
+sched::ensure_next_stack_page_if_preemptable();
+#endif
 WITH_LOCK(irq_lock) {
 trace_sched_migrate(&t, target_cpu->id);
 t.stat_migrations.incr();
@@ -822,6 +828,12 @@ void thread::yield(thread_runtime::duration preempt_after)
 {
 trace_sched_yield();
 auto t = current();
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+sched::ensure_next_stack_page_if_preemptable();
+#endif
 std::lock_guard guard(irq_lock);
 // FIXME: drive by IPI
 cpu::current()->handle_incoming_wakeups();
@@ -1258,6 +1270,12 @@ void thread::wake_impl(detached_state* st, unsigned 
allowed_initial_states_mask)
 
 void thread::wake()
 {
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+sched::ensure_next_stack_page_if_preemptable();
+#endif
 WITH_LOCK(rcu_read_lock) {
 wake_impl(_detached_state.get());
 }
@@ -1604,6 +1622,12 @@ timer_base::timer_base(timer_base::client& t)
 
 timer_base::~timer_base()
 {
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+sched::ensure_next_stack_page_if_preemptable();
+#endif
 cancel();
 }
 
diff --git a/include/osv/sched.hh b/include/osv/sched.hh
index 1691bed8..8a2694cb 100644
--- a/include/osv/sched.hh
+++ b/include/osv/sched.hh
@@ -1043,6 +1043,15 @@ inline bool preemptable()
 return !get_preempt_counter();
 }
 
+#if CONF_lazy_stack
+inline void ensure_next_stack_page_if_preemptable() {
+if (!preemptable()) {
+return;
+}
+arch::ensure_next_stack_page();
+}
+#endif
+
 inline void preempt()
 {
 if (preemptable()) {
@@ -1350,6 +1359,14 @@ template 
 inline
 void thread::wake_with_from_mutex(Action action)
 {
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+if (preemptable()) {
+arch::ensure_next_stack_page();
+}
+#endif
 return do_wake_with(action, (1 << unsigned(status::waiting))
   | (1 << unsigned(status::sending_lock)));
 }
diff --git a/runtime.cc b/runtime.cc
index 521b5c24..5f67e79b 100644
--- a/runtime.cc
+++ b/runtime.cc
@@ -113,6 +113,9 @@ void abort(const char *fmt, ...)
 do {} while (true);
 }
 
+#if CONF_lazy_stack
+sched::ensure_next_stack_page_if_preemptable();
+#endif
 

[osv-dev] [PATCH 3/8] lazy stack: ensure next stack page statically

2022-08-30 Thread Waldemar Kozaczuk
This patch modifies all relevant places where we statically know
that both interrupts and preemption should be enabled, to unconditionally
pre-fault the stack.

These include places in code before:
- a WITH_LOCK(preempt_lock) block
- a WITH_LOCK(osv::rcu_read_lock) block
- a sched::preempt_disable() call
- a WITH_LOCK(irq_lock) block, in one case

In general, these are the places that follow the assumption
that most of the time preemption and interrupts are enabled.
And the functions/methods below are called directly or indirectly by
an application, and there is no other kernel code in that application
call stack above that also disables interrupts or preemption.

One good example is the memory allocation code in mempool.cc
that disables preemption in quite a few places. Many of those
use the WITH_LOCK/DROP_LOCK(preempt_lock) combination, which implies
they were intended to be called with preemption enabled, otherwise
the code in DROP_LOCK would not work. Also, all of those end up calling
some form of the thread::wait() method, which asserts that both interrupts
and preemption are enabled.

To validate the reasoning, we add relevant invariants before we pre-fault
the stack.
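
The shape of the change at each such call site, sketched (compare, for
example, the uma_zalloc_arg() hunk below):

    #if CONF_lazy_stack_invariant
    // Validate the static reasoning: we only get here with both preemption
    // and interrupts enabled.
    assert(sched::preemptable() && arch::irq_enabled());
    #endif
    #if CONF_lazy_stack
    // Safe to pre-fault unconditionally - a single memory read.
    arch::ensure_next_stack_page();
    #endif
    WITH_LOCK(preempt_lock) {
        // ... section that runs with preemption disabled ...
    }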

Signed-off-by: Waldemar Kozaczuk 
---
 bsd/porting/uma_stub.cc | 12 +
 bsd/sys/net/routecache.hh   |  6 +
 bsd/sys/netinet/arpcache.hh |  7 +
 core/async.cc   | 18 +
 core/condvar.cc |  7 +
 core/elf.cc | 12 +
 core/mempool.cc | 50 ++-
 core/rcu.cc |  6 +
 core/sched.cc   | 52 +
 fs/vfs/kern_descrip.cc  |  6 +
 include/osv/elf.hh  |  7 +
 include/osv/percpu_xmit.hh  | 13 ++
 include/osv/sched.hh|  7 +
 13 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/bsd/porting/uma_stub.cc b/bsd/porting/uma_stub.cc
index cf320b13..a9d689c3 100644
--- a/bsd/porting/uma_stub.cc
+++ b/bsd/porting/uma_stub.cc
@@ -34,6 +34,12 @@ void * uma_zalloc_arg(uma_zone_t zone, void *udata, int 
flags)
 {
 void * ptr;
 
+#if CONF_lazy_stack_invariant
+assert(sched::preemptable() && arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+arch::ensure_next_stack_page();
+#endif
 WITH_LOCK(preempt_lock) {
 ptr = (*zone->percpu_cache)->alloc();
 }
@@ -101,6 +107,12 @@ void uma_zfree_arg(uma_zone_t zone, void *item, void 
*udata)
 zone->uz_dtor(item, zone->uz_size, udata);
 }
 
+#if CONF_lazy_stack_invariant
+assert(sched::preemptable() && arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+arch::ensure_next_stack_page();
+#endif
 WITH_LOCK(preempt_lock) {
 if ((*zone->percpu_cache)->free(item)) {
 return;
diff --git a/bsd/sys/net/routecache.hh b/bsd/sys/net/routecache.hh
index bdcdf496..e2140f06 100644
--- a/bsd/sys/net/routecache.hh
+++ b/bsd/sys/net/routecache.hh
@@ -162,6 +162,12 @@ public:
 // route.cc).
 assert(fibnum == 0);
 
+#if CONF_lazy_stack_invariant
+assert(sched::preemptable() && arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+arch::ensure_next_stack_page();
+#endif
 WITH_LOCK(osv::rcu_read_lock) {
 auto *c = cache.read();
 auto entry = c->search(dst->sin_addr.s_addr);
diff --git a/bsd/sys/netinet/arpcache.hh b/bsd/sys/netinet/arpcache.hh
index 4556d40e..289a2f29 100644
--- a/bsd/sys/netinet/arpcache.hh
+++ b/bsd/sys/netinet/arpcache.hh
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -98,6 +99,12 @@ struct arp_cache {
 
 boost::optional lookup(const in_addr ip)
 {
+#if CONF_lazy_stack_invariant
+assert(sched::preemptable() && arch::irq_enabled());
+#endif
+#if CONF_lazy_stack
+arch::ensure_next_stack_page();
+#endif
 WITH_LOCK(osv::rcu_read_lock) {
 auto i = _entries.reader_find(ip, std::hash(), 
entry_compare());
 return boost::optional(!!i, *i);
diff --git a/core/async.cc b/core/async.cc
index 9f3d61d2..ada19fba 100644
--- a/core/async.cc
+++ b/core/async.cc
@@ -104,6 +104,12 @@ public:
 
 void insert(percpu_timer_task& task)
 {
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled() && sched::preemptable());
+#endif
+#if CONF_lazy_stack
+arch::ensure_next_stack_page();
+#endif
 WITH_LOCK(preempt_lock) {
 trace_async_timer_task_insert(this, &task);
 
@@ -125,6 +131,12 @@ public:
 
 percpu_timer_task& borrow_task()
 {
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled() && sched::preemptable());
+#endif
+#if CONF_lazy_stack
+arch::ensure_next_stack_page();
+#endif
 WITH_LOCK(preempt_lock) {
 auto task = released_timer_tasks.pop();
 if (task) {
@@ -142,6 +154,12 @@ public:
 {
 auto task = new one_shot_task(std::move(callback));
 
+#if CONF_lazy_stack_invariant
+assert(arch::irq_enabled() && 

[osv-dev] [PATCH 2/8] lazy stack: do nothing in kernel threads and on populated stack

2022-08-30 Thread Waldemar Kozaczuk
This patch annotates the relevant call sites with invariant assert
expressions to validate the assumptions that let us do "nothing" in all these
cases. We also reorganize some code in the scheduler to help differentiate
between the cases when a given function/method is called with interrupts or
preemption disabled, from a kernel thread, or by an interrupt handler.

The following methods get added to the scheduler code, with names describing
the state of interrupts or preemption, or the kernel caller:
- timer_base::set_with_irq_disabled(osv::clock::uptime::time_point time)
- timer_base::set_with_irq_disabled(std::chrono::duration duration)
- thread::wake_with_irq_disabled()
- thread::wake_with_irq_or_preemption_disabled(Action action)
- thread_handle::wake_from_kernel_or_with_irq_disabled()

In general:
- we modify all interrupt handlers (those that are executed on the interrupt
  stack) to call one of the 3 new wake_...() methods (mostly
  wake_with_irq_disabled()) to indicate that we do not need to / should not
  pre-fault the stack; most of those are in device driver code
- we modify all code executed on kernel threads that disables preemption or
  interrupts by adding the relevant invariant -
  assert(!sched::thread::current()->is_app()); we do not need to pre-fault
  because the stack is already populated (see the sketches below)
- we also modify the code which is indirectly called from kernel threads, like
  classifier::post_packet() in net channels
- finally, we also modify the scheduler code to use
  timer_base::set_with_irq_disabled(), mostly around preemption_timer, to
  indicate that we should not pre-fault the stack downstream
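
Two sketches of the patterns described above (the call sites are hypothetical;
the asserted expression and the wake variant match the patch):

    // In code that only ever runs on a kernel thread (stack fully populated),
    // we merely document the assumption:
    #if CONF_lazy_stack_invariant
    assert(!sched::thread::current()->is_app());
    #endif

    // In an interrupt handler (hypothetical driver code) we run on the
    // pre-populated interrupt stack with interrupts off, so we use the wake
    // variant that makes this explicit instead of pre-faulting:
    _waiting_thread->wake_with_irq_disabled();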

Signed-off-by: Waldemar Kozaczuk 
---
 arch/aarch64/exceptions.cc |  3 ++
 arch/aarch64/interrupt.cc  |  3 ++
 arch/x64/exceptions.cc |  3 ++
 arch/x64/mmu.cc|  2 +-
 arch/x64/msi.cc|  2 +-
 core/async.cc  |  6 +++
 core/epoll.cc  |  2 +-
 core/mempool.cc| 12 --
 core/net_channel.cc|  8 +++-
 core/rcu.cc|  5 ++-
 core/sched.cc  | 79 ++
 drivers/acpi.cc|  2 +-
 drivers/ahci.cc|  2 +-
 drivers/ahci.hh|  2 +-
 drivers/cadence-uart.cc|  2 +-
 drivers/isa-serial.cc  |  2 +-
 drivers/kbd.cc |  2 +-
 drivers/mmio-isa-serial.cc |  2 +-
 drivers/pl011.cc   |  2 +-
 drivers/virtio-blk.cc  |  6 +--
 drivers/virtio-fs.cc   |  4 +-
 drivers/virtio-net.cc  |  6 +--
 drivers/virtio-rng.cc  |  2 +-
 drivers/virtio-scsi.cc |  2 +-
 drivers/virtio-vring.hh|  2 +-
 drivers/vmw-pvscsi.cc  |  2 +-
 drivers/xenconsole.cc  |  2 +-
 include/osv/net_channel.hh |  5 ++-
 include/osv/sched.hh   | 16 
 include/osv/xen_intr.hh|  2 +-
 libc/signal.cc |  3 ++
 libc/timerfd.cc|  3 ++
 32 files changed, 157 insertions(+), 39 deletions(-)

diff --git a/arch/aarch64/exceptions.cc b/arch/aarch64/exceptions.cc
index cadbb3a2..5c2c59ab 100644
--- a/arch/aarch64/exceptions.cc
+++ b/arch/aarch64/exceptions.cc
@@ -122,6 +122,9 @@ void interrupt_table::unregister_interrupt(interrupt 
*interrupt)
 
 bool interrupt_table::invoke_interrupt(unsigned int id)
 {
+#if CONF_lazy_stack_invariant
+assert(!arch::irq_enabled());
+#endif
 WITH_LOCK(osv::rcu_read_lock) {
 assert(id < this->nr_irqs);
 interrupt_desc *desc = this->irq_desc[id].read();
diff --git a/arch/aarch64/interrupt.cc b/arch/aarch64/interrupt.cc
index b8337e23..e26e10ee 100644
--- a/arch/aarch64/interrupt.cc
+++ b/arch/aarch64/interrupt.cc
@@ -22,6 +22,9 @@ sgi_interrupt::~sgi_interrupt()
 
 void sgi_interrupt::send(sched::cpu* cpu)
 {
+#if CONF_lazy_stack_invariant
+assert(!arch::irq_enabled() || !sched::preemptable());
+#endif
 gic::gic->send_sgi(gic::sgi_filter::SGI_TARGET_LIST,
cpu->arch.smp_idx, get_id());
 }
diff --git a/arch/x64/exceptions.cc b/arch/x64/exceptions.cc
index 7c9eaf51..fbf6be65 100644
--- a/arch/x64/exceptions.cc
+++ b/arch/x64/exceptions.cc
@@ -220,6 +220,9 @@ void 
interrupt_descriptor_table::unregister_interrupt(gsi_level_interrupt *inter
 
 void interrupt_descriptor_table::invoke_interrupt(unsigned vector)
 {
+#if CONF_lazy_stack_invariant
+assert(!arch::irq_enabled());
+#endif
 WITH_LOCK(osv::rcu_read_lock) {
 unsigned i, nr_shared;
 bool handled = false;
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index 1af268c0..675410d0 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -64,7 +64,7 @@ std::atomic tlb_flush_pendingconfirms;
 inter_processor_interrupt tlb_flush_ipi{IPI_TLB_FLUSH, [] {
 mmu::flush_tlb_local();
 if (tlb_flush_pendingconfirms.fetch_add(-1) == 1) {
-tlb_flush_waiter.wake();
+tlb_flush_waiter.wake_from_kernel_or_with_irq_disabled();
 }
 }};
 
diff --git a/arch/x64/msi.cc b/arch/x64/msi.cc
index 9a28e3a5..cf0c3dc5 100644
--- a/arch/x64/msi.cc
+++ b/arch/x64/msi.cc
@@ -126,7 +126,7 @@ static 

[osv-dev] [PATCH 1/8] lazy stack: inline assembly to pre-fault stack

2022-08-30 Thread Waldemar Kozaczuk
This is the 1st of a total of eight patches that implement optional
support of the so-called "lazy stack" feature. The lazy stack is well explained
by issue #143 and allows saving a substantial amount of memory if an
application spawns many pthreads with large stacks, by letting the stack grow
dynamically as needed instead of getting pre-populated ahead of time.

The crux of this solution and the previous versions is based on the observation
that the OSv memory fault handler requires that both interrupts and preemption
be enabled when a fault is triggered. Therefore, if the stack is dynamically
mapped, we need to make sure that stack page faults NEVER happen in the
relatively few places of kernel code that execute with either interrupts or
preemption disabled. And we satisfy this requirement by "pre-faulting" the
stack: reading a byte one page (4096 bytes) down from the stack pointer just
before preemption or interrupts are disabled. Now, the problem is complicated
by the fact that kernel code A that disables preemption or interrupts may nest
by calling another kernel function B that also disables preemption or
interrupts, in which case function B should NOT try to pre-fault the stack,
otherwise the fault handler will abort due to the violated constraint stated
before. In short, we cannot "blindly" or unconditionally pre-fault the stack
in all places before interrupts or preemption are disabled.
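
A minimal x86-64 sketch of the pre-fault primitive described above (a single
read one page below the stack pointer; the actual patch may differ in details):

    inline void ensure_next_stack_page() {
        char tmp;
        // Touch one byte 4096 bytes below the current stack pointer so that,
        // if that page is not yet mapped, the fault is taken here, while
        // interrupts and preemption are still enabled.
        asm volatile("movb -4096(%%rsp), %0" : "=r"(tmp));
    }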

Some of the previous solutions modified both arch::irq_disable() and
sched::preempt_disable() to check if both preemption and interrupts
are enabled and only then try to read a byte at a -4096 offset down.
Unfortunately, this makes it more costly than envisioned by Nadav Har'El -
instead of a single instruction to read from memory, the compiler needs 4-5
instructions to check whether preemption and interrupts are enabled, read the
data, and perform the relevant jump. To make it worse, the implementation of
arch::irq_enabled() is pretty expensive, at least on x64, and uses the stack
with pushfq. To avoid it, the previous solutions would add a new thread-local
counter and pack the irq-disabling counter together with the preemption one.
But even with this optimization I found this approach to deteriorate the
performance quite substantially. For example, the memory allocation logic
disables preemption in quite many places (see core/mempool.cc), and the
corresponding test - misc-free-perf.cc - would show performance (the number
of malloc()/free() calls executed per second) degrade on average by 15-20%.

So this latest version, implemented by this and the next 7 patches, takes a
different approach. Instead of putting the conditional pre-faulting of the
stack in both irq_disable() and preempt_disable(), we analyze the OSv code to
find all places where irq_disable() and/or preempt_disable() is called
directly (or sometimes indirectly) and pre-fault the stack there as necessary.
This is obviously way more laborious and prone to human error (we can miss
some places), but it is way more performant (no noticeable performance
degradation observed) compared to the earlier versions described in the
paragraph above.

As we analyze all call sites, we need to make some observations to help us
decide what exactly to do in each case:
- do nothing
- blindly pre-fault the stack (single instruction)
- conditionally pre-fault the stack (hopefully in very few places)

Rule 1: Do nothing if the call site in question ALWAYS executes in a kernel thread.

Rule 2: Do nothing if the call site executes on a populated stack - this includes
        the above, but also code executing on the interrupt, exception or syscall stack.

Rule 3: Do nothing if the call site executes when we know that either interrupts
        or preemption are disabled. A good example is an interrupt handler
        or code within WITH_LOCK(irq_lock) or WITH_LOCK(preempt_lock) blocks.

Rule 4: Pre-fault unconditionally if we know that BOTH preemption and interrupts
        are enabled. This in most cases can only be deduced by analysing where
        the particular function is called. In general, any such function called by
        user code, like libc, satisfies the condition. But sometimes it is tricky
        because the kernel might be calling libc functions, such as malloc().

Rule 5: Otherwise, pre-fault the stack by determining dynamically:
        only if sched::preemptable() and arch::irq_enabled()

One general rule is that all potential stack page faults happen
on an application thread's stack when some kernel code gets executed down
the call stack.

In general, we identify the call sites in the following categories:
- direct calls to arch::irq_disable() and arch::irq_disable_notrace() 
(tracepoints)
- direct calls to sched::preempt_disable()
- code using WITH_LOCK() with an instance of irq_lock_type or irq_save_lock_type
- code using WITH_LOCK(preempt_lock)
- code using WITH_LOCK(osv::rcu_read_lock)

The above locations can be found with a simple grep, but also with an IDE like
CLion from JetBrains, which can help more efficiently find all direct and,
more importantly, indirect usages of the 

Re: [osv-dev] [COMMIT osv master] trace: do not use malloc/free when interrupts are disabled

2022-08-30 Thread 'Nadav Har'El' via OSv Development
On Tue, Aug 30, 2022 at 6:59 PM Commit Bot  wrote:

> From: Waldemar Kozaczuk 
> Committer: Waldemar Kozaczuk 
> Branch: master
>
> trace: do not use malloc/free when interrupts are disabled
>
> This patch fixes a subtle bug found when working on lazy stack changes
> and trying to establish some invariants in relevant places in the kernel
> code.
> One of the findings was that the memory allocation logic or more
> specifically the functions malloc() and free() require both interrupts and
> preemption enabled to function correctly.
>
> If one starts analysis with memory::pool::alloc(), he/she will notice
> a DROP_LOCK(preempt_lock) executed under WITH_LOCK(preempt) if
> the pool is empty. This means that the code assumes the preemption must
> be enabled before calling alloc(), otherwise the code in DROP_LOCK would
> run with preemption disabled which is contrary to the intention.
> Also, after drilling down the add_page() calling path, one eventually
> would find the call to reclaimers::wait() which calls
> thread::wait_until(). The
> thread::do_wait_until() called by the former enforces that both interrupts
> and preemption needs to be enabled with assertion.
>
> The bottom line of the analysis above is that both interrupts and
> preemption
> must be enabled before calling malloc() (same applies to free()). This
> effectively
> means that trying to call malloc() after disabling interrupts might not
> work (which
> would only happen if relevent pool was out of free pages AND l1 pool was
> empty and
> had to refill from l2 pool which probably is quite rare).
>
> So this patch changes the code in trace::create_trace_dump() to avoid
> calling new/malloc() (see copies.emplace_back()) and instead makes it
> preallocate
> the vector ahead of time and then simply copy the trace buffer to a
> relevant
> spot in the vector.
>
> Signed-off-by: Waldemar Kozaczuk 
>
> ---
> diff --git a/core/trace.cc b/core/trace.cc
> --- a/core/trace.cc
> +++ b/core/trace.cc
> @@ -697,7 +697,7 @@ std::string
>  trace::create_trace_dump()
>  {
>  semaphore signal(0);
> -std::vector<trace_buf> copies;
> +std::vector<trace_buf> copies(sched::cpus.size());
>
>  auto is_valid_tracepoint = [](const tracepoint_base * tp_test) {
>  for (auto & tp : tracepoint_base::tp_list) {
> @@ -717,7 +717,7 @@ trace::create_trace_dump()
>  irq.save();
>  arch::irq_disable_notrace();
>  auto * tbp = percpu_trace_buffer.for_cpu(cpu);
> -copies.emplace_back(*tbp);
> +copies[i] = trace_buf(*tbp);
>

This change is good, but I think you didn't really have to change the
emplace_back() call - you could have kept the emplace_back() and, instead of
the constructor in the beginning, used copies.reserve(sched::cpus.size()).
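
A sketch of that alternative (not the committed change):

    std::vector<trace_buf> copies;
    copies.reserve(sched::cpus.size());  // pre-allocate the capacity up front
    // ... later, inside the per-CPU thread, emplace_back() can no longer
    // trigger a reallocation of the vector's storage:
    copies.emplace_back(*tbp);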

 irq.restore();
>  signal.post();
>  }, sched::thread::attr().pin(cpu)));
>



Re: [osv-dev] [COMMIT osv master] aarch64 trace: make compiler pick a register instead of x0

2022-08-30 Thread 'Nadav Har'El' via OSv Development
What a difference one character makes :-)

--
Nadav Har'El
n...@scylladb.com


On Tue, Aug 30, 2022 at 6:59 PM Commit Bot  wrote:

> From: Waldemar Kozaczuk 
> Committer: Waldemar Kozaczuk 
> Branch: master
>
> aarch64 trace: make compiler pick a register instead of x0
>
> This subtle 1-character patch fixes a nasty bug that causes interrupts
> to be enabled instead of correctly restored to the state it was when
> saving the state. This bug would affect the tracing logic and result in
> crashes described by the issues #1158 and #1195.
>
> This bug in inline assembly was most likely a typo as I am sure
> the intention was to use '%0' instead of 'x0' to let compiler
> correctly pick a register instead of using any garbage in the x0
> register.
>
> Fixes #1158
> Fixes #1195
>
> Signed-off-by: Waldemar Kozaczuk 
>
> ---
> diff --git a/arch/aarch64/arch.hh b/arch/aarch64/arch.hh
> --- a/arch/aarch64/arch.hh
> +++ b/arch/aarch64/arch.hh
> @@ -79,7 +79,7 @@ inline void irq_flag_notrace::save() {
>  }
>
>  inline void irq_flag_notrace::restore() {
> -asm volatile("msr daif, x0" :: "r"(daif) : "memory");
> +asm volatile("msr daif, %0" :: "r"(daif) : "memory");
>  }
>
>  inline bool irq_flag_notrace::enabled() const {
>



[osv-dev] [COMMIT osv master] aarch64: download boost program-options library needed by some tests

2022-08-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

aarch64: download boost program-options library needed by some tests

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/scripts/download_aarch64_packages.py 
b/scripts/download_aarch64_packages.py
--- a/scripts/download_aarch64_packages.py
+++ b/scripts/download_aarch64_packages.py
@@ -34,6 +34,7 @@ def fedora_download_commands(fedora_version):
   'boost-filesystem',
   'boost-test',
   'boost-chrono',
+  'boost-program-options',
   'boost-timer']
 script_path = '%s/scripts/download_fedora_aarch64_rpm_package.sh' % 
osv_root
 
@@ -55,6 +56,8 @@ def ubuntu_download_commands(boost_long_version):
   'libboost-test%s-dev' % boost_short_version,
   'libboost-timer%s' % boost_short_version,
   'libboost-timer%s-dev' % boost_short_version,
+  'libboost-program-options%s' % boost_short_version,
+  'libboost-program-options%s-dev' % boost_short_version,
   'libboost-chrono%s' % boost_short_version,
   'libboost-chrono%s-dev' % boost_short_version]
 



[osv-dev] [COMMIT osv master] trace: do not use malloc/free when interrupts are disabled

2022-08-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

trace: do not use malloc/free when interrupts are disabled

This patch fixes a subtle bug found when working on the lazy stack changes
and trying to establish some invariants in relevant places in the kernel code.
One of the findings was that the memory allocation logic, or more
specifically the functions malloc() and free(), requires both interrupts and
preemption to be enabled to function correctly.

If one starts the analysis with memory::pool::alloc(), he/she will notice
a DROP_LOCK(preempt_lock) executed under WITH_LOCK(preempt_lock) if
the pool is empty. This means that the code assumes preemption must
be enabled before calling alloc(), otherwise the code in DROP_LOCK would
run with preemption disabled, which is contrary to the intention.
Also, after drilling down the add_page() calling path, one eventually
would find the call to reclaimers::wait(), which calls thread::wait_until(). The
thread::do_wait_until() called by the former enforces, with an assertion, that
both interrupts and preemption need to be enabled.

The bottom line of the analysis above is that both interrupts and preemption
must be enabled before calling malloc() (the same applies to free()). This
effectively means that trying to call malloc() after disabling interrupts
might not work (which would only happen if the relevant pool was out of free
pages AND the l1 pool was empty and had to refill from the l2 pool, which is
probably quite rare).
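
A hypothetical invariant capturing this conclusion (not part of this commit):

    // malloc()/free() may need to refill a pool and sleep in reclaimers::wait(),
    // so they must only be called with both interrupts and preemption enabled.
    assert(arch::irq_enabled() && sched::preemptable());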

So this patch changes the code in trace::create_trace_dump() to avoid
calling new/malloc() (see copies.emplace_back()) and instead makes it
preallocate the vector ahead of time and then simply copy the trace buffer to
the relevant spot in the vector.

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/core/trace.cc b/core/trace.cc
--- a/core/trace.cc
+++ b/core/trace.cc
@@ -697,7 +697,7 @@ std::string
 trace::create_trace_dump()
 {
 semaphore signal(0);
-std::vector<trace_buf> copies;
+std::vector<trace_buf> copies(sched::cpus.size());
 
 auto is_valid_tracepoint = [](const tracepoint_base * tp_test) {
 for (auto & tp : tracepoint_base::tp_list) {
@@ -717,7 +717,7 @@ trace::create_trace_dump()
 irq.save();
 arch::irq_disable_notrace();
 auto * tbp = percpu_trace_buffer.for_cpu(cpu);
-copies.emplace_back(*tbp);
+copies[i] = trace_buf(*tbp);
 irq.restore();
 signal.post();
 }, sched::thread::attr().pin(cpu)));



[osv-dev] [COMMIT osv master] aarch64 trace: make compiler pick a register instead of x0

2022-08-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

aarch64 trace: make compiler pick a register instead of x0

This subtle 1-character patch fixes a nasty bug that causes interrupts
to be enabled instead of correctly restored to the state they were in when
the state was saved. This bug would affect the tracing logic and result in
crashes described by issues #1158 and #1195.

This bug in the inline assembly was most likely a typo, as I am sure
the intention was to use '%0' instead of 'x0' to let the compiler
correctly pick a register instead of using whatever garbage is in the x0
register.

Fixes #1158
Fixes #1195

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/arch/aarch64/arch.hh b/arch/aarch64/arch.hh
--- a/arch/aarch64/arch.hh
+++ b/arch/aarch64/arch.hh
@@ -79,7 +79,7 @@ inline void irq_flag_notrace::save() {
 }
 
 inline void irq_flag_notrace::restore() {
-asm volatile("msr daif, x0" :: "r"(daif) : "memory");
+asm volatile("msr daif, %0" :: "r"(daif) : "memory");
 }
 
 inline bool irq_flag_notrace::enabled() const {
