On Thursday, January 30, 2020 at 12:18:38 AM UTC-5, Waldek Kozaczuk wrote:
>
> Unfortunately, after more extensive testing (even though all unit tests 
> pass), I have realized this patch has some major flaws, which are hopefully 
> repairable.
>
> I have tried to run the http test (cd modules/httpserver-api && make 
> check-http) and the app hung. After some debugging, I found this stack 
> trace that explains why:
>
> #0  sched::thread::switch_to (this=0xffff8000009c8040, 
> this@entry=0xffff80007fee1040) at arch/x64/arch-switch.hh:108
> #1  0x000000004040c57a in sched::cpu::reschedule_from_interrupt 
> (this=0xffff80000001f040, called_from_yield=called_from_yield@entry=false, 
> preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
> #2  0x000000004040d2e8 in sched::cpu::schedule () at 
> include/osv/sched.hh:1316
> #3  0x000000004040d406 in sched::thread::wait 
> (this=this@entry=0xffff800003fa8040) at core/sched.cc:1216
> #4  0x000000004043a856 in sched::thread::do_wait_for<lockfree::mutex, 
> sched::wait_object<waitqueue> > (mtx=...) at include/osv/mutex.h:41
> #5  sched::thread::wait_for<waitqueue&> (mtx=...) at 
> include/osv/sched.hh:1226
> #6  waitqueue::wait (this=this@entry=0x408f04d0 <mmu::vma_list_mutex+48>, 
> mtx=...) at core/waitqueue.cc:56
> #7  0x00000000403ea41b in rwlock::reader_wait_lockable (this=<optimized 
> out>) at core/rwlock.cc:174
> #8  rwlock::rlock (this=this@entry=0x408f04a0 <mmu::vma_list_mutex>) at 
> core/rwlock.cc:29
> #9  0x000000004034cbac in rwlock_for_read::lock (this=0x408f04a0 
> <mmu::vma_list_mutex>) at include/osv/rwlock.h:113
> #10 std::lock_guard<rwlock_for_read&>::lock_guard (__m=..., 
> this=<synthetic pointer>) at /usr/include/c++/9/bits/std_mutex.h:159
> #11 lock_guard_for_with_lock<rwlock_for_read&>::lock_guard_for_with_lock 
> (lock=..., this=<synthetic pointer>) at include/osv/mutex.h:89
> #12 mmu::vm_fault (addr=35184666537984, addr@entry=35184666541728, 
> ef=ef@entry=0xffff800003fad068) at core/mmu.cc:1333
> #13 0x00000000403ad539 in page_fault (ef=0xffff800003fad068) at 
> arch/x64/mmu.cc:42
> #14 <signal handler called>
> #15 arch::ensure_next_stack_page () at arch/x64/arch.hh:37
> #16 sched::preempt_disable () at include/osv/sched.hh:1008
> #17 preempt_lock_t::lock (this=<optimized out>) at 
> include/osv/preempt-lock.hh:15
> #18 std::lock_guard<preempt_lock_t>::lock_guard (__m=..., this=<synthetic 
> pointer>) at /usr/include/c++/9/bits/std_mutex.h:159
> #19 lock_guard_for_with_lock<preempt_lock_t>::lock_guard_for_with_lock 
> (lock=..., this=<synthetic pointer>) at include/osv/mutex.h:89
> #20 memory::pool::alloc (this=0x40907608 <memory::malloc_pools+168>) at 
> core/mempool.cc:214
> #21 0x00000000403f936f in std_malloc (size=80, alignment=16) at 
> core/mempool.cc:1679
> #22 0x00000000403f97db in malloc (size=80) at core/mempool.cc:1887
> #23 0x00000000404c0089 in operator new(unsigned long) ()
> #24 0x000000004034c9d2 in mmu::anon_vma::split (this=0xffffa00001f69e80, 
> edge=4136108032) at include/osv/addr_range.hh:16
> #25 0x000000004034ef93 in mmu::evacuate (start=<optimized out>, 
> end=<optimized out>) at /usr/include/boost/move/detail/meta_utils.hpp:267
> #26 0x000000004035007c in mmu::allocate (v=v@entry=0xffffa000050e4600, 
> start=start@entry=4127195136, size=size@entry=8912896, 
> search=search@entry=false)
>     at core/mmu.cc:1116
> #27 0x0000000040350e87 in mmu::map_anon (addr=addr@entry=0xf6000000, 
> size=size@entry=8912896, flags=flags@entry=1, perm=perm@entry=3) at 
> core/mmu.cc:1219
> #28 0x000000004047d9f0 in mmap (addr=0xf6000000, length=8912896, 
> prot=<optimized out>, flags=<optimized out>, fd=<optimized out>, offset=0)
>     at libc/mman.cc:152
> #29 0x0000100000f2dcef in os::Linux::commit_memory_impl(char*, unsigned 
> long, bool) ()
> #30 0x0000100000f2dfe9 in os::pd_commit_memory(char*, unsigned long, 
> unsigned long, bool) ()
> #31 0x0000100000f22a4a in os::commit_memory(char*, unsigned long, unsigned 
> long, bool) ()
> #32 0x0000100000f956db in PSVirtualSpace::expand_by(unsigned long) ()
> #33 0x0000100000f96998 in PSYoungGen::resize_generation(unsigned long, 
> unsigned long) ()
> #34 0x0000100000f95ad2 in PSYoungGen::resize(unsigned long, unsigned long) 
> ()
> #35 0x0000100000f92e9f in PSScavenge::invoke_no_policy() ()
> #36 0x0000100000f93673 in PSScavenge::invoke() ()
> #37 0x0000100000f4de5d in 
> ParallelScavengeHeap::failed_mem_allocate(unsigned long) ()
> #38 0x00001000010cfd97 in VM_ParallelGCFailedAllocation::doit() ()
> #39 0x00001000010d7572 in VM_Operation::evaluate() ()
> #40 0x00001000010d526b in VMThread::evaluate_operation(VM_Operation*) ()
> #41 0x00001000010d65ab in VMThread::loop() ()
> #42 0x00001000010d6a13 in VMThread::run() ()
> #43 0x0000100000f2e152 in java_start(Thread*) ()
> #44 0x00000000404773ba in pthread_private::pthread::<lambda()>::operator() 
> (__closure=0xffffa00002ddbd00) at libc/pthread.cc:115
> #45 std::_Function_handler<void(), pthread_private::pthread::pthread(void* 
> (*)(void*), void*, sigset_t, const 
> pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const 
> std::_Any_data &) (__functor=...) at 
> /usr/include/c++/9/bits/std_function.h:300
> #46 0x000000004040da1c in sched::thread_main_c (t=0xffff800003fa8040) at 
> arch/x64/arch-switch.hh:326
> #47 0x00000000403ad2f3 in thread_main () at arch/x64/entry.S:113
>
> First off, I realized that reschedule_from_interrupt() happens within the 
> context of irq_lock, which ends up calling the disable/enable irq methods 
> that increment/decrement the thread-local irq counter. This means that, at 
> least in this case where irq_lock is used (and I think in most others), the 
> counter gets incremented on a different thread than the one that decrements 
> it. It looks like I have just been lucky that all other tests have worked so 
> far. Or possibly in most cases where interrupts are enabled/disabled not 
> much stack is used. But clearly I have to either replace this counter 
> mechanism with something else or adjust it.
>

This is a false alarm. I somehow got it wrong in my head. The irq counter 
getting incremented on one thread when irqs are disabled and getting 
decremented on another one after a switch is OK. It should not cause any of 
the issues I imagined :-)
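
To illustrate why (an assumed timeline, comments only, since the counters 
are thread-local):

```
// thread A: irq_disable();    // A's irq counter: 0 -> 1
// thread A: cpu::schedule();  // switches to B with interrupts still disabled
// thread B: (resumes inside its own earlier irq-disabled section)
// thread B: irq_enable();     // B's irq counter: 1 -> 0
// ...
// thread A: (switched back later, finishes its own critical section)
// thread A: irq_enable();     // A's irq counter: 1 -> 0
//
// Each thread only ever decrements what it previously incremented, so the
// per-thread balance holds even though interrupts get re-enabled while a
// different thread is running.
```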

>
> The first bug is not really the reason for the deadlock you see above - 
> frame 27 (core/mmu.cc:1219) and frame 12 (core/mmu.cc:1333). I think we 
> have a situation where an ongoing memory allocation (see 
> memory::pool::alloc) requires disabling preemption, which triggers 
> ensure_next_stack_page(), which in turn triggers a fault that tries to 
> allocate memory and ends up trying to acquire the same lock - 
> mmu::vma_list_mutex. 
>
> I am not sure what the right way to fix it is. One way to prevent this 
> deadlock could be to prevent the fault itself. I experimented a bit, and 
> adding these 3 lines to mmap() to trigger an earlier fault, ensuring 
> 2 pages of stack, made the deadlock go away:
>
> --- a/libc/mman.cc
> +++ b/libc/mman.cc
> @@ -154,6 +149,9 @@ void *mmap(void *addr, size_t length, int prot, int 
> flags,
>              }
>          }
>          try {
> +            char i;
> +            asm volatile("movb -4096(%%rsp), %0" : "=r"(i));
> +            asm volatile("movb -8096(%%rsp), %0" : "=r"(i));
>              ret = mmu::map_anon(addr, length, mmap_flags, mmap_perm);
>          } catch (error& err) {
>              err.to_libc(); // sets errno
>
> Now the most troubling question is whether this is just one example of the 
> situation where memory allocation causes a page fault via 
> ensure_next_stack_page() on disabling preemption or interrupts, in which 
> case we just need to pre-fault deeper in all these places (hopefully not 
> too many), or whether there are many other types of scenarios we have not 
> foreseen?
>
This is obviously still a problem. But I now have a better understanding of 
it, and more confidence that this deadlock falls into a single category and 
that there is a solution to deal with it. 

Let me explain what exactly is happening per this stack trace. First, 
mmu::map_anon() acquires the lock *vma_list_mutex* for write; then, 
downstream, malloc() called by mmu::anon_vma::split() eventually leads to 
preempt_disable() being called, which causes a fault, which then eventually 
tries to acquire the same lock *vma_list_mutex* for read, which it cannot 
(see mmu::vm_fault()). Even if this lock were recursive (which I think it 
is not), handling a memory allocation that ends up requiring a vma list 
change while a vma list change is already in progress (triggered in this 
example by mmu::map_anon()) cannot really work, as it is a logical 
recursion or "chicken-or-egg" problem, right?

So really the only solution is to prevent the page fault caused downstream 
by *ensure_next_stack_page*(). One way could be to pre-fault 2 pages of 
stack in all the places in core/mmu.cc before they acquire the lock 
*vma_list_mutex* for write, so that once *ensure_next_stack_page*() is 
called, it would not trigger the page fault. Based on some initial tests, 
the deadlock issue went away. Do you think this is the right fix?
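
To make that concrete, here is a minimal sketch of what I have in mind 
(pre_fault_stack() is a hypothetical helper name, not code from the patch):

```
// Hypothetical helper: touch one byte per page below the current stack
// pointer so that a later ensure_next_stack_page() cannot fault while
// vma_list_mutex is held for write.
static inline void pre_fault_stack(size_t pages = 2)
{
    char* sp;
    asm volatile("mov %%rsp, %0" : "=r"(sp));
    for (size_t i = 1; i <= pages; i++) {
        // volatile read forces the fault now, before any lock is taken
        *(volatile char*)(sp - i * mmu::page_size);
    }
}

// ...and then, in each core/mmu.cc entry point that takes the write lock:
// pre_fault_stack();
// WITH_LOCK(vma_list_mutex.for_write()) { ... }
```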

 

> Waldek
>
> On Wednesday, January 29, 2020 at 8:14:30 AM UTC-5, Waldek Kozaczuk wrote:
>>
>> As the issue #143 explains, currently application stacks are eagerly 
>> allocated (aka pre-populated), which may lead to substantial memory waste. 
>> We want those stacks to be lazily allocated instead, so that physical 
>> memory behind the stack gets allocated as needed by the standard fault 
>> handling mechanism. 
>>
>> The page fault handling logic requires that both interrupts and preemption 
>> are enabled. Therefore this patch implements a simple strategy: read a 
>> single byte on the stack one page (4K) deeper before disabling interrupts 
>> or preemption. 
>>
>> It turns out we cannot read that single byte "blindly", because in some 
>> scenarios interrupts might already be disabled when we are about to 
>> disable preemption, and vice versa. Similarly (this is more speculative), 
>> disabling preemption (and possibly interrupts) may nest. 
>> Therefore we need a "conditional" read of that byte on the stack. More 
>> specifically, we should ONLY read the byte IF both interrupts and 
>> preemption are enabled. 
>>
>> This patch achieves this by adding an extra thread-local counter to track 
>> interrupt disabling/enabling, which operates in a similar way to the 
>> preemption counter. So every time interrupts are disabled we increment 
>> the counter, and we decrement it every time interrupts are enabled back. 
>> Then, every time before we disable interrupts or preemption, we check if 
>> both counters are 0 and only then read a byte to potentially trigger a 
>> fault. 
>>
>> Please note that performance-wise this patch is slightly more expensive 
>> than originally hoped. As the disassembled snippets illustrate, it costs 
>> an extra 3-4 instructions every time we disable preemption or interrupts. 
>>
>> A. Disabling preemption in __tls_get_addr before the patch: 
>> ``` 
>>  4035cae8:   e8 83 f6 ff ff          callq  4035c170 <elf::get_program()> 
>>  4035caed:   4c 8b 2d 6c 40 51 00    mov    0x51406c(%rip),%r13        # 
>> 40870b60 <sched::preempt_counter+0x40870b60> 
>>  4035caf4:   4c 8b 33                mov    (%rbx),%r14 
>>  4035caf7:   64 41 83 45 00 01       addl   $0x1,%fs:0x0(%r13) 
>>  4035cafd:   e8 6e f6 ff ff          callq  4035c170 <elf::get_program()> 
>> ``` 
>>
>> B. Disabling preemption in __tls_get_addr after the patch: 
>> ``` 
>>  4035fdd8:   e8 63 f6 ff ff          callq  4035f440 <elf::get_program()> 
>>  4035fddd:   4c 8b 2d a4 4d 51 00    mov    0x514da4(%rip),%r13        # 
>> 40874b88 <arch::irq_preempt_counters+0x40874b88> 
>>  4035fde4:   4c 8b 33                mov    (%rbx),%r14 
>>  4035fde7:   64 49 83 7d 00 00       cmpq   $0x0,%fs:0x0(%r13) 
>>  4035fded:   75 07                   jne    4035fdf6 
>> <__tls_get_addr+0x76> 
>>  4035fdef:   8a 84 24 00 f0 ff ff    mov    -0x1000(%rsp),%al 
>>  4035fdf6:   64 41 83 45 00 01       addl   $0x1,%fs:0x0(%r13) 
>>  4035fdfc:   e8 3f f6 ff ff          callq  4035f440 <elf::get_program()> 
>> ``` 
>>
>> As an example of the memory savings, tomcat using ~300 threads ended up 
>> using 365MB instead of 620MB before the patch. 
>>
>> Fixes #143 
>>
>> Signed-off-by: Matthew Pabst <pabstmatt...@gmail.com> 
>> Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com> 
>> --- 
>>  arch/aarch64/arch.hh    |  6 ++++++ 
>>  arch/x64/arch-switch.hh | 10 ++++++++++ 
>>  arch/x64/arch.hh        | 20 ++++++++++++++++++++ 
>>  core/sched.cc           |  5 ++++- 
>>  include/osv/counters.hh | 31 +++++++++++++++++++++++++++++++ 
>>  include/osv/irqlock.hh  |  2 ++ 
>>  include/osv/mmu-defs.hh |  3 +-- 
>>  include/osv/sched.hh    |  9 +++++---- 
>>  libc/mman.cc            |  9 ++++----- 
>>  libc/pthread.cc         |  5 +++++ 
>>  10 files changed, 88 insertions(+), 12 deletions(-) 
>>  create mode 100644 include/osv/counters.hh 
>>
>> diff --git a/arch/aarch64/arch.hh b/arch/aarch64/arch.hh 
>> index 855f1987..7dae53f5 100644 
>> --- a/arch/aarch64/arch.hh 
>> +++ b/arch/aarch64/arch.hh 
>> @@ -11,6 +11,8 @@ 
>>  #define ARCH_HH_ 
>>   
>>  #include "processor.hh" 
>> +#include <osv/barrier.hh> 
>> +#include <osv/counters.hh> 
>>   
>>  // architecture independent interface for architecture dependent 
>> operations 
>>   
>> @@ -20,6 +22,10 @@ namespace arch { 
>>  #define INSTR_SIZE_MIN 4 
>>  #define ELF_IMAGE_START (OSV_KERNEL_BASE + 0x10000) 
>>   
>> +inline void ensure_next_stack_page() { 
>> +    //TODO: Implement lazy stack check for AARCH64 
>> +} 
>> + 
>>  inline void irq_disable() 
>>  { 
>>      processor::irq_disable(); 
>> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh 
>> index 6803498f..048beb6f 100644 
>> --- a/arch/x64/arch-switch.hh 
>> +++ b/arch/x64/arch-switch.hh 
>> @@ -148,6 +148,13 @@ void thread::init_stack() 
>>      _state.rip = reinterpret_cast<void*>(thread_main); 
>>      _state.rsp = stacktop; 
>>      _state.exception_stack = _arch.exception_stack + 
>> sizeof(_arch.exception_stack); 
>> + 
>> +    if (stack.lazy) { 
>> +        // If the thread stack is setup to get lazily allocated and 
>> given the thread initially starts 
>> +        // running with preemption disabled, we need to pre-fault the 
>> first stack page. 
>> +        volatile char r = *((char *) (stacktop - 1)); 
>> +        (void) r; // trick the compiler into thinking that r is used 
>> +    } 
>>  } 
>>   
>>  void thread::setup_tcb() 
>> @@ -311,6 +318,9 @@ void thread::free_tcb() 
>>   
>>  void thread_main_c(thread* t) 
>>  { 
>> +    if (t->get_stack_info().lazy) { 
>> +        arch::irq_preempt_counters.irq = 
>> arch::irq_counter_lazy_stack_init_value; 
>> +    } 
>>      arch::irq_enable(); 
>>  #ifdef CONF_preempt 
>>      preempt_enable(); 
>> diff --git a/arch/x64/arch.hh b/arch/x64/arch.hh 
>> index 17df5f5c..5cadc562 100644 
>> --- a/arch/x64/arch.hh 
>> +++ b/arch/x64/arch.hh 
>> @@ -10,6 +10,8 @@ 
>>   
>>  #include "processor.hh" 
>>  #include "msr.hh" 
>> +#include <osv/barrier.hh> 
>> +#include <osv/counters.hh> 
>>   
>>  // namespace arch - architecture independent interface for architecture 
>>  //                  dependent operations (e.g. irq_disable vs. cli) 
>> @@ -20,8 +22,19 @@ namespace arch { 
>>  #define INSTR_SIZE_MIN 1 
>>  #define ELF_IMAGE_START OSV_KERNEL_BASE 
>>   
>> +inline void ensure_next_stack_page() { 
>> +    if (irq_preempt_counters.any_is_on) { 
>> +        return; 
>> +    } 
>> + 
>> +    char i; 
>> +    asm volatile("movb -4096(%%rsp), %0" : "=r"(i)); 
>> +} 
>> + 
>>  inline void irq_disable() 
>>  { 
>> +    ensure_next_stack_page(); 
>> +    ++irq_preempt_counters.irq; 
>>      processor::cli(); 
>>  } 
>>   
>> @@ -36,11 +49,15 @@ inline void irq_disable_notrace() 
>>  inline void irq_enable() 
>>  { 
>>      processor::sti(); 
>> +    --irq_preempt_counters.irq; 
>> +    barrier(); 
>>  } 
>>   
>>  inline void wait_for_interrupt() 
>>  { 
>>      processor::sti_hlt(); 
>> +    --irq_preempt_counters.irq; 
>> +    barrier(); 
>>  } 
>>   
>>  inline void halt_no_interrupts() 
>> @@ -78,12 +95,15 @@ private: 
>>   
>>  inline void irq_flag_notrace::save() 
>>  { 
>> +    ensure_next_stack_page(); 
>> +    ++irq_preempt_counters.irq; 
>>      asm volatile("sub $128, %%rsp; pushfq; popq %0; add $128, %%rsp" : 
>> "=r"(_rflags)); 
>>  } 
>>   
>>  inline void irq_flag_notrace::restore() 
>>  { 
>>      asm volatile("sub $128, %%rsp; pushq %0; popfq; add $128, %%rsp" : : 
>> "r"(_rflags)); 
>> +    --irq_preempt_counters.irq; 
>>  } 
>>   
>>  inline bool irq_flag_notrace::enabled() const 
>> diff --git a/core/sched.cc b/core/sched.cc 
>> index 06f849d1..295e3de0 100644 
>> --- a/core/sched.cc 
>> +++ b/core/sched.cc 
>> @@ -42,6 +42,10 @@ extern char _percpu_start[], _percpu_end[]; 
>>  using namespace osv; 
>>  using namespace osv::clock::literals; 
>>   
>> +namespace arch { 
>> +counters __thread irq_preempt_counters = {{1, 
>> irq_counter_default_init_value}}; 
>> +} 
>> + 
>>  namespace sched { 
>>   
>>  TRACEPOINT(trace_sched_idle, ""); 
>> @@ -69,7 +73,6 @@ std::vector<cpu*> cpus 
>> __attribute__((init_priority((int)init_prio::cpus))); 
>>  thread __thread * s_current; 
>>  cpu __thread * current_cpu; 
>>   
>> -unsigned __thread preempt_counter = 1; 
>>  bool __thread need_reschedule = false; 
>>   
>>  elf::tls_data tls; 
>> diff --git a/include/osv/counters.hh b/include/osv/counters.hh 
>> new file mode 100644 
>> index 00000000..38d40fb6 
>> --- /dev/null 
>> +++ b/include/osv/counters.hh 
>> @@ -0,0 +1,31 @@ 
>> +/* 
>> + * Copyright (C) 2020 Waldemar Kozaczuk 
>> + * 
>> + * This work is open source software, licensed under the terms of the 
>> + * BSD license as described in the LICENSE file in the top-level 
>> directory. 
>> + */ 
>> + 
>> +#ifndef OSV_COUNTERS_HH 
>> +#define OSV_COUNTERS_HH 
>> + 
>> +namespace arch { 
>> + 
>> +// Both preempt and irq counters are colocated next to each other in 
>> this 
>> +// union by design to let compiler optimize ensure_next_stack_page() so 
>> +// that it can check if any counter is non-zero by checking single 
>> 64-bit any_is_on field 
>> +union counters { 
>> +    struct { 
>> +        unsigned preempt; 
>> +        unsigned irq; 
>> +    }; 
>> +    uint64_t any_is_on; 
>> +}; 
>> + 
>> +extern counters __thread irq_preempt_counters; 
>> + 
>> +constexpr unsigned irq_counter_default_init_value = 11; 
>> +constexpr unsigned irq_counter_lazy_stack_init_value = 1; 
>> + 
>> +} 
>> + 
>> +#endif //OSV_COUNTERS_HH 
>> diff --git a/include/osv/irqlock.hh b/include/osv/irqlock.hh 
>> index 2d5372fc..1a163e45 100644 
>> --- a/include/osv/irqlock.hh 
>> +++ b/include/osv/irqlock.hh 
>> @@ -34,6 +34,8 @@ inline void irq_save_lock_type::lock() 
>>  inline void irq_save_lock_type::unlock() 
>>  { 
>>      _flags.restore(); 
>> +    --arch::irq_preempt_counters.irq; 
>> +    barrier(); 
>>  } 
>>   
>>  extern irq_lock_type irq_lock; 
>> diff --git a/include/osv/mmu-defs.hh b/include/osv/mmu-defs.hh 
>> index 18edf441..a731ed6e 100644 
>> --- a/include/osv/mmu-defs.hh 
>> +++ b/include/osv/mmu-defs.hh 
>> @@ -15,8 +15,6 @@ 
>>   
>>  struct exception_frame; 
>>   
>> -struct exception_frame; 
>> - 
>>  namespace mmu { 
>>   
>>  constexpr uintptr_t page_size = 4096; 
>> @@ -84,6 +82,7 @@ enum { 
>>      mmap_small       = 1ul << 5, 
>>      mmap_jvm_balloon = 1ul << 6, 
>>      mmap_file        = 1ul << 7, 
>> +    mmap_stack       = 1ul << 8, 
>>  }; 
>>   
>>  enum { 
>> diff --git a/include/osv/sched.hh b/include/osv/sched.hh 
>> index 0fb44f77..d0e9b00f 100644 
>> --- a/include/osv/sched.hh 
>> +++ b/include/osv/sched.hh 
>> @@ -335,6 +335,7 @@ public: 
>>          void* begin; 
>>          size_t size; 
>>          void (*deleter)(stack_info si);  // null: don't delete 
>> +        bool lazy = false; 
>>          static void default_deleter(stack_info si); 
>>      }; 
>>      struct attr { 
>> @@ -978,14 +979,13 @@ inline void release(mutex_t* mtx) 
>>      } 
>>  } 
>>   
>> -extern unsigned __thread preempt_counter; 
>>  extern bool __thread need_reschedule; 
>>   
>>  #ifdef __OSV_CORE__ 
>>  inline unsigned int get_preempt_counter() 
>>  { 
>>      barrier(); 
>> -    return preempt_counter; 
>> +    return arch::irq_preempt_counters.preempt; 
>>  } 
>>   
>>  inline bool preemptable() 
>> @@ -1005,14 +1005,15 @@ inline void preempt() 
>>   
>>  inline void preempt_disable() 
>>  { 
>> -    ++preempt_counter; 
>> +    arch::ensure_next_stack_page(); 
>> +    ++arch::irq_preempt_counters.preempt; 
>>      barrier(); 
>>  } 
>>   
>>  inline void preempt_enable() 
>>  { 
>>      barrier(); 
>> -    --preempt_counter; 
>> +    --arch::irq_preempt_counters.preempt; 
>>      if (preemptable() && need_reschedule && arch::irq_enabled()) { 
>>          cpu::schedule(); 
>>      } 
>> diff --git a/libc/mman.cc b/libc/mman.cc 
>> index d0803ac4..d00befc3 100644 
>> --- a/libc/mman.cc 
>> +++ b/libc/mman.cc 
>> @@ -38,12 +38,11 @@ unsigned libc_flags_to_mmap(int flags) 
>>          mmap_flags |= mmu::mmap_populate; 
>>      } 
>>      if (flags & MAP_STACK) { 
>> -        // OSv currently requires that stacks be pinned (see issue 
>> #143). So 
>> -        // if an application wants to mmap() a stack for 
>> pthread_attr_setstack 
>> -        // and did us the courtesy of telling this to ue (via 
>> MAP_STACK), 
>> -        // let's return the courtesy by returning pre-faulted memory. 
>> -        // FIXME: If issue #143 is fixed, this workaround should be 
>> removed. 
>> +#ifdef AARCH64_PORT_STUB 
>>          mmap_flags |= mmu::mmap_populate; 
>> +#else 
>> +        mmap_flags |= mmu::mmap_stack; 
>> +#endif 
>>      } 
>>      if (flags & MAP_SHARED) { 
>>          mmap_flags |= mmu::mmap_shared; 
>> diff --git a/libc/pthread.cc b/libc/pthread.cc 
>> index 8c976bf6..5c00e8b9 100644 
>> --- a/libc/pthread.cc 
>> +++ b/libc/pthread.cc 
>> @@ -140,10 +140,15 @@ namespace pthread_private { 
>>              return {attr.stack_begin, attr.stack_size}; 
>>          } 
>>          size_t size = attr.stack_size; 
>> +#ifdef AARCH64_PORT_STUB 
>>          void *addr = mmu::map_anon(nullptr, size, mmu::mmap_populate, 
>> mmu::perm_rw); 
>> +#else 
>> +        void *addr = mmu::map_anon(nullptr, size, mmu::mmap_stack, 
>> mmu::perm_rw); 
>> +#endif 
>>          mmu::mprotect(addr, attr.guard_size, 0); 
>>          sched::thread::stack_info si{addr, size}; 
>>          si.deleter = free_stack; 
>> +        si.lazy = true; 
>>          return si; 
>>      } 
>>   
>> -- 
>> 2.20.1 
>>
>>
