On Sun, Dec 8, 2019 at 4:10 PM Waldek Kozaczuk <jwkozac...@gmail.com> wrote:

>
>
> On Sun, Dec 8, 2019 at 07:32 Nadav Har'El <n...@scylladb.com> wrote:
>
>>
>> On Sat, Dec 7, 2019 at 6:07 AM Waldek Kozaczuk <jwkozac...@gmail.com>
>> wrote:
>>
>>> On second thought, let me tweak my suggestions and add some more.
>>>
>>> On Friday, December 6, 2019 at 5:05:03 PM UTC-5, Waldek Kozaczuk wrote:
>>>>
>>>>
>>>>
>>>> On Friday, December 6, 2019 at 4:06:07 PM UTC-5, Matthew Pabst wrote:
>>>>>
>>>>> Ah, I think I understand things better now.
>>>>>
>>>>> Here is my current idea for a solution:
>>>>>
>>>>> Everytime we disable interrupts and preemption (in irq_disable() and
>>>>> preempt_disable()), we call a method like ensure_stack_mapped(), which 
>>>>> will
>>>>> do the following:
>>>>>
>>>>
>>>>>    - If the current stack pointer is in a region mapped as
>>>>>    mmu::mmap_stack (i.e. an application stack), then we read the next few
>>>>>    stack pages.
>>>>>    - Otherwise the stack pointer is in a kernel stack, which is
>>>>>    already mapped so reading ahead may cause a segmentation fault.
>>>>>
>>>>> I do not think you need to differentiate between "kernel" vs "app"
>>>> stack. In this context (forget fault handling kernel code that uses
>>>> separate stack) both kernel and application code would run on the same
>>>> application stack. Unless you meant simply checking if the reading byte on
>>>> the stack 1 or 2 pages ahead still fits within the stack mapped area. Which
>>>> is indeed what we have to do (application stack is almost full it is full
>>>> and we cannot do anything about). But I would start without making that
>>>> check.
>>>>
>>> I think eventually we want to differentiate between "kernel" and
>>> "application" threads and only ensure stack for application threads.
>>>
>>
>> Why are you saying this?
>>
>> The only reason I can think to do that is code which already has a tiny
>> stack (like 16KB) and we don't want to waste the last page of the stack.
>> Is this the reason you were thinking of?
>>
> Well, the kernel stacks are always fully pre-allocated (and will stay like
> so, right?) so why bother triggering page fault if there will be none ever?
> But maybe extra if is more expensive.
>

Ok, so it's not a "kernel" vs "application" code issue, but rather a
question of how the current thread's stack was allocated.
We can easily have a thread-local boolean, whether the current stack was
allocated by mmap() or malloc(), and check that. But it will be an extra
if().

The beauty of the simplistic approach I hoped we could do - to just read
one byte - was that such an instruction would, at least in theory, be even
cheaper than a if() needed to decide not to do it in the first place. But
wasting the last page of a small stack (especially the exception stacks,
etc.) may be more of a problem.


>
> But I am worried we might need some “if” just to make sure we do not cause
> some stack overflow and page fault into area that is not mapped when for
> example the code is executing very deep almost at the bottom of the stack.
> Especially it app uses tiny stacks like Golang does?
>
>>
>> Why not "mov" for reading? Why writing?
>>
> Where would you “mov” to?
>

I think you can do something like this (completely untested):

    int i;
    asm volatile("mov -4096(%rsp), %0" : "=r"(i));


To let the compiler choose the register to write to, hoping there will be
one free in the calling code. I don't know if there's a mov replacement
which doesn't store its result anywhere :-(



> Can we use any general register without affecting surrounding compiler
> generated code? We could always push and then pop the read value but that
> would be 2 instructions instead of single one;
>
> push -4096(%rsp)
> pop
>

>
>>
>>
>>> mov $0,-4096(%rsp)
>>> mov $0,-8192(%rsp) //For second page, etc
>>>
>>>
>>> This effectively would read single-byte 4096 bytes deep in the stack and
>>>> automatically trigger page fault right at this moment IF -4096(%rsp)has
>>>> not been allocated yet. For starters, it might be as simple as that.
>>>>
>>>> I am not sure if cmp is the best instruction to use here or maybe there
>>>> is a better assembly instruction for that.
>>>>
>>>> Finally to see the effect of this logic you will need to revert this
>>> commit -
>>> https://github.com/cloudius-systems/osv/commit/41efdc1cb8b19216d98fc0baf38e5312fd3c27d9
>>> .
>>>
>>> Testing these changes will be fun :-) I would start with unit tests and
>>> especially use tst-pipe to run it 100 or more times to verify things do not
>>> break. Then it would nice to measure the improvements of memory
>>> utilizations - please see this email thread -
>>> https://groups.google.com/d/msg/osv-dev/LqUAWjzVpSc/sF-G4dYYBgAJ.
>>> Hopefully, with your changes, my Java example with 400 threads will need
>>> much less memory than 600MB.
>>>
>>> Good luck!
>>>
>>>>
>>>> Let me know what you think,
>>>>> Matthew
>>>>>
>>>>> On Thursday, December 5, 2019 at 8:05:15 AM UTC-6, Nadav Har'El wrote:
>>>>>>
>>>>>> On Tue, Dec 3, 2019 at 5:56 PM Matthew Pabst <pabstm...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the response.
>>>>>>>
>>>>>>> I've been trying to think through the logic for this change more,
>>>>>>> and I'm don't completely understand the proposed solution.
>>>>>>>
>>>>>>> For example, say the program faults on a stack page. Then before
>>>>>>> disabling interrupts we access the next stack page(s). If one of those
>>>>>>> pages is not yet allocated, it will cause a second page fault.
>>>>>>>
>>>>>>
>>>>>> No, that is not a problem. The page-fault handling code already knows
>>>>>> it can be used without a working stack - so we have separate interrupt 
>>>>>> and
>>>>>> exception stacks which we switch to while handling those, and they work
>>>>>> fine even if the user's stack is empty.
>>>>>>
>>>>>> Rather, the problem is what are in Linux known as "system calls", and
>>>>>> in OSv are ordinary function calls.
>>>>>> Imagine you do call read() or sleep() or whatever in your
>>>>>> application. Some of the code implementing these
>>>>>> functions, which considers itself "kernel" code, disables interrupts
>>>>>> or preemption. But it also uses the stack,
>>>>>> and if it suddenly needs a new page of the stack, the resulting page
>>>>>> fault will refuse to run.
>>>>>>
>>>>>>
>>>>>>> This second page fault will invoke the page fault handler, which
>>>>>>> will attempt to run some code, which may access the first stack page and
>>>>>>> run into deadlock.
>>>>>>>
>>>>>>> Is this a possible problem, or am I misunderstanding how page faults
>>>>>>> are handled in OSv?
>>>>>>>
>>>>>>> If this is a problem, could we switch to a pre-populated per-CPU
>>>>>>> exception stack before calling the page fault handler?
>>>>>>>
>>>>>>
>>>>>> We do exactly that already - see exception_stack and interrupt_stack
>>>>>> in arch/x64/arch-cpu.hh
>>>>>>
>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "OSv Development" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to osv...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/osv-dev/92857c9c-3329-4743-bb88-a74053db613e%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/osv-dev/92857c9c-3329-4743-bb88-a74053db613e%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "OSv Development" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to osv-dev+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/osv-dev/e061cf71-013f-4a28-8f0c-0dafc1ad89cc%40googlegroups.com
>>> <https://groups.google.com/d/msgid/osv-dev/e061cf71-013f-4a28-8f0c-0dafc1ad89cc%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjvqbjAN2%3D6S9Z8RjerEsDw35Zms7R%3D05M-R1t6k7B9p7g%40mail.gmail.com.

Reply via email to