Re: [osv-dev] AArch64 debug build woes

2021-02-25 Thread Avi Kivity


On 25/02/2021 01.49, Waldek Kozaczuk wrote:



On Wednesday, February 24, 2021 at 5:07:36 PM UTC-5 
stewart.h...@dornerworks.com wrote:


On Wednesday, February 24, 2021 at 4:38:02 PM UTC-5
jwkoz...@gmail.com wrote:

On Wednesday, February 24, 2021 at 11:42:33 AM UTC-5
stewart.h...@dornerworks.com wrote:

On Tuesday, February 23, 2021 at 3:20:04 PM UTC-5 Stewart
Hildebrand wrote:

On Tuesday, February 23, 2021 at 11:50:09 AM UTC-5
Nadav Har'El wrote:

On Tue, Feb 23, 2021 at 2:00 AM Waldek Kozaczuk
 wrote:



On Monday, February 22, 2021 at 1:36:12 AM
UTC-5 Nadav Har'El wrote:

On Mon, Feb 22, 2021 at 7:30 AM Waldek
Kozaczuk  wrote:
    asm volatile

        ("mov %%rbp, %c[rbp](%0) \n\t"
         "movq $1f, %c[rip](%0) \n\t"
         "mov %%rsp, %c[rsp](%0) \n\t"
         "mov %c[rsp](%1), %%rsp \n\t"
         "mov %c[rbp](%1), %%rbp \n\t"
         "jmpq *%c[rip](%1) \n\t"
         "1: \n\t"
         :
         : "a"(&old->_state),
"c"(&this->_state),
 [rsp]"i"(offsetof(thread_state, rsp)),
 [rbp]"i"(offsetof(thread_state, rbp)),
 [rip]"i"(offsetof(thread_state, rip))
         : "rbx", "rdx", "rsi", "rdi",
"r8", "r9",
           "r10", "r11", "r12", "r13",
"r14", "r15", "memory");


Note this list of registers here! I think
(but it's been years since I looked at
this, so I'm rusty...), that the idea
is exactly to have the compiler save the
callee-saved registers, if it didn't
already. It tells the compiler to
pretend that this assembly instruction
just modified all the registers in the
list. So now we have a function
switch_to() which thinks it modifies r15
et al., so it needs to save and restore
these registers.

Perhaps exactly the same solution would
work for aarch64 as well.

This would be better than explicit
copying, because as you noted, in some
builds reschedule_from_interrupt() already
saves these registers so there is no need
to do it again.

If this is the solution, we need a comment
next to this list of registers explaining
its raison d'etre :-(


Indeed when you disassemble the x64 version of
switch_to() (this one is from a release
build), it looks like this:

Dump of assembler code for function
_ZN5sched6thread9switch_toEv:
   0x403f8140 <+0>:push  %rbp
   0x403f8141 <+1>:mov   %rsp,%rbp
   0x403f8144 <+4>:push  %r15
   0x403f8146 <+6>:push  %r14
   0x403f8148 <+8>:push  %r13
   0x403f814a <+10>:push  %r12
   0x403f814c <+12>:push  %rbx
...
   0x403f81e1 <+161>:pop   %rbx
   0x403f81e2 <+162>:pop   %r12
   0x403f81e4 <+164>:pop   %r13
   0x403f81e6 <+166>:pop   %r14
   0x403f81e8 <+168>:pop   %r15
   0x403f81ea <+170>:pop   %rbp
   0x403f81eb <+171>:ret

So all callee-save registers indeed are pushed
and popped from the stack. Some of them like
r12 are used by the body of switch_to(), but
others are not - r13, r14, r15. This is
interesting because according to the
documentation of in

Re: [osv-dev] AArch64 debug build woes

2021-02-25 Thread Avi Kivity


On 25/02/2021 00.07, 'Stewart Hildebrand' via OSv Development wrote:



On Wednesday, February 24, 2021 at 4:38:02 PM UTC-5 jwkoz...@gmail.com 
wrote:


On Wednesday, February 24, 2021 at 11:42:33 AM UTC-5
stewart.h...@dornerworks.com wrote:

On Tuesday, February 23, 2021 at 3:20:04 PM UTC-5 Stewart
Hildebrand wrote:

On Tuesday, February 23, 2021 at 11:50:09 AM UTC-5 Nadav
Har'El wrote:

On Tue, Feb 23, 2021 at 2:00 AM Waldek Kozaczuk
 wrote:



On Monday, February 22, 2021 at 1:36:12 AM UTC-5
Nadav Har'El wrote:

On Mon, Feb 22, 2021 at 7:30 AM Waldek
Kozaczuk  wrote:
    asm volatile

        ("mov %%rbp, %c[rbp](%0) \n\t"
         "movq $1f, %c[rip](%0) \n\t"
         "mov %%rsp, %c[rsp](%0) \n\t"
         "mov %c[rsp](%1), %%rsp \n\t"
         "mov %c[rbp](%1), %%rbp \n\t"
         "jmpq *%c[rip](%1) \n\t"
         "1: \n\t"
         :
         : "a"(&old->_state),
"c"(&this->_state),
 [rsp]"i"(offsetof(thread_state, rsp)),
 [rbp]"i"(offsetof(thread_state, rbp)),
 [rip]"i"(offsetof(thread_state, rip))
         : "rbx", "rdx", "rsi", "rdi",
"r8", "r9",
           "r10", "r11", "r12", "r13",
"r14", "r15", "memory");


Note this list of registers here! I think (but
it's been years since I looked at this, so I'm
rusty...), that the idea
is exactly to have the compiler save the
callee-saved registers, if it didn't already.
It tells the compiler to
pretend that this assembly instruction just
modified all the registers in the list. So now
we have a function
switch_to() which thinks it modifies r15 et
al., so it needs to save and restore these
registers.

Perhaps exactly the same solution would work
for aarch64 as well.

This would be better than explicit copying,
because as you noted, in some builds
reschedule_from_interrupt() already
saves these registers so there is no need to
do it again.

If this is the solution, we need a comment
next to this list of registers explaining its
raison d'etre :-(


Indeed when you disassemble the x64 version of
switch_to() (this one is from a release build), it
looks like this:

Dump of assembler code for function
_ZN5sched6thread9switch_toEv:
   0x403f8140 <+0>:push  %rbp
   0x403f8141 <+1>:mov   %rsp,%rbp
   0x403f8144 <+4>:push  %r15
   0x403f8146 <+6>:push  %r14
   0x403f8148 <+8>:push  %r13
   0x403f814a <+10>:push  %r12
   0x403f814c <+12>:push  %rbx
...
   0x403f81e1 <+161>:pop   %rbx
   0x403f81e2 <+162>:pop   %r12
   0x403f81e4 <+164>:pop   %r13
   0x403f81e6 <+166>:pop   %r14
   0x403f81e8 <+168>:pop   %r15
   0x403f81ea <+170>:pop   %rbp
   0x403f81eb <+171>:ret

So all callee-save registers indeed are pushed and
popped from the stack. Some of them like r12 are
used by the body of switch_to(), but others are
not - r13, r14, r15. This is interesting because
according to the documentation of inline assembly
-
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

- it does not mention the callee-save registers.
The section "6.47.2.6 Clobbers and Scratch
Registers" has this statement:

"*When the compi

Re: [osv-dev] AArch64 debug build woes

2021-02-22 Thread Avi Kivity


On Monday, February 22, 2021 at 7:30:31 AM UTC+2 jwkoz...@gmail.com wrote:

> I think I have an explanation of what is going on. Before I present it let 
> me recap the calling convention for aarch64:
>
> Caller:
>
>1. If we need any of x0-x18 registers, save them. They are corruptible.
>2. Move the first 8 parameters into registers x0-x7.
>3. Push any additional parameters on the stack.
>4. Use BL to call the function.
>5. Evaluate the return code in x0.
>6. Restore any of x0-x18 that we saved in step 1.
>
> Callee:
>
>1. Push LR (x30) and x19-x29 onto the stack if used by this routine.
>2. Do the work
>3. Put return code in x0.
>4. Pop LR and x19-x29 if pushed in step 1.
>5. Use RET instruction to return execution to the caller (this will 
>implicitly use LR (x30) as an address to return to).
>
> Now imagine the following scenario involving function F executing on 
> thread T1 that calls thread::yield() or another function calling yield():
>
>1. Function F pushes one of the callee saved registers - x23 (just an 
>example) - on the T1 stack becuase it uses it for something and it must do 
>it per the calling convention.
>2. Function F stores some value in x23.
>3. Function F calls thread::yield() directly or indirectly.
>4. Eventually, reschedule_from_interrupt() is called and it calls 
>switch_to() to switch stack pointer to the new thread T2 stack. The debug 
>version of  reschedule_from_interrupt() nor switch_to() stores x23 as they 
>do not use this register (unlike the release version).
>5. At some point, later reschedule_from_interrupt() is called again 
>(not necessarily the very next time) and calls switch_to() that switches 
>back to T1.
>6. T1 resumes and eventually returns the control to the function F1 
>right after it called yield().
>7. The code in F1 after calling yield() reads the value of x23 ... and 
>boom. The x23 quite likely contains garbage because it was never restored 
>by F1 after calling yield() because per calling convention yield() or 
> other 
>callees should have saved and restored. But it did not, did it? Or rather 
>different routines on different threads running on the same cpu in between 
>ruined it.
>
> Why does it all work with the release version? It does because the 
> reschedule_from_interrupt() compiled with -02 happens to use and save all 
> callee-saved registers x19-x28. So they happen to be restored to correct 
> values after the switch.
>
> So it seems that the right solution is to save and restore x19-x28 (callee 
> saved registers) in switch_to() like so:
>
> diff --git a/arch/aarch64/arch-switch.hh b/arch/aarch64/arch-switch.hh
> index dff7467c..45aff4a7 100644
> --- a/arch/aarch64/arch-switch.hh
> +++ b/arch/aarch64/arch-switch.hh
> @@ -27,6 +27,7 @@ void thread::switch_to()
>  
>  asm volatile("\n"
>   "str x29, %0  \n"
> +"sub sp, sp, #0x50\n"
>   "mov x2, sp   \n"
>   "adr x1, 1f   \n" /* address of label */
>   "stp x2, x1,  %1  \n"
> @@ -34,10 +35,23 @@ void thread::switch_to()
>   "ldp x29, x0, %2  \n"
>   "ldp x2, x1,  %3  \n"
>  
> +"stp x19, x20, [sp, #0]\n"
> +"stp x21, x22, [sp, #16]\n"
> +"stp x23, x24, [sp, #32]\n"
> +"stp x25, x26, [sp, #48]\n"
> +"stp x27, x28, [sp, #64]\n"
> +
>   "mov sp, x2   \n"
>   "blr x1   \n"
>  
>   "1:   \n" /* label */
> +
> +"ldp x19, x20, [sp, #0]\n"
> +"ldp x21, x22, [sp, #16]\n"
> +"ldp x23, x24, [sp, #32]\n"
> +"ldp x25, x26, [sp, #48]\n"
> +"ldp x27, x28, [sp, #64]\n"
> +"add sp, sp, #0x50\n"
>   :
>   : "Q"(old->_state.fp), "Ump"(old->_state.sp),
> "Ump"(this->_state.fp), "Ump"(this->_state.sp)
>
> And indeed the crashes in both -00 and -O1 go away.
>
> Does my explanation have holes? Or am I completely wrong?
>

I think you're completely right, well spotted.

I think you're completely right. Here's the equivalent from Linux.

Here's the Linux equivalent:

SYM_FUNC_START(cpu_switch_to)
mov x10, #THREAD_CPU_CONTEXT
add x8, x0, x10
mov x9, sp
stp x19, x20, [x8], #16 // store callee-saved 
registers
stp x21, x22, [x8], #16
stp x23, x24, [x8], #16
stp x25, x26, [x8], #16
stp x27, x28, [x8], #16
stp x29, x9, [x8], #16
str lr, [x8]
add x8, x1, x10
ldp x19, x20, [x8], #16 // restore callee-saved 
registers
ldp x21, x22, [x8], #16
ldp x23, x24, [x8], #16
ldp x25, x26, 

Re: [osv-dev] AArch64 debug build woes

2021-02-15 Thread Waldek Kozaczuk
For comparison fragment of zcopy_tx in release loader.elf until a call to
eventfd which is after new:

Dump of assembler code for function zcopy_tx(int, zmsghdr*):
   0x40100da0 <+0>: stp x29, x30, [sp, #-144]!
   0x40100da4 <+4>: mov x29, sp
   0x40100da8 <+8>: stp x21, x22, [sp, #32]
   0x40100dac <+12>:mov x22, x1
   0x40100db0 <+16>:stp x19, x20, [sp, #16]
   0x40100db4 <+20>:mov w20, w0
   0x40100db8 <+24>:mov x0, #0x8//
#8
   0x40100dbc <+28>:stp x23, x24, [sp, #48]
   0x40100dc0 <+32>:stp x25, x26, [sp, #64]
   0x40100dc4 <+36>:stp x27, x28, [sp, #80]
   0x40100dc8 <+40>:stp xzr, xzr, [sp, #104]
   0x40100dcc <+44>:stp xzr, xzr, [sp, #120]
   0x40100dd0 <+48>:str xzr, [sp, #136]
   0x40100dd4 <+52>:bl  0x403920e0 <_Znwm>
   0x40100dd8 <+56>:mov x23, x0
   0x40100ddc <+60>:str x0, [x22, #64]
   0x40100de0 <+64>:mov w1, #0x800  //
#2048
   0x40100de4 <+68>:mov w0, #0x0//
#0
   0x40100de8 <+72>:movkw1, #0x8, lsl #16
   0x40100dec <+76>:str xzr, [x23]
   0x40100df0 <+80>:bl  0x403520e0 

Now for debug version equivalent fragment (the crash happens at PC:
0x40111e80 ):

Dump of assembler code for function zcopy_tx(int, zmsghdr*):
   0x40111e2c <+0>: stp x29, x30, [sp, #-208]!
   0x40111e30 <+4>: mov x29, sp
   0x40111e34 <+8>: stp x19, x20, [sp, #16]
   0x40111e38 <+12>:str w0, [sp, #44]
   0x40111e3c <+16>:str x1, [sp, #32]
   0x40111e40 <+20>:str xzr, [sp, #88]
   0x40111e44 <+24>:str xzr, [sp, #96]
   0x40111e48 <+28>:str xzr, [sp, #104]
   0x40111e4c <+32>:str xzr, [sp, #112]
   0x40111e50 <+36>:str xzr, [sp, #120]
   0x40111e54 <+40>:str xzr, [sp, #184]
   0x40111e58 <+44>:ldr x0, [sp, #32]
   0x40111e5c <+48>:str x0, [sp, #176]
   0x40111e60 <+52>:mov x0, #0x8//
#8
   0x40111e64 <+56>:bl  0x405b7e60 <_Znwm>
   0x40111e68 <+60>:mov x19, x0
   0x40111e6c <+64>:mov x0, x19
   0x40111e70 <+68>:bl  0x40112544

   0x40111e74 <+72>:str x19, [sp, #168]
   0x40111e78 <+76>:ldr x0, [sp, #32]
   0x40111e7c <+80>:ldr x1, [sp, #168]
   *0x40111e80 <+84>:str x1, [x0, #64] -> pc reported in
the stack trace*
   0x40111e84 <+88>:mov w1, #0x800  //
#2048
   0x40111e88 <+92>:movkw1, #0x8, lsl #16
   0x40111e8c <+96>:mov w0, #0x0//
#0
   0x40111e90 <+100>:   bl  0x40557c1c 

Assembly of _Znwm which I believe is the same in both cases:

Dump of assembler code for function _Znwm:
   0x405b7e60 <+0>: stp x29, x30, [sp, #-32]!
   0x405b7e64 <+4>: cmp x0, #0x0
   0x405b7e68 <+8>: mov x29, sp
   0x405b7e6c <+12>: str x19, [sp, #16]
   0x405b7e70 <+16>: csinc x19, x0, xzr, ne  // ne = any
   0x405b7e74 <+20>: mov x0, x19
   0x405b7e78 <+24>: bl 0x40406f4c 
   0x405b7e7c <+28>: cbz x0, 0x405b7e8c <_Znwm+44>
   0x405b7e80 <+32>: ldr x19, [sp, #16]
   0x405b7e84 <+36>: ldp x29, x30, [sp], #32
   0x405b7e88 <+40>: ret
   0x405b7e8c <+44>: bl 0x405b7e50 <_ZSt15get_new_handlerv>
   0x405b7e90 <+48>: cbz x0, 0x405b7e9c <_Znwm+60>
   0x405b7e94 <+52>: blr x0
   0x405b7e98 <+56>: b 0x405b7e74 <_Znwm+20>
   0x405b7e9c <+60>: mov x0, #0x8   // #8
   0x405b7ea0 <+64>: bl 0x405b6170 <__cxa_allocate_exception>
   0x405b7ea4 <+68>: adrp x3, 0x40098000
   0x405b7ea8 <+72>: adrp x2, 0x40099000
   0x405b7eac <+76>: adrp x1, 0x40098000
   0x405b7eb0 <+80>: ldr x3, [x3, #1128]
   0x405b7eb4 <+84>: ldr x2, [x2, #928]
   0x405b7eb8 <+88>: add x3, x3, #0x10
   0x405b7ebc <+92>: ldr x1, [x1, #2992]
   0x405b7ec0 <+96>: str x3, [x0]
   0x405b7ec4 <+100>: bl 0x405b76e0 <__cxa_throw>

And the ztx_handle::ztx_handle(): for debug:

Dump of assembler code for function ztx_handle::ztx_handle():
   0x40112544 <+0>: stp x29, x30, [sp, #-32]!
   0x40112548 <+4>: mov x29, sp
   0x4011254c <+8>: str x0, [sp, #24]
   0x40112550 <+12>: ldr x0, [sp, #24]
   0x40112554 <+16>: mov x1, #0x0   // #0
   0x40112558 <+20>: bl 0x4011245c ::atomic(unsigned long)>
   0x4011255c <+24>: nop
   0x401125

Re: [osv-dev] AArch64 debug build woes

2021-02-15 Thread Avi Kivity



On 11/02/2021 07.42, Waldek Kozaczuk wrote:
Apart from the TLS issue reported here OSv can be built in the aarch64 
debug mode.


Some of the tests pass as well (as on release) but there are some that 
seem to fail in a similar way due to possibly wrong compiled code in 
kernel possibly due to -O0.


Here is one example:

./scripts/run.py -e '/tests/tst-bsd-tcp1-zsnd.so' -c 1

page fault outside application, addr: 0x

[registers]

PC: 0x40111e40 

X00: 0x0001 X01: 0xa0004100f9c0 X02: 0x0008

X03: 0x0008 X04: 0x0008 X05: 0x7001

X06: 0x X07: 0xb71b X08: 0x800041782aa0

X09: 0x X10: 0x0002 X11: 0x

X12: 0x2050435420612073 X13: 0x006567617373656d X14: 0x1af8

X15: 0x X16: 0x1005b5d0 X17: 0x40111dec

X18: 0x1120 X19: 0xa0004100f9c0 X20: 0x0190

X21: 0x0001 X22: 0x800041782db8 X23: 0x0001

X24: 0xa000414c4b80 X25: 0x800041793d98 X26: 0x800041793da8

X27: 0x206ffb00 X28: 0x1005a000 X29: 0x800041782c10

X30: 0x40111e34 SP:0x800041782c10 ESR: 0x9646

PSTATE: 0x6345

Aborted


[backtrace]

0x400e9e14 


After connecting with gdb and reconstructing the stacktrace, it looks 
like this:


00x40111e40in zcopy_tx(s=5, zm=0x1) at 
bsd/sys/kern/uipc_syscalls.cc:1027


#10x10037954in test_bsd_tcp1::tcp_server(this=0x206ff988) 
at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zsnd.cc:114


#20x10037a64in 
test_bsd_tcp1::run()::{lambda()#1}::operator()() 
const(__closure=) at 
/home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zsnd.cc:229


#3std::__invoke_impltest_bsd_tcp1::run()::{lambda()#1}&>(std::__invoke_other, 
test_bsd_tcp1::run()::{lambda()#1}&)(__f=...) at 
/usr/include/c++/10/bits/invoke.h:60


#4std::__invoke_rtest_bsd_tcp1::run()::{lambda()#1}&>(std::__is_invocable&&, 
(test_bsd_tcp1::run()::{lambda()#1}&)...)(__fn=...) at 
/usr/include/c++/10/bits/invoke.h:153


#5std::_Function_handlertest_bsd_tcp1::run()::{lambda()#1}>::_M_invoke(std::_Any_data 
const&)(__functor=...) at /usr/include/c++/10/bits/std_function.h:291


#60x4031cba8in std::function::operator()() 
const(this=0xa0004168d630) at 
/usr/include/c++/10/bits/std_function.h:622


#70x4043e1ccin sched::thread::main(this=0xa0004168d600) at 
core/sched.cc:1219


#80x4043a188in sched::thread_main_c(t=0xa0004168d600) at 
arch/aarch64/arch-switch.hh:186


#90x40439cf4in sched::thread::switch_to(this=0x0) at 
arch/aarch64/arch-switch.hh:28


#10 0xin ??()

Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) frame 1

#10x10037954in test_bsd_tcp1::tcp_server(this=0x206ff988) 
at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zsnd.cc:114


114int bytes2 = zcopy_tx(client_s, &zm);

(gdb) p client_s

$1 = 5

(gdb) p &zm

$2 = (zmsghdr *) 0x800041782d40


As you can see the test app calls zcopy_tx() which takes 2 arguments:

ssize_t zcopy_tx(int s, struct zmsghdr *zm)

the 1st one is int and has value 5 in the caller - the test app - and 
is received as such


in the kernel zcopy_tx.


The second one - the address of struct zmsghdr - is problematic. On 
the caller's side looks OK but when received in the kernel it is wrong 
- 0x1.


Why?


I saw another test crashing in a similar way when the caller (another 
test) would pass 3 arguments to kernel function and 2 of those 
(non-addresses) were passed correctly but the 3rd one - address one 
was not.



Any ideas what might be going on?





Can you provide a disassembly of zcopy_tx? From the start of the 
function until the crash site (there should only be register saves and 
other preamble, and the call to operator new, if I read it correctly).



Maybe save zm in some global before calling new, to see if operator new 
is the problem.


--
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/7441eb47-afa6-7393-774d-95aa59c7a2a5%40scylladb.com.


Re: [osv-dev] AArch64 debug build woes

2021-02-15 Thread Waldek Kozaczuk
On Mon, Feb 15, 2021 at 02:19 Nadav Har'El  wrote:

> On Mon, Feb 15, 2021 at 7:43 AM Waldek Kozaczuk 
> wrote:
>
>>
>>
>> On Sunday, February 14, 2021 at 2:33:16 PM UTC-5 Nadav Har'El wrote:
>>
>>>
>>> You seem to be pushing registers on the stack here. Where is this stack?
>>> In x86, we had separate stacks for exceptions, for nested exceptions, and
>>> interrupts.
>>> Is this also true in the arm version?
>>>
>>
>> Good question. So in our aarch64 port, there is no dedicated exception
>> nor interrupt stack unlike in x64. There is a single stack per thread where
>> everything happens. And this might be the issue.
>>
> But somehow it works with '-O2' but maybe some bugs we have for aarch64
>> which we do not understand do happen because of a single stack.
>>
>
> As usual this is just a wild theory, but it's possible that O2 code uses
> fewer or more registers, or uses the stack more or less or differently.
>
Right.

>
>
>>
>>> This discussion of the stack made me think of another possible reason
>>> for losing data in functions.
>>> The red zone.
>>> Do we have a "red zone" on arm64 as well?
>>> Basically, the "red zone" is 128 bytes below the stack pointer that a
>>> function can use as scratch space, and it can use it for example to store
>>> some of the parameters if it needs the registers to store something else -
>>> without wasting time on instructions to change the stack pointer. If some
>>> interrupt or exception overwrites this redzone, we lose data.
>>> To avoid this, we usually had separate stacks for interrupts and
>>> exceptions and nested exceptions, but where we didn't want to do this,
>>> e.g., in syscalls, we had to skip the redzone (see for example commit
>>> 499b9433ae748b6c04dedc2125ea17010ffbdaf1).
>>>
>>
>> The ARMv8  Procedure Call Standard -
>> https://developer.arm.com/documentation/ihi0055/latest/- does not
>> mention any "red zone". However, both Windows (
>> https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-160#red-zone)
>> and Apple (
>> https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms)
>> extensions both mention the red zone - 16 bytes and 128 bytes respectively.
>> I could not find anything specific for Linux though. For sure Linux
>> requires 128 red zone for x86_64.
>>
>
> I suspect that gcc does *not* use the red zone on aarch64 - it seems there
> is no "-mno-red-zone" option, and
> https://github.com/iains/gcc-darwin-arm64 suggests that unlike Apple's
> compiler, gcc doesn't use a red zone - and your experiment below also
> suggests that this is not the problem. So that's probably (hopefully) not
> the problem, so there is one less reason to use separate stacks.
>
> By the way another consequence of using the user's stacks for interrupts,
> exceptions, etc., is that it becomes more important for all stacks to be
> big enough. I wonder if the problem could be that some of our thread stacks
> are too small. Maybe you can hack sched::thread to always use a larger
> minimum stack size and see if it helps?
>

So here is what I tried at the same time:
1) Enforced eager resolving of symbols (for one type of the crash).
2) Added a space of 128 bytes (tried 256, 512 even 4K) on the stack BEFORE
pushing the exception frame for both interrupts and other exceptions.
3) Added a space of 256 bytes on the stack AFTER pushing the exception
frame for both interrupts and exceptions.
4) Doubled the size of the default stack from 64K to 128.

Same crashes as before. Different for O0 and O1 as before my changes. So
none of the above helps or changes anything.

So my theory was that either the exception frame would overwrite portion of
the stack where registers are restored from, or something overwrites part
of the exception frame. But the experiments above seem to prove that this
may not be the case.

Then what else? Compiler bug? I will try with older version of 9.3.

>
>
>
>>
>> Now I devised a simple experiment and subtracted 128 bytes from sp at the
>> beginning of push_frame and added 128 bytes at the end of the pop_frame
>> macros that did not help in any way. I also tried 256, 512 to the same
>> effect.
>>
>>
>>
>> I have another wild guess below - caller-saved registers:
>>>
>>> The "typical" problem here (I don't know if it happens in your case, but
>>> it happened in the past in various cases)
>>> is that "something" (interrupt, exception, signal, etc.) gets called *in
>>> the middle *of the user's function code, so he
>>> did not know he was going to call a function, so it didn't save these
>>> caller-saved registers. This is why all of that
>>> asynchronous code needs to save all those caller-saved registers. In
>>> x86, we had these problems with the FPU
>>> and had to save the FPU state in a bunch of places. Maybe in aarch64 we
>>> need to save additional registers in
>>> the same place we saved the FPU state for x86?
>>>
>> There is the arm64 FPU save/restore code in OSv where it saves floating
>> point re

Re: [osv-dev] AArch64 debug build woes

2021-02-14 Thread Nadav Har'El
On Mon, Feb 15, 2021 at 7:43 AM Waldek Kozaczuk 
wrote:

>
>
> On Sunday, February 14, 2021 at 2:33:16 PM UTC-5 Nadav Har'El wrote:
>
>>
>> You seem to be pushing registers on the stack here. Where is this stack?
>> In x86, we had separate stacks for exceptions, for nested exceptions, and
>> interrupts.
>> Is this also true in the arm version?
>>
>
> Good question. So in our aarch64 port, there is no dedicated exception nor
> interrupt stack unlike in x64. There is a single stack per thread where
> everything happens. And this might be the issue.
>
But somehow it works with '-O2' but maybe some bugs we have for aarch64
> which we do not understand do happen because of a single stack.
>

As usual this is just a wild theory, but it's possible that O2 code uses
fewer or more registers, or uses the stack more or less or differently.


>
>> This discussion of the stack made me think of another possible reason for
>> losing data in functions.
>> The red zone.
>> Do we have a "red zone" on arm64 as well?
>> Basically, the "red zone" is 128 bytes below the stack pointer that a
>> function can use as scratch space, and it can use it for example to store
>> some of the parameters if it needs the registers to store something else -
>> without wasting time on instructions to change the stack pointer. If some
>> interrupt or exception overwrites this redzone, we lose data.
>> To avoid this, we usually had separate stacks for interrupts and
>> exceptions and nested exceptions, but where we didn't want to do this,
>> e.g., in syscalls, we had to skip the redzone (see for example commit
>> 499b9433ae748b6c04dedc2125ea17010ffbdaf1).
>>
>
> The ARMv8  Procedure Call Standard -
> https://developer.arm.com/documentation/ihi0055/latest/- does not mention
> any "red zone". However, both Windows (
> https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-160#red-zone)
> and Apple (
> https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms)
> extensions both mention the red zone - 16 bytes and 128 bytes respectively.
> I could not find anything specific for Linux though. For sure Linux
> requires 128 red zone for x86_64.
>

I suspect that gcc does *not* use the red zone on aarch64 - it seems there
is no "-mno-red-zone" option, and https://github.com/iains/gcc-darwin-arm64
suggests that unlike Apple's compiler, gcc doesn't use a red zone - and
your experiment below also suggests that this is not the problem. So that's
probably (hopefully) not the problem, so there is one less reason to use
separate stacks.

By the way another consequence of using the user's stacks for interrupts,
exceptions, etc., is that it becomes more important for all stacks to be
big enough. I wonder if the problem could be that some of our thread stacks
are too small. Maybe you can hack sched::thread to always use a larger
minimum stack size and see if it helps?



>
> Now I devised a simple experiment and subtracted 128 bytes from sp at the
> beginning of push_frame and added 128 bytes at the end of the pop_frame
> macros that did not help in any way. I also tried 256, 512 to the same
> effect.
>
>
>
> I have another wild guess below - caller-saved registers:
>>
>> The "typical" problem here (I don't know if it happens in your case, but
>> it happened in the past in various cases)
>> is that "something" (interrupt, exception, signal, etc.) gets called *in
>> the middle *of the user's function code, so he
>> did not know he was going to call a function, so it didn't save these
>> caller-saved registers. This is why all of that
>> asynchronous code needs to save all those caller-saved registers. In x86,
>> we had these problems with the FPU
>> and had to save the FPU state in a bunch of places. Maybe in aarch64 we
>> need to save additional registers in
>> the same place we saved the FPU state for x86?
>>
> There is the arm64 FPU save/restore code in OSv where it saves floating
> point registers. But are you suggesting we save/restore extra registers in
> there? But why if push_frame/pop_frame do that for us already when the
> interrupt is taken?
>

OSv does this FPU save/restore in *more* than just interrupts. We also have
exceptions, signal handlers, and SYSCALL, all of which can wind up calling
code in the middle of user code. So the first code that leaves the user's
code needs to save these registers. It sounds like you're doing this
correctly for interrupts, but maybe it's missing for some other things?

That being said, if this were something as "obvious" as not saving the
registers, I would suspect this would have been more obvious and more
frequent, and not specific to O1. So maybe that's not the problem.


>
> Now in arm64, which is RISC, the stack is manipulated very differently
> than in x64, and very often storing or reading from the stack does not
> touch the stack pointer but merely references it, and then at some point,
> it is adjusted accordingly (it may happen before). So I wonder if 

Re: [osv-dev] AArch64 debug build woes

2021-02-14 Thread Waldek Kozaczuk


On Sunday, February 14, 2021 at 2:33:16 PM UTC-5 Nadav Har'El wrote:

> On Sat, Feb 13, 2021 at 6:24 PM Waldek Kozaczuk  
> wrote:
>
>> Hi,
>>
>> On Thu, Feb 11, 2021 at 9:06 AM Nadav Har'El  wrote:
>>
>>> On Thu, Feb 11, 2021 at 7:42 AM Waldek Kozaczuk  
>>> wrote:
>>>

 #1  0x10037954 in test_bsd_tcp1::tcp_server (this=0x206ff988) 
 at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zsnd.cc:114

 114 int bytes2 = zcopy_tx(client_s, &zm);

 (gdb) p client_s

 $1 = 5

 (gdb) p &zm

 $2 = (zmsghdr *) 0x800041782d40


 As you can see the test app calls zcopy_tx() which takes 2 arguments:

 ssize_t zcopy_tx(int s, struct zmsghdr *zm)

 the 1st one is int and has value 5 in the caller - the test app - and 
 is received as such 

 in the kernel zcopy_tx.


 The second one - the address of struct zmsghdr - is problematic. On the 
 caller's side looks OK but when received in the kernel it is wrong - 0x1.

 Why?

>>>
>>> Not being an expert on aarch64 or it's function calling conventions, all 
>>> I can do is raise some wild guesses, I hope one of them is correct and you 
>>> can figure out which - perhaps by reading the code or trying to reproduce 
>>> it in new tests (you can perhaps write a new test which loops calling some 
>>> function f() with a bunch of parameters in multiple threads, and printing 
>>> an error if f ever gets called with wrong parameters) .
>>>
>>> One possibility is that our context-switch implementation is forgetting 
>>> to save some of the registers, and the register which is used to hold the 
>>> third argument of a function is lost on the context switch.
>>>
>>> Another possibility is that we lose this register in situations smaller 
>>> asynchronous events, not just context switches between threads. We have 
>>> interrupts (e.g., the timer interrupt), exceptions, and signals, which can 
>>> run complex OSv code in the middle of the user's function without the 
>>> function knowing that this is happening, so when we switch to these 
>>> interrupts or exceptions we mustn't forget the registers which the OSv code 
>>> may clobber.
>>>  
>>>
>>
>> I think you are right that we are "losing" a register or overwriting 
>> stack memory we restore registers from. It is just not clear where.
>>
>> So here is what I have tried since last email:
>> 1) I had a theory that maybe there is a bug in the __elf_resolve_pltgot 
>> assembly which would be called lazily as the app is executing it might be 
>> messing the registers. So I forced the app symbols to be resolved eagerly 
>> and still getting same exact crash. So back to the drawing board.
>> 2) I made gradual changed to conf/debug.mk to find the minimal 
>> difference with release.mk that makes the crashes happen:
>>
>>- Disabling the debug messages with 'onf-logger_debug=0' still give 
>>the exact same crashes
>>- Removing '-Wno-maybe-uninitialized' from debug.mk also does change 
>>anything - still same crash
>>- So at this point the only difference is optimization level and 
>>missing '-DNDEBUG'. So I thought of bumping the debug optimization level 
>> to 
>>'-O1'. Interestingly this makes the crashes to happen in the kernel code 
>>when mounting ROFS filesystem in rofs_mount() static function and it is 
>>very repeatable. But at the end of the day it is similar to the original 
>>crashes with -O0 - some register - typically x2* (x24, x22 or x28) 
>> changes 
>>in the middle of the function after it calls another one that register 
>>typically holding an address has wrong bogus value just as it was not 
>>restored by callee.
>>
>> In the specific kernel crash with -O1 above, the rofs_mount() 
>> calls rofs_read_blocks() which reads from a block device so for sure it 
>> causes an exit to hypervisor and results in an interrupt handled so for 
>> sure there are multiples threads being switched. So again maybe 
>> something wrong in interrupt handler?
>>
>> Now looking at all assembly code (which is I think they issue is) there 
>> are not that many candidates: entry.S where we have sync (page fault) and 
>> async (interrupts) handlers entry points. Both save and restore all 
>> registers they should I think using those 2 macros:
>> .macro push_state_to_exception_frame
>> sub sp, sp, #48 // make space for align2, align1+ESR, PSTATE, 
>> PC, SP
>> push_pair x28, x29
>> push_pair x26, x27
>>
>
> I have two problems trying to understand this: first, sadly my memory of 
> how we did all of this in x86 is very rusty.
> Second, I have no idea what was done differently in aarch64, and if so why.
>
> You seem to be pushing registers on the stack here. Where is this stack? 
> In x86, we had separate stacks for exceptions, for nested exceptions, and 
> interrupts.
> Is this also true in the arm version?
>
 
Good question. So in

Re: [osv-dev] AArch64 debug build woes

2021-02-14 Thread Nadav Har'El
On Sat, Feb 13, 2021 at 6:24 PM Waldek Kozaczuk 
wrote:

> Hi,
>
> On Thu, Feb 11, 2021 at 9:06 AM Nadav Har'El  wrote:
>
>> On Thu, Feb 11, 2021 at 7:42 AM Waldek Kozaczuk 
>> wrote:
>>
>>>
>>> #1  0x10037954 in test_bsd_tcp1::tcp_server (this=0x206ff988)
>>> at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zsnd.cc:114
>>>
>>> 114 int bytes2 = zcopy_tx(client_s, &zm);
>>>
>>> (gdb) p client_s
>>>
>>> $1 = 5
>>>
>>> (gdb) p &zm
>>>
>>> $2 = (zmsghdr *) 0x800041782d40
>>>
>>>
>>> As you can see the test app calls zcopy_tx() which takes 2 arguments:
>>>
>>> ssize_t zcopy_tx(int s, struct zmsghdr *zm)
>>>
>>> the 1st one is int and has value 5 in the caller - the test app - and is
>>> received as such
>>>
>>> in the kernel zcopy_tx.
>>>
>>>
>>> The second one - the address of struct zmsghdr - is problematic. On the
>>> caller's side looks OK but when received in the kernel it is wrong - 0x1.
>>>
>>> Why?
>>>
>>
>> Not being an expert on aarch64 or it's function calling conventions, all
>> I can do is raise some wild guesses, I hope one of them is correct and you
>> can figure out which - perhaps by reading the code or trying to reproduce
>> it in new tests (you can perhaps write a new test which loops calling some
>> function f() with a bunch of parameters in multiple threads, and printing
>> an error if f ever gets called with wrong parameters) .
>>
>> One possibility is that our context-switch implementation is forgetting
>> to save some of the registers, and the register which is used to hold the
>> third argument of a function is lost on the context switch.
>>
>> Another possibility is that we lose this register in situations smaller
>> asynchronous events, not just context switches between threads. We have
>> interrupts (e.g., the timer interrupt), exceptions, and signals, which can
>> run complex OSv code in the middle of the user's function without the
>> function knowing that this is happening, so when we switch to these
>> interrupts or exceptions we mustn't forget the registers which the OSv code
>> may clobber.
>>
>>
>
> I think you are right that we are "losing" a register or overwriting stack
> memory we restore registers from. It is just not clear where.
>
> So here is what I have tried since last email:
> 1) I had a theory that maybe there is a bug in the __elf_resolve_pltgot
> assembly which would be called lazily as the app is executing it might be
> messing the registers. So I forced the app symbols to be resolved eagerly
> and still getting same exact crash. So back to the drawing board.
> 2) I made gradual changed to conf/debug.mk to find the minimal difference
> with release.mk that makes the crashes happen:
>
>- Disabling the debug messages with 'onf-logger_debug=0' still give
>the exact same crashes
>- Removing '-Wno-maybe-uninitialized' from debug.mk also does change
>anything - still same crash
>- So at this point the only difference is optimization level and
>missing '-DNDEBUG'. So I thought of bumping the debug optimization level to
>'-O1'. Interestingly this makes the crashes to happen in the kernel code
>when mounting ROFS filesystem in rofs_mount() static function and it is
>very repeatable. But at the end of the day it is similar to the original
>crashes with -O0 - some register - typically x2* (x24, x22 or x28) changes
>in the middle of the function after it calls another one that register
>typically holding an address has wrong bogus value just as it was not
>restored by callee.
>
> In the specific kernel crash with -O1 above, the rofs_mount()
> calls rofs_read_blocks() which reads from a block device so for sure it
> causes an exit to hypervisor and results in an interrupt handled so for
> sure there are multiples threads being switched. So again maybe
> something wrong in interrupt handler?
>
> Now looking at all assembly code (which is I think they issue is) there
> are not that many candidates: entry.S where we have sync (page fault) and
> async (interrupts) handlers entry points. Both save and restore all
> registers they should I think using those 2 macros:
> .macro push_state_to_exception_frame
> sub sp, sp, #48 // make space for align2, align1+ESR, PSTATE,
> PC, SP
> push_pair x28, x29
> push_pair x26, x27
>

I have two problems trying to understand this: first, sadly my memory of
how we did all of this in x86 is very rusty.
Second, I have no idea what was done differently in aarch64, and if so why.

You seem to be pushing registers on the stack here. Where is this stack? In
x86, we had separate stacks for exceptions, for nested exceptions, and
interrupts.
Is this also true in the arm version?

This discussion of the stack made me think of another possible reason for
losing data in functions.
The red zone.
Do we have a "red zone" on arm64 as well?
Basically, the "red zone" is 128 bytes below the stack pointer that a
function can use as scratch space, and it can

Re: [osv-dev] AArch64 debug build woes

2021-02-13 Thread Waldek Kozaczuk
Hi,

On Thu, Feb 11, 2021 at 9:06 AM Nadav Har'El  wrote:

> On Thu, Feb 11, 2021 at 7:42 AM Waldek Kozaczuk 
> wrote:
>
>>
>> #1  0x10037954 in test_bsd_tcp1::tcp_server (this=0x206ff988)
>> at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zsnd.cc:114
>>
>> 114 int bytes2 = zcopy_tx(client_s, &zm);
>>
>> (gdb) p client_s
>>
>> $1 = 5
>>
>> (gdb) p &zm
>>
>> $2 = (zmsghdr *) 0x800041782d40
>>
>>
>> As you can see the test app calls zcopy_tx() which takes 2 arguments:
>>
>> ssize_t zcopy_tx(int s, struct zmsghdr *zm)
>>
>> the 1st one is int and has value 5 in the caller - the test app - and is
>> received as such
>>
>> in the kernel zcopy_tx.
>>
>>
>> The second one - the address of struct zmsghdr - is problematic. On the
>> caller's side looks OK but when received in the kernel it is wrong - 0x1.
>>
>> Why?
>>
>
> Not being an expert on aarch64 or it's function calling conventions, all I
> can do is raise some wild guesses, I hope one of them is correct and you
> can figure out which - perhaps by reading the code or trying to reproduce
> it in new tests (you can perhaps write a new test which loops calling some
> function f() with a bunch of parameters in multiple threads, and printing
> an error if f ever gets called with wrong parameters) .
>
> One possibility is that our context-switch implementation is forgetting to
> save some of the registers, and the register which is used to hold the
> third argument of a function is lost on the context switch.
>
> Another possibility is that we lose this register in situations smaller
> asynchronous events, not just context switches between threads. We have
> interrupts (e.g., the timer interrupt), exceptions, and signals, which can
> run complex OSv code in the middle of the user's function without the
> function knowing that this is happening, so when we switch to these
> interrupts or exceptions we mustn't forget the registers which the OSv code
> may clobber.
>
>

I think you are right that we are "losing" a register or overwriting stack
memory we restore registers from. It is just not clear where.

So here is what I have tried since last email:
1) I had a theory that maybe there is a bug in the __elf_resolve_pltgot
assembly which would be called lazily as the app is executing it might be
messing the registers. So I forced the app symbols to be resolved eagerly
and still getting same exact crash. So back to the drawing board.
2) I made gradual changed to conf/debug.mk to find the minimal difference
with release.mk that makes the crashes happen:

   - Disabling the debug messages with 'onf-logger_debug=0' still give the
   exact same crashes
   - Removing '-Wno-maybe-uninitialized' from debug.mk also does change
   anything - still same crash
   - So at this point the only difference is optimization level and missing
   '-DNDEBUG'. So I thought of bumping the debug optimization level to '-O1'.
   Interestingly this makes the crashes to happen in the kernel code when
   mounting ROFS filesystem in rofs_mount() static function and it is very
   repeatable. But at the end of the day it is similar to the original crashes
   with -O0 - some register - typically x2* (x24, x22 or x28) changes in the
   middle of the function after it calls another one that register
   typically holding an address has wrong bogus value just as it was not
   restored by callee.

In the specific kernel crash with -O1 above, the rofs_mount()
calls rofs_read_blocks() which reads from a block device so for sure it
causes an exit to hypervisor and results in an interrupt handled so for
sure there are multiples threads being switched. So again maybe
something wrong in interrupt handler?

Now looking at all assembly code (which is I think they issue is) there are
not that many candidates: entry.S where we have sync (page fault) and async
(interrupts) handlers entry points. Both save and restore all registers
they should I think using those 2 macros:
.macro push_state_to_exception_frame
sub sp, sp, #48 // make space for align2, align1+ESR, PSTATE,
PC, SP
push_pair x28, x29
push_pair x26, x27
push_pair x24, x25
push_pair x22, x23
push_pair x20, x21
push_pair x18, x19
push_pair x16, x17
push_pair x14, x15
push_pair x12, x13
push_pair x10, x11
push_pair x8, x9
push_pair x6, x7
push_pair x4, x5
push_pair x2, x3
push_pair x0, x1
add x1, sp, #288 // x1 := old SP (48 + 16 * 15 = 288)
mrs x2, elr_el1
mrs x3, spsr_el1
stp x30, x1, [sp, #240]  // store lr, old SP
stp x2, x3, [sp, #256]   // store elr_el1, spsr_el1

.macro pop_state_from_exception_frame
ldp x21, x22, [sp, #256] // load elr_el1, spsr_el1
pop_pair x0, x1
pop_pair x2, x3
pop_pair x4, x5
pop_pair x6, x7
pop_pair x8, x9
msr elr_el1, x21 // set lo

Re: [osv-dev] AArch64 debug build woes

2021-02-11 Thread Nadav Har'El
On Thu, Feb 11, 2021 at 7:42 AM Waldek Kozaczuk 
wrote:

>
> #1  0x10037954 in test_bsd_tcp1::tcp_server (this=0x206ff988)
> at /home/wkozaczuk/projects/osv/tests/tst-bsd-tcp1-zsnd.cc:114
>
> 114 int bytes2 = zcopy_tx(client_s, &zm);
>
> (gdb) p client_s
>
> $1 = 5
>
> (gdb) p &zm
>
> $2 = (zmsghdr *) 0x800041782d40
>
>
> As you can see the test app calls zcopy_tx() which takes 2 arguments:
>
> ssize_t zcopy_tx(int s, struct zmsghdr *zm)
>
> the 1st one is int and has value 5 in the caller - the test app - and is
> received as such
>
> in the kernel zcopy_tx.
>
>
> The second one - the address of struct zmsghdr - is problematic. On the
> caller's side looks OK but when received in the kernel it is wrong - 0x1.
>
> Why?
>

Not being an expert on aarch64 or it's function calling conventions, all I
can do is raise some wild guesses, I hope one of them is correct and you
can figure out which - perhaps by reading the code or trying to reproduce
it in new tests (you can perhaps write a new test which loops calling some
function f() with a bunch of parameters in multiple threads, and printing
an error if f ever gets called with wrong parameters) .

One possibility is that our context-switch implementation is forgetting to
save some of the registers, and the register which is used to hold the
third argument of a function is lost on the context switch.

Another possibility is that we lose this register in situations smaller
asynchronous events, not just context switches between threads. We have
interrupts (e.g., the timer interrupt), exceptions, and signals, which can
run complex OSv code in the middle of the user's function without the
function knowing that this is happening, so when we switch to these
interrupts or exceptions we mustn't forget the registers which the OSv code
may clobber.


>
> I saw another test crashing in a similar way when the caller (another
> test) would pass 3 arguments to kernel function and 2 of those
> (non-addresses) were passed correctly but the 3rd one - address one was not.
>
>
> Any ideas what might be going on?
>
>
> Waldek
>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to osv-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/osv-dev/4a97809f-d207-48b9-88e7-06e218e5d829n%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjvmYpHXzXSTNgXQ4wnAg_p%2B5uobvfkx%3DYCg5URwVUPsmg%40mail.gmail.com.