[osv-dev] Proposal to change virtual memory layout

Waldek Kozaczuk Thu, 10 Mar 2022 21:03:27 -0800

In the last couple of weeks I have been working on various issues related
to the aarch64 port. I have made good progress and will be sending new
patches soon.


Two of these issues had to do with making Java run on aarch64:
https://github.com/cloudius-systems/osv/issues/1145 and
https://github.com/cloudius-systems/osv/issues/1157. After exchanging
some emails on the openjdk mailing list and researching the problem, I
finally discovered that it only happens when the JIT is enabled. It is
caused by the JIT compiler generating machine code that loads arbitrary
memory addresses in a way that assumes all addresses are 48 bits wide,
meaning the top 16 bits are 0. Here are the details:

"Once I got hold of the JDK debuginfo files and identified the patching 
code, MacroAssembler::pd_patch_instruction(), I was able to put a
breakpoint in it and see something very revealing:

#0  MacroAssembler::pd_patch_instruction_size (branch=0x20000879465c "\351\377\237\322\351\377\277\362\351\377\337\362\n\243\352\227\037",
    target=0xffffa00042c862e0 "\020zB") at src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:75
#1  0x0000100000bc13cc in MacroAssembler::pd_patch_instruction (file=0x0, line=0, target=0xffffa00042c862e0 "\020zB", branch=<optimized out>)
    at src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp:626
#2  NativeMovConstReg::set_data (this=0x20000879465c, x=-105551995837728) at src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp:262
#3  0x0000100000850bd0 in CompiledIC::set_ic_destination_and_value (value=0xffffa00042c862e0,
    entry_point=0x20000823d290 "(\b@\271\b\001]\322*\005@\371\037\001\n\353,\001", <incomplete sequence \371\200>, this=<optimized out>)
    at src/hotspot/share/code/compiledIC.hpp:193
#4  ICStub::finalize (this=<optimized out>) at src/hotspot/share/code/icBuffer.cpp:91
#5  ICStubInterface::finalize (this=<optimized out>, self=<optimized out>) at src/hotspot/share/code/icBuffer.cpp:43
#6  0x0000100000e30958 in StubQueue::stub_finalize (this=0xffffa00041555300, s=<optimized out>) at src/hotspot/share/code/stubs.hpp:168
#7  StubQueue::remove_first (this=0xffffa00041555300) at src/hotspot/share/code/stubs.cpp:175
....

The corresponding value of X9 at the time of the crash was:

0x0000*a00042c862e0*

versus the target argument of pd_patch_instruction() (see the backtrace
above):

0xffff*a00042c862e0*

Now given this comment:

// Move a constant pointer into r.  In AArch64 mode the virtual
// address space is 48 bits in size, so we only need three
// instructions to create a patchable instruction sequence that can
// reach anywhere.

and this fragment of pd_patch_instruction():
https://github.com/openjdk/jdk17u/blob/6f0f42630eac1febf562062afc523fdf3d2a920a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L152-L161

it seems that the code that loads the X9 register with an address gets
patched with 0x0000a00042c862e0 instead of 0xffffa00042c862e0.
Interestingly, this assert does not get hit:
https://github.com/openjdk/jdk17u/blob/6f0f42630eac1febf562062afc523fdf3d2a920a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L77

The bottom line is that the valid address 0xffffa00042c862e0 gets truncated
to 0x0000a00042c862e0, presumably based on the assumption that on Linux all
userspace addresses are 48 bits long (see
https://www.kernel.org/doc/html/latest/arm64/memory.html). In the OSv
unikernel there is no separation between user space and kernel space, and
addresses returned by malloc happen to fall into this range:

0xffffa00000000000 - 0xffffafffffffffff

So I guess the only way to fix it on the OSv side would be to tweak its
virtual memory mapping for mallocs and make sure it never uses virtual
addresses that do not fit in 48 bits (i.e. with any of the top 16 bits
set)."

Currently OSv maps this part of virtual memory like so:
------ 0x ffff 8000 0000 0000 phys_mem --\
                                         |
                                         |
                                         |- Main Area - 16T
------ 0x ffff 9000 0000 0000          --X
                                         |
                                         |
                                         |- Page Area - 16T
------ 0x ffff a000 0000 0000          --X
                                         |
                                         |
                                         |- Mempool Area - 16T
------ 0x ffff b000 0000 0000          --X
                                         |
                                         |
                                         |- Debug Area - 80T
------ 0x ffff ffff ffff ffff          --/

I wonder whether this was an arbitrary choice made early in OSv's design,
or whether there was some good reason for it.

Could this be changed to the following?

------ 0x 0000 8000 0000 0000 phys_mem --\
                                         |
                                         |
                                         |- Main Area - 16T
------ 0x 0000 9000 0000 0000          --X
                                         |
                                         |
                                         |- Page Area - 16T
------ 0x 0000 a000 0000 0000          --X
                                         |
                                         |
                                         |- Mempool Area - 16T
------ 0x 0000 b000 0000 0000          --X
                                         |
                                         |
                                         |- Debug Area - 80T
------ 0x 0000 ffff ffff ffff          --/

I managed to hack the code for aarch64 this way, and it seems to work.

I also found a similar case on x86_64. The rapidjson library, used by
dotnet, stores extra information in the upper 16 bits of 64-bit pointers;
see https://github.com/Tencent/rapidjson/pull/546#issue-133623698.

Going forward, I think Linux will eventually extend userspace addresses
from 48 bits to 56 bits
(see https://en.wikipedia.org/wiki/Intel_5-level_paging) or more, and
dotnet has in fact already made a fix that disables this high-16-bit hack.
But given that some Linux apps may assume that addresses are 48 bits wide
and take advantage of it, it might be wise to change the OSv virtual
memory layout to use only the lower part of the address space
(<= 0x 0000 ffff ffff ffff).

What do you think?
Waldek

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/79d6032f-02c9-4dcf-955e-eb2f4b9f308bn%40googlegroups.com.
