Re: [osv-dev] [PATCH] Move kernel to 0x40200000 address (1 GiB higher) in virtual memory

Waldek Kozaczuk Tue, 18 Jun 2019 19:05:59 -0700

Finally:
OSv v0.53.0-35-g61070e27
-> arch_init_premain(): elf_start = 0000000000000000
-> features_type() constructor
process_xen_bits
------> xen_init(): Start
------> xen_init(): After cpuid
------> xen_init(): Before XENVER_get_features
------> xen_init(): Before XENMEM_add_to_physmap
------> xen_init(): After XENMEM_add_to_physmap
------> xen_init(): End
-> premain() -> before init tab
-> premain() -> after init tab
1 CPUs detected
Firmware vendor: Xen
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
eth0: ethernet address: 0a:cd:94:8b:60:d2
backend features: feature-sg feature-gso-tcp4
1024MB <Virtual Block Device> at device/vbd/51712
random: intel drng, rdrand registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
failed to mount /rofs, error = No error information
VFS: mounting zfs at /zfs
zfs: mounting osv/zfs from device /dev/vblk0.1
random: device unblocked.
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
program zpool.so returned 1
BSD shrinker: event handler list found: 0xffffa00000b79a00
BSD shrinker found: 1
BSD shrinker: unlocked, running
[E/28 bsd-log]: eth0: link state changed to DOWN


[E/28 bsd-log]: eth0: link state changed to UP

[I/28 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1369833657]
[I/28 dhcp]: Waiting for IP...
[I/28 dhcp]: Broadcasting DHCPDISCOVER message with xid: [999080978]
[I/28 dhcp]: Waiting for IP...
[I/197 dhcp]: DHCP received hostname: ip-172-31-6-160

[I/197 dhcp]: Received DHCPOFFER message from DHCP server: 172.31.0.1 regarding offerred IP address: 172.31.6.160
[I/197 dhcp]: Broadcasting DHCPREQUEST message with xid: [999080978] to SELECT offered IP: 172.31.6.160
[I/197 dhcp]: DHCP received hostname: ip-172-31-6-160

[I/197 dhcp]: Received DHCPACK message from DHCP server: 172.31.0.1 regarding offerred IP address: 172.31.6.160
[I/197 dhcp]: Server acknowledged IP 172.31.6.160 for interface eth0 with time to lease in seconds: 3600
eth0: 172.31.6.160
[I/197 dhcp]: Configuring eth0: ip 172.31.6.160 subnet mask 255.255.240.0 gateway 172.31.0.1 MTU 9001
[E/197 bsd-log]: eth0: link state changed to DOWN

[E/197 bsd-log]: eth0: link state changed to UP

[I/197 dhcp]: Set hostname to: ip-172-31-6-160
disk read (real mode): 1992.96ms, (+1992.96ms)
uncompress lzloader.elf: 2019.09ms, (+26.13ms)
TLS initialization: 2045.69ms, (+26.60ms)
.init functions: 2055.63ms, (+9.94ms)
SMP launched: 2059.00ms, (+3.37ms)
VFS initialized: 2069.98ms, (+10.97ms)
Network initialized: 2072.19ms, (+2.21ms)
pvpanic done: 2072.43ms, (+0.24ms)
pci enumerated: 2085.52ms, (+13.09ms)
drivers probe: 2085.52ms, (+0.00ms)
drivers loaded: 2272.28ms, (+186.76ms)
ZFS mounted: 3042.23ms, (+769.95ms)
Total time: 6137.31ms, (+3095.08ms)
Hello from C code
[I/0 dhcp]: Unicasting DHCPRELEASE message with xid: [2052720604] from client: 172.31.6.160 to server: 172.31.0.1
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
Powering off.

This is the output from an EC2 instance.

It turns out my thinking was right; only the execution was poor. Simply put, 
virt_to_phys() was the wrong tool at that point: the variables it depends on 
have not been set yet!

So a simple subtraction of OSV_KERNEL_VM_SHIFT does the job:
     // Base + 1 would have given us the version number, it is mostly
     // uninteresting for us now
     auto x = processor::cpuid(base + 2);
-    processor::wrmsr(x.b, cast_pointer(&hypercall_page));
+    processor::wrmsr(x.b, cast_pointer(&hypercall_page) - OSV_KERNEL_VM_SHIFT);

..
+    debug_early("------> xen_init(): Before XENMEM_add_to_physmap\n");
     struct xen_add_to_physmap map;
     map.domid = DOMID_SELF;
     map.idx = 0;
     map.space = 0;
-    map.gpfn = cast_pointer(&xen_shared_info) >> 12;
+    map.gpfn = (cast_pointer(&xen_shared_info) - OSV_KERNEL_VM_SHIFT) >> 12;


I would still love to test the non-HVM (PV?) path, but for that I need to be 
able to run OSv on Xen locally.

Waldek

On Tuesday, June 18, 2019 at 5:35:04 PM UTC-4, Waldek Kozaczuk wrote:
>
> As I somewhat suspected, mapping the kernel higher breaks Xen, at least on 
> EC2. I still have not gotten my laptop properly set up to run OSv on Xen 
> (which I will detail in another email), so I cannot reproduce anything 
> locally yet. Very frustrating.
>
> The good news is that OSv at least starts the boot process, displays "OSv 
> ..", and exits somewhere in premain just before executing the ELF init 
> functions. Also, the very latest master works on EC2, so it is my kernel 
> mapping changes that break it.
>
> I added a couple of debug_early() calls, and here is what I got:
> OSv v0.53.0-35-g61070e27
> -> arch_init_premain()
> -> features_type() constructor
> process_xen_bits
> ------> xen_init(): Start
> ------> xen_init(): After cpuid
> ------> xen_init(): Before XENVER_get_features
>
> The arch/x64/xen.cc changes:
> diff --git a/arch/x64/xen.cc b/arch/x64/xen.cc
> index 462c266c..0247d5eb 100644
> --- a/arch/x64/xen.cc
> +++ b/arch/x64/xen.cc
> @@ -169,17 +169,24 @@ gsi_level_interrupt *xen_set_callback(int irqno)
>  
>  void xen_init(processor::features_type &features, unsigned base)
>  {
> +    debug_early("------> xen_init(): Start\n");
> +
>      // Base + 1 would have given us the version number, it is mostly
>      // uninteresting for us now
>      auto x = processor::cpuid(base + 2);
>      processor::wrmsr(x.b, cast_pointer(&hypercall_page));
> +    //processor::wrmsr(x.b, cast_pointer(mmu::virt_to_phys(&hypercall_page)));
>  
> +    debug_early("------> xen_init(): After cpuid\n");
>      struct xen_feature_info info;
>      // To fill up the array used by C code
>      for (int i = 0; i < XENFEAT_NR_SUBMAPS; i++) {
>          info.submap_idx = i;
> -        if (version_hypercall(XENVER_get_features, &info) < 0)
> +        debug_early("------> xen_init(): Before XENVER_get_features\n");
> +        if (version_hypercall(XENVER_get_features, &info) < 0) {
> +            debug_early("------> xen_init(): XENVER_get_features failed!\n");
>              assert(0);
> +        }
>          for (int j = 0; j < 32; j++)
>              xen_features[i * 32 + j] = !!(info.submap & 1<<j);
>      }
> @@ -188,18 +195,25 @@ void xen_init(processor::features_type &features, unsigned base)
>      if (!features.xen_vector_callback)
>          evtchn_irq_is_legacy();
>  
> +    debug_early("------> xen_init(): Before XENMEM_add_to_physmap\n");
>      struct xen_add_to_physmap map;
>      map.domid = DOMID_SELF;
>      map.idx = 0;
>      map.space = 0;
> -    map.gpfn = cast_pointer(&xen_shared_info) >> 12;
> +    map.gpfn = (cast_pointer(mmu::virt_to_phys(&xen_shared_info))) >> 12;
>  
>      // 7 => add to physmap
> -    if (memory_hypercall(XENMEM_add_to_physmap, &map))
> +    if (memory_hypercall(XENMEM_add_to_physmap, &map)) {
> +        debug_early("------> xen_init(): XENMEM_add_to_physmap failed!\n");
>          assert(0);
> +    }
> +
> +    debug_early("------> xen_init(): After XENMEM_add_to_physmap\n");
>  
>      features.xen_pci = xen_pci_enabled();
>      HYPERVISOR_shared_info = reinterpret_cast<shared_info_t *>(&xen_shared_info);
> +
> +    debug_early("------> xen_init(): End\n");
>  }
>
> So it looks like it breaks right inside version_hypercall(), and we do not 
> even see the debug statement that precedes assert(0).
>
> My theory is that a minus or plus OSV_KERNEL_VM_SHIFT is missing somewhere 
> in that code, or in mmu::virt_to_phys() or phys_to_virt(). As you can see, 
> I already changed how the gpfn field is set (the address of the mapping 
> table?, 
> https://github.com/cloudius-systems/osv/blob/a3cd022fcda2c88eae89476aa6c29e3c4be04926/bsd/sys/xen/interface/memory.h#L194-L217), 
> which I am guessing needs to be a physical page address >> 12. I also 
> thought that writing the address of hypercall_page to the MSR required a 
> physical address. But when I made that change (see the commented-out line), 
> it broke on that call: I did not see the following debug statement.
>
> It is very hard to find a good explanation of what exactly needs to 
> happen. I have looked at the Linux code as an example, but I am still 
> confused about what the Xen init process should look like, as far as which 
> types of addresses (physical or virtual) need to be passed at which point. 
> For example, per this code 
> (https://github.com/torvalds/linux/blob/83f3ef3de625a5766de2382f9e077d4daafd5bac/arch/x86/xen/enlighten_pvh.c#L26-L40), 
> Linux passes the physical address of the hypercall page. When we do the 
> same, it breaks.
>
> Overall, I am not sure I understand which exact mode of Xen virtualization 
> OSv is coded to work under: HVM and/or PV (see 
> https://wiki.xen.org/wiki/Xen_Project_Software_Overview#HVM)? I think that 
> when it boots on EC2, or when run with './scripts/run.py -p xen' (note that 
> there is also a xenpv mode), Xen uses HVM (hardware-assisted virtualization) 
> but also exposes para-virtual Xen devices (block and net), which OSv takes 
> advantage of. When xenpv is used, Xen uses loader.elf (I think), so this is 
> when entry-xen.S comes into play. In this mode Xen is supposed to set up 
> all the page tables, as we do not go through the boot.S logic.
>
> In any case, I am also not clear on how exactly memory mapping works under 
> Xen. In HVM mode OSv goes through boot16.S/boot.S, so it seems to set up 
> the page tables the same way it does under, for example, KVM or other 
> hypervisors. Does that mean CR3 works just like on KVM? But then I read 
> that there are special hypercalls to set up mappings. What is the point of 
> this memory_hypercall(XENMEM_add_to_physmap, ...), and why do we need it 
> if we have CR3? When we mmap(), do we do something extra (hypercalls?) on 
> Xen? I could not find any evidence of it.
>
> Any ideas/suggestions what might be wrong and what should be fixed?
>
> The most frustrating part is that I cannot reproduce it locally.
>
> On Sunday, June 16, 2019 at 9:57:58 AM UTC-4, Nadav Har'El wrote:
>>
>> On Sun, Jun 16, 2019 at 4:37 PM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
>>
>>>
>>>>>>> +    # virtual address space 1 GiB at a time
>>>>>>> +    # The very 1st entry maps 1st GiB 1:1 by pointing to ident_pt_l2 table
>>>>>>> +    # that specifies addresses of every one of 512 2MiB slots of physical memory
>>>>>>> +    .quad ident_pt_l2 + 0x67 - OSV_KERNEL_VM_SHIFT
>>>>>>> +    # The 2nd entry maps 2nd GiB to the same 1st GiB of physical memory by pointing
>>>>>>> +    # to the same ident_pt_l2 table as the 1st entry above
>>>>>>> +    # This way we effectively provide correct mapping for the kernel linked
>>>>>>> +    # to start at 1 GiB + 2 MiB (0x40200000) in virtual memory and point to
>>>>>>> +    # 2 MiB address (0x200000) where it starts in physical memory
>>>>>>> +    .quad ident_pt_l2 + 0x67 - OSV_KERNEL_VM_SHIFT
>>>>>>>
>>>>>>
>>>>>> Oh, but doesn't this mean that this only works correctly when 
>>>>>> OSV_KERNEL_VM_SHIFT is *exactly* 1 GB?
>>>>>> I.e., the reason why you want the mapping of the second gigabyte to 
>>>>>> be identical to the first gigabyte is
>>>>>> just because the shift is exactly 1GB?
>>>>>>
>>>>> That is correct. The general scheme (which I am planning to make part 
>>>>> of the next patch at some point) should be this:
>>>>> OSV_KERNEL_VM_SHIFT = 1 GiB + N * 2 MiB, where 0 <= N < 500 (more or 
>>>>> less, as the last 24 MiB of the 2nd GiB should be enough for the kernel).
>>>>> But then, instead of re-using and pointing to the ident_pt_l2 table, I 
>>>>> will have to define an extra instance of an ident_pt_l2-equivalent table 
>>>>> where the first N entries will be zero.
>>>>>
>>>>
>>>> So although in all other places in the code, OSV_KERNEL_VM_SHIFT can be 
>>>> anything, in this specific
>>>> place you really do assume that OSV_KERNEL_VM_SHIFT = 1GB and nothing 
>>>> else will work? If this
>>>> is the case, I think you should add a static assertion (if it's 
>>>> possible in the assembly code?) or
>>>> equivalent, to fail the compilation if OSV_KERNEL_VM_SHIFT != 1 GB.
>>>>
>>> Indeed, at least with this patch (until we have a more dynamic formula), it 
>>> all comes down to a single constraint: since we reuse the table in the pl3 
>>> table, the shift has to be 1GB. How can we assert it with the compiler?
>>>
>>
>> Since OSV_KERNEL_VM_SHIFT is a preprocessor macro, maybe we can check it 
>> with the preprocessor - even in the assembler code, something like:
>>
>> #if OSV_KERNEL_VM_SHIFT != 0x40000000 && OSV_KERNEL_VM_SHIFT != 0
>> #error This code only works correctly for OSV_KERNEL_VM_SHIFT = 0x40000000 or 0
>> #endif
>>
>> Put it right next to the code which only works for these cases (correct 
>> me if I'm wrong, but I think the same code works for either 0 or 
>> 0x40000000, but not anything else?) and check that you get a compilation 
>> error for OSV_KERNEL_VM_SHIFT = 0x12345.
>>
>
