Re: [osv-dev] [PATCH] Move kernel to 0x40200000 address (1 GiB higher) in virtual memory

Nadav Har'El Wed, 19 Jun 2019 01:46:08 -0700

On Wed, Jun 19, 2019 at 5:05 AM Waldek Kozaczuk <[email protected]>
wrote:


> Finally:
> OSv v0.53.0-35-g61070e27
> -> arch_init_premain(): elf_start = 0000000000000000
> -> features_type() constructor
> process_xen_bits
> ------> xen_init(): Start
> ------> xen_init(): After cpuid
> ------> xen_init(): Before XENVER_get_features
> ------> xen_init(): Before XENMEM_add_to_physmap
> ------> xen_init(): After XENMEM_add_to_physmap
> ------> xen_init(): End
> -> premain() -> before init tab
> -> premain() -> after init tab
> 1 CPUs detected
> Firmware vendor: Xen
> bsd: initializing - done
> VFS: mounting ramfs at /
> VFS: mounting devfs at /dev
> net: initializing - done
> vga: Add VGA device instance
> eth0: ethernet address: 0a:cd:94:8b:60:d2
> backend features: feature-sg feature-gso-tcp4
> 1024MB <Virtual Block Device> at device/vbd/51712
> random: intel drng, rdrand registered as a source.
> random: <Software, Yarrow> initialized
> VFS: unmounting /dev
> VFS: mounting rofs at /rofs
> failed to mount /rofs, error = No error information
> VFS: mounting zfs at /zfs
> zfs: mounting osv/zfs from device /dev/vblk0.1
> random: device unblocked.
> VFS: mounting devfs at /dev
> VFS: mounting procfs at /proc
> program zpool.so returned 1
> BSD shrinker: event handler list found: 0xffffa00000b79a00
> BSD shrinker found: 1
> BSD shrinker: unlocked, running
> [E/28 bsd-log]: eth0: link state changed to DOWN
>
> [E/28 bsd-log]: eth0: link state changed to UP
>
> [I/28 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1369833657]
> [I/28 dhcp]: Waiting for IP...
> [I/28 dhcp]: Broadcasting DHCPDISCOVER message with xid: [999080978]
> [I/28 dhcp]: Waiting for IP...
> [I/197 dhcp]: DHCP received hostname: ip-172-31-6-160
>
> [I/197 dhcp]: Received DHCPOFFER message from DHCP server: 172.31.0.1
> regarding offerred IP address: 172.31.6.160
> [I/197 dhcp]: Broadcasting DHCPREQUEST message with xid: [999080978] to
> SELECT offered IP: 172.31.6.160
> [I/197 dhcp]: DHCP received hostname: ip-172-31-6-160
>
> [I/197 dhcp]: Received DHCPACK message from DHCP server: 172.31.0.1
> regarding offerred IP address: 172.31.6.160
> [I/197 dhcp]: Server acknowledged IP 172.31.6.160 for interface eth0 with
> time to lease in seconds: 3600
> eth0: 172.31.6.160
> [I/197 dhcp]: Configuring eth0: ip 172.31.6.160 subnet mask 255.255.240.0
> gateway 172.31.0.1 MTU 9001
> [E/197 bsd-log]: eth0: link state changed to DOWN
>
> [E/197 bsd-log]: eth0: link state changed to UP
>
> [I/197 dhcp]: Set hostname to: ip-172-31-6-160
> disk read (real mode): 1992.96ms, (+1992.96ms)
> uncompress lzloader.elf: 2019.09ms, (+26.13ms)
> TLS initialization: 2045.69ms, (+26.60ms)
> .init functions: 2055.63ms, (+9.94ms)
> SMP launched: 2059.00ms, (+3.37ms)
> VFS initialized: 2069.98ms, (+10.97ms)
> Network initialized: 2072.19ms, (+2.21ms)
> pvpanic done: 2072.43ms, (+0.24ms)
> pci enumerated: 2085.52ms, (+13.09ms)
> drivers probe: 2085.52ms, (+0.00ms)
> drivers loaded: 2272.28ms, (+186.76ms)
> ZFS mounted: 3042.23ms, (+769.95ms)
> Total time: 6137.31ms, (+3095.08ms)
> Hello from C code
> [I/0 dhcp]: Unicasting DHCPRELEASE message with xid: [2052720604] from
> client: 172.31.6.160 to server: 172.31.0.1
> VFS: unmounting /dev
> VFS: unmounting /proc
> VFS: unmounting /
> Powering off.
>
> This is the output from an EC2 instance.
>
> Turns out my thinking was right, just the execution was poor. Simply put,
> virt_to_phys() was the wrong tool at that point - the variables it depends on
> had not been set yet!
>
> So simple subtraction does the job:
>      // Base + 1 would have given us the version number, it is mostly
>      // uninteresting for us now
>      auto x = processor::cpuid(base + 2);
> -    processor::wrmsr(x.b, cast_pointer(&hypercall_page));
> +    processor::wrmsr(x.b, cast_pointer(&hypercall_page) - OSV_KERNEL_VM_SHIFT);
>
> ..
> +    debug_early("------> xen_init(): Before XENMEM_add_to_physmap\n");
>      struct xen_add_to_physmap map;
>      map.domid = DOMID_SELF;
>      map.idx = 0;
>      map.space = 0;
> -    map.gpfn = cast_pointer(&xen_shared_info) >> 12;
> +    map.gpfn = (cast_pointer(&xen_shared_info) - OSV_KERNEL_VM_SHIFT) >> 12;
>
>
>
Great, thanks!
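
For the archive, here is a rough sketch of the arithmetic behind that fix.
It is not the code from the patch: the helper names are hypothetical, and
the value of OSV_KERNEL_VM_SHIFT is assumed to be exactly 1 GiB as in this
patch. It only illustrates that, for symbols inside the kernel image, the
physical address and the guest page frame number can be derived by plain
subtraction before the mmu variables are initialized.

```cpp
#include <cstdint>

// Assumption: the kernel is linked 1 GiB above its physical load address.
constexpr std::uint64_t OSV_KERNEL_VM_SHIFT = 0x40000000;
constexpr unsigned PAGE_SHIFT = 12; // 4 KiB pages

// Hypothetical helper: physical address of a kernel-image symbol.
// Only valid for addresses that lie inside the kernel image itself.
constexpr std::uint64_t early_kernel_virt_to_phys(std::uint64_t virt)
{
    return virt - OSV_KERNEL_VM_SHIFT;
}

// Hypothetical helper: guest page frame number of a kernel-image symbol,
// the kind of value that goes into xen_add_to_physmap::gpfn.
constexpr std::uint64_t early_kernel_gpfn(std::uint64_t virt)
{
    return early_kernel_virt_to_phys(virt) >> PAGE_SHIFT;
}

// The kernel start at 0x40200000 (virtual) sits at 0x200000 (physical).
static_assert(early_kernel_virt_to_phys(0x40200000ULL) == 0x200000ULL,
              "kernel start must map to 2 MiB");
static_assert(early_kernel_gpfn(0x40201000ULL) == 0x201ULL,
              "second kernel page must have frame number 0x201");
```

The subtraction is deliberately restricted to kernel-image addresses; any
other virtual address still needs the regular mmu translation once it is
available.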


> I would still love to test the non-HVM (PV?) path but I need to be able to
> locally run OSv on Xen.
>

Unfortunately I can't help you with that :-(
If you have specific questions, you can try "git blame" to see who is to
"blame" for lines of code you don't understand, and try to ask this person
directly.


> Waldek
>
> On Tuesday, June 18, 2019 at 5:35:04 PM UTC-4, Waldek Kozaczuk wrote:
>>
>> As I somehow suspected, mapping the kernel higher breaks Xen. At least on EC2.
>> I still have not gotten my laptop to properly support running OSv on Xen (which
>> I will be detailing in my other email). So I cannot reproduce anything
>> locally yet. Very frustrating.
>>
>> The good news is that OSv at least starts the boot process, displays "OSv
>> .." and exits somewhere in premain just before executing the ELF init
>> functions. Also the very latest master works on EC2. And my kernel mapping
>> changes break it.
>>
>> I managed to add a couple of debug_early() calls and here is what I got:
>> OSv v0.53.0-35-g61070e27
>> -> arch_init_premain()
>> -> features_type() constructor
>> process_xen_bits
>> ------> xen_init(): Start
>> ------> xen_init(): After cpuid
>> ------> xen_init(): Before XENVER_get_features
>>
>> The arch/x64/xen.cc changes:
>> diff --git a/arch/x64/xen.cc b/arch/x64/xen.cc
>> index 462c266c..0247d5eb 100644
>> --- a/arch/x64/xen.cc
>> +++ b/arch/x64/xen.cc
>> @@ -169,17 +169,24 @@ gsi_level_interrupt *xen_set_callback(int irqno)
>>
>>  void xen_init(processor::features_type &features, unsigned base)
>>  {
>> +    debug_early("------> xen_init(): Start\n");
>> +
>>      // Base + 1 would have given us the version number, it is mostly
>>      // uninteresting for us now
>>      auto x = processor::cpuid(base + 2);
>>      processor::wrmsr(x.b, cast_pointer(&hypercall_page));
>> +    //processor::wrmsr(x.b, cast_pointer(mmu::virt_to_phys(&hypercall_page)));
>>
>> +    debug_early("------> xen_init(): After cpuid\n");
>>      struct xen_feature_info info;
>>      // To fill up the array used by C code
>>      for (int i = 0; i < XENFEAT_NR_SUBMAPS; i++) {
>>          info.submap_idx = i;
>> -        if (version_hypercall(XENVER_get_features, &info) < 0)
>> +        debug_early("------> xen_init(): Before XENVER_get_features\n");
>> +        if (version_hypercall(XENVER_get_features, &info) < 0) {
>> +            debug_early("------> xen_init(): XENVER_get_features failed!\n");
>>              assert(0);
>> +        }
>>          for (int j = 0; j < 32; j++)
>>              xen_features[i * 32 + j] = !!(info.submap & 1<<j);
>>      }
>> @@ -188,18 +195,25 @@ void xen_init(processor::features_type &features, unsigned base)
>>      if (!features.xen_vector_callback)
>>          evtchn_irq_is_legacy();
>>
>> +    debug_early("------> xen_init(): Before XENMEM_add_to_physmap\n");
>>      struct xen_add_to_physmap map;
>>      map.domid = DOMID_SELF;
>>      map.idx = 0;
>>      map.space = 0;
>> -    map.gpfn = cast_pointer(&xen_shared_info) >> 12;
>> +    map.gpfn = (cast_pointer(mmu::virt_to_phys(&xen_shared_info))) >> 12;
>>
>>      // 7 => add to physmap
>> -    if (memory_hypercall(XENMEM_add_to_physmap, &map))
>> +    if (memory_hypercall(XENMEM_add_to_physmap, &map)) {
>> +        debug_early("------> xen_init(): XENMEM_add_to_physmap failed!\n");
>>          assert(0);
>> +    }
>> +
>> +    debug_early("------> xen_init(): After XENMEM_add_to_physmap\n");
>>
>>      features.xen_pci = xen_pci_enabled();
>>      HYPERVISOR_shared_info = reinterpret_cast<shared_info_t *>(&xen_shared_info);
>> +
>> +    debug_early("------> xen_init(): End\n");
>>  }
>>
>> So it looks like it breaks right in version_hypercall(), and we do not
>> even see another debug statement before abort().
>>
>> My theory is that a minus or plus OSV_KERNEL_VM_SHIFT is missing somewhere
>> in that code, or in mmu::virt_to_phys() or phys_to_virt().
>> As you can see, I already changed how the gpfn field is set (address
>> of the mapping table?,
>> https://github.com/cloudius-systems/osv/blob/a3cd022fcda2c88eae89476aa6c29e3c4be04926/bsd/sys/xen/interface/memory.h#L194-L217)
>> which I am guessing needs to be a physical page address >> 12. I also
>> thought that writing the address of hypercall_page to the MSR required a
>> physical address. But when I made this change (see the commented-out line),
>> it broke on that call - I did not see the following debug statement.
>>
>> It is very hard to find a good explanation of what exactly needs to
>> happen. I have tried to look at the Linux code as an example but I am still
>> confused about how the Xen init process should look as far as what types of
>> addresses (physical or virtual) need to be passed at what point. For
>> example per this -
>> https://github.com/torvalds/linux/blob/83f3ef3de625a5766de2382f9e077d4daafd5bac/arch/x86/xen/enlighten_pvh.c#L26-L40
>>  -
>> Linux passes the physical address of the hypercall page. When we do it, it
>> breaks.
>>
>> Overall I am not sure I understand what exact mode of Xen virtualization
>> OSv is coded to work under - HVM and/or PV (see
>> https://wiki.xen.org/wiki/Xen_Project_Software_Overview#HVM)? I think
>> when it boots on EC2 or when run with './scripts/run.py -p xen' (please note
>> there is also a xenpv mode) Xen uses HVM (hardware-assisted virtualization)
>> but also exposes paravirtual Xen devices (block and net) which OSv takes
>> advantage of. When xenpv is used, Xen uses loader.elf (I think), so this is
>> when entry-xen.S comes into play. In this mode Xen is supposed to set up
>> all page tables, as we do not go through the boot.S logic.
>>
>> In any case I am also not clear how exactly the memory mapping works
>> under Xen. In HVM mode OSv goes through boot16.S/boot.S so it seems to be
>> setting up page tables the same way it does, for example, under KVM or
>> another hypervisor. Does it mean that CR3 works just like on KVM? But then I
>> was reading that there are special hypercalls to set up mappings. What is the
>> point of this memory_hypercall XENMEM_add_to_physmap() and why do we need it
>> if we have CR3? When we mmap(), do we do something extra (hypercalls?) on
>> Xen? I could not find any evidence of it.
>>
>> Any ideas/suggestions what might be wrong and what should be fixed?
>>
>> The most frustrating part is that I cannot reproduce it locally.
>>
>> On Sunday, June 16, 2019 at 9:57:58 AM UTC-4, Nadav Har'El wrote:
>>>
>>> On Sun, Jun 16, 2019 at 4:37 PM Waldek Kozaczuk <[email protected]>
>>> wrote:
>>>
>>>>
>>>>>>>> +    # virtual address space 1 GiB at a time
>>>>>>>> +    # The very 1st entry maps 1st GiB 1:1 by pointing to
>>>>>>>> ident_pt_l2 table
>>>>>>>> +    # that specifies addresses of every one of 512 2MiB slots of
>>>>>>>> physical memory
>>>>>>>> +    .quad ident_pt_l2 + 0x67 - OSV_KERNEL_VM_SHIFT
>>>>>>>> +    # The 2nd entry maps 2nd GiB to the same 1st GiB of physical
>>>>>>>> memory by pointing
>>>>>>>> +    # to the same ident_pt_l2 table as the 1st entry above
>>>>>>>> +    # This way we effectively provide correct mapping for the
>>>>>>>> kernel linked
>>>>>>>> +    # to start at 1 GiB + 2 MiB (0x40200000) in virtual memory and
>>>>>>>> point to
>>>>>>>> +    # 2 MiB address (0x200000) where it starts in physical memory
>>>>>>>> +    .quad ident_pt_l2 + 0x67 - OSV_KERNEL_VM_SHIFT
>>>>>>>>
>>>>>>>
>>>>>>> Oh, but doesn't this mean that this only works correctly when
>>>>>>> OSV_KERNEL_VM_SHIFT is *exactly* 1 GB?
>>>>>>> I.e., the reason why you want the mapping of the second gigabyte to
>>>>>>> be identical to the first gigabyte is
>>>>>>> just because the shift is exactly 1GB?
>>>>>>>
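
To make the "exactly 1 GiB" point concrete, here is a small sketch of how an
x86-64 virtual address splits into page-table indices with 4-level paging
and 2 MiB pages. The function names are made up for illustration and do not
exist in OSv. It shows that 0x200000 and 0x40200000 differ only in the
index into the 1-GiB-granular table, which is why two entries pointing at
the same ident_pt_l2 table are enough for this particular shift.

```cpp
#include <cstdint>

// Each paging level consumes 9 bits of the virtual address.
constexpr unsigned pml4_index(std::uint64_t va) { return (va >> 39) & 0x1ff; }
// Index into the table whose entries each cover 1 GiB.
constexpr unsigned pdpt_index(std::uint64_t va) { return (va >> 30) & 0x1ff; }
// Index into the table whose entries each cover 2 MiB.
constexpr unsigned pd_index(std::uint64_t va) { return (va >> 21) & 0x1ff; }

// Physical/identity address of the kernel start: 1st GiB slot, 2nd 2 MiB slot.
static_assert(pdpt_index(0x200000ULL) == 0, "first GiB entry");
static_assert(pd_index(0x200000ULL) == 1, "second 2 MiB slot");

// Shifted kernel address: 2nd GiB slot, but the same 2 MiB slot as above.
static_assert(pdpt_index(0x40200000ULL) == 1, "second GiB entry");
static_assert(pd_index(0x40200000ULL) == 1, "second 2 MiB slot");
static_assert(pml4_index(0x40200000ULL) == 0, "same top-level entry");
```

With any shift that is not a multiple of 1 GiB, pd_index() of the shifted
address would no longer match that of the physical address, so sharing the
identity table would map the kernel to the wrong physical pages.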
>>>>>> That is correct. The general scheme (which I am planning to make part
>>>>>> of the next patch at some time) should be this:
>>>>>> OSV_KERNEL_VM_SHIFT = 1 GiB + N * 2MiB where 0 <= N < 500 (more or less,
>>>>>> as the last 24MB of the 2nd GB should be enough for the kernel).
>>>>>> But then instead of re-using and pointing to the ident_pt_l2 table I
>>>>>> will have to define an extra ident_pt_l2-equivalent table where
>>>>>> the first N entries will be zero.
>>>>>>
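
One possible shape of that extra table, written as C++ rather than
assembler purely for illustration. The function name and the flags value
are assumptions (0x67 as used for the table pointers above, plus the
large-page bit 0x80), not something taken from boot.S. The table covers
the 2nd GiB of virtual memory for a shift of 1 GiB + N * 2 MiB.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint64_t TWO_MIB = 1ULL << 21;
// Assumed flags for a 2 MiB mapping: 0x67 plus the page-size bit 0x80.
constexpr std::uint64_t LARGE_PAGE_FLAGS = 0xe7;

// Hypothetical builder for the table covering the 2nd GiB of virtual memory
// when OSV_KERNEL_VM_SHIFT = 1 GiB + n * 2 MiB.
// Slot i covers virtual address 1 GiB + i * 2 MiB, which must translate to
// physical address (i - n) * 2 MiB. The first n slots stay zero (unmapped).
std::array<std::uint64_t, 512> make_shifted_l2(unsigned n)
{
    std::array<std::uint64_t, 512> table{}; // zero-initialized
    for (unsigned i = n; i < table.size(); i++) {
        table[i] = (static_cast<std::uint64_t>(i - n) * TWO_MIB) | LARGE_PAGE_FLAGS;
    }
    return table;
}
```

For n = 0 this produces the same layout as an identity table for the first
GiB, which matches the special case the current patch relies on.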
>>>>>
>>>>> So although in all other places in the code, OSV_KERNEL_VM_SHIFT can
>>>>> be anything, in this specific
>>>>> place you really do assume that OSV_KERNEL_VM_SHIFT= 1GB and nothing
>>>>> else will work? If this
>>>>> is the case, I think you should add a static assertion (if it's
>>>>> possible in the assembly code?) or
>>>>> equivalent, to fail the compilation if OSV_KERNEL_VM_SHIFT != 1 GB.
>>>>>
>>>> Indeed at least with this patch (until we have a more dynamic formula) it
>>>> all comes down to a single constraint where we reuse the pl3 table and the
>>>> shift has to be 1GB. How can we assert it with the compiler?
>>>>
>>>
>>> Since  OSV_KERNEL_VM_SHIFT is a preprocessor macro, maybe we can check
>>> it with the preprocessor - even in the assembler code, something like:
>>>
>>> #if OSV_KERNEL_VM_SHIFT != 0x40000000 && OSV_KERNEL_VM_SHIFT != 0
>>> #error This code only works correctly for OSV_KERNEL_VM_SHIFT = 0x40000000 or 0
>>> #endif
>>>
>>> Put it right next to the code which only works for these cases (correct
>>> me if I'm wrong but
>>> I think the same code works for both 0 and 0x40000000, but not
>>> anything else?) and check that you get a compilation error for
>>> OSV_KERNEL_VM_SHIFT = 0x12345.
>>>
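
For C++ translation units the same constraint could also be expressed with
static_assert. This is only a sketch: the predicate name is hypothetical,
and the fallback #define exists solely so the snippet compiles on its own
(in the real build the macro would come from the build system).

```cpp
// Fallback so the sketch is self-contained; assumed to be defined by the
// build in a real tree.
#ifndef OSV_KERNEL_VM_SHIFT
#define OSV_KERNEL_VM_SHIFT 0x40000000
#endif

// Preprocessor form, usable in both .S and .cc files.
#if OSV_KERNEL_VM_SHIFT != 0x40000000 && OSV_KERNEL_VM_SHIFT != 0
#error This code only works correctly for OSV_KERNEL_VM_SHIFT = 0x40000000 or 0
#endif

// Hypothetical predicate naming the supported shifts in one place.
constexpr bool vm_shift_supported(unsigned long long shift)
{
    return shift == 0x40000000ULL || shift == 0ULL;
}

// C++ form of the same check; fails the compilation for any other value.
static_assert(vm_shift_supported(OSV_KERNEL_VM_SHIFT),
              "OSV_KERNEL_VM_SHIFT must be 0x40000000 or 0");
```

The preprocessor variant is the one that also works in the assembler
sources, since those are run through the C preprocessor; the static_assert
is just a convenience for code that already includes the macro in C++.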
>> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/osv-dev/6f0686d3-9b87-4e03-b02c-9212a7fdf31b%40googlegroups.com
> <https://groups.google.com/d/msgid/osv-dev/6f0686d3-9b87-4e03-b02c-9212a7fdf31b%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

