On Sun, Jun 16, 2019 at 3:22 PM Waldek Kozaczuk <jwkozac...@gmail.com>
wrote:

>
>
> On Sunday, June 16, 2019 at 5:40:46 AM UTC-4, Nadav Har'El wrote:
>>
>> On Sun, Jun 16, 2019 at 8:05 AM Waldemar Kozaczuk <jwkoz...@gmail.com>
>> wrote:
>>
>>> This patch provides all necessary changes to move OSv kernel by 1 GiB
>>> higher
>>> in virtual memory space to start at 0x40200000. Most changes involve
>>> adding
>>> or substracting 0x40000000 (OSV_KERNEL_VM_SHIFT) in all relevant places.
>>> Please
>>> note that the kernel is still loaded at 2MiB in physical memory.
>>>
>>
>> Hi, overall I think this is a very good direction, because there's indeed
>> no reason why the
>> kernel's physical and virtual addresses need to be the same and the
>> assumption that it
>> was was hidden throughout our code. I do have a couple of questions
>> indline below though.
>>
>> By the way, I personally feel that it would be more convenient to just
>> control the virtual
>> address of the kernel directly - not as a "shift" from its physical
>> address, but I guess it's
>> just a matter of taste, and some of the code below is indeed more natural
>> when written
>> with the "shift" available directly. Perhaps we could have the best of
>> both worlds -
>> define the kernel *position* as the primary setting, and then define
>> OSV_KERNEL_VM_SHIFT
>> as simply the subtraction of the kerne's virtual position and its
>> physical position.
>>
> Either way there should be only single variable so either:
> 1) the shift one is a function of the virtual address (shift = virtual -
> physical) or
> 2) virtual address is a function of the shift (virtual = shift + physical)
>

Right. I thought it would be more convenient to choose the virtual address,
not the shift,
but we can do this later (if at all).


> We might first apply other patch I sent "Make rebuilding loader.elf
> automatic and more efficient when changing kernel_base" that could make it
> handy
> to define it in single header rather than in Makefile. In either case this
> could be changed later.
>


>> Also, I see your patch makes 0x40000000 the default. I'm not sure I'm
>> following all
>> the details here - is there any *downside* to this change from the
>> previous default?
>> I can't think of any, but wondering if there's something I am missing.
>>
> I am not sure what change you are talking about. Just to clarify (per this
> patch):
> OSV_KERNEL_VM_SHIFT = 0x40000000
> elf_start = 0x200000
> phys_elf_start = elf_start + OSV_KERNEL_VM_SHIFT = 0x40200000
> So in the comments I meant that "really" kernel starts at 0x40200000 in
> VMA.
>

Yes, I know. I mean, this patch not only adds a new optional feature of
moving the
kernel's virtual-memory address, it also makes a new address the default.
The question
is whether there is any *downside* to changing this default. I know it
makes some
non-PIE code easier to run but does it make other things harder to run? I
think the
answer is no - there is no downside, but wanted to hear your confirmation.

+    # Each of the 512 entries in this table maps the very 1st 512 GiB of
>
>> +    # virtual address space 1 GiB at a time
>>> +    # The very 1st entry maps 1st GiB 1:1 by pointing to ident_pt_l2
>>> table
>>> +    # that specifies addresses of every one of 512 2MiB slots of
>>> physical memory
>>> +    .quad ident_pt_l2 + 0x67 - OSV_KERNEL_VM_SHIFT
>>> +    # The 2nd entry maps 2nd GiB to the same 1st GiB of physical memory
>>> by pointing
>>> +    # to the same ident_pt_l2 table as the 1st entry above
>>> +    # This way we effectively provide correct mapping for the kernel
>>> linked
>>> +    # to start at 1 GiB + 2 MiB (0x40200000) in virtual memory and
>>> point to
>>> +    # 2 MiB address (0x200000) where it starts in physical memory
>>> +    .quad ident_pt_l2 + 0x67 - OSV_KERNEL_VM_SHIFT
>>>
>>
>> Oh, but doesn't this mean that this only works correctly when
>> OSV_KERNEL_VM_SHIFT is *exactly* 1 GB?
>> I.e., the reason why you want the mapping of the second gigabyte to be
>> identical to the first gigabyte is
>> just because the shift is exactly 1GB?
>>
> That is correct. The general scheme (which I am planning to make part of
> the next patch at some time) should be this:
> OSV_KERNEL_VM_SHIFT = 1 GiB + N * 2MiB where 0 =< N < 500 (more less as
> last 24MB of the 2nd GB should be enough for the kernel).
> But then instead of re-using and pointing to the ident_pt_l2 table I will
> have to define extra instance of ident_pt_l2-equivalent-table where the
> first N entries will be zero.
>

So although in all other places in the code, OSV_KERNEL_VM_SHIFT can be
anything, in this specific
place you really do assume that OSV_KERNEL_VM_SHIFT= 1GB and nothing else
will work? If this
is the case, I think you should add a static assertion (if it's possible in
the assembly code?) or
equivalent, to fail the compilation if OSV_KERNEL_VM_SHIFT != 1 GB.


>
>
>>
>> If this is the case (please correct me if I misunderstood!), this code
>> need to be more sophisticated to handle
>> a general OSV_KERNEL_VM_SHIFT, or you need some sort of compile time
>> check to verify that
>> OSV_KERNEL_VM_SHIFT must be set to 1GB and nothing else.
>>
> Well this be enforced by the linker which should complain if the kernel
> code goes beyond 2GiB limit in VM (small model).
>

But the user can set OSV_KERNEL_VM_SHIFTto 0 or 0.1 GB or 1.3 GB or
whatever, and all of these almost
work (the kernel doesn't go beyond 2GB) but if I understand correctly, will
then fail because of this one piece of code?



>
>>      # Set long mode
>>> @@ -128,7 +147,7 @@ start64:
>>>      jz start64_continue
>>>      call extract_linux_boot_params
>>>      mov $0x1000, %rbx
>>> -    mov $0x200000, %rbp
>>> +    mov $OSV_KERNEL_BASE, %rbp
>>>
>>
>> Good catch.
>>
> Please also note that (I think) we unnecessarily set ebp/rbp in all these
> places and then pass all the way to arch-setup.cc but in reality the
> 0x200000 is predefined in the makefile so I am not sure what is the point
> of this code in *.S files. Either way we can remove this redundancy later.
>

I agree.


>
>>>
>>>  elfnote_val(XEN_ELFNOTE_ENTRY, xen_start)
>>>  elfnote_val(XEN_ELFNOTE_HYPERCALL_PAGE, hypercall_page)
>>> -elfnote_val(XEN_ELFNOTE_VIRT_BASE, 0)
>>> +elfnote_val(XEN_ELFNOTE_VIRT_BASE, OSV_KERNEL_VM_SHIFT)
>>>
>>
>> I have no idea what this does :-(
>>
> Here is the article I read -
> https://wiki.xen.org/wiki/X86_Paravirtualised_Memory_Management (look for
> "Start Of Day" section)
>

I guess we'll need to check on Xen (maybe an older AWS VM with  Xen) to see
if this actually works :-(
I don't have Xen installed locally (and never did...).


>>
>>>  elfnote_str(XEN_ELFNOTE_XEN_VERSION, "xen-3.0")
>>>  elfnote_str(XEN_ELFNOTE_GUEST_OS, "osv")
>>>  elfnote_str(XEN_ELFNOTE_GUEST_VERSION, "?.?")
>>> @@ -50,4 +50,5 @@ xen_start:
>>>      mov %rsp, xen_bootstrap_end
>>>      mov %rsi, %rdi
>>>      call xen_init
>>> +    mov $0x0, %rdi
>>>      jmp start64
>>> diff --git a/arch/x64/loader.ld b/arch/x64/loader.ld
>>> index caae1f68..a3ac0790 100644
>>> --- a/arch/x64/loader.ld
>>> +++ b/arch/x64/loader.ld
>>> @@ -15,15 +15,21 @@ SECTIONS
>>>          * We can't export the ELF header base as a symbol, because ld
>>>          * insists on moving stuff around if we do.
>>>          *
>>> +        * Make kernel start OSV_KERNEL_VM_SHIFT bytes higher than where
>>> it
>>> +        * starts in physical memory. Also put AT() expressions in all
>>> +        * sections to enforce their placements in physical memory lower
>>> +        * by OSV_KERNEL_VM_SHIFT bytes.
>>>
>>
>> I'm afraid I didn't understand the "AT()" part. Why do we need to add
>> OSV_KERNEL_VM_SHIFT
>> On the first line ( . = OSV_KERNEL_BASE + 0x800 + OSV_KERNEL_VM_SHIFT)
>> but then
>> subtract it back on every single address calculation that follows? I
>> didn't understand what
>> you are trying to achieve.
>>
> This part is critical to keep the segments (and sections they are made of)
> in right place in physical memory. It makes paddr stay in tact at 0x200000
> and for example firecracker relies on it. Without AT the loader.elf would
> be 1GB.
>

So, you basically need all the addresses in loader.elf to remain physical -
the small numbers below 1GB? So why did you need to set "." in the
beginning to the high number (above 1GB)? I am afraid I still don't
understand the combination of these things: setting "." to something but
then overriding it on every point.

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjvq05w08JCZZpZoxmdYvNzfPm6LZzJ%3DbQpVSBmpiP%2BEkQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to