On Fri, Apr 20, 2018 at 4:07 AM, Waldek Kozaczuk <jwkozac...@gmail.com>
wrote:

> To make SMP work I had to hack OSv to pass 00000000fee00900 when
> enabling the APIC for the first CPU and 00000000fee00800 for all other
> CPUs. It looks like (based on the source code of hyperkit) it requires that
> the APIC registers memory area base address passed in when enabling it be
> the same as when it was read. But why is it different for each CPU? It
> looks like the QEMU/KVM, VMware and Xen hypervisors OSv runs on do not have
> this requirement. Unfortunately I am not very familiar with the APIC, so if
> anybody can enlighten me I would appreciate it.
>
> I was also wondering if anybody knows the reason behind this logic:
>
> void apic_driver::read_base()
> {
>     static constexpr u64 base_addr_mask = 0xFFFFFF000;
>     _apic_base = rdmsr(msr::IA32_APIC_BASE) & base_addr_mask;
> }
>
>
> Why are we masking with 0xFFFFFF000? Based on the logs from OSv when
> running on hyperkit, this logic effectively rewrites the original APIC base
> value as 00000000fee00000:
>
> ### apic_driver:read_base() - read base as  : 00000000fee00900
>
> ### apic_driver:read_base() - saved base as : 00000000fee00000
>
> ### xapic:enable() - enabling with base as  : 00000000fee00900
>
> So in the case of hyperkit, when we pass 00000000fee00000 instead of
> 00000000fee00900 (which is what hyperkit returned in read_base()), it
> rejects it. However, the same logic works just fine with other hypervisors.
>

First, to explain the various addresses you saw and what these "800" and
"900" mean:

The last 12 bits of this MSR are *not* part of the address (which is
supposed to be page aligned, i.e., its last 12 bits are zero), but rather
various other flags.
Of particular interest are the bit 0x800, which means "enabled", and the bit
0x100, which means "bootstrap" (BSP). The latter should be set only on the
first CPU - which explains why you saw 0x900 on the first CPU and 0x800 on
all others.
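To illustrate, here is a tiny stand-alone sketch (not OSv code - just the
architectural bit positions) showing how the value you logged,
00000000fee00900, decomposes into the page-aligned base and these two flags:

#include <cstdint>
#include <cstdio>

int main()
{
    uint64_t msr     = 0xfee00900;          // IA32_APIC_BASE as read on the first CPU
    uint64_t base    = msr & ~0xfffULL;     // 0xfee00000 - page-aligned APIC base
    bool     enabled = msr & (1ULL << 11);  // 0x800 - APIC global enable bit
    bool     bsp     = msr & (1ULL << 8);   // 0x100 - bootstrap processor (BSP) bit
    printf("base=%#llx enabled=%d bsp=%d\n",
           (unsigned long long)base, enabled, bsp);
}

It prints base=0xfee00000 enabled=1 bsp=1 for this value, and bsp=0 for the
0xfee00800 you saw on the other CPUs.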

The bug is, as you suspected, probably in

void xapic::enable()
{
    wrmsr(msr::IA32_APIC_BASE, _apic_base | APIC_BASE_GLOBAL_ENABLE);
    software_enable();
}

After _apic_base was previously stripped of all the flag bits, this code
adds back just one. But apparently hyperkit doesn't like losing the BSP
("bootstrap") flag on the first APIC, for some reason. And it shouldn't have
to accept that: it's a bug that we removed it. I think this code should be
changed to do:

wrmsr(msr::IA32_APIC_BASE,
      rdmsr(msr::IA32_APIC_BASE) | APIC_BASE_GLOBAL_ENABLE);

and not use "_apic_base" at all.

I think that x2apic::enable() should be changed similarly, to use rdmsr()
instead of _apic_base.
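
Roughly, something like the following is what I have in mind for the xapic
case - an untested sketch, so treat it as illustration rather than a
finished patch (the x2apic variant would look analogous, just also OR-ing
its extra x2APIC-enable bit into the freshly read value):

void xapic::enable()
{
    // Keep whatever flag bits the hypervisor already set (including the BSP
    // flag on the first CPU) and only add the global-enable bit on top.
    wrmsr(msr::IA32_APIC_BASE,
          rdmsr(msr::IA32_APIC_BASE) | APIC_BASE_GLOBAL_ENABLE);
    software_enable();
}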

If you could check these fixes and, if they work, send a patch, that would
be great.
Thanks.



>
> Waldek
>
> PS. Adding a link to an article I found about the APIC -
> https://wiki.osdev.org/APIC
>
> On Tuesday, April 17, 2018 at 12:08:42 PM UTC-4, Waldek Kozaczuk wrote:
>>
>> I forgot to add that, to achieve this kind of timing, I built the image
>> with ROFS:
>>
>> ./scripts/build image=native-example fs=rofs
>>
>> On Monday, April 16, 2018 at 5:25:37 PM UTC-4, Waldek Kozaczuk wrote:
>>>
>>> I have never tried installing it with brew, but it could possibly work.
>>>
>>> I cloned it directly from https://github.com/moby/hyperkit and then
>>> built it locally. That way I could add all kinds of debug statements to
>>> figure out why OSv was not working. Feel free to use my fork of hyperkit -
>>> https://github.com/wkozaczuk/hyperkit/tree/osv - which has a ton of debug
>>> statements.
>>>
>>> To build hyperkit locally you need to install Apple's developer tools,
>>> which include gcc, make, git, etc. I believe that if you open a terminal
>>> and type 'gcc' it will ask you whether you want to install the developer
>>> tools. Then git clone, make, and you have your own hyperkit under the
>>> build subdirectory.
>>>
>>> I did not have to modify hyperkit to make it work with OSv. All my
>>> modifications are on the OSv multiboot branch -
>>> https://github.com/wkozaczuk/osv/tree/multiboot. Besides tons of debug
>>> statements added all over the place, I added multiboot_header.asm and
>>> multiboot.S with some hard-coded values to pass the correct memory info
>>> to OSv. I also modified the Makefile and lzloader.ld, and disabled an
>>> assert in hpet.cc (
>>> https://github.com/wkozaczuk/osv/blob/multiboot/drivers/hpet.cc#L55) -
>>> hyperkit does not seem to support 64-bit counters. Finally I hacked
>>> arch/x64/apic.cc to properly read and then pass the APIC memory base
>>> offset when enabling the APIC - otherwise interrupts would not work. I do
>>> not understand why the original APIC logic in OSv did not work.
>>>
>>> To run it I have a script like this:
>>>
>>> IMAGE=$1
>>> DISK=$2
>>>
>>> build/hyperkit -A -m 512M -s 0:0,hostbridge \
>>>   -s 31,lpc \
>>>   -l com1,stdio \
>>>   -s 4,virtio-blk,$DISK \
>>>   -f multiboot,$IMAGE
>>>
>>> where IMAGE is lzloader.elf and DISK is build/release/usr.img converted
>>> to raw.
>>>
>>> Enjoy!
>>>
>>> Waldek
>>>
>>> PS. I also had to hard-code the cmdline in loader.cc. I think it should
>>> come from multiboot.
>>>
>>> On Sunday, April 15, 2018 at 8:36:54 PM UTC-4, Asias He wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Apr 11, 2018 at 3:29 AM, Waldek Kozaczuk <jwkoz...@gmail.com>
>>>> wrote:
>>>>
>>>>> Last week I was trying to hack OSv to run on hyperkit, and I finally
>>>>> managed to execute the native hello world example with ROFS.
>>>>>
>>>>> Here is the timing on hyperkit/OSX (the bootchart does not work on
>>>>> hyperkit because the timer is not granular enough):
>>>>>
>>>>> OSv v0.24-516-gc872202
>>>>> Hello from C code
>>>>>
>>>>> *real 0m0.075s *
>>>>> *user 0m0.012s *
>>>>> *sys 0m0.058s*
>>>>>
>>>>> The command to boot it (please note that I hacked the lzloader ELF to
>>>>> support multiboot):
>>>>>
>>>>> hyperkit -A -m 512M \
>>>>>   -s 0:0,hostbridge \
>>>>>   -s 31,lpc \
>>>>>   -l com1,stdio \
>>>>>   -s 4,virtio-blk,test.img \
>>>>>   -f multiboot,lzloader.elf
>>>>>
>>>>
>>>> Impressive! How hard is it to set up hyperkit on OSX - just a brew install?
>>>>
>>>>
>>>>
>>>>>
>>>>> Here is the timing on QEMU/KVM on Linux (same hardware - my laptop is
>>>>> set up to triple-boot Ubuntu 16, Mac OSX and Windows):
>>>>>
>>>>> OSv v0.24-510-g451dc6d
>>>>> 4 CPUs detected
>>>>> Firmware vendor: SeaBIOS
>>>>> bsd: initializing - done
>>>>> VFS: mounting ramfs at /
>>>>> VFS: mounting devfs at /dev
>>>>> net: initializing - done
>>>>> vga: Add VGA device instance
>>>>> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
>>>>> random: intel drng, rdrand registered as a source.
>>>>> random: <Software, Yarrow> initialized
>>>>> VFS: unmounting /dev
>>>>> VFS: mounting rofs at /rofs
>>>>> VFS: mounting devfs at /dev
>>>>> VFS: mounting procfs at /proc
>>>>> VFS: mounting ramfs at /tmp
>>>>> disk read (real mode): 28.31ms, (+28.31ms)
>>>>> uncompress lzloader.elf: 49.63ms, (+21.32ms)
>>>>> TLS initialization: 50.23ms, (+0.59ms)
>>>>> .init functions: 52.22ms, (+1.99ms)
>>>>> SMP launched: 53.01ms, (+0.79ms)
>>>>> VFS initialized: 55.25ms, (+2.24ms)
>>>>> Network initialized: 55.54ms, (+0.29ms)
>>>>> pvpanic done: 55.66ms, (+0.12ms)
>>>>> pci enumerated: 60.40ms, (+4.74ms)
>>>>> drivers probe: 60.40ms, (+0.00ms)
>>>>> drivers loaded: 126.37ms, (+65.97ms)
>>>>> ROFS mounted: 128.65ms, (+2.28ms)
>>>>> Total time: 128.65ms, (+0.00ms)
>>>>> Hello from C code
>>>>> VFS: unmounting /dev
>>>>> VFS: unmounting /proc
>>>>> VFS: unmounting /
>>>>> ROFS: spent 1.00 ms reading from disk
>>>>> ROFS: read 21 512-byte blocks from disk
>>>>> ROFS: allocated 18 512-byte blocks of cache memory
>>>>> ROFS: hit ratio is 89.47%
>>>>> Powering off.
>>>>>
>>>>> *real 0m1.049s*
>>>>> *user 0m0.173s*
>>>>> *sys 0m0.253s*
>>>>>
>>>>> booted like so:
>>>>>
>>>>> qemu-system-x86_64 -m 2G -smp 4 \
>>>>>  -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
>>>>>  -drive file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native \
>>>>>  -enable-kvm -cpu host,+x2apic \
>>>>>  -chardev stdio,mux=on,id=stdio,signal=off \
>>>>>  -mon chardev=stdio,mode=readline \
>>>>>  -device isa-serial,chardev=stdio
>>>>>
>>>>>
>>>>> In both cases I am not using networking - only the block device. BTW I
>>>>> have not tested networking or SMP on hyperkit with OSv.
>>>>>
>>>>> So as you can see, *OSv is 10 (ten) times faster* on the same
>>>>> hardware. I am not sure if my results are representative, but if they
>>>>> are, it would mean that QEMU is probably the culprit. Please see my
>>>>> questions/considerations toward the end of the email.
>>>>>
>>>>> Anyway, let me give you some background. What is hyperkit? Hyperkit (
>>>>> https://github.com/moby/hyperkit) is Docker's fork of xhyve (
>>>>> https://github.com/mist64/xhyve), which itself is a port of bhyve (
>>>>> https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html) -
>>>>> the hypervisor on FreeBSD. Bhyve's architecture is similar to that of
>>>>> KVM/QEMU, but bhyve's QEMU-equivalent component is much lighter and
>>>>> simpler:
>>>>>
>>>>> "The bhyve BSD-licensed hypervisor became part of the base system
>>>>> with FreeBSD 10.0-RELEASE. This hypervisor supports a number of guests,
>>>>> including FreeBSD, OpenBSD, and many Linux® distributions. By default,
>>>>>  bhyve provides access to serial console and does not emulate a
>>>>> graphical console. Virtualization offload features of newer CPUs are
>>>>> used to avoid the legacy methods of translating instructions and manually
>>>>> managing memory mappings.
>>>>>
>>>>> The bhyve design requires a processor that supports Intel® Extended
>>>>> Page Tables (EPT) or AMD® Rapid Virtualization Indexing (RVI) or
>>>>> Nested Page Tables (NPT). Hosting Linux® guests or FreeBSD guests
>>>>> with more than one vCPU requires VMX unrestricted mode support (UG).
>>>>> Most newer processors, specifically the Intel® Core™ i3/i5/i7 and
>>>>> Intel® Xeon™ E3/E5/E7, support these features. UG support was
>>>>> introduced with Intel's Westmere micro-architecture. For a complete list 
>>>>> of
>>>>> Intel® processors that support EPT, refer to
>>>>> http://ark.intel.com/search/advanced?s=t&ExtendedPageTables=true. RVI is
>>>>> found on the third generation and later of the AMD Opteron™
>>>>> (Barcelona) processors"
>>>>>
>>>>> Hyperkit/xhyve is a port of bhyve, but it targets Apple OSX as the host
>>>>> system and, instead of the FreeBSD vmm kernel module, uses the Apple
>>>>> Hypervisor framework (https://developer.apple.com/documentation/hypervisor).
>>>>> Docker, I think, forked xhyve to create hyperkit in order to provide a
>>>>> lighter alternative for running Docker containers on Linux on a Mac. So
>>>>> in essence hyperkit is a component of Docker for Mac, as opposed to
>>>>> Docker Machine/Toolbox (which is based on VirtualBox). Please see
>>>>> https://docs.docker.com/docker-for-mac/docker-toolbox/ for details.
>>>>>
>>>>> How does it apply to OSv? It only applies if you want to run OSv on a
>>>>> Mac. Right now the only choices are QEMU (dog slow because there is no
>>>>> KVM) or VirtualBox (pretty fast once OSv is up, but it takes a long
>>>>> time to boot and has other configuration quirks). Based on my
>>>>> experiments, hyperkit becomes a very compelling new alternative.
>>>>>
>>>>> Relatedly, you may be aware of uKVM (https://github.com/Solo5/solo5) -
>>>>> a very light hypervisor for running clean-slate unikernels like
>>>>> MirageOS or IncludeOS. There is an OSv issue -
>>>>> https://github.com/cloudius-systems/osv/issues/886 - and a
>>>>> corresponding one on uKVM - https://github.com/Solo5/solo5/issues/249 -
>>>>> to track what it would take to boot OSv on it. It turns out (read the
>>>>> issues for details) that in its current form uKVM is too minimalistic
>>>>> for OSv; for example, there are no interrupts. The experiment with
>>>>> hyperkit made me think that it would be nice if there were an
>>>>> alternative to QEMU on Linux - simpler and lighter than QEMU but not as
>>>>> minimalistic as uKVM - something equivalent to bhyve for Linux.
>>>>>
>>>>> Waldek
>>>>>
>>>>> PS. I have created an issue -
>>>>> https://github.com/cloudius-systems/osv/issues/948 - to track what it
>>>>> would take to make OSv run on hyperkit.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Asias
>>>>
