Also, I ran the 2-CPU example with all tracepoints enabled, and here is what I got:

./scripts/run.py -p qemu_microvm \
  --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 \
  --nics 0 -m 64M -c 2 --block-device-cache writeback,aio=threads \
  -e '/radix -p 2 -r4096' -H --trace \*

# In another terminal:
./scripts/trace.py extract
./scripts/trace.py summary
Collected 38141 samples spanning 100.38 ms

Time ranges:

  CPU 0x01:  0.000000000 -  0.100380272 =  100.38 ms
  CPU 0x00:  0.083725677 -  0.100295947 =   16.57 ms

Tracepoint statistics:

  name                           count
  ----                           -----
  access_scanner                  5145
  async_worker_started               1
  clear_pte                        256
  condvar_wait                       8
  condvar_wake_all                  12
  memory_free                       64
  memory_malloc                     68
  memory_malloc_large                9
  memory_malloc_mempool             38
  memory_malloc_page                 3
  memory_page_alloc                  9
  memory_page_free                 262
  mutex_lock                      5367
  mutex_lock_wait                   28
  mutex_lock_wake                   30
  mutex_receive_lock                 8
  mutex_send_lock                    8
  mutex_unlock                    5377
  pcpu_worker_sheriff_started        1
  pool_alloc                        38
  pool_free                         52
  pool_free_same_cpu                52
  sched_idle                        13
  sched_idle_ret                    13
  sched_ipi                          7
  sched_load                       118
  sched_migrate                      1
  sched_preempt                     23
  sched_queue                       71
  sched_sched                      101
  sched_switch                      70
  sched_wait                        46
  sched_wait_ret                    43
  sched_wake                      5197
  thread_create                      4
  timer_cancel                    5209
  timer_fired                     5150
  timer_set                       5211
  vfs_pwritev                       13
  vfs_pwritev_ret                   13
  waitqueue_wake_all                 1
  waitqueue_wake_one                 1
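
One thing that jumps out at me (my interpretation, so take it with a grain of
salt): timer_set, timer_cancel, timer_fired, sched_wake and access_scanner each
fire roughly 5,200 times in ~100 ms, i.e. on the order of 50k timer/wakeup
events per second. If I re-run with backtraces enabled (--trace-backtrace on
run.py), the following should show where threads spend their time blocked, per
the trace-analysis wiki page:

./scripts/trace.py prof-wait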

./scripts/trace.py cpu-load
 0.000000000             1
 0.000000000             1
 0.000000000             1
 0.000002133             0
 0.000002546             1
 0.000002987             1
 0.000030307             2
 0.000030768             2
 0.000032967             1
 0.000040996             2
 0.000041268             2
 0.000041831             1
 0.000043297             2
 0.000043585             2
 0.000045945             1
 0.000046650             0
 0.000290645             1
 0.000291750             1
 0.000294524             2
 0.000295683             1
 0.000297979             0
 0.000304896             1
 0.000305348             1
 0.000306794             2
 0.000307488             1
 0.000309413             0
 0.000316847             1
 0.000317216             1
 0.000318711             2
 0.000319370             1
 0.000321079             0
 0.000327622             1
 0.000328009             1
 0.000531069             2
 0.000532382             1
 0.000539432             0
 0.000573914             1
 0.000574651             1
 0.000576728             0
 0.000584365             1
 0.000584997             1
 0.000587286             0
 0.000591755             1
 0.000592399             1
 0.000594461             0
 0.000598470             1
 0.000599040             1
 0.000611236             0
 0.000835164             1
 0.000836416             1
 0.000843416             2
 0.000843890             2
 0.000845046             1
 0.000856800             2
 0.000857064             2
 0.000858037             1
 0.000862489             0
 0.086250040          2  0
 0.086252051          3  0
 0.086253257          2  0
 0.086254377          3  0
 0.086296669          2  0
 0.086297441          3  0
 0.086336375          2  0
 0.086337328          3  0
 0.086337723          2  0
 0.086338657          3  0
 0.087719001          2  0
 0.087720113          3  0
 0.089164101          2  0
 0.089165836          3  0
 0.089166234          2  0
 0.089167249          3  0

Is my understanding correct that the load was not spread evenly across both 
CPUs? The time ranges above show CPU 0x00 active for only ~16.6 ms of the 
100.4 ms trace, and the cpu-load output only picks up a second column at 
around t=0.086 s, which seems to point the same way.

On Tuesday, February 25, 2020 at 1:09:08 PM UTC-5, Waldek Kozaczuk wrote:

> So I did try to build and run the radix test (please note my Ubuntu laptop 
> has only 4 cores, with hyper-threading disabled). BTW, it seems that this 
> particular benchmark does not need a read-write FS, so I used ROFS:
>
> ./scripts/manifest_from_host.sh -w ../splash2-posix/kernels/radix/radix && \
> ./scripts/build fs=rofs --append-manifest -j4
>
> Linux host, 1 CPU:
> ./radix -p 1 -r4096
>
> Integer Radix Sort
>      262144 Keys
>      1 Processors
>      Radix = 4096
>      Max key = 524288
>
>
>                  PROCESS STATISTICS
>                Total            Rank            Sort
>  Proc          Time             Time            Time
>     0           7335            2568            4765
>
>                  TIMING INFORMATION
> Start time                        : 1582652832386234
> Initialization finish time        : 1582652832444092
> Overall finish time               : 1582652832451427
> Total time with initialization    :            65193
> Total time without initialization :             7335
>
>
> Linux host, 2 CPUs:
> ./radix -p 2 -r4096
>
> Integer Radix Sort
>      262144 Keys
>      2 Processors
>      Radix = 4096
>      Max key = 524288
>
>
>                  PROCESS STATISTICS
>                Total            Rank            Sort
>  Proc          Time             Time            Time
>     0           4325            1571            2704
>
>                  TIMING INFORMATION
> Start time                        : 1582652821496771
> Initialization finish time        : 1582652821531279
> Overall finish time               : 1582652821535604
> Total time with initialization    :            38833
> Total time without initialization :             4325
>
> Linux host, 4 CPUs:
> ./radix -p 4 -r4096
>
> Integer Radix Sort
>      262144 Keys
>      4 Processors
>      Radix = 4096
>      Max key = 524288
>
>
>                  PROCESS STATISTICS
>                Total            Rank            Sort
>  Proc          Time             Time            Time
>     0           2599            1077            1470
>
>                  TIMING INFORMATION
> Start time                        : 1582653906150199
> Initialization finish time        : 1582653906171932
> Overall finish time               : 1582653906174531
> Total time with initialization    :            24332
> Total time without initialization :             2599
>
>
> OSv, 1 CPU:
> ./scripts/run.py -p qemu_microvm \
>   --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 \
>   --nics 0 --nogdb -m 64M -c 1 --block-device-cache writeback,aio=threads \
>   -e '/radix -p 1 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 3.75 ms
> Cmdline: /radix -p 1 -r4096 
>
> Integer Radix Sort
>      262144 Keys
>      1 Processors
>      Radix = 4096
>      Max key = 524288
>
>
>                  PROCESS STATISTICS
>                Total            Rank            Sort
>  Proc          Time             Time            Time
>     0           6060            2002            4049
>
>                  TIMING INFORMATION
> Start time                        : 1582652845450708
> Initialization finish time        : 1582652845500348
> Overall finish time               : 1582652845506408
> Total time with initialization    :            55700
> Total time without initialization :             6060
>
> OSv, 2 CPUs:
> ./scripts/run.py -p qemu_microvm \
>   --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 \
>   --nics 0 --nogdb -m 64M -c 2 --block-device-cache writeback,aio=threads \
>   -e '/radix -p 2 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 4.81 ms
> Cmdline: /radix -p 2 -r4096 
>
> Integer Radix Sort
>      262144 Keys
>      2 Processors
>      Radix = 4096
>      Max key = 524288
>
>
>                  PROCESS STATISTICS
>                Total            Rank            Sort
>  Proc          Time             Time            Time
>     0           5797            1702            4089
>
>                  TIMING INFORMATION
> Start time                        : 1582653305076852
> Initialization finish time        : 1582653305129792
> Overall finish time               : 1582653305135589
> Total time with initialization    :            58737
> Total time without initialization :             5797
>
> OSv, 4 CPUs:
> ./scripts/run.py -p qemu_microvm \
>   --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 \
>   --nics 0 --nogdb -m 64M -c 4 --block-device-cache writeback,aio=threads \
>   -e '/radix -p 4 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 5.26 ms
> Cmdline: /radix -p 4 -r4096 
>
> Integer Radix Sort
>      262144 Keys
>      4 Processors
>      Radix = 4096
>      Max key = 524288
>
>
>                  PROCESS STATISTICS
>                Total            Rank            Sort
>  Proc          Time             Time            Time
>     0           6498            2393            4099
>
>                  TIMING INFORMATION
> Start time                        : 1582653946823458
> Initialization finish time        : 1582653946875522
> Overall finish time               : 1582653946882020
> Total time with initialization    :            58562
> Total time without initialization :             6498
>
>
> As you can see, with a single CPU the benchmark seems to run 10-15% faster 
> on OSv. But with two and four CPUs OSv barely sees any improvement, whereas 
> on the host the app runs ~40% faster each time the CPU count doubles. So OSv 
> does not seem to scale at all (somebody mentioned it used to), and it would 
> be nice to understand why. OSv has many sophisticated tracing tools that can 
> help here: 
> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
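>
> As a quick sanity check, here are rough speedups computed from the "Total 
> time without initialization" numbers above (the 1-CPU run as baseline):
>
>   Linux host:  2 CPUs: 7335/4325 = ~1.70x    4 CPUs: 7335/2599 = ~2.82x
>   OSv:         2 CPUs: 6060/5797 = ~1.05x    4 CPUs: 6060/6498 = ~0.93x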
>
> Waldek
>
> BTW1: I tried to bump the radix to something higher, but with -r8192 the 
> app crashes on both Linux and OSv.
> BTW2: It would be interesting to compare OSv against a Linux guest (rather 
> than the Linux host).
>
> On Tuesday, February 25, 2020 at 10:05:08 AM UTC-5, twee...@comcast.net 
> wrote:
>>
>> Thanks for the response! I will get this information to you after work, 
>> with the few modifications you recommended! The application essentially 
>> just tests CPU performance using multiprocessing; nothing too fancy about 
>> it! The code I am using can be found at:
>>
>> https://www.github.com/ProfessorWest/splash2-posix
>>
>> Inside the kernels folder, in radix.c, I changed the problem size to 
>> 16,777,206.
>>
>> If you happen to examine the code, please ignore its lack of 
>> cleanliness... we just smashed everything into one file for simplicity on 
>> our end (so we can run the same code across all platforms being 
>> benchmarked).
>>
>> On Tuesday, February 25, 2020 at 8:52:48 AM UTC-5, Waldek Kozaczuk wrote:
>>>
>>> Hi,
>>>
>>> I am quite late to the party :-) Could you run OSv on a single CPU with 
>>> verbose output enabled (add -V to run.py) and send us the output, so we 
>>> can see a little more of what is happening? To disable networking you 
>>> need to add '--nics=0' (run.py supports some 50 options; run it with 
>>> '--help' to see them all). I am not familiar with that benchmark, but I 
>>> wonder if it needs a read-write FS (ZFS in OSv's case); if not, you can 
>>> build OSv images with a read-only FS (./scripts/build fs=rofs). You can 
>>> also improve boot time by running OSv on Firecracker 
>>> (https://github.com/cloudius-systems/osv/wiki/Running-OSv-on-Firecracker) 
>>> or on QEMU microvm (-p qemu_microvm; requires QEMU >= 4.1). With a 
>>> read-only FS, OSv should boot within 5 ms on either; with ZFS, within 
>>> 40 ms. One last thing: writing to the console can be quite slow on OSv, 
>>> and I wonder how much of that this benchmark does.
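>>>
>>> Putting those suggestions together, an untested sketch (adjust the 
>>> manifest path and the -e command line to your app):
>>>
>>> ./scripts/manifest_from_host.sh -w <path-to-your-radix-binary> && \
>>> ./scripts/build fs=rofs --append-manifest
>>> ./scripts/run.py -p qemu_microvm --nics=0 -V -e '/radix -p 1 -r4096'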
>>>
>>> While I definitely agree with my colleague Nadav, who essentially says do 
>>> not use OSv if raw performance matters (for a database, for example) 
>>> because Linux will beat it no matter what, OSv may have advantages in use 
>>> cases where pure performance does not matter (it still needs to be 
>>> reasonable). I think the best use cases for OSv are serverless or 
>>> stateless apps (microservices or WebAssembly) running on a single CPU, 
>>> where all state management is delegated to a remote persistent store 
>>> (most custom-built business apps are like that) and where strong 
>>> isolation matters.
>>>
>>> Relatedly, I think it might be more useful to think of OSv (and other 
>>> unikernels) as highly isolated processes. To that end, we still need to 
>>> optimize memory overhead (stacks, for example) and improve virtio-fs 
>>> support (with virtio-fs you do not need a full image to run a Linux app 
>>> on OSv, just the kernel).
>>>
>>> Also, I think the lack of good tooling in the unikernel space hurts 
>>> adoption. Compare it with Docker: build, push, pull, run. OSv has its 
>>> equivalent, capstan, but at this point we do not really have a registry 
>>> where one can pull the latest OSv kernel or push and pull images. Trying 
>>> to run an app on OSv is still quite painful for a business app developer; 
>>> it probably takes at least 30 minutes or so.
>>>
>>> Lastly, I think one of the main reasons for Docker's adoption, besides 
>>> its fantastic ease of use, was repeatability: one can create an image and 
>>> expect it to run almost the same way in production. Imagine if you could 
>>> achieve that with OSv.
>>>
>>> Waldek
>>>
>>> On Tuesday, February 25, 2020 at 7:00:16 AM UTC-5, twee...@comcast.net 
>>> wrote:
>>>>
>>>> Very well explained. Thank you for that. That does make perfect sense 
>>>> as well. 
>>>
>>>
