Re: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)

2022-08-17 Thread Matthias Petermann

Hello Brian,

On 18.08.22 07:10, Brian Buhrow wrote:

hello.  that's interesting.  Do the cores used for the vms also get used
for the host os?  Can you arrange things so that the host os gets dedicated
cores that the vms can't use?  If you do that, do you still see a
performance drop when you add cores to the vms?


the host OS has no other purpose than managing the VMs at this time. So 
apart from the minimal footprint of the host's processes (postfix, 
syslog, ...), all the resources are available to the Qemu processes.


In the meantime, because of the other mail in this thread, I believe my 
fundamental misunderstanding is/was that I could overcommit a physical 
2-core CPU with a multiple of virtual cores without penalty.
I'm starting to see that now; I wasn't aware that context switches 
are so expensive that they can paralyze the whole I/O system.
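
For what it's worth, the context-switch rate on the host can be watched 
while the guests are writing; a minimal sketch (the "cs" column reports 
context switches per sampling interval):

```
# on the host: sample system statistics every second;
# watch the "cs" (context switches) column while the guests write
vmstat -w 1
```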


Kind regards
Matthias





Re: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)

2022-08-17 Thread Brian Buhrow
hello.  that's interesting.  Do the cores used for the vms also get used
for the host os?  Can you arrange things so that the host os gets dedicated
cores that the vms can't use?  If you do that, do you still see a
performance drop when you add cores to the vms?

-thanks
-Brian
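
A rough sketch of how dedicating cores might be tried on NetBSD, assuming 
schedctl(8)'s -A affinity option works as its manual describes; the pid and 
CPU numbers are purely illustrative and this is untested in this thread:

```
# restrict an already running qemu process (pid 1234 is illustrative)
# to CPU 1, leaving CPU 0 to the host -- assumes schedctl(8) -A affinity
schedctl -p 1234 -A 1
```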



Re: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)

2022-08-17 Thread Matthias Petermann

Hello Brian,

On 17.08.22 20:51, Brian Buhrow wrote:

hello.  If you want to use zfs for your storage, which I strongly recommend,
lose the zvols and use flat files inside zfs itself.  I think you'll find
your storage performance goes up by orders of magnitude.  I struggled with
this on FreeBSD for over a year before I found the myriad of tickets on
google regarding the terrible performance of zvols.  It's a real shame,
because zvols are such a tidy way to manage virtual servers.  However, the
performance penalty is just too big to ignore.
-thanks
-Brian



thank you for your suggestion. I have researched the ZVOL vs. QCOW2 
discussion. Unfortunately, I could find nothing relating to NetBSD, but 
some material on Linux and KVM. What I found attributes at least a slight 
performance advantage to ZVOLs. That people ultimately decide for QCOW2 
seems to be mainly because the VM can be paused when the underlying 
storage fills up, instead of crashing as it does with a ZVOL. However, 
this situation can also be prevented with monitoring and regular snapshots.


Nevertheless, I made a practical attempt and rebuilt my test scenario 
exactly as described, but with QCOW2 files located in one and the same ZFS 
dataset. The result, however, is almost the same.
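
For context, a minimal sketch of how the QCOW2 variant might be set up; the 
dataset path and image size are illustrative, not taken from the thread:

```
# assume an existing dataset mounted at /tank/vm
qemu-img create -f qcow2 /tank/vm/vm1.qcow2 20G
# then, instead of the ZVOL, in the qemu command line:
# -drive file=/tank/vm/vm1.qcow2,if=none,id=hd0,format=qcow2
```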


If I give the Qemu processes only one core via the -smp parameter, I can 
measure a very good I/O bandwidth on the host; depending on the number 
of running VMs it even increases significantly, so the limiting 
factor here seems to be only the single-thread performance of a CPU core:


- VM1 with 1 SMP core: ~200 MByte/s
- + VM2 with 1 SMP core: ~300 MByte/s
- + VM3 with 1 SMP core: ~500 MByte/s

As with my first test, performance is dramatically worse when I give 
each VM 2 cores instead of 1:


- VM1 with 2 SMP cores: ~30-40 MByte/s
- + VM2 with 2 SMP cores: < 1 MByte/s
- + VM3 with 2 SMP cores: < 1 MByte/s

Is there any logical explanation for this drastic drop in performance?

Kind regards
Matthias





Automated report: NetBSD-current/i386 test failure

2022-08-17 Thread NetBSD Test Fixture
This is an automatically generated notice of a new failure of the
NetBSD test suite.

The newly failing test case is:

usr.bin/make/t_make:opt_query

The above test failed in each of the last 4 test runs, and passed in
at least 26 consecutive runs before that.

The following commits were made between the last successful test and
the failed test:

2022.08.17.19.56.28 rillig src/sys/net/if.c,v 1.512
2022.08.17.20.03.05 riastradh src/sys/dev/usb/uhci.c,v 1.316
2022.08.17.20.05.41 rillig src/usr.bin/make/unit-tests/opt-query.exp,v 1.3
2022.08.17.20.05.41 rillig src/usr.bin/make/unit-tests/opt-query.mk,v 1.5

Logs can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2022.08.html#2022.08.17.20.05.41


daily CVS update output

2022-08-17 Thread NetBSD source update


Updating src tree:
P src/bin/sh/histedit.c
P src/distrib/sets/lists/base/mi
P src/doc/3RDPARTY
P src/doc/CHANGES
P src/external/public-domain/tz/tzdata2netbsd
P src/external/public-domain/tz/dist/Makefile
P src/external/public-domain/tz/dist/NEWS
U src/external/public-domain/tz/dist/TZDATA_VERSION
P src/external/public-domain/tz/dist/africa
P src/external/public-domain/tz/dist/antarctica
P src/external/public-domain/tz/dist/asia
P src/external/public-domain/tz/dist/australasia
P src/external/public-domain/tz/dist/backward
P src/external/public-domain/tz/dist/backzone
P src/external/public-domain/tz/dist/calendars
P src/external/public-domain/tz/dist/etcetera
P src/external/public-domain/tz/dist/europe
P src/external/public-domain/tz/dist/leap-seconds.list
P src/external/public-domain/tz/dist/leapseconds
P src/external/public-domain/tz/dist/northamerica
P src/external/public-domain/tz/dist/southamerica
P src/external/public-domain/tz/dist/theory.html
U src/external/public-domain/tz/dist/version
P src/external/public-domain/tz/dist/ziguard.awk
P src/external/public-domain/tz/dist/zishrink.awk
P src/external/public-domain/tz/dist/zone.tab
P src/external/public-domain/tz/dist/zone1970.tab
P src/lib/libc/stdlib/strfmon.c
P src/sbin/ifconfig/af_inetany.c
P src/sys/arch/sun3/conf/GENERIC3X
P src/sys/dev/usb/uhci.c
P src/sys/net/if.c
P src/usr.bin/make/compat.c
P src/usr.bin/make/make.c
P src/usr.bin/make/unit-tests/opt-query.exp
P src/usr.bin/make/unit-tests/opt-query.mk

Updating xsrc tree:


Killing core files:




Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  38500134 Aug 18 03:03 ls-lRA.gz


Automated report: NetBSD-current/i386 test failure

2022-08-17 Thread NetBSD Test Fixture
This is an automatically generated notice of a new failure of the
NetBSD test suite.

The newly failing test case is:

lib/libc/locale/t_strfmon:strfmon

The above test failed in each of the last 4 test runs, and passed in
at least 26 consecutive runs before that.

The following commits were made between the last successful test and
the failed test:

2022.08.17.09.32.56 christos src/lib/libc/stdlib/strfmon.c,v 1.17

Logs can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2022.08.html#2022.08.17.09.32.56


Re: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)

2022-08-17 Thread Brian Buhrow
hello.  If you want to use zfs for your storage, which I strongly recommend,
lose the zvols and use flat files inside zfs itself.  I think you'll find
your storage performance goes up by orders of magnitude.  I struggled with
this on FreeBSD for over a year before I found the myriad of tickets on
google regarding the terrible performance of zvols.  It's a real shame,
because zvols are such a tidy way to manage virtual servers.  However, the
performance penalty is just too big to ignore.
-thanks
-Brian
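
A minimal sketch of the flat-file approach described above; the dataset and 
file names are illustrative, not taken from the thread:

```
# create a dataset for the VM images (name is illustrative)
zfs create tank/vm
# create a sparse 20 GB backing file without writing any data
dd if=/dev/zero of=/tank/vm/vm1.img bs=1m count=0 seek=20480
# attach it to qemu as a raw drive:
# -drive file=/tank/vm/vm1.img,if=none,id=hd0,format=raw
```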
On Aug 17,  5:43pm, Matthias Petermann wrote:
} Subject: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nv
} 
} Hello,
} 
} I'm trying to find the cause of a performance problem and don't really
} know how to proceed.
} 


Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)

2022-08-17 Thread Matthias Petermann

Hello,

I'm trying to find the cause of a performance problem and don't really 
know how to proceed.



## Test Setup

The host is an Intel NUC7CJYHN (2 physical cores, 8 GB RAM, 500 GB SSD) 
running a fresh NetBSD/amd64 9.3_STABLE. The SSD contains the ESP, an FFS 
root partition and swap, and a large ZPOOL.


The host is to serve as a virtualization host. For this, VMs are run 
with Qemu 7.0.0 (from pkgsrc 2022Q2) and nvmm. The VMs also run NetBSD 
9.3. Storage is provided by ZVOLs through virtio.
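
For reference, a ZVOL for such a VM would typically be created along these 
lines; the name and size are illustrative, not quoted from the post:

```
# create a 20 GB volume per VM; the raw device then appears
# as /dev/zvol/rdsk/tank/vol/test1
zfs create -V 20g tank/vol/test1
```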


Before explaining the issue I face, here are some rounded numbers showing 
the performance of the host OS (sampled with iostat):


1) Starting one writer

```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test1 bs=4m &
```

---> ~ 200 MByte/s

2) Adding another writer

```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test2 bs=4m &
```

---> ~ 300 MByte/s

3) Adding another writer

```
# dd if=/dev/zero of=/dev/zvol/rdsk/tank/vol/test3 bs=4m &
```

---> ~ 500 MByte/s

From my understanding, this represents the write performance I can 
expect from my hardware when writing raw data in parallel to discrete 
ZVOLs located on the same physical storage (SSD).


This picture changes completely when Qemu comes into play. I installed 
a basic NetBSD 9.3 on each of the ZVOLs (standard layout with FFSv2 + 
WAPBL) and operate the VMs with this Qemu command:


```
qemu-system-x86_64 -machine pc-q35-7.0 -smp $VM_CORES -m $VM_RAM -accel nvmm \
    -k de -boot cd \
    -machine graphics=off -display none -vga none \
    -object rng-random,filename=/dev/urandom,id=viornd0 \
    -device virtio-rng-pci,rng=viornd0 \
    -object iothread,id=t0 \
    -device virtio-blk-pci,drive=hd0,iothread=t0 \
    -device virtio-net-pci,netdev=vioif0,mac=$VM_MAC \
    -chardev socket,id=monitor,path=$MONITOR_SOCKET,server=on,wait=off \
    -monitor chardev:monitor \
    -chardev socket,id=serial0,path=$CONSOLE_SOCKET,server=on,wait=off \
    -serial chardev:serial0 \
    -pidfile /tmp/$VM_ID.pid \
    -cdrom $VM_CDROM_IMAGE \
    -drive file=$VM_HDD_VOLUME,if=none,id=hd0,format=raw \
    -netdev tap,id=vioif0,ifname=$VM_NETIF,script=no,downscript=no \
    -device virtio-balloon-pci,id=balloon0
```

The command already includes the following optimizations:

 - use virtio driver instead of emulated SCSI device
 - use a separate I/O thread for block device access
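
Another knob that might be worth experimenting with (a hedged suggestion, 
not something tested in this thread) is the drive cache mode; QEMU defaults 
to cache=writeback, while cache=none bypasses the host page cache via O_DIRECT:

```
# variant of the existing -drive line with an explicit cache mode
-drive file=$VM_HDD_VOLUME,if=none,id=hd0,format=raw,cache=none
```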


## Test Case 1

The environment is set for this test:

 - VM_CORES: 1
 - VM_RAM: 256
 - 
 - VM_HDD_VOLUME (e.g. /dev/zvol/rdsk/tank/vol/test3), each VM has its dedicated ZVOL


My test case is the following:

0) Launch iostat -c on the Host and monitor continuously
1) Launch 3 instances of the VM configuration (vm1, vm2, vm3)
2) SSH into vm1
3) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~140 MByte/s
4) SSH into vm2
5) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~180 MByte/s
6) SSH into vm3
7) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~220 MByte/s

Intermediate summary:

 - pretty good results :-)
 - with each additional writer, the bandwidth utilization rises


## Test Case 2

The environment is modified for this test:

 - VM_CORES: 2

The same test case yields completely different results:

0) Launch iostat -c on the Host and monitor continuously
1) Launch 3 instances of the VM configuration (vm1, vm2, vm3)
2) SSH into vm1
3) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~30 MByte/s
4) SSH into vm2
5) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows ~3 MByte/s
6) SSH into vm3
7) Issue dd if=/dev/zero of=/root/test.img bs=4m
   Observation: iostat on Host shows < 1 MByte/s

Intermediate summary:

 - unexpectedly bad performance: even with one writer, throughput is far
   below the values measured with only one core per VM
 - bandwidth drops dramatically with each additional writer

## Summary and Questions

 - adding more cores to Qemu seems to considerably impact disk I/O 
performance

 - Is this expected / known behavior?
 - What could I do to mitigate this, or to help find the root cause? (one 
   idea is sketched below)
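
One idea for gathering root-cause data, assuming lockstat(8)'s basic mode of 
collecting kernel lock statistics while a given command runs; untested here:

```
# on the host, while dd is running inside a guest:
# collect kernel lock statistics for 10 seconds
lockstat sleep 10
```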

By the way, apart from this hopefully solvable problem, I am surprised 
how well the combination of NetBSD, ZVOL, Qemu and NVMM works.



Kind regards
Matthias





Automated report: NetBSD-current/i386 build success

2022-08-17 Thread NetBSD Test Fixture
The NetBSD-current/i386 build is working again.

The following commits were made between the last failed build and the
successful build:

2022.08.17.14.03.05 kre src/distrib/sets/lists/base/mi,v 1.1311

Logs can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2022.08.html#2022.08.17.14.03.05


Automated report: NetBSD-current/i386 build failure

2022-08-17 Thread NetBSD Test Fixture
This is an automatically generated notice of a NetBSD-current/i386
build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
using sources from CVS date 2022.08.17.12.35.10.

An extract from the build.sh output follows:

Files in DESTDIR but missing from flist.
File is obsolete or flist is out of date ?
--
./usr/share/zoneinfo/Europe/Kyiv
=  end of 1 extra files  ===
*** Failed target: checkflist
*** Failed commands:
${SETSCMD} ${.CURDIR}/checkflist  ${MAKEFLIST_FLAGS} 
${CHECKFLIST_FLAGS} ${METALOG.unpriv}
=> cd /tmp/build/2022.08.17.12.35.10-i386/src/distrib/sets &&  
DESTDIR=/tmp/build/2022.08.17.12.35.10-i386/destdir  MACHINE=i386  
MACHINE_ARCH=i386  AWK=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbawk  
CKSUM=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbcksum  
DB=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbdb  
EGREP=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbgrep\ -E  HOST_SH=/bin/sh 
 MAKE=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbmake  
MKTEMP=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbmktemp  
MTREE=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbmtree  
PAX=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbpax  COMPRESS_PROGRAM=gzip  
GZIP=-n  XZ_OPT=-9  TAR_SUFF=tgz  
PKG_CREATE=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbpkg_create  
SED=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbsed  
TSORT=/tmp/build/2022.08.17.12.35.10-i386/tools/bin/nbtsort\ -q  /bin/sh 
/tmp/build/2022.08.17.12.35.10-i386/src/distrib/sets/checkflist  -L base  -M 
/tmp/build/2022
 .08.17.12.35.10-i386/destdir/METALOG.sanitised
*** [checkflist] Error code 1
nbmake[2]: stopped in /tmp/build/2022.08.17.12.35.10-i386/src/distrib/sets
1 error
nbmake[2]: stopped in /tmp/build/2022.08.17.12.35.10-i386/src/distrib/sets
nbmake[1]: stopped in /tmp/build/2022.08.17.12.35.10-i386/src
nbmake: stopped in /tmp/build/2022.08.17.12.35.10-i386/src
ERROR: Failed to make release

The following commits were made between the last successful build and
the failed build:

2022.08.17.12.17.43 kre src/external/public-domain/tz/dist/Makefile,v 
1.1.1.32
2022.08.17.12.17.43 kre src/external/public-domain/tz/dist/NEWS,v 1.1.1.36
2022.08.17.12.17.43 kre src/external/public-domain/tz/dist/calendars,v 
1.1.1.2
2022.08.17.12.17.44 kre src/external/public-domain/tz/dist/africa,v 1.1.1.27
2022.08.17.12.17.44 kre src/external/public-domain/tz/dist/theory.html,v 
1.1.1.14
2022.08.17.12.17.45 kre src/external/public-domain/tz/dist/antarctica,v 
1.1.1.15
2022.08.17.12.17.52 kre src/external/public-domain/tz/dist/europe,v 1.1.1.32
2022.08.17.12.17.54 kre src/external/public-domain/tz/dist/northamerica,v 
1.1.1.29
2022.08.17.12.17.54 kre src/external/public-domain/tz/dist/southamerica,v 
1.1.1.19
2022.08.17.12.17.55 kre src/external/public-domain/tz/dist/backzone,v 
1.1.1.22
2022.08.17.12.17.55 kre src/external/public-domain/tz/dist/etcetera,v 
1.1.1.6
2022.08.17.12.17.55 kre src/external/public-domain/tz/dist/zone1970.tab,v 
1.1.1.22
2022.08.17.12.17.56 kre src/external/public-domain/tz/dist/ziguard.awk,v 
1.1.1.8
2022.08.17.12.17.56 kre src/external/public-domain/tz/dist/zishrink.awk,v 
1.1.1.8
2022.08.17.12.17.56 kre src/external/public-domain/tz/dist/zone.tab,v 
1.1.1.21
2022.08.17.12.19.41 kre src/external/public-domain/tz/dist/TZDATA_VERSION,v 
1.28
2022.08.17.12.19.41 kre src/external/public-domain/tz/dist/asia,v 1.5
2022.08.17.12.19.41 kre src/external/public-domain/tz/dist/australasia,v 1.5
2022.08.17.12.19.41 kre src/external/public-domain/tz/dist/backward,v 1.4
2022.08.17.12.19.41 kre 
src/external/public-domain/tz/dist/leap-seconds.list,v 1.4
2022.08.17.12.19.41 kre src/external/public-domain/tz/dist/leapseconds,v 1.4
2022.08.17.12.19.41 kre src/external/public-domain/tz/dist/version,v 1.5
2022.08.17.12.24.42 kre src/doc/3RDPARTY,v 1.1869
2022.08.17.12.24.42 kre src/doc/CHANGES,v 1.2899
2022.08.17.12.25.47 kre src/doc/CHANGES,v 1.2900
2022.08.17.12.35.10 nat src/sbin/ifconfig/af_inetany.c,v 1.22

Logs can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2022.08.html#2022.08.17.12.35.10