Hi,
since qemu 5.0 I have found that our Ubuntu test environments crash often. After a while I found that all affected tests run qemu with TCG, and they get into OOM conditions that kill the qemu process.
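In case anyone reproducing this wants to double-check that it really is the kernel OOM killer taking qemu down (and not a qemu crash), the kill is visible in the kernel log; something along these lines should show it (use `dmesg` instead of `journalctl -k` on systems without journald):

  $ journalctl -k | grep -iE 'out of memory|oom-kill|killed process'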
Steps to reproduce:
Run TCG on a guest image until boot settles:

  $ wget http://cloud-images.ubuntu.com/daily/server/groovy/20200714/groovy-server-cloudimg-amd64.img
  $ qemu-system-x86_64 -name guest=groovy-testguest,debug-threads=on \
      -machine pc-q35-focal,accel=tcg,usb=off,dump-guest-core=off \
      -cpu qemu64 -m 512 -overcommit mem-lock=off \
      -smp 1,sockets=1,cores=1,threads=1 \
      -hda groovy-server-cloudimg-amd64.img \
      -nographic -serial file:/tmp/serial.debug

I usually wait until I no longer see faults in pidstat (which indicates bootup is complete). At that point the worker threads also vanish, or at least their number drops significantly. Then I checked the RSS sizes:

  $ pidstat -p $(pgrep -f 'name guest=groovy-testguest') -T ALL -rt 5

To get an idea of the expected deviation I ran it a few times with the old and the new version:

     qemu 4.2              qemu 5.0
     VSZ      RSS          VSZ      RSS
     1735828  642172       2322668  1641532
     1424596  602368       2374068  1628788
     1570060  611372       2789648  1676748
     1556696  611240       2981112  1658196
     1388844  649696       2443716  1636896
     1597788  644584       2989336  1635516

That is roughly +160% in RSS.

I was wondering whether this might be due to the new toolchain or to new features rather than to TCG itself (even though all non-TCG tests showed no issue). I ran the same with -enable-kvm, which shows no difference worth reporting:

     accel=kvm, old qemu    accel=kvm, new qemu
     VSZ      RSS           VSZ      RSS
     1844232  489224        1195880  447696
     1068784  448324        1330036  484464
     1583020  448708        1380408  468588
     1247244  493980        1244148  493188
     1702912  483444        1247672  454444
     1287980  448480        1983548  501184

So it seems to come down to "4.2 TCG vs 5.0 TCG".

Therefore I spun up one 4.2 and one 5.0 qemu with TCG, both showing this ~+160% increase in memory consumption. Using smem I then checked where the consumption was, per mapping:

  # smem --reverse --mappings --abbreviate --processfilter=qemu-system-x86_64 | head -n 10

                          qemu 4.2             qemu 5.0
  Map                     AVGPSS    PSS        AVGPSS    PSS
  <anonymous>             289.5M    579.0M     811.5M    1.6G
  qemu-system-x86_64      9.1M      9.1M       9.2M      9.2M
  [heap]                  2.8M      5.6M       3.4M      6.8M
  /usr/bin/python3.8      1.8M      1.8M       1.8M      1.8M
  /libepoxy.so.0.         448.0K    448.0K     448.0K    448.0K
  /libcrypto.so.1         296.0K    296.0K     275.0K    275.0K
  /libgnutls.so.3         234.0K    234.0K     230.0K    230.0K
  /libasound.so.2         208.0K    208.0K     208.0K    208.0K
  /libssl.so.1.1          180.0K    180.0K     92.0K     184.0K

So essentially all of the increase is in anonymous memory of qemu. Since this is TCG, I ran `info jit` in the monitor.

qemu 4.2:
(qemu) info jit
Translation buffer state:
gen code size       99.781.715/134.212.563
TB count            183622
TB avg target size  18 max=1992 bytes
TB avg host size    303 bytes (expansion ratio: 16.4)
cross page TB count 797 (0%)
direct jump count   127941 (69%) (2 jumps=91451 49%)
TB hash buckets     98697/131072 (75.30% head buckets used)
TB hash occupancy   34.04% avg chain occ. Histogram: [0,10)%|▆ █ ▅▁▃▁▁|[90,100]%
TB hash avg chain   1.020 buckets. Histogram: 1|█▁▁|3

Statistics:
TB flush count      14
TB invalidate count 92226
TLB full flushes    1
TLB partial flushes 175405
TLB elided flushes  233747
[TCG profiler not compiled]

qemu 5.0:
(qemu) info jit
Translation buffer state:
gen code size       259.896.403/1.073.736.659
TB count            456365
TB avg target size  20 max=1992 bytes
TB avg host size    328 bytes (expansion ratio: 16.1)
cross page TB count 2020 (0%)
direct jump count   309815 (67%) (2 jumps=227122 49%)
TB hash buckets     216220/262144 (82.48% head buckets used)
TB hash occupancy   41.36% avg chain occ. Histogram: [0,10)%|▅ █ ▇▁▄▁▂|[90,100]%
TB hash avg chain   1.039 buckets. Histogram: 1|█▁▁|3

Statistics:
TB flush count      1
TB invalidate count 463653
TLB full flushes    0
TLB partial flushes 178464
TLB elided flushes  242382
[TCG profiler not compiled]
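For anyone who wants to repeat this without sitting in an interactive monitor session, here is a rough sketch of how a single measurement run could be scripted. It is only an illustration: the unix monitor socket, the socat call and the fixed sleep are placeholders I made up for this mail, not what our test environment actually runs.

  #!/bin/bash
  # Sketch of one measurement run: boot the image, then record RSS and
  # "info jit". I normally watch pidstat until the page fault rate drops
  # instead of sleeping a fixed amount of time.
  QEMU="$1"; shift          # path to the qemu-system-x86_64 build under test
  IMG=groovy-server-cloudimg-amd64.img
  MON=/tmp/qemu-mon.sock

  "$QEMU" -name guest=groovy-testguest,debug-threads=on \
      -machine pc-q35-focal,accel=tcg,usb=off,dump-guest-core=off \
      -cpu qemu64 -m 512 -overcommit mem-lock=off \
      -smp 1,sockets=1,cores=1,threads=1 \
      -hda "$IMG" -nographic -serial file:/tmp/serial.debug \
      -monitor unix:"$MON",server,nowait "$@" &
  QPID=$!

  sleep 300                                  # give the guest time to settle

  grep VmRSS /proc/$QPID/status              # resident set size after boot
  ( echo "info jit"; sleep 1 ) | socat - UNIX-CONNECT:"$MON"   # TB cache stats

Pointing this at the 4.2 and the 5.0 binary (and optionally appending extra options such as the -tb-size mentioned below) should give comparable numbers for both builds.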
Well, I can see the numbers increase, but this isn't my home turf anymore. The one related tunable I know of is -tb-size, so I ran both versions with -tb-size 150, and that gave me two similarly behaving qemu processes:

  RSS  qemu 4.2: 628072 635528
       qemu 5.0: 655628 634952

So things seem to be "good again" with that tunable set. It looks like the default sizing of the TB cache, or some mechanism by which it shrinks itself, has changed. On a system with ~1.5G of memory, for example (which matches our testbeds), I'd expect qemu to back down a bit rather than consume almost 100% of the memory until it gets OOM-killed.

My next step is to build qemu from source without the Ubuntu downstream delta. That should help to track this down further and will also provide some results for the soon-to-be-released 5.1. That will probably take until tomorrow; I'll report here again then.

I searched the mailing list and the web for this behavior, but either I used the wrong keywords or it hasn't been reported/discussed yet. Nor does [1] list anything that sounds related. But if this already rings a bell for someone, please let me know.

Thanks in advance!

[1]: https://wiki.qemu.org/ChangeLog/5.0#TCG

--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd