Re: qemu CI & ccache: cache size is too small
03.06.2024 14:29, Daniel P. Berrangé wrote:
> Given your original job had cache of 447 MB, and new cache is 654 MB,
> the old cache is 68% of size of the new cache. So effectively your
> 63% is high 90's cache hit rate of what was present.

Don't forget how old items are evicted from the cache. If we have N
files to compile but the cache can only fit N-1 files, the cache hit
ratio might be near zero, provided we compile the files in the same
order and the oldest files get evicted.

When doing the compiles I forgot to reset the cache stats before the
second run (with the larger cache); the hit ratio should have been
about 100% there.

So the cache needs to be large enough to hold the WHOLE compilation,
plus a fair bit more so that it won't evict things which can be reused
in favor of changed files.

> This would suggest a cache size of 700 MB is more appropriate, unless
> some other jobs have even higher usage needs.

Yes, that seems right. I'd keep it at 800MB if possible.

/mjt
--
GPG Key transition (from rsa2048 to rsa4096) since 2024-04-24.
New key: rsa4096/61AD3D98ECDF2C8E  9D8B E14E 3F2A 9DD7 9199  28F1 61AD 3D98 ECDF 2C8E
Old key: rsa2048/457CE0A0804465C5  6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
Transition statement: http://www.corpit.ru/mjt/gpg-transition-2024.txt
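The eviction argument in this message can be illustrated with a toy
simulation. This is only a minimal sketch: it assumes a strict
least-recently-used policy that counts entries rather than bytes, which is
a simplification of what ccache really does, and the function name
lru_hit_rate is made up for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(n_files, capacity, rounds=2):
    """Hit rate of the last of `rounds` identical builds.

    Each build compiles files 0..n_files-1 in the same order against an
    LRU cache that holds at most `capacity` entries (an assumption; the
    real cache is limited by size, not by entry count).
    """
    cache = OrderedDict()
    hits = 0
    for _ in range(rounds):
        hits = 0  # only the final build's hits are reported
        for f in range(n_files):
            if f in cache:
                hits += 1
                cache.move_to_end(f)  # mark as most recently used
            else:
                cache[f] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the oldest entry
    return hits / n_files


if __name__ == "__main__":
    for capacity in (99, 100):
        rate = lru_hit_rate(100, capacity)
        print(f"capacity={capacity}: hit rate {rate:.0%}")
```

With 100 files and room for 99, every lookup in the second build misses,
because each miss evicts exactly the entry that is needed next. With room
for all 100, the second build hits every time.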
Re: qemu CI & ccache: cache size is too small
On Mon, May 27, 2024 at 02:38:08PM +0300, Michael Tokarev wrote:
> 27.05.2024 14:19, Thomas Huth wrote:
> > On 27/05/2024 12.49, Michael Tokarev wrote:
> > > Hi!
> > >
> > > Noticed today that a rebuild of basically the same tree (a few
> > > commits apart) in CI results in just 11% hit rate of ccache:
> > >
> > > https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054
> >
> > For me, the results look better:
> >
> > https://gitlab.com/thuth/qemu/-/jobs/6918599017#L4954
>
> Yeah, it's a bit better, but still not good enough.
> I dunno how much changes the source had between the two runs.
> It still had 11 cleanups, and the cache size is at the same level.
> (It is an older ccache, too).
>
> > > while it should be near 100%. What's interesting in there is:
> > >
> > > 1) cache size is close to max cache size,
> > > and more important,
> > > 2) cleanups performed 78
> > >
> > > so it has to remove old entries before it finished the build.
> >
> > Did you maybe switch between master and stable branches before that run?
> > ... I guess that could have invalidated most of the cached files since
> > we switched from CentOS 8 to 9 recently...?
>
> Nope, nothing else ran between the two and it was just a few
> source-level commits (stable-8.2 pick ups), without changing
> gitlab/containers/etc configuration.
>
> I increased cache size to 900M and did another test run, here are
> the results: https://gitlab.com/mjt0k/qemu/-/jobs/6947894974#L5054
>
> cache directory                     /builds/mjt0k/qemu/ccache
> primary config                      /builds/mjt0k/qemu/ccache/ccache.conf
> secondary config      (readonly)    /etc/ccache.conf
> stats updated                       Mon May 27 11:17:44 2024
> stats zeroed                        Mon May 27 11:10:22 2024
> cache hit (direct)                  1862
> cache hit (preprocessed)             274
> cache miss                          1219
> cache hit rate                     63.67 %
> called for link                      285
> called for preprocessing              71
> compiler produced empty output         5
> preprocessor error                     2
> no input file                          6
> cleanups performed                     0
> files in cache                      9948
> cache size                         654.6 MB
> max cache size                     900.0 MB
>
> This is having in mind that the previous run was with CCACHE_SIZE=500M
> and had multiple cleanups, so 63% is actually more than I'd expect already.

Given your original job had a cache of 447 MB, and the new cache is
654 MB, the old cache is 68% of the size of the new cache. So
effectively your 63% is a high 90's cache hit rate of what was present.

This would suggest a cache size of 700 MB is more appropriate, unless
some other jobs have even higher usage needs.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
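The arithmetic in this reply can be checked with a few lines of Python.
This is a rough sketch using only the numbers quoted in the thread. It
assumes that every entry in the old 447 MB cache was still valid and that
the chance of a hit scales with the fraction of the full working set the
old cache held; the function names are made up for illustration.

```python
def hit_rate(direct, preprocessed, misses):
    """Overall hit rate from the counters in the old-style ccache stats."""
    hits = direct + preprocessed
    return hits / (hits + misses)


def effective_hit_rate(observed_rate, old_cache_mb, new_cache_mb):
    """Hit rate relative to what the smaller, older cache could hold.

    Assumption: the old cache covered old_cache_mb / new_cache_mb of the
    full working set, so at best that fraction of lookups could hit.
    """
    surviving_fraction = old_cache_mb / new_cache_mb
    return observed_rate / surviving_fraction


if __name__ == "__main__":
    observed = hit_rate(direct=1862, preprocessed=274, misses=1219)
    effective = effective_hit_rate(observed, old_cache_mb=447, new_cache_mb=654)
    print(f"observed hit rate:   {observed:.2%}")
    print(f"old/new cache size:  {447 / 654:.0%}")
    print(f"effective hit rate:  {effective:.0%}")
```

Under these assumptions the observed rate reproduces the 63.67 % from the
stats output, the size ratio comes out at 68%, and the effective rate is
about 93%.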
Re: qemu CI & ccache: cache size is too small
On Mon, May 27, 2024 at 01:49:41PM +0300, Michael Tokarev wrote:
> Hi!
>
> Noticed today that a rebuild of basically the same tree (a few commits apart)
> in CI results in just 11% hit rate of ccache:
>
> https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054
>
> while it should be near 100%. What's interesting in there is:
>
> 1) cache size is close to max cache size,
> and more important,
> 2) cleanups performed 78
>
> so it has to remove old entries before it finished the build.
>
> So effectively, our ccache usage is an extra burden, not help.

I think this ends up being different per job. If I try the
'build-system-fedora' job, for example, I get a 99% cache hit rate,
and 0.2 GB usage of cache storage:

https://gitlab.com/berrange/qemu/-/jobs/6876054586

  $ ccache --show-stats
  Cacheable calls:    3018 / 3208 (94.08%)
    Hits:               49 / 3018 ( 1.62%)
      Direct:            0 /   49 ( 0.00%)
      Preprocessed:     49 /   49 (100.0%)
    Misses:           2969 / 3018 (98.38%)
  Uncacheable calls:   190 / 3208 ( 5.92%)
  Local storage:
    Cache size (GB):   0.2 /  0.5 (30.55%)
    Hits:               49 / 3018 ( 1.62%)
    Misses:           2969 / 3018 (98.38%)

If I compare the jobs, the big differences are the target lists:

  CentOS: '--target-list=ppc64-softmmu or1k-softmmu s390x-softmmu
           x86_64-softmmu rx-softmmu sh4-softmmu'
  Fedora: '--target-list=microblaze-softmmu mips-softmmu xtensa-softmmu
           m68k-softmmu riscv32-softmmu ppc-softmmu sparc64-softmmu'

And then a few minor things:

  CentOS: '--disable-nettle' '--enable-gcrypt' '--enable-vfio-user-server'
          '--enable-modules' '--enable-trace-backends=dtrace'
  Fedora: '--disable-gcrypt' '--enable-nettle'

The crypto won't make a difference to caching. Modules ought not to
make a difference either, as that's just moving some .o files from
the exe to a .so, not adding many more exes. The trace backends will
add quite a few .o files, but I'm not sure that will impact the cache.

IOW, I bet the target list makes the big difference in the amount of
data that needs to be cached, which would explain the different cache
usage.

I wonder what the picture looks like for cache hits / cache disk
usage across all the other jobs. Is CentOS an outlier or is Fedora an
outlier?

We do want the cache hit rate to be in the 90+% range if possible, as
it has a big impact on build time.

> It should be increased at least, I think. But it's actually difficult
> to say really, - is the cache shared between all builds or is it unique
> for each build config? Because if it is the former, it shouldn't even
> work since different ccache versions use different format of the files
> in cache.

It is unique per job, per buildtest-template.yml:

  cache:
    paths:
      - ccache
    key: "$CI_JOB_NAME"
    when: always

> What's unique in my pipeline run - I ran just a single build job
> in two pipelines, nothing more.

In my test I ran a job, then re-ran it in the same pipeline.

With regards,
Daniel
Re: qemu CI & ccache: cache size is too small
27.05.2024 14:19, Thomas Huth wrote:
> On 27/05/2024 12.49, Michael Tokarev wrote:
> > Hi!
> >
> > Noticed today that a rebuild of basically the same tree (a few
> > commits apart) in CI results in just 11% hit rate of ccache:
> >
> > https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054
>
> For me, the results look better:
>
> https://gitlab.com/thuth/qemu/-/jobs/6918599017#L4954

Yeah, it's a bit better, but still not good enough.
I dunno how much changes the source had between the two runs.
It still had 11 cleanups, and the cache size is at the same level.
(It is an older ccache, too).

> > while it should be near 100%. What's interesting in there is:
> >
> > 1) cache size is close to max cache size,
> > and more important,
> > 2) cleanups performed 78
> >
> > so it has to remove old entries before it finished the build.
>
> Did you maybe switch between master and stable branches before that run?
> ... I guess that could have invalidated most of the cached files since
> we switched from CentOS 8 to 9 recently...?

Nope, nothing else ran between the two and it was just a few
source-level commits (stable-8.2 pick ups), without changing
gitlab/containers/etc configuration.

I increased cache size to 900M and did another test run, here are
the results: https://gitlab.com/mjt0k/qemu/-/jobs/6947894974#L5054

cache directory                     /builds/mjt0k/qemu/ccache
primary config                      /builds/mjt0k/qemu/ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
stats updated                       Mon May 27 11:17:44 2024
stats zeroed                        Mon May 27 11:10:22 2024
cache hit (direct)                  1862
cache hit (preprocessed)             274
cache miss                          1219
cache hit rate                     63.67 %
called for link                      285
called for preprocessing              71
compiler produced empty output         5
preprocessor error                     2
no input file                          6
cleanups performed                     0
files in cache                      9948
cache size                         654.6 MB
max cache size                     900.0 MB

This is having in mind that the previous run was with CCACHE_SIZE=500M
and had multiple cleanups, so 63% is actually more than I'd expect already.

Thanks,

/mjt
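The two signals discussed in this message, the cleanup count and how close
the cache is to its limit, can be pulled out of the old-style stats text
with a short script. This is a sketch only: it assumes the two-column
layout shown in this thread, the sample is an excerpt of those numbers, and
the helper names are made up for illustration. Newer ccache versions print
a different layout, so this would not apply to them as written.

```python
import re

# Excerpt of the old-style stats output quoted in this thread.
SAMPLE = """\
cache hit (direct)                  1862
cache hit (preprocessed)             274
cache miss                          1219
cleanups performed                     0
cache size                         654.6 MB
max cache size                     900.0 MB
"""


def parse_stats(text):
    """Split each 'name   value' line on the first run of 2+ spaces."""
    stats = {}
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            stats[parts[0]] = parts[1]
    return stats


def size_mb(value):
    """Convert strings like '654.6 MB' to MB (handles MB and GB only)."""
    number, unit = value.split()
    return float(number) * {"MB": 1.0, "GB": 1000.0}[unit]


def cache_pressure(stats):
    """Return (cleanups performed, fraction of max cache size in use)."""
    cleanups = int(stats["cleanups performed"])
    fill = size_mb(stats["cache size"]) / size_mb(stats["max cache size"])
    return cleanups, fill


cleanups, fill = cache_pressure(parse_stats(SAMPLE))
print(f"cleanups={cleanups} fill={fill:.0%}")
```

A non-zero cleanup count together with a fill level near 100% would be the
pattern seen in the 500M run; the 900M run shows neither.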
Re: qemu CI & ccache: cache size is too small
On 27/05/2024 12.49, Michael Tokarev wrote:
> Hi!
>
> Noticed today that a rebuild of basically the same tree (a few
> commits apart) in CI results in just 11% hit rate of ccache:
>
> https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054

For me, the results look better:

https://gitlab.com/thuth/qemu/-/jobs/6918599017#L4954

> while it should be near 100%. What's interesting in there is:
>
> 1) cache size is close to max cache size,
> and more important,
> 2) cleanups performed 78
>
> so it has to remove old entries before it finished the build.

Did you maybe switch between master and stable branches before that run?
... I guess that could have invalidated most of the cached files since
we switched from CentOS 8 to 9 recently...?

 Thomas
qemu CI & ccache: cache size is too small
Hi!

Noticed today that a rebuild of basically the same tree (a few commits
apart) in CI results in just 11% hit rate of ccache:

https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054

while it should be near 100%. What's interesting in there is:

1) cache size is close to max cache size,
and more important,
2) cleanups performed 78

so it has to remove old entries before it finished the build.

So effectively, our ccache usage is an extra burden, not help.

It should be increased at least, I think. But it's actually difficult
to say really: is the cache shared between all builds or is it unique
for each build config? Because if it is the former, it shouldn't even
work, since different ccache versions use different formats for the
files in the cache.

What's unique in my pipeline run: I ran just a single build job
in two pipelines, nothing more.

Thanks,

/mjt