Re: qemu CI & ccache: cache size is too small

2024-06-03 Thread Michael Tokarev

03.06.2024 14:29, Daniel P. Berrangé wrote:


> Given your original job had a cache of 447 MB, and the new cache is 654 MB,
> the old cache is 68% of the size of the new cache. So effectively your 63%
> is a cache hit rate of over 90% of what was present.


Don't forget how old items are evicted from the cache.  If we have
N files to compile but the cache can only fit N-1 files, the cache hit ratio
might be near zero, provided we compile files in the same order and the
oldest files get evicted.
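The N versus N-1 effect can be reproduced with a toy model. This is a minimal sketch assuming strict least-recently-used eviction of one entry at a time and an identical compile order on both builds; ccache's real cleanup differs in detail, and `simulate_lru` is a hypothetical helper, not part of ccache.

```python
from collections import OrderedDict


def simulate_lru(capacity: int, accesses: list) -> int:
    """Count hits for a strict LRU cache holding at most `capacity` entries."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits


n = 100
files = [f"file{i}.c" for i in range(n)]
build_twice = files + files  # cold build, then an identical rebuild

print(simulate_lru(n, build_twice))      # 100: every object is reused
print(simulate_lru(n - 1, build_twice))  # 0: each miss evicts the next file needed
```

Missing room for a single object is enough to turn the rebuild into all misses, which is why the cache has to hold the whole build rather than most of it.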

When doing the compiles I forgot to reset the cache stats before the second
run (with the larger cache); the hit ratio should've been about 100% there.
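When the counters were not zeroed (for example with `ccache --zero-stats`) before a rerun, the rerun's own hit rate can still be recovered by saving the counters before it and subtracting. A minimal sketch, assuming the ccache 3.x counter labels quoted in this thread; `rerun_hit_rate` is a hypothetical helper.

```python
def rerun_hit_rate(before: dict, after: dict) -> float:
    """Hit rate (%) of only the compiles between two stats snapshots.

    Keys are the ccache 3.x counter labels quoted in this thread.
    """
    def delta(key: str) -> int:
        return after[key] - before[key]

    hits = delta("cache hit (direct)") + delta("cache hit (preprocessed)")
    misses = delta("cache miss")
    total = hits + misses
    return 100.0 * hits / total if total else 0.0
```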

So we need a cache size large enough to hold the WHOLE compilation plus a
bit more, so that it won't evict entries which can be reused in favor of
changed files.


> This would suggest a cache size of 700 MB is more appropriate, unless some
> other jobs have even higher usage needs.


Yes, that seems right.  I'd keep it at 800MB if possible.
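A sketch of the sizing rule discussed here: take the cache footprint of one complete build with no cleanups (654.6 MB in the 900M run) and round it up, optionally with headroom. The 20% headroom default and the `suggested_max_size_mb` helper are assumptions for illustration only. With no headroom the rule gives 700 MB; with 20% it gives 800 MB.

```python
import math


def suggested_max_size_mb(observed_mb: float, headroom: float = 0.2, step_mb: int = 100) -> int:
    """Round a full build's cache footprint, plus headroom, up to the next step."""
    return math.ceil(observed_mb * (1 + headroom) / step_mb) * step_mb


full_build_mb = 654.6  # "cache size" after a complete build with 0 cleanups
print(suggested_max_size_mb(full_build_mb, headroom=0.0))  # 700
print(suggested_max_size_mb(full_build_mb))                # 800
```

The result would then go into ccache's `max_size` setting, for example through the `CCACHE_MAXSIZE` environment variable. Since the footprint grows with the target list, each job would need its own measurement.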

/mjt
--
GPG Key transition (from rsa2048 to rsa4096) since 2024-04-24.
New key: rsa4096/61AD3D98ECDF2C8E  9D8B E14E 3F2A 9DD7 9199  28F1 61AD 3D98 ECDF 2C8E
Old key: rsa2048/457CE0A0804465C5  6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
Transition statement: http://www.corpit.ru/mjt/gpg-transition-2024.txt




Re: qemu CI & ccache: cache size is too small

2024-06-03 Thread Daniel P. Berrangé
On Mon, May 27, 2024 at 02:38:08PM +0300, Michael Tokarev wrote:
> 27.05.2024 14:19, Thomas Huth wrote:
> > On 27/05/2024 12.49, Michael Tokarev wrote:
> > > Hi!
> > > 
> > > Noticed today that a rebuild of basically the same tree (a few commits apart)
> > > in CI results in just an 11% ccache hit rate:
> > > 
> > > https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054
> > 
> > For me, the results look better:
> > 
> >   https://gitlab.com/thuth/qemu/-/jobs/6918599017#L4954
> 
> Yeah, it's a bit better, but still not good enough.
> I dunno how much the source changed between the two runs.
> It still had 11 cleanups, and the cache size is at the same level.
> (It is an older ccache, too.)
> 
> > > while it should be near 100%.  What's interesting in there is:
> > > 
> > > 1) cache size is close to max cache size,
> > > and more important,
> > > 2) cleanups performed 78
> > > 
> > > so it had to remove old entries before the build finished.
> > 
> > Did you maybe switch between master and stable branches before that run?
> > ... I guess that could have invalidated most of the cached files since
> > we switched from CentOS 8 to 9 recently...?
> 
> Nope, nothing else ran between the two and it was just a few
> source-level commits (stable-8.2 pick-ups), without changing
> gitlab/containers/etc configuration.
> 
> I increased cache size to 900M and did another test run, here are
> the results: https://gitlab.com/mjt0k/qemu/-/jobs/6947894974#L5054
> 
> cache directory                     /builds/mjt0k/qemu/ccache
> primary config                      /builds/mjt0k/qemu/ccache/ccache.conf
> secondary config      (readonly)    /etc/ccache.conf
> stats updated                       Mon May 27 11:17:44 2024
> stats zeroed                        Mon May 27 11:10:22 2024
> cache hit (direct)                     1862
> cache hit (preprocessed)                274
> cache miss                             1219
> cache hit rate                        63.67 %
> called for link                         285
> called for preprocessing                 71
> compiler produced empty output            5
> preprocessor error                        2
> no input file                             6
> cleanups performed                        0
> files in cache                         9948
> cache size                            654.6 MB
> max cache size                        900.0 MB
> 
> This is bearing in mind that the previous run was with CCACHE_SIZE=500M
> and had multiple cleanups, so 63% is already more than I'd expect.

Given your original job had a cache of 447 MB, and the new cache is 654 MB,
the old cache is 68% of the size of the new cache. So effectively your 63%
is a cache hit rate of over 90% of what was present.

This would suggest a cache size of 700 MB is more appropriate, unless some
other jobs have even higher usage needs.
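The back-of-the-envelope estimate above, written out. It treats cache size in MB as a proxy for how many compiled objects were available to hit, which is only an approximation; the variable names are illustrative.

```python
old_cache_mb = 447.0       # cache carried over from the earlier 500M run
new_cache_mb = 654.6       # cache size after the 900M run
observed_hit_rate = 63.67  # percent, from the ccache stats of the 900M run

# Share of the final cache that already existed when the run started.
fraction_present = old_cache_mb / new_cache_mb
# Hit rate relative to what could possibly have been served from the old cache.
effective = observed_hit_rate / fraction_present

print(f"{fraction_present:.0%} of the final cache was present")  # 68% ...
print(f"effective hit rate ~{effective:.1f}%")                   # ~93.2%
```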


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




Re: qemu CI & ccache: cache size is too small

2024-06-03 Thread Daniel P. Berrangé
On Mon, May 27, 2024 at 01:49:41PM +0300, Michael Tokarev wrote:
> Hi!
> 
> Noticed today that a rebuild of basically the same tree (a few commits apart)
> in CI results in just an 11% ccache hit rate:
> 
> https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054
> 
> while it should be near 100%.  What's interesting in there is:
> 
> 1) cache size is close to max cache size,
> and more important,
> 2) cleanups performed 78
> 
> so it had to remove old entries before the build finished.
>
> So effectively, our ccache usage is an extra burden, not a help.

I think this ends up being different per job. If I try the
'build-system-fedora' job, for example, I get a 99% cache
hit rate, and 0.2 GB usage of cache storage

https://gitlab.com/berrange/qemu/-/jobs/6876054586

$ ccache --show-stats
Cacheable calls:   3018 / 3208 (94.08%)
  Hits:              49 / 3018 ( 1.62%)
    Direct:           0 /   49 ( 0.00%)
    Preprocessed:    49 /   49 (100.0%)
  Misses:          2969 / 3018 (98.38%)
Uncacheable calls:  190 / 3208 ( 5.92%)
Local storage:
  Cache size (GB):  0.2 /  0.5 (30.55%)
  Hits:              49 / 3018 ( 1.62%)
  Misses:          2969 / 3018 (98.38%)

If I compare the jobs, the big differences are the target lists:

  CentOS: '--target-list=ppc64-softmmu or1k-softmmu s390x-softmmu x86_64-softmmu rx-softmmu sh4-softmmu'

  Fedora: '--target-list=microblaze-softmmu mips-softmmu xtensa-softmmu m68k-softmmu riscv32-softmmu ppc-softmmu sparc64-softmmu'

And then a few minor things:

  CentOS: '--disable-nettle' '--enable-gcrypt' '--enable-vfio-user-server' '--enable-modules' '--enable-trace-backends=dtrace'

  Fedora: '--disable-gcrypt' '--enable-nettle'

The crypto options won't make a difference to caching. Modules ought not to
make a difference either, as that just moves some .o files from the exe to a
.so, rather than adding many more exes.

The trace backends will add quite a few .o files, but I'm not sure that
will impact the cache.

IOW, I bet the target list makes the big difference to the amount of data
that needs to be cached, which would explain the different cache usage.

I wonder what the picture looks like for cache hits / cache disk usage
across all the other jobs. Is CentOS an outlier or is Fedora an outlier?

We do want the cache hit rate to be at the 90+% mark if possible, as it
has a big impact on build time.



> It should be increased at least, I think.  But it's actually difficult
> to say really: is the cache shared between all builds or is it unique
> for each build config?  Because if it's the former, it shouldn't even
> work, since different ccache versions use different formats for the
> files in the cache.

It is unique per job, per buildtest-template.yml:

  cache:
    paths:
      - ccache
    key: "$CI_JOB_NAME"
    when: always


> What's unique in my pipeline run: I ran just a single build job
> in two pipelines, nothing more.

In my test I ran a job, then re-ran it in the same pipeline.


With regards,
Daniel




Re: qemu CI & ccache: cache size is too small

2024-05-27 Thread Michael Tokarev

27.05.2024 14:19, Thomas Huth wrote:

> On 27/05/2024 12.49, Michael Tokarev wrote:
> > Hi!
> >
> > Noticed today that a rebuild of basically the same tree (a few commits apart)
> > in CI results in just an 11% ccache hit rate:
> >
> > https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054
>
> For me, the results look better:
>
>   https://gitlab.com/thuth/qemu/-/jobs/6918599017#L4954

Yeah, it's a bit better, but still not good enough.
I dunno how much the source changed between the two runs.
It still had 11 cleanups, and the cache size is at the same level.
(It is an older ccache, too.)

> > while it should be near 100%.  What's interesting in there is:
> >
> > 1) cache size is close to max cache size,
> > and more important,
> > 2) cleanups performed 78
> >
> > so it had to remove old entries before the build finished.
>
> Did you maybe switch between master and stable branches before that run?
> ... I guess that could have invalidated most of the cached files since
> we switched from CentOS 8 to 9 recently...?


Nope, nothing else ran between the two and it was just a few
source-level commits (stable-8.2 pick-ups), without changing
gitlab/containers/etc configuration.

I increased cache size to 900M and did another test run, here are
the results: https://gitlab.com/mjt0k/qemu/-/jobs/6947894974#L5054

cache directory                     /builds/mjt0k/qemu/ccache
primary config                      /builds/mjt0k/qemu/ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
stats updated                       Mon May 27 11:17:44 2024
stats zeroed                        Mon May 27 11:10:22 2024
cache hit (direct)                     1862
cache hit (preprocessed)                274
cache miss                             1219
cache hit rate                        63.67 %
called for link                         285
called for preprocessing                 71
compiler produced empty output            5
preprocessor error                        2
no input file                             6
cleanups performed                        0
files in cache                         9948
cache size                            654.6 MB
max cache size                        900.0 MB

This is bearing in mind that the previous run was with CCACHE_SIZE=500M
and had multiple cleanups, so 63% is already more than I'd expect.
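The "cache hit rate" line in these stats can be reproduced from the three counters above it as hits over hits plus misses; the figures here are consistent with that formula. A minimal sketch, where `hit_rate` is a hypothetical helper:

```python
def hit_rate(direct_hits: int, preprocessed_hits: int, misses: int) -> float:
    """Percentage of cacheable compiles served from the cache."""
    hits = direct_hits + preprocessed_hits
    return 100.0 * hits / (hits + misses)


# Counters from the 900M test run above.
print(f"{hit_rate(1862, 274, 1219):.2f} %")  # 63.67 %
```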

Thanks,

/mjt





Re: qemu CI & ccache: cache size is too small

2024-05-27 Thread Thomas Huth

On 27/05/2024 12.49, Michael Tokarev wrote:

> Hi!
>
> Noticed today that a rebuild of basically the same tree (a few commits apart)
> in CI results in just an 11% ccache hit rate:
>
> https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054

For me, the results look better:

 https://gitlab.com/thuth/qemu/-/jobs/6918599017#L4954

> while it should be near 100%.  What's interesting in there is:
>
> 1) cache size is close to max cache size,
> and more important,
> 2) cleanups performed 78
>
> so it had to remove old entries before the build finished.

Did you maybe switch between master and stable branches before that run? ...
I guess that could have invalidated most of the cached files since we
switched from CentOS 8 to 9 recently...?


 Thomas





qemu CI & ccache: cache size is too small

2024-05-27 Thread Michael Tokarev

Hi!

Noticed today that a rebuild of basically the same tree (a few commits apart)
in CI results in just an 11% ccache hit rate:

https://gitlab.com/mjt0k/qemu/-/jobs/6947445337#L5054

while it should be near 100%.  What's interesting in there is:

1) cache size is close to max cache size,
and more important,
2) cleanups performed 78

so it had to remove old entries before the build finished.

So effectively, our ccache usage is an extra burden, not a help.
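The two warning signs named here (cleanups during the build, cache size near the maximum) can be checked mechanically from a `ccache -s` report. A minimal sketch, assuming the ccache 3.x text format quoted in this thread and decimal kB/MB/GB units; the 0.9 "nearly full" threshold and `looks_undersized` are hypothetical choices, not anything ccache defines.

```python
import re

NEAR_FULL_RATIO = 0.9  # hypothetical threshold for "close to max cache size"
_UNITS = {"kB": 1e-3, "MB": 1.0, "GB": 1e3}  # assumed decimal units, in MB

_LINE = re.compile(r"^(cleanups performed|cache size|max cache size)\s+([\d.]+)\s*(kB|MB|GB)?")


def looks_undersized(stats_text: str) -> bool:
    """Flag a ccache 3.x stats report whose cache evicted entries or is nearly full."""
    values = {}
    for line in stats_text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            label, number, unit = m.groups()
            values[label] = float(number) * _UNITS.get(unit, 1.0)
    if values.get("cleanups performed", 0) > 0:
        return True  # entries were evicted during the build
    size, limit = values.get("cache size"), values.get("max cache size")
    return bool(size and limit and size / limit >= NEAR_FULL_RATIO)


# Relevant lines from the 900M test run.
report = """\
cleanups performed                        0
cache size                            654.6 MB
max cache size                        900.0 MB
"""
print(looks_undersized(report))  # False
```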

It should be increased at least, I think.  But it's actually difficult
to say really: is the cache shared between all builds or is it unique
for each build config?  Because if it's the former, it shouldn't even
work, since different ccache versions use different formats for the
files in the cache.

What's unique in my pipeline run: I ran just a single build job
in two pipelines, nothing more.

Thanks,

/mjt