Re: [gentoo-portage-dev] Add caching to a few commonly used functions

Zac Medico Sat, 27 Jun 2020 20:43:39 -0700

On 6/27/20 8:12 PM, Michał Górny wrote:
> On June 28, 2020 3:00:00 AM UTC, Zac Medico <zmed...@gentoo.org> wrote:
>> On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
>>> Hi,
>>>
>>> I was recently interested in whether portage could be sped up, since
>>> dependency resolution can sometimes take a while on slower machines.
>>> After generating some flame graphs with cProfile and vmprof, I found 3
>>> functions which seem to be called extremely frequently with the same
>>> arguments: catpkgsplit, use_reduce, and match_from_list.  In the first
>>> two cases, it was simple to cache the results in dicts, while
>>> match_from_list was a bit trickier, since it seems to be a requirement
>>> that it return actual entries from the input "candidate_list".  I also
>>> ran into some test failures if I did the caching after the
>>> mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
>>> function, so the caching is only done up to just before that point.
>>>
>>> The catpkgsplit change seems to definitely be safe, and I'm pretty sure
>>> the use_reduce one is too, since anything that could possibly change the
>>> result is hashed.  I'm a bit less certain about the match_from_list one,
>>> although all tests are passing.
>>>
>>> With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
>>> speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup.  "emerge
>>> -ep @world" is just a tiny bit faster, going from 18.69 to 18.22 sec
>>> (2.5% improvement).  Since the upgrade case is far more common, this
>>> would really help in daily use, and it shaves about 30 seconds off
>>> the time you have to wait to get to the [Yes/No] prompt (from ~90s to
>>> 60s) on my old Sandy Bridge laptop when performing normal upgrades.
>>>
>>> Hopefully, at least some of these patches can be incorporated, and please
>>> let me know if any changes are necessary.
>>>
>>> Thanks,
>>> Chun-Yu
>>
>> Using global variables for caches like these causes a form of memory
>> leak for use cases involving long-running processes that need to work
>> with many different repositories (and perhaps multiple versions of
>> those repositories).
>>
>> There are at least a couple of different strategies that we can use to
>> avoid this form of memory leak:
>>
>> 1) Limit the scope of the caches so that they have some sort of garbage
>> collection life cycle. For example, it would be natural for the
>> depgraph class to have a local cache of use_reduce results, so that the
>> cache can be garbage collected along with the depgraph.
>>
>> 2) Eliminate redundant calls. For example, redundant calls to
>> catpkgsplit can be avoided by constructing more _pkg_str instances,
>> since catpkgsplit is able to return early when its argument happens to
>> be a _pkg_str instance.
> 
> I think the weak stuff from the standard library might also be helpful.
> 
> --
> Best regards, 
> Michał Górny
>
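
To make option 1 above concrete, here is a rough sketch of a cache that is
owned by an object rather than by a module. Resolver and toy_reduce are
hypothetical stand-ins (not portage classes or functions), and the cache key
is simplified; the real use_reduce takes more arguments, all of which would
need to be part of the key.

```python
class Resolver:
    """Hypothetical stand-in for a depgraph-like object that owns its cache."""

    def __init__(self, reduce_func):
        self._reduce_func = reduce_func
        # Instance-level cache: it is freed together with this object.
        self._reduce_cache = {}

    def reduce(self, depstr, enabled_flags):
        # Everything that can affect the result must be part of the key.
        key = (depstr, frozenset(enabled_flags))
        try:
            return self._reduce_cache[key]
        except KeyError:
            result = self._reduce_func(depstr, enabled_flags)
            self._reduce_cache[key] = result
            return result


calls = []


def toy_reduce(depstr, enabled_flags):
    """Toy function that records each call; NOT portage's use_reduce."""
    calls.append(depstr)
    return depstr.split()


resolver = Resolver(toy_reduce)
first = resolver.reduce("dev-lang/python sys-apps/portage", {"ssl"})
second = resolver.reduce("dev-lang/python sys-apps/portage", {"ssl"})
```

Once the last reference to the Resolver instance goes away, its cache goes
with it, so a long-running process does not accumulate entries from
repositories it no longer uses.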


Hmm, maybe weak global caches are an option?
-- 
Thanks,
Zac
