Hi Mikhail,

Okay thanks, that's helpful.
You mentioned that you might try restarting zuul periodically to see if that helps. Perhaps instead you could do a reload (or HUP) first to see if that clears the cache and alleviates the issue for you?

Cheers,
Josh

On Tue, Mar 8, 2016 at 10:53 AM, Mikhail Medvedev <mihail...@gmail.com> wrote:
> Hi Josh,
>
> On Mon, Mar 7, 2016 at 5:25 PM, Joshua Hesketh <joshua.hesk...@gmail.com> wrote:
> > Hi Mikhail,
> >
> > Thank you for the extra details. I'll continue to look into this.
> >
> > With the daily bumps when you do the log rotation, I assume you aren't
> > reloading zuul at that point and the freed memory is likely due to another
> > process?
>
> I was puzzled by the bumps, and checked the syslog. They are definitely due to
> "run-parts --report /etc/cron.daily" being triggered at 06:25, and not zuul
> reloads. The memory bumps could be due to any of the cron jobs; logrotate
> seemed likely. For the record:
>
> root@zuul:~# ls /etc/cron.daily
> apache2 apport apt aptitude bsdmainutils dpkg exim4-base
> logrotate man-db mlocate ntp passwd update-notifier-common
> upstart
>
> I have also confirmed there were no changes to the zuul layout for the
> interval that the graph shows.
>
> >
> > Cheers,
> > Josh
> >
> > On Tue, Mar 8, 2016 at 10:17 AM, Mikhail Medvedev <mihail...@gmail.com> wrote:
> >>
> >> On Wed, Feb 10, 2016 at 10:57 AM, James E. Blair <cor...@inaugust.com> wrote:
> >> > Michael Still <mi...@stillhq.com> writes:
> >> >
> >> >> On Tue, Feb 9, 2016 at 4:59 AM, Joshua Hesketh <joshua.hesk...@gmail.com> wrote:
> >> >>
> >> >>> On Thu, Feb 4, 2016 at 2:44 AM, James E. Blair <cor...@inaugust.com> wrote:
> >> >>>>
> >> >>>> On the subject of clearing the cache more often, I think we may not want
> >> >>>> to wipe out the cache more often than we do now -- in fact, I think we
> >> >>>> may want to look into ways to keep from doing even that, because
> >> >>>> whenever we reload now, Zuul slows down considerably as it has to query
> >> >>>> Gerrit again for all of the data previously in its cache.
> >> >>>>
> >> >>>
> >> >>> I can see a lot of 3rd parties or simpler CIs not needing to reload zuul
> >> >>> very often, so this cache would never get cleared. Perhaps cached objects
> >> >>> should have an expiry time (of a day or so) and be cleaned up
> >> >>> periodically? Additionally, if clearing the cache on a reload is causing
> >> >>> pain, maybe we should move the cache into the scheduler and keep it
> >> >>> between reloads?
> >> >>>
> >> >>
> >> >> Do you guys use oslo at all? I ask because the oslo memcache stuff does
> >> >> exactly this, so it should be trivial to implement if you don't mind
> >> >> depending on oslo.
> >> >
> >> > One of the main things we use the cache for is to ensure that every
> >> > change is represented by a single Change object in Zuul's memory. The
> >> > graph of enqueued Items link to their respective Changes which may link
> >> > to each other due to dependencies. When something changes in Gerrit, we
> >> > want that reflected immediately and consistently in all of the objects
> >> > in that graph. Using the cache means that every time we add a new
> >> > Change object to that graph, we use the same object for a given change.
> >> >
> >> > This is why we can't use time-based expiry -- we must not drop objects
> >> > from the cache if they are still in the graph. Otherwise we will create
> >> > new duplicative objects and the ones still in the graph will not be
> >> > updated.
> >> >
> >> > Perhaps we should change these objects to something more ephemeral that
> >> > can proxy for some other mechanism that can operate more like a
> >> > traditional cache (with time-based expiry). But I think changes to this
> >> > system should happen in Zuulv3 -- it works well enough for Zuulv2 for
> >> > now.
> >> >
> >> > -Jim
> >> >
> >>
> >> We are one of the third-party CIs, running "Zuul version: 2.1.1.dev123",
> >> which is one commit after [1]. That one extra commit is not in tree: I am
> >> applying [2] on top.
> >>
> >> The VM has 8GB of RAM. The zuul-server memory footprint grows steadily over
> >> the course of a week. Normally it takes about 3-4 days to exceed 3GB.
> >> About a week ago I saw zuul-server reach 95% of RAM, at which point the
> >> kernel started killing other processes. The memory graph [3] reflects
> >> zuul-server consumption. The daily bumps on the graph are the daily cron
> >> jobs doing log rotation etc., possibly flushing caches.
> >>
> >> I cannot say for certain that it is still the leak. It could simply be
> >> that zuul-server requires more RAM now.
> >>
> >> [1] https://review.openstack.org/#q,I81ee47524cda71a500c55a95a2280f491b1b63d9,n,z
> >> [2] https://review.openstack.org/#q,If3a418fa2d4993a149d454e02a9b26529e4b6825,n,z
> >> [3] http://imgur.com/SzqSA1H
> >>
> >> Mikhail Medvedev (mmedvede)
> >>
> >> _______________________________________________
> >> OpenStack-Infra mailing list
> >> OpenStack-Infra@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> >
> >
>
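Jim's argument against time-based expiry can be illustrated with a minimal Python sketch. Everything in it is hypothetical. `Change`, `ChangeCache`, its `ttl` parameter and `get_change()` are illustrative stand-ins, not Zuul's actual classes or API. The clock is faked so the example runs instantly.

The cache acts as an identity map, giving one object per change number. An enqueued item holds on to the object it was handed. A day later, a Gerrit update is applied to whatever object the cache returns.

* Without expiry, the cache returns the same object, so the item sees the update.
* With a one-hour TTL, the cache has already built a duplicate, so the item's copy goes stale.

```python
import time
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality: two objects for one change stay distinct
class Change:
    # Hypothetical stand-in for a Zuul Change object.
    number: int
    status: str = "NEW"


class ChangeCache:
    """Hypothetical identity-map cache with optional time-based expiry."""

    def __init__(self, ttl=None, clock=time.monotonic):
        self.ttl = ttl        # seconds; None means entries never expire
        self.clock = clock
        self._entries = {}    # change number -> (Change, time inserted)

    def get_change(self, number):
        now = self.clock()
        entry = self._entries.get(number)
        if entry is not None:
            change, inserted = entry
            if self.ttl is None or now - inserted < self.ttl:
                return change  # same object for the same change
        # Missing or expired: build a fresh object (in a real system this
        # is where Gerrit would be queried again).
        change = Change(number)
        self._entries[number] = (change, now)
        return change


def demo(ttl):
    fake_time = [0.0]
    cache = ChangeCache(ttl=ttl, clock=lambda: fake_time[0])

    # An enqueued item keeps a reference to the change object.
    enqueued = cache.get_change(12345)

    # A day passes; Gerrit then reports the change as merged and the
    # update is applied to the object the cache hands back.
    fake_time[0] += 86400
    cache.get_change(12345).status = "MERGED"

    # What the item in the graph actually sees.
    return enqueued.status
```

`demo(ttl=None)` returns `"MERGED"`, while `demo(ttl=3600)` returns `"NEW"`. In the second case the graph still points at the expired object, and the update went to its duplicate. This is the situation Jim describes. An entry can only safely be dropped once nothing in the graph references it.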