2 TB seems very little to a data hoarder like me; I happen to have ~30 TB of 
storage at home. 

10-14 TB hard drives are not really that expensive.

While I totally agree that we should better control and limit what we collect, I 
wonder if we should aim for a setup where we do not struggle with disk space, 
one where we can keep logs for ~60 days without having to think too much 
about running out of room.
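
A rough back-of-the-envelope check (my own estimate, based on the 2 TB /
21-day retention figures quoted below):

  ~1.9 TB used over a ~21-day retention window  ->  ~90 GB/day
  ~90 GB/day x 60 days                          ->  ~5.5 TB

so a single 10-14 TB drive would comfortably cover a 60-day window, even at 
today's upload rate.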

Another approach I used in the past was a cleanup script that kept removing the 
oldest builds, by age, for as long as the free disk space stayed below a given 
threshold (10%?). That effectively gives a dynamic retention period.
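
Something along these lines, just as a sketch (LOG_ROOT, the directory depth 
and the 10% threshold are placeholders, not the actual logserver layout):

  #!/bin/bash
  # Dynamic retention: delete the oldest build directories until enough
  # space is free again. LOG_ROOT and the depth are placeholders.
  LOG_ROOT=/var/www/logs
  MIN_FREE_PCT=10

  free_pct() {
      # free space (in %) on the filesystem holding the logs
      local used
      used=$(df --output=pcent "$LOG_ROOT" | tail -1 | tr -dc '0-9')
      echo $((100 - used))
  }

  # List build directories oldest-first and remove them one by one
  # while free space is still below the threshold.
  find "$LOG_ROOT" -mindepth 3 -maxdepth 3 -type d -printf '%T@ %p\n' \
      | sort -n \
      | while read -r _ dir; do
          [ "$(free_pct)" -ge "$MIN_FREE_PCT" ] && break
          rm -rf -- "$dir"
      done

Run from cron, it always removes the oldest jobs first, so the effective 
retention period stretches or shrinks with the actual upload rate.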

Thanks
Sorin

> On 13 Jun 2019, at 15:58, Wesley Hayutin <whayu...@redhat.com> wrote:
> 
> 
> 
> On Thu, Jun 13, 2019 at 8:55 AM Javier Pena <jp...@redhat.com> wrote:
> 
> 
> 
> 
> On Thu, Jun 13, 2019 at 8:22 AM Javier Pena <jp...@redhat.com> wrote:
> Hi all,
> 
> For the last few days, I have been monitoring a spike in disk space 
> utilization for logs.rdoproject.org. The current situation is:
> 
> - 94% of space used, with less than 140GB out of 2TB available.
> - The log pruning script has been reclaiming less space than we are using for 
> new logs during this week.
> - I expect the situation to improve over the weekend, but we're definitely 
> running out of space.
> 
> I have looked at a random job (https://review.opendev.org/639324, patch set 26), and found that each run 
> is consuming 1.2 GB of disk space in logs. The worst offenders I have found 
> are:
> 
> - atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15 and 
> 40 MB each
> - logs/undercloud/home/zuul/tempest/.stackviz directory on 
> tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a virtualenv 
> eating up 81 MB.
> 
> Can we sync up on how you are calculating these results, as they do not match 
> our results?
> I see each job consuming about 215M of space; we are close on stackviz being 
> 83M. Oddly, I don't see atop.bin.gz in our calculations, so I'll have to look 
> into that.
> I've checked it directly using du on the logserver. By 1.2 GB I meant the 
> aggregate of the 8 jobs running for a single patchset. PS26 is currently 
> using 2.5 GB and had one recheck.
> 
> About the atop.bin.gz file:
> 
> # find . -name atop.bin.gz -exec du -sh {} \;
> 16M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/042cb8f/logs/undercloud/var/log/atop.bin.gz
> 16M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/e4171d7/logs/undercloud/var/log/atop.bin.gz
> 28M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/ffd4de9/logs/undercloud/var/log/atop.bin.gz
> 26M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/34d44bf/logs/undercloud/var/log/atop.bin.gz
> 25M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/b89761d/logs/undercloud/var/log/atop.bin.gz
> 24M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/undercloud/var/log/atop.bin.gz
> 29M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/a10447d/logs/undercloud/var/log/atop.bin.gz
> 44M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/99a5f9f/logs/undercloud/var/log/atop.bin.gz
> 15M    ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/subnode-2/var/log/atop.bin.gz
> 33M    ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/undercloud/var/log/atop.bin.gz
> 16M    ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/subnode-2/var/log/atop.bin.gz
> 33M    ./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/undercloud/var/log/atop.bin.gz
> 40M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/109d5ae/logs/undercloud/var/log/atop.bin.gz
> 45M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/c2ebeae/logs/undercloud/var/log/atop.bin.gz
> 39M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/7fe5bbb/logs/undercloud/var/log/atop.bin.gz
> 16M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/5e6cb0f/logs/undercloud/var/log/atop.bin.gz
> 40M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/c6bf5ea/logs/undercloud/var/log/atop.bin.gz
> 40M    ./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/6ec5ac6/logs/undercloud/var/log/atop.bin.gz
> 
> Can I safely delete all .stackviz directories? I guess that would give us 
> some breathing room.
> 
> Yup, go for it
>  
> 
> Regards,
> Javier
> 
> Each job reports the size of the logs e.g. [1]
> http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/quickstart_files/log-size.txt
> 
> 
> As a temporary measure, I am reducing log retention from 21 days to 14, but 
> we still need to reduce the rate at which we are uploading logs. Would it be 
> possible to check the oooq-generated logs and see where we can reduce? These 
> jobs are by far the ones consuming the most space.
> 
> Thanks,
> Javier

_______________________________________________
dev mailing list
dev@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/dev

To unsubscribe: dev-unsubscr...@lists.rdoproject.org
