On Mon, Feb 19, 2018 at 5:58 PM, Nithya Balachandran <nbala...@redhat.com> wrote:
>
> On 19 February 2018 at 13:12, Atin Mukherjee <amukh...@redhat.com> wrote:
>>
>> On Mon, Feb 19, 2018 at 8:53 AM, Nigel Babu <nig...@redhat.com> wrote:
>>>
>>> Hello,
>>>
>>> As you all most likely know, we store a tarball of the binaries and
>>> the core, if one is produced, during regression. Occasionally we
>>> introduce a bug in Gluster, and this tarball can take up a lot of
>>> space. This happened recently with the brick multiplex tests: the
>>> build-install tarball took up 25G, causing the machine to run out of
>>> space and fail continuously.
>>
>> AFAIK, we don't have a .t file in the upstream regression suites that
>> creates hundreds of volumes. At that scale, with brick multiplexing
>> enabled, I can understand the core being heavily loaded and consuming
>> this much space. FWIW, can we first figure out which test caused the
>> crash, and check whether running gcore after certain steps in that
>> test leaves us with a core file of similar size? IOW, have we actually
>> seen core files this large before? If not, whatever changed to make
>> them appear is something to be investigated.
>
> We also need to check whether it is only the core file that is causing
> the increase in size, or whether something else is taking up a lot of
> space.
>
>>> I've made some changes this morning. Right after we create the
>>> tarball, we'll delete all files in /archive that are greater than 1G.
>>> Please be aware that this means all large files, including the newly
>>> created tarball, will be deleted. You will have to work with the
>>> traceback on the Jenkins job.
>>
>> We'd really need to first investigate the average size of the core
>> file we get when a system is running with brick multiplexing and
>> ongoing I/O. Without that, immediately deleting core files > 1G will
>> cause trouble for developers debugging genuine crashes, since the
>> traceback alone may not be sufficient.
I'd like to echo what Nithya writes: instead of treating this incident as an outlier, we should do further analysis. If this happened on a production system, there would be blood.
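
For what it's worth, a minimal sketch of the cleanup Nigel describes might look like the following. The /archive path and the 1G threshold come from this thread; the script itself is my assumption of the policy, not necessarily what the Jenkins job actually runs:

    # Hypothetical sketch, not the actual Jenkins job: prune files in
    # /archive larger than 1G right after the tarball is created.
    from pathlib import Path

    ARCHIVE_DIR = Path("/archive")   # path taken from this thread
    SIZE_LIMIT = 1 * 1024**3         # 1G threshold, also from this thread

    def prune_large_files(root=ARCHIVE_DIR, limit=SIZE_LIMIT):
        """Delete every regular file under `root` larger than `limit` bytes."""
        for path in root.rglob("*"):
            if path.is_file() and path.stat().st_size > limit:
                print(f"removing {path} ({path.stat().st_size} bytes)")
                path.unlink()

    if __name__ == "__main__":
        prune_large_files()

As Atin points out, a blanket size cutoff like this also removes genuine cores, which is exactly why the average core size under brick multiplexing is worth establishing first.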