On Wednesday, 06 March 2019 at 21:31 +0530, Sankarshan Mukhopadhyay wrote:
> On Wed, Mar 6, 2019 at 8:47 PM Michael Scherer <msche...@redhat.com>
> wrote:
> >
> > On Wednesday, 06 March 2019 at 17:53 +0530, Sankarshan Mukhopadhyay
> > wrote:
> > > On Wed, Mar 6, 2019 at 5:38 PM Deepshikha Khandelwal
> > > <dkhan...@redhat.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Today while debugging the failed centos7-regression builds I saw
> > > > that most of the builders did not pass the instance status check
> > > > on AWS and were unreachable.
> > > >
> > > > Misc investigated this and found the patch[1] which seems to
> > > > break the builders one after the other. They all ran the
> > > > regression test for this specific change before going offline.
> > > > We suspect that this change results in an infinite loop of
> > > > processes, as we did not see any trace of an error in the system
> > > > logs.
> > > >
> > > > We rebooted all those builders and they all seem to be running
> > > > fine now.
> > > >
> > >
> > > The question though is - what to do about the patch, if the patch
> > > itself is the root cause? Is this assigned to anyone to look into?
> >
> > We also pondered whether we should protect the builders from that
> > kind of issue. But since:
> > - we are not sure that the hypothesis is right
> > - any protection based on "limit the number of processes" would
> >   surely sooner or later block legitimate tests, and would require
> >   adjustment (and likely investigation)
> >
> > we didn't choose to follow that road for now.
> >
>
> This is a good topic though. Is there any logical way to fence off
> the builders from noisy neighbors?

I am not sure I follow the question. What I had in mind was just a
regular ulimit to avoid the equivalent of a fork bomb (again, if the
hypothesis is the right one).

Since our builders run one job at a time, there are no noisy-neighbor
issues; or rather, since this is AWS, we can't control anything
regarding contention for shared resources anyway.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
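
For illustration only, here is a minimal sketch of what a "regular
ulimit" on the number of processes could look like if applied around a
single test command. It is not what the builders actually run: the
helper name run_with_process_cap and the value 4096 are hypothetical,
and the sketch assumes a Linux host where Python's resource module
exposes RLIMIT_NPROC.

```python
import resource
import subprocess
import sys


def run_with_process_cap(cmd, max_procs):
    """Run cmd with a soft cap on the process count (hypothetical helper)."""

    def apply_limit():
        # Runs in the child, after fork() and before exec().
        _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        # The soft limit may not exceed the hard limit, so clamp it.
        if hard == resource.RLIM_INFINITY:
            soft = max_procs
        else:
            soft = min(max_procs, hard)
        # Leave the hard limit unchanged; only lower the soft limit.
        resource.setrlimit(resource.RLIMIT_NPROC, (soft, hard))

    return subprocess.run(
        cmd, preexec_fn=apply_limit, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Probe command: the child prints the soft limit it actually inherited.
    probe = [
        sys.executable,
        "-c",
        "import resource; print(resource.getrlimit(resource.RLIMIT_NPROC)[0])",
    ]
    result = run_with_process_cap(probe, 4096)  # 4096 is an arbitrary example
    print(result.stdout.strip())
```

Two caveats match the concern raised in the thread. RLIMIT_NPROC counts
all processes and threads of the same user, not only those of one job,
and it is not enforced for privileged processes, so a value that is too
low could block legitimate tests and would need tuning.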
_______________________________________________
Gluster-infra mailing list
Gluster-infra@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-infra