On Mon, Apr 11, 2016 at 9:39 AM, Morales, Victor <victor.mora...@intel.com> wrote:
>
> On 4/11/16, 5:07 AM, "Jakub Libosvar" <jlibo...@redhat.com> wrote:
>
>>Hi,
>>
>>recently we hit an issue in Neutron with tests getting stuck [1]. As a
>>side effect we discovered logs are not collected properly, which makes it
>>hard to find the root cause. The reason for the missing logs is that we
>>send SIGKILL to whatever gate hook is running when we hit the global
>>timeout per gate job [2]. This gives the running process no time to
>>perform any post-processing. In the post_gate_hook function in Neutron, we
>>collect logs from the /tmp directory, compress them and move them to
>>/opt/stack/logs to make them exposed.
>>
>>I have in mind two solutions on which I'd like to get feedback before
>>sending patches.
>>
>>1) In Neutron, we execute tests in post_gate_hook (dunno why). But even
>>if we moved test execution into gate_hook, the post_gate_hook won't be
>>triggered if the tests get stuck [3]. So the solution I propose here is
>>to terminate gate_hook N minutes before the global timeout and still
>>execute post_gate_hook (with a timeout) as a post-processing routine.
>>
>>2) The second proposal is to let timeout-wrapped commands know they are
>>about to be killed. We can send, let's say, SIGTERM instead of SIGKILL
>>and, after a certain amount of time, send SIGKILL. Example: we send
>>SIGTERM 3 minutes before the global timeout, leaving 'command' these
>>3 minutes to handle the SIGTERM signal.
>>
>>    timeout -s 15 -k 3m $((REMAINING_TIME-3))m bash -c "command"
>>
>>With the 2nd approach we can trap the signal that kills the running test
>>suite and collect logs with the same functions we currently have.
>>
>>
>>I would personally go with the second option but I want to hear if anybody
>>has a better idea about post-processing in gate jobs or if there is
>>already a tool we can use to collect logs.
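
To make the second proposal a bit more concrete, here is a minimal sketch, assuming GNU timeout and bash. The names run_with_grace and fake_gate_hook are made up for this example and are not part of devstack-gate or Neutron, and the tar call only stands in for the real log collection in post_gate_hook. Note that GNU timeout reads a bare duration as seconds, so the sketch always spells out the unit suffix.

```shell
# Hypothetical helper: run a command, send it SIGTERM after $1 and, if it
# is still alive, SIGKILL after a further $2. Durations use GNU timeout
# syntax (s, m, h suffixes; a bare number means seconds).
run_with_grace() {
    local term_after=$1 kill_after=$2
    shift 2
    timeout -s TERM -k "$kill_after" "$term_after" "$@"
}

# Hypothetical stand-in for a stuck gate hook, kept as a script string so
# it can be passed to `bash -c`. $1 is the directory holding raw logs and
# $2 is the directory that receives the compressed archive.
fake_gate_hook='
src=$1 dst=$2
# On SIGTERM, archive the logs and exit with the conventional 128+15.
collect() { tar -czf "$dst/logs.tar.gz" -C "$src" . ; exit 143; }
trap collect TERM
# Simulate a hung test suite. Sleeping in the background and waiting lets
# the shell run the trap as soon as the signal arrives.
sleep 600 &
wait $!
'
```

In a gate job this could be called along the lines of `run_with_grace "$((REMAINING_TIME-3))m" 3m bash -c "$fake_gate_hook" _ /tmp /opt/stack/logs`. When the limit is hit and the hook exits from its trap, timeout itself returns 124, so the caller can still tell that the run timed out.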
>
> I also like the second option. It seems less aggressive and gives us the
> opportunity to capture more information before killing processes. Ideally,
> timeouts are ultimatums for worst-case scenarios and should never be
> reached.

Kuba and I discussed this issue at length - I also think the 2nd approach
is reasonable but I'd like to see what more Devstack oriented folks think.

>
>>
>>Thanks,
>>Kuba
>>
>>
>>[1] https://bugs.launchpad.net/bugs/1567668
>>[2] https://github.com/openstack-infra/devstack-gate/blob/master/functions.sh#L1151
>>[3] https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate-wrap.sh#L581
>>
>>__________________________________________________________________________
>>OpenStack Development Mailing List (not for usage questions)
>>Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev