On Mon, Apr 11, 2016, at 03:07 AM, Jakub Libosvar wrote:
> Hi,
>
> recently we hit an issue in Neutron with tests getting stuck [1]. As a
> side effect we discovered that logs are not collected properly, which
> makes it hard to find the root cause. The reason for the missing logs is
> that we send SIGKILL to whatever gate hook is running when we hit the
> global timeout per gate job [2]. This gives the running process no time
> to perform any post-processing. In the post_gate_hook function in
> Neutron, we collect logs from the /tmp directory, compress them and move
> them to /opt/stack/logs to expose them.
>
> I have two solutions in mind on which I'd like to get feedback before
> sending patches.
>
> 1) In Neutron, we execute tests in post_gate_hook (dunno why). But even
> if we moved test execution into gate_hook, the post_gate_hook won't be
> triggered when tests get stuck [3]. So the solution I propose here is to
> terminate gate_hook N minutes before the global timeout and still
> execute post_gate_hook (with a timeout) as a post-processing routine.
>
> 2) The second proposal is to let timeout-wrapped commands know they are
> about to be killed. We can send, let's say, SIGTERM instead of SIGKILL
> and, after a certain amount of time, send SIGKILL. Example: we send
> SIGTERM 3 minutes before the global timeout, leaving these 3 minutes for
> 'command' to handle the SIGTERM signal.
>
> timeout -s 15 -k 3m $((REMAINING_TIME-3))m bash -c "command"
>
> With the 2nd approach we can trap the signal that kills the running test
> suite and collect logs with the same functions we currently have.
>
>
> I would personally go with the second option, but I want to hear if
> anybody has a better idea about post-processing in gate jobs or if there
> is already a tool we can use to collect logs.
>
> Thanks,
> Kuba
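For illustration, the second proposal (SIGTERM first, SIGKILL after a grace period, with the wrapped command trapping the signal) could look roughly like the sketch below. This is a minimal sketch under stated assumptions, not what devstack-gate or Neutron does today: `collect_logs`, `LOG_DIR` and the `hook` body are hypothetical stand-ins, and the durations are in seconds so the demo finishes quickly (a real job would use minutes, e.g. a `3m` grace period).

```shell
#!/bin/bash
# Sketch only: collect_logs, LOG_DIR and the hook body are hypothetical
# stand-ins, not functions or variables from devstack-gate or Neutron.

LOG_DIR=$(mktemp -d)
export LOG_DIR

# The wrapped command. It installs a SIGTERM handler that collects logs
# and exits, then simulates a stuck test suite.
hook='
collect_logs() { echo "logs collected" > "$LOG_DIR/collected"; }
trap "collect_logs; exit 143" TERM
# Run the long task in the background and wait on it: bash interrupts
# "wait" when a trapped signal arrives, so the handler runs promptly.
sleep 30 &
wait $!
'

# Send SIGTERM after 1 second; send SIGKILL 2 seconds later if the
# command is still running at that point.
status=0
timeout -s TERM -k 2 1 bash -c "$hook" || status=$?

echo "timeout exit status: $status"
cat "$LOG_DIR/collected"
```

GNU `timeout` exits with status 124 when the command timed out, so the caller can still tell that the run was cut short even though the handler got a chance to save the logs.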
Devstack gate already does a "soft" timeout [0], then proceeds to cleanup
(part of which is collecting logs) [1], and then Jenkins does the "hard"
timeout [2]. Why aren't we collecting the required log files as part of
the existing cleanup?

[0] https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/devstack-vm-gate-wrap.sh#n569
[1] https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/devstack-vm-gate-wrap.sh#n594
[2] https://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/devstack-gate.yaml#n325

Clark