On Mon, Apr 11, 2016 at 1:56 PM, Clark Boylan <cboy...@sapwetik.org> wrote:
> On Mon, Apr 11, 2016, at 10:52 AM, Jakub Libosvar wrote:
>> On 04/11/2016 06:41 PM, Clark Boylan wrote:
>> > On Mon, Apr 11, 2016, at 03:07 AM, Jakub Libosvar wrote:
>> >> Hi,
>> >>
>> >> recently we hit an issue in Neutron with tests getting stuck [1]. As a
>> >> side effect we discovered logs are not collected properly which makes it
>> >> hard to find the root cause. The reason of missing logs is that we send
>> >> SIGKILL to whatever gate hook is running when we hit the global timeout
>> >> per gate job [2]. This gives no time to running process to perform any
>> >> post-processing. In post_gate_hook function in Neutron, we collect logs
>> >> from /tmp directory, compress them and move them to /opt/stack/logs to
>> >> make them exposed.
>> >>
>> >> I have in mind two solutions to which I'd like to get feedback before
>> >> sending patches.
>> >>
>> >> 1) In Neutron, we execute tests in post_gate_hook (dunno why). But even
>> >> if we would have moved test execution into gate_hook and tests get stuck
>> >> then the post_gate_hook won't be triggered [3]. So the solution I
>> >> propose here is to terminate gate_hook N minutes before global timeout
>> >> and still execute post_gate_hook (with timeout) as post-processing
>> >> routine.
>> >>
>> >> 2) Second proposal is to let timeout wrapped commands know they are
>> >> about to be killed. We can send let's say SIGTERM instead of SIGKILL and
>> >> after certain amount of time, send SIGKILL. Example: We send SIGTERM 3
>> >> minutes before global timeout, letting these 3 minutes to 'command' to
>> >> handle the SIGTERM signal.
>> >>
>> >> timeout -s 15 -k 3 $((REMAINING_TIME-3))m bash -c "command"
>> >>
>> >> With the 2nd approach we can trap the signal that kills running test
>> >> suite and collects logs with same functions we currently have.
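To make the second proposal concrete, here is a minimal sketch of the trap approach. It is an illustration under assumptions, not the actual devstack-gate or Neutron hook code: collect_logs, SRC_DIR, DEST_DIR and the one-second duration are made up for the example, and temporary directories stand in for /tmp/dsvm-<job_name> and /opt/stack/logs. One detail worth noting: GNU timeout reads a bare number as seconds, so "-k 3" in the quoted command gives a three-second grace period; "-k 3m" would be needed for three minutes.

```shell
#!/usr/bin/env bash
# Sketch only: all names, paths and durations below are hypothetical.

SRC_DIR=$(mktemp -d)    # stands in for /tmp/dsvm-<job_name>
DEST_DIR=$(mktemp -d)   # stands in for /opt/stack/logs
echo "test output" > "$SRC_DIR/test.log"
export SRC_DIR DEST_DIR

# Send SIGTERM (15) after 1 second; if the command is still alive
# 3 seconds after that, timeout follows up with SIGKILL.
timeout -s 15 -k 3 1 bash -c '
    collect_logs() {
        # Stop the stuck child, then compress the logs into DEST_DIR.
        kill "$child" 2>/dev/null
        tar -czf "$DEST_DIR/logs.tar.gz" -C "$SRC_DIR" .
        exit 0
    }
    trap collect_logs TERM

    # Stand-in for a stuck test suite. Running it in the background and
    # waiting lets bash run the trap as soon as the signal arrives.
    sleep 30 &
    child=$!
    wait "$child"
'
status=$?
echo "timeout exit status: $status"
```

With GNU timeout the reported status is 124 whenever the time limit was hit, even though the handler itself exits 0, so the caller can still tell that the run was cut short.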
>> >>
>> >> I would personally go with second option but I want to hear if anybody
>> >> has a better idea about post processing in gate jobs or if there is
>> >> already a tool we can use to collect logs.
>> >>
>> >> Thanks,
>> >> Kuba
>> >
>> > Devstack gate already does a "soft" timeout [0] then proceeds to cleanup
>> > (part of which is collecting logs) [1], then Jenkins does the "hard"
>> > timeout [2]. Why aren't we collecting the required log files as part of
>> > the existing cleanup?
>> This existing cleanup doesn't support hooks. Neutron tests produce a lot
>> of logs by default stored in /tmp/dsvm-<job_name> so we need to compress
>> and move them to /opt/stack/logs in order to get them collected by [1].
>
> My suggestion would be to stop writing these log files to /tmp and
> instead write them to the log dir where they will be automagically
> compressed and collected.
Yeah that's what I'm doing here https://review.openstack.org/#/c/303594/.

>
>>
>> >
>> > [0]
>> > https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/devstack-vm-gate-wrap.sh#n569
>> > [1]
>> > https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/devstack-vm-gate-wrap.sh#n594
>> > [2]
>> > https://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/devstack-gate.yaml#n325
>> >
>> > Clark
>> >
>> > __________________________________________________________________________
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev