https://bugzilla.redhat.com/show_bug.cgi?id=1379228
Shyamsundar <srang...@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |srang...@redhat.com

--- Comment #9 from Shyamsundar <srang...@redhat.com> ---
My notes:

The script:
https://github.com/gluster/glusterfs-patch-acceptance-tests/blob/master/smoke.sh

Success case: https://build.gluster.org/job/smoke/30870/console
=============
10:47:56 + wait %3
---> This is printed once the wait on %2 has completed, so dbench was done by this time and the script started waiting on %3 (IOW, the line printed is about to be executed)
10:48:51 All tests successful.
10:48:51 Files=191, Tests=1960, 129 wallclock secs ( 1.28 usr 0.36 sys + 9.57 cusr 7.43 csys = 18.64 CPU)
10:48:51 Result: PASS
---> %3 (compliance) completed (it took about 129 seconds; dbench would take about 71-72 seconds including the warmup), so the wait above was over and we proceed
---> cleanup starts
10:48:51 + rm -rf clients
10:48:53 + cd -
10:48:53 /home/jenkins/root/workspace/smoke
10:48:53 + finish
10:48:53 + RET=0
---> NOTE: RET here takes the exit status of rm -rf clients, not sure if this is intended
10:48:53 + '[' 0 -ne 0 ']'
10:48:53 + cleanup
---> cleanup is invoked by finish, and this possibly has set -x enabled by the script (but the watchdog does not; see the failure case)
10:48:53 + killall -15 glusterfs glusterfsd glusterd
---> All well!

Failure case: https://build.gluster.org/job/smoke/30852/console
=============
00:03:16 All tests successful.
00:03:16 Files=191, Tests=1960, 93 wallclock secs ( 0.89 usr 0.26 sys + 5.46 cusr 3.30 csys = 9.91 CPU)
00:03:16 Result: PASS
00:11:36 Kicking in watchdog after 600 secs
---> Where are the watchdog cleanup calls noted? It appears that the watchdog is called before set -x and hence its cleanup is not logged here
---> Assuming cleanup was called, it killed all gluster processes, dbench finally errored out in the read (no connection), and hence %2 completed
00:11:36 + wait %3
---> The wait for %3 starts and returns immediately, as compliance finished running about 8 minutes earlier (00:03:16)
00:11:36 + rm -rf clients
00:11:36 rm: cannot remove `clients': Transport endpoint is not connected
---> We cannot, as the watchdog has already killed the processes, so this rm -rf fails (we failed cleanup, is this an issue for the next run?)
00:11:36 + finish
00:11:36 + RET=1
---> rm -rf failed and RET caught that; is this what is intended?
00:11:36 + '[' 1 -ne 0 ']'
00:11:36 + cat /build/dbench-logs
-----------
00:11:36 10 cleanup 581 sec
---> dbench has been attempting cleanup for 580-odd seconds
00:11:36 [643] read failed on handle 10007 (Transport endpoint is not connected)
---> Finally the dbench clients get an error: the watchdog shut down the processes, and hence the volume, so we get connection errors and dbench exits
-----------
00:11:36 + cleanup
---> Called by finish, and everything fails as the watchdog has cleaned up already
00:11:36 + killall -15 glusterfs glusterfsd glusterd
00:11:36 glusterfs: no process killed
00:11:36 glusterfsd: no process killed
00:11:36 glusterd: no process killed

Root cause:
===========
Looks like dbench got stuck at
https://github.com/sahlberg/dbench/blob/master/fileio.c#L400 (or pread) and was never able to break out of it. As a result dbench did not complete until the volume and the mount were taken down, at which point it errored out.

Why it got stuck here would be the next question, I guess.
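
On the RET question raised in the notes above, here is a minimal sketch of why the value tracks the last command rather than the test jobs. It is not copied from smoke.sh: the finish function below is a hypothetical stand-in that assumes the real one reads $? as its first statement (as the "+ RET=0" and "+ RET=1" trace lines suggest), and the subshells and true/false are placeholders for the test jobs and for rm -rf clients.

```shell
#!/bin/sh
# Hypothetical stand-in for finish(); assumed to read $? first.
finish() {
    RET=$?
    echo "RET=$RET"
}

# Case 1: a background test job fails, but the last command before
# finish succeeds.
(exit 1) &
wait $!          # returns the job's exit status (1)
true             # stand-in for a successful 'rm -rf clients'; resets $? to 0
finish           # prints RET=0: the job failure is masked
MASKED=$RET

# Case 2: the test job passes, but the last command before finish fails.
(exit 0) &
wait $!
false            # stand-in for 'rm -rf clients' failing on a dead mount
finish           # prints RET=1: a cleanup failure is reported as a run failure
LEAKED=$RET

# One possible alternative: record the job status right after wait,
# so later commands cannot overwrite it.
(exit 1) &
wait $!
JOB_STATUS=$?
true
echo "JOB_STATUS=$JOB_STATUS"
```

If the intent is for RET to reflect the test results, capturing the status of each wait explicitly (as in the last part of the sketch) would be one way to decouple it from the cleanup commands; whether that matches the script's intent is a question for its maintainers.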