I've been trying to figure out what's causing CentOS regression tests to fail with core dumps, quite clearly unrelated to the patches that are being affected. I've managed to find a couple of clues, and I'm sharing them in the hope that someone else will recognize something and zero in on the problem faster than I can with my own digging.

The first clue is that the core dumps are being found at the conclusion of the following test.

bugs/stripe/bug-1002207.t

However, I think that's a bit of a red herring. When I look in the logs from that test, I find the following messages that seem related to the crash.

> [2016-04-08 07:11:23.319610] E [MSGID: 101019]
> [xlator.c:430:xlator_init] 0-patchy-posix: Initialization of volume
> 'patchy-posix' failed, review your volfile again
> [2016-04-08 07:11:23.319628] E [MSGID: 101066]
> [graph.c:324:glusterfs_graph_init] 0-patchy-posix: initializing
> translator failed
> [2016-04-08 07:11:23.319643] E [MSGID: 101176]
> [graph.c:670:glusterfs_graph_activate] 0-graph: init failed
> [2016-04-08 07:11:23.320773] I [MSGID: 101190]
> [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 2
> pending frames:
> frame : type(0) op(0)
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 11

This is consistent with some other things I've seen in these failures, which are either in graph-teardown code or in socket code but either way seem to occur immediately after we've failed to initialize a new graph.

Here's the interesting part. Those lines came from this log file:

bricks/d-backends-1-patchy_snap_mnt.log

This is a stripe test. It doesn't do anything with snapshots. However, here's the test that runs immediately before it.

bugs/snapshot/bug-1322772-real-path-fix-for-snapshot.t

That test clearly does have something to do with snapshots, and even uses a name consistent with the name of the log file associated with the failure. Thus, in addition to whatever bug is actually causing the process to crash, we seem to have a problem with snapshot processes from one test persisting into the next.

That's where you, the reader, come in. I have three questions.

(a) Where should we look for the original bug that causes the crashes?

(b) Is there another bug that's allowing snapshot processes to persist beyond their proper lifetime?

(c) What should our test-infrastructure code do to protect against the possibility of (b)?
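
To make (c) a bit more concrete, here is a rough sketch of the kind of guard I have in mind: between tests, scan for leftover gluster processes and report (or kill) them before the next test starts. This is only an illustration, not existing harness code. The function names are hypothetical, the list of process names is my assumption about what a test might leave behind, and it assumes a Linux-style /proc.

```python
import os
import signal

# Assumed set of executable names a test might leave running; adjust as needed.
GLUSTER_NAMES = ("glusterd", "glusterfsd", "glusterfs")


def find_stale(cmdlines, names=GLUSTER_NAMES):
    """Return sorted PIDs whose executable basename is one of `names`.

    `cmdlines` maps pid -> command line string.
    """
    stale = []
    for pid, cmdline in cmdlines.items():
        argv = cmdline.split()
        if argv and os.path.basename(argv[0]) in names:
            stale.append(pid)
    return sorted(stale)


def read_cmdlines(proc_root="/proc"):
    """Collect {pid: command line} by reading /proc/<pid>/cmdline."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited (or is unreadable) while we were scanning
        # Arguments in /proc/<pid>/cmdline are NUL-separated.
        result[int(entry)] = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return result


def kill_stale(pids, sig=signal.SIGKILL):
    """Send `sig` to each PID, ignoring ones that have already exited."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass
```

The harness could call find_stale(read_cmdlines()) after each test's cleanup and mark the test that just finished as the culprit if the list is non-empty, instead of letting the failure surface in whichever test happens to run next.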