On 22/05/2014, at 1:34 PM, Kaushal M wrote:
> Thanks Justin, I found the problem. The VM can be deleted now.

Done. :)

> Turns out, there was more than enough time for the rebalance to complete.
> But we hit a race, which caused a command to fail.
>
> The particular test that failed is waiting for rebalance to finish. It does
> this by doing a 'gluster volume rebalance <> status' command and checking
> the result. The EXPECT_WITHIN function runs this command until we have a
> match, the command fails, or the timeout happens.
>
> For a rebalance status command, glusterd sends a request to the rebalance
> process (as a brick_op) to get the latest stats. It had done the same in
> this case as well. But while glusterd was waiting for the reply, the
> rebalance completed and the process stopped itself. This caused the rpc
> connection between glusterd and the rebalance process to close, which
> caused all the pending requests to be unwound as failures, which in turn
> led to the command failing.
>
> I cannot think of a way to avoid this race from within glusterd. For this
> particular test, we could avoid using the 'rebalance status' command if we
> directly checked the rebalance process state using its pid etc. I don't
> particularly approve of this approach, as I think I used the 'rebalance
> status' command for a reason. But I currently cannot recall the reason,
> and if I cannot come up with it soon, I wouldn't mind changing the test to
> avoid rebalance status.

Hmmm, is it the kind of thing where the "rebalance status" command should
retry, if its connection gets closed by a just-completed rebalance (as
happened here)?

Or would that not work as well?
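Just to check I'm picturing the test side right, here's a rough sketch of a
workaround in the test itself. I'm going from memory on the helper and
variable names (EXPECT_WITHIN, REBALANCE_TIMEOUT, rebalance_status_field,
$V0), so they may not match what the test actually uses:

  # What the test does now (roughly): poll 'rebalance status' until it
  # reports completed. A single failed invocation (e.g. the RPC race
  # described above) aborts the wait and fails the test.
  EXPECT_WITHIN $REBALANCE_TIMEOUT "completed" rebalance_status_field $V0

  # A wrapper that swallows a transient command failure and reports
  # "unknown" instead, so EXPECT_WITHIN simply polls again on the next
  # pass. (Assumes a failed 'rebalance status' gives empty output here.)
  rebalance_status_tolerant () {
          local vol=$1
          local status
          status=$(rebalance_status_field $vol 2>/dev/null)
          echo "${status:-unknown}"
  }

  EXPECT_WITHIN $REBALANCE_TIMEOUT "completed" rebalance_status_tolerant $V0

Not pretty, but it would keep the 'rebalance status' approach in the test
while making one unlucky invocation non-fatal. A retry in the command path
itself would obviously be cleaner, if it's doable.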
+ Justin

--
Open Source and Standards @ Red Hat
twitter.com/realjustinclift

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel