On 22/05/2014, at 1:34 PM, Kaushal M wrote:
> Thanks Justin, I found the problem. The VM can be deleted now.

Done. :)


> It turns out there was more than enough time for the rebalance to complete,
> but we hit a race which caused a command to fail.
> 
> The particular test that failed waits for the rebalance to finish. It does
> this by running a 'gluster volume rebalance <> status' command and checking
> the result. The EXPECT_WITHIN function runs this command until we get a
> match, the command fails, or the timeout is hit.
> 
> For a rebalance status command, glusterd sends a request to the rebalance
> process (as a brick_op) to get the latest stats. It did the same in this
> case as well. But while glusterd was waiting for the reply, the rebalance
> completed and the process stopped itself. This closed the RPC connection
> between glusterd and the rebalance process, which caused all pending
> requests to be unwound as failures, which in turn led to the command
> failing.
> 
> I cannot think of a way to avoid this race from within glusterd. For this
> particular test, we could avoid using the 'rebalance status' command if we
> directly checked the rebalance process state using its pid etc. I don't
> particularly approve of this approach, as I think I used the 'rebalance
> status' command for a reason. But I currently cannot recall the reason, and
> if I cannot come up with it soon, I wouldn't mind changing the test to
> avoid rebalance status.
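
For reference, the pid-based workaround Kaushal describes above could look
roughly like the sketch below. This is not Gluster code: the pid-file path,
volume name, timeout, and helper names are assumptions, and the real test
framework drives EXPECT_WITHIN off the CLI output rather than the process
state. It only illustrates "wait for the rebalance process to go away"
instead of asking glusterd for its status.

    #!/usr/bin/env python3
    """Sketch: wait for a rebalance process to exit by watching its pid.

    Assumptions (not taken from the Gluster sources): the rebalance daemon
    writes its pid to PID_FILE, and "process gone" is a good enough proxy
    for "rebalance finished" in this particular test.
    """
    import os
    import time

    PID_FILE = "/var/run/gluster/rebalance-test-volume.pid"  # hypothetical path
    TIMEOUT = 300  # seconds; the test's own timeout would apply in practice


    def rebalance_pid_alive(pid_file):
        """Return True if the pid recorded in pid_file is still running."""
        try:
            with open(pid_file) as f:
                pid = int(f.read().strip())
        except (OSError, ValueError):
            return False  # no pid file (or unreadable) -> treat as not running
        try:
            os.kill(pid, 0)  # signal 0 only checks that the process exists
            return True
        except ProcessLookupError:
            return False


    def wait_for_rebalance_exit(pid_file, timeout):
        """Poll the pid until the process is gone or the timeout is hit."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if not rebalance_pid_alive(pid_file):
                return True
            time.sleep(1)
        return False


    if __name__ == "__main__":
        ok = wait_for_rebalance_exit(PID_FILE, TIMEOUT)
        print("rebalance finished" if ok else "timed out waiting for rebalance")

The obvious trade-off is the one Kaushal hints at: this checks the process,
not the result the CLI would report, which may be why the test used
'rebalance status' in the first place.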

Hmmm, is it the kind of thing where the "rebalance status" command
should retry if its connection gets closed by a just-completed
rebalance (as happened here)?

Or would that not work as well?
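
At the test/CLI level (without touching glusterd itself), that retry could
be as simple as the sketch below: re-run the status command a couple of
times if it fails, on the assumption that a failure caused by the connection
closing at completion time will succeed on the next attempt. The volume
name, retry count, and delay are made up; only the
'gluster volume rebalance <vol> status' command itself comes from the thread.

    #!/usr/bin/env python3
    """Sketch: retry 'gluster volume rebalance <vol> status' on failure."""
    import subprocess
    import time

    VOLUME = "test-volume"  # hypothetical volume name
    RETRIES = 3
    DELAY = 2  # seconds between attempts


    def rebalance_status(volume):
        """Run the status command once; return (exit_code, stdout)."""
        result = subprocess.run(
            ["gluster", "volume", "rebalance", volume, "status"],
            capture_output=True, text=True,
        )
        return result.returncode, result.stdout


    def rebalance_status_with_retry(volume, retries=RETRIES, delay=DELAY):
        """Retry the status query so a connection closed by a just-completed
        rebalance does not surface as a test failure."""
        for attempt in range(retries):
            code, out = rebalance_status(volume)
            if code == 0:
                return out
            time.sleep(delay)  # give the race a moment to resolve, then retry
        raise RuntimeError("rebalance status failed %d times" % retries)


    if __name__ == "__main__":
        print(rebalance_status_with_retry(VOLUME))

Whether the retry belongs in the test, the CLI, or glusterd's brick_op
handling is the open question above.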

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
