It doesn't make any sense to me to do a one-shot rebalance status at first go and then go for rebalance status in a loop. Instead, we should go for rebalance status in a loop immediately.
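In other words, a single poll-with-timeout loop instead of a one-shot check followed by a wait. A minimal sketch of the idea (the helper name and the timeout/interval defaults here are illustrative, not the actual glusto-tests API):

```python
import time

def poll_until_success(check, timeout=300, interval=10):
    """Call `check` repeatedly until it returns exit status 0,
    or give up once `timeout` seconds have elapsed."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check() == 0:
            return True
        time.sleep(interval)
    return False

# In the test this would replace the one-shot status check, e.g.:
#   ok = poll_until_success(
#       lambda: rebalance_status(self.mnode, self.volname)[0])
#   self.assertTrue(ok, "rebalance never reported a clean status")
```

That way a transient non-zero exit right after 'rebalance start' is retried rather than failing the test immediately.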
On Wed, Aug 30, 2017 at 4:28 PM, Shwetha Panduranga <spand...@redhat.com>
wrote:

> Maybe I should change the log message from 'Checking rebalance status' to
> 'Logging rebalance status', because the first 'rebalance status' command
> just does that: it executes 'rebalance status'. Then
> wait_for_rebalance_to_complete validates that rebalance is 'completed'
> within 5 minutes (the default timeout). If that makes sense, I will make
> those changes as well, along with introducing the delay b/w 'start' and
> 'status'.
>
> On Wed, Aug 30, 2017 at 4:26 PM, Atin Mukherjee <amukh...@redhat.com>
> wrote:
>
>> On Wed, Aug 30, 2017 at 4:23 PM, Shwetha Panduranga
>> <spand...@redhat.com> wrote:
>>
>>> This is the first check, where we just execute 'rebalance status'.
>>> That's the command which failed and hence failed the test case. If you
>>> see the test case, the next step is wait_for_rebalance_to_complete
>>> (status --xml). This is where we execute rebalance status for up to 5
>>> minutes, waiting for rebalance to complete. Even before that wait, the
>>> first execution of the status command failed, hence the test case
>>> failed.
>>
>> Cool. So there is still a problem in the test case. We can't assume
>> rebalance status will report back success immediately after rebalance
>> start, and I've explained the why part in the earlier thread. Why do we
>> need to do an intermediate check of rebalance status before going for
>> wait_for_rebalance_to_complete?
>>> On Wed, Aug 30, 2017 at 4:07 PM, Atin Mukherjee <amukh...@redhat.com>
>>> wrote:
>>>
>>>>     # Start Rebalance
>>>>     g.log.info("Starting Rebalance on the volume")
>>>>     ret, _, _ = rebalance_start(self.mnode, self.volname)
>>>>     self.assertEqual(ret, 0, ("Failed to start rebalance on the volume "
>>>>                               "%s", self.volname))
>>>>     g.log.info("Successfully started rebalance on the volume %s",
>>>>                self.volname)
>>>>
>>>>     # Check Rebalance status
>>>>     g.log.info("Checking Rebalance status")
>>>>     ret, _, _ = rebalance_status(self.mnode, self.volname)
>>>>     self.assertEqual(ret, 0, ("Failed to get rebalance status for the "
>>>>                               "volume %s", self.volname))
>>>> E   AssertionError: ('Failed to get rebalance status for the volume %s',
>>>>     'testvol_distributed-dispersed')
>>>>
>>>> The above is the snip extracted from
>>>> https://ci.centos.org/view/Gluster/job/gluster_glusto/377/console
>>>>
>>>> If we had gone for rebalance status checks multiple times, I should
>>>> have seen multiple entries of rebalance_status failure or at least a
>>>> difference in time, isn't it?
>>>>
>>>> On Wed, Aug 30, 2017 at 3:39 PM, Shwetha Panduranga
>>>> <spand...@redhat.com> wrote:
>>>>
>>>>> Case:
>>>>>
>>>>> 1) add-brick when IO is in progress, wait for 30 seconds
>>>>>
>>>>> 2) Trigger rebalance
>>>>>
>>>>> 3) Execute 'rebalance status' (there is no time delay b/w 2) and 3))
>>>>>
>>>>> 4) wait_for_rebalance_to_complete (this gets the xml output of
>>>>> rebalance status and keeps checking for the rebalance status to be
>>>>> 'complete' every 10 seconds, up to 5 minutes; the 5-minute wait time
>>>>> can be passed as a parameter)
>>>>>
>>>>> At every step we check the exit status of the command output. If the
>>>>> exit status is non-zero, we fail the test case.
>>>>> -Shwetha
>>>>>
>>>>> On Wed, Aug 30, 2017 at 6:06 AM, Sankarshan Mukhopadhyay
>>>>> <sankarshan.mukhopadh...@gmail.com> wrote:
>>>>>
>>>>>> On Wed, Aug 30, 2017 at 6:03 AM, Atin Mukherjee <amukh...@redhat.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > On Wed, 30 Aug 2017 at 00:23, Shwetha Panduranga
>>>>>> > <spand...@redhat.com> wrote:
>>>>>> >>
>>>>>> >> Hi Shyam, we are already doing it. We wait for rebalance status to
>>>>>> >> be complete. We loop. We keep checking if the status is complete
>>>>>> >> for '20' minutes or so.
>>>>>> >
>>>>>> > Are you saying in this test rebalance status was executed multiple
>>>>>> > times till it succeeded? If yes then the test shouldn't have
>>>>>> > failed. Can I get access to the complete set of logs?
>>>>>>
>>>>>> Would you not prefer to look at the specific test under discussion as
>>>>>> well?
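For reference, the '--xml' polling that wait_for_rebalance_to_complete performs boils down to checking the aggregate status string in the CLI output. A rough sketch, assuming the element layout below matches what `gluster volume rebalance <vol> status --xml` emits (treat the exact tag names as an assumption, not a spec):

```python
import xml.etree.ElementTree as ET

def rebalance_completed(status_xml):
    """Return True if the aggregate rebalance status in the
    'rebalance status --xml' output reads 'completed'."""
    root = ET.fromstring(status_xml)
    status = root.findtext(".//aggregate/statusStr")
    return status is not None and status.strip() == "completed"

# Example output shape (trimmed; element names assumed from the
# gluster CLI's --xml mode):
SAMPLE = """<cliOutput>
  <volRebalance>
    <aggregate>
      <statusStr>completed</statusStr>
    </aggregate>
  </volRebalance>
</cliOutput>"""
```

Polling this predicate every 10 seconds up to the timeout is what the loop does; the earlier one-shot 'rebalance status' adds nothing on top of it.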
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel