The return code is in the log message you copy pasted:

2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress) Successfully started rebalance on the volume testvol_distributed-dispersed
2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress) Checking Rebalance status
2017-08-28 15:13:58,953 INFO (run) root@172.19.2.86 (cp): gluster volume rebalance testvol_distributed-dispersed status
2017-08-28 15:13:58,953 DEBUG (_get_ssh_connection) Retrieved connection from cache: root@172.19.2.86
*2017-08-28 15:13:58,993 INFO (_log_results) RETCODE (root@172.19.2.86): 1*
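In the log above, the status command is issued about 1 ms after the start is reported successful. If the first status call should tolerate that window, one option is to retry it a few times instead of failing on the first non-zero exit. A rough sketch only: `status_with_retry` and `run_status` are hypothetical names, not part of glusto, and the retry count and delay are arbitrary.

```python
import time


def status_with_retry(run_status, retries=5, delay=2.0, sleep=time.sleep):
    """Call run_status() until it returns 0 or the retries run out.

    run_status is a hypothetical zero-argument callable that runs the
    status command and returns its exit code. In the test it could wrap
    the existing call, e.g.
        lambda: rebalance_status(self.mnode, self.volname)[0]
    Returns a tuple (last_exit_code, attempts_made).
    """
    ret = None
    for attempt in range(1, retries + 1):
        ret = run_status()
        if ret == 0:
            return ret, attempt
        # Sleep only if another attempt will follow.
        if attempt < retries:
            sleep(delay)
    return ret, retries
```

The test would then assert on the returned exit code as it does today, so a status that keeps failing still fails the test case, only later.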
We do not have any delay between rebalance start and the first 'rebalance status'. Should I introduce the delay?

On Wed, Aug 30, 2017 at 4:23 PM, Atin Mukherjee <amukh...@redhat.com> wrote:

> Ok, Nigel helped me understand that the traceback time is not something
> we should look at, and that the right way to dig through this problem is
> by looking at the glusto logs. As per the last rebalance instance from
> the log I see the following:
>
> volume rebalance: testvol_distributed-dispersed: success: Rebalance on
> testvol_distributed-dispersed has been started successfully. Use
> rebalance status command to check status of the rebalance process.
> ID: 96c645c7-710c-4c4c-a434-4157cbb75753
>
> 2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress) Successfully started rebalance on the volume testvol_distributed-dispersed
> 2017-08-28 15:13:58,952 INFO (test_expanding_volume_when_io_in_progress) Checking Rebalance status
> 2017-08-28 15:13:58,953 INFO (run) root@172.19.2.86 (cp): gluster volume rebalance testvol_distributed-dispersed status
> 2017-08-28 15:13:58,953 DEBUG (_get_ssh_connection) Retrieved connection from cache: root@172.19.2.86
> 2017-08-28 15:13:58,993 INFO (_log_results) RETCODE (root@172.19.2.86): 1
> 2017-08-28 15:13:58,994 INFO (_log_results) STDERR (root@172.19.2.86)...
> volume rebalance: testvol_distributed-dispersed: failed: error
>
> Again here I see the rebalance status failure was logged at 15:13:58,994
> whereas rebalance start was triggered at 15:13:58,952.
>
> @Shwetha - could you help me understand how we log the rebalance
> status ret code in the glusto log?
>
> On Wed, Aug 30, 2017 at 4:07 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
>
>> 14:15:57     # Start Rebalance
>> 14:15:57     g.log.info("Starting Rebalance on the volume")
>> 14:15:57     ret, _, _ = rebalance_start(self.mnode, self.volname)
>> 14:15:57     self.assertEqual(ret, 0, ("Failed to start rebalance on the volume "
>> 14:15:57                               "%s", self.volname))
>> 14:15:57     g.log.info("Successfully started rebalance on the volume %s",
>> 14:15:57                self.volname)
>> 14:15:57
>> 14:15:57     # Check Rebalance status
>> 14:15:57     g.log.info("Checking Rebalance status")
>> 14:15:57     ret, _, _ = rebalance_status(self.mnode, self.volname)
>> 14:15:57     self.assertEqual(ret, 0, ("Failed to get rebalance status for the "
>> 14:15:57 >                             "volume %s", self.volname))
>> 14:15:57 E   AssertionError: ('Failed to get rebalance status for the volume %s', 'testvol_distributed-dispersed')
>>
>> The above is the snip extracted from
>> https://ci.centos.org/view/Gluster/job/gluster_glusto/377/console
>>
>> If we had gone for rebalance status checks multiple times, I should have
>> seen multiple entries of rebalance_status failure or at least a difference
>> in time, isn't it?
>>
>> On Wed, Aug 30, 2017 at 3:39 PM, Shwetha Panduranga <spand...@redhat.com> wrote:
>>
>>> Case:
>>>
>>> 1) add-brick when IO is in progress, wait for 30 seconds
>>>
>>> 2) Trigger rebalance
>>>
>>> 3) Execute: 'rebalance status' (there is no time delay between 2) and 3))
>>>
>>> 4) wait_for_rebalance_to_complete (this gets the xml output of
>>> rebalance status and keeps checking for the rebalance status to be
>>> 'complete' every 10 seconds for up to 5 minutes. The 5 minute wait time
>>> can be passed as a parameter)
>>>
>>> At every step we check the exit status of the command output. If the
>>> exit status is non-zero we fail the test case.
>>>
>>> -Shwetha
>>>
>>> On Wed, Aug 30, 2017 at 6:06 AM, Sankarshan Mukhopadhyay <sankarshan.mukhopadh...@gmail.com> wrote:
>>>
>>>> On Wed, Aug 30, 2017 at 6:03 AM, Atin Mukherjee <amukh...@redhat.com> wrote:
>>>> >
>>>> > On Wed, 30 Aug 2017 at 00:23, Shwetha Panduranga <spand...@redhat.com> wrote:
>>>> >>
>>>> >> Hi Shyam, we are already doing it. We wait for rebalance status to be
>>>> >> complete. We loop. We keep checking if the status is complete for '20'
>>>> >> minutes or so.
>>>> >
>>>> > Are you saying in this test rebalance status was executed multiple times
>>>> > till it succeeded? If yes then the test shouldn't have failed. Can I get
>>>> > access to the complete set of logs?
>>>>
>>>> Would you not prefer to look at the specific test under discussion as well?
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel@gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel