Current run is segfault-free (yay!) so far:

[root@gqas001 ~]# gluster v rebalance testvol status
                                  Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in secs
                             localhost            893493  54.5GB  2692286         0        0  in progress          38643.00
    gqas013.sbu.lab.eng.bos.redhat.com                 0  0Bytes        1         0        0    completed          26070.00
    gqas011.sbu.lab.eng.bos.redhat.com                 0  0Bytes        0         0        0       failed              0.00
    gqas014.sbu.lab.eng.bos.redhat.com                 0  0Bytes        0         0        0       failed              0.00
    gqas016.sbu.lab.eng.bos.redhat.com            892110  54.4GB  2692295         0        0  in progress          38643.00
    gqas015.sbu.lab.eng.bos.redhat.com                 0  0Bytes        0         0        0       failed              0.00
volume rebalance: testvol: success:

The baseline ran for 98,500.00 seconds. This run is at 38,643.00 seconds (about a third of that) with 54 GB transferred so far. The same data set transferred 81 GB last run, so at 54 GB we are about 66% of the way there. By my estimate we should run for ~10,000-20,000 more seconds, which would give us a 40-50% improvement! Let's see how it finishes out :)
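A quick sanity check of that projection (a sketch only; the 54.5 GB / 38,643 s figures come from the status output above, and the 81.1 GB / 98,500 s figures from the baseline run quoted further down this thread):

    #include <stdio.h>

    int main(void)
    {
        /* Figures from this thread: 54.5 GB moved in 38,643 s so far;
         * the baseline run moved 81.1 GB in 98,500 s. */
        double done_gb  = 54.5,    total_gb = 81.1;
        double elapsed  = 38643.0, baseline = 98500.0;

        /* Assume a roughly constant transfer rate for the projection. */
        double projected = elapsed * (total_gb / done_gb);

        printf("projected total: %.0f s, remaining: %.0f s, improvement: %.0f%%\n",
               projected, projected - elapsed,
               100.0 * (1.0 - projected / baseline));
        return 0;
    }

    /* -> roughly 57,500 s total, ~18,900 s remaining, ~42% improvement,
     * consistent with the 10,000-20,000 s / 40-50% estimate above. */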
Any idea why I am getting the "failed" status for three of them? This has been consistent across each run I have tried.

-b

On Fri, May 1, 2015 at 3:05 AM, Ravishankar N <ravishan...@redhat.com> wrote:

> I sent a fix <http://review.gluster.org/#/c/10478/> but abandoned it since
> Susant (CC'ed) has already sent one: http://review.gluster.org/#/c/10459/
> I think it needs re-submission, but more review-eyes are welcome.
> -Ravi
>
> On 05/01/2015 12:18 PM, Benjamin Turner wrote:
>
> There was a segfault on gqas001, have a look when you get a sec:
>
> Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol --xlator-option'.
> Program terminated with signal 11, Segmentation fault.
> #0  gf_defrag_get_entry (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2032
> 2032            GF_FREE (tmp_container->parent_loc);
> (gdb) bt
> #0  gf_defrag_get_entry (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2032
> #1  gf_defrag_process_dir (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, migrate_data=0x7f2707874be8) at dht-rebalance.c:2207
> #2  0x00007f26fdae1eb8 in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbbfd0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2299
> #3  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc200, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2416
> #4  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc430, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2416
> #5  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc660, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2416
> #6  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbc890, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2416
> #7  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbcac0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2416
> #8  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbccf0, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2416
> #9  0x00007f26fdae1f4b in gf_defrag_fix_layout (this=0x7f26f8011180, defrag=0x7f26f8031ef0, loc=0x7f26f4dbcf60, fix_layout=0x7f2707874b5c, migrate_data=0x7f2707874be8) at dht-rebalance.c:2416
> #10 0x00007f26fdae2524 in gf_defrag_start_crawl (data=0x7f26f8011180) at dht-rebalance.c:2599
> #11 0x00007f2709024f62 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
> #12 0x0000003648c438f0 in ?? () from /lib64/libc-2.12.so
> #13 0x0000000000000000 in ?? ()
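For what it's worth, the faulting line frees a member of a container object, and the classic way a cleanup line like that segfaults is the container pointer being NULL (or already freed) when an error path reaches it. A minimal, hypothetical illustration of that pattern and its guard — the struct and function names here are made up, and this is not the actual dht-rebalance.c code:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for the real rebalance container. */
    struct entry_container {
        char *parent_loc;
    };

    static int process_entry(int fail_early)
    {
        struct entry_container *tmp_container = NULL;
        int ret = -1;

        if (fail_early)
            goto out;                /* cleanup runs before allocation */

        tmp_container = calloc(1, sizeof(*tmp_container));
        if (!tmp_container)
            goto out;
        tmp_container->parent_loc = strdup("/some/dir");
        ret = 0;

    out:
        /* An unguarded "free(tmp_container->parent_loc);" here would
         * dereference NULL on the early-exit path and SIGSEGV, which is
         * the shape of the crash in the backtrace above. Guarding the
         * container first avoids it: */
        if (tmp_container) {
            free(tmp_container->parent_loc);
            free(tmp_container);
        }
        return ret;
    }

    int main(void)
    {
        return process_entry(1) == -1 ? 0 : 1;  /* exercise the error path safely */
    }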
>
> On Fri, May 1, 2015 at 12:53 AM, Benjamin Turner <bennytu...@gmail.com> wrote:
>
>> OK, I have all my data created and I just started the rebalance. One
>> thing to note: in the client log I see the following spamming:
>>
>> [root@gqac006 ~]# cat /var/log/glusterfs/gluster-mount-.log | wc -l
>> 394042
>>
>> [2015-05-01 00:47:55.591150] I [MSGID: 109036] [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_006 with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 0 , Stop: 2141429669 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 2141429670 , Stop: 4294967295 ],
>> [2015-05-01 00:47:55.596147] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht: chunk size = 0xffffffff / 19920276 = 0xd7
>> [2015-05-01 00:47:55.596177] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-1
>> [2015-05-01 00:47:55.596189] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-0
>> [2015-05-01 00:47:55.597081] I [MSGID: 109036] [dht-common.c:6478:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /file_dstdir/gqac006.sbu.lab.eng.bos.redhat.com/thrd_05/d_001/d_000/d_004/d_005 with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2141429670 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2141429669 ],
>> [2015-05-01 00:47:55.601853] I [dht-selfheal.c:1587:dht_selfheal_layout_new_directory] 0-testvol-dht: chunk size = 0xffffffff / 19920276 = 0xd7
>> [2015-05-01 00:47:55.601882] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-1
>> [2015-05-01 00:47:55.601895] I [dht-selfheal.c:1626:dht_selfheal_layout_new_directory] 0-testvol-dht: assigning range size 0x7fa39fa6 to testvol-replicate-0
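As an aside, the chunk-size arithmetic in those log lines is internally consistent. A sketch of the calculation they report (numbers lifted from the log; this is not the actual dht_selfheal_layout_new_directory() code):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* From the log: chunk size = 0xffffffff / 19920276 = 0xd7 */
        uint64_t layout_size = 19920276;            /* combined weight of both subvols */
        uint32_t chunk = UINT32_MAX / layout_size;  /* hash-range units per weight unit */

        /* Two replicate subvols with equal weight, so each gets half: */
        uint32_t range = chunk * (uint32_t)(layout_size / 2);

        printf("chunk = 0x%x, per-subvol range = 0x%x (%u)\n", chunk, range, range);
        /* -> chunk = 0xd7, range = 0x7fa39fa6 (2141429670), matching the
         * "assigning range size" lines; the subvol holding the last chunk
         * gets its Stop padded up to 0xffffffff (4294967295) so the whole
         * 32-bit hash space is covered. */
        return 0;
    }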
>>
>> Just to confirm the patch is in: glusterfs-3.8dev-0.71.gita7f8482.el6.x86_64. Correct?
>>
>> Here is the info on the data set:
>>
>> hosts in test : ['gqac006.sbu.lab.eng.bos.redhat.com', 'gqas003.sbu.lab.eng.bos.redhat.com']
>> top test directory(s) : ['/gluster-mount']
>> operation : create
>> files/thread : 500000
>> threads : 8
>> record size (KB, 0 = maximum) : 0
>> file size (KB) : 64
>> file size distribution : fixed
>> files per dir : 100
>> dirs per dir : 10
>> total threads = 16
>> total files = 7222600
>> total data = 440.833 GB
>> 90.28% of requested files processed, minimum is 70.00
>> 8107.852862 sec elapsed time
>> 890.815377 files/sec
>> 890.815377 IOPS
>> 55.675961 MB/sec
>>
>> Here is the rebalance run after about 5 or so minutes:
>>
>> [root@gqas001 ~]# gluster v rebalance testvol status
>>                                   Node  Rebalanced-files     size  scanned  failures  skipped       status  run time in secs
>>                              localhost             32203    2.0GB   120858         0     5184  in progress           1294.00
>>     gqas011.sbu.lab.eng.bos.redhat.com                 0   0Bytes        0         0        0       failed              0.00
>>     gqas016.sbu.lab.eng.bos.redhat.com              9364  585.2MB    53121         0        0  in progress           1294.00
>>     gqas013.sbu.lab.eng.bos.redhat.com                 0   0Bytes    14750         0        0  in progress           1294.00
>>     gqas014.sbu.lab.eng.bos.redhat.com                 0   0Bytes        0         0        0       failed              0.00
>>     gqas015.sbu.lab.eng.bos.redhat.com                 0   0Bytes   196382         0        0  in progress           1294.00
>> volume rebalance: testvol: success:
>>
>> The hostnames are there if you want to poke around. I had a problem with
>> one of the added systems being on a different version of glusterfs, so I
>> had to update everything to glusterfs-3.8dev-0.99.git7d7b80e.el6.x86_64,
>> remove the bricks I had just added, and add them back. Something may have
>> gone wrong in that process, but I thought I did everything correctly. I'll
>> start fresh tomorrow; I figured I'd let this run overnight.
>>
>> -b
>>
>> On Wed, Apr 29, 2015 at 9:48 PM, Benjamin Turner <bennytu...@gmail.com> wrote:
>>
>>> Sweet! Here is the baseline:
>>>
>>> [root@gqas001 ~]# gluster v rebalance testvol status
>>>                                   Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
>>>                              localhost           1328575  81.1GB  9402953         0        0  completed          98500.00
>>>     gqas012.sbu.lab.eng.bos.redhat.com                 0  0Bytes  8000011         0        0  completed          51982.00
>>>     gqas003.sbu.lab.eng.bos.redhat.com                 0  0Bytes  8000011         0        0  completed          51982.00
>>>     gqas004.sbu.lab.eng.bos.redhat.com           1326290  81.0GB  9708625         0        0  completed          98500.00
>>>     gqas013.sbu.lab.eng.bos.redhat.com                 0  0Bytes  8000011         0        0  completed          51982.00
>>>     gqas014.sbu.lab.eng.bos.redhat.com                 0  0Bytes  8000011         0        0  completed          51982.00
>>> volume rebalance: testvol: success:
>>>
>>> I'll have a run on the patch started tomorrow.
>>>
>>> -b
>>>
>>> On Wed, Apr 29, 2015 at 12:51 PM, Nithya Balachandran <nbala...@redhat.com> wrote:
>>>
>>>> Doh, my mistake, I thought it was merged. I was just running with the
>>>> upstream 3.7 daily. Can I use this run as my baseline, and then run next
>>>> time on the patch to show the % improvement? I'll wipe everything and
>>>> try on the patch; any idea when it will be merged?
>>>>
>>>> Yes, it would be very useful to have this run as the baseline. The
>>>> patch has just been merged in master. It should be backported to 3.7 in
>>>> a day or so.
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> On Wed, Apr 22, 2015 at 1:10 AM, Nithya Balachandran <nbala...@redhat.com> wrote:
>>>>
>>>> That sounds great. Thanks.
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> ----- Original Message -----
>>>> From: "Benjamin Turner" <bennytu...@gmail.com>
>>>> To: "Nithya Balachandran" <nbala...@redhat.com>
>>>> Cc: "Susant Palai" <spa...@redhat.com>, "Gluster Devel" <gluster-devel@gluster.org>
>>>> Sent: Wednesday, 22 April, 2015 12:14:14 AM
>>>> Subject: Re: [Gluster-devel] Rebalance improvement design
>>>>
>>>> I am setting up a test env now, I'll have some feedback for you this week.
>>>>
>>>> -b
>>>>
>>>> On Tue, Apr 21, 2015 at 11:36 AM, Nithya Balachandran <nbala...@redhat.com> wrote:
>>>>
>>>> Hi Ben,
>>>>
>>>> Did you get a chance to try this out?
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> ----- Original Message -----
>>>> From: "Susant Palai" <spa...@redhat.com>
>>>> To: "Benjamin Turner" <bennytu...@gmail.com>
>>>> Cc: "Gluster Devel" <gluster-devel@gluster.org>
>>>> Sent: Monday, April 13, 2015 9:55:07 AM
>>>> Subject: Re: [Gluster-devel] Rebalance improvement design
>>>>
>>>> Hi Ben,
>>>> Uploaded a new patch here: http://review.gluster.org/#/c/9657/. We can start perf test on it. :)
>>>>
>>>> Susant
>>>>
>>>> ----- Original Message -----
>>>> From: "Susant Palai" <spa...@redhat.com>
>>>> To: "Benjamin Turner" <bennytu...@gmail.com>
>>>> Cc: "Gluster Devel" <gluster-devel@gluster.org>
>>>> Sent: Thursday, 9 April, 2015 3:40:09 PM
>>>> Subject: Re: [Gluster-devel] Rebalance improvement design
>>>>
>>>> Thanks Ben. RPM is not available and I am planning to refresh the patch
>>>> in two days with some more regression fixes. I think we can run the
>>>> tests post that. Any larger data set will be good (say 3 to 5 TB).
>>>>
>>>> Thanks,
>>>> Susant
>>>>
>>>> ----- Original Message -----
>>>> From: "Benjamin Turner" <bennytu...@gmail.com>
>>>> To: "Vijay Bellur" <vbel...@redhat.com>
>>>> Cc: "Susant Palai" <spa...@redhat.com>, "Gluster Devel" <gluster-devel@gluster.org>
>>>> Sent: Thursday, 9 April, 2015 2:10:30 AM
>>>> Subject: Re: [Gluster-devel] Rebalance improvement design
>>>>
>>>> I have some rebalance perf regression stuff I have been working on. Is
>>>> there an RPM with these patches anywhere so that I can try it on my
>>>> systems?
>>>> If not, I'll just build from:
>>>>
>>>> git fetch git://review.gluster.org/glusterfs refs/changes/57/9657/8 && git cherry-pick FETCH_HEAD
>>>>
>>>> I will have _at_least_ 10TB of storage; how many TBs of data should I run with?
>>>>
>>>> -b
>>>>
>>>> On Tue, Apr 7, 2015 at 9:07 AM, Vijay Bellur <vbel...@redhat.com> wrote:
>>>>
>>>> On 04/07/2015 03:08 PM, Susant Palai wrote:
>>>>
>>>> Here is one test performed on a 300GB data set, and around a 100%
>>>> improvement (half the run time) was seen.
>>>>
>>>> [root@gprfs031 ~]# gluster v i
>>>>
>>>> Volume Name: rbperf
>>>> Type: Distribute
>>>> Volume ID: 35562662-337e-4923-b862-d0bbb0748003
>>>> Status: Started
>>>> Number of Bricks: 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gprfs029-10ge:/bricks/gprfs029/brick1
>>>> Brick2: gprfs030-10ge:/bricks/gprfs030/brick1
>>>> Brick3: gprfs031-10ge:/bricks/gprfs031/brick1
>>>> Brick4: gprfs032-10ge:/bricks/gprfs032/brick1
>>>>
>>>> Added server 32 and started rebalance force.
>>>>
>>>> Rebalance stats for the new changes:
>>>> [root@gprfs031 ~]# gluster v rebalance rbperf status
>>>> Node           Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
>>>> localhost                 74639  36.1GB   297319         0        0  completed           1743.00
>>>> 172.17.40.30              67512  33.5GB   269187         0        0  completed           1395.00
>>>> gprfs029-10ge             79095  38.8GB   284105         0        0  completed           1559.00
>>>> gprfs032-10ge                 0  0Bytes        0         0        0  completed            402.00
>>>> volume rebalance: rbperf: success:
>>>>
>>>> Rebalance stats for the old model:
>>>> [root@gprfs031 ~]# gluster v rebalance rbperf status
>>>> Node           Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
>>>> localhost                 86493  42.0GB   634302         0        0  completed           3329.00
>>>> gprfs029-10ge             94115  46.2GB   687852         0        0  completed           3328.00
>>>> gprfs030-10ge             74314  35.9GB   651943         0        0  completed           3072.00
>>>> gprfs032-10ge                 0  0Bytes   594166        0         0  completed           1943.00
>>>> volume rebalance: rbperf: success:
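To put "around 100% improvement" in concrete terms: the wall-clock time of a rebalance is gated by its slowest node, so comparing those from the two tables above (a trivial check, not from the thread itself):

    #include <stdio.h>

    int main(void)
    {
        /* Slowest data-moving node in each run, from the tables above. */
        double new_model = 1743.0, old_model = 3329.0;

        printf("speedup: %.2fx, run time reduced by %.0f%%\n",
               old_model / new_model, 100.0 * (1.0 - new_model / old_model));
        /* -> 1.91x, i.e. about 48% less wall-clock time: roughly "half
         * the time", as described above. */
        return 0;
    }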
>>>>
>>>> This is interesting. Thanks for sharing & well done! Maybe we should
>>>> attempt a much larger data set and see how we fare there :).
>>>>
>>>> Regards,
>>>> Vijay
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel