Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2
Hi Matt,

If 'status' says 0 for everything, that's not good. Normally when I do a rebalance the numbers keep changing (going up), and the rebalance log shows files being moved around.

As for the errors: my (limited) experience with Gluster is that the 'W' warnings are normally harmless and show up quite a bit. For the actual 'E' error you could try playing with 'auth.allow', as suggested here:
http://gluster.org/pipermail/gluster-users/2011-November/009094.html

When rebalancing I normally count the files on the bricks and on the Gluster mount to make sure they eventually add up. I also grep for '-T' and watch that count go down while the 'rw' count goes up.

v
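[A rough sketch of those checks, assuming a brick at /export/brick1 and a client mount at /mnt/glustervol (both paths are made up for illustration); rebalance link files show up in 'ls -l' with only the sticky bit set, i.e. mode ---------T:]

  # file counts on each brick vs. the client mount -- they should
  # eventually add up once the rebalance has finished
  find /export/brick1 -type f | wc -l
  find /mnt/glustervol -type f | wc -l

  # link files left behind by rebalance (this count should go down) ...
  ls -lR /export/brick1 | grep -c '^---------T'

  # ... versus real files (this count should go up)
  ls -lR /export/brick1 | grep -c '^-rw'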
On Thu 27 Feb 2014 00:57:28, Matt Edwards wrote:
> Hopefully I'm not derailing this thread too far, but I have a related
> rebalance progress/speed issue.
>
> I have a rebalance process that's been running for 3-4 days. Is there a
> good way to see if it's running successfully, or might this be a sign of
> some problem? This is on a 4-node distribute setup with v3.4.2 and 45T
> of data.
>
> The *-rebalance.log has been silent since some informational messages
> when the rebalance started. There were a few initial warnings and errors
> that I observed, though:
>
>   E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0: SETVOLUME on remote-host failed: Authentication failed
>   W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4: failed to set the volume (Permission denied)
>   W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4: failed to get 'process-uuid' from reply dict
>   W [socket.c:514:__socket_rwv] 0-cluster2-client-3: readv failed (No data available)
>
> 'gluster volume status' reports that the rebalance is in progress, and
> the process listed in vols/volname/rebalance/hash.pid is still running
> on the server, but 'gluster volume rebalance volname status' reports 0
> for everything (files scanned or rebalanced, failures, run time).
>
> Thanks,
> Matt
>
> On Thu, Feb 27, 2014 at 12:39 AM, Shylesh Kumar shmo...@redhat.com wrote:
>> Hi Viktor,
>>
>> Lots of optimizations and improvements went in for 3.4, so it should be
>> faster than 3.2. Just to make sure what's happening, could you please
>> check the rebalance logs, which will be in
>> /var/log/glusterfs/volname-rebalance.log, and see whether there is any
>> progress?
>>
>> Thanks,
>> Shylesh
>>
>> Viktor Villafuerte wrote:
>>> Can anybody confirm/dispute that this is normal/abnormal?
>>>
>>> v
>>>
>>> On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:
>>>> Hi all,
>>>>
>>>> I have a distributed-replicated set with 2 servers (replicas) and am
>>>> trying to add another set of replicas: 1 x (1x1) => 2 x (1x1).
>>>>
>>>> I have about 23G of data which I copy onto the first replica, check
>>>> everything, then add the other set of replicas and eventually
>>>> rebalance: fix-layout, migrate-data.
>>>>
>>>> On Gluster v3.2.5 this took about 30 mins (rebalance + migrate-data).
>>>> On Gluster v3.4.2 this has been running for almost 4 hours and it's
>>>> still not finished.
>>>>
>>>> As I may have to do this in production, where the amount of data is
>>>> significantly larger than 23G, I'm looking at about three weeks of
>>>> waiting for a rebalance :)
>>>>
>>>> Now my question is whether this is as it's meant to be. I can see
>>>> that v3.4.2 gives me more info about the rebalance process etc., but
>>>> that surely cannot justify the enormous time difference. Is this
>>>> normal/expected behaviour? If so, I will have to stick with v3.2.5 as
>>>> it seems way quicker.
>>>>
>>>> Please let me know if there is any 'well known' option/way/secret to
>>>> speed the rebalance up on v3.4.2.
>>>>
>>>> thanks

--
Regards
Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
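[For reference, the add-brick-plus-rebalance sequence quoted above corresponds to roughly these commands (volume and brick names are made up for illustration; the separate fix-layout/migrate-data phases are 3.2.x syntax, which 3.4.x folds into a single 'start'):]

  # grow the replica-2 volume by one more replica pair
  gluster volume add-brick myvol server3:/export/brick1 server4:/export/brick1

  # v3.2.x: rebalance runs as two explicit phases
  gluster volume rebalance myvol fix-layout start
  gluster volume rebalance myvol migrate-data start

  # v3.4.x: a single start fixes the layout and then migrates data
  gluster volume rebalance myvol start

  # progress, in either version
  gluster volume rebalance myvol status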
Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2
Also, I should add here that I'm doing this on VMs. However, the rebalance with 3.2.5 was done on the same VMs.

v

On Thu 27 Feb 2014 17:16:55, Viktor Villafuerte wrote:
> Hi Shylesh,
>
> Yes, the log showed files being processed and eventually the rebalance
> completed (with skipped files), but it took much, much longer than with
> 3.2.5, which I tested initially.
>
> v
>
> On Thu 27 Feb 2014 11:09:56, Shylesh Kumar wrote:
>> Hi Viktor,
>>
>> Lots of optimizations and improvements went in for 3.4, so it should be
>> faster than 3.2. Just to make sure what's happening, could you please
>> check the rebalance logs, which will be in
>> /var/log/glusterfs/volname-rebalance.log, and see whether there is any
>> progress?
>>
>> Thanks,
>> Shylesh

--
Regards
Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2
I just got this error

  [2014-02-28 03:11:26.311077] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)

after I umounted the volume and stopped glusterd on a replicated setup (2 bricks, replicated). On one server I umount the volume (still ok), then 'gluster volume stop' (still ok); after 'service glusterd stop' this error comes up on the other brick (server).

Complete log:

  [r...@gluster07.uat ~]# service glusterd stop
  [  OK  ]

== log on gluster08.uat: /var/log/glusterfs/etc-glusterfs-glusterd.vol.log ==

  [2014-02-28 03:14:11.416945] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
  [2014-02-28 03:14:11.417160] W [socket.c:1962:__socket_proto_state_machine] 0-management: reading from socket failed. Error (No data available), peer (10.116.126.31:24007)
  [2014-02-28 03:14:12.989556] E [socket.c:2157:socket_connect_finish] 0-management: connection to 10.116.126.31:24007 failed (Connection refused)
  [2014-02-28 03:14:12.989702] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
  [2014-02-28 03:14:15.003420] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
  [2014-02-28 03:14:18.020311] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
  [2014-02-28 03:14:21.036216] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)

  [r...@gluster07.uat ~]# service glusterd start
  Starting glusterd: [  OK  ]

== log on gluster08.uat (continued) ==

  [2014-02-28 03:15:54.595801] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
  [2014-02-28 03:15:56.140337] I [glusterd-handshake.c:557:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 2
  [2014-02-28 03:15:56.170983] I [glusterd-handler.c:1956:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9
  [2014-02-28 03:15:57.613644] I [glusterd-handler.c:2987:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.116.126.32 (0), ret: 0
  [2014-02-28 03:15:57.635688] I [glusterd-sm.c:494:glusterd_ac_send_friend_update] 0-: Added uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9, host: gluster07.uat
  [2014-02-28 03:15:57.656542] I [glusterd-rpc-ops.c:542:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9
  [2014-02-28 03:15:57.687594] I [glusterd-rpc-ops.c:345:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9, host: gluster07.uat, port: 0
  [2014-02-28 03:15:57.712743] I [glusterd-handler.c:2118:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9
  [2014-02-28 03:15:57.713470] I [glusterd-handler.c:2163:__glusterd_handle_friend_update] 0-: Received uuid: c1d10b71-d118-4f4a-adc2-3cfbea13fd54, hostname:10.116.126.32
  [2014-02-28 03:15:57.713564] I [glusterd-handler.c:2172:__glusterd_handle_friend_update] 0-: Received my uuid as Friend

No error after this point as long as glusterd is running on both bricks, even when everything else is stopped.
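[The sequence above boils down to bouncing glusterd on one peer while watching the management log on the other; a minimal way to reproduce it, using the hostnames and stock log path shown in the log:]

  # on gluster08.uat: follow the management log
  tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

  # on gluster07.uat: the readv/connect warnings on the peer start here ...
  service glusterd stop

  # ... and stop again once the peers have re-handshaked
  service glusterd start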
On Thu 27 Feb 2014 21:54:46, Matt Edwards wrote:
> Hi Viktor,
>
> Thanks for the tips.

Everything I say here is more of a comment than a 'tip' :) as I'm still learning pretty much everything about Gluster.

> I'm a bit confused, since the clients mount the share fine, and 'gluster
> peer status' and 'gluster volume status all detail' are happy.

Rebalance is more difficult for the bricks. I've had situations before where files were left in the '-T' state on the bricks after the rebalance completed. That is clearly wrong to me, but the mount was ok; the files still existed on the initial replica..

> What is the expected output of 'rebalance status' for just a fix-layout
> run? I believe the last time I did that, the status was always 0s (which
> makes some sense, as files aren't moving) and the log was empty, but the
> operation seemed to complete successfully.
>
> Does a file rebalance first require a fix-layout operation internally,
> and is it possible that my volume is still in that phase? Or am I making
> up an overly optimistic scenario?

I've just tried 'fix-layout' only, and you're right, the result is all '0's. But the status is 'completed' and 'success':

  [r...@gluster08.uat ~]# gluster volume rebalance cdn-uat status
             Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
    -------------  ----------------  ------  -------  --------  -------  ---------  ----------------
        localhost                 0  0Bytes        0         0        0  completed             18.00
    gluster07.uat                 0  0Bytes        0         0        0  completed             18.00
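[If a fix-layout pass only rewrites directory layouts and moves no files, all-zero counters with state 'completed' would be plausible output. For comparison, the two invocations under discussion (volume name taken from the status output above):]

  # layout-only pass: directories get new hash ranges, no files move,
  # so 'status' can legitimately show zeros with state 'completed'
  gluster volume rebalance cdn-uat fix-layout start

  # full rebalance: fixes the layout and then migrates files
  gluster volume rebalance cdn-uat start

  # poll progress either way
  gluster volume rebalance cdn-uat status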
Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2
No, the amount of data is still the same and the files are identical.

I'm just running another rebalance now, with 3.4.2 packages that I've compiled myself on our build server, so I'll see if that's any different.

Also, since I'm at it: I'm looking at the 'rebalance status' output, and I would expect it to be the same on all the bricks (servers). However, the output is quite different on each.

On the 'primary' server, all 4 servers appear under 'Node': localhost and 3 hostnames. On its replica there are three entries for the primary plus one hostname that changes randomly (between the two hostnames of replica set 2):

  gluster08 (primary):
    Node: localhost, gluster07.uat, gluster01.uat, gluster02.uat

  gluster07 (replica with 08):
    Node: localhost, gluster08.uat, gluster08.uat, gluster08.uat, gluster02.uat

On the second replica set there are four 'localhost' entries plus one hostname that changes randomly (among the two hostnames of replica set 1 and its own replica's hostname):

  gluster01:
    Node: localhost, localhost, localhost, localhost, gluster08.uat

  gluster02 (replica with 01):
    Node: localhost, localhost, localhost, localhost, gluster01.uat

Is this right?

v

On Fri 28 Feb 2014 08:45:37, Vijay Bellur wrote:
> On 02/28/2014 07:04 AM, Viktor Villafuerte wrote:
>> Also I should add here that I'm doing this on VMs. However the
>> rebalance with 3.2.5 was done on the same VMs
>>
>> v
>
> Load on the VMs and the hypervisors hosting the VMs could also have a
> bearing. Algorithmically we are much better in 3.4.2 than in 3.2.5, and
> our practical experience also seems to corroborate that.
>
> Has the amount of data involved in rebalancing changed since the last
> time this test was run?
>
> -Vijay

--
Regards
Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
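[One way to line the four views up side by side for comparison (assumes root ssh access to all peers; the hostnames and volume name are the ones from this thread):]

  for h in gluster08.uat gluster07.uat gluster01.uat gluster02.uat; do
      echo "== $h =="
      ssh root@"$h" gluster volume rebalance cdn-uat status
  done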