Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2

2014-02-27 Thread Viktor Villafuerte
Hi Matt,

if the 'status' says 0 for everything, that's not good. Normally when I
do a rebalance the numbers should change (go up). Also the rebalance log
should show files being moved around.

For the errors - my (limited) experience with Gluster is that the 'W'
ones are normally harmless and they show up quite a bit. For the actual
error 'E' you could try to play with 'auth.allow' as suggested here

http://gluster.org/pipermail/gluster-users/2011-November/009094.html
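
A rough sketch of what that looks like from the CLI (the volume name and
subnet below are placeholders, not values from this thread):

# show the current setting, if any
gluster volume info VOLNAME | grep auth.allow

# restrict (or open up) access to a given network
gluster volume set VOLNAME auth.allow '10.0.0.*'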


Normally when rebalancing I do a count of files on the bricks and on the
Gluster mount to make sure they eventually add up. I also grep for and
count the '-T' link files and watch that count go down as the 'rw' count
goes up.
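
A sketch of those checks (brick and mount paths are placeholders):

# files on a brick (skipping the .glusterfs metadata dir) vs. on the mount
find /data/brick1 -path '*/.glusterfs' -prune -o -type f -print | wc -l
find /mnt/VOLNAME -type f | wc -l

# DHT link files on a brick show up in 'ls -l' with mode '---------T'
ls -lR /data/brick1 | grep -c -- '---------T'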

v




On Thu 27 Feb 2014 00:57:28, Matt Edwards wrote:
 Hopefully I'm not derailing this thread too far, but I have a related
 rebalance progress/speed issue.
 
 I have a rebalance process that's been running for 3-4 days.  Is
 there a good way to tell whether it's running successfully, or might this
 be a sign of some problem?
 
 This is on a 4-node distribute setup with v3.4.2 and 45T of data.
 
 The *-rebalance.log has been silent since some informational messages when
 the rebalance started.  There were a few initial warnings and errors that I
 observed, though:
 
 
 E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0:
 SETVOLUME on remote-host failed: Authentication failed
 
 W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4:
 failed to set the volume (Permission denied)
 
 W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4:
 failed to get 'process-uuid' from reply dict
 
 W [socket.c:514:__socket_rwv] 0-cluster2-client-3: readv failed (No data
 available)
 
 
 gluster volume status reports that the rebalance is in progress, the
 process listed in vols/volname/rebalance/hash.pid is still running on
 the server, but gluster volume rebalance volname status reports 0 for
 everything (files scanned or rebalanced, failures, run time).
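 
 For reference, those checks look roughly like this ('volname' is a
 placeholder, and the pid file path is the one mentioned above under the
 glusterd working directory):
 
 gluster volume status volname
 gluster volume rebalance volname status
 ps -fp $(cat /var/lib/glusterd/vols/volname/rebalance/*.pid)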
 
 Thanks,
 
 Matt
 
 
 On Thu, Feb 27, 2014 at 12:39 AM, Shylesh Kumar shmo...@redhat.com wrote:
 
  Hi Viktor,
 
  Lots of optimizations and improvements went into 3.4, so it should be
  faster than 3.2.
  Just to make sure of what's happening, could you please check the
  rebalance logs, which will be in
  /var/log/glusterfs/volname-rebalance.log, and see whether there is any
  progress?
 
  Thanks,
  Shylesh
 
 
  Viktor Villafuerte wrote:
 
  Can anybody confirm or dispute whether this is normal or abnormal?
 
  v
 
 
  On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:
 
  Hi all,
 
  I have a distributed replicated set with 2 servers (replicas) and am
  trying to add another set of replicas: 1 x (1x1) => 2 x (1x1)
 
  I have about 23G of data which I copy onto the first replica, check
  everything, then add the other set of replicas and eventually run
  rebalance fix-layout and migrate-data.
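  
  In commands this is roughly the following (a sketch using the 3.4-style
  sub-commands; exact syntax differs slightly between 3.2 and 3.4, and the
  peer names and brick paths below are placeholders):
  
  gluster volume add-brick VOLNAME server3:/data/brick server4:/data/brick
  gluster volume rebalance VOLNAME fix-layout start
  gluster volume rebalance VOLNAME start
  gluster volume rebalance VOLNAME status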
 
  Now on
 
  Gluster v3.2.5 this took about 30 mins (to rebalance + migrate-data)
 
  on
 
  Gluster v3.4.2 this has been running for almost 4 hours and it's still
  not finished
 
 
  As I may have to do this in production, where the amount of data is
  significantly larger than 23G, I'm looking at about three weeks of
  waiting for the rebalance :)
 
  Now my question is whether this is how it's meant to be. I can see that
  v3.4.2 gives me more info about the rebalance process etc., but that
  surely cannot justify the enormous time difference.
 
  Is this normal/expected behaviour? If so, I will have to stick with
  v3.2.5 as it seems way quicker.
 
  Please, let me know if there is any 'well known' option/way/secret to
  speed the rebalance up on v3.4.2.
 
 
  thanks
 
 
 
  --
  Regards
 
  Viktor Villafuerte
  Optus Internet Engineering
  t: 02 808-25265
 

-- 
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2

2014-02-27 Thread Viktor Villafuerte
Also I should add here that I'm doing this on VMs. However, the rebalance
with 3.2.5 was done on the same VMs.

v


On Thu 27 Feb 2014 17:16:55, Viktor Villafuerte wrote:
 Hi Shylesh,
 
 yes, the log shows files being processed and eventually the rebalance
 completed (with skipped files), but it took much, much longer than with
 3.2.5, which I tested initially.
 
 v
 
 
 On Thu 27 Feb 2014 11:09:56, Shylesh Kumar wrote:
  Hi Viktor,
  
  Lots of optimizations and improvements went into 3.4, so it should
  be faster than 3.2.
  Just to make sure of what's happening, could you please check the
  rebalance logs, which will be in
  /var/log/glusterfs/volname-rebalance.log, and see whether there is
  any progress?
  
  Thanks,
  Shylesh
  
  Viktor Villafuerte wrote:
  Can anybody confirm or dispute whether this is normal or abnormal?
  
  v
  
  
  On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:
  Hi all,
  
  I have a distributed replicated set with 2 servers (replicas) and am
  trying to add another set of replicas: 1 x (1x1) => 2 x (1x1)
  
  I have about 23G of data which I copy onto the first replica, check
  everything, then add the other set of replicas and eventually run
  rebalance fix-layout and migrate-data.
  
  Now on
  
  Gluster v3.2.5 this took about 30 mins (to rebalance + migrate-data)
  
  on
  
  Gluster v3.4.2 this has been running for almost 4 hours and it's still
  not finished
  
  
  As I may have to do this in production, where the amount of data is
  significantly larger than 23G, I'm looking at about three weeks of
  waiting for the rebalance :)
  
  Now my question is whether this is how it's meant to be. I can see that
  v3.4.2 gives me more info about the rebalance process etc., but that
  surely cannot justify the enormous time difference.
  
  Is this normal/expected behaviour? If so, I will have to stick with
  v3.2.5 as it seems way quicker.
  
  Please, let me know if there is any 'well known' option/way/secret to
  speed the rebalance up on v3.4.2.
  
  
  thanks
  
  
  
  -- 
  Regards
  
  Viktor Villafuerte
  Optus Internet Engineering
  t: 02 808-25265
 
 -- 
 Regards
 
 Viktor Villafuerte
 Optus Internet Engineering
 t: 02 808-25265

-- 
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2

2014-02-27 Thread Viktor Villafuerte
I just got this error

[2014-02-28 03:11:26.311077] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)

after I unmounted the volume and stopped glusterd on a replicated setup:
2 bricks replicated; on one server I unmounted the volume (still OK), ran
'gluster volume stop' (still OK), and after 'service glusterd stop' this
error came up on the other brick (server).
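
In commands, the sequence on the first server was roughly as follows (the
mount point and volume name are placeholders):

umount /mnt/VOLNAME
gluster volume stop VOLNAME
service glusterd stop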

complete log:

[r...@gluster07.uat ~]# service glusterd stop   [ OK ]

==> log on gluster08.uat
==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2014-02-28 03:14:11.416945] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)
[2014-02-28 03:14:11.417160] W
[socket.c:1962:__socket_proto_state_machine] 0-management: reading from
socket failed. Error (No data available), peer (10.116.126.31:24007)
[2014-02-28 03:14:12.989556] E [socket.c:2157:socket_connect_finish]
0-management: connection to 10.116.126.31:24007 failed (Connection
refused)
[2014-02-28 03:14:12.989702] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)
[2014-02-28 03:14:15.003420] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)
[2014-02-28 03:14:18.020311] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)
[2014-02-28 03:14:21.036216] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)


[r...@gluster07.uat ~]# service glusterd start
Starting glusterd: [  OK  ]


==> log on gluster08.uat (continued)
[2014-02-28 03:15:54.595801] W [socket.c:514:__socket_rwv] 0-management:
readv failed (No data available)
[2014-02-28 03:15:56.140337] I
[glusterd-handshake.c:557:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 2
[2014-02-28 03:15:56.170983] I
[glusterd-handler.c:1956:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid:
79a76cb4-1163-464f-84c4-19f2a39deee9
[2014-02-28 03:15:57.613644] I
[glusterd-handler.c:2987:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to 10.116.126.32 (0), ret: 0
[2014-02-28 03:15:57.635688] I
[glusterd-sm.c:494:glusterd_ac_send_friend_update] 0-: Added uuid:
79a76cb4-1163-464f-84c4-19f2a39deee9, host: gluster07.uat
[2014-02-28 03:15:57.656542] I
[glusterd-rpc-ops.c:542:__glusterd_friend_update_cbk] 0-management:
Received ACC from uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9
[2014-02-28 03:15:57.687594] I
[glusterd-rpc-ops.c:345:__glusterd_friend_add_cbk] 0-glusterd: Received
ACC from uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9, host:
gluster07.uat, port: 0
[2014-02-28 03:15:57.712743] I
[glusterd-handler.c:2118:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 79a76cb4-1163-464f-84c4-19f2a39deee9
[2014-02-28 03:15:57.713470] I
[glusterd-handler.c:2163:__glusterd_handle_friend_update] 0-: Received
uuid: c1d10b71-d118-4f4a-adc2-3cfbea13fd54, hostname:10.116.126.32
[2014-02-28 03:15:57.713564] I
[glusterd-handler.c:2172:__glusterd_handle_friend_update] 0-: Received
my uuid as Friend


No errors after this point as long as glusterd is running on both bricks,
even when everything else is stopped.



On Thu 27 Feb 2014 21:54:46, Matt Edwards wrote:
 Hi Viktor,
 
 Thanks for the tips.

Everything I say here is more of a comment than a 'tip' :) as I'm still
learning pretty much everything about Gluster.

 I'm a bit confused, since the clients mount the share
 fine, and 'gluster peer status' and 'gluster volume status all detail' are
 happy.

Rebalance is more difficult for the bricks. I've had a situation before
where I had files left in the '-T' state after the rebalance completed (on
the bricks). That is clearly wrong to me, but the mount was OK. The files
still existed on the initial replica.

 
 What is the expected output of rebalance status for just a fix-layout
 run?  I believe the last time I did that, the status was always 0s (which
 makes some sense, as files aren't moving) and the log was empty, but the
 operation seemed to complete successfully.  Does a file rebalance first
 require a fix-layout operation internally, and is it possible that my
 volume is still in that phase?  Or am I making up an overly optimistic
 scenario?

I've just tried 'fix-layout' only and you're right, the result is all
'0's. But the status is 'completed' and 'success'.
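
The command was along these lines, for reference:

gluster volume rebalance cdn-uat fix-layout start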

[r...@gluster08.uat ~]# gluster volume rebalance cdn-uat status
            Node   Rebalanced-files     size   scanned   failures   skipped      status   run time in secs
   -------------   ----------------   ------   -------   --------   -------   ---------   ----------------
       localhost                  0   0Bytes         0          0         0   completed              18.00
   gluster07.uat                  0   0Bytes         0          0         0   completed              18.00

Re: [Gluster-users] Rebalance times in 3.2.5 vs 3.4.2

2014-02-27 Thread Viktor Villafuerte
No, the amount of data is still the same and the files are identical.
I'm just running another rebalance now, with 3.4.2 packages that I've
compiled myself on our build server. So I'll see if that's any
different.

Also, while I'm at it: I'm looking at the 'rebalance status' output and I
would expect it to be the same on all the bricks (servers). However, the
output is quite different.

On the 'primary' server, under 'Node', all 4 servers are listed: localhost
and 3 hostnames. On its replica there are 3 entries for the primary plus
one hostname that changes randomly (one of the two hostnames from replica
pair 2).

gluster08 - primary
Node
   -
   localhost
gluster07.uat
gluster01.uat
gluster02.uat


gluster07 - replica with 08
Node
   -
   localhost
gluster08.uat
gluster08.uat
gluster08.uat
gluster02.uat


On the second replica pair there are 4 'localhost' entries plus one
hostname that changes randomly (one of the two hostnames from replica pair
1, or its own replica partner's hostname).

gluster01
Node
   -
   localhost
   localhost
   localhost
   localhost
gluster08.uat


gluster02 - replica with 01
Node
   -
   localhost
   localhost
   localhost
   localhost
gluster01.uat


Is this right?

v



On Fri 28 Feb 2014 08:45:37, Vijay Bellur wrote:
 On 02/28/2014 07:04 AM, Viktor Villafuerte wrote:
 Also I should add here that I'm doing this on VMs. However, the rebalance
 with 3.2.5 was done on the same VMs.
 
 v
 
 
 Load on the VMs and on the hypervisors hosting the VMs could also have a
 bearing. Algorithmically, 3.4.2 is much better than 3.2.5, and our
 practical experience also seems to corroborate that.
 
 Has the amount of data involved in rebalancing changed since the
 last time this test was run?
 
 -Vijay
 
 
 
 On Thu 27 Feb 2014 17:16:55, Viktor Villafuerte wrote:
 Hi Shylesh,
 
 yes, the log shows files being processed and eventually the rebalance
 completed (with skipped files), but it took much, much longer than with
 3.2.5, which I tested initially.
 
 v
 
 

-- 
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users