Looks like you may be running into this bug: https://bugzilla.redhat.com/show_bug.cgi?id=905871

can you gdb to the client process (on the master) and give the backtrace?
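
Something along these lines should do it (replace <PID> with the actual pid of the glusterfs client process on the master; having the glusterfs debuginfo packages installed helps get usable symbols):

  gdb -p <PID>
  (gdb) thread apply all bt
  (gdb) detach
  (gdb) quit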

-venky

On Thursday 21 March 2013 12:28 AM, Samuli Heinonen wrote:
Dear all,

I'm running GlusterFS 3.4 alpha2 together with oVirt 3.2. This is solely a test 
system and it doesn't have much data or anything important in it. Currently it 
has only 2 VMs running and disk usage is around 15 GB. I have been trying to 
set up geo-replication for disaster recovery testing. For geo-replication I 
did the following:

All machines are running CentOS 6.4 and using GlusterFS packages from 
http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.4.0alpha2/EPEL.repo/.
The Gluster bricks are using XFS. On the slave I have tried ext4 and btrfs.

1. Installed the slave machine (a VM hosted in a separate environment) with 
glusterfs-geo-replication, rsync and some other packages pulled in as 
dependencies.
2. Installed the glusterfs-geo-replication and rsync packages on the GlusterFS server.
3. Created an ssh key on the server, saved it to 
/var/lib/glusterd/geo-replication/secret.pem and copied the public key to 
/root/.ssh/authorized_keys on the slave (a sketch of this is below the list).
4. On the server, ran:
- gluster volume geo-replication vmstorage slave:/backup/vmstorage config 
remote_gsyncd /usr/libexec/glusterfs/gsyncd
- gluster volume geo-replication vmstorage slave:/backup/vmstorage start
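
For step 3, the key setup was roughly along these lines (exact options may have differed, but the key is passwordless):

  ssh-keygen -t rsa -N '' -f /var/lib/glusterd/geo-replication/secret.pem
  ssh-copy-id -i /var/lib/glusterd/geo-replication/secret.pem.pub root@slave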

After that the geo-replication status was "starting…" for a while and then it switched 
to "N/A". I set the log level to DEBUG (commands sketched below the log excerpt) and saw lines like these appearing every 10 seconds:
[2013-03-20 18:48:19.417107] D [repce:175:push] RepceClient: call 
27756:140178941277952:1363798099.42 keep_alive(None,) ...
[2013-03-20 18:48:19.418431] D [repce:190:__call__] RepceClient: call 
27756:140178941277952:1363798099.42 keep_alive -> 34
[2013-03-20 18:48:29.427959] D [repce:175:push] RepceClient: call 
27756:140178941277952:1363798109.43 keep_alive(None,) ...
[2013-03-20 18:48:29.429172] D [repce:190:__call__] RepceClient: call 
27756:140178941277952:1363798109.43 keep_alive -> 35
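
For reference, the status was checked and the log level raised with commands along these lines (the exact config option name may differ):

  gluster volume geo-replication vmstorage slave:/backup/vmstorage status
  gluster volume geo-replication vmstorage slave:/backup/vmstorage config log-level DEBUG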

I thought that maybe it was creating an index or something like that, so I let it run for 
about 30 hours. Still, after that there were no new log messages and no data 
being transferred to the slave. I tried using strace -p 27756 to see what was going 
on, but there was no output at all. My next thought was that maybe the running 
virtual machines were causing some trouble, so I shut down all VMs and restarted 
geo-replication, but it didn't have any effect. My last effort was to create a new 
clean volume without any data in it and try geo-replication with it, but no luck 
there either.

I also did a quick test with a master running GlusterFS 3.3.1 and it had no 
problems copying data to exactly the same slave server.

There isn't much documentation available about geo-replication, and before 
filing a bug report I'd like to hear whether anyone else has used geo-replication 
successfully with 3.4 alpha, or if I'm missing something obvious.

Output of gluster volume info:
Volume Name: vmstorage
Type: Distributed-Replicate
Volume ID: a800e5b7-089e-4b55-9515-c9cc72502aea
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: mc1.ovirt.local:/gluster/brick0/vmstorage
Brick2: mc5.ovirt.local:/gluster/brick0/vmstorage
Brick3: mc1.ovirt.local:/gluster/brick1/vmstorage
Brick4: mc5.ovirt.local:/gluster/brick1/vmstorage
Options Reconfigured:
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.remote-dio: enable
geo-replication.indexing: on
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 10
nfs.disable: on

Best regards,
Samuli Heinonen

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
