[Gluster-users] Replication delay

2014-01-24 Thread Fabio Rosati
Hi All, 

in a distributed-replicated volume hosting some VM disk images (GlusterFS 
3.4.2 on CentOS 6.5, qemu-kvm with GlusterFS native support, no fuse mount), I 
always see the same two files in need of healing: 

[root@networker ~]# gluster volume heal gv_pri info 
Gathering Heal info on volume gv_pri has been successful 

Brick nw1glus.gem.local:/glustexp/pri1/brick 
Number of entries: 2 
/alfresco.qc2 
/remlog.qc2 

Brick nw2glus.gem.local:/glustexp/pri1/brick 
Number of entries: 2 
/alfresco.qc2 
/remlog.qc2 

Brick nw3glus.gem.local:/glustexp/pri2/brick 
Number of entries: 0 

Brick nw4glus.gem.local:/glustexp/pri2/brick 
Number of entries: 0 

This is not a split-brain situation (I checked) and if I stop the two VMs that 
use these images, the two files get healed/synced in about 15 min. This is too 
much time, IMHO. 
Other VMs with (smaller) disk images are replicated on the same bricks and 
they get synced in real time. 

These are the volume's details; the host networker is nw1glus.gem.local: 

[root@networker ~]# gluster volume info gv_pri 

Volume Name: gv_pri 
Type: Distributed-Replicate 
Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28 
Status: Started 
Number of Bricks: 2 x 2 = 4 
Transport-type: tcp 
Bricks: 
Brick1: nw1glus.gem.local:/glustexp/pri1/brick 
Brick2: nw2glus.gem.local:/glustexp/pri1/brick 
Brick3: nw3glus.gem.local:/glustexp/pri2/brick 
Brick4: nw4glus.gem.local:/glustexp/pri2/brick 
Options Reconfigured: 
server.allow-insecure: on 
storage.owner-uid: 107 
storage.owner-gid: 107 

[root@networker ~]# gluster volume status gv_pri detail 


Status of volume: gv_pri 
-- 
Brick : Brick nw1glus.gem.local:/glustexp/pri1/brick 
Port : 50178 
Online : Y 
Pid : 25721 
File System : xfs 
Device : /dev/mapper/vg_guests-lv_brick1 
Mount Options : rw,noatime 
Inode Size : 512 
Disk Space Free : 168.4GB 
Total Disk Space : 194.9GB 
Inode Count : 102236160 
Free Inodes : 102236130 
-- 
Brick : Brick nw2glus.gem.local:/glustexp/pri1/brick 
Port : 50178 
Online : Y 
Pid : 27832 
File System : xfs 
Device : /dev/mapper/vg_guests-lv_brick1 
Mount Options : rw,noatime 
Inode Size : 512 
Disk Space Free : 168.4GB 
Total Disk Space : 194.9GB 
Inode Count : 102236160 
Free Inodes : 102236130 
-- 
Brick : Brick nw3glus.gem.local:/glustexp/pri2/brick 
Port : 50182 
Online : Y 
Pid : 14571 
File System : xfs 
Device : /dev/mapper/vg_guests-lv_brick2 
Mount Options : rw,noatime 
Inode Size : 512 
Disk Space Free : 418.3GB 
Total Disk Space : 433.8GB 
Inode Count : 227540992 
Free Inodes : 227540973 
-- 
Brick : Brick nw4glus.gem.local:/glustexp/pri2/brick 
Port : 50181 
Online : Y 
Pid : 21942 
File System : xfs 
Device : /dev/mapper/vg_guests-lv_brick2 
Mount Options : rw,noatime 
Inode Size : 512 
Disk Space Free : 418.3GB 
Total Disk Space : 433.8GB 
Inode Count : 227540992 
Free Inodes : 227540973 

fuse-mount of the gv_pri volume: 

[root@networker ~]# ll -h /mnt/gluspri/ 
totale 37G 
-rw-------. 1 qemu qemu 7,7G 24 gen 10:21 alfresco.qc2 
-rw-------. 1 qemu qemu 4,2G 24 gen 10:22 check_mk-salmo.qc2 
-rw-------. 1 qemu qemu  27M 23 gen 16:42 newnxserver.qc2 
-rw-------. 1 qemu qemu 1,1G 23 gen 13:38 newubutest1.qc2 
-rw-------. 1 qemu qemu  11G 24 gen 10:17 nxserver.qc2 
-rw-------. 1 qemu qemu 8,1G 24 gen 10:17 remlog.qc2 
-rw-------. 1 qemu qemu 5,6G 24 gen 10:19 ubutest1.qc2 

Do you think this is the expected behaviour, maybe due to caching? What if the 
most up-to-date node goes down while the VMs are running? 

Thanks a lot, 

Fabio Rosati 

Re: [Gluster-users] Replication delay

2014-01-24 Thread Pranith Kumar Karampuri
Hi Fabio,
 This is a known issue that has been addressed on master. It may be 
backported to 3.5. When a file is undergoing changes, it may appear in 'gluster 
volume heal <volname> info' output even when it doesn't need any self-heal.
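
A transient entry can be told apart from a real pending heal by sampling the 
output repeatedly while the VMs are quiesced; a sketch, reusing the volume name 
from your setup:

watch -n 10 'gluster volume heal gv_pri info'

Entries that persist after I/O has stopped are genuine pending heals.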

Pranith

- Original Message -
 From: Fabio Rosati <fabio.ros...@geminformatica.it>
 To: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 3:17:27 PM
 Subject: [Gluster-users] Replication delay
 
 [...]


Re: [Gluster-users] Replication delay

2014-01-24 Thread Pranith Kumar Karampuri


- Original Message -
 From: Pranith Kumar Karampuri <pkara...@redhat.com>
 To: Fabio Rosati <fabio.ros...@geminformatica.it>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 3:29:19 PM
 Subject: Re: [Gluster-users] Replication delay
 
 Hi Fabio,
  This is a known issue that has been addressed on master. It may be
  backported to 3.5. When a file is undergoing changes, it may appear in
  'gluster volume heal <volname> info' output even when it doesn't need
  any self-heal.
 
 Pranith

Sorry, I just saw that there is a self-heal happening for 15 minutes when you 
stop the VMs. How are you checking that the self-heal is happening?

Pranith
 
 - Original Message -
  From: Fabio Rosati <fabio.ros...@geminformatica.it>
  To: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
  Sent: Friday, January 24, 2014 3:17:27 PM
  Subject: [Gluster-users] Replication delay
  
  [...]

Re: [Gluster-users] gluster and kvm livemigration

2014-01-24 Thread Bernhard Glomm

samuli wrote: 
  Can you try to set storage.owner-uid and storage.owner-gid to 
  libvirt-qemu? To do that you have to stop the volume.

hi samuli, hi all 

I tried setting storage.owner-uid and storage.owner-gid to libvirt-qemu, as 
suggested, but with the same effect: during live migration the ownership of the 
image file changes from libvirt-qemu/kvm to root/root.

root@pong[/5]:~ # gluster volume info glfs_atom01 

Volume Name: glfs_atom01
Type: Replicate
Volume ID: f28f0f62-37b3-4b10-8e86-9b373f4c0e75
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.24.1.11:/ecopool/fs_atom01
Brick2: 172.24.1.13:/ecopool/fs_atom01
Options Reconfigured:
storage.owner-gid: 104
storage.owner-uid: 107
network.remote-dio: enable
This is tree -pfungiA output of the path where my images live; atom01 is running:

[-rw------- libvirt- kvm     ]  /srv/vms/mnt_atom01/atom01.img
[drwxr-xr-x libvirt- kvm     ]  /srv/vms/mnt_atom02
[-rw------- root     root    ]  /srv/vms/mnt_atom02/atom02.img
[drwxr-xr-x libvirt- kvm     ]  /srv/vms/mnt_atom03
Now I migrate through VirtualMachineManager and, watching tree, I see the 
permissions change to:

[drwxr-xr-x libvirt- kvm     ]  /srv/vms/mnt_atom01
[-rw------- root     root    ]  /srv/vms/mnt_atom01/atom01.img
[drwxr-xr-x libvirt- kvm     ]  /srv/vms/mnt_atom02
[-rw------- root     root    ]  /srv/vms/mnt_atom02/atom02.img
From inside atom01 (the VM) the filesystem becomes read-only. But in contrast 
to http://epboven.home.xs4all.nl/gluster-migrate.html
I can still read all files and checksum them, just with no write access. From 
outside, the image file behaves as Paul described: as long as the machine is 
running I can't read the file.
root@pong[/5]:~ # virsh list
 Id    Name                           State
----------------------------------------------------
 6     atom01                         running

root@pong[/5]:~ # l /srv/vms/mnt_atom01/atom01.img
-rw------- 1 root root 10G Jan 24 10:20 /srv/vms/mnt_atom01/atom01.img
root@pong[/5]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: writable, regular file, no read permission
root@pong[/5]:~ # md5sum /srv/vms/mnt_atom01/atom01.img
md5sum: /srv/vms/mnt_atom01/atom01.img: Permission denied
root@pong[/5]:~ # virsh destroy atom01
Domain atom01 destroyed

root@pong[/5]:~ # l /srv/vms/mnt_atom01/atom01.img
-rw------- 1 root root 10G Jan 24 10:20 /srv/vms/mnt_atom01/atom01.img
root@pong[/5]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: x86 boot sector; partition 1: ID=0x83, 
starthead 1, startsector 63, 16777165 sectors; partition 2: ID=0xf, starthead 
254, startsector 16777228, 1677718 sectors, code offset 0x63
root@pong[/5]:~ # md5sum /srv/vms/mnt_atom01/atom01.img
9d048558deb46fef7b24e8895711c554  /srv/vms/mnt_atom01/atom01.img
root@pong[/5]:~ # 

But interestingly, the source of the migration can access the file after 
migration has completed. Like so: start atom01 on host ping, migrate it to pong:

root@pong[/8]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: writable, regular file, no read permission

root@ping[/5]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: x86 boot sector; partition 1: ID=0x83, 
starthead 1, startsector 63, 16777165 sectors; partition 2: ID=0xf, starthead 
254, startsector 16777228, 1677718 sectors, code offset 0x63
100% reproducible 
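
For anyone trying to reproduce: a minimal sketch (the qemu+ssh migration URI 
is an assumption; adjust to your libvirt setup):

# terminal 1: watch ownership on the shared image
watch -n 1 "stat -c '%U:%G %a %n' /srv/vms/mnt_atom01/atom01.img"

# terminal 2: live-migrate the guest from ping to pong
virsh migrate --live atom01 qemu+ssh://pong/system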
Regards
Bernhard
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replication delay

2014-01-24 Thread Fabio Rosati



- Original Message -
 From: Pranith Kumar Karampuri <pkara...@redhat.com>
 To: Fabio Rosati <fabio.ros...@geminformatica.it>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 11:02:15 AM
 Subject: Re: [Gluster-users] Replication delay
 
 
 
 - Original Message -
  From: Pranith Kumar Karampuri pkara...@redhat.com
  To: Fabio Rosati fabio.ros...@geminformatica.it
  Cc: Gluster-users@gluster.org List gluster-users@gluster.org
  Sent: Friday, January 24, 2014 3:29:19 PM
  Subject: Re: [Gluster-users] Replication delay
  
  Hi Fabio,
   This is a known issue that has been addressed on master. It may be
   backported to 3.5. When a file is undergoing changes, it may appear in
   'gluster volume heal <volname> info' output even when it doesn't need
   any self-heal.
  
  Pranith
 
 Sorry, I just saw that there is a self-heal happening for 15 minutes when you
 stop the VMs. How are you checking that the self-heal is happening?

When I stop the VM for alfresco.qc2, heal info still reports alfresco.qc2 as 
in need of healing for about 15 min.
It seems this is a real out-of-sync situation because, if I check the two bricks, 
I get different modification times up until they are healed (no longer reported 
by heal info). This is the bricks' status for alfresco.qc2 while the VM is 
halted:

[root@networker ~]# ll /glustexp/pri1/brick/
totale 27769492
-rw-------. 2 qemu qemu 8212709376 24 gen 11:16 alfresco.qc2
[...]

[root@networker2 ~]# ll /glustexp/pri1/brick/
totale 27769384
-rw-------. 2 qemu qemu 8212709376 24 gen 11:05 alfresco.qc2
[...]

Bricks' status AFTER heal info doesn't report alfresco.qc2 anymore:

[root@networker ~]# ll /glustexp/pri1/brick/
totale 27769492
-rw-------. 2 qemu qemu 8212709376 24 gen 11:05 alfresco.qc2

[root@networker2 ~]# ll /glustexp/pri1/brick/
totale 27769384
-rw-------. 2 qemu qemu 8212709376 24 gen 11:05 alfresco.qc2
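
A quick way to compare the two copies side by side from one host; a sketch, 
assuming passwordless root ssh to both peers:

for h in nw1glus.gem.local nw2glus.gem.local; do
    ssh $h "stat -c '%y %s %n' /glustexp/pri1/brick/alfresco.qc2"
done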

Thanks for helping!

Fabio

 
 Pranith
  
  - Original Message -
   From: Fabio Rosati <fabio.ros...@geminformatica.it>
   To: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
   Sent: Friday, January 24, 2014 3:17:27 PM
   Subject: [Gluster-users] Replication delay
   [...]
Re: [Gluster-users] Replication delay

2014-01-24 Thread Pranith Kumar Karampuri
This time, when you stop the VM, could you get the output of getfattr -d -m. -e 
hex <file-path-on-brick> on both the bricks, to debug further?

Pranith
- Original Message -
 From: Fabio Rosati <fabio.ros...@geminformatica.it>
 To: Pranith Kumar Karampuri <pkara...@redhat.com>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 3:58:38 PM
 Subject: Re: [Gluster-users] Replication delay
 
 [...]

Re: [Gluster-users] Replication delay

2014-01-24 Thread Fabio Rosati
Ok, that's the output after the VM has been halted:

[root@networker ~]# getfattr -d -m. -e hex /glustexp/pri1/brick/alfresco.qc2 
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x0139
trusted.afr.gv_pri-client-1=0x
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe

[root@networker2 ~]# getfattr -d -m. -e hex /glustexp/pri1/brick/alfresco.qc2
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x0139
trusted.afr.gv_pri-client-1=0x
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe


When heal info stops reporting alfresco.qc2 I get:

[root@networker glusterfs]# getfattr -d -m. -e hex 
/glustexp/pri1/brick/alfresco.qc2 
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x
trusted.afr.gv_pri-client-1=0x
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe

[root@networker2 ~]# getfattr -d -m. -e hex /glustexp/pri1/brick/alfresco.qc2
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x
trusted.afr.gv_pri-client-1=0x
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe
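
Side note on reading these: the trusted.afr.<volname>-client-<N> value packs 
three 32-bit counters (data, metadata and entry pending operations, in that 
order); the zero runs in the values above appear collapsed by the list archive. 
A hypothetical fully-expanded value would split like this:

# trusted.afr.gv_pri-client-0=0x000001390000000000000000   (hypothetical)
#   data     = 0x00000139  -> 313 pending data operations against client-0
#   metadata = 0x00000000
#   entry    = 0x00000000

A non-zero data counter for client-0 on both bricks means both replicas blame 
the first brick.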


Fabio

- Original Message -
 From: Pranith Kumar Karampuri <pkara...@redhat.com>
 To: Fabio Rosati <fabio.ros...@geminformatica.it>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 11:36:12 AM
 Subject: Re: [Gluster-users] Replication delay
 
 This time, when you stop the VM, could you get the output of getfattr -d -m.
 -e hex <file-path-on-brick> on both the bricks, to debug further?
 
 Pranith
  - Original Message -
   From: Fabio Rosati <fabio.ros...@geminformatica.it>
   To: Pranith Kumar Karampuri <pkara...@redhat.com>
   Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
   Sent: Friday, January 24, 2014 3:58:38 PM
   Subject: Re: [Gluster-users] Replication delay
   [...]

Re: [Gluster-users] Replication delay

2014-01-24 Thread Pranith Kumar Karampuri
Fabio,
  It seems writes on the first brick of this replica pair are failing from 
the mount. Could you check both the client and brick logs to see where these 
failures are coming from?
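
By default the brick log name mirrors the brick path, so something like this 
should surface the recent errors (a sketch; the log filename is inferred from 
the brick path above):

grep ' E ' /var/log/glusterfs/bricks/glustexp-pri1-brick.log | tail -20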

Pranith
- Original Message -
 From: Fabio Rosati <fabio.ros...@geminformatica.it>
 To: Pranith Kumar Karampuri <pkara...@redhat.com>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 4:50:52 PM
 Subject: Re: [Gluster-users] Replication delay
 
 [...]

Re: [Gluster-users] Replication delay

2014-01-24 Thread Fabio Rosati
You're right! In the brick log from the first peer (networker, AKA 
nw1glus.gem.local) I found lots of these errors:

[2014-01-24 11:32:28.482639] E [posix.c:2135:posix_writev] 0-gv_pri-posix: 
write failed: offset 4812114432, Invalid argument
[2014-01-24 11:32:28.485334] I [server-rpc-fops.c:1439:server_writev_cbk] 
0-gv_pri-server: 31817: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> 
(Invalid argument)
[2014-01-24 11:32:28.483791] E [posix.c:2135:posix_writev] 0-gv_pri-posix: 
write failed: offset 5562239488, Invalid argument
[2014-01-24 11:32:28.485416] I [server-rpc-fops.c:1439:server_writev_cbk] 
0-gv_pri-server: 31820: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> 
(Invalid argument)
[2014-01-24 11:32:28.484275] E [posix.c:2135:posix_writev] 0-gv_pri-posix: 
write failed: offset 5757467136, Invalid argument
[2014-01-24 11:32:28.482841] E [posix.c:2135:posix_writev] 0-gv_pri-posix: 
write failed: offset 3742501376, Invalid argument
[2014-01-24 11:32:28.485494] I [server-rpc-fops.c:1439:server_writev_cbk] 
0-gv_pri-server: 31822: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> 
(Invalid argument)
[2014-01-24 11:32:28.485534] I [server-rpc-fops.c:1439:server_writev_cbk] 
0-gv_pri-server: 31818: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> 
(Invalid argument)
[2014-01-24 11:32:28.530943] E [posix.c:2135:posix_writev] 0-gv_pri-posix: 
write failed: offset 3156122112, Invalid argument
[2014-01-24 11:32:28.530997] I [server-rpc-fops.c:1439:server_writev_cbk] 
0-gv_pri-server: 31832: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> 
(Invalid argument)

Then I noticed the SELinux contexts on the two bricks are different; I don't 
know if this can be the cause of the errors:

[root@networker gluspri]# ll -Z /glustexp/pri1/brick/
-rw-------. qemu qemu system_u:object_r:file_t:s0  alfresco.qc2

[root@networker2 ~]# ll -Z /glustexp/pri1/brick/
-rw-------. qemu qemu unconfined_u:object_r:file_t:s0  alfresco.qc2
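
If SELinux were the suspect, resetting the labels to the policy default and 
checking for recent denials would look something like this (a sketch):

restorecon -Rv /glustexp/pri1/brick
ausearch -m avc -ts today | tail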


Fabio

- Original Message -
 From: Pranith Kumar Karampuri <pkara...@redhat.com>
 To: Fabio Rosati <fabio.ros...@geminformatica.it>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 12:27:56 PM
 Subject: Re: [Gluster-users] Replication delay
 
 Fabio,
   It seems writes on the first brick of this replica pair are failing 
   from the mount. Could you check both the client and brick logs to 
   see where these failures are coming from?
 
 Pranith
  - Original Message -
   From: Fabio Rosati <fabio.ros...@geminformatica.it>
   To: Pranith Kumar Karampuri <pkara...@redhat.com>
   Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
   Sent: Friday, January 24, 2014 4:50:52 PM
   Subject: Re: [Gluster-users] Replication delay
   [...]

Re: [Gluster-users] Replication delay

2014-01-24 Thread Pranith Kumar Karampuri
Fabio,
 It has nothing to do with SELinux, IMO. You were saying the self-heal happens 
when the VM is paused; that means writes from the self-heal's fd are succeeding. 
So something happened to the fd that kvm uses to write to that VM's image. I 
wonder what! When did you start getting this problem? What happened at that time?

Pranith

- Original Message -
 From: Fabio Rosati <fabio.ros...@geminformatica.it>
 To: Pranith Kumar Karampuri <pkara...@redhat.com>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 5:09:25 PM
 Subject: Re: [Gluster-users] Replication delay
 
 [...]

Re: [Gluster-users] Replication delay

2014-01-24 Thread Vijay Bellur

On 01/24/2014 05:09 PM, Fabio Rosati wrote:

You're right! In the brick log from the first peer (networker AKA 
nw1glus.gem.local) I found lots of these errors:

[...]

Then I noticed the SELinux contexts on the two bricks are different; I don't 
know if this can be the cause of the errors:
Might be related to the sector size in xfs and the disks that are being 
used. [1] and [2] have some details.


-Vijay

[1] http://www.gluster.org/pipermail/gluster-users/2013-November/037842.html

[2] https://bugzilla.redhat.com/show_bug.cgi?id=997839
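
Checking the sector sizes involved would look something like this (a sketch, 
using the device names posted earlier in the thread):

xfs_info /glustexp/pri1 | grep sectsz
blockdev --getss --getpbsz /dev/mapper/vg_guests-lv_brick1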





Re: [Gluster-users] gluster and kvm livemigration

2014-01-24 Thread Bernhard Glomm

I submitted Bug 1057645: https://bugzilla.redhat.com/show_bug.cgi?id=1057645


Bernhard
On 24.01.2014 11:07:49, Bernhard Glomm wrote:
 [...]



-- 
Bernhard Glomm
IT Administration

Phone: +49 (30) 86880 134
Fax:   +49 (30) 86880 100
Skype: bernhard.glomm.ecologic

Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH

Re: [Gluster-users] Replication delay

2014-01-24 Thread Fabio Rosati
Pranith and Vijay,

the problems began when I started to use alfresco.qc2 and other disk images 
on the gv_pri volume, backed by XFS on LVM partitions. I copied these images 
from another GlusterFS volume (backed by ext4, no LVM partitions) where they 
worked as expected. The VMs run on the same hosts, so the qemu-kvm version is 
the same.

Here are the details of a brick from the gv_pri (new and problematic) volume:

[root@networker bricks]# xfs_info /glustexp/pri1
meta-data=/dev/mapper/vg_guests-lv_brick1 isize=512    agcount=16, agsize=3194880 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=51118080, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=24960, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


This is a brick partition from the gv_sec (old and working properly) volume:

[root@networker2 bricks]# dumpe2fs -h /dev/sda1
dumpe2fs 1.41.12 (17-May-2010)
Filesystem volume name:   <none>
Last mounted on:  /glustexp/sec2
Filesystem UUID:  87678a0d-aef6-403c-930a-a9b2b4cb7c37
Filesystem magic number:  0xEF53
Filesystem revision #:1 (dynamic)
Filesystem features:  has_journal ext_attr resize_inode dir_index filetype 
needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg 
dir_nlink extra_isize
Filesystem flags: signed_directory_hash 
Default mount options:(none)
Filesystem state: clean
Errors behavior:  Continue
Filesystem OS type:   Linux
Inode count:  9773056
Block count:  39072718
Reserved block count: 1953635
Free blocks:  36406615
Free inodes:  9772982
First block:  0
Block size:   4096
Fragment size:4096
Reserved GDT blocks:  1014
Blocks per group: 32768
Fragments per group:  32768
Inodes per group: 8192
Inode blocks per group:   512
Flex block group size:16
Filesystem created:   Wed Dec 18 10:03:39 2013
Last mount time:  Thu Jan  9 23:03:24 2014
Last write time:  Thu Jan  9 23:03:24 2014
Mount count:  2
Maximum mount count:  39
Last checked: Wed Dec 18 10:03:39 2013
Check interval:   15552000 (6 months)
Next check after: Mon Jun 16 11:03:39 2014
Lifetime writes:  189 GB
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
First inode:  11
Inode size:   256
Required extra isize: 28
Desired extra isize:  28
Journal inode:8
First orphan inode:   917534
Default directory hash:   half_md4
Directory Hash Seed:  4891a3c2-8e00-45a3-ac6b-ea96de069b38
Journal backup:   inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length:   32768
Journal sequence: 0x0015bae4
Journal start:31215


The block size is the same, 4096 bytes.
I did some more investigation and it seems the problem happens only with VM 
disk images internally formatted with a block size of 1024 bytes. There are no 
problems with disk images formatted with a block size of 4096 bytes. Anyway, I 
don't know if this is a coincidence.

Do you think this could be the origin of the problem? If so, how can I solve it?
In the links posted by Vijay, someone suggests starting the VM with cache != 
none, but this will prevent live migration, AFAIK.
Another solution may be to recreate the volume, backing it with XFS partitions 
formatted with a different block size (smaller? 1024 bytes?). This would be a 
painful option, but if it solves the problem, I'll go for it.


Thanks a lot,

Fabio


- Original Message -
From: Pranith Kumar Karampuri <pkara...@redhat.com>
To: Fabio Rosati <fabio.ros...@geminformatica.it>
Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
Sent: Friday, January 24, 2014 12:52:44 PM
Subject: Re: [Gluster-users] Replication delay

Fabio,
 It has nothing to do with SELinux, IMO. You were saying the self-heal happens 
when the VM is paused; that means writes from the self-heal's fd are succeeding. 
So something happened to the fd that kvm uses to write to that VM's image. I 
wonder what! When did you start getting this problem? What happened at that time?

Pranith

- Original Message -
 From: Fabio Rosati <fabio.ros...@geminformatica.it>
 To: Pranith Kumar Karampuri <pkara...@redhat.com>
 Cc: "Gluster-users@gluster.org List" <gluster-users@gluster.org>
 Sent: Friday, January 24, 2014 5:09:25 PM
 Subject: Re: [Gluster-users] Replication delay
 [...]

[Gluster-users] Feasibility Over Lower Bandwidth Link and General Questions...

2014-01-24 Thread Brock Nanson
I'm doing some early research into how/if we can mirror two Samba servers
connected via a 10 Mbit VPN.  The intent is for users (MS Office,
AutoCAD, some GIS workstations) to have a 'local' copy of all files, no matter
which office location they're sitting in (and which Samba server they're
working with).  We need to have the identical file (as seen by the other
server) locked when the 'local' copy is open, then updated when saved, of
course.

I'm new to this and am working my way up the learning curve with an ice axe
and crampons... I'm still stumbling across the icefield and an attempt at
the summit is a long way off.

I've learned that CephFS isn't ready for mainstream use just yet.

I've learned that XtreemFS doesn't lock files with POSIX commands... if the
application using the file doesn't write a lock file (one that's quickly
replicated across the OSDs), the potential for both copies being edited at
the same time exists.

I want to be able to ensure the local Samba server utilizes the physically
local files, drives, volumes - whatever you want to call them - to avoid
delays in opening and closing files over the VPN.  Striping across local
and distant drives would presumably create a big speed problem.

On the face of it, is GlusterFS an option that might fit my needs?  Is
there a document that explains the configuration of such an arrangement?
Clearly I'm going to be setting up a test bed of sorts eventually, but I'd
prefer to start with something that *might* work, rather than something
that simply can't...
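
For reference, the arrangement described above maps to a plain two-node
replica volume; a sketch (hostnames and brick paths hypothetical):

gluster volume create office-share replica 2 transport tcp \
    office1:/export/brick1 office2:/export/brick1
gluster volume start office-share

Keep in mind that this replication is synchronous: every write from either
office waits for the remote brick to acknowledge, so the 10 Mbit link becomes
the ceiling on write speed.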

Thanks!

[Gluster-users] Gluster Volume Properties

2014-01-24 Thread James
Hi there,

I've been taking another look at some of gluster volume properties. If
you know of some that are missing from my list or have incorrect
entries, please let me know! My list is here:

https://github.com/purpleidea/puppet-gluster/blob/master/manifests/volume/property/data.pp#L18

This curated list makes it easy to manage properties with
Puppet-Gluster. The list isn't complete though. This is where I need
your help!


Semiosis: The latest git master adds support for the options you
requested in gluster volume properties. It also contains this patch:
https://github.com/purpleidea/puppet-gluster/commit/221e3049f04fb608d013d7092bcfb258010b2d6d

which adds support for adding the rpc-auth-allow-insecure option to the
glusterd.vol file. You can use these two together like:

class { '::gluster::simple':
    volume => 'yourvolumename',
    rpcauthallowinsecure => true,
}

gluster::volume::property { 'yourvolumename#server.allow-insecure':
    value => 'on',  # you can use true/false, on/off
}

which should hopefully make testing your libgfapi-jni easy!
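
A quick way to confirm the property actually landed on the volume (a sketch,
using the 3.4-era CLI):

gluster volume info yourvolumename | grep allow-insecure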

If anyone has any other questions, please let me know.

James
@purpleidea (twitter / irc)
https://ttboj.wordpress.com/




Re: [Gluster-users] Replication delay

2014-01-24 Thread Vijay Bellur

On 01/24/2014 09:24 PM, Fabio Rosati wrote:

[...]


Do you think this could be the origin of the problem? If so, how can I solve it?
In the links posted by Vijay, someone suggests starting the VM with cache != 
none, but this will prevent live migration, AFAIK.
Another solution may be to recreate the volume, backing it with XFS partitions 
formatted with a different block size (smaller? 1024 bytes?). This would be a 
painful option, but if it solves the problem, I'll go for it.



A lower sector size (512) for xfs has been observed to be useful in 
overcoming this problem.


Another solution might be to use the logical_block_size=4096 option as 
referenced here [1].


-Vijay

[1] https://bugzilla.redhat.com/show_bug.cgi?id=997839#c7
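
For concreteness, the two workarounds would look roughly like this; both are 
sketches (the LV path comes from earlier in the thread, and reformatting 
destroys the brick's data):

# option 1: recreate the brick filesystem with 512-byte sectors
mkfs.xfs -f -s size=512 /dev/mapper/vg_guests-lv_brick1

# option 2: expose a 4096-byte logical block size to the guest, per [1];
# in a libvirt domain this is the <blockio> element on the disk:
#   <blockio logical_block_size='4096' physical_block_size='4096'/>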
