[Gluster-users] Replication delay
Hi All,

in a distributed-replicated volume hosting some VM disk images (GlusterFS 3.4.2 on CentOS 6.5, qemu-kvm with GlusterFS native support, no FUSE mount), I always get the same two files reported as needing healing:

[root@networker ~]# gluster volume heal gv_pri info
Gathering Heal info on volume gv_pri has been successful

Brick nw1glus.gem.local:/glustexp/pri1/brick
Number of entries: 2
/alfresco.qc2
/remlog.qc2

Brick nw2glus.gem.local:/glustexp/pri1/brick
Number of entries: 2
/alfresco.qc2
/remlog.qc2

Brick nw3glus.gem.local:/glustexp/pri2/brick
Number of entries: 0

Brick nw4glus.gem.local:/glustexp/pri2/brick
Number of entries: 0

This is not a split-brain situation (I checked), and if I stop the two VMs that use these images, the two files get healed/synced in about 15 minutes. That is too much time, IMHO. Other VMs in this volume have (smaller) disk images replicated on the same bricks, and those stay in sync in real time.

These are the volume's details; the host "networker" is nw1glus.gem.local:

[root@networker ~]# gluster volume info gv_pri

Volume Name: gv_pri
Type: Distributed-Replicate
Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: nw1glus.gem.local:/glustexp/pri1/brick
Brick2: nw2glus.gem.local:/glustexp/pri1/brick
Brick3: nw3glus.gem.local:/glustexp/pri2/brick
Brick4: nw4glus.gem.local:/glustexp/pri2/brick
Options Reconfigured:
server.allow-insecure: on
storage.owner-uid: 107
storage.owner-gid: 107

[root@networker ~]# gluster volume status gv_pri detail
Status of volume: gv_pri
------------------------------------------------------------------------------
Brick                : Brick nw1glus.gem.local:/glustexp/pri1/brick
Port                 : 50178
Online               : Y
Pid                  : 25721
File System          : xfs
Device               : /dev/mapper/vg_guests-lv_brick1
Mount Options        : rw,noatime
Inode Size           : 512
Disk Space Free      : 168.4GB
Total Disk Space     : 194.9GB
Inode Count          : 102236160
Free Inodes          : 102236130
------------------------------------------------------------------------------
Brick                : Brick nw2glus.gem.local:/glustexp/pri1/brick
Port                 : 50178
Online               : Y
Pid                  : 27832
File System          : xfs
Device               : /dev/mapper/vg_guests-lv_brick1
Mount Options        : rw,noatime
Inode Size           : 512
Disk Space Free      : 168.4GB
Total Disk Space     : 194.9GB
Inode Count          : 102236160
Free Inodes          : 102236130
------------------------------------------------------------------------------
Brick                : Brick nw3glus.gem.local:/glustexp/pri2/brick
Port                 : 50182
Online               : Y
Pid                  : 14571
File System          : xfs
Device               : /dev/mapper/vg_guests-lv_brick2
Mount Options        : rw,noatime
Inode Size           : 512
Disk Space Free      : 418.3GB
Total Disk Space     : 433.8GB
Inode Count          : 227540992
Free Inodes          : 227540973
------------------------------------------------------------------------------
Brick                : Brick nw4glus.gem.local:/glustexp/pri2/brick
Port                 : 50181
Online               : Y
Pid                  : 21942
File System          : xfs
Device               : /dev/mapper/vg_guests-lv_brick2
Mount Options        : rw,noatime
Inode Size           : 512
Disk Space Free      : 418.3GB
Total Disk Space     : 433.8GB
Inode Count          : 227540992
Free Inodes          : 227540973

FUSE mount of the gv_pri volume:

[root@networker ~]# ll -h /mnt/gluspri/
total 37G
-rw-------. 1 qemu qemu 7.7G Jan 24 10:21 alfresco.qc2
-rw-------. 1 qemu qemu 4.2G Jan 24 10:22 check_mk-salmo.qc2
-rw-------. 1 qemu qemu  27M Jan 23 16:42 newnxserver.qc2
-rw-------. 1 qemu qemu 1.1G Jan 23 13:38 newubutest1.qc2
-rw-------. 1 qemu qemu  11G Jan 24 10:17 nxserver.qc2
-rw-------. 1 qemu qemu 8.1G Jan 24 10:17 remlog.qc2
-rw-------. 1 qemu qemu 5.6G Jan 24 10:19 ubutest1.qc2

Do you think this is the expected behaviour, maybe due to caching? What happens if the most up-to-date node goes down while the VMs are running?

Thanks a lot,
Fabio Rosati
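[For reference: the split-brain check mentioned above is usually done with the heal command as well, shown here for gv_pri. An empty entry list on every brick means no file is in split-brain.

  gluster volume heal gv_pri info split-brain
]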
Re: [Gluster-users] Replication delay
Hi Fabio,

This is a known issue that has been addressed on master. It may be backported to 3.5. When a file is undergoing changes, it may appear in 'gluster volume heal <volname> info' output even when it doesn't need any self-heal.

Pranith

----- Original Message -----
From: Fabio Rosati fabio.ros...@geminformatica.it
To: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, January 24, 2014 3:17:27 PM
Subject: [Gluster-users] Replication delay

> Hi All, in a distributed-replicated volume hosting some VM disk images (GlusterFS 3.4.2 on CentOS 6.5, qemu-kvm with GlusterFS native support, no FUSE mount), I always get the same two files reported as needing healing:
> [...]
Re: [Gluster-users] Replication delay
----- Original Message -----
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Fabio Rosati fabio.ros...@geminformatica.it
Cc: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, January 24, 2014 3:29:19 PM
Subject: Re: [Gluster-users] Replication delay

> Hi Fabio,
> This is a known issue that has been addressed on master. It may be backported to 3.5. When a file is undergoing changes, it may appear in 'gluster volume heal <volname> info' output even when it doesn't need any self-heal.
> Pranith

Sorry, I just saw that there is a self-heal happening for 15 minutes when you stop the VMs. How are you checking that the self-heal is happening?

Pranith

> Hi All, in a distributed-replicated volume hosting some VM disk images [...]
Re: [Gluster-users] gluster and kvm livemigration
samuli wrote:
> Can you try to set storage.owner-uid and storage.owner-gid to libvirt-qemu? To do that you have to stop volume.

hi samuli, hi all

I tried setting storage.owner-uid and storage.owner-gid to libvirt-qemu, as suggested, but with the same effect: during live migration the ownership of the image file changes from libvirt-qemu/kvm to root/root.

root@pong[/5]:~ # gluster volume info glfs_atom01

Volume Name: glfs_atom01
Type: Replicate
Volume ID: f28f0f62-37b3-4b10-8e86-9b373f4c0e75
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.24.1.11:/ecopool/fs_atom01
Brick2: 172.24.1.13:/ecopool/fs_atom01
Options Reconfigured:
storage.owner-gid: 104
storage.owner-uid: 107
network.remote-dio: enable

This is 'tree -pfungiA' on the path where my images live; atom01 is running:

[-rw------- libvirt- kvm ] /srv/vms/mnt_atom01/atom01.img
[drwxr-xr-x libvirt- kvm ] /srv/vms/mnt_atom02
[-rw------- root     root] /srv/vms/mnt_atom02/atom02.img
[drwxr-xr-x libvirt- kvm ] /srv/vms/mnt_atom03

Now I migrate through VirtualMachineManager and, watching tree, I see the permissions changing to:

[drwxr-xr-x libvirt- kvm ] /srv/vms/mnt_atom01
[-rw------- root     root] /srv/vms/mnt_atom01/atom01.img
[drwxr-xr-x libvirt- kvm ] /srv/vms/mnt_atom02
[-rw------- root     root] /srv/vms/mnt_atom02/atom02.img

From inside atom01 (the VM) the filesystem becomes read-only. But in contrast to http://epboven.home.xs4all.nl/gluster-migrate.html I can still read all files and checksum them, just no write access. From outside, the image file behaves as Paul described: as long as the machine is running I can't read the file.

root@pong[/5]:~ # virsh list
 Id    Name     State
 6     atom01   running

root@pong[/5]:~ # l /srv/vms/mnt_atom01/atom01.img
-rw------- 1 root root 10G Jan 24 10:20 /srv/vms/mnt_atom01/atom01.img
root@pong[/5]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: writable, regular file, no read permission
root@pong[/5]:~ # md5sum /srv/vms/mnt_atom01/atom01.img
md5sum: /srv/vms/mnt_atom01/atom01.img: Permission denied

root@pong[/5]:~ # virsh destroy atom01
Domain atom01 destroyed

root@pong[/5]:~ # l /srv/vms/mnt_atom01/atom01.img
-rw------- 1 root root 10G Jan 24 10:20 /srv/vms/mnt_atom01/atom01.img
root@pong[/5]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: x86 boot sector; partition 1: ID=0x83, starthead 1, startsector 63, 16777165 sectors; partition 2: ID=0xf, starthead 254, startsector 16777228, 1677718 sectors, code offset 0x63
root@pong[/5]:~ # md5sum /srv/vms/mnt_atom01/atom01.img
9d048558deb46fef7b24e8895711c554  /srv/vms/mnt_atom01/atom01.img

But interestingly, the source of the migration can access the file after migration completed, like so: start atom01 on host ping, migrate it to pong.

root@pong[/8]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: writable, regular file, no read permission

root@ping[/5]:~ # file /srv/vms/mnt_atom01/atom01.img
/srv/vms/mnt_atom01/atom01.img: x86 boot sector; partition 1: ID=0x83, starthead 1, startsector 63, 16777165 sectors; partition 2: ID=0xf, starthead 254, startsector 16777228, 1677718 sectors, code offset 0x63

100% reproducible.

Regards
Bernhard
Re: [Gluster-users] Replication delay
----- Original Message -----
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Fabio Rosati fabio.ros...@geminformatica.it
Cc: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, 24 January 2014 11:02:15
Subject: Re: [Gluster-users] Replication delay

> Sorry, I just saw that there is a self-heal happening for 15 minutes when you stop the VMs. How are you checking that the self-heal is happening?

When I stop the VM for alfresco.qc2, heal info still reports alfresco.qc2 as in need of healing for about 15 min. It seems this is a real out-of-sync situation, because if I check the two bricks I get different modification times up until the file is healed (no longer reported by heal info).

This is the bricks' status for alfresco.qc2 while the VM is halted:

[root@networker ~]# ll /glustexp/pri1/brick/
total 27769492
-rw-------. 2 qemu qemu 8212709376 Jan 24 11:16 alfresco.qc2
[...]

[root@networker2 ~]# ll /glustexp/pri1/brick/
total 27769384
-rw-------. 2 qemu qemu 8212709376 Jan 24 11:05 alfresco.qc2
[...]

Bricks' status AFTER heal info stops reporting alfresco.qc2:

[root@networker ~]# ll /glustexp/pri1/brick/
total 27769492
-rw-------. 2 qemu qemu 8212709376 Jan 24 11:05 alfresco.qc2

[root@networker2 ~]# ll /glustexp/pri1/brick/
total 27769384
-rw-------. 2 qemu qemu 8212709376 Jan 24 11:05 alfresco.qc2

Thanks for helping!
Fabio

> Hi All, in a distributed-replicated volume hosting some VM disk images [...]
Re: [Gluster-users] Replication delay
This time when you stop the VM, could you get the output of 'getfattr -d -m. -e hex <file-path-on-brick>' on both the bricks, to debug further?

Pranith

----- Original Message -----
From: Fabio Rosati fabio.ros...@geminformatica.it
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, January 24, 2014 3:58:38 PM
Subject: Re: [Gluster-users] Replication delay

> When I stop the VM for alfresco.qc2, heal info still reports alfresco.qc2 as in need of healing for about 15 min. It seems this is a real out-of-sync situation, because if I check the two bricks I get different modification times up until the file is healed (no longer reported by heal info).
> [...]
Re: [Gluster-users] Replication delay
Ok, that's the output after the VM has been halted:

[root@networker ~]# getfattr -d -m. -e hex /glustexp/pri1/brick/alfresco.qc2
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x000001390000000000000000
trusted.afr.gv_pri-client-1=0x000000000000000000000000
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe

[root@networker2 ~]# getfattr -d -m. -e hex /glustexp/pri1/brick/alfresco.qc2
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x000001390000000000000000
trusted.afr.gv_pri-client-1=0x000000000000000000000000
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe

When heal info stops reporting alfresco.qc2 I get:

[root@networker glusterfs]# getfattr -d -m. -e hex /glustexp/pri1/brick/alfresco.qc2
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x000000000000000000000000
trusted.afr.gv_pri-client-1=0x000000000000000000000000
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe

[root@networker2 ~]# getfattr -d -m. -e hex /glustexp/pri1/brick/alfresco.qc2
getfattr: Removing leading '/' from absolute path names
# file: glustexp/pri1/brick/alfresco.qc2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv_pri-client-0=0x000000000000000000000000
trusted.afr.gv_pri-client-1=0x000000000000000000000000
trusted.gfid=0x298c76de7c8643a3909f7ef77dc294fe

Fabio

----- Original Message -----
From: Pranith Kumar Karampuri pkara...@redhat.com
Sent: Friday, 24 January 2014 11:36:12
Subject: Re: [Gluster-users] Replication delay

> This time when you stop the VM, could you get the output of 'getfattr -d -m. -e hex <file-path-on-brick>' on both the bricks, to debug further?
[...]
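[For reference, on reading those trusted.afr values (assuming the standard AFR changelog layout): each value packs three big-endian 32-bit counters, the pending data, metadata and entry operation counts, in that order. So the leading 0x00000139 above means 313 pending data operations:

  # the first 8 hex digits are the pending *data* operation count
  printf '%d\n' 0x00000139    # -> 313

Both bricks hold a non-zero count against client-0 (the first brick), i.e. both blame the first brick's copy, which matches the diagnosis in the next message.]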
Re: [Gluster-users] Replication delay
Fabio,

It seems like writes on the first brick of this replica pair are failing from the mount. Could you check both client and brick logs to see where these failures are coming from?

Pranith

----- Original Message -----
From: Fabio Rosati fabio.ros...@geminformatica.it
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, January 24, 2014 4:50:52 PM
Subject: Re: [Gluster-users] Replication delay

> Ok, that's the output after the VM has been halted:
> [...]
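[For anyone following along, the default log locations, assuming a standard package install: brick logs live on each server under /var/log/glusterfs/bricks/, named after the brick path, e.g. for this volume:

  /var/log/glusterfs/bricks/glustexp-pri1-brick.log

For a FUSE mount the client log is named after the mount point, e.g. /var/log/glusterfs/mnt-gluspri.log for /mnt/gluspri; with qemu's libgfapi access the client-side messages typically end up in the domain's log under /var/log/libvirt/qemu/.]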
Re: [Gluster-users] Replication delay
You're right! In the brick log from the first peer (networker, AKA nw1glus.gem.local) I found lots of these errors:

[2014-01-24 11:32:28.482639] E [posix.c:2135:posix_writev] 0-gv_pri-posix: write failed: offset 4812114432, Invalid argument
[2014-01-24 11:32:28.485334] I [server-rpc-fops.c:1439:server_writev_cbk] 0-gv_pri-server: 31817: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> (Invalid argument)
[2014-01-24 11:32:28.483791] E [posix.c:2135:posix_writev] 0-gv_pri-posix: write failed: offset 5562239488, Invalid argument
[2014-01-24 11:32:28.485416] I [server-rpc-fops.c:1439:server_writev_cbk] 0-gv_pri-server: 31820: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> (Invalid argument)
[2014-01-24 11:32:28.484275] E [posix.c:2135:posix_writev] 0-gv_pri-posix: write failed: offset 5757467136, Invalid argument
[2014-01-24 11:32:28.482841] E [posix.c:2135:posix_writev] 0-gv_pri-posix: write failed: offset 3742501376, Invalid argument
[2014-01-24 11:32:28.485494] I [server-rpc-fops.c:1439:server_writev_cbk] 0-gv_pri-server: 31822: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> (Invalid argument)
[2014-01-24 11:32:28.485534] I [server-rpc-fops.c:1439:server_writev_cbk] 0-gv_pri-server: 31818: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> (Invalid argument)
[2014-01-24 11:32:28.530943] E [posix.c:2135:posix_writev] 0-gv_pri-posix: write failed: offset 3156122112, Invalid argument
[2014-01-24 11:32:28.530997] I [server-rpc-fops.c:1439:server_writev_cbk] 0-gv_pri-server: 31832: WRITEV 0 (f1e928ad-d4dd-49f3-abae-e99cb1f310e1) ==> (Invalid argument)

Then I noticed the SELinux contexts on the two bricks are different; I don't know if this can be the cause of the errors:

[root@networker gluspri]# ll -Z /glustexp/pri1/brick/
-rw-------. qemu qemu system_u:object_r:file_t:s0      alfresco.qc2

[root@networker2 ~]# ll -Z /glustexp/pri1/brick/
-rw-------. qemu qemu unconfined_u:object_r:file_t:s0  alfresco.qc2

Fabio

----- Original Message -----
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Fabio Rosati fabio.ros...@geminformatica.it
Cc: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, 24 January 2014 12:27:56
Subject: Re: [Gluster-users] Replication delay

> Fabio,
> It seems like writes on the first brick of this replica pair are failing from the mount. Could you check both client and brick logs to see where these failures are coming from?
[...]
Re: [Gluster-users] Replication delay
Fabio,

It has nothing to do with SELinux, IMO. You were saying self-heal happens when the VM is paused; that means writes from the self-heal fd are succeeding. So something happened to the fd kvm uses for that VM's writes. I wonder what! When did you start getting this problem? What happened at that time?

Pranith

----- Original Message -----
From: Fabio Rosati fabio.ros...@geminformatica.it
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, January 24, 2014 5:09:25 PM
Subject: Re: [Gluster-users] Replication delay

> You're right! In the brick log from the first peer (networker, AKA nw1glus.gem.local) I found lots of these errors:
> [2014-01-24 11:32:28.482639] E [posix.c:2135:posix_writev] 0-gv_pri-posix: write failed: offset 4812114432, Invalid argument
> [...]
Re: [Gluster-users] Replication delay
On 01/24/2014 05:09 PM, Fabio Rosati wrote:
> You're right! In the brick log from the first peer (networker, AKA nw1glus.gem.local) I found lots of these errors:
> [2014-01-24 11:32:28.482639] E [posix.c:2135:posix_writev] 0-gv_pri-posix: write failed: offset 4812114432, Invalid argument
> [...]
> Then I noticed the SELinux contexts on the two bricks are different, I don't know if this can be the cause of the errors:

Might be related to the sector size in xfs and the disks that are being used. [1] and [2] have some details.

-Vijay

[1] http://www.gluster.org/pipermail/gluster-users/2013-November/037842.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=997839
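[For reference, the sector sizes involved can be checked directly; paths here are the ones from this thread, output will vary per system:

  # logical and physical sector size of the block device backing the brick
  blockdev --getss --getpbsz /dev/mapper/vg_guests-lv_brick1

  # sector size the brick filesystem was formatted with (sectsz)
  xfs_info /glustexp/pri1 | grep sectsz

With cache=none, qemu opens the image with O_DIRECT, and direct writes must be aligned to the filesystem's sector size; a 4096-byte sectsz is one way sub-4K writes can turn into EINVAL.]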
Re: [Gluster-users] gluster and kvm livemigration
I submitted Bug 1057645: https://bugzilla.redhat.com/show_bug.cgi?id=1057645

Bernhard

On 24.01.2014 11:07:49, Bernhard Glomm wrote:
> I tried setting storage.owner-uid and storage.owner-gid to libvirt-qemu, as suggested, but with the same effect: during live migration the ownership of the image file changes from libvirt-qemu/kvm to root/root.
> [...]

--
Bernhard Glomm
IT Administration

Phone: +49 (30) 86880 134
Fax:   +49 (30) 86880 100
Skype: bernhard.glomm.ecologic

Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH
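[A workaround often suggested for this class of chown-on-migration problem (an assumption here, not something confirmed in this thread or in the bug) is to stop libvirt from changing image ownership on domain start/migration, by setting in /etc/libvirt/qemu.conf on both hosts:

  # keep libvirt from chowning disk images to root
  dynamic_ownership = 0

and making sure the images are owned by the qemu user up front. Whether this is appropriate depends on the rest of the security setup, e.g. sVirt.]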
Re: [Gluster-users] Replication delay
Pranith and Vijay,

the problems began when I started to use alfresco.qc2 and the other disk images on the gv_pri volume, backed by XFS on LVM partitions. I copied these images from another GlusterFS volume (backed by ext4, no LVM partitions) where they worked as expected. The VMs run on the same hosts, so the qemu-kvm version is the same.

Here are the details of a brick from the gv_pri (new and problematic) volume:

[root@networker bricks]# xfs_info /glustexp/pri1
meta-data=/dev/mapper/vg_guests-lv_brick1 isize=512    agcount=16, agsize=3194880 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=51118080, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=24960, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

This is a brick partition from the gv_sec (old and working properly) volume:

[root@networker2 bricks]# dumpe2fs -h /dev/sda1
dumpe2fs 1.41.12 (17-May-2010)
Filesystem volume name:   <none>
Last mounted on:          /glustexp/sec2
Filesystem UUID:          87678a0d-aef6-403c-930a-a9b2b4cb7c37
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              9773056
Block count:              39072718
Reserved block count:     1953635
Free blocks:              36406615
Free inodes:              9772982
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1014
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Wed Dec 18 10:03:39 2013
Last mount time:          Thu Jan  9 23:03:24 2014
Last write time:          Thu Jan  9 23:03:24 2014
Mount count:              2
Maximum mount count:      39
Last checked:             Wed Dec 18 10:03:39 2013
Check interval:           15552000 (6 months)
Next check after:         Mon Jun 16 11:03:39 2014
Lifetime writes:          189 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       917534
Default directory hash:   half_md4
Directory Hash Seed:      4891a3c2-8e00-45a3-ac6b-ea96de069b38
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0015bae4
Journal start:            31215

The filesystem block size is the same on both, 4096 bytes. I did some more investigation, and it seems the problem happens only with VM disk images internally formatted with a block size of 1024 bytes. There are no problems with disk images formatted with a block size of 4096 bytes. Anyway, I don't know if this is a coincidence. Do you think this could be the origin of the problem? If so, how can I solve it?

In the links posted by Vijay, someone suggests starting the VMs with cache != none, but that will prevent live migration, AFAIK. Another solution may be to recreate the volume backing it with XFS partitions formatted with a different block size (smaller? 1024 bytes?). That would be a painful option, but if it solves the problem, I'll go for it.

Thanks a lot,
Fabio

----- Original Message -----
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Fabio Rosati fabio.ros...@geminformatica.it
Cc: "Gluster-users@gluster.org List" gluster-users@gluster.org
Sent: Friday, 24 January 2014 12:52:44
Subject: Re: [Gluster-users] Replication delay

> It has nothing to do with SELinux, IMO. You were saying self-heal happens when the VM is paused; that means writes from the self-heal fd are succeeding. So something happened to the fd kvm uses for that VM's writes. I wonder what! When did you start getting this problem? What happened at that time?
[...]
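[For reference, the guest-side block size Fabio is talking about can be checked from inside a VM; the device name here is just an example:

  # block size of the guest's root filesystem
  tune2fs -l /dev/vda1 | grep 'Block size'

ext3/ext4 defaults to a 1024-byte block size on small filesystems, and a guest issuing 1K-aligned direct I/O can never satisfy a 4096-byte alignment requirement on the host side, which would fit the pattern seen here if the sector-size theory holds.]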
[Gluster-users] Feasibility Over Lower Bandwidth Link and General Questions...
I'm doing some early research into how/if we can mirror two Samba servers connected via a 10 mbit VPN. The intent is to have users (MS Office, AutoCAD, some GIS workstations) have a 'local' copy of all files, no matter which office location they're sitting in (and which Samba server they're working with). We need to have the identical file (as seen by the other server) locked when the 'local' copy is open, then updated when saved, of course.

I'm new to this and am working my way up the learning curve with an ice axe and crampons... I'm still stumbling across the icefield, and an attempt at the summit is a long way off. I've learned that CephFS isn't ready for mainstream use just yet. I've learned that XtreemFS doesn't lock files with POSIX commands... if the application using the file doesn't write a lock file (one that's quickly replicated across the OSDs), the potential exists for both copies being edited at the same time.

I want to be able to ensure the local Samba server utilizes the physically local files, drives, volumes - whatever you want to call them - to avoid delays in opening and closing files over the VPN. Striping across local and distant drives would presumably create a big speed problem.

On the face of it, is GlusterFS an option that might fit my needs? Is there a document that explains the configuration of such an arrangement? Clearly I'm going to be setting up a test bed of sorts eventually, but I'd prefer to start with something that *might* work, rather than something that simply can't...

Thanks!
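[On the GlusterFS side, the closest match to the setup described is a two-brick replicated volume, one brick per office, with each Samba server mounting the volume locally. A syntax sketch only; hostnames and paths are invented:

  gluster peer probe office-b
  gluster volume create samba_mirror replica 2 \
      office-a:/bricks/samba office-b:/bricks/samba
  gluster volume start samba_mirror

One caveat worth knowing before building a test bed: replica writes are synchronous to both bricks, so every save would cross the 10 mbit VPN before completing; geo-replication is asynchronous but one-way, so it doesn't provide the two-way locking described above.]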
[Gluster-users] Gluster Volume Properties
Hi there,

I've been taking another look at some of the gluster volume properties. If you know of some that are missing from my list or have incorrect entries, please let me know! My list is here:

https://github.com/purpleidea/puppet-gluster/blob/master/manifests/volume/property/data.pp#L18

This curated list makes it easy to manage properties with Puppet-Gluster. The list isn't complete, though. This is where I need your help!

Semiosis: the latest git master adds support for the options you requested in gluster volume properties. It also contains this patch:

https://github.com/purpleidea/puppet-gluster/commit/221e3049f04fb608d013d7092bcfb258010b2d6d

which adds support for adding the rpc-auth-allow-insecure option to the glusterd.vol file. You can use these two together like:

class { '::gluster::simple':
    volume => 'yourvolumename',
    rpcauthallowinsecure => true,
}

gluster::volume::property { 'yourvolumename#server.allow-insecure':
    value => on,    # you can use true/false, on/off
}

which should hopefully make testing your libgfapi-jni easy!

If anyone has any other questions, please let me know.

James
@purpleidea (twitter / irc)
https://ttboj.wordpress.com/
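[For anyone mapping this back to the plain CLI (my reading of what these resources manage; check the module docs to confirm): the property resource above corresponds to

  gluster volume set yourvolumename server.allow-insecure on

and rpcauthallowinsecure corresponds to putting "option rpc-auth-allow-insecure on" into /etc/glusterfs/glusterd.vol, which takes effect after glusterd is restarted.]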
Re: [Gluster-users] Replication delay
On 01/24/2014 09:24 PM, Fabio Rosati wrote:
> The filesystem block size is the same on both, 4096 bytes. I did some more investigation, and it seems the problem happens only with VM disk images internally formatted with a block size of 1024 bytes. There are no problems with disk images formatted with a block size of 4096 bytes. Anyway, I don't know if this is a coincidence. Do you think this could be the origin of the problem? If so, how can I solve it?
> [...]

A lower sector size (512) for xfs has been observed to be useful in overcoming this problem. Another solution might be to use the logical_block_size=4096 option as referred to here [1].

-Vijay

[1] https://bugzilla.redhat.com/show_bug.cgi?id=997839#c7
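[Concretely, the two fixes Vijay points at would look something like this (a sketch only; the device path is the one from this thread, and re-making a brick filesystem destroys its contents, so the data has to be moved off first):

  # rebuild the brick with 512-byte sectors (keeping the 512-byte inodes used here)
  mkfs.xfs -f -s size=512 -i size=512 /dev/mapper/vg_guests-lv_brick1

The guest-side alternative is to advertise a 4K logical block size on the virtual disk, so the guest filesystem issues 4K-aligned I/O; in the libvirt domain XML this is a child of the <disk> element:

  <blockio logical_block_size='4096' physical_block_size='4096'/>
]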