Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
No, all files are running VMs. No-one alters them manually (which would kill the VM...) So all was done by the replicate mechanism and the sync. We have to reboot servers from time to time for upgrades, but we do bring them back up with Gluster running before tackling a second server.

Best, Martin

-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Thursday, May 19, 2011 7:05 PM
To: Pranith Kumar. Karampuri; Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

What's more interesting is that pserver3 shows "0" bytes and the other 3 show the same "size", while pserver12 & 13 have trusted.glusterfs.dht.linkto="storage0-replicate-0" set. Was there ever any manual operation done with these files?

On Thu, May 19, 2011 at 5:16 AM, Pranith Kumar. Karampuri wrote: > Need the logs from May 13th to 17th. > > Pranith. > - Original Message - > From: "Martin Schenker" > To: "Pranith Kumar. Karampuri" > Cc: gluster-users@gluster.org > Sent: Thursday, May 19, 2011 5:28:06 PM > Subject: RE: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. > > Hi Pranith! > > That's what I would have expected as well! The files should be on one brick. But they appear on both. > I'm quite stumped WHY the files show up on the other brick, this isn't what I understood from the manual/setup! The vol-file doesn't seem to be wrong so any ideas? > > Best, Martin > > > > -Original Message- > From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] > Sent: Thursday, May 19, 2011 1:52 PM > To: Martin Schenker > Cc: gluster-users@gluster.org > Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. > > Martin, > The output suggests that there are 2 replicas per 1 volume. So it should be present on only 2 bricks. Why is the file present in 4 bricks?. It should either be present on pserver12&13 or pserver3 & 5. I am not sure why you are expecting it to be there on 4 bricks. > Am I missing any info here?. > > Pranith > > - Original Message - > From: "Martin Schenker" > To: gluster-users@gluster.org > Sent: Wednesday, May 18, 2011 2:23:09 PM > Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. > > Here is another occurrence: > > The file 20819 is shown twice, different timestamps and attributes. 0 > filesize on pserver3, outdated on pserver5, just 12&13 seems to be in sync. > So what's going on?
> > > 0 root@de-dc1-c1-pserver13:~ # ls -al > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h > dd-images/2081* > -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h > dd-images/20819 > -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h > dd-images/20819 > > 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 | xargs > -i ls -al {} > -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00 > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > 0 root@de-dc1-c1-pserver3:~ # getfattr -dm - > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > getfattr: Removing leading '/' from absolute path names > # file: > mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ > hdd-images/20819 > trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== > > 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al > {} > -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00 > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > 0 root@pserver5:~ # getfattr -dm - > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > getfattr: Removing leading '/' from absolute path names > # file: > mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ > hdd-images/20819 > trusted.afr.storage0-client-0=0sAgIA > trusted.afr.storage0-client-1=0s > trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== > > 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al > {} > -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:
Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
Hi Pranith! That's what I would have expected as well! The files should be on one brick. But they appear on both. I'm quite stumped WHY the files show up on the other brick, this isn't what I understood from the manual/setup! The vol-file doesn't seem to be wrong so any ideas? Best, Martin -Original Message- From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] Sent: Thursday, May 19, 2011 1:52 PM To: Martin Schenker Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. Martin, The output suggests that there are 2 replicas per 1 volume. So it should be present on only 2 bricks. Why is the file present in 4 bricks?. It should either be present on pserver12&13 or pserver3 & 5. I am not sure why you are expecting it to be there on 4 bricks. Am I missing any info here?. Pranith - Original Message - From: "Martin Schenker" To: gluster-users@gluster.org Sent: Wednesday, May 18, 2011 2:23:09 PM Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. Here is another occurrence: The file 20819 is shown twice, different timestamps and attributes. 0 filesize on pserver3, outdated on pserver5, just 12&13 seems to be in sync. So what's going on? 0 root@de-dc1-c1-pserver13:~ # ls -al /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/2081* -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/20819 -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/20819 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00 /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@de-dc1-c1-pserver3:~ # getfattr -dm - /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00 /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@pserver5:~ # getfattr -dm - /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.afr.storage0-client-0=0sAgIA trusted.afr.storage0-client-1=0s trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41 /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@pserver12:~ # getfattr -dm - /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.afr.storage0-client-6=0s trusted.afr.storage0-client-7=0s trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== 
trusted.glusterfs.dht.linkto="storage0-replicate-0 0 root@de-dc1-c1-pserver13:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:39 /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@de-dc1-c1-pserver13:~ # getfattr -dm - /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.afr.storage0-client-6=0s trusted.afr.storage0-client-7=0s trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== trusted.glusterfs.dht.linkto="storage0-replicate-0 Only entrance in log file on pserver5, no references in the other three logs/servers: 0 root@pserver5:~ # grep 20819 /var/log/glusterfs/opt-profitbricks-storage.log [2011-05-17 20:37:30.52535] I [client-handshake.c:407:client3_1_reopen_cbk] 0-storage0-client-7: reopen on /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819 succeeded (remote-fd = 6) [2011-05-17 20:37:34.824934] I [afr-open.c:435:afr_openfd_
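A note on the trusted.glusterfs.dht.linkto attribute seen above: as far as I understand DHT, a link file normally shows up as a zero-byte file with the sticky bit set (the trailing "T" in ls output) plus a linkto xattr naming the replicate subvolume that really holds the data. The copies above carry the linkto xattr but have the full 50 GB size, which is why they look inconsistent. A minimal sketch for listing link-file candidates on the local bricks, assuming the brick layout used in this thread:

# zero-byte sticky-bit files on the bricks, plus where their linkto points
find /mnt/gluster/brick?/storage -type f -perm -1000 -size 0 -print0 | \
  xargs -0 -r getfattr -n trusted.glusterfs.dht.linkto 2>/dev/null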
Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
No, we had these issues before already, running on 3.1.3 Only the system load has gone up a lot in the meantime... Best, Martin -Original Message- From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Wednesday, May 18, 2011 10:36 PM To: Martin Schenker Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. So you started seeing this issue after rolling it back to 3.1.3? On Wed, May 18, 2011 at 1:30 PM, Martin Schenker wrote: > We're running 3.1.3. we had a brief test of 3.2.0 and rolled back to 3.1.3 > by reinstalling the Debian package. > > 0 root@pserver12:~ # gluster volume info all > > Volume Name: storage0 > Type: Distributed-Replicate > Status: Started > Number of Bricks: 4 x 2 = 8 > Transport-type: rdma > Bricks: > Brick1: de-dc1-c1-pserver3:/mnt/gluster/brick0/storage > Brick2: de-dc1-c1-pserver5:/mnt/gluster/brick0/storage > Brick3: de-dc1-c1-pserver3:/mnt/gluster/brick1/storage > Brick4: de-dc1-c1-pserver5:/mnt/gluster/brick1/storage > Brick5: de-dc1-c1-pserver12:/mnt/gluster/brick0/storage > Brick6: de-dc1-c1-pserver13:/mnt/gluster/brick0/storage > Brick7: de-dc1-c1-pserver12:/mnt/gluster/brick1/storage > Brick8: de-dc1-c1-pserver13:/mnt/gluster/brick1/storage > Options Reconfigured: > network.ping-timeout: 5 > nfs.disable: on > performance.cache-size: 4096MB > > Best, Martin > > -Original Message- > From: Mohit Anchlia [mailto:mohitanch...@gmail.com] > Sent: Wednesday, May 18, 2011 9:43 PM > To: Martin Schenker > Cc: gluster-users@gluster.org > Subject: Re: [Gluster-users] Client and server file "view", different > results?! Client can't see the right file. > > Which version are you running? Can you also post output from volume info? > > Meanwhile, anyone from dev want to answer?? > > On Wed, May 18, 2011 at 1:53 AM, Martin Schenker > wrote: >> Here is another occurrence: >> >> The file 20819 is shown twice, different timestamps and attributes. 0 >> filesize on pserver3, outdated on pserver5, just 12&13 seems to be in > sync. >> So what's going on? 
>> >> >> 0 root@de-dc1-c1-pserver13:~ # ls -al >> > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h >> dd-images/2081* >> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 >> > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h >> dd-images/20819 >> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 >> > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h >> dd-images/20819 >> >> 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 | > xargs >> -i ls -al {} >> -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00 >> > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef >> /hdd-images/20819 >> 0 root@de-dc1-c1-pserver3:~ # getfattr -dm - >> > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef >> /hdd-images/20819 >> getfattr: Removing leading '/' from absolute path names >> # file: >> > mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ >> hdd-images/20819 >> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== >> >> 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls > -al >> {} >> -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00 >> > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef >> /hdd-images/20819 >> 0 root@pserver5:~ # getfattr -dm - >> > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef >> /hdd-images/20819 >> getfattr: Removing leading '/' from absolute path names >> # file: >> > mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ >> hdd-images/20819 >> trusted.afr.storage0-client-0=0sAgIA >> trusted.afr.storage0-client-1=0s >> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== >> >> 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls > -al >> {} >> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41 >> > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef >> /hdd-images/20819 >> 0 root@pserver12:~ # getfattr -dm - >> > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef >> /hdd-images/20819 >> getfattr: Removing leading '/' from absolute path names >> # file: >> > mnt/gluster/brick1/storage/i
Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
We're running 3.1.3. we had a brief test of 3.2.0 and rolled back to 3.1.3 by reinstalling the Debian package. 0 root@pserver12:~ # gluster volume info all Volume Name: storage0 Type: Distributed-Replicate Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: rdma Bricks: Brick1: de-dc1-c1-pserver3:/mnt/gluster/brick0/storage Brick2: de-dc1-c1-pserver5:/mnt/gluster/brick0/storage Brick3: de-dc1-c1-pserver3:/mnt/gluster/brick1/storage Brick4: de-dc1-c1-pserver5:/mnt/gluster/brick1/storage Brick5: de-dc1-c1-pserver12:/mnt/gluster/brick0/storage Brick6: de-dc1-c1-pserver13:/mnt/gluster/brick0/storage Brick7: de-dc1-c1-pserver12:/mnt/gluster/brick1/storage Brick8: de-dc1-c1-pserver13:/mnt/gluster/brick1/storage Options Reconfigured: network.ping-timeout: 5 nfs.disable: on performance.cache-size: 4096MB Best, Martin -Original Message- From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Wednesday, May 18, 2011 9:43 PM To: Martin Schenker Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. Which version are you running? Can you also post output from volume info? Meanwhile, anyone from dev want to answer?? On Wed, May 18, 2011 at 1:53 AM, Martin Schenker wrote: > Here is another occurrence: > > The file 20819 is shown twice, different timestamps and attributes. 0 > filesize on pserver3, outdated on pserver5, just 12&13 seems to be in sync. > So what's going on? > > > 0 root@de-dc1-c1-pserver13:~ # ls -al > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h > dd-images/2081* > -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h > dd-images/20819 > -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h > dd-images/20819 > > 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 | xargs > -i ls -al {} > -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00 > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > 0 root@de-dc1-c1-pserver3:~ # getfattr -dm - > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > getfattr: Removing leading '/' from absolute path names > # file: > mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ > hdd-images/20819 > trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== > > 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al > {} > -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00 > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > 0 root@pserver5:~ # getfattr -dm - > /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > getfattr: Removing leading '/' from absolute path names > # file: > mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ > hdd-images/20819 > trusted.afr.storage0-client-0=0sAgIA > trusted.afr.storage0-client-1=0s > trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== > > 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al > {} > -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41 > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > 0 root@pserver12:~ # getfattr -dm - > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > 
/hdd-images/20819 > getfattr: Removing leading '/' from absolute path names > # file: > mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ > hdd-images/20819 > trusted.afr.storage0-client-6=0s > trusted.afr.storage0-client-7=0s > trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== > trusted.glusterfs.dht.linkto="storage0-replicate-0 > > 0 root@de-dc1-c1-pserver13:~ # find /mnt/gluster/brick?/ -name 20819 | xargs > -i ls -al {} > -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:39 > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > 0 root@de-dc1-c1-pserver13:~ # getfattr -dm - > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef > /hdd-images/20819 > getfattr: Removing leading '/' from absolute path names > # file: > mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ > hdd-images/20819 > trusted.afr.storage0-client-6=0s > trusted.afr.storage0-client-7=0s > trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== > trus
Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
Here is another occurrence: The file 20819 is shown twice, different timestamps and attributes. 0 filesize on pserver3, outdated on pserver5, just 12&13 seems to be in sync. So what's going on? 0 root@de-dc1-c1-pserver13:~ # ls -al /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/2081* -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/20819 -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/20819 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00 /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@de-dc1-c1-pserver3:~ # getfattr -dm - /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00 /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@pserver5:~ # getfattr -dm - /mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.afr.storage0-client-0=0sAgIA trusted.afr.storage0-client-1=0s trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41 /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@pserver12:~ # getfattr -dm - /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.afr.storage0-client-6=0s trusted.afr.storage0-client-7=0s trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== trusted.glusterfs.dht.linkto="storage0-replicate-0 0 root@de-dc1-c1-pserver13:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:39 /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 0 root@de-dc1-c1-pserver13:~ # getfattr -dm - /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20819 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/20819 trusted.afr.storage0-client-6=0s trusted.afr.storage0-client-7=0s trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw== trusted.glusterfs.dht.linkto="storage0-replicate-0 Only entrance in log file on pserver5, no references in the other three logs/servers: 0 root@pserver5:~ # grep 20819 /var/log/glusterfs/opt-profitbricks-storage.log [2011-05-17 20:37:30.52535] I [client-handshake.c:407:client3_1_reopen_cbk] 0-storage0-client-7: reopen on /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819 succeeded (remote-fd = 6) 
[2011-05-17 20:37:34.824934] I [afr-open.c:435:afr_openfd_sh] 0-storage0-replicate-3: data self-heal triggered. path: /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819, reason: Replicate up down flush, data lock is held [2011-05-17 20:37:34.825557] E [afr-self-heal-common.c:1214:sh_missing_entries_create] 0-storage0-replicate-3: no missing files - /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819. proceeding to metadata check [2011-05-17 21:08:59.241203] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-3: diff self-heal on /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819: 6 blocks of 409600 were different (0.00%) [2011-05-17 21:08:59.275873] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-3: background data self-heal completed on /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819 ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
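For completeness, one way to double-check which copy of a replica pair is current after a heal like the one logged above is to compare size, mtime and the pending-operation xattrs on both bricks. A rough sketch, reusing the hostnames and path from this thread and assuming root ssh between the peers:

F=/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819
for h in de-dc1-c1-pserver12 de-dc1-c1-pserver13; do
  echo "== $h"
  ssh "$h" "stat -c '%s %y' $F; getfattr -d -e hex -m trusted.afr. $F"   # size, mtime, afr changelogs
done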
Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
This is an inherited system, I guess it was set up by hand. I guess I can switch off these options, but the glusterd service will have to be restarted, right?!? I'm also getting current error messages like these on the peer pair 3&5: Pserver3 [2011-05-17 10:06:28.540355] E [rpc-clnt.c:199:call_bail] 0-storage0-client-2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x805809xsent = 2011-05-17 09:36:18.393519. timeout = 1800 Pserver5 [2011-05-17 10:02:23.738887] E [dht-common.c:1873:dht_getxattr] 0-storage0-dht: layout is NULL [2011-05-17 10:02:23.738909] W [fuse-bridge.c:2499:fuse_xattr_cbk] 0-glusterfs-fuse: 489090: GETXATTR() /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da 6ef/hdd-images/21351 => -1 (No such file or directory) [2011-05-17 10:02:23.738954] W [fuse-bridge.c:660:fuse_setattr_cbk] 0-glusterfs-fuse: 489091: SETATTR() /images/2078/ebb83b05-3a83-9d18-ad8f-8542864da 6ef/hdd-images/21351 => -1 (Invalid argument) Best, Martin -Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Joe Landman Sent: Tuesday, May 17, 2011 1:54 PM To: gluster-users@gluster.org Subject: Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file. On 05/17/2011 01:43 AM, Martin Schenker wrote: > Yes, it is! > > Here's the volfile: > > cat /mnt/gluster/brick0/config/vols/storage0/storage0-fuse.vol: > > volume storage0-client-0 > type protocol/client > option remote-host de-dc1-c1-pserver3 > option remote-subvolume /mnt/gluster/brick0/storage > option transport-type rdma > option ping-timeout 5 > end-volume Hmmm ... did you create these by hand or using the CLI? I noticed quick-read and stat-cache on. We recommend turning both of them off. We experienced many issues with them on (from gluster 3.x.y) -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: land...@scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
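If the volume was created with the CLI, the two translators Joe mentions can usually be switched off with volume set commands instead of hand-editing the volfiles; whether these exact option names are accepted by 3.1.3 should be verified against the 3.1 admin guide first, so treat this as a sketch:

gluster volume set storage0 performance.quick-read off
gluster volume set storage0 performance.stat-prefetch off
# glusterd regenerates the volfiles; clients may still need a remount to pick them up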
Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
It's version 3.1.3 (we tried 3.2.0 for about 10h and rolled back). Unfortunately the file view was "repaired" already by brutally copying manually from the correct /mnt (server) mountpoint to the /opt (client) mount, which fixed the situation for now. We needed the files accessible ASAP.

Best, Martin

> -Original Message- > From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] > Sent: Tuesday, May 17, 2011 10:41 AM > To: Martin Schenker > Cc: gluster-users@gluster.org > Subject: Re: [Gluster-users] Client and server file "view", > different results?! Client can't see the right file. > > > hi Martin, > Could you please gather the following outputs so that we > can debug as to what is happening: > 1) whats the version of the gluster. > 2) backend "ls -l" of files in question on all bricks that > file is replicated on. > 3) 'ls -l" o/p from mnt point for that file. > > Thanks > Pranith > - Original Message - > From: "Martin Schenker" > To: "Pranith Kumar. Karampuri" > Cc: gluster-users@gluster.org > Sent: Tuesday, May 17, 2011 11:13:32 AM > Subject: RE: [Gluster-users] Client and server file "view", > different results?! Client can't see the right file. > > Yes, it is! > > Here's the volfile: > > cat /mnt/gluster/brick0/config/vols/storage0/storage0-fuse.vol: > > volume storage0-client-0 > type protocol/client > option remote-host de-dc1-c1-pserver3 > option remote-subvolume /mnt/gluster/brick0/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-client-1 > type protocol/client > option remote-host de-dc1-c1-pserver5 > option remote-subvolume /mnt/gluster/brick0/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-client-2 > type protocol/client > option remote-host de-dc1-c1-pserver3 > option remote-subvolume /mnt/gluster/brick1/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-client-3 > type protocol/client > option remote-host de-dc1-c1-pserver5 > option remote-subvolume /mnt/gluster/brick1/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-client-4 > type protocol/client > option remote-host de-dc1-c1-pserver12 > option remote-subvolume /mnt/gluster/brick0/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-client-5 > type protocol/client > option remote-host de-dc1-c1-pserver13 > option remote-subvolume /mnt/gluster/brick0/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-client-6 > type protocol/client > option remote-host de-dc1-c1-pserver12 > option remote-subvolume /mnt/gluster/brick1/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-client-7 > type protocol/client > option remote-host de-dc1-c1-pserver13 > option remote-subvolume /mnt/gluster/brick1/storage > option transport-type rdma > option ping-timeout 5 > end-volume > > volume storage0-replicate-0 > type cluster/replicate > subvolumes storage0-client-0 storage0-client-1 > end-volume > > volume storage0-replicate-1 > type cluster/replicate > subvolumes storage0-client-2 storage0-client-3 > end-volume > > volume storage0-replicate-2 > type cluster/replicate > subvolumes storage0-client-4 storage0-client-5 > end-volume > > volume storage0-replicate-3 > type cluster/replicate > subvolumes storage0-client-6 storage0-client-7 > end-volume > > volume storage0-dht > type cluster/distribute > subvolumes storage0-replicate-0
storage0-replicate-1 > storage0-replicate-2 storage0-replicate-3 end-volume > > volume storage0-write-behind > type performance/write-behind > subvolumes storage0-dht > end-volume > > volume storage0-read-ahead > type performance/read-ahead > subvolumes storage0-write-behind > end-volume > > volume storage0-io-cache > type performance/io-cache > option cache-size 4096MB > subvolumes storage0-read-ahead > end-volume > > volume storage0-quick-read > type performance/quick-read > option cache-size 4096MB > subvolumes storage0-io-cache > end-volume > > volume storage0-stat-prefetch > type performance/stat-prefetch > sub
Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.
Yes, it is! Here's the volfile: cat /mnt/gluster/brick0/config/vols/storage0/storage0-fuse.vol: volume storage0-client-0 type protocol/client option remote-host de-dc1-c1-pserver3 option remote-subvolume /mnt/gluster/brick0/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-1 type protocol/client option remote-host de-dc1-c1-pserver5 option remote-subvolume /mnt/gluster/brick0/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-2 type protocol/client option remote-host de-dc1-c1-pserver3 option remote-subvolume /mnt/gluster/brick1/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-3 type protocol/client option remote-host de-dc1-c1-pserver5 option remote-subvolume /mnt/gluster/brick1/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-4 type protocol/client option remote-host de-dc1-c1-pserver12 option remote-subvolume /mnt/gluster/brick0/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-5 type protocol/client option remote-host de-dc1-c1-pserver13 option remote-subvolume /mnt/gluster/brick0/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-6 type protocol/client option remote-host de-dc1-c1-pserver12 option remote-subvolume /mnt/gluster/brick1/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-7 type protocol/client option remote-host de-dc1-c1-pserver13 option remote-subvolume /mnt/gluster/brick1/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-replicate-0 type cluster/replicate subvolumes storage0-client-0 storage0-client-1 end-volume volume storage0-replicate-1 type cluster/replicate subvolumes storage0-client-2 storage0-client-3 end-volume volume storage0-replicate-2 type cluster/replicate subvolumes storage0-client-4 storage0-client-5 end-volume volume storage0-replicate-3 type cluster/replicate subvolumes storage0-client-6 storage0-client-7 end-volume volume storage0-dht type cluster/distribute subvolumes storage0-replicate-0 storage0-replicate-1 storage0-replicate-2 storage0-replicate-3 end-volume volume storage0-write-behind type performance/write-behind subvolumes storage0-dht end-volume volume storage0-read-ahead type performance/read-ahead subvolumes storage0-write-behind end-volume volume storage0-io-cache type performance/io-cache option cache-size 4096MB subvolumes storage0-read-ahead end-volume volume storage0-quick-read type performance/quick-read option cache-size 4096MB subvolumes storage0-io-cache end-volume volume storage0-stat-prefetch type performance/stat-prefetch subvolumes storage0-quick-read end-volume volume storage0 type debug/io-stats subvolumes storage0-stat-prefetch end-volume > -Original Message- > From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] > Sent: Tuesday, May 17, 2011 7:16 AM > To: Martin Schenker > Cc: gluster-users@gluster.org > Subject: Re: [Gluster-users] Client and server file "view", > different results?! Client can't see the right file. > > > Martin, > Is this a distributed-replicate setup?. Could you > attach the vol-file of the client. > > Pranith > - Original Message - > From: "Martin Schenker" > To: gluster-users@gluster.org > Sent: Monday, May 16, 2011 2:49:29 PM > Subject: [Gluster-users] Client and server file "view", > different results?! Client can't see the right file. 
> > > Client and server file "view", different results?! Client > can't see the right file. > > Hi all! > > Here we have another mismatch between the client "view" and > the server mounts: > > From the server site everything seems well, the 20G file is > visible and the attributes seem to match: > > 0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8 > f-8542864da6ef/hdd-images/ > > # file: > mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f > -8542864da6ef/hdd-images//20964 > trusted.afr.storage0-client-2=0x > trusted.afr.storage0-client-3=0x > > 0 root@pserver5:~ # find /mnt/gluster/ -name 20964 | xargs -i > ls -al {} > -rwxrwx--- 1 libvirt-qemu vcb 21474836480 May 13 11:21 > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8 > f-8542864da6ef/hdd-images/20964 &g
[Gluster-users] Client and server file "view", different results?! Client can't see the right file.
Hi all! Here we have another mismatch between the client "view" and the server mounts: >From the server site everything seems well, the 20G file is visible and the attributes seem to match: 0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/ # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images//20964 trusted.afr.storage0-client-2=0x trusted.afr.storage0-client-3=0x 0 root@pserver5:~ # find /mnt/gluster/ -name 20964 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu vcb 21474836480 May 13 11:21 /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/20964 But the client view shows 2!! files with 0 byte size!! And these aren't any link files created by Gluster ( with the T on the end) 0 root@pserver5:~ # find /opt/profitbricks/storage/ -name 20964 | xargs -i ls -al {} -rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/20964 -rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/20964 I'm a bit stumped that we seem to have so many weird errors cropping up. Any ideas? I've checked the ext4 filesystem on all boxes, no real problems. We run a distributed cluster with 4 servers offering 2 bricks each. Best, Martin > -Original Message- > From: Mohit Anchlia [mailto:mohitanch...@gmail.com] > Sent: Monday, May 16, 2011 2:24 AM > To: Martin Schenker > Cc: gluster-users@gluster.org > Subject: Re: [Gluster-users] Brick pair file mismatch, > self-heal problems? > > > Try this to trigger self heal: > > find -noleaf -print0 -name | xargs > --null stat >/dev/null > > > > On Sun, May 15, 2011 at 11:20 AM, Martin Schenker > wrote: > > Can someone enlighten me what's going on here? We have a two peers, > > the file 21313 is shown through the client mountpoint as > "1Jan1970", > > attribs on server pserver3 don't match but NO self-heal or > repair can > > be triggered through "ls -alR"?!? > > > > Checking the files through the server mounts show that two versions > > are on the system. But the wrong one (as with the > "1Jan1970") seems to > > be the preferred one by the client?!? > > > > Do I need to use setattr or what in order to get the client > to see the > > RIGHT version?!? This is not the ONLY file displaying this > problematic > > behaviour! > > > > Thanks for any feedback. > > > > Martin > > > > pserver5: > > > > 0 root@pserver5:~ # ls -al > > > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-854286 > > 4da6ef > > /hdd-images > > > > -rwxrwx--- 1 libvirt-qemu vcb 483183820800 May 13 13:41 21313 > > > > 0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." 
> > > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-854286 > > 4da6ef > > /hdd-images/21313 > > getfattr: Removing leading '/' from absolute path names > > # file: > > > mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f > -8542864da6ef/ > > hdd-images/21313 > > trusted.afr.storage0-client-2=0x > > trusted.afr.storage0-client-3=0x > > > > 0 root@pserver5:~ # ls -alR > > > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864d > > a6ef/h > > dd-images/21313 > > -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 > > > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f- > 8542864da6ef/h > > dd-images/21313 > > > > pserver3: > > > > 0 root@pserver3:~ # ls -al > > > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-854286 > > 4da6ef > > /hdd-images > > > > -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 21313 > > > > 0 root@pserver3:~ # ls -alR > > > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864d > > a6ef/h > > dd-images/21313 > > -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 > > > /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f- > 8542864da6ef/h > > dd-images/21313 > > > > 0 root@pserver3:~ # getfattr -R -d -e hex -m "trusted.afr." > > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a
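The self-heal trigger quoted in this message ("find -noleaf -print0 -name | xargs --null stat >/dev/null") appears to have lost its placeholders for the mount point and file name. A sketch of the intended form, using the client mount point and the file name from this mail:

find /opt/profitbricks/storage -name 20964 -noleaf -print0 | xargs --null stat >/dev/null
# or walk the whole client mount to stat (and thereby look up) every file:
find /opt/profitbricks/storage -noleaf -print0 | xargs --null stat >/dev/null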
[Gluster-users] Brick pair file mismatch, self-heal problems?
Can someone enlighten me what's going on here? We have a two peers, the file 21313 is shown through the client mountpoint as "1Jan1970", attribs on server pserver3 don't match but NO self-heal or repair can be triggered through "ls -alR"?!? Checking the files through the server mounts show that two versions are on the system. But the wrong one (as with the "1Jan1970") seems to be the preferred one by the client?!? Do I need to use setattr or what in order to get the client to see the RIGHT version?!? This is not the ONLY file displaying this problematic behaviour! Thanks for any feedback. Martin pserver5: 0 root@pserver5:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images -rwxrwx--- 1 libvirt-qemu vcb 483183820800 May 13 13:41 21313 0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images/21313 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/21313 trusted.afr.storage0-client-2=0x trusted.afr.storage0-client-3=0x 0 root@pserver5:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/21313 -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/21313 pserver3: 0 root@pserver3:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef /hdd-images -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 21313 0 root@pserver3:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/21313 -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan 1 1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h dd-images/21313 0 root@pserver3:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18- ad8f-8542864da6ef/hdd-images/21313 getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/ hdd-images/21313 trusted.afr.storage0-client-2=0x trusted.afr.storage0-client-3=0x0b090900 <- mismatch, should be targeted for self-heal/repair? Why is there a difference in the views? >From the volfile: volume storage0-client-2 type protocol/client option remote-host de-dc1-c1-pserver3 option remote-subvolume /mnt/gluster/brick1/storage option transport-type rdma option ping-timeout 5 end-volume volume storage0-client-3 type protocol/client option remote-host de-dc1-c1-pserver5 option remote-subvolume /mnt/gluster/brick1/storage option transport-type rdma option ping-timeout 5 end-volume ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
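On the "do I need to use setattr" question: one manual approach commonly described for 3.1.x replica mismatches that self-heal will not fix is to decide, from timestamps and checksums, which backend copy is stale, move that copy aside on its brick only, and then look the file up through the client mount so AFR recreates it from the remaining copy. This is only a sketch, to be used after taking a backup, and which copy is the stale one has to be decided manually; paths are reused from this mail:

# on the brick that holds the copy judged to be stale:
B=/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
mv "$B" "$B.stale.$(date +%s)"
# then trigger a lookup/self-heal through the client mount:
stat /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313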
Re: [Gluster-users] How to debug a hanging client?
Error messages on pserver12 (opt*.log):

[2011-05-13 11:41:58.812937] E [client-handshake.c:116:rpc_client_ping_timer_expired] 0-storage0-client-0: Server 10.6.0.108:24009 has not responded in the last 5 seconds, disconnecting.
[2011-05-13 12:11:57.954369] E [rpc-clnt.c:199:call_bail] 0-storage0-client-0: bailing out frame type(GlusterFS Handshake) op(PING(3)) xid = 0x210x sent = 2011-05-13 11:41:53.422855. timeout = 1800
[2011-05-13 12:11:57.954415] E [rpc-clnt.c:199:call_bail] 0-storage0-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 0x209x sent = 2011-05-13 11:41:53.422846. timeout = 1800

Errors on pserver8 (the peer):

[2011-05-13 14:51:26.727334] E [rdma.c:3423:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send work request on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x43fa000, wc.byte_len = 0, post->reused = 8
[2011-05-13 14:51:26.727374] E [rdma.c:3431:rdma_handle_failed_send_completion] 0-rdma: connection between client and server not working. check by running 'ibv_srq_pingpong'. also make sure subnet manager is running (eg: 'opensm'), or check if rdma port is valid (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the problem persists.
[2011-05-13 14:51:26.727617] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x77) [0x7f397dd0ba07] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7f397dd0b19e] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f397dd0b0fe]))) 0-rpc-clnt: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2011-05-13 14:51:22.620059
[2011-05-13 14:51:26.727670] M [client-handshake.c:1178:client_dump_version_cbk] 0-: some error, retry again later
[2011-05-13 14:51:26.727686] I [client.c:1601:client_rpc_notify] 0-storage0-client-1: disconnected

Could this be a bad IB card? After a reboot of pserver12 the system worked again; an attempt to shut down and restart just the ib0 interface failed (hung).

Best, Martin

-Original Message-
From: Martin Schenker [mailto:martin.schen...@profitbricks.com]
Sent: Friday, May 13, 2011 3:36 PM
To: 'gluster-users@gluster.org'
Subject: How to debug a hanging client?

Hi all!

We have one server/client where the client part hangs quite often. Strace shows:

0 root@de-blnstage-c2-pserver12:~ # strace -Tfv -p 12407
Process 12407 attached with 6 threads - interrupt to quit
[pid 12417] futex(0x2cb98a8, FUTEX_WAIT_PRIVATE, 2, NULL
[pid 12412] read(12,
[pid 12411] read(11,
[pid 12410] futex(0x2cb9330, FUTEX_WAIT_PRIVATE, 2, NULL
[pid 12408] rt_sigtimedwait([HUP INT TRAP BUS USR1 USR2 PIPE ALRM TERM CHLD TTOU], NULL, NULL, 8

I can read from the server mountpoint just fine, but any access to the FUSE-mounted glusterfs hangs and can only be killed. Any idea how to resolve this?

If I try to kill all glusterfs processes, the kill -9 on the process

root 12407 1 0 May11 ? 00:00:01 /usr/sbin/glusterfs --log-level=NORMAL --volfile-id=storage0 --volfile-server=localhost /opt/profitbricks/storage

will hang as well. Just like an NFS server hang... waiting for I/O.

Thanks, Martin

___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
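On the "bad IB card" question: the rdma error text itself names a few checks. A sketch of running them on both peers (ibv_devinfo and the pingpong tools ship with libibverbs, ibstat with infiniband-diags; the peer hostname is a placeholder):

ibv_devinfo                    # port state should show PORT_ACTIVE
ibstat
ibv_srq_pingpong               # server side, on one host
ibv_srq_pingpong <peer-host>   # client side, run from the other node
ps -C opensm -o pid,args       # a subnet manager must be running somewhere on the fabric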
[Gluster-users] How to debug a hanging client?
Hi all!

We have one server/client where the client part hangs quite often. Strace shows:

0 root@de-blnstage-c2-pserver12:~ # strace -Tfv -p 12407
Process 12407 attached with 6 threads - interrupt to quit
[pid 12417] futex(0x2cb98a8, FUTEX_WAIT_PRIVATE, 2, NULL
[pid 12412] read(12,
[pid 12411] read(11,
[pid 12410] futex(0x2cb9330, FUTEX_WAIT_PRIVATE, 2, NULL
[pid 12408] rt_sigtimedwait([HUP INT TRAP BUS USR1 USR2 PIPE ALRM TERM CHLD TTOU], NULL, NULL, 8

I can read from the server mountpoint just fine, but any access to the FUSE-mounted glusterfs hangs and can only be killed. Any idea how to resolve this?

If I try to kill all glusterfs processes, the kill -9 on the process

root 12407 1 0 May11 ? 00:00:01 /usr/sbin/glusterfs --log-level=NORMAL --volfile-id=storage0 --volfile-server=localhost /opt/profitbricks/storage

will hang as well. Just like an NFS server hang... waiting for I/O.

Thanks, Martin

___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
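Two generic ways to see where such a hung glusterfs client is stuck, beyond strace: a statedump and a full thread backtrace. The SIGUSR1 statedump is believed to be supported by glusterfs builds of this era, with the dump landing under /tmp; treat the exact file name as an assumption. The pid is the one from the ps output above:

kill -USR1 12407                                 # ask the client process for a statedump
ls -l /tmp/glusterdump.* 2>/dev/null             # assumed location of the dump file
gdb -p 12407 -batch -ex 'thread apply all bt'    # needs gdb; -dbg packages give better symbols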
Re: [Gluster-users] Fuse mounting problems.
Looks like we had an install conflict, in the meantime I dropped the complete install and rebuilt to 3.1.3. Now the system is back up and running... Best, Martin > -Original Message- > From: Mohit Anchlia [mailto:mohitanch...@gmail.com] > Sent: Wednesday, May 11, 2011 6:27 PM > To: Martin Schenker > Cc: gluster-users@gluster.org > Subject: Re: [Gluster-users] Fuse mounting problems. > > > How are you mounting the gluster FS? Can you paste volume > info and your mount commands? > > On Wed, May 11, 2011 at 4:35 AM, Martin Schenker > wrote: > > Looks like we have some problems after a "running upgrade" > from 3.1.3 > > to 3.2.0 > > > > Most pressing is to fix the current error state on > pserver12 after the > > roll-back, the FUSE mount doesn't work: > > > > Error messages in pserver12 opt log: > > > > [2011-05-11 09:45:03.1147] E > [glusterfsd-mgmt.c:628:mgmt_getspec_cbk] > > 0-glusterfs: failed to get the 'volume file' from server > [2011-05-11 > > 09:45:03.1222] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] > > 0-mgmt: failed to fetch volume file (key:storage0) [2011-05-11 > > 09:45:03.1421] W [glusterfsd.c:700:cleanup_and_exit] > > (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xcd) [0x7f98a5032c0d] > > (-->/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4) > > [0x7f98a50329c4] > > (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x2fb) > [0x4080eb]))) 0-: received > > signum (0), shutting down > > [2011-05-11 09:45:03.1452] I [fuse-bridge.c:3688:fini] > 0-fuse: Unmounting > > '/opt/profitbricks/storage'. > > > > > > Vol files are located under > /mnt/gluster/brick0/config/vols/storage0/ > > which is mounted and readable. > > > > I can ping both servers with the names given in the FUSE > vol file, so > > they are visible. > > > > Any ideas appreciated! > > > > Best, Martin > > > > > > > > ___ > > Gluster-users mailing list > > Gluster-users@gluster.org > > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users > > > ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Fuse mounting problems.
Looks like we have some problems after a "running upgrade" from 3.1.3 to 3.2.0.

Most pressing is to fix the current error state on pserver12 after the roll-back: the FUSE mount doesn't work.

Error messages in pserver12 opt log:

[2011-05-11 09:45:03.1147] E [glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2011-05-11 09:45:03.1222] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:storage0)
[2011-05-11 09:45:03.1421] W [glusterfsd.c:700:cleanup_and_exit] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xcd) [0x7f98a5032c0d] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4) [0x7f98a50329c4] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x2fb) [0x4080eb]))) 0-: received signum (0), shutting down
[2011-05-11 09:45:03.1452] I [fuse-bridge.c:3688:fini] 0-fuse: Unmounting '/opt/profitbricks/storage'.

Vol files are located under /mnt/gluster/brick0/config/vols/storage0/, which is mounted and readable. I can ping both servers with the names given in the FUSE vol file, so they are visible.

Any ideas appreciated!

Best, Martin

___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
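A few checks that may help with the "failed to fetch volume file" error shown above: confirm the local glusterd is actually running and still knows the volume, and try fetching the client volfile from another peer instead of localhost. The peer name below is a placeholder; the volume name and mount point are the ones from this thread:

ps -C glusterd -o pid,args         # is the management daemon up?
gluster volume info storage0       # does it still know the volume?
mount -t glusterfs <other-peer>:/storage0 /opt/profitbricks/storage
# or, equivalently, point glusterfs itself at another volfile server:
glusterfs --volfile-id=storage0 --volfile-server=<other-peer> /opt/profitbricks/storage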
Re: [Gluster-users] How do I load the I/O stats translator for theprofiling options?
Hi Pranith! Log says "RPC Program procedure not available", nothing else... 0 root@de-blnstage-c1-pserver4:~ # tail /var/log/glusterfs/etc* [2011-05-09 18:08:07.48398] I [glusterd-handler.c:729:glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req [2011-05-09 18:40:21.382115] E [rpcsvc.c:707:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available [2011-05-09 18:41:40.77542] E [rpcsvc.c:707:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available [2011-05-09 18:46:52.231895] E [rpcsvc.c:707:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available [2011-05-09 18:47:27.543788] I [glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req [2011-05-09 18:47:27.50] I [glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req [2011-05-09 18:47:40.115263] E [rpcsvc.c:707:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available [2011-05-09 18:47:42.707978] I [glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req [2011-05-09 18:47:42.708565] I [glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req [2011-05-10 05:49:48.784387] E [rpcsvc.c:707:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available [2011-05-10 05:50:30.147291] E [rpcsvc.c:707:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available 1 root@de-blnstage-c1-pserver4:~ # tail /var/log/glusterfs/opt* [2011-05-09 16:14:04.934058] I [afr-open.c:435:afr_openfd_sh] 0-storage0-replicate-0: data self-heal triggered. path: /images/2828/4473b07f-d879-2398-4745-0e85ec246342/hdd-images/64190, reason: Replicate up down flush, data lock is held [2011-05-09 16:14:04.934578] E [afr-self-heal-common.c:1214:sh_missing_entries_create] 0-storage0-replicate-0: no missing files - /images/2828/4473b07f-d879-2398-4745-0e85ec246342/hdd-images/64190. proceeding to metadata check [2011-05-09 16:14:04.935980] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-0: invalid argument: inode [2011-05-09 16:14:04.936025] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-0: background data self-heal completed on /images/2828/4473b07f-d879-2398-4745-0e85ec246342/hdd-images/64190 Martin > -Original Message- > From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] > Sent: Tuesday, May 10, 2011 8:20 AM > To: Martin Schenker > Cc: Gluster General Discussion List > Subject: Re: [Gluster-users] How do I load the I/O stats > translator for theprofiling options? > > > Could you please post the logs from the glusterd, it is > generally located at: > /usr/local/var/log/glusterfs/usr-local-etc-glusterfs-glusterd. > vol.log, better zip it and send it. I will take a look and > let you know what the problem is. > > Pranith. > - Original Message - > From: "Martin Schenker" > To: "Pranith Kumar. Karampuri" > Cc: "Gluster General Discussion List" > Sent: Tuesday, May 10, 2011 11:23:03 AM > Subject: RE: [Gluster-users] How do I load the I/O stats > translator for theprofiling options? > > That's what I did after the upgrade to 3.2.0 > > No feedback from the system that the stats are recorded... > > "gluster volume profile storage0 start" didn't respond with > ANYTHING. And no new flags show up in the vol files... > > > Best, Martin > > > > -Original Message- > > From: Pranith Kumar. 
Karampuri [mailto:prani...@gluster.com] > > Sent: Tuesday, May 10, 2011 7:27 AM > > To: Martin Schenker > > Cc: Gluster General Discussion List > > Subject: Re: [Gluster-users] How do I load the I/O stats > > translator for theprofiling options? > > > > > > hi Martin, > > IO-stats is loaded by default. Please use the profile > > commands listed in the following document to > > start/stop/display profile output. > > http://www.gluster.com/community/documentation/index.php/Glust > > er_3.2:_Running_GlusterFS_Volume_Profile_Command > > > > Pranith > > - Original Message - > > From: "Martin Schenker" > > To: "Gluster General Discussion List" > > Sent: Tuesday, May 10, 2011 12:21:49 AM > > Subject: [Gluster-users] How do I load the I/O stats > > translator for the profiling options? > > > > Just upgraded to 3.2.0 and would like to check the I/O stats. > > But how do I "load the I/O stats translator" properly as > > described in the 3.2 manual? > > > > The manual is a bit vague... > > > > Thanks, Martin > > > > ___ > > Gluster-users mailing list > > Gluster-users@gluster.org > > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users > > > ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] How do I load the I/O stats translator for theprofiling options?
That's what I did after the upgrade to 3.2.0 No feedback from the system that the stats are recorded... "gluster volume profile storage0 start" didn't respond with ANYTHING. And no new flags show up in the vol files... Best, Martin > -Original Message- > From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] > Sent: Tuesday, May 10, 2011 7:27 AM > To: Martin Schenker > Cc: Gluster General Discussion List > Subject: Re: [Gluster-users] How do I load the I/O stats > translator for theprofiling options? > > > hi Martin, > IO-stats is loaded by default. Please use the profile > commands listed in the following document to > start/stop/display profile output. > http://www.gluster.com/community/documentation/index.php/Glust > er_3.2:_Running_GlusterFS_Volume_Profile_Command > > Pranith > - Original Message - > From: "Martin Schenker" > To: "Gluster General Discussion List" > Sent: Tuesday, May 10, 2011 12:21:49 AM > Subject: [Gluster-users] How do I load the I/O stats > translator for theprofiling options? > > Just upgraded to 3.2.0 and would like to check the I/O stats. > But how do I "load the I/O stats translator" properly as > described in the 3.2 manual? > > The manual is a bit vague... > > Thanks, Martin > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users > ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] How do I load the I/O stats translator for the profiling options?
Just upgraded to 3.2.0 and would like to check the I/O stats. But how do I "load the I/O stats translator" properly as described in the 3.2 manual? The manual is a bit vague... Thanks, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
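For reference, the profile CLI that the replies earlier in this thread point to boils down to three commands (io-stats is loaded by default in 3.2, so nothing has to be added to the volfiles). A sketch using the volume name from this thread:

gluster volume profile storage0 start
gluster volume profile storage0 info    # per-brick fop counts and latencies since start
gluster volume profile storage0 stop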
Re: [Gluster-users] Best practice to stop the Gluster CLIENT process?
Thanks Joe! ...thought so! Thanks for the confirmation. Time to redesign... system was set up before my time. Oh, well... Perhaps this should be added in the manual? Or is that already in there and was overlooked? Best, Martin -Original Message- From: Joe Landman [mailto:land...@scalableinformatics.com] Sent: Friday, May 06, 2011 10:22 PM To: Martin Schenker Subject: Re: [Gluster-users] Best practice to stop the Gluster CLIENT process? On 05/06/2011 04:13 PM, Martin Schenker wrote: > Won't help if it's the logging process itself, I guess?!? Had that > earlier... > > Is it recommended to log somewhere else than the Gluster file system? Yes. You should log to a file system on a device that the Gluster file system is not on. > > Best, Martin > > -Original Message- > From: gluster-users-boun...@gluster.org > [mailto:gluster-users-boun...@gluster.org] On Behalf Of Joe Landman > Sent: Friday, May 06, 2011 9:36 PM > To: gluster-users@gluster.org > Subject: Re: [Gluster-users] Best practice to stop the Gluster CLIENT > process? > > On 05/06/2011 03:14 PM, Martin Schenker wrote: >> So if I get this right, you'll have to rip the heart out (kill all gluster >> processes; server AND client) in order to get to the local server >> filesystem. >> >> I had hoped that the client part could be left running (to the second > mirror >> brick) when doing repairs etc. Looks like a wrong assumption, I guess... > > or use fuser/lsof to determine which process is locking which volume. > Kill only that process. > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: land...@scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/sicluster phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Best practice to stop the Gluster CLIENT process?
So if I get this right, you'll have to rip the heart out (kill all gluster processes; server AND client) in order to get to the local server filesystem. I had hoped that the client part could be left running (to the second mirror brick) when doing repairs etc. Looks like a wrong assumption, I guess... Are client/server hybrids ONLY connected to the LOCAL server? Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] [SPAM?] Do clients need to run glusterd?
That's exactly how we're running our systems. The boxes are both servers and clients. So is there then no way to separate the client functionality from the server part? No clean cut between glusterd and glusterfsd? Best, Martin -Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Anthony J. Biacco Sent: Friday, May 06, 2011 8:53 PM To: Burnash, James; gluster-users Subject: Re: [Gluster-users] [SPAM?] Do clients need to run glusterd? I never understood before why shutting down the glusterd service killed the clients too. But given this, now I get it. I'm guessing it would not be recommended or supported then to run a gluster client on the same machine as a gluster server? Even if the gluster client is connecting to a server other than the one on the local machine. -Tony --- Manager, IT Operations Format Dynamics, Inc. P: 303-228-7327 F: 303-228-7305 abia...@formatdynamics.com http://www.formatdynamics.com ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
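The three daemons can usually be told apart in the process table, which is the closest thing to a "clean cut" between the roles on a combined server/client box; a rough sketch (exact command-line arguments vary between releases and distributions):

# glusterd   - the management daemon controlled by /etc/init.d/glusterd
# glusterfsd - one brick (server) process per exported brick
# glusterfs  - the FUSE client process behind each mounted volume
ps -C glusterd,glusterfsd,glusterfs -o pid,comm,args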
Re: [Gluster-users] Best practice to stop the Gluster CLIENT process?
Thanks for all the responses! That's what I did: unmounted the client dir. But this STILL left the filesystem locked... no luck here. I'll try James' script next. Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Best practice to stop the Gluster CLIENT process?
Hi all! What's the best way to stop the CLIENT process for Gluster? We have dual systems, where the Gluster servers also act as clients, so both glusterd and glusterfsd are running on the system. Stopping the server app works via "/etc/init.d/glusterd stop", but how is the client stopped? I need to unmount the filesystem from the server in order to do an fsck on the ext4 volume; we have the "needs_recovery" flag set. But the client is hogging it as well due to the log files being located on the volume (might be a good idea to log somewhere else...) Any pointers are welcome; I find it difficult to obtain "simple" instructions like this from the Gluster pages. Even Google doesn't help, sigh. Or I'm too blind... Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
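One possible ordering for freeing an ext4 brick for fsck, written as a sketch rather than an official procedure; it assumes the client mount is /opt/profitbricks/storage and the brick sits on /mnt/gluster/brick0 (paths taken from this archive), and it does take this node's bricks offline for the duration:

# 1. stop whatever still writes to the client mount (see fuser/lsof above), then unmount it
umount /opt/profitbricks/storage
# 2. stop the management daemon and any brick processes still running on this node
/etc/init.d/glusterd stop
killall glusterfsd    # only if the init script leaves brick processes behind
# 3. unmount the brick filesystem and check it
umount /mnt/gluster/brick0
fsck.ext4 -f /dev/<brick-device>    # <brick-device> is a placeholder
# 4. remount the brick, restart Gluster and remount the client
mount /mnt/gluster/brick0
/etc/init.d/glusterd start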
[Gluster-users] Output of "getfattr" command, what does the code tell me?
Is there a way to make use of the attr code given by 0 root@de-dc1-c1-pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0x | grep -B1 -A1 trusted # file: mnt/gluster/brick0/storage/pserver3-23 trusted.afr.storage0-client-4=0xd701 or 0 root@de-dc1-c1-pserver13:~ # getfattr -R -d -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0s | grep -B1 -A1 trusted # file: mnt/gluster/brick0/storage/pserver3-23 trusted.afr.storage0-client-4=0s1wAAAQAA Is there any useful information hidden in the attribute strings? I guess so but I failed to find anything with Google etc. Thanks, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
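As far as the replicate (AFR) changelog format goes, and this is an interpretation rather than anything stated in this thread, each trusted.afr.<volume>-client-N value is twelve bytes: three big-endian 32-bit counters recording pending data, metadata and entry operations that this copy believes the brick behind client-N still has to apply. Read with -e hex they can be split up like this:

getfattr -d -e hex -m trusted.afr /mnt/gluster/brick0/storage/pserver3-23
# trusted.afr.storage0-client-4=0xDDDDDDDDMMMMMMMMEEEEEEEE
#   DDDDDDDD - pending DATA operations (file contents)
#   MMMMMMMM - pending METADATA operations (owner, mode, xattrs)
#   EEEEEEEE - pending ENTRY operations (directory entries)
# all zeroes on both copies  -> replicas agree, nothing to heal
# non-zero on one copy only  -> that copy accuses the other; self-heal knows the direction
# non-zero on both copies    -> each accuses the other: the classic split-brain case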
Re: [Gluster-users] Split brain; which file to choose for repair?
Thanks! Unfortunately the md5sums don't match... but the sizes & timestamps do. These binary files from VMs are quite difficult to work with. That's why I was after some ideas on WHICH file should be preferred. I'm actually quite concerned about how on earth we trigger "split brain" conditions on a regular basis. Are there any good "DON'Ts" for server updates/maintenance? The current way is shutting down one server, upgrading/installing it, bringing up the Gluster daemons (server & client)/mounting the file system, checking everything is running. Then starting on the next server. So a pure sequential approach. Thought this would minimise the risk of getting the files in a twist?!? Best, Martin From: Anand Avati [mailto:anand.av...@gmail.com] Sent: Wednesday, May 04, 2011 2:55 PM To: Martin Schenker Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Split brain; which file to choose for repair? A split-brain situation occurs only under a specific sequence of events and modifications, after which the filesystem cannot decide which of the two copies of the file is the updated one. It might so happen that the two changes were actually the same "change" and hence the two copies of your file might match md5sum (in which case you can delete one arbitrarily). If not, you need to know how your application works and which of the files (by inspecting the content) is more appropriate to delete. Avati On Wed, May 4, 2011 at 5:54 PM, Martin Schenker wrote: Hi all! Is there anybody who can give some pointers regarding which file to choose in a "split brain" condition? What tests do I need to run? What does the hex AFR code actually show? Is there a way to pinpoint the "better/worse" file for deletion? On pserver12: # file: mnt/gluster/brick0/storage/pserver3-19 trusted.afr.storage0-client-5=0x3f01 On pserver13: # file: mnt/gluster/brick0/storage/pserver3-19 trusted.afr.storage0-client-4=0xd701 These are test files, but I'd like to know what to do in a LIVE situation which will be just around the corner. The timestamps show the same values, so I'm a bit puzzled HOW to choose a file. pserver12: 0 root@de-dc1-c1-pserver12:~ # ls -al /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40 /mnt/gluster/brick0/storage/pserver3-19 0 root@de-dc1-c1-pserver12:~ # ls -alu /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18 /mnt/gluster/brick0/storage/pserver3-19 pserver13: 0 root@de-dc1-c1-pserver13:~ # ls -al /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40 /mnt/gluster/brick0/storage/pserver3-19 0 root@de-dc1-c1-pserver13:~ # ls -alu /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18 /mnt/gluster/brick0/storage/pserver3-19 Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
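For 3.1/3.2 the usual manual repair, once a preferred copy has been picked, is to delete the stale copy directly on its brick and then look the file up through a client mount so replicate copies it back from the surviving replica. A sketch, assuming the pserver13 copy is the one being discarded and the client mount is /opt/profitbricks/storage; which copy to keep is exactly the judgment call discussed above:

# on the brick that holds the copy to throw away (here: pserver13)
rm /mnt/gluster/brick0/storage/pserver3-19
# then, from a CLIENT mount point, trigger self-heal of that file
stat /opt/profitbricks/storage/pserver3-19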
Re: [Gluster-users] Split brain; which file to choose for repair?
Hi all! Is there anybody who can give some pointers regarding which file to choose in a "split brain" condition? What tests do I need to run? What does the hex AFR code actually show? Is there a way to pinpoint the "better/worse" file for deletion? On pserver12: # file: mnt/gluster/brick0/storage/pserver3-19 trusted.afr.storage0-client-5=0x3f01 On pserver13: # file: mnt/gluster/brick0/storage/pserver3-19 trusted.afr.storage0-client-4=0xd701 These are test files, but I'd like to know what to do in a LIVE situation which will be just around the corner. The timestamps show the same values, so I'm a bit puzzled HOW to choose a file. pserver12: 0 root@de-dc1-c1-pserver12:~ # ls -al /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40 /mnt/gluster/brick0/storage/pserver3-19 0 root@de-dc1-c1-pserver12:~ # ls -alu /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18 /mnt/gluster/brick0/storage/pserver3-19 pserver13: 0 root@de-dc1-c1-pserver13:~ # ls -al /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40 /mnt/gluster/brick0/storage/pserver3-19 0 root@de-dc1-c1-pserver13:~ # ls -alu /mnt/gluster/brick0/storage/pserver3-19 -rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18 /mnt/gluster/brick0/storage/pserver3-19 Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Split brain; which file to choose for repair?
Hi all! Another incident, now a real "split brain" situation: Server pair 12 & 13, a set of files can't be repaired and throws errors. Is there a way to interpret the AFR code in order to select which files should be chosen to be deleted/overwritten?! No errors in opt-profitbricks-storage.log from pserver12; but opt-profitbricks-storage.log from pserver13 says: [2011-05-03 18:14:29.343512] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-11. [2011-05-03 18:14:29.344467] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-11' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.347376] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-16. [2011-05-03 18:14:29.348157] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-16' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.349013] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-17. [2011-05-03 18:14:29.349817] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-17' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.351252] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-19. [2011-05-03 18:14:29.352043] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-19' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.353477] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-20. [2011-05-03 18:14:29.354242] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-20' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.356343] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-23. [2011-05-03 18:14:29.357198] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-23' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.358030] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-24. [2011-05-03 18:14:29.358877] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-24' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.362652] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-3. [2011-05-03 18:14:29.363431] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-3' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.364261] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-30. 
[2011-05-03 18:14:29.365041] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-30' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.368924] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-36. [2011-05-03 18:14:29.369682] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-36' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.371696] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-39. [2011-05-03 18:14:29.372451] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-39' (possible split-brain). Please delete the file from all but the preferred subvolume. [2011-05-03 18:14:29.373939] I [afr-common.c:672:afr_lookup_done] 0-storage0-replicate-2: split brain detected during lookup of /pserver3-5. [2011-05-03 18:14:29.374705] E [afr-self-heal-data.c:645:afr_sh_data_fix] 0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-5' (possible split-brain). Please delete the file from all but the preferred subvolume. 0 root@de-dc1-c1-pserver12:/var/log/glusterfs # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0x | grep -B1 -A
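One way to turn those log entries into a working list of affected files (a sketch; it assumes the client log lives in the default /var/log/glusterfs location under the name quoted above):

grep "split brain detected during lookup" /var/log/glusterfs/opt-profitbricks-storage.log \
  | sed -e 's/.*lookup of //' -e 's/\.$//' \
  | sort -u

That list can then be cross-checked against the trusted.afr output on both bricks before deciding, file by file, which copy to delete.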
Re: [Gluster-users] Server outage, file sync/self-heal doesn't sync ALL files?!
Hi all! After the start of pserver12 I ran the getfattr command on all 4 systems in order to check which files were out of sync. This came back with 63 files on pserver12 and none on the others. After starting the gluster server and client daemons on 12, the first batch was done automagically, as stated before. But not all of them as II would have expected. Best, Martin 2011/4/29 Pranith Kumar. Karampuri > This means that there is no differences in gfids. Could you let me know how > the self heal is done after the pserver12 was brought up?. > How did you find out that the self-heal is needed for 63 files?. > > Pranith. > - Original Message ----- > From: "Martin Schenker" > To: "Pranith Kumar. Karampuri" , > gluster-users@gluster.org > Sent: Friday, April 29, 2011 11:05:55 PM > Subject: Re: [Gluster-users] Server outage, file sync/self-heal doesn't > sync ALL files?! > > Sorry, I had manually sync due to imminent server upgrades. > 50 min. after the initial sync I was asked to bring the servers in a > safe state for an upgrade and did a manual > "touch-on-server13-client-mountpoint" which triggered an immediate > self-heal on the rest of the files. > > All files were in sync across all four server after this action. Will > run this command next time!! > > Best, Martin > > Am 29.04.2011 19:30, schrieb Pranith Kumar. Karampuri: > > hi Martin, > >Could you please send the output of -m "trusted*" instead of > "trusted.afr" for the remaining 24 files from both the servers. I would like > to see the gfids of these files on both the machines. > > > > Pranith. > > - Original Message - > > From: "Martin Schenker" > > To: gluster-users@gluster.org > > Sent: Friday, April 29, 2011 8:39:46 PM > > Subject: [Gluster-users] Server outage, file sync/self-heal doesn't > sync ALL files?! > > > > Hi all! > > > > We have another incident over here. > > > > One of the servers (pserver12) in a pair (12& 13) has been rebooted. > > pserver13 showed 63 files not in sync after the outage for 2h. > > > > Both server are clients as well. > > > > Starting pserver12 brought up the self-heal mechanism, but only 39 files > > were triggered within the first 10 min. Now the system seems dormant and > > 24 files are left hanging. > > > > On the other three servers no inconsistencies are seen. 
> > > > tail of client log file: > > > > 2011-04-29 14:48:23.820022] I > > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] > > 0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of > > 22736 were different (8.62%) > > [2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain] > > 0-storage0-replicate-2: invalid argument: inode > > [2011-04-29 14:48:23.887740] I > > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] > > 0-storage0-replicate-2: background data self-heal completed on > > /pserver13-17 > > [2011-04-29 14:48:24.272220] I > > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] > > 0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of > > 22744 were different (8.62%) > > [2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain] > > 0-storage0-replicate-2: invalid argument: inode > > [2011-04-29 14:48:24.341959] I > > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] > > 0-storage0-replicate-2: background data self-heal completed on > > /pserver13-19 > > [2011-04-29 14:48:24.758131] I > > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] > > 0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of > > 22752 were different (8.58%) > > [2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain] > > 0-storage0-replicate-2: invalid argument: inode > > [2011-04-29 14:48:24.766137] I > > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] > > 0-storage0-replicate-2: background data self-heal completed on > > /pserver13-23 > > [2011-04-29 14:48:24.884613] I > > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] > > 0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of > > 22760 were different (8.58%) > > [2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain] > > 0-storage0-replicate-2: invalid argument: inode > > [2011-04-29 14:48:24.895721] I > > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] > > 0-storage0-replicate-2: background data self-heal completed on > > /pserver13-10 > &g
Re: [Gluster-users] Server outage, file sync/self-heal doesn't sync ALL files?!
Sorry, I had manually sync due to imminent server upgrades. 50 min. after the initial sync I was asked to bring the servers in a safe state for an upgrade and did a manual "touch-on-server13-client-mountpoint" which triggered an immediate self-heal on the rest of the files. All files were in sync across all four server after this action. Will run this command next time!! Best, Martin Am 29.04.2011 19:30, schrieb Pranith Kumar. Karampuri: hi Martin, Could you please send the output of -m "trusted*" instead of "trusted.afr" for the remaining 24 files from both the servers. I would like to see the gfids of these files on both the machines. Pranith. - Original Message - From: "Martin Schenker" To: gluster-users@gluster.org Sent: Friday, April 29, 2011 8:39:46 PM Subject: [Gluster-users] Server outage, file sync/self-heal doesn't sync ALL files?! Hi all! We have another incident over here. One of the servers (pserver12) in a pair (12& 13) has been rebooted. pserver13 showed 63 files not in sync after the outage for 2h. Both server are clients as well. Starting pserver12 brought up the self-heal mechanism, but only 39 files were triggered within the first 10 min. Now the system seems dormant and 24 files are left hanging. On the other three servers no inconsistencies are seen. tail of client log file: 2011-04-29 14:48:23.820022] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of 22736 were different (8.62%) [2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:23.887740] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-17 [2011-04-29 14:48:24.272220] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of 22744 were different (8.62%) [2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:24.341959] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-19 [2011-04-29 14:48:24.758131] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of 22752 were different (8.58%) [2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:24.766137] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-23 [2011-04-29 14:48:24.884613] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of 22760 were different (8.58%) [2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:24.895721] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-10 0 root@pserver13:/var/log/glusterfs # date Fri Apr 29 15:08:18 UTC 2011 Search for mismatch: 0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." 
/mnt/gluster/brick?/storage | grep -v 0x | grep -B1 -A1 trusted | grep -c file getfattr: Removing leading '/' from absolute path names *24* 0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0x | grep -B1 trusted getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-26 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images trusted.afr.storage0-client-4=0x00160001 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-24 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-8 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-21 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-22 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-30 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-20 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-
[Gluster-users] Server outage, file sync/self-heal doesn't sync ALL files?!
Hi all! We have another incident over here. One of the servers (pserver12) in a pair (12 & 13) has been rebooted. pserver13 showed 63 files not in sync after the outage for 2h. Both server are clients as well. Starting pserver12 brought up the self-heal mechanism, but only 39 files were triggered within the first 10 min. Now the system seems dormant and 24 files are left hanging. On the other three servers no inconsistencies are seen. tail of client log file: 2011-04-29 14:48:23.820022] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of 22736 were different (8.62%) [2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:23.887740] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-17 [2011-04-29 14:48:24.272220] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of 22744 were different (8.62%) [2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:24.341959] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-19 [2011-04-29 14:48:24.758131] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of 22752 were different (8.58%) [2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:24.766137] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-23 [2011-04-29 14:48:24.884613] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of 22760 were different (8.58%) [2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain] 0-storage0-replicate-2: invalid argument: inode [2011-04-29 14:48:24.895721] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 0-storage0-replicate-2: background data self-heal completed on /pserver13-10 0 root@pserver13:/var/log/glusterfs # date Fri Apr 29 15:08:18 UTC 2011 Search for mismatch: 0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0x | grep -B1 -A1 trusted | grep -c file getfattr: Removing leading '/' from absolute path names *24* 0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." 
/mnt/gluster/brick?/storage | grep -v 0x | grep -B1 trusted getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-26 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images trusted.afr.storage0-client-4=0x00160001 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-24 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-8 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-21 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-22 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-30 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-20 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-9 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-38 trusted.afr.storage0-client-4=0x2701 -- # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-18 trusted.afr.storage0-client-6=0x2701 -- # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-2 trusted.afr.storage0-client-6=0x2701 -- # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-23 trusted.afr.storage0-client-6=0x2701 -- # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-4 trusted.afr.storage0-client-6=0x2701 -- # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-3 trusted.afr.storage0-client-6=0x2701 -- # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-34 trusted.afr.storage0-client-6=0x2701 -- # file: mnt/gluster/brick1/storage/de-
[Gluster-users] How to preserve log files when servers are restarted?
Hi all! Is there a better way to save the logfiles for the glusterd and glusterfs processes than manually copying them away before restarting a server? "gluster volume log rotate" just works on the brick storage log; no other files seem to be addressable?! This makes error tracking a bit difficult when the logs are gone... Thanks, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
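Absent anything better from the CLI, one common approach is to let logrotate keep dated copies of everything under /var/log/glusterfs, so the history survives daemon restarts as long as the files themselves are not deleted. A hypothetical snippet (file name, schedule and retention are arbitrary choices; copytruncate avoids having to signal the daemons):

# /etc/logrotate.d/glusterfs  (hypothetical)
/var/log/glusterfs/*.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
    copytruncate
}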
Re: [Gluster-users] "Self heal" or sync issues?
With millions of files on a system this is a HUGE overhead. Running the getfattr command to find just the mismatched files and using that as the source for triggering self-heal/sync might be a better and less costly option... but you might see files in the process of being sync'd as well. I'd only trigger a repair after a certain time has passed and nothing has happened. I'm still puzzled about what caused the mismatch and why it didn't get repaired on its own. Any ideas? Best, Martin On 28.04.2011 16:48, Whit Blauvelt wrote: On Thu, Apr 28, 2011 at 04:16:51PM +0200, Martin Schenker wrote: After triggering manually with "touch" using the right *CLIENT* mount points, the self-heal/sync function worked fine. I was using the server mounts before, as shown by the getfattr output. Not good... Now the question remains WHY the Gluster system didn't do anything on its own? Is this a "healthy" situation and we shouldn't worry? Would it be good practice to regularly run a script to trigger any self-healing that might be necessary - or to test if necessary (how?) and then run on that condition? It would be easy, for instance, to use Python's os.walk function to run through and touch - or whatever - every file in the space. That adds a non-trivial load to a system, but for systems with load to spare, would running that, say, every hour or every day be good practice? Whit ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
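The "only touch what getfattr flags" idea could be scripted roughly like this. It is only a sketch and leans on several assumptions: bricks live under /mnt/gluster/brick?/storage, the volume is mounted on the client at /opt/profitbricks/storage, and paths inside a brick mirror the paths seen on the client mount one-to-one:

#!/bin/bash
# list brick-side files whose trusted.afr counters are non-zero, then stat the
# corresponding path through the CLIENT mount so replicate re-examines them
getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage 2>/dev/null \
  | awk '/^# file:/ {f=$3} /trusted\.afr/ && $0 !~ /=0x0+$/ {print f}' \
  | sort -u \
  | while read bpath; do
        cpath="/opt/profitbricks/storage/${bpath#mnt/gluster/brick?/storage/}"
        stat "$cpath" > /dev/null
    done

Run from cron only after a grace period, as suggested above, so files that are simply mid-sync are not touched unnecessarily.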
Re: [Gluster-users] "Self heal" or sync issues?
Hi all! Thanks for the pointers! After triggering manually with "touch" using the right *CLIENT* mount points, the self-heal/sync function worked fine. I was using the server mounts before, as shown by the getfattr output. Not good... Now the question remains WHY the Gluster system didn't do anything on its own? Is this a "healthy" situation and we shouldn't worry? Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] "Self heal" or sync issues?
Hi all! We're running a 4 node cluster on Gluster 3.1.3 currently. After a staged server update/reboot only ONE of the 4 servers shows some mismatches in the file attributes. It shows that 28 files differ from /0x/ the "all-in-sync" state. No sync or self heal has happened within the last 16h, we checked last night, this morning and now. Even after opening each file with /od -c | head -2/ the self-heal/sync process doesn't seem to start. There are no errors in the logs, how can I check that things ARE happening correctly? /0 root@de-dc1-c1-pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0x | grep -A1 -B1 trusted getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/brick0/storage/images/1831/db88d55e-3282-c7c6-d1dd-ec41a665011f/hdd-images/8987 trusted.afr.storage0-client-0=0x0100 -- # file: mnt/gluster/brick0/storage/images/1831/92f63f17-eb6c-8dba-2b9c-2e9cc52a8b2c/hdd-images/8786 trusted.afr.storage0-client-0=0x0100 -- # file: mnt/gluster/brick0/storage/images/1831/6ae6c5eb-e6e2-4dfe-7bb3-75c622910f27/hdd-images/9113 trusted.afr.storage0-client-0=0x0100 -- # file: mnt/gluster/brick0/storage/images/1828/4e5fd475-19b3-c9a7-1ad0-e4da528e6dbd/iso-images/11091 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1853/3df576b8-4206-e45a-33d7-433d56b700f0/iso-images/1957 trusted.afr.storage0-client-0=0x0200 # file: mnt/gluster/brick0/storage/images/1853/3df576b8-4206-e45a-33d7-433d56b700f0/iso-images/1960 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/2003/5e9b2bdc-a158-796f-6d81-60f39aee5137/hdd-images/9772 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1962/5c4cd738-bb56-d723-f001-0428e55ea81b/iso-images/8110 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1787/a190f24c-ed40-5642-4226-f00726dfc99f/iso-images/9837 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/2854 trusted.afr.storage0-client-0=0x0200 # file: mnt/gluster/brick0/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/10703 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1787/1782da17-059c-d159-373a-9ad9f5f9289f/iso-images/8519 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1787/aed64fba-2372-6f06-0690-be46136464a0/iso-images/10258 trusted.afr.storage0-client-0=0x0200 # file: mnt/gluster/brick0/storage/images/1787/aed64fba-2372-6f06-0690-be46136464a0/iso-images/10452 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1787/cd6089d7-c2cd-a5e1-2130-770fe028b5e3/iso-images/10511 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1834/02d04e16-40db-d244-aa4e-3e53cfaa2405/iso-images/504 trusted.afr.storage0-client-0=0x0200 -- # file: mnt/gluster/brick0/storage/images/1978/21527903-ca4e-4715-b40b-30c150f86d44/iso-images/9275 trusted.afr.storage0-client-0=0x0100 # file: mnt/gluster/brick0/storage/images/1978/21527903-ca4e-4715-b40b-30c150f86d44/iso-images/9511 trusted.afr.storage0-client-0=0x0100 -- # file: mnt/gluster/brick1/storage/images/1828/fc701d50-0b29-7827-89c0-77134ba96205/iso-images/9442 trusted.afr.storage0-client-2=0x0200 -- # file: mnt/gluster/brick1/storage/images/1878/875ed0c0-38b3-4552-7f1f-49a619996e5c/hdd-images/5758 trusted.afr.storage0-client-3=0x -- # file: 
mnt/gluster/brick1/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/2857 trusted.afr.storage0-client-2=0x0200 # file: mnt/gluster/brick1/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/10773 trusted.afr.storage0-client-2=0x0200 # file: mnt/gluster/brick1/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/10648 trusted.afr.storage0-client-2=0x0200 -- # file: mnt/gluster/brick1/storage/images/2003/8d7880ff-e7b2-3996-3fa3-ddb8022ca403/iso-images/9979 trusted.afr.storage0-client-2=0x0200 -- # file: mnt/gluster/brick1/storage/images/2003/116a4a8f-c8c2-6f70-1256-b29477d65e72/iso-images/10587 trusted.afr.storage0-client-2=0x0200 -- # file: mnt/gluster/brick1/storage/images/1956/ff4bbfd7-3b1a-00da-c901-c35cd967b600/iso-images/6815 trusted.afr.storage0-client-2=0x0200 -- # fil
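When only some replicas carry pending flags, the brute-force alternative is to walk the whole volume through a CLIENT mount and stat every entry, which makes replicate check (and if needed heal) each file as it is looked up; this was the commonly recommended trigger before 3.3 introduced the automatic self-heal daemon. A sketch, with the mount path assumed from this archive:

find /opt/profitbricks/storage -noleaf -print0 | xargs --null stat > /dev/null

On a volume with millions of files this is expensive, which is why the targeted getfattr-based variant sketched earlier in this archive is usually preferable.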
Re: [Gluster-users] "Gluster volume show" function?
Hmm, while looking at # gluster volume info all Volume Name: storage0 Type: Distributed-Replicate Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: rdma Bricks: Brick1: de-dc1-c1-pserver3:/mnt/gluster/brick0/storage Brick2: de-dc1-c1-pserver5:/mnt/gluster/brick0/storage Brick3: de-dc1-c1-pserver3:/mnt/gluster/brick1/storage Brick4: de-dc1-c1-pserver5:/mnt/gluster/brick1/storage Brick5: de-dc1-c1-pserver12:/mnt/gluster/brick0/storage Brick6: de-dc1-c1-pserver13:/mnt/gluster/brick0/storage Brick7: de-dc1-c1-pserver12:/mnt/gluster/brick1/storage Brick8: de-dc1-c1-pserver13:/mnt/gluster/brick1/storage Options Reconfigured: *network.ping-timeout: 5 nfs.disable: on performance.cache-size: 4096MB * I guess the lower bit (Options Reconfigured:) is showing all the used options for the volume setup, right?!? So question answered... Best, Martin Hi all! Is there a quick way to figure out what volume options have been used to set up a Gluster volume? http://gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options gives me the list of options but there seems to be now way to check what WAS set already?!? Or am I just looking in the wrong places?!? I've "inherited" a running system with some performance/stability issues and I'm trying to peer under the hood... Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
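Right: anything not listed under "Options Reconfigured" is still at its default. If the question is which values are actually in effect, one place to look is the volfiles glusterd generates for the volume, which show the full translator graph along with whatever options have been written into each translator; a sketch, with the path as an assumption (3.1-era packages kept this state under /etc/glusterd, later releases use /var/lib/glusterd):

# client-side graph handed to the FUSE mounts
less /etc/glusterd/vols/storage0/storage0-fuse.vol
# brick-side graphs, one per brick
ls /etc/glusterd/vols/storage0/*.vol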
[Gluster-users] "Gluster volume show" function?
Hi all! Is there a quick way to figure out what volume options have been used to set up a Gluster volume? http://gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options gives me the list of options but there seems to be no way to check what WAS set already?!? Or am I just looking in the wrong places?!? I've "inherited" a running system with some performance/stability issues and I'm trying to peer under the hood... Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
Hi all! I'm new to the Gluster system and tried to find answers to some simple questions (and couldn't find the information with Google etc.) - Does Gluster spread its CPU load across a multicore environment? So does it make sense to have 50 core units as Gluster servers? CPU loads seem to go up quite high during file system repairs, so spreading / multithreading should help? What kind of CPUs are working well? How much does memory help the performance? - Are there any recommendations for commodity hardware? We're thinking of 36-slot 4U servers; what kind of controllers DO work well for IO speed? Any real life experiences? Does it dramatically improve the performance to increase the number of controllers per disk? The aim is for a ~80-120T file system with 2-3 bricks. Thanks for any feedback! Best, Martin ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users