Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-20 Thread Martin Schenker
No, all files are running VMs. No-one alters them manually (which would kill
the VM...)

So, all of it was done by the replicate mechanism and the sync. We have to reboot
servers from time to time for upgrades, but we do bring them back up with
Gluster running before tackling a second server.

Best, Martin

-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
Sent: Thursday, May 19, 2011 7:05 PM
To: Pranith Kumar. Karampuri; Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Client and server file "view", different
results?! Client can't see the right file.

What's more interesting is that pserver3 shows "0" bytes while the other
three show the same size, and pserver12 & 13 have
trusted.glusterfs.dht.linkto="storage0-replicate-0" set.

Was there ever any manual operation done on these files?
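For reference, a DHT link file is normally a zero-byte entry on the brick with
the sticky bit set (mode ---------T) plus the linkto xattr; a quick check,
reusing one of the brick paths from this thread purely as an example, could be:

ls -l /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819
getfattr -n trusted.glusterfs.dht.linkto /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819

A full-sized file that still carries the linkto xattr, as in the listings above,
doesn't match that pattern.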

On Thu, May 19, 2011 at 5:16 AM, Pranith Kumar. Karampuri
 wrote:
> Need the logs from May 13th to 17th.
>
> Pranith.
> - Original Message -
> From: "Martin Schenker" 
> To: "Pranith Kumar. Karampuri" 
> Cc: gluster-users@gluster.org
> Sent: Thursday, May 19, 2011 5:28:06 PM
> Subject: RE: [Gluster-users] Client and server file "view",     different
results?! Client can't see the right file.
>
> Hi Pranith!
>
> That's what I would have expected as well! The files should be on one
brick. But they appear on both.
> I'm quite stumped WHY the files show up on the other brick; this isn't
what I understood from the manual/setup! The vol-file doesn't seem to be
wrong, so any ideas?
>
> Best, Martin
>
>
>
> -Original Message-
> From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com]
> Sent: Thursday, May 19, 2011 1:52 PM
> To: Martin Schenker
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Client and server file "view", different
results?! Client can't see the right file.
>
> Martin,
>     The output suggests that there are 2 replicas per volume, so the file
should be present on only 2 bricks. Why is it present on 4 bricks? It
should either be on pserver12 & 13 or on pserver3 & 5. I am not sure why
you are expecting it to be there on 4 bricks.
> Am I missing any info here?
>
> Pranith
>
> - Original Message -
> From: "Martin Schenker" 
> To: gluster-users@gluster.org
> Sent: Wednesday, May 18, 2011 2:23:09 PM
> Subject: Re: [Gluster-users] Client and server file "view",     different
results?! Client can't see the right file.
>
> Here is another occurrence:
>
> The file 20819 is shown twice, with different timestamps and attributes: 0
> file size on pserver3, outdated on pserver5; only 12 & 13 seem to be in
> sync.
> So what's going on?
>
>
> 0 root@de-dc1-c1-pserver13:~ # ls -al
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
> dd-images/2081*
> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
> dd-images/20819
> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
> dd-images/20819
>
> 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 |
xargs
> -i ls -al {}
> -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> 0 root@de-dc1-c1-pserver3:~ # getfattr -dm -
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> getfattr: Removing leading '/' from absolute path names
> # file:
>
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
> hdd-images/20819
> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
>
> 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls
-al
> {}
> -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> 0 root@pserver5:~ # getfattr -dm -
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> getfattr: Removing leading '/' from absolute path names
> # file:
>
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
> hdd-images/20819
> trusted.afr.storage0-client-0=0sAgIA
> trusted.afr.storage0-client-1=0s
> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
>
> 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls
-al
> {}
> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:

Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-19 Thread Martin Schenker
Hi Pranith!

That's what I would have expected as well! The files should be on one brick. 
But they appear on both.
I'm quite stumped WHY the files show up on the other brick; this isn't what I
understood from the manual/setup! The vol-file doesn't seem to be wrong, so any
ideas?

Best, Martin



-Original Message-
From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] 
Sent: Thursday, May 19, 2011 1:52 PM
To: Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Client and server file "view", different results?! 
Client can't see the right file.

Martin,
 The output suggests that there are 2 replicas per volume, so the file should
be present on only 2 bricks. Why is it present on 4 bricks? It should
either be on pserver12 & 13 or on pserver3 & 5. I am not sure why you are
expecting it to be there on 4 bricks.
Am I missing any info here?

Pranith

- Original Message -
From: "Martin Schenker" 
To: gluster-users@gluster.org
Sent: Wednesday, May 18, 2011 2:23:09 PM
Subject: Re: [Gluster-users] Client and server file "view", different 
results?! Client can't see the right file.

Here is another occurrence:

The file 20819 is shown twice, with different timestamps and attributes: 0
file size on pserver3, outdated on pserver5; only 12 & 13 seem to be in sync.
So what's going on?


0 root@de-dc1-c1-pserver13:~ # ls -al
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/2081*
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/20819
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/20819

0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 | xargs
-i ls -al {}
-rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@de-dc1-c1-pserver3:~ # getfattr -dm -
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==

0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al
{}
-rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@pserver5:~ # getfattr -dm -
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.afr.storage0-client-0=0sAgIA
trusted.afr.storage0-client-1=0s
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==

0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al
{}
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@pserver12:~ # getfattr -dm -
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.afr.storage0-client-6=0s
trusted.afr.storage0-client-7=0s
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
trusted.glusterfs.dht.linkto="storage0-replicate-0

0 root@de-dc1-c1-pserver13:~ # find /mnt/gluster/brick?/ -name 20819 | xargs
-i ls -al {}
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:39
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@de-dc1-c1-pserver13:~ # getfattr -dm -
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.afr.storage0-client-6=0s
trusted.afr.storage0-client-7=0s
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
trusted.glusterfs.dht.linkto="storage0-replicate-0

Only entrance in log file on pserver5, no references in the other three
logs/servers:

0 root@pserver5:~ # grep 20819
/var/log/glusterfs/opt-profitbricks-storage.log
[2011-05-17 20:37:30.52535] I [client-handshake.c:407:client3_1_reopen_cbk]
0-storage0-client-7: reopen on
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819 succeeded
(remote-fd = 6)
[2011-05-17 20:37:34.824934] I [afr-open.c:435:afr_openfd_

Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-18 Thread Martin Schenker
No, we already had these issues before, running on 3.1.3.
Only the system load has gone up a lot in the meantime...

Best, Martin

-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
Sent: Wednesday, May 18, 2011 10:36 PM
To: Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Client and server file "view", different
results?! Client can't see the right file.

So you started seeing this issue after rolling it back to 3.1.3?

On Wed, May 18, 2011 at 1:30 PM, Martin Schenker
 wrote:
> We're running 3.1.3. We had a brief test of 3.2.0 and rolled back to 3.1.3
> by reinstalling the Debian package.
>
> 0 root@pserver12:~ # gluster volume info all
>
> Volume Name: storage0
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: rdma
> Bricks:
> Brick1: de-dc1-c1-pserver3:/mnt/gluster/brick0/storage
> Brick2: de-dc1-c1-pserver5:/mnt/gluster/brick0/storage
> Brick3: de-dc1-c1-pserver3:/mnt/gluster/brick1/storage
> Brick4: de-dc1-c1-pserver5:/mnt/gluster/brick1/storage
> Brick5: de-dc1-c1-pserver12:/mnt/gluster/brick0/storage
> Brick6: de-dc1-c1-pserver13:/mnt/gluster/brick0/storage
> Brick7: de-dc1-c1-pserver12:/mnt/gluster/brick1/storage
> Brick8: de-dc1-c1-pserver13:/mnt/gluster/brick1/storage
> Options Reconfigured:
> network.ping-timeout: 5
> nfs.disable: on
> performance.cache-size: 4096MB
>
> Best, Martin
>
> -Original Message-
> From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
> Sent: Wednesday, May 18, 2011 9:43 PM
> To: Martin Schenker
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Client and server file "view", different
> results?! Client can't see the right file.
>
> Which version are you running? Can you also post output from volume info?
>
> Meanwhile, anyone from dev want to answer??
>
> On Wed, May 18, 2011 at 1:53 AM, Martin Schenker
>  wrote:
>> Here is another occurrence:
>>
>> The file 20819 is shown twice, with different timestamps and attributes: 0
>> file size on pserver3, outdated on pserver5; only 12 & 13 seem to be in
>> sync.
>> So what's going on?
>>
>>
>> 0 root@de-dc1-c1-pserver13:~ # ls -al
>>
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
>> dd-images/2081*
>> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
>>
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
>> dd-images/20819
>> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
>>
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
>> dd-images/20819
>>
>> 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 |
> xargs
>> -i ls -al {}
>> -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00
>>
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
>> /hdd-images/20819
>> 0 root@de-dc1-c1-pserver3:~ # getfattr -dm -
>>
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
>> /hdd-images/20819
>> getfattr: Removing leading '/' from absolute path names
>> # file:
>>
>
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
>> hdd-images/20819
>> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
>>
>> 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls
> -al
>> {}
>> -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00
>>
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
>> /hdd-images/20819
>> 0 root@pserver5:~ # getfattr -dm -
>>
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
>> /hdd-images/20819
>> getfattr: Removing leading '/' from absolute path names
>> # file:
>>
>
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
>> hdd-images/20819
>> trusted.afr.storage0-client-0=0sAgIA
>> trusted.afr.storage0-client-1=0s
>> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
>>
>> 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls
> -al
>> {}
>> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41
>>
>
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
>> /hdd-images/20819
>> 0 root@pserver12:~ # getfattr -dm -
>>
>
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
>> /hdd-images/20819
>> getfattr: Removing leading '/' from absolute path names
>> # file:
>>
>
mnt/gluster/brick1/storage/i

Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-18 Thread Martin Schenker
We're running 3.1.3. We had a brief test of 3.2.0 and rolled back to 3.1.3
by reinstalling the Debian package.

0 root@pserver12:~ # gluster volume info all

Volume Name: storage0
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: rdma
Bricks:
Brick1: de-dc1-c1-pserver3:/mnt/gluster/brick0/storage
Brick2: de-dc1-c1-pserver5:/mnt/gluster/brick0/storage
Brick3: de-dc1-c1-pserver3:/mnt/gluster/brick1/storage
Brick4: de-dc1-c1-pserver5:/mnt/gluster/brick1/storage
Brick5: de-dc1-c1-pserver12:/mnt/gluster/brick0/storage
Brick6: de-dc1-c1-pserver13:/mnt/gluster/brick0/storage
Brick7: de-dc1-c1-pserver12:/mnt/gluster/brick1/storage
Brick8: de-dc1-c1-pserver13:/mnt/gluster/brick1/storage
Options Reconfigured:
network.ping-timeout: 5
nfs.disable: on
performance.cache-size: 4096MB

Best, Martin

-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
Sent: Wednesday, May 18, 2011 9:43 PM
To: Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Client and server file "view", different
results?! Client can't see the right file.

Which version are you running? Can you also post output from volume info?

Meanwhile, anyone from dev want to answer??

On Wed, May 18, 2011 at 1:53 AM, Martin Schenker
 wrote:
> Here is another occurrence:
>
> The file 20819 is shown twice, with different timestamps and attributes: 0
> file size on pserver3, outdated on pserver5; only 12 & 13 seem to be in
> sync.
> So what's going on?
>
>
> 0 root@de-dc1-c1-pserver13:~ # ls -al
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
> dd-images/2081*
> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
> dd-images/20819
> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
>
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
> dd-images/20819
>
> 0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 |
xargs
> -i ls -al {}
> -rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> 0 root@de-dc1-c1-pserver3:~ # getfattr -dm -
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> getfattr: Removing leading '/' from absolute path names
> # file:
>
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
> hdd-images/20819
> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
>
> 0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls
-al
> {}
> -rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> 0 root@pserver5:~ # getfattr -dm -
>
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> getfattr: Removing leading '/' from absolute path names
> # file:
>
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
> hdd-images/20819
> trusted.afr.storage0-client-0=0sAgIA
> trusted.afr.storage0-client-1=0s
> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
>
> 0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls
-al
> {}
> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41
>
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> 0 root@pserver12:~ # getfattr -dm -
>
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> getfattr: Removing leading '/' from absolute path names
> # file:
>
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
> hdd-images/20819
> trusted.afr.storage0-client-6=0s
> trusted.afr.storage0-client-7=0s
> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
> trusted.glusterfs.dht.linkto="storage0-replicate-0
>
> 0 root@de-dc1-c1-pserver13:~ # find /mnt/gluster/brick?/ -name 20819 |
xargs
> -i ls -al {}
> -rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:39
>
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> 0 root@de-dc1-c1-pserver13:~ # getfattr -dm -
>
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
> /hdd-images/20819
> getfattr: Removing leading '/' from absolute path names
> # file:
>
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
> hdd-images/20819
> trusted.afr.storage0-client-6=0s
> trusted.afr.storage0-client-7=0s
> trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
> trus

Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-18 Thread Martin Schenker
Here is another occurrence:

The file 20819 is shown twice, with different timestamps and attributes: 0
file size on pserver3, outdated on pserver5; only 12 & 13 seem to be in sync.
So what's going on?


0 root@de-dc1-c1-pserver13:~ # ls -al
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/2081*
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/20819
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:44
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/20819

0 root@de-dc1-c1-pserver3:~ # find /mnt/gluster/brick?/ -name 20819 | xargs
-i ls -al {}
-rwxrwx--- 1 libvirt-qemu vcb 0 May 14 17:00
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@de-dc1-c1-pserver3:~ # getfattr -dm -
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==

0 root@pserver5:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al
{}
-rwxrwx--- 1 libvirt-qemu vcb 53687091200 May 14 17:00
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@pserver5:~ # getfattr -dm -
/mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick0/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.afr.storage0-client-0=0sAgIA
trusted.afr.storage0-client-1=0s
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==

0 root@pserver12:~ # find /mnt/gluster/brick?/ -name 20819 | xargs -i ls -al
{}
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:41
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@pserver12:~ # getfattr -dm -
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.afr.storage0-client-6=0s
trusted.afr.storage0-client-7=0s
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
trusted.glusterfs.dht.linkto="storage0-replicate-0

0 root@de-dc1-c1-pserver13:~ # find /mnt/gluster/brick?/ -name 20819 | xargs
-i ls -al {}
-rwxrwx--- 1 libvirt-qemu kvm 53687091200 May 18 08:39
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
0 root@de-dc1-c1-pserver13:~ # getfattr -dm -
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20819
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/20819
trusted.afr.storage0-client-6=0s
trusted.afr.storage0-client-7=0s
trusted.gfid=0sa5/rvjUUQ3ibSf32O3izOw==
trusted.glusterfs.dht.linkto="storage0-replicate-0

Only entrance in log file on pserver5, no references in the other three
logs/servers:

0 root@pserver5:~ # grep 20819
/var/log/glusterfs/opt-profitbricks-storage.log
[2011-05-17 20:37:30.52535] I [client-handshake.c:407:client3_1_reopen_cbk]
0-storage0-client-7: reopen on
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819 succeeded
(remote-fd = 6)
[2011-05-17 20:37:34.824934] I [afr-open.c:435:afr_openfd_sh]
0-storage0-replicate-3:  data self-heal triggered. path:
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819, reason:
Replicate up down flush, data lock is held
[2011-05-17 20:37:34.825557] E
[afr-self-heal-common.c:1214:sh_missing_entries_create]
0-storage0-replicate-3: no missing files -
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819.
proceeding to metadata check
[2011-05-17 21:08:59.241203] I
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
0-storage0-replicate-3: diff self-heal on
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819: 6 blocks
of 409600 were different (0.00%)
[2011-05-17 21:08:59.275873] I
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
0-storage0-replicate-3: background  data self-heal completed on
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/20819


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-17 Thread Martin Schenker
This is an inherited system; I assume it was set up by hand. I guess I can
switch off these options, but the glusterd service will have to be
restarted, right?

I'm also getting current error messages like these on the peer pair 3&5:

Pserver3
[2011-05-17 10:06:28.540355] E [rpc-clnt.c:199:call_bail]
0-storage0-client-2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30))
xid = 0x805809x sent = 2011-05-17 09:36:18.393519. timeout = 1800

Pserver5
[2011-05-17 10:02:23.738887] E [dht-common.c:1873:dht_getxattr]
0-storage0-dht: layout is NULL
[2011-05-17 10:02:23.738909] W [fuse-bridge.c:2499:fuse_xattr_cbk]
0-glusterfs-fuse: 489090: GETXATTR()
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da
6ef/hdd-images/21351 => -1 (No such file or directory)
[2011-05-17 10:02:23.738954] W [fuse-bridge.c:660:fuse_setattr_cbk]
0-glusterfs-fuse: 489091: SETATTR()
/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da
6ef/hdd-images/21351 => -1 (Invalid argument)

Best, Martin

-Original Message-
From: gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Joe Landman
Sent: Tuesday, May 17, 2011 1:54 PM
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] Client and server file "view", different
results?! Client can't see the right file.

On 05/17/2011 01:43 AM, Martin Schenker wrote:
> Yes, it is!
>
> Here's the volfile:
>
> cat  /mnt/gluster/brick0/config/vols/storage0/storage0-fuse.vol:
>
> volume storage0-client-0
>  type protocol/client
>  option remote-host de-dc1-c1-pserver3
>  option remote-subvolume /mnt/gluster/brick0/storage
>  option transport-type rdma
>  option ping-timeout 5
> end-volume

Hmmm ... did you create these by hand or using the CLI?

I noticed quick-read and stat-cache on.  We recommend turning both of 
them off.  We experienced many issues with them on (from gluster 3.x.y)
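If the volume is managed through the gluster CLI, both translators could
presumably be switched off on the fly with volume set, using the stock option
names (a sketch with the volume name from this thread; no glusterd restart
should be needed, and clients fetch the regenerated volfile on their own):

gluster volume set storage0 performance.quick-read off
gluster volume set storage0 performance.stat-prefetch off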

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-17 Thread Martin Schenker
It's version 3.1.3 (we tried 3.2.0 for about 10h and rolled back).

Unfortunately, the file view was already "repaired" by brutally copying the
files by hand from the correct /mnt (server) mountpoint to the /opt (client)
mount, which fixed the situation for now. We needed the files accessible
ASAP.

Best, Martin

> -Original Message-
> From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] 
> Sent: Tuesday, May 17, 2011 10:41 AM
> To: Martin Schenker
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Client and server file "view", 
> different results?! Client can't see the right file.
> 
> 
> hi Martin,
> Could you please gather the following outputs so that we 
> can debug as to what is happening:
> 1) whats the version of the gluster.
> 2) backend "ls -l" of files in question on all bricks that 
> file is replicated on.
> 3) 'ls -l" o/p from mnt point for that file.
> 
> Thanks
> Pranith
> - Original Message -
> From: "Martin Schenker" 
> To: "Pranith Kumar. Karampuri" 
> Cc: gluster-users@gluster.org
> Sent: Tuesday, May 17, 2011 11:13:32 AM
> Subject: RE: [Gluster-users] Client and server file "view",   
> different results?! Client can't see the right file.
> 
> Yes, it is!
> 
> Here's the volfile:
> 
> cat  /mnt/gluster/brick0/config/vols/storage0/storage0-fuse.vol:
> 
> volume storage0-client-0
> type protocol/client
> option remote-host de-dc1-c1-pserver3
> option remote-subvolume /mnt/gluster/brick0/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-client-1
> type protocol/client
> option remote-host de-dc1-c1-pserver5
> option remote-subvolume /mnt/gluster/brick0/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-client-2
> type protocol/client
> option remote-host de-dc1-c1-pserver3
> option remote-subvolume /mnt/gluster/brick1/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-client-3
> type protocol/client
> option remote-host de-dc1-c1-pserver5
> option remote-subvolume /mnt/gluster/brick1/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-client-4
> type protocol/client
> option remote-host de-dc1-c1-pserver12
> option remote-subvolume /mnt/gluster/brick0/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-client-5
> type protocol/client
> option remote-host de-dc1-c1-pserver13
> option remote-subvolume /mnt/gluster/brick0/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-client-6
> type protocol/client
> option remote-host de-dc1-c1-pserver12
> option remote-subvolume /mnt/gluster/brick1/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-client-7
> type protocol/client
> option remote-host de-dc1-c1-pserver13
> option remote-subvolume /mnt/gluster/brick1/storage
> option transport-type rdma
> option ping-timeout 5
> end-volume
> 
> volume storage0-replicate-0
> type cluster/replicate
> subvolumes storage0-client-0 storage0-client-1
> end-volume
> 
> volume storage0-replicate-1
> type cluster/replicate
> subvolumes storage0-client-2 storage0-client-3
> end-volume
> 
> volume storage0-replicate-2
> type cluster/replicate
> subvolumes storage0-client-4 storage0-client-5
> end-volume
> 
> volume storage0-replicate-3
> type cluster/replicate
> subvolumes storage0-client-6 storage0-client-7
> end-volume
> 
> volume storage0-dht
> type cluster/distribute
> subvolumes storage0-replicate-0 storage0-replicate-1 
> storage0-replicate-2 storage0-replicate-3 end-volume
> 
> volume storage0-write-behind
> type performance/write-behind
> subvolumes storage0-dht
> end-volume
> 
> volume storage0-read-ahead
> type performance/read-ahead
> subvolumes storage0-write-behind
> end-volume
> 
> volume storage0-io-cache
> type performance/io-cache
> option cache-size 4096MB
> subvolumes storage0-read-ahead
> end-volume
> 
> volume storage0-quick-read
> type performance/quick-read
> option cache-size 4096MB
> subvolumes storage0-io-cache
> end-volume
> 
> volume storage0-stat-prefetch
> type performance/stat-prefetch
> sub

Re: [Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-16 Thread Martin Schenker
Yes, it is!

Here's the volfile:

cat  /mnt/gluster/brick0/config/vols/storage0/storage0-fuse.vol:

volume storage0-client-0
type protocol/client
option remote-host de-dc1-c1-pserver3
option remote-subvolume /mnt/gluster/brick0/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-1
type protocol/client
option remote-host de-dc1-c1-pserver5
option remote-subvolume /mnt/gluster/brick0/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-2
type protocol/client
option remote-host de-dc1-c1-pserver3
option remote-subvolume /mnt/gluster/brick1/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-3
type protocol/client
option remote-host de-dc1-c1-pserver5
option remote-subvolume /mnt/gluster/brick1/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-4
type protocol/client
option remote-host de-dc1-c1-pserver12
option remote-subvolume /mnt/gluster/brick0/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-5
type protocol/client
option remote-host de-dc1-c1-pserver13
option remote-subvolume /mnt/gluster/brick0/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-6
type protocol/client
option remote-host de-dc1-c1-pserver12
option remote-subvolume /mnt/gluster/brick1/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-7
type protocol/client
option remote-host de-dc1-c1-pserver13
option remote-subvolume /mnt/gluster/brick1/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-replicate-0
type cluster/replicate
subvolumes storage0-client-0 storage0-client-1
end-volume

volume storage0-replicate-1
type cluster/replicate
subvolumes storage0-client-2 storage0-client-3
end-volume

volume storage0-replicate-2
type cluster/replicate
subvolumes storage0-client-4 storage0-client-5
end-volume

volume storage0-replicate-3
type cluster/replicate
subvolumes storage0-client-6 storage0-client-7
end-volume

volume storage0-dht
type cluster/distribute
subvolumes storage0-replicate-0 storage0-replicate-1
storage0-replicate-2 storage0-replicate-3
end-volume

volume storage0-write-behind
type performance/write-behind
subvolumes storage0-dht
end-volume

volume storage0-read-ahead
type performance/read-ahead
subvolumes storage0-write-behind
end-volume

volume storage0-io-cache
type performance/io-cache
option cache-size 4096MB
subvolumes storage0-read-ahead
end-volume

volume storage0-quick-read
type performance/quick-read
option cache-size 4096MB
subvolumes storage0-io-cache
end-volume

volume storage0-stat-prefetch
type performance/stat-prefetch
subvolumes storage0-quick-read
end-volume

volume storage0
type debug/io-stats
subvolumes storage0-stat-prefetch
end-volume


> -Original Message-
> From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] 
> Sent: Tuesday, May 17, 2011 7:16 AM
> To: Martin Schenker
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Client and server file "view", 
> different results?! Client can't see the right file.
> 
> 
> Martin,
>   Is this a distributed-replicate setup?. Could you 
> attach the vol-file of the client.
> 
> Pranith
> - Original Message -
> From: "Martin Schenker" 
> To: gluster-users@gluster.org
> Sent: Monday, May 16, 2011 2:49:29 PM
> Subject: [Gluster-users] Client and server file "view",   
> different results?! Client can't see the right file.
> 
> 
> Client and server file "view", different results?! Client 
> can't see the right file. 
> 
> Hi all! 
> 
> Here we have another mismatch between the client "view" and 
> the server mounts: 
> 
> From the server side everything seems well, the 20G file is 
> visible and the attributes seem to match: 
> 
> 0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." 
> /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8
> f-8542864da6ef/hdd-images/ 
> 
> # file: 
> mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f
> -8542864da6ef/hdd-images//20964 
> trusted.afr.storage0-client-2=0x 
> trusted.afr.storage0-client-3=0x 
> 
> 0 root@pserver5:~ # find /mnt/gluster/ -name 20964 | xargs -i 
> ls -al {} 
> -rwxrwx--- 1 libvirt-qemu vcb 21474836480 May 13 11:21 
> /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8
> f-8542864da6ef/hdd-images/20964 
&g

[Gluster-users] Client and server file "view", different results?! Client can't see the right file.

2011-05-16 Thread Martin Schenker
Hi all!

Here we have another mismatch between the client "view" and the server
mounts:

From the server side everything seems fine; the 20G file is visible and the
attributes seem to match:

0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr."
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/

# file:
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images//20964
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x

0 root@pserver5:~ # find /mnt/gluster/ -name 20964 | xargs -i ls -al
{}
-rwxrwx--- 1 libvirt-qemu vcb 21474836480 May 13 11:21
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/20964

But the client view shows TWO files, each with 0-byte size! And these aren't
link files created by Gluster… (the ones with the T at the end)

0 root@pserver5:~ # find /opt/profitbricks/storage/ -name 20964 |
xargs -i ls -al {}
-rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/20964
-rwxrwx--- 1 libvirt-qemu kvm 0 May 13 11:24
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/20964

I'm a bit stumped that we seem to have so many weird errors cropping up. Any
ideas? I've checked the ext4 filesystem on all boxes, no real problems. We
run a distributed cluster with 4 servers offering 2 bricks each.

Best, Martin




> -Original Message-
> From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
> Sent: Monday, May 16, 2011 2:24 AM
> To: Martin Schenker
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Brick pair file mismatch, 
> self-heal problems?
> 
> 
> Try this to trigger self heal:
> 
> find  -noleaf -print0 -name | xargs 
> --null stat >/dev/null
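The placeholders in that find command appear to have been stripped by the
archive; filled in with the client mount point and file name from this thread,
purely as an example, it would read roughly:

find /opt/profitbricks/storage -noleaf -print0 -name '20964' | xargs --null stat >/dev/null

Stat'ing the file through the FUSE mount makes the replicate translator compare
the copies and kick off self-heal where they differ.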
> 
> 
> 
> On Sun, May 15, 2011 at 11:20 AM, Martin Schenker 
>  wrote:
> > Can someone enlighten me what's going on here? We have two peers, 
> > the file 21313 is shown through the client mountpoint as 
> "1Jan1970", 
> > attribs on server pserver3 don't match but NO self-heal or 
> repair can 
> > be triggered through "ls -alR"?!?
> >
> > Checking the files through the server mounts show that two versions 
> > are on the system. But the wrong one (as with the 
> "1Jan1970") seems to 
> > be the preferred one by the client?!?
> >
> > Do I need to use setattr or what in order to get the client 
> to see the 
> > RIGHT version?!? This is not the ONLY file displaying this 
> problematic 
> > behaviour!
> >
> > Thanks for any feedback.
> >
> > Martin
> >
> > pserver5:
> >
> > 0 root@pserver5:~ # ls -al 
> > 
> /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-854286
> > 4da6ef
> > /hdd-images
> >
> > -rwxrwx--- 1 libvirt-qemu vcb  483183820800 May 13 13:41 21313
> >
> > 0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." 
> > 
> /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-854286
> > 4da6ef
> > /hdd-images/21313
> > getfattr: Removing leading '/' from absolute path names
> > # file:
> > 
> mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f
> -8542864da6ef/
> > hdd-images/21313
> > trusted.afr.storage0-client-2=0x
> > trusted.afr.storage0-client-3=0x
> >
> > 0 root@pserver5:~ # ls -alR 
> > 
> /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864d
> > a6ef/h
> > dd-images/21313
> > -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan  1  1970
> > 
> /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-
> 8542864da6ef/h
> > dd-images/21313
> >
> > pserver3:
> >
> > 0 root@pserver3:~ # ls -al 
> > 
> /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-854286
> > 4da6ef
> > /hdd-images
> >
> > -rwxrwx--- 1 libvirt-qemu kvm  483183820800 Jan  1  1970 21313
> >
> > 0 root@pserver3:~ # ls -alR 
> > 
> /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864d
> > a6ef/h
> > dd-images/21313
> > -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan  1  1970
> > 
> /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-
> 8542864da6ef/h
> > dd-images/21313
> >
> > 0 root@pserver3:~ # getfattr -R -d -e hex -m "trusted.afr."
> > /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a

[Gluster-users] Brick pair file mismatch, self-heal problems?

2011-05-15 Thread Martin Schenker
Can someone enlighten me as to what's going on here? We have two peers; the file
21313 is shown through the client mountpoint as "1Jan1970", the attribs on
server pserver3 don't match, but NO self-heal or repair can be triggered
through "ls -alR"?!?

Checking the files through the server mounts shows that two versions are on
the system. But the wrong one (the "1Jan1970" one) seems to be the one
preferred by the client?!?

Do I need to use setattr or what in order to get the client to see the RIGHT
version?!? This is not the ONLY file displaying this problematic behaviour!

Thanks for any feedback.

Martin

pserver5:

0 root@pserver5:~ # ls -al
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images

-rwxrwx--- 1 libvirt-qemu vcb  483183820800 May 13 13:41 21313

0 root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr."
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images/21313
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/21313
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x

0 root@pserver5:~ # ls -alR
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/21313
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan  1  1970
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/21313

pserver3:

0 root@pserver3:~ # ls -al
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef
/hdd-images
 
-rwxrwx--- 1 libvirt-qemu kvm  483183820800 Jan  1  1970 21313

0 root@pserver3:~ # ls -alR
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/21313
-rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan  1  1970
/opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/h
dd-images/21313

0 root@pserver3:~ # getfattr -R -d -e hex -m "trusted.afr."
/mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-
ad8f-8542864da6ef/hdd-images/21313
getfattr: Removing leading '/' from absolute path names
# file:
mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/
hdd-images/21313
trusted.afr.storage0-client-2=0x
trusted.afr.storage0-client-3=0x0b090900  <- mismatch,
should be targeted for self-heal/repair? Why is there a difference in the
views?
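As background on reading these attributes (general AFR behaviour as far as I
understand it, not a statement about this particular output): each
trusted.afr.<volume>-client-N value is a 12-byte changelog of three big-endian
32-bit counters - pending data, metadata and entry operations recorded against
subvolume N. All zeros means nothing is pending; a non-zero counter marks the
other copy as stale and makes this one the self-heal source. For example, a
hypothetical value of

trusted.afr.storage0-client-3=0x000000020000000100000000

would mean 2 pending data operations and 1 pending metadata operation against
client-3.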


From the volfile:

volume storage0-client-2
type protocol/client
option remote-host de-dc1-c1-pserver3
option remote-subvolume /mnt/gluster/brick1/storage
option transport-type rdma
option ping-timeout 5
end-volume

volume storage0-client-3
type protocol/client
option remote-host de-dc1-c1-pserver5
option remote-subvolume /mnt/gluster/brick1/storage
option transport-type rdma
option ping-timeout 5
end-volume



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] How to debug a hanging client?

2011-05-13 Thread Martin Schenker
Error messages on pserver12 (opt*.log)

[2011-05-13 11:41:58.812937] E
[client-handshake.c:116:rpc_client_ping_timer_expired] 0-storage0-client-0:
Server 10.6.0.108:24009 has not responded in the last 5 seconds,
disconnecting.
[2011-05-13 12:11:57.954369] E [rpc-clnt.c:199:call_bail]
0-storage0-client-0: bailing out frame type(GlusterFS Handshake) op(PING(3))
xid = 0x210x sent = 2011-05-13 11:41:53.422855. timeout = 1800
[2011-05-13 12:11:57.954415] E [rpc-clnt.c:199:call_bail]
0-storage0-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27))
xid = 0x209x sent = 2011-05-13 11:41:53.422846. timeout = 1800

Errors on pserver8 (the peer):

[2011-05-13 14:51:26.727334] E
[rdma.c:3423:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send
work request
on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129, post->buf =
0x43fa000, wc.byte_len = 0, post->reused = 8
9791
[2011-05-13 14:51:26.727374] E
[rdma.c:3431:rdma_handle_failed_send_completion] 0-rdma: connection between
client and se
rver not working. check by running 'ibv_srq_pingpong'. also make sure subnet
manager is running (eg: 'opensm'), or check
 if rdma port is valid (or active) by running 'ibv_devinfo'. contact Gluster
Support Team if the problem persists.
[2011-05-13 14:51:26.727617] E [rpc-clnt.c:340:saved_frames_unwind]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x77) [0x
7f397dd0ba07] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)
[0x7f397dd0b19e] (-->/usr/lib/libgfrpc.so.0(s
aved_frames_destroy+0xe) [0x7f397dd0b0fe]))) 0-rpc-clnt: forced unwinding
frame type(GF-DUMP) op(DUMP(1)) called at 2011
-05-13 14:51:22.620059
[2011-05-13 14:51:26.727670] M
[client-handshake.c:1178:client_dump_version_cbk] 0-: some error, retry
again later
[2011-05-13 14:51:26.727686] I [client.c:1601:client_rpc_notify]
0-storage0-client-1: disconnected

Could this be a bad IB card? After a reboot of pserver12 the system works
again; an attempt to shut down and restart just the ib0 interface failed
(hung).
Best, Martin

-Original Message-
From: Martin Schenker [mailto:martin.schen...@profitbricks.com] 
Sent: Friday, May 13, 2011 3:36 PM
To: 'gluster-users@gluster.org'
Subject: How to debug a hanging client?

Hi all!

We have one server/client where the client part hangs quite often.

Strace shows:
0 root@de-blnstage-c2-pserver12:~ # strace -Tfv -p 12407 (
Process 12407 attached with 6 threads - interrupt to quit
[pid 12417] futex(0x2cb98a8, FUTEX_WAIT_PRIVATE, 2, NULL 
[pid 12412] read(12,  
[pid 12411] read(11,  
[pid 12410] futex(0x2cb9330, FUTEX_WAIT_PRIVATE, 2, NULL 
[pid 12408] rt_sigtimedwait([HUP INT TRAP BUS USR1 USR2 PIPE ALRM TERM CHLD
TTOU], NULL, NULL, 8

I can read from the server mountpoint just fine but any access to the fuse
mounted glusterfs hangs and can only be killed.

Any idea how to resolve this? If I try to kill all glusterfs processes, the
kill -9 on the process

root 12407 1  0 May11 ?00:00:01 /usr/sbin/glusterfs
--log-level=NORMAL --volfile-id=storage0 --volfile-server=localhost
/opt/profitbricks/storage

will hang as well. Just like an NFS server hang... waiting for I/O

Thanks, Martin

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] How to debug a hanging client?

2011-05-13 Thread Martin Schenker
Hi all!

We have one server/client where the client part hangs quite often.

Strace shows:
0 root@de-blnstage-c2-pserver12:~ # strace -Tfv -p 12407 (
Process 12407 attached with 6 threads - interrupt to quit
[pid 12417] futex(0x2cb98a8, FUTEX_WAIT_PRIVATE, 2, NULL 
[pid 12412] read(12,  
[pid 12411] read(11,  
[pid 12410] futex(0x2cb9330, FUTEX_WAIT_PRIVATE, 2, NULL 
[pid 12408] rt_sigtimedwait([HUP INT TRAP BUS USR1 USR2 PIPE ALRM TERM CHLD
TTOU], NULL, NULL, 8

I can read from the server mountpoint just fine but any access to the fuse
mounted glusterfs hangs and can only be killed.

Any idea how to resolve this? If I try to kill all glusterfs processes, the
kill -9 on the process

root 12407 1  0 May11 ?00:00:01 /usr/sbin/glusterfs
--log-level=NORMAL --volfile-id=storage0 --volfile-server=localhost
/opt/profitbricks/storage

will hang as well. Just like an NFS server hang... waiting for I/O
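One thing that sometimes helps with a wedged FUSE mount (a general suggestion,
not specific to Gluster) is a lazy unmount, which detaches the mountpoint even
while I/O is stuck, so the client process can finally exit and be restarted:

umount -l /opt/profitbricks/storage
# or, via the fuse helper:
fusermount -u -z /opt/profitbricks/storage

Remounting afterwards starts a fresh glusterfs client process.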

Thanks, Martin

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Fuse mounting problems.

2011-05-11 Thread Martin Schenker
Looks like we had an install conflict; in the meantime I dropped the
complete install and rebuilt to 3.1.3. Now the system is back up and
running...

Best, Martin

> -Original Message-
> From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
> Sent: Wednesday, May 11, 2011 6:27 PM
> To: Martin Schenker
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Fuse mounting problems.
> 
> 
> How are you mounting the gluster FS? Can you paste volume 
> info and your mount commands?
> 
> On Wed, May 11, 2011 at 4:35 AM, Martin Schenker 
>  wrote:
> > Looks like we have some problems after a "running upgrade" 
> from 3.1.3 
> > to 3.2.0
> >
> > Most pressing is to fix the current error state on 
> pserver12 after the 
> > roll-back, the FUSE mount doesn't work:
> >
> > Error messages in pserver12 opt log:
> >
> > [2011-05-11 09:45:03.1147] E 
> [glusterfsd-mgmt.c:628:mgmt_getspec_cbk]
> > 0-glusterfs: failed to get the 'volume file' from server 
> [2011-05-11 
> > 09:45:03.1222] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
> > 0-mgmt: failed to fetch volume file (key:storage0) [2011-05-11 
> > 09:45:03.1421] W [glusterfsd.c:700:cleanup_and_exit]
> > (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xcd) [0x7f98a5032c0d]
> > (-->/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4) 
> > [0x7f98a50329c4]
> > (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x2fb) 
> [0x4080eb]))) 0-: received
> > signum (0), shutting down
> > [2011-05-11 09:45:03.1452] I [fuse-bridge.c:3688:fini] 
> 0-fuse: Unmounting
> > '/opt/profitbricks/storage'.
> >
> >
> > Vol files are located under 
> /mnt/gluster/brick0/config/vols/storage0/ 
> > which is mounted and readable.
> >
> > I can ping both servers with the names given in the FUSE 
> vol file, so 
> > they are visible.
> >
> > Any ideas appreciated!
> >
> > Best, Martin
> >
> >
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org 
> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> >
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Fuse mounting problems.

2011-05-11 Thread Martin Schenker
Looks like we have some problems after a "running upgrade" from 3.1.3 to
3.2.0

Most pressing is to fix the current error state on pserver12 after the
roll-back; the FUSE mount doesn't work:

Error messages in pserver12 opt log:

[2011-05-11 09:45:03.1147] E [glusterfsd-mgmt.c:628:mgmt_getspec_cbk]
0-glusterfs: failed to get the 'volume file' from server
[2011-05-11 09:45:03.1222] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
0-mgmt: failed to fetch volume file (key:storage0)
[2011-05-11 09:45:03.1421] W [glusterfsd.c:700:cleanup_and_exit]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xcd) [0x7f98a5032c0d]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4) [0x7f98a50329c4]
(-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x2fb) [0x4080eb]))) 0-: received
signum (0), shutting down
[2011-05-11 09:45:03.1452] I [fuse-bridge.c:3688:fini] 0-fuse: Unmounting
'/opt/profitbricks/storage'.


Vol files are located under /mnt/gluster/brick0/config/vols/storage0/ which
is mounted and readable.

I can ping both servers with the names given in the FUSE vol file, so they
are visible.
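A few things that could be checked for the 'failed to get the volume file'
error, assuming the client mounts from localhost as elsewhere in this thread
(a sketch, not a definitive procedure):

ps aux | grep glusterd                                              # is the management daemon up?
gluster volume info storage0                                        # does glusterd still know the volume?
mount -t glusterfs localhost:/storage0 /opt/profitbricks/storage    # manual mount attempt

If 'gluster volume info' no longer lists storage0 on that box, glusterd has
lost its volume definition and the client has nothing to fetch.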

Any ideas appreciated!

Best, Martin



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] How do I load the I/O stats translator for theprofiling options?

2011-05-09 Thread Martin Schenker
Hi Pranith!

Log says "RPC Program procedure not available", nothing else...

0 root@de-blnstage-c1-pserver4:~ # tail  /var/log/glusterfs/etc*
[2011-05-09 18:08:07.48398] I
[glusterd-handler.c:729:glusterd_handle_cli_list_friends] 0-glusterd:
Received cli list req
[2011-05-09 18:40:21.382115] E [rpcsvc.c:707:rpcsvc_program_actor]
0-rpc-service: RPC Program procedure not available
[2011-05-09 18:41:40.77542] E [rpcsvc.c:707:rpcsvc_program_actor]
0-rpc-service: RPC Program procedure not available
[2011-05-09 18:46:52.231895] E [rpcsvc.c:707:rpcsvc_program_actor]
0-rpc-service: RPC Program procedure not available
[2011-05-09 18:47:27.543788] I
[glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received
get vol req
[2011-05-09 18:47:27.50] I
[glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received
get vol req
[2011-05-09 18:47:40.115263] E [rpcsvc.c:707:rpcsvc_program_actor]
0-rpc-service: RPC Program procedure not available
[2011-05-09 18:47:42.707978] I
[glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received
get vol req
[2011-05-09 18:47:42.708565] I
[glusterd-handler.c:775:glusterd_handle_cli_get_volume] 0-glusterd: Received
get vol req
[2011-05-10 05:49:48.784387] E [rpcsvc.c:707:rpcsvc_program_actor]
0-rpc-service: RPC Program procedure not available
[2011-05-10 05:50:30.147291] E [rpcsvc.c:707:rpcsvc_program_actor]
0-rpc-service: RPC Program procedure not available

1 root@de-blnstage-c1-pserver4:~ # tail /var/log/glusterfs/opt*
[2011-05-09 16:14:04.934058] I [afr-open.c:435:afr_openfd_sh]
0-storage0-replicate-0:  data self-heal triggered. path:
/images/2828/4473b07f-d879-2398-4745-0e85ec246342/hdd-images/64190, reason:
Replicate up down flush, data lock is held
[2011-05-09 16:14:04.934578] E
[afr-self-heal-common.c:1214:sh_missing_entries_create]
0-storage0-replicate-0: no missing files -
/images/2828/4473b07f-d879-2398-4745-0e85ec246342/hdd-images/64190.
proceeding to metadata check
[2011-05-09 16:14:04.935980] E [afr-common.c:110:afr_set_split_brain]
0-storage0-replicate-0: invalid argument: inode
[2011-05-09 16:14:04.936025] I
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
0-storage0-replicate-0: background  data self-heal completed on
/images/2828/4473b07f-d879-2398-4745-0e85ec246342/hdd-images/64190

Martin

> -Original Message-
> From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] 
> Sent: Tuesday, May 10, 2011 8:20 AM
> To: Martin Schenker
> Cc: Gluster General Discussion List
> Subject: Re: [Gluster-users] How do I load the I/O stats 
> translator for theprofiling options?
> 
> 
> Could you please post the logs from the glusterd, it is 
> generally located at: 
> /usr/local/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.
> vol.log, better zip it and send it. I will take a look and 
> let you know what the problem is.
> 
> Pranith.
> - Original Message -
> From: "Martin Schenker" 
> To: "Pranith Kumar. Karampuri" 
> Cc: "Gluster General Discussion List" 
> Sent: Tuesday, May 10, 2011 11:23:03 AM
> Subject: RE: [Gluster-users] How do I load the I/O stats 
> translator for theprofiling options?
> 
> That's what I did after the upgrade to 3.2.0
> 
> No feedback from the system that the stats are recorded...
> 
> "gluster volume profile storage0 start" didn't respond with 
> ANYTHING. And no new flags show up in the vol files...
> 
> 
> Best, Martin
> 
> 
> > -Original Message-
> > From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com]
> > Sent: Tuesday, May 10, 2011 7:27 AM
> > To: Martin Schenker
> > Cc: Gluster General Discussion List
> > Subject: Re: [Gluster-users] How do I load the I/O stats 
> > translator for theprofiling options?
> > 
> > 
> > hi Martin,
> > IO-stats is loaded by default. Please use the profile
> > commands listed in the following document to 
> > start/stop/display profile output. 
> > http://www.gluster.com/community/documentation/index.php/Glust
> > er_3.2:_Running_GlusterFS_Volume_Profile_Command
> > 
> > Pranith
> > - Original Message -
> > From: "Martin Schenker" 
> > To: "Gluster General Discussion List" 
> > Sent: Tuesday, May 10, 2011 12:21:49 AM
> > Subject: [Gluster-users] How do I load the I/O stats 
> > translator for the  profiling options?
> > 
> > Just upgraded to 3.2.0 and would like to check the I/O stats.
> > But how do I "load the I/O stats translator" properly as 
> > described in the 3.2 manual? 
> > 
> > The manual is a bit vague...
> > 
> > Thanks, Martin
> > 
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> > 
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] How do I load the I/O stats translator for theprofiling options?

2011-05-09 Thread Martin Schenker
That's what I did after the upgrade to 3.2.0

No feedback from the system that the stats are recorded...

"gluster volume profile storage0 start" didn't respond with ANYTHING. And no
new flags show up in the vol files...


Best, Martin


> -Original Message-
> From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com] 
> Sent: Tuesday, May 10, 2011 7:27 AM
> To: Martin Schenker
> Cc: Gluster General Discussion List
> Subject: Re: [Gluster-users] How do I load the I/O stats 
> translator for theprofiling options?
> 
> 
> hi Martin,
> IO-stats is loaded by default. Please use the profile 
> commands listed in the following document to 
> start/stop/display profile output. 
> http://www.gluster.com/community/documentation/index.php/Glust
> er_3.2:_Running_GlusterFS_Volume_Profile_Command
> 
> Pranith
> - Original Message -
> From: "Martin Schenker" 
> To: "Gluster General Discussion List" 
> Sent: Tuesday, May 10, 2011 12:21:49 AM
> Subject: [Gluster-users] How do I load the I/O stats 
> translator for theprofiling options?
> 
> Just upgraded to 3.2.0 and would like to check the I/O stats. 
> But how do I "load the I/O stats translator" properly as 
> described in the 3.2 manual? 
> 
> The manual is a bit vague...
> 
> Thanks, Martin
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] How do I load the I/O stats translator for the profiling options?

2011-05-09 Thread Martin Schenker
Just upgraded to 3.2.0 and would like to check the I/O stats. But how do I
"load the I/O stats translator" properly as described in the 3.2 manual? 

The manual is a bit vague...

Thanks, Martin

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Best practice to stop the Gluster CLIENT process?

2011-05-06 Thread Martin Schenker
Thanks Joe!

...thought so! Thanks for the confirmation. Time to redesign... system was
set up before my time. Oh, well...

Perhaps this should be added to the manual? Or is that already in there and
was overlooked?

Best, Martin

-Original Message-
From: Joe Landman [mailto:land...@scalableinformatics.com] 
Sent: Friday, May 06, 2011 10:22 PM
To: Martin Schenker
Subject: Re: [Gluster-users] Best practice to stop the Gluster CLIENT
process?

On 05/06/2011 04:13 PM, Martin Schenker wrote:
> Won't help if it's the logging process itself, I guess?!? Had that
> earlier...
>
> Is it recommended to log somewhere else than the Gluster file system?

Yes.  You should log to a file system on a device that the Gluster file 
system is not on.

>
> Best, Martin
>
> -Original Message-
> From: gluster-users-boun...@gluster.org
> [mailto:gluster-users-boun...@gluster.org] On Behalf Of Joe Landman
> Sent: Friday, May 06, 2011 9:36 PM
> To: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Best practice to stop the Gluster CLIENT
> process?
>
> On 05/06/2011 03:14 PM, Martin Schenker wrote:
>> So if I get this right, you'll have to rip the heart out (kill all
gluster
>> processes; server AND client) in order to get to the local server
>> filesystem.
>>
>> I had hoped that the client part could be left running (to the second
> mirror
>> brick) when doing repairs etc. Looks like a wrong assumption, I guess...
>
> or use fuser/lsof to determine which process is locking which volume.
> Kill only that process.
>
>


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
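
A minimal sketch of the fuser/lsof approach Joe suggests above; the mount
points are the ones that appear elsewhere on this list and are only
illustrative here, so adjust them to your layout:

fuser -vm /opt/profitbricks/storage   # list processes that still hold files open on the client mount
umount /opt/profitbricks/storage      # detach the FUSE client once nothing holds it any more
fuser -vm /mnt/gluster/brick0         # see what is still pinning the underlying brick filesystem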

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Best practice to stop the Gluster CLIENT process?

2011-05-06 Thread Martin Schenker
So if I get this right, you'll have to rip the heart out (kill all gluster
processes; server AND client) in order to get to the local server
filesystem.

I had hoped that the client part could be left running (to the second mirror
brick) when doing repairs etc. Looks like a wrong assumption, I guess...

Are client/server hybrids ONLY connected to the LOCAL server?

Best, Martin

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] [SPAM?] Do clients need to run glusterd?

2011-05-06 Thread Martin Schenker
That's exactly how we're running our systems. The boxes are both servers and
clients. 

So there's then no way to separate the client functionality from the server
part? No clean cut between glusterd and glusterfsd?

Best,  Martin
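
A minimal sketch, assuming the stock 3.1.x process names, of how to see which
role each daemon on a combined box is playing:

ps -C glusterd -o pid,args      # the management daemon started by the init script
ps -C glusterfsd -o pid,args    # the brick (server) processes, one per exported brick
ps -C glusterfs -o pid,args     # the FUSE client mounts (and, on 3.1.x, the built-in NFS server)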



-Original Message-
From: gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Anthony J. Biacco
Sent: Friday, May 06, 2011 8:53 PM
To: Burnash, James; gluster-users
Subject: Re: [Gluster-users] [SPAM?] Do clients need to run glusterd?

I never understood before why shutting down the glusterd service killed
the clients too. But given this, now I get it.
I'm guessing it would not be recommended or supported then to run a
gluster client on the same machine as a gluster server? Even if the
gluster client is connecting to a server other than the one on the local
machine.

-Tony
---
Manager, IT Operations
Format Dynamics, Inc.
P: 303-228-7327
F: 303-228-7305
abia...@formatdynamics.com
http://www.formatdynamics.com


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Best practice to stop the Gluster CLIENT process?

2011-05-06 Thread Martin Schenker
Thanks for all the responses!

That's what I did, umount the client dir. But this STILL left the filesystem
locked... no luck here.

I'll try James' script next.

Best, Martin

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Best practice to stop the Gluster CLIENT process?

2011-05-06 Thread Martin Schenker
Hi all!

What's the best way to stop the CLIENT process for Gluster?

We have dual systems, where the Gluster servers also act as clients, so
both, glusterd and glusterfsd are running on the system.

Stopping the server daemon works via "/etc/init.d/glusterd stop", but how is
the client stopped? 

I need to unmount the filesystem from the server in order to do a fsck on
the ext4 volume; we have the "needs_recovery" flag set. But the client is
hogging it as well due to the log files being located on the volume (might
be a good idea to log somewhere else...)

Any pointers welcome, I find it difficult to obtain "simple" instructions
like this from the Gluster pages.

Even google doesn't help, sigh. Or I'm too blind...

Best, Martin
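
For the record, a minimal sketch of the sequence this seems to require on such
a combined box; the device name and mount points below are illustrative
assumptions, not an official procedure:

umount /opt/profitbricks/storage    # detach the local FUSE client first
/etc/init.d/glusterd stop           # stop the management daemon
killall glusterfsd                  # brick processes can keep running after glusterd stops
umount /mnt/gluster/brick0          # now the brick filesystem can be released
fsck.ext4 -f /dev/sdb1              # replays the journal and clears the needs_recovery flag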

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Output of "getfattr" command, what does the code tell me?

2011-05-04 Thread Martin Schenker
Is there a way to make use of the attr code given by

0 root@de-dc1-c1-pserver13:~ # getfattr -R -d -e hex -m "trusted.afr."
/mnt/gluster/brick?/storage | grep -v 0x | grep -B1
-A1 trusted

# file: mnt/gluster/brick0/storage/pserver3-23
trusted.afr.storage0-client-4=0xd701

or

0 root@de-dc1-c1-pserver13:~ # getfattr -R -d -m "trusted.afr."
/mnt/gluster/brick?/storage | grep -v 0s | grep -B1 -A1
trusted

# file: mnt/gluster/brick0/storage/pserver3-23
trusted.afr.storage0-client-4=0s1wAAAQAA

Is there any useful information hidden in the attribute strings? I guess so
but I failed to find anything with Google etc.

Thanks, Martin
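
A minimal sketch of decoding such a value, assuming it uses the usual AFR
changelog layout of three big-endian 32-bit counters (data, metadata and entry
operations pending on the subvolume named in the attribute); the full
24-hex-digit value is needed, and the one below is made up for illustration:

VAL=000000d70000000100000000   # hypothetical trusted.afr value without the leading 0x
DATA=$((16#${VAL:0:8}))        # pending data operations
META=$((16#${VAL:8:8}))        # pending metadata operations
ENTRY=$((16#${VAL:16:8}))      # pending entry (directory) operations
echo "data=$DATA metadata=$META entry=$ENTRY"

Non-zero counters mean that copy still blames the named subvolume; all-zero on
both bricks is the "in sync" state.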


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Split brain; which file to choose for repair?

2011-05-04 Thread Martin Schenker
Thanks!

 

Unfortunately the md5sums don't match...  but the sizes & timestamps do. 

These binary files from VMs are quite difficult to work with. That's why I
was after some ideas WHICH file should be preferred.

 

I'm actually quite concerned about how on earth we manage to trigger "split
brain" conditions on a regular basis. 

 

Are there any good "DON'Ts" for server updates/maintenance? The current way
is shutting down one server, upgrading/installing it, bringing up the
Gluster daemons (server & client), mounting the file system and checking
everything is running. Then starting on the next server. 

So a pure sequential approach. I thought this would minimise the risk of
getting the files in a twist?!?

 

Best, Martin

 

From: Anand Avati [mailto:anand.av...@gmail.com] 
Sent: Wednesday, May 04, 2011 2:55 PM
To: Martin Schenker
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Split brain; which file to choose for repair?

 

A split brain situation occurs only under a specific sequence of events and
modifications, where the filesystem cannot decide which of the two copies of
the file is the updated one. It might so happen that the two changes were
actually the same "change", and hence the two copies of your file might match
on md5sum (in which case you can delete one arbitrarily). If not, you need to
know how your application works and inspect the content to decide which of
the files is more appropriate to delete.

 

Avati
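
A minimal sketch of what that repair usually looks like on 3.1.x, assuming you
have decided which copy to keep; the brick path is taken from this thread, the
client mount point is an assumption, and you should double-check both before
removing anything:

# on the server whose copy you do NOT trust, remove the stale replica directly on the brick
rm /mnt/gluster/brick0/storage/pserver3-19
# then look the file up through a client mount to trigger self-heal from the surviving copy
stat /opt/profitbricks/storage/pserver3-19
# finally verify the pending counters have returned to all-zero on both bricks
getfattr -d -e hex -m trusted.afr. /mnt/gluster/brick0/storage/pserver3-19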

On Wed, May 4, 2011 at 5:54 PM, Martin Schenker
 wrote:

Hi all!

Is there anybody who can give some pointers regarding which file to choose
in a "split brain" condition?

What tests do I need to run?

What does the hex AFR code actually show? Is there a way to pinpoint the
"better/worse" file for deletion?

On pserver12:


# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-5=0x3f01

On pserver13:


# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-4=0xd701

These are test files, but I'd like to know what to do in a LIVE situation
which will be just around the corner.

The timestamps show the same values, so I'm a bit puzzled HOW to choose a
file.

pserver12:

0 root@de-dc1-c1-pserver12:~ # ls -al
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40
/mnt/gluster/brick0/storage/pserver3-19

0 root@de-dc1-c1-pserver12:~ # ls -alu
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18
/mnt/gluster/brick0/storage/pserver3-19

pserver13:

0 root@de-dc1-c1-pserver13:~ # ls -al
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40
/mnt/gluster/brick0/storage/pserver3-19

0 root@de-dc1-c1-pserver13:~ # ls -alu
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18
/mnt/gluster/brick0/storage/pserver3-19

Best, Martin


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Split brain; which file to choose for repair?

2011-05-04 Thread Martin Schenker
Hi all! 

Is there anybody who can give some pointers regarding which file to choose
in a "split brain" condition? 

What tests do I need to run? 

What does the hex AFR code actually show? Is there a way to pinpoint the
"better/worse" file for deletion? 

On pserver12:

# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-5=0x3f01

On pserver13:

# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-4=0xd701

These are test files, but I'd like to know what to do in a LIVE situation
which will be just around the corner.

The timestamps show the same values, so I'm a bit puzzled HOW to choose a
file.

pserver12:

0 root@de-dc1-c1-pserver12:~ # ls -al
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40
/mnt/gluster/brick0/storage/pserver3-19

0 root@de-dc1-c1-pserver12:~ # ls -alu
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18
/mnt/gluster/brick0/storage/pserver3-19

pserver13:

0 root@de-dc1-c1-pserver13:~ # ls -al
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 29 17:40
/mnt/gluster/brick0/storage/pserver3-19

0 root@de-dc1-c1-pserver13:~ # ls -alu
/mnt/gluster/brick0/storage/pserver3-19
-rw-r--r-- 1 vcb root 3456106496 Apr 28 16:18
/mnt/gluster/brick0/storage/pserver3-19

Best, Martin

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Split brain; which file to choose for repair?

2011-05-03 Thread Martin Schenker
Hi all!

Another incident, now a real "split brain" situation:

On server pair 12 & 13, a set of files can't be repaired and keeps throwing errors. 

Is there a way to interpret the AFR code in order to select which files
should be chosen to be deleted/overwritten?!


No errors in opt-profitbricks-storage.log from pserver12; but
opt-profitbricks-storage.log from pserver13 says:

 [2011-05-03 18:14:29.343512] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-11.
[2011-05-03 18:14:29.344467] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-11'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.347376] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-16.
[2011-05-03 18:14:29.348157] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-16'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.349013] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-17.
[2011-05-03 18:14:29.349817] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-17'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.351252] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-19.
[2011-05-03 18:14:29.352043] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-19'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.353477] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-20.
[2011-05-03 18:14:29.354242] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-20'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.356343] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-23.
[2011-05-03 18:14:29.357198] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-23'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.358030] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-24.
[2011-05-03 18:14:29.358877] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-24'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.362652] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-3.
[2011-05-03 18:14:29.363431] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-3'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.364261] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-30.
[2011-05-03 18:14:29.365041] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-30'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.368924] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-36.
[2011-05-03 18:14:29.369682] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-36'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.371696] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-39.
[2011-05-03 18:14:29.372451] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-39'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.373939] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-5.
[2011-05-03 18:14:29.374705] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-5'
(possible split-brain). Please delete the file from all but the preferred
subvolume.

0 root@de-dc1-c1-pserver12:/var/log/glusterfs # getfattr -R -d -e hex -m
"trusted.afr." /mnt/gluster/brick?/storage | grep -v
0x | grep -B1 -A

Re: [Gluster-users] Server outage, file sync/self-heal doesn't sync ALL files?!

2011-05-01 Thread Martin Schenker
Hi all!

After the start of pserver12 I ran the getfattr command on all 4 systems in
order to check which files were out of sync. This came back with 63 files on
pserver12 and none on the others. After starting the gluster server and
client daemons on 12, the first batch was done automagically, as stated
before. But not all of them, as I would have expected.

Best, Martin

2011/4/29 Pranith Kumar. Karampuri 

> This means that there is no differences in gfids. Could you let me know how
> the self heal is done after the pserver12 was brought up?.
> How did you find out that the self-heal is needed for 63 files?.
>
> Pranith.
> - Original Message -----
> From: "Martin Schenker" 
> To: "Pranith Kumar. Karampuri" ,
> gluster-users@gluster.org
> Sent: Friday, April 29, 2011 11:05:55 PM
> Subject: Re: [Gluster-users] Server outage, file sync/self-heal doesn't
> sync ALL files?!
>
> Sorry, I had manually sync due to imminent server upgrades.
> 50 min. after the initial sync I was asked to bring the servers in a
> safe state for an upgrade and did a manual
> "touch-on-server13-client-mountpoint" which triggered an immediate
> self-heal on the rest of the files.
>
> All files were in sync across all four server after this action. Will
> run this command next time!!
>
> Best, Martin
>
> Am 29.04.2011 19:30, schrieb Pranith Kumar. Karampuri:
> > hi Martin,
> >Could you please send the output of -m "trusted*" instead of
> "trusted.afr" for the remaining 24 files from both the servers. I would like
> to see the gfids of these files on both the machines.
> >
> > Pranith.
> > - Original Message -
> > From: "Martin Schenker"
> > To: gluster-users@gluster.org
> > Sent: Friday, April 29, 2011 8:39:46 PM
> > Subject: [Gluster-users] Server outage,   file sync/self-heal doesn't
> sync ALL files?!
> >
> > Hi all!
> >
> > We have another incident over here.
> >
> > One of the servers (pserver12) in a pair (12&  13) has been rebooted.
> > pserver13 showed 63 files not in sync after the outage for 2h.
> >
> > Both server are clients as well.
> >
> > Starting pserver12 brought up the self-heal mechanism, but only 39 files
> > were triggered within the first 10 min. Now the system seems dormant and
> > 24 files are left hanging.
> >
> > On the other three servers no inconsistencies are seen.
> >
> > tail of client log file:
> >
> > 2011-04-29 14:48:23.820022] I
> > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> > 0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of
> > 22736 were different (8.62%)
> > [2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain]
> > 0-storage0-replicate-2: invalid argument: inode
> > [2011-04-29 14:48:23.887740] I
> > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> > 0-storage0-replicate-2: background  data self-heal completed on
> > /pserver13-17
> > [2011-04-29 14:48:24.272220] I
> > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> > 0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of
> > 22744 were different (8.62%)
> > [2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain]
> > 0-storage0-replicate-2: invalid argument: inode
> > [2011-04-29 14:48:24.341959] I
> > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> > 0-storage0-replicate-2: background  data self-heal completed on
> > /pserver13-19
> > [2011-04-29 14:48:24.758131] I
> > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> > 0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of
> > 22752 were different (8.58%)
> > [2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain]
> > 0-storage0-replicate-2: invalid argument: inode
> > [2011-04-29 14:48:24.766137] I
> > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> > 0-storage0-replicate-2: background  data self-heal completed on
> > /pserver13-23
> > [2011-04-29 14:48:24.884613] I
> > [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> > 0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of
> > 22760 were different (8.58%)
> > [2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain]
> > 0-storage0-replicate-2: invalid argument: inode
> > [2011-04-29 14:48:24.895721] I
> > [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> > 0-storage0-replicate-2: background  data self-heal completed on
> > /pserver13-10
> &g

Re: [Gluster-users] Server outage, file sync/self-heal doesn't sync ALL files?!

2011-04-29 Thread Martin Schenker

Sorry, I had to manually sync due to imminent server upgrades.
50 min after the initial sync I was asked to bring the servers into a 
safe state for an upgrade and did a manual 
"touch-on-server13-client-mountpoint" which triggered an immediate 
self-heal on the rest of the files.


All files were in sync across all four servers after this action. Will 
run this command next time!!


Best, Martin
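
A minimal sketch of that manual trigger; the client mount point is an
assumption, and the file name is just one of the entries from the getfattr
output below:

# a lookup through the client mount (touch, stat or a read) kicks off self-heal for that file
touch /opt/profitbricks/storage/de-dc1-c1-pserver5-33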

On 29.04.2011 19:30, Pranith Kumar. Karampuri wrote:

hi Martin,
   Could you please send the output of -m "trusted*" instead of 
"trusted.afr" for the remaining 24 files from both the servers. I would like to see the 
gfids of these files on both the machines.

Pranith.
- Original Message -
From: "Martin Schenker"
To: gluster-users@gluster.org
Sent: Friday, April 29, 2011 8:39:46 PM
Subject: [Gluster-users] Server outage, file sync/self-heal doesn't sync ALL 
files?!

Hi all!

We have another incident over here.

One of the servers (pserver12) in a pair (12 & 13) has been rebooted.
pserver13 showed 63 files not in sync after the 2h outage.

Both servers are clients as well.

Starting pserver12 brought up the self-heal mechanism, but only 39 files
were triggered within the first 10 min. Now the system seems dormant and
24 files are left hanging.

On the other three servers no inconsistencies are seen.

tail of client log file:

2011-04-29 14:48:23.820022] I
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of
22736 were different (8.62%)
[2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain]
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:23.887740] I
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
0-storage0-replicate-2: background  data self-heal completed on
/pserver13-17
[2011-04-29 14:48:24.272220] I
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of
22744 were different (8.62%)
[2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain]
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:24.341959] I
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
0-storage0-replicate-2: background  data self-heal completed on
/pserver13-19
[2011-04-29 14:48:24.758131] I
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of
22752 were different (8.58%)
[2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain]
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:24.766137] I
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
0-storage0-replicate-2: background  data self-heal completed on
/pserver13-23
[2011-04-29 14:48:24.884613] I
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of
22760 were different (8.58%)
[2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain]
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:24.895721] I
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
0-storage0-replicate-2: background  data self-heal completed on
/pserver13-10
0 root@pserver13:/var/log/glusterfs # date
Fri Apr 29 15:08:18 UTC 2011


Search for mismatch:

0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr."
/mnt/gluster/brick?/storage | grep -v 0x | grep
-B1 -A1 trusted | grep -c file
getfattr: Removing leading '/' from absolute path names
24


0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr."
/mnt/gluster/brick?/storage | grep -v 0x | grep
-B1  trusted
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-26
trusted.afr.storage0-client-4=0x2701
--
# file:
mnt/gluster/brick0/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images
trusted.afr.storage0-client-4=0x00160001
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-24
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-8
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-21
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-22
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-30
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-20
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-

[Gluster-users] Server outage, file sync/self-heal doesn't sync ALL files?!

2011-04-29 Thread Martin Schenker

Hi all!

We have another incident over here.

One of the servers (pserver12) in a pair (12 & 13) has been rebooted.  
pserver13 showed 63 files not in sync after the 2h outage.


Both servers are clients as well.

Starting pserver12 brought up the self-heal mechanism, but only 39 files 
were triggered within the first 10 min. Now the system seems dormant and 
24 files are left hanging.


On the other three servers no inconsistencies are seen.

tail of client log file:

2011-04-29 14:48:23.820022] I 
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 
0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of 
22736 were different (8.62%)
[2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain] 
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:23.887740] I 
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 
0-storage0-replicate-2: background  data self-heal completed on 
/pserver13-17
[2011-04-29 14:48:24.272220] I 
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 
0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of 
22744 were different (8.62%)
[2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain] 
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:24.341959] I 
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 
0-storage0-replicate-2: background  data self-heal completed on 
/pserver13-19
[2011-04-29 14:48:24.758131] I 
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 
0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of 
22752 were different (8.58%)
[2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain] 
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:24.766137] I 
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 
0-storage0-replicate-2: background  data self-heal completed on 
/pserver13-23
[2011-04-29 14:48:24.884613] I 
[afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done] 
0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of 
22760 were different (8.58%)
[2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain] 
0-storage0-replicate-2: invalid argument: inode
[2011-04-29 14:48:24.895721] I 
[afr-self-heal-common.c:1527:afr_self_heal_completion_cbk] 
0-storage0-replicate-2: background  data self-heal completed on 
/pserver13-10

0 root@pserver13:/var/log/glusterfs # date
Fri Apr 29 15:08:18 UTC 2011


Search for mismatch:

0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." 
/mnt/gluster/brick?/storage | grep -v 0x | grep 
-B1 -A1 trusted | grep -c file

getfattr: Removing leading '/' from absolute path names
24


0 root@pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." 
/mnt/gluster/brick?/storage | grep -v 0x | grep 
-B1  trusted

getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-26
trusted.afr.storage0-client-4=0x2701
--
# file: 
mnt/gluster/brick0/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images

trusted.afr.storage0-client-4=0x00160001
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-24
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-8
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-21
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-22
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-30
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-20
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-9
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-38
trusted.afr.storage0-client-4=0x2701
--
# file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-18
trusted.afr.storage0-client-6=0x2701
--
# file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-2
trusted.afr.storage0-client-6=0x2701
--
# file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-23
trusted.afr.storage0-client-6=0x2701
--
# file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-4
trusted.afr.storage0-client-6=0x2701
--
# file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-3
trusted.afr.storage0-client-6=0x2701
--
# file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-34
trusted.afr.storage0-client-6=0x2701
--
# file: mnt/gluster/brick1/storage/de-

[Gluster-users] How to preserve log files when servers are restarted?

2011-04-28 Thread Martin Schenker

Hi all!

Is there a better way to save the logfiles for the glusterd and glusterfs 
processes than manually copying them away before restarting a server?

"gluster volume log rotate " just works on the brick storage 
log, no other files seem to be addressable?!


This makes error tracking a bit difficult when the logs are gone...

Thanks, Martin
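
A minimal sketch of the manual copy, assuming the default log directory
/var/log/glusterfs (seen elsewhere in these threads) and an illustrative
destination directory:

ts=$(date +%Y%m%d-%H%M%S)
mkdir -p /root/gluster-logs-$ts
cp -a /var/log/glusterfs/. /root/gluster-logs-$ts/   # snapshot all logs before the reboot
gluster volume log rotate storage0                   # only rotates the brick logs of that volume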
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Self heal" or sync issues?

2011-04-28 Thread Martin Schenker
With millions of files on a system this is a HUGE overhead. Running the 
getfattr command just for the mismatched files and using this as a 
source for triggering self-heal/sync might be a better option and less 
costly... but you might see files in the process of being sync'd as well.


I'd only trigger a repair after a certain time has passed and nothing 
has happened.


I'm still puzzled about what caused the mismatch and why it didn't get 
repaired on its own.


Any ideas?

Best, Martin



On 28.04.2011 16:48, Whit Blauvelt wrote:

On Thu, Apr 28, 2011 at 04:16:51PM +0200, Martin Schenker wrote:

   

After triggering manually with "touch" using the right *CLIENT*
mount points, the self-heal/sync function worked fine. I was using
the server mounts before, as shown by the getfattr output. Not
good...

Now the question remains WHY the Gluster system didn't do anything
on it's own? Is this a "healthy" situation and we shouldn't worry?
 

Would it be good practice to regularly run a script to trigger any
self-healing that might be necessary - or to test if necessary (how?) and
then run on that condition? It would be easy, for instance, to use Python's
os.walk function to run through and touch - or whatever - every file in the
space. That adds a non-trivial load to a system, but for systems with load
to spare, would running that, say, every hour or every day be good
practice?

Whit
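
A minimal sketch of such a sweep, assuming the volume is mounted on the client
at /opt/profitbricks/storage (an assumption; adjust to your mount point); this
is the full-volume trigger the 3.1/3.2 documentation describes, and it does put
noticeable load on the servers:

find /opt/profitbricks/storage -noleaf -print0 | xargs --null stat >/dev/null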
   


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] "Self heal" or sync issues?

2011-04-28 Thread Martin Schenker

Hi all!

Thanks for the pointers!

After triggering manually with "touch" using the right *CLIENT* mount 
points, the self-heal/sync function worked fine. I was using the server 
mounts before, as shown by the getfattr output. Not good...


Now the question remains WHY the Gluster system didn't do anything on 
its own. Is this a "healthy" situation that we shouldn't worry about?


Best, Martin




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] "Self heal" or sync issues?

2011-04-28 Thread Martin Schenker

Hi all!

We're running a 4 node cluster on Gluster 3.1.3 currently.

After a staged server update/reboot only ONE of the 4 servers shows some 
mismatches in the file attributes. It shows 28 files whose attributes differ 
from the all-zero "all-in-sync" state. No sync or self-heal has happened 
within the last 16h; we checked last night, this morning and just now.


Even after opening each file with "od -c  | head -2" the 
self-heal/sync process doesn't seem to start.


There are no errors in the logs, so how can I check that things ARE 
happening correctly?


0 root@de-dc1-c1-pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." 
/mnt/gluster/brick?/storage | grep -v 0x | grep 
-A1 -B1 trusted

getfattr: Removing leading '/' from absolute path names

# file: 
mnt/gluster/brick0/storage/images/1831/db88d55e-3282-c7c6-d1dd-ec41a665011f/hdd-images/8987

trusted.afr.storage0-client-0=0x0100
--
# file: 
mnt/gluster/brick0/storage/images/1831/92f63f17-eb6c-8dba-2b9c-2e9cc52a8b2c/hdd-images/8786

trusted.afr.storage0-client-0=0x0100
--
# file: 
mnt/gluster/brick0/storage/images/1831/6ae6c5eb-e6e2-4dfe-7bb3-75c622910f27/hdd-images/9113

trusted.afr.storage0-client-0=0x0100
--
# file: 
mnt/gluster/brick0/storage/images/1828/4e5fd475-19b3-c9a7-1ad0-e4da528e6dbd/iso-images/11091

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1853/3df576b8-4206-e45a-33d7-433d56b700f0/iso-images/1957

trusted.afr.storage0-client-0=0x0200

# file: 
mnt/gluster/brick0/storage/images/1853/3df576b8-4206-e45a-33d7-433d56b700f0/iso-images/1960

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/2003/5e9b2bdc-a158-796f-6d81-60f39aee5137/hdd-images/9772

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1962/5c4cd738-bb56-d723-f001-0428e55ea81b/iso-images/8110

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1787/a190f24c-ed40-5642-4226-f00726dfc99f/iso-images/9837

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/2854

trusted.afr.storage0-client-0=0x0200

# file: 
mnt/gluster/brick0/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/10703

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1787/1782da17-059c-d159-373a-9ad9f5f9289f/iso-images/8519

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1787/aed64fba-2372-6f06-0690-be46136464a0/iso-images/10258

trusted.afr.storage0-client-0=0x0200

# file: 
mnt/gluster/brick0/storage/images/1787/aed64fba-2372-6f06-0690-be46136464a0/iso-images/10452

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1787/cd6089d7-c2cd-a5e1-2130-770fe028b5e3/iso-images/10511

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1834/02d04e16-40db-d244-aa4e-3e53cfaa2405/iso-images/504

trusted.afr.storage0-client-0=0x0200
--
# file: 
mnt/gluster/brick0/storage/images/1978/21527903-ca4e-4715-b40b-30c150f86d44/iso-images/9275

trusted.afr.storage0-client-0=0x0100

# file: 
mnt/gluster/brick0/storage/images/1978/21527903-ca4e-4715-b40b-30c150f86d44/iso-images/9511

trusted.afr.storage0-client-0=0x0100
--
# file: 
mnt/gluster/brick1/storage/images/1828/fc701d50-0b29-7827-89c0-77134ba96205/iso-images/9442

trusted.afr.storage0-client-2=0x0200
--
# file: 
mnt/gluster/brick1/storage/images/1878/875ed0c0-38b3-4552-7f1f-49a619996e5c/hdd-images/5758

trusted.afr.storage0-client-3=0x
--
# file: 
mnt/gluster/brick1/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/2857

trusted.afr.storage0-client-2=0x0200

# file: 
mnt/gluster/brick1/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/10773

trusted.afr.storage0-client-2=0x0200

# file: 
mnt/gluster/brick1/storage/images/1787/ad8179fd-4f00-086c-955f-c2e469809e64/iso-images/10648

trusted.afr.storage0-client-2=0x0200
--
# file: 
mnt/gluster/brick1/storage/images/2003/8d7880ff-e7b2-3996-3fa3-ddb8022ca403/iso-images/9979

trusted.afr.storage0-client-2=0x0200
--
# file: 
mnt/gluster/brick1/storage/images/2003/116a4a8f-c8c2-6f70-1256-b29477d65e72/iso-images/10587

trusted.afr.storage0-client-2=0x0200
--
# file: 
mnt/gluster/brick1/storage/images/1956/ff4bbfd7-3b1a-00da-c901-c35cd967b600/iso-images/6815

trusted.afr.storage0-client-2=0x0200
--
# fil

Re: [Gluster-users] "Gluster volume show" function?

2011-04-27 Thread Martin Schenker

Hmm, while looking at

# gluster volume info all

Volume Name: storage0
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: rdma
Bricks:
Brick1: de-dc1-c1-pserver3:/mnt/gluster/brick0/storage
Brick2: de-dc1-c1-pserver5:/mnt/gluster/brick0/storage
Brick3: de-dc1-c1-pserver3:/mnt/gluster/brick1/storage
Brick4: de-dc1-c1-pserver5:/mnt/gluster/brick1/storage
Brick5: de-dc1-c1-pserver12:/mnt/gluster/brick0/storage
Brick6: de-dc1-c1-pserver13:/mnt/gluster/brick0/storage
Brick7: de-dc1-c1-pserver12:/mnt/gluster/brick1/storage
Brick8: de-dc1-c1-pserver13:/mnt/gluster/brick1/storage
Options Reconfigured:
network.ping-timeout: 5
nfs.disable: on
performance.cache-size: 4096MB
I guess the lower part ("Options Reconfigured:") is showing all the options 
used for the volume setup, right?!?


So question answered...

Best, Martin
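
A minimal sketch of the commands involved (3.1.x CLI; the option value is just
an example):

gluster volume info storage0                          # "Options Reconfigured:" lists every option changed via 'volume set'
gluster volume set storage0 network.ping-timeout 10   # hypothetical change; it appears in that list afterwards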



Hi all!

Is there a quick way to figure out what volume options have been used 
to set up a Gluster volume?


http://gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options 



gives me the list of options but there seems to be no way to check 
what WAS set already?!? Or am I just looking in the wrong places?!?


I've "inherited" a running system with some performance/stability 
issues and I'm trying to peer under the hood...


Best, Martin



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] "Gluster volume show" function?

2011-04-27 Thread Martin Schenker

Hi all!

Is there a quick way to figure out what volume options have been used to 
set up a Gluster volume?


http://gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options

gives me the list of options but there seems to be no way to check what 
WAS set already?!? Or am I just looking in the wrong places?!?


I've "inherited" a running system with some performance/stability issues 
and I'm trying to peer under the hood...


Best, Martin

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?

2011-04-27 Thread Martin Schenker

Hi all!

I'm new to the Gluster system and tried to find answers to some simple 
questions (and couldn't find the information with Google etc.)


-does Gluster spread its CPU load across a multicore environment? So 
does it make sense to have 50 core units as Gluster servers? CPU loads 
seem to go up quite high during file system repairs, so spreading / 
multithreading should help? What kind of CPUs are working well? How much 
memory helps performance?


-Are there any recommendations for commodity hardware? We're thinking of 
36-slot 4U servers; what kind of controllers DO work well for IO speed? 
Any real-life experiences? Does it dramatically improve performance 
to increase the number of controllers per disk?


The aim is for  a ~80-120T file system with 2-3 bricks.

Thanks for any feedback!

Best, Martin
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users