Re: [Gluster-users] Need help in understanding volume heal-info behavior

2014-04-28 Thread Chalcogen

Thank you very much!

On Monday 28 April 2014 07:41 AM, Ravishankar N wrote:

On 04/28/2014 01:30 AM, Chalcogen wrote:

Hi everyone,

I have trouble understanding the following behavior:

Suppose I have a replica 2 volume 'testvol' on two servers, server1 
and server2, composed of server1:/bricks/testvol/brick and 
server2:/bricks/testvol/brick. Also, suppose it contains a good 
number of files.


Now, assume I remove one of the two bricks, as:

root@server1~# gluster volume remove-brick testvol replica 1 
server1:/bricks/testvol/brick


Now, I unmount and delete the logical volume supporting the brick, 
recreate it (with a different size), and mount it at the same mount point 
as before (/bricks/testvol/). Then, I re-add it as:


root@server1~# gluster volume add-brick testvol replica 2 
server1:/bricks/testvol/brick


I observe that the brick on server1 does not contain any of the data 
that was in the volume.


root@server1~# ls /bricks/testvol/brick
root@server1~#

This is all right by me, since glusterfs needs some time to discover 
and sync files that are absent on server1's brick. In fact, if I 
leave the setup undisturbed for 15 minutes to half an hour, I find 
that all the data appears within server1's brick, just as you would 
expect. Also, if I wish to speed up the process, I simply run an 'ls -Ra' 
on the directory where the volume is mounted, and all files sync onto 
server1's brick. This is also very much as expected.


However, during the period when the data on server1's brick is not yet 
available, if you query the heal info for the volume, the gluster CLI 
reports 'Number of entries: 0', and that for all of 'info', 
'heal-failed', and 'split-brain'. This is where it becomes a bit of a 
problem for me: we are attempting to automate the monitoring of our 
glusterfs volumes, and we depend upon heal info alone to decide whether 
the data on server1 and server2 is in sync.


Could somebody, therefore, help me with the following questions?
a) Which files exactly show up in heal info?
The files which are healed either by the self-heal daemon or by the 
gluster heal commands.
b) What exactly should I monitor to ascertain that the data on our 
servers is in sync?


After adding a new replica brick, you need to run a full heal (gluster 
volume heal vol-name full). Then the results will show up in the 
heal info output.
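For instance, a minimal sequence (a sketch reusing the 'testvol' name from 
this thread; the heal info counters are what a monitoring script would poll) 
would be something like:

root@server1~# gluster volume heal testvol full
root@server1~# gluster volume heal testvol info
root@server1~# gluster volume heal testvol info heal-failed
root@server1~# gluster volume heal testvol info split-brain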

Thanks a lot for your responses!

Anirban

P.s. I am using glusterfs 3.4.2 over linux kernel version 2.6.34.



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Command /etc/init.d/glusterd start failed

2014-04-19 Thread Chalcogen
I have been plagued by errors of this kind every so often, mainly 
because we are in a development phase and we reboot our servers so 
frequently. If you start glusterd in debug mode:


sh$ glusterd --debug

you can easily pinpoint exactly which volume/peer data is causing the 
initialization failure for mgmt/glusterd.
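For example, a rough way to narrow the debug output down to the restore 
path (the 'resolve brick failed in restore' message quoted later in this 
thread is the kind of line this surfaces):

sh$ glusterd --debug 2>&1 | grep -iE 'restore|resolve'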


In addition, from my own experience, two of the leading reasons for 
failure are:
a) Bad peer data, if glusterd is somehow killed during an active peer 
probe operation, and
b) Interrupted info-file updates: I have noticed that when glusterd needs 
to update the info for a volume (say volume testvol) in /var/lib/glusterd, 
it first renames /var/lib/glusterd/vols/testvol/info to info.tmp, and then 
creates a new info file, which is written afresh. If glusterd crashes at 
this point, glusterd startup keeps failing until the situation is resolved 
manually. Usually, moving info.tmp back to info works for me.
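A hedged sketch of that manual fix, assuming the affected volume is 
'testvol' (adjust the path for your volume):

sh$ cd /var/lib/glusterd/vols/testvol
sh$ ls -l info info.tmp   # see which copy survived the crash
sh$ mv info.tmp info      # restore the renamed copy
sh$ glusterd --debug      # confirm the volume now restores cleanly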


Thanks,
Anirban

On Saturday 12 April 2014 08:45 AM, 吴保川 wrote:

It is tcp.

[root@server1 wbc]# gluster volume info

Volume Name: gv_replica
Type: Replicate
Volume ID: 81014863-ee59-409b-8897-6485d411d14d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.3:/home/wbc/vdir/gv_replica
Brick2: 192.168.1.4:/home/wbc/vdir/gv_replica

Volume Name: gv1
Type: Distribute
Volume ID: cfe2b8a0-284b-489d-a153-21182933f266
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.4:/home/wbc/vdir/gv1
Brick2: 192.168.1.3:/home/wbc/vdir/gv1

Thanks,
Baochuan Wu



2014-04-12 10:11 GMT+08:00 Nagaprasad Sathyanarayana 
nsath...@redhat.com mailto:nsath...@redhat.com:


If you run

# gluster volume info

What is the value set for transport-type?

Thanks
Naga


On 12-Apr-2014, at 7:33 am, 吴保川 wildpointe...@gmail.com
mailto:wildpointe...@gmail.com wrote:


Thanks, Joe. I found that one of my machines had been assigned a wrong IP
address. This led to the error.
Originally, I thought the following error was critical:
[2014-04-11 18:12:03.433371] E
[rpc-transport.c:269:rpc_transport_load] 0-rpc-transport:
/usr/local/lib/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open
shared object file: No such file or directory


2014-04-12 5:34 GMT+08:00 Joe Julian j...@julianfamily.org
mailto:j...@julianfamily.org:

On 04/11/2014 11:18 AM, 吴保川 wrote:

[2014-04-11 18:12:05.165989] E
[glusterd-store.c:2663:glusterd_resolve_all_bricks]
0-glusterd: resolve brick failed in restore

I'm pretty sure that means that one of the bricks isn't
resolved in your list of peers.




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume

2014-02-23 Thread Chalcogen
I'm not from the glusterfs development team or anything, but I, too, 
started with glusterfs somewhere around the time frame you mention, and I 
also work with a twin-replicated setup just like yours.


When I do what you describe here on my setup, the command initially hangs 
on both servers for roughly the peer ping timeout (network.ping-timeout, 
which defaults to 42 seconds). After that it works.
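If that wait is too long for your use case, the timeout is tunable. A 
hedged illustration (the 'firewall-scripts' volume name is inferred from 
the df output further down, and 10 seconds is only an example value, not a 
recommendation):

sh$ gluster volume set firewall-scripts network.ping-timeout 10
sh$ gluster volume info firewall-scripts   # reconfigured options are listed at the end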


If this setup shows new bugs then I would be interested to know, in part 
because the stability of my product depends upon this, too. Do you think 
you could share your 'gluster volume info' and 'gluster volume status' 
output?


Also, what did heal info say before you performed this exercise?

Thanks,
Anirban

On Sunday 23 February 2014 07:14 AM, Greg Scott wrote:


We first went down this path back in July 2013 and now I'm back again 
for more.  It's a similar situation but now with new versions of 
everything.   I'm using glusterfs 3.4.2 with Fedora 20.


I have 2 nodes named fw1 and fw2.  When I ifdown the NIC I'm using for 
Gluster on either node, that node cannot see  its Gluster volume, but 
the other node can see it after a timeout.  As soon as I ifup that 
NIC, everyone can see everything again.


Is this expected behavior?  When that interconnect drops, I want both 
nodes to see their own local copy and then sync everything back up 
when the interconnect connects again.


Here are details.  Node fw1 has an XFS filesystem named gluster-fw1.  
Node fw2 has an XFS filesystem named gluster-fw2.   Those are both 
gluster bricks and both nodes mount the bricks as /firewall-scripts.  
So anything one node does in /firewall-scripts should also be on the 
other node within a few milliseconds.   The test is to isolate the 
nodes from each other and see if they can still access their own local 
copy of /firewall-scripts.  The easiest way to do this is to ifdown 
the interconnect NIC.  But this doesn't work.


Here is what happens when I ifdown the NIC on node fw1.  Node fw2 can 
see /firewall-scripts but fw1 shows an error.  When I ifdown on fw2, 
the behavior is identical, but swapping fw1 and fw2.


On fw1, after an ifdown  I lose connection with my Gluster filesystem.

[root@stylmark-fw1 firewall-scripts]# ifdown enp5s4

[root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts

ls: cannot access /firewall-scripts: Transport endpoint is not connected

[root@stylmark-fw1 firewall-scripts]# df -h

df: ‘/firewall-scripts’: Transport endpoint is not connected

Filesystem   Size  Used Avail Use% Mounted on

/dev/mapper/fedora-root   17G 2.2G   14G  14% /

devtmpfs 989M 0  989M   0% /dev

tmpfs996M   0  996M   0% /dev/shm

tmpfs996M 564K  996M   1% /run

tmpfs996M 0  996M   0% /sys/fs/cgroup

tmpfs996M 0  996M   0% /tmp

/dev/sda2477M 87M  362M  20% /boot

/dev/sda1200M 9.6M  191M   5% /boot/efi

/dev/mapper/fedora-gluster--fw1  9.8G 33M  9.8G   1% /gluster-fw1

10.10.10.2:/fwmaster 214G 75G  128G  37% /mnt/fwmaster

[root@stylmark-fw1 firewall-scripts]#

But on fw2, I can still look at it:

[root@stylmark-fw2 ~]# ls /firewall-scripts

allow-all   failover-monitor.sh rcfirewall.conf

allow-all-with-nat  initial_rc.firewall start-failover-monitor.sh

etc rc.firewall var

[root@stylmark-fw2 ~]#

[root@stylmark-fw2 ~]#

[root@stylmark-fw2 ~]# df -h

Filesystem   Size  Used Avail Use% Mounted on

/dev/mapper/fedora-root   17G 2.3G   14G  14% /

devtmpfs 989M  0  989M   0% /dev

tmpfs996M 0  996M   0% /dev/shm

tmpfs996M 560K  996M   1% /run

tmpfs996M 0  996M   0% /sys/fs/cgroup

tmpfs996M 0  996M   0% /tmp

/dev/sda2477M 87M  362M  20% /boot

/dev/sda1200M 9.6M  191M   5% /boot/efi

/dev/mapper/fedora-gluster--fw2  9.8G 33M  9.8G   1% /gluster-fw2

192.168.253.2:/firewall-scripts  9.8G 33M  9.8G   1% /firewall-scripts

10.10.10.2:/fwmaster 214G 75G  128G  37% /mnt/fwmaster

[root@stylmark-fw2 ~]#

And back to fw1 -- after an ifup, I can see it again:

[root@stylmark-fw1 firewall-scripts]# ifup enp5s4

[root@stylmark-fw1 firewall-scripts]#

[root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts

allow-all   failover-monitor.sh rcfirewall.conf

allow-all-with-nat  initial_rc.firewall start-failover-monitor.sh

etc rc.firewall var

[root@stylmark-fw1 firewall-scripts]# df -h

Filesystem   Size  Used Avail Use% Mounted on

/dev/mapper/fedora-root   17G 2.2G   14G  14% /

devtmpfs 989M 0  989M   0% /dev

tmpfs996M  0  996M   0% /dev/shm

tmpfs

[Gluster-users] Failed cleanup on peer probe tmp file causes volume re-initialization problems

2014-02-20 Thread Chalcogen

Hi everybody,

This is more of a part of a larger wishlist:

I found out that when a peer probe is performed by the user, 
mgmt/glusterd writes a file named after the hostname of the peer in 
question. On a successful probe, this file is replaced with a file named 
after the UUID of the glusterd instance on the peer, while a failed 
probe causes the temp file to simply get deleted.


Here's an illustration:

root@someserver:/var/lib/glusterd/peers] gluster peer probe some_non_host 
[1] 25918
root@someserver:/var/lib/glusterd/peers] cat some_non_host
uuid=----
state=0
hostname1=some_non_host
root@someserver:/var/lib/glusterd/peers]
root@someserver:/var/lib/glusterd/peers] peer probe: failed: Probe 
returned with unknown errno 107


[1]+  Exit 1  gluster peer probe some_non_host
root@someserver:/var/lib/glusterd/peers] ls
root@someserver:/var/lib/glusterd/peers]

Here's the deal: if, for some reason, glusterd is killed off before it 
gets a chance to clean up the temp file (say, for a peer that really 
doesn't exist), and you then reboot your machine, the leftover temp file 
breaks mgmt/glusterd's recovery graph, and glusterd is unable to 
initialize any of the existing volumes until the temp file is deleted 
manually.
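A hedged sketch of that manual cleanup, reusing the 'some_non_host' 
example from the illustration above:

sh$ rm /var/lib/glusterd/peers/some_non_host   # the stale probe temp file
sh$ glusterd --debug                           # verify the volumes now restore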


It seems to me that mgmt/glusterd should have the intelligence to 
distinguish between a genuine peer and a temp file created during probe. 
The temp file should not affect the recovery graph after reboot. 
Something like a peer-name.tmp? Preferably, also delete any temp file 
discovered during recovery at startup?


I reported a bug over this in bugzilla. It's 
https://bugzilla.redhat.com/show_bug.cgi?id=1067733.


Thanks,
Anirban
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] File (setuid) permission changes during volume heal - possible bug?

2014-01-27 Thread Chalcogen

Hi,

I am working on a twin-replicated setup (server1 and server2) with 
glusterfs 3.4.0. I perform the following steps:


1. Create a distributed volume 'testvol' with the XFS brick
   server1:/brick/testvol on server1, and mount it using the glusterfs
   native client at /testvol.

2. I copy the following file to /testvol:
   server1:~$ ls -l /bin/su
   -rw*s*r-xr-x 1 root root 84742 Jan 17  2014 /bin/su
   server1:~$ cp -a /bin/su /testvol

3. Within /testvol if I list out the file I just copied, I find its
   attributes intact.

4. Now, I add the XFS brick server2:/brick/testvol.
   server2:~$ gluster volume add-brick testvol replica 2
   server2:/brick/testvol

   At this point, heal kicks in and the file is replicated on server 2.

5. If I list out su in testvol on either server now, this is what
   I see.
   server1:~$ ls -l /testvol/su
   -rw*s*r-xr-x 1 root root 84742 Jan 17  2014 /testvol/su

   server2:~$ ls -l /testvol/su
   -rw*x*r-xr-x 1 root root 84742 Jan 17  2014 /testvol/su

That is, the 's' file mode gets changed to a plain 'x' - meaning, not all 
of the attributes are preserved upon heal completion. Would you consider 
this a bug? Is the behavior different in a later release?
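As a quick check (a hedged sketch; the brick paths are the ones from steps 
1 and 4 above), the modes can also be compared directly on the backend 
bricks after the heal:

server1:~$ stat -c '%A %n' /brick/testvol/su
server2:~$ stat -c '%A %n' /brick/testvol/su   # the setuid bit should match server1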


Thanks a lot.
Anirban
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-18 Thread Chalcogen

Hi everybody,

A few months back I joined a project where people want to replace their 
legacy fuse-based (twin-server) replicated file-system with GlusterFS. 
They also have high-availability NFS server code tied to the kernel NFSD 
(the nfs-kernel-server, I mean) that they wish to retain. The reason they 
wish to retain the kernel NFS rather than the NFS server that comes with 
GlusterFS is mainly that there's a bit of code that allows NFS IPs to be 
migrated from one host server to the other in case one happens to go 
down, and tweaks to the export server configuration allow the 
file-handles to remain identical on the new host server.


The solution was to mount gluster volumes using the mount.glusterfs 
native client program and then export the directories over the kernel 
NFS server. This seems to work most of the time, but on rare occasions, 
'stale file handle' is reported off certain clients, which really puts a 
damper over the 'high-availability' thing. After suitably instrumenting 
the nfsd/fuse code in the kernel, it seems that decoding of the 
file-handle fails on the server because the inode record corresponding 
to the nodeid in the handle cannot be looked up. Combining this with the 
fact that a second attempt by the client to execute lookup on the same 
file passes, one might suspect that the problem is identical to what 
many people attempting to export fuse mounts over the kernel's NFS 
server are facing; viz, fuse 'forgets' the inode records thereby causing 
ilookup5() to fail. Miklos and other fuse developers/hackers would point 
towards '-o noforget' while mounting their fuse file-systems.


I tried passing  '-o noforget' to mount.glusterfs, but it does not seem 
to recognize it. Could somebody help me out with the correct syntax to 
pass noforget to gluster volumes? Or, something we could pass to 
glusterfs that would instruct fuse to allocate a bigger cache for our 
inodes?
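For reference, this is the kind of invocation I have been trying (the 
server and volume names here are placeholders); the noforget option is 
simply not recognized:

sh$ mount -t glusterfs -o noforget server1:/testvol /testvol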


Additionally, should you think that something else might be behind our 
problems, please do let me know.


Here's my configuration:

Linux kernel version: 2.6.34.12
GlusterFS version: 3.4.0
nfs.disable option for volumes: OFF on all volumes

Thanks a lot for your time!
Anirban

P.s. I found quite a few pages on the web that admonish users that 
GlusterFS is not compatible with the kernel NFS server, but do not 
really give much detail. Is this one of the reasons for saying so?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-18 Thread Chalcogen

P.s. I think I need to clarify this:

I am only reading from the mounts, and not modifying anything on the 
server, so the commonest causes of stale file handles do not apply.


Anirban


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users