Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-20 Thread Anirban Ghoshal
Ok, no problem. The issue is very rare, even with our setup - we have seen it 
only once on one site even though we have been in production for several months 
now. For now, we can live with that IMO. 

And, thanks again. 

Anirban

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-19 Thread Anirban Ghoshal
It is possible, yes, because these are actually a kind of log file. I suppose, 
as with other logging frameworks, these files can remain open for a considerable 
period, and then get renamed to support log-rotate semantics.
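
Roughly, the rename-and-reopen pattern I have in mind is along these lines (the 
file names here are purely illustrative, not our logger's actual ones):

# the writer keeps appending to app.log through an already-open descriptor
mv app.log app.log.1      # rotation renames the file that is still open
: > app.log               # a fresh app.log is then created for new writes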

That said, I might need to check with the team that actually manages the 
logging framework to be sure. I only take care of the file-system stuff. I can 
tell you for sure Monday. 

If it is the same race that you mention, is there a fix for it?

Thanks,
Anirban

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-19 Thread Anirban Ghoshal
I see. Thanks a tonne for the thorough explanation! :) I can see that our setup 
would be vulnerable here because the logger on one server is not generally 
aware of the state of the replica on the other server. So, it is possible that 
the log files may have been renamed before heal had a chance to kick in. 

Could I also ask you for the bug ID (should there be one) against which you 
are coding up the fix, so that we can get a notification once the fix goes in?

Also, as an aside, is O_DIRECT supposed to prevent this from occurring if one 
were to make allowance for the performance hit? 
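
To be concrete, by O_DIRECT I mean the logger opening its files with O_DIRECT so 
that writes bypass the page cache - the shell-level equivalent of something like 
this (the path is illustrative):

dd if=/dev/zero of=/testvol/somefile bs=128k count=8 oflag=direct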

Thanks again,
Anirban

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-18 Thread Anirban Ghoshal
Hi,

Yes, they do, and considerably. I'd forgotten to mention that in my last 
email. Their mtimes, however, as far as I could tell on the separate servers, 
seemed to coincide.

Thanks,
Anirban

Re: [Gluster-users] NFS not start on localhost

2014-10-18 Thread Anirban Ghoshal
Maybe share the last 15-20 lines of your /var/log/glusterfs/nfs.log for the 
consideration of everyone on the list? Thanks.

Re: [Gluster-users] NFS not start on localhost

2014-10-17 Thread Anirban Ghoshal
It happens with me sometimes. Try `tail -n 20 /var/log/glusterfs/nfs.log`. You 
will probably find something out that will help your cause. In general, if you 
just wish to start the thing up without going into the why of it, try `gluster 
volume set engine nfs.disable on` followed by `gluster volume set engine 
nfs.disable off`. It does the trick quite often for me because it is a polite 
way to ask mgmt/glusterd to try and respawn the NFS server process if need be. 
But keep in mind that this will cause an (albeit small) service interruption to 
all clients accessing volume engine over NFS.
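
For the archives, the rough sequence I follow looks like this (volume name 
'engine' as in your case; the last command is just to confirm):

# see why the NFS server did not come up
tail -n 20 /var/log/glusterfs/nfs.log

# ask glusterd to respawn the NFS server process for the volume
gluster volume set engine nfs.disable on
gluster volume set engine nfs.disable off

# confirm the NFS server now shows as online
gluster volume status engine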

Thanks, 
Anirban


On Saturday, 18 October 2014 1:03 AM, Demeter Tibor tdeme...@itsmart.hu wrote:
 




Hi,

I have set up a glusterfs volume with NFS support.

I don't know why, but after a reboot the NFS server does not listen on localhost, only 
on gs01.


[root@node0 ~]# gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: 2ea009bf-c740-492e-956d-e1bca76a0bd3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gs00.itsmart.cloud:/gluster/engine0
Brick2: gs01.itsmart.cloud:/gluster/engine1
Options Reconfigured:
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
auth.allow: *
nfs.disable: off

[root@node0 ~]# gluster volume status engine
Status of volume: engine
Gluster process                              Port   Online  Pid
----------------------------------------------------------------
Brick gs00.itsmart.cloud:/gluster/engine0    50158  Y       3250
Brick gs01.itsmart.cloud:/gluster/engine1    50158  Y       5518
NFS Server on localhost                      N/A    N       N/A
Self-heal Daemon on localhost                N/A    Y       3261
NFS Server on gs01.itsmart.cloud             2049   Y       5216
Self-heal Daemon on gs01.itsmart.cloud       N/A    Y       5223



Can anybody help me?

Thanks in advance.

Tibor

[Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-17 Thread Anirban Ghoshal
Hi everyone,

I have this really confusing split-brain here that's bothering me. I am running 
glusterfs 3.4.2 over Linux 2.6.34. I have a replica 2 volume 'testvol'. It seems 
I cannot read/stat/edit the file in question, and `gluster volume heal testvol 
info split-brain` shows nothing. Here are the logs from the fuse mount for the 
volume:

[2014-09-29 07:53:02.867111] W [fuse-bridge.c:1172:fuse_err_cbk] 0-glusterfs-fuse: 4560969: FLUSH() ERR => -1 (Input/output error)
[2014-09-29 07:54:16.007799] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8529d20 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.007854] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561103: READ => -1 (Input/output error)
[2014-09-29 07:54:16.008018] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8607ee0 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.008056] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561104: READ => -1 (Input/output error)
[2014-09-29 07:54:16.008233] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8066f30 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.008269] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561105: READ => -1 (Input/output error)
[2014-09-29 07:54:16.008800] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c860bcf0 & waitq = 0x7fd5c863b1f0
[2014-09-29 07:54:16.008839] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561107: READ => -1 (Input/output error)
[2014-09-29 07:54:16.009365] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c85fd120 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.009413] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561109: READ => -1 (Input/output error)
[2014-09-29 07:54:16.040549] W [afr-open.c:213:afr_open] 0-testvol-replicate-0: failed to open as split brain seen, returning EIO
[2014-09-29 07:54:16.040594] W [fuse-bridge.c:915:fuse_fd_cbk] 0-glusterfs-fuse: 4561142: OPEN() /SECLOG/20140908.d/SECLOG_00427425_.log => -1 (Input/output error)


Could somebody please give me some clue on where to begin? I checked the xattrs 
on /SECLOG/20140908.d/SECLOG_00427425_.log and 
it seems the changelogs are [0, 0] on both replicas, and the gfids match.
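
For reference, this is roughly how I inspected the replicas (the brick path 
below is illustrative, not our real layout):

# run on each server, against the file's path on the brick, not on the mount
getfattr -d -m . -e hex /brick/testvol/SECLOG/20140908.d/<the log file>

The trusted.afr.testvol-client-0 and trusted.afr.testvol-client-1 values were 
all zeroes on both bricks, and trusted.gfid was identical on the two replicas.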

Thank you very much for any help on this.
Anirban

[Gluster-users] Need help: Mount -t glusterfs hangs

2014-09-16 Thread Anirban Ghoshal
Hello,

I am facing an intermittent issue with the mount.glusterfs program freezing up. 
Investigation reveals that it is not exactly the mount system call but the 
'stat' call from within mount.glusterfs that is actually hung. Any other 
command, such as df or ls, also hangs.

I opened a Red Hat Bugzilla bug against this. It has more details on the precise 
observation and steps followed. What intrigues me is that as soon as I set a 
volume option such as one for diagnostics or maybe the statedump path, the 
problem goes away. 
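
To be concrete, the sort of thing that clears the hang is any innocuous option 
set along these lines (the exact option hardly seems to matter; the names below 
are just examples, and the volume name is a placeholder):

gluster volume set myvol diagnostics.client-log-level INFO
gluster volume set myvol server.statedump-path /var/run/gluster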

Any advice would be greatly appreciated as this is causing problems with some 
server replacement procedures I am trying out. 

Thanks a lot, 
Anirban

Re: [Gluster-users] Need help: Mount -t glusterfs hangs

2014-09-16 Thread Anirban Ghoshal
Yes, very sorry. Actually, I kind of missed adding the bug ID, which made it all 
look so incomplete. Fact is, I am totally caught up in work today, so I am more 
absent-minded than usual..

Thanks for pointing that out! As to the other details:
Bug #1141940
Glusterfs 3.4.2
Linux 2.6.34

Also, I was kind of hoping that the fact that setting a volume option sort of 
clears the hang might ring a bell with someone.. someone who knows precisely 
what those options do..

Thanks again,
Anirban

[Gluster-users] Enforce direct-io for volume rather than mount

2014-08-18 Thread Anirban Ghoshal
Suppose I have a replica 2 setup using XFS for my volume, and that I also 
export my files over nfs, and write access is expected.

Now, if I wish to avoid split-brains at all costs, even during server crashes and 
such, I am given to understand that direct-io mode would help. I know that 
there is a direct-io option with my mount.glusterfs program, but I wish for a 
direct-io enforcement for my entire volume so as to enforce uncached writes to 
the underlying XFS. 
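
For reference, the mount-level knob I am referring to is presumably this one (a 
per-client setting, which is exactly what I would rather not have to rely on; 
the server and paths are placeholders):

mount -t glusterfs -o direct-io-mode=enable server1:/myvol /mnt/myvol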

Any ideas on how I could achieve that? 

Thanks for your replies.
Anirban

Re: [Gluster-users] : On breaking the connection between replicated volumes certain files return -ENOTCONN

2014-02-06 Thread Anirban Ghoshal

We migrated to stable version 3.4.2 and confirmed that the error occurs with 
that as well. I reported this over bug 1062287.

Thanks again,
Anirban



--
On Tue 4 Feb, 2014 2:27 PM MST Anirban Ghoshal wrote:

Hi everyone,

Here's a strange issue. I am using glusterfs 3.4.0 alpha. We need to move to a 
stable version ASAP, but I am telling you this just on the off chance that it might 
be interesting for somebody from the glusterfs development team. Please excuse 
the sheer length of this mail, but I am new to browsing such massive code, and 
not good at presenting my ideas very clearly.


Here's a set of observations:

1. You have a replica 2 volume (testvol) on server1 and server2. You assume 
that on either server, it is also locally mounted via mount.glusterfs at 
/testvol.
2. You have a large number of soft-linked files within the volume.
3. You check heal info (all its facets) to ensure not a single file is out of 
sync (also, verify md5sum or such, if possible).
4. You abruptly take down the ethernet device over which the servers are 
connected (ip link set eth-dev down).
5. On one of the servers (say, server1 for definiteness), if you do an 'ls -l' 
readlink returns 'Transport endpoint is not connected'.
6. The error resolves all by itself if you get the eth-link up.

Here's some additional detail:
7. The error is intermittent, and not all soft-linked files have the issue.
8. If you take a directory containing soft-linked files, and if you do a ls -l 
_on_the_directory, like so,

server1$ ls -l /testvol/somedir/bin/

ls: cannot read symbolic link /testvol/somedir/bin/reset: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/bzless: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/i386: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/kill: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux32: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/logger: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/x86_64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/python2: Transport endpoint is not connected


9. If, however, you take a faulty soft-link and do an ls -l on it directly, 
then it rights itself immediately.

server1$ ls -l /testvol/somedir/bin/x86_64
lrwxrwxrwx 1 root root 7 May  7 23:11 /testvol/somedir/bin/x86_64 -> setarch


I tried raising the client log level to 'trace'. Here's what I saw:

Upon READLINK failures, (ls -l /testvol/somedir/bin/):

[2010-05-09 01:13:28.140265] T [fuse-bridge.c:2453:fuse_readdir_cbk] 
0-glusterfs-fuse: 2783484: READDIR = 23/4096,1380
[2010-05-09 01:13:28.140444] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 45
[2010-05-09 01:13:28.140477] T [fuse-bridge.c:708:fuse_getattr_resume] 
0-glusterfs-fuse: 2783485: GETATTR 140299577689176 (/testvol/somedir/bin)
[2010-05-09 01:13:28.140618] T [fuse-bridge.c:641:fuse_attr_cbk] 
0-glusterfs-fuse: 2783485: STAT() /testvol/somedir/bin = -5626802993936595428
[2010-05-09 01:13:28.140722] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140737] T [fuse-bridge.c:506:fuse_lookup_resume] 
0-glusterfs-fuse: 2783486: LOOKUP 
/testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:28.140851] T [fuse-bridge.c:376:fuse_entry_cbk] 
0-glusterfs-fuse: 2783486: LOOKUP() /testvol/somedir/bin/x86_64 = 
-4857810743645185021
[2010-05-09 01:13:28.140954] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140975] T [fuse-bridge.c:1296:fuse_readlink_resume] 
0-glusterfs-fuse: 2783487 READLINK 
/testvol/somedir/bin/x86_64/025d1c57-865f-4f1f-bc95-96ddcef3dc03 
[2010-05-09 01:13:28.141090] D [afr-common.c:760:afr_get_call_child] 
0-_testvol-replicate-0: Returning -107, call_child: -1, last_index: -1
[2010-05-09 01:13:28.141120] W [fuse-bridge.c:1271:fuse_readlink_cbk] 
0-glusterfs-fuse: 2783487: /testvol/somedir/bin/x86_64 = -1 (Transport 
endpoint is not connected)

Upon successful readlink (ls -l /testvol/somedir/bin/x86_64):

[2010-05-09 01:13:37.717904] T [fuse-bridge.c:376:fuse_entry_cbk] 
0-glusterfs-fuse: 2790073: LOOKUP() /testvol/somedir/bin = 
-5626802993936595428
[2010-05-09 01:13:37.718070] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 52
[2010-05-09 01:13:37.718127] T [fuse-bridge.c:506:fuse_lookup_resume] 
0-glusterfs-fuse: 2790074: LOOKUP 
/testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:37.718306] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 
0-_testvol-replicate-0: /testvol/somedir/bin/x86_64

[Gluster-users] On breaking the connection between replicated volumes certain files return -ENOTCONN

2014-02-04 Thread Anirban Ghoshal
Hi everyone,

Here's a strange issue. I am using glusterfs 3.4.0 alpha. We need to move to a 
stable version ASAP, but I am telling you this just on the off chance that it might be 
interesting for somebody from the glusterfs development team. Please excuse the 
sheer length of this mail, but I am new to browsing such massive code, and not 
good at presenting my ideas very clearly.


Here's a set of observations:

1. You have a replica 2 volume (testvol) on server1 and server2. You assume 
that on either server, it is also locally mounted via mount.glusterfs at 
/testvol.
2. You have a large number of soft-linked files within the volume.
3. You check heal info (all its facets) to ensure not a single file is out of 
sync (also, verify md5sum or such, if possible).
4. You abruptly take down the ethernet device over which the servers are 
connected (ip link set eth-dev down); a minimal repro sketch follows this list.
5. On one of the servers (say, server1 for definiteness), if you do an 'ls -l' 
readlink returns 'Transport endpoint is not connected'.
6. The error resolves all by itself if you get the eth-link up.
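
A minimal sketch of steps 3-6, assuming eth0 is the inter-server link and the 
volume is mounted locally at /testvol:

# step 3: confirm nothing is pending or in split-brain before the test
gluster volume heal testvol info
gluster volume heal testvol info split-brain

# step 4: abruptly take the inter-server link down on server1
ip link set eth0 down

# step 5: readlink failures show up on the local mount
ls -l /testvol/somedir/bin/

# step 6: bring the link back up and the errors resolve by themselves
ip link set eth0 up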

Here's some additional detail:
7. The error is intermittent, and not all soft-linked files have the issue.
8. If you take a directory containing soft-linked files, and if you do a ls -l 
_on_the_directory, like so,

server1$ ls -l /testvol/somedir/bin/

ls: cannot read symbolic link /testvol/somedir/bin/reset: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/bzless: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/i386: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/kill: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux32: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/logger: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/x86_64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/python2: Transport endpoint is not connected


9. If, however, you take a faulty soft-link and do an ls -l on it directly, 
then it rights itself immediately.

server1$ ls -l /testvol/somedir/bin/x86_64
lrwxrwxrwx 1 root root 7 May  7 23:11 /testvol/somedir/bin/x86_64 -> setarch


I tried raising the client log level to 'trace'. Here's what I saw:

Upon READLINK failures, (ls -l /testvol/somedir/bin/):

[2010-05-09 01:13:28.140265] T [fuse-bridge.c:2453:fuse_readdir_cbk] 
0-glusterfs-fuse: 2783484: READDIR = 23/4096,1380
[2010-05-09 01:13:28.140444] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 45
[2010-05-09 01:13:28.140477] T [fuse-bridge.c:708:fuse_getattr_resume] 
0-glusterfs-fuse: 2783485: GETATTR 140299577689176 (/testvol/somedir/bin)
[2010-05-09 01:13:28.140618] T [fuse-bridge.c:641:fuse_attr_cbk] 
0-glusterfs-fuse: 2783485: STAT() /testvol/somedir/bin = -5626802993936595428
[2010-05-09 01:13:28.140722] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140737] T [fuse-bridge.c:506:fuse_lookup_resume] 
0-glusterfs-fuse: 2783486: LOOKUP 
/testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:28.140851] T [fuse-bridge.c:376:fuse_entry_cbk] 
0-glusterfs-fuse: 2783486: LOOKUP() /testvol/somedir/bin/x86_64 = 
-4857810743645185021
[2010-05-09 01:13:28.140954] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140975] T [fuse-bridge.c:1296:fuse_readlink_resume] 
0-glusterfs-fuse: 2783487 READLINK 
/testvol/somedir/bin/x86_64/025d1c57-865f-4f1f-bc95-96ddcef3dc03 
[2010-05-09 01:13:28.141090] D [afr-common.c:760:afr_get_call_child] 
0-_testvol-replicate-0: Returning -107, call_child: -1, last_index: -1
[2010-05-09 01:13:28.141120] W [fuse-bridge.c:1271:fuse_readlink_cbk] 
0-glusterfs-fuse: 2783487: /testvol/somedir/bin/x86_64 = -1 (Transport 
endpoint is not connected)

Upon successful readlink (ls -l /testvol/somedir/bin/x86_64):

[2010-05-09 01:13:37.717904] T [fuse-bridge.c:376:fuse_entry_cbk] 
0-glusterfs-fuse: 2790073: LOOKUP() /testvol/somedir/bin = -5626802993936595428
[2010-05-09 01:13:37.718070] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 
0-fuse: return value inode_path 52
[2010-05-09 01:13:37.718127] T [fuse-bridge.c:506:fuse_lookup_resume] 
0-glusterfs-fuse: 2790074: LOOKUP 
/testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:37.718306] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 
0-_testvol-replicate-0: /testvol/somedir/bin/x86_64: failed to get the gfid 
from dict
[2010-05-09 01:13:37.718355] T [rpc-clnt.c:1301:rpc_clnt_record] 
0-_testvol-client-1: Auth Info: pid: 3343, uid: 0, gid: 0, owner: 

[2010-05-09 01:13:37.718383] T 

Re: [Gluster-users] File (setuid) permission changes during volume heal - possible bug?

2014-02-02 Thread Anirban Ghoshal
Hi Ravi, 

Many thanks for the super-quick turnaround on this!

Didn't know about this one quirk of chown, so thanks for that as well.

Anirban




On Thursday, 30 January 2014 9:22 AM, Ravishankar N ravishan...@redhat.com 
wrote:
 
Hi Anirban,
Thanks for taking the time off to file the bugzilla bug report. The fix has been 
sent for review upstream (http://review.gluster.org/#/c/6862/). Once it is 
merged, I will backport it to 3.4 as well.
Regards,
Ravi


On 01/28/2014 02:07 AM, Chalcogen wrote:

Hi,

I am working on a twin-replicated setup (server1 and server2)
with glusterfs 3.4.0. I perform the following steps:

 
1. Create a distributed volume 'testvol' with the XFS brick 
server1:/brick/testvol on server1, and mount it using the glusterfs native 
client at /testvol.


2. I copy the following file to /testvol:
server1:~$ ls -l /bin/su
-rwsr-xr-x 1 root root 84742 Jan 17  2014 /bin/su
server1:~$ cp -a /bin/su /testvol


3. Within /testvol if I list out the file I just copied, I find its 
attributes intact.


4. Now, I add the XFS brick server2:/brick/testvol.
server2:~$ gluster volume add-brick testvol replica 2 server2:/brick/testvol

At this point, heal kicks in and the file is replicated on
server 2.


5. If I list out su in testvol on either server now, this is what I see:
server1:~$ ls -l /testvol/su
-rwsr-xr-x 1 root root 84742 Jan 17  2014 /bin/su

server2:~$ ls -l /testvol/su
-rwxr-xr-x 1 root root 84742 Jan 17  2014 /bin/su

That is, the 's' file mode gets changed to plain 'x' - meaning, not all the 
attributes are preserved upon heal completion. Would you consider this a bug? 
Is the behavior different on a higher release?

Thanks a lot.
Anirban
 



[Gluster-users] Mounting soft-linked paths over nfs

2014-01-14 Thread Anirban Ghoshal

Dear gluster users,

I am facing a tiny bit of an issue with the glusterfs NFS server. Say I export a 
volume testvol, and within testvol I have a path, say, dir1/dir2. If dir1 and 
dir2 are actual directories, then one can simply mount testvol/dir1/dir2 over 
NFS. However, if either of dir1 or dir2 is a soft-link, then mount.nfs returns 
-EINVAL.
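
In other words (the host name and mount point below are just placeholders):

# works when dir1 and dir2 are real directories; fails with EINVAL from
# mount.nfs when either of them is a soft-link inside the volume
mount -t nfs -o vers=3 server1:/testvol/dir1/dir2 /mnt/sub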

Would you say that this is normal behavior with this nfs server? Also, I am 
using the 3.4.0 release. Would it help if I upgrade? 

Thanks a lot, 
Anirban


Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-24 Thread Anirban Ghoshal
Hi, and Thanks a lot, Anand!

I was initially searching for a good answer to why the glusterfs site lists 
knfsd as NOT compatible with glusterfs. So, now I know. :)

Funnily enough, we didn't have a problem with the failover during our testing. 
We passed constant fsid's (fsid=xxx) while exporting our mounts and NFS mounts 
on client applications haven't called any of the file handles out stale while 
migrating the NFS service from one server to the other. Not sure why this 
happens. Do nodeids and generation numbers remain invariant across storage 
servers in glusterfs-3.4.0?


We, for our part, have a pretty small amount of data in our filesystem (that 
is, compared with the petabyte-sized volumes glusterfs commonly manages). Our 
total volume size would be somewhere around 4 GB, and some 50,000 files are all 
they contain. Each server has around 16 GB of RAM, so space is not at a premium 
for this project... 

However, that said, if the glusterfs NFS server does maintain identical file 
handles across all its servers and does not alter file handles upon failover, 
then in the long run it might be prudent to switch to GlusterFS NFS as the 
cleaner solution...


Thanks again!
Anirban




On Tuesday, 24 December 2013 1:58 PM, Anand Avati av...@gluster.org wrote:
 
Hi,
Allowing the noforget option to FUSE will not help your cause. Gluster presents 
the address of the inode_t as the nodeid to FUSE. In turn, FUSE creates a 
filehandle using this nodeid for knfsd to export to the NFS client. When knfsd 
fails over to another server, FUSE will decode the handle encoded by the other 
NFS server and try to use the nodeid of the other server - which will obviously 
not work, as the virtual address of the glusterfs process on the other server is not 
valid here.

Short version: the file-handle generated through FUSE is not durable. The 
noforget option in FUSE is a hack to avoid ESTALE messages because of dcache 
pruning. If you have enough inodes in your volume, your system will go OOM at 
some point. The noforget is NOT a solution for providing NFS failover to a 
different server.

For reasons such as these, we ended up implementing our own NFS server where we 
encode a filehandle using the GFID (which is durable across reboots and server 
failovers). I would strongly recommend NOT using knfsd with any FUSE-based 
filesystems (not just glusterfs) for serious production use, and it will just 
not work if you are designing for NFS high availability/fail-over.

Thanks,
Avati



On Sat, Dec 21, 2013 at 8:52 PM, Anirban Ghoshal 
chalcogen_eg_oxy...@yahoo.com wrote:

If somebody has an idea on how this could be done, could you please help out? I 
am still stuck on this, apparently...

Thanks,
Anirban




On Thursday, 19 December 2013 1:40 AM, Chalcogen 
chalcogen_eg_oxy...@yahoo.com wrote:
 
P.s. I think I need to clarify this:

I am only reading from the mounts, and not modifying anything on the
server, and so the commonest causes of stale file handles do not
apply.

Anirban


On Thursday 19 December 2013 01:16 AM, Chalcogen wrote:

Hi everybody,

A few months back I joined a project where people want to
replace their legacy fuse-based (twin-server) replicated
file-system with GlusterFS. They also have a high-availability
NFS server code tagged with the kernel NFSD that they would wish
to retain (the nfs-kernel-server, I mean). The reason they wish
to retain the kernel NFS and not use the NFS server that comes
with GlusterFS is mainly because there's this bit of code that
allows NFS IP's to be migrated from one host server to the other
in the case that one happens to go down, and tweaks on the
export server configuration allow the file-handles to remain
identical on the new host server.

The solution was to mount gluster volumes using the
mount.glusterfs native client program and then export the
directories over the kernel NFS server. This seems to work most
of the time, but on rare occasions, 'stale file handle' is
reported off certain clients, which really puts a damper over
the 'high-availability' thing. After suitably instrumenting the
nfsd/fuse code in the kernel, it seems that decoding of the
file-handle fails on the server because the inode record
corresponding to the nodeid in the handle cannot be looked up.
Combining this with the fact that a second attempt by the client
to execute lookup on the same file passes, one might suspect
that the problem is identical to what many people attempting to
export fuse mounts over the kernel's NFS server are facing; viz,
fuse 'forgets' the inode records thereby causing ilookup5() to
fail. Miklos and other fuse developers/hackers would point
towards '-o noforget' while mounting their fuse file-systems. 

I tried passing '-o noforget' to mount.glusterfs, but it does not seem to recognize it.

Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-24 Thread Anirban Ghoshal
Thanks, Harshavardhana and Anand, for the tips!

I checked out parts of the linux-2.6.34 (the one we are using down here) 
knfsd/fuse code. I understood (hopefully, rightly) that when we export a fuse 
directory over NFS and specify an fsid, the handle is constructed somewhat like 
this:

fh_size (4 bytes) - fh_version and stuff (4 bytes) - fsid, from export parms (4 
bytes) - nodeid (8 bytes) - generation number (4 bytes) - parent nodeid (8 
bytes) - parent generation (4 bytes).

So, since Anand mentions that nodeid's for glusterfs are just the inode_t 
addresses on servers, I can now relate to the fact that the file handles might 
not even survive failovers in any and every case, even with the fsid constant. 

That's why I was so confused.. I never faced an issue with stale file handles 
during failover yet! Maybe something to do with the order in which files were 
created on the replica server following heal commencement (our data is quite 
static btw) - like, if you malloc identical things on two identical platforms 
by running the same executable on each, you get allocations at the exact same 
virtual addresses...

However, now that I understand at least in part how this works, Glusterfs NFS 
does seem a lot cleaner... Will also try out the Ganesha... 


Thanks!


On Tuesday, 24 December 2013 11:04 PM, Harshavardhana 
har...@harshavardhana.net wrote:
 
On Tue, Dec 24, 2013 at 8:21 AM, Anirban Ghoshal

chalcogen_eg_oxy...@yahoo.com wrote:
 Hi, and Thanks a lot, Anand!

 I was initially searching for a good answer to why the glusterfs site lists
 knfsd as NOT compatible with the glusterfs.  So, now I know. :)

 Funnily enough, we didn't have a problem with the failover during our
 testing. We passed constant fsid's (fsid=xxx) while exporting our mounts and
 NFS mounts on client applications haven't called any of the file handles out
 stale while migrating the NFS service from one server to the other. Not sure
 why this happens.

Using fsid is just a workaround, always used to solve ESTALE on file
handles. The device major/minor numbers are embedded in the NFS file
handle; the problem when an NFS export is failed over or moved to
another node is that these numbers change when the resource is
exported on the new node, resulting in the client seeing a Stale NFS
file handle error. We need to make sure the embedded number stays the
same, and that is where the fsid export option comes in - allowing us
to specify a coherent number across various clients.
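
For illustration, an /etc/exports line using it would look roughly like this
(the path and network are placeholders):

# identical fsid on both servers keeps the embedded number stable across failover
/testvol  192.168.1.0/24(rw,sync,fsid=100,no_subtree_check)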

GlusterNFS server is a way cleaner solution for such consistency.

Another thing would be to take the next step and give the 'NFS-Ganesha' and
'GlusterFS' integration a go?

https://forge.gluster.org/nfs-ganesha-and-glusterfs-integration
http://www.gluster.org/2013/09/gluster-ganesha-nfsv4-initial-impressions/

Cheers
-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes

Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-21 Thread Anirban Ghoshal
If somebody has an idea on how this could be done, could you please help out? I 
am still stuck on this, apparently...

Thanks,
Anirban




On Thursday, 19 December 2013 1:40 AM, Chalcogen 
chalcogen_eg_oxy...@yahoo.com wrote:
 
P.s. I think I need to clarify this:

I am only reading from the mounts, and not modifying anything on the
server, and so the commonest causes of stale file handles do not
apply.

Anirban


On Thursday 19 December 2013 01:16 AM, Chalcogen wrote:

Hi everybody,

A few months back I joined a project where people want to
replace their legacy fuse-based (twin-server) replicated
file-system with GlusterFS. They also have a high-availability
NFS server code tagged with the kernel NFSD that they would wish
to retain (the nfs-kernel-server, I mean). The reason they wish
to retain the kernel NFS and not use the NFS server that comes
with GlusterFS is mainly because there's this bit of code that
allows NFS IP's to be migrated from one host server to the other
in the case that one happens to go down, and tweaks on the
export server configuration allow the file-handles to remain
identical on the new host server.

The solution was to mount gluster volumes using the
mount.glusterfs native client program and then export the
directories over the kernel NFS server. This seems to work most
of the time, but on rare occasions, 'stale file handle' is
reported off certain clients, which really puts a damper over
the 'high-availability' thing. After suitably instrumenting the
nfsd/fuse code in the kernel, it seems that decoding of the
file-handle fails on the server because the inode record
corresponding to the nodeid in the handle cannot be looked up.
Combining this with the fact that a second attempt by the client
to execute lookup on the same file passes, one might suspect
that the problem is identical to what many people attempting to
export fuse mounts over the kernel's NFS server are facing; viz,
fuse 'forgets' the inode records thereby causing ilookup5() to
fail. Miklos and other fuse developers/hackers would point
towards '-o noforget' while mounting their fuse file-systems. 

I tried passing  '-o noforget' to mount.glusterfs, but it does
not seem to recognize it. Could somebody help me out with the
correct syntax to pass noforget to gluster volumes? Or,
something we could pass to glusterfs that would instruct fuse to
allocate a bigger cache for our inodes?
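
(For concreteness, this is the sort of invocation that gets rejected - the volume 
name and mount point here are placeholders:

mount -t glusterfs -o noforget server1:/testvol /testvol

mount.glusterfs simply does not seem to know what to do with the option.)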

Additionally, should you think that something else might be
behind our problems, please do let me know.

Here's my configuration:

Linux kernel version: 2.6.34.12
GlusterFS versionn: 3.4.0
nfs.disable option for volumes: OFF on all volumes

Thanks a lot for your time!
Anirban

P.s. I found quite a few pages on the web that admonish users
that GlusterFS is not compatible with the kernel NFS server, but
do not really give much detail. Is this one of the reasons for
saying so?


