Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-24 Thread Anand Avati
Hi,
Allowing the noforget option in FUSE will not help your cause. Gluster
presents the address of the inode_t as the nodeid to FUSE. In turn, FUSE
creates a filehandle using this nodeid for knfsd to export to the NFS client.
When knfsd fails over to another server, FUSE will decode the handle
encoded by the other NFS server and try to use the other server's
nodeid - which will obviously not work, as a virtual address from the
glusterfs process on the other server is not valid here.

Short version: the file-handle generated through FUSE is not durable. The
noforget option in FUSE is a hack to avoid ESTALE messages caused by
dcache pruning. If you have enough inodes in your volume, your system will
go OOM at some point. noforget is NOT a solution for providing NFS
failover to a different server.
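
To make the failure mode concrete, here is a small Python sketch (illustrative only, not gluster or kernel code) of a handle that embeds a process-local nodeid the way knfsd-over-FUSE does; Python object addresses stand in for inode_t addresses:

```python
import struct

def encode_handle(nodeid, generation):
    """Sketch of a knfsd-style handle: the FUSE nodeid (here, an
    object's memory address) is baked into the opaque file handle."""
    return struct.pack("<QI", nodeid, generation)

def decode_handle(handle, inode_table):
    """A server can only resolve the handle if the nodeid is a live
    inode in *its own* address space; otherwise the client gets ESTALE."""
    nodeid, _gen = struct.unpack("<QI", handle)
    return inode_table.get(nodeid)  # None stands in for ESTALE

# Server A: the nodeid is the address of its in-memory inode object.
class Inode: pass
inode_a = Inode()
server_a_table = {id(inode_a): inode_a}
handle = encode_handle(id(inode_a), generation=1)

# The same file on server B lives at a different address, so the
# handle minted by A resolves to nothing on B.
inode_b = Inode()
server_b_table = {id(inode_b): inode_b}
assert decode_handle(handle, server_a_table) is inode_a
assert decode_handle(handle, server_b_table) is None
```

The second assert is exactly the failover scenario: the handle survives the network round-trip, but the address baked into it means nothing on the other node.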

For reasons such as these, we ended up implementing our own NFS server,
which encodes a filehandle using the GFID (which is durable across
reboots and server failovers). I would strongly recommend NOT using knfsd
with any FUSE-based filesystem (not just glusterfs) for serious
production use, and it will simply not work if you are designing for NFS high
availability/fail-over.
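
By contrast, a GFID-based handle derives the handle from an identifier stored with the file itself, so every server computes the same bytes. A sketch of the idea (not the actual Gluster NFS wire encoding):

```python
import struct
import uuid

def encode_gfid_handle(gfid: uuid.UUID) -> bytes:
    # The handle carries the file's 16-byte GFID, not a memory address.
    return struct.pack("<16s", gfid.bytes)

# The GFID travels with the file (Gluster stores it in an xattr on the
# bricks), so server A and server B both compute the identical handle.
gfid = uuid.uuid4()  # stands in for the GFID assigned at file creation
handle_on_a = encode_gfid_handle(gfid)
handle_on_b = encode_gfid_handle(gfid)
assert handle_on_a == handle_on_b
```

Because both servers derive the handle from the same durable identifier, a handle minted before failover still resolves afterwards.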

Thanks,
Avati


On Sat, Dec 21, 2013 at 8:52 PM, Anirban Ghoshal 
chalcogen_eg_oxy...@yahoo.com wrote:

 If somebody has an idea on how this could be done, could you please help
 out? I am still stuck on this, apparently...

 Thanks,
 Anirban


   On Thursday, 19 December 2013 1:40 AM, Chalcogen 
 chalcogen_eg_oxy...@yahoo.com wrote:
   P.s. I think I need to clarify this:

 I am only reading from the mounts, and not modifying anything on the
 server, so the commonest causes of stale file handles do not apply.

 Anirban

 On Thursday 19 December 2013 01:16 AM, Chalcogen wrote:

 Hi everybody,

 A few months back I joined a project where people want to replace their
 legacy fuse-based (twin-server) replicated file-system with GlusterFS. They
 also have high-availability NFS server code integrated with the kernel NFSD
 that they wish to retain (the nfs-kernel-server, I mean). The reason they
 wish to retain the kernel NFS and not use the NFS server that comes
 with GlusterFS is mainly that there's a bit of code that allows NFS
 IPs to be migrated from one host server to the other in case one
 happens to go down, and tweaks to the export server configuration allow the
 file-handles to remain identical on the new host server.

 The solution was to mount gluster volumes using the mount.glusterfs native
 client program and then export the directories over the kernel NFS server.
 This seems to work most of the time, but on rare occasions 'stale file
 handle' is reported on certain clients, which really puts a damper on
 the 'high-availability' thing. After suitably instrumenting the nfsd/fuse
 code in the kernel, it seems that decoding of the file-handle fails on the
 server because the inode record corresponding to the nodeid in the handle
 cannot be looked up. Combining this with the fact that a second attempt by
 the client to look up the same file passes, one might suspect
 that the problem is identical to what many people attempting to export fuse
 mounts over the kernel's NFS server are facing; viz., fuse 'forgets' the
 inode records, thereby causing ilookup5() to fail. Miklos and other fuse
 developers/hackers would point towards '-o noforget' while mounting their
 fuse file-systems.

 I tried passing '-o noforget' to mount.glusterfs, but it does not seem to
 recognize it. Could somebody help me out with the correct syntax to pass
 noforget to gluster volumes? Or with something we could pass to glusterfs
 that would instruct fuse to allocate a bigger cache for our inodes?

 Additionally, should you think that something else might be behind our
 problems, please do let me know.

 Here's my configuration:

 Linux kernel version: 2.6.34.12
 GlusterFS version: 3.4.0
 nfs.disable option for volumes: OFF on all volumes

 Thanks a lot for your time!
 Anirban

 P.s. I found quite a few pages on the web that admonish users that
 GlusterFS is not compatible with the kernel NFS server, but do not really
 give much detail. Is this one of the reasons for saying so?


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-24 Thread Anirban Ghoshal
Hi, and Thanks a lot, Anand!

I was initially searching for a good answer to why the glusterfs site lists
knfsd as NOT compatible with glusterfs. So, now I know. :)

Funnily enough, we didn't have a problem with failover during our testing.
We passed constant fsids (fsid=xxx) while exporting our mounts, and the NFS
mounts on client applications haven't called any of the file handles stale
while migrating the NFS service from one server to the other. Not sure why
this happens. Do nodeids and generation numbers remain invariant across
storage servers in glusterfs-3.4.0?


We, for our part, have a pretty small amount of data in our filesystem (that
is, compared with the petabyte-sized volumes glusterfs commonly manages). Our
total volume size is somewhere around 4 GB, containing some 50,000 files in
all. Each server has around 16 GB of RAM, so space is not at a premium
for this project...
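
For scale, a back-of-envelope check (the ~1 KiB per cached inode figure is an assumption, for order-of-magnitude purposes only) shows why a volume this small would be nowhere near the OOM ceiling Avati describes:

```python
# Even if noforget pinned every inode forever, 50,000 inodes at an
# assumed ~1 KiB of kernel memory apiece is about 50 MiB -- noise
# next to 16 GiB of RAM. The OOM risk bites on volumes with many
# millions of inodes, not one this size.
inodes = 50_000
per_inode_bytes = 1024          # assumption, order of magnitude only
total_mib = inodes * per_inode_bytes / 2**20
assert total_mib < 64
```

Of course this does not make noforget correct for failover; it only says memory would not be the limiting factor for this particular deployment.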

That said, if the glusterfs NFS server does maintain identical file
handles across all its servers and does not alter file-handles upon failover,
then in the long run it might be prudent to switch to the glusterFS NFS
server as the cleaner solution...


Thanks again!
Anirban





Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-24 Thread Harshavardhana
On Tue, Dec 24, 2013 at 8:21 AM, Anirban Ghoshal
chalcogen_eg_oxy...@yahoo.com wrote:
 Hi, and Thanks a lot, Anand!

 I was initially searching for a good answer to why the glusterfs site lists
 knfsd as NOT compatible with glusterfs. So, now I know. :)

 Funnily enough, we didn't have a problem with the failover during our
 testing. We passed constant fsid's (fsid=xxx) while exporting our mounts and
 NFS mounts on client applications haven't called any of the file handles out
 stale while migrating the NFS service from one server to the other. Not sure
 why this happens.

Using fsid is just a workaround, one commonly used to solve ESTALE on file
handles. The device major/minor numbers are embedded in the NFS file
handle; the problem when an NFS export is failed over or moved to
another node is that these numbers change when the resource is exported
on the new node, resulting in the client seeing a Stale NFS file handle
error. We need to make sure the embedded number stays the same, and that
is where the fsid export option comes in - allowing us to specify a
consistent number across the various nodes.
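
For reference, a hypothetical /etc/exports entry showing the fsid pinning described above (path and other options are illustrative, not a recommendation):

```
# Identical on both NFS heads: the fixed fsid=1 overrides the
# device-number-derived identifier that would otherwise differ
# between nodes after failover.
/mnt/glustervol  *(rw,sync,no_subtree_check,fsid=1)
```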

The Gluster NFS server is a far cleaner solution for such consistency.

Another option would be to take the next step and give the
'NFS-Ganesha' and 'GlusterFS' integration a go:

https://forge.gluster.org/nfs-ganesha-and-glusterfs-integration
http://www.gluster.org/2013/09/gluster-ganesha-nfsv4-initial-impressions/

Cheers
-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-24 Thread Anirban Ghoshal
Thanks, Harshavardhana and Anand, for the tips!

I checked out parts of the linux-2.6.34 (the one we are using down here)
knfsd/fuse code. I understood (hopefully rightly) that when we export a fuse
directory over NFS and specify an fsid, the handle is constructed somewhat like
this:

fh_size (4 bytes) - fh_version and stuff (4 bytes) - fsid, from export params (4
bytes) - nodeid (8 bytes) - generation number (4 bytes) - parent nodeid (8
bytes) - parent generation (4 bytes).
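
That layout can be sanity-checked with a quick Python struct sketch (field order and widths taken from the description above; the real knfsd byte order and framing differ in detail, so treat this as approximate):

```python
import struct

# size, version, fsid, nodeid, gen, parent nodeid, parent gen
FH_LAYOUT = "<IIIQIQI"

def pack_fh(fsid, nodeid, gen, pnodeid, pgen):
    # fh_size is the total handle length; version is a placeholder 1.
    return struct.pack(FH_LAYOUT, struct.calcsize(FH_LAYOUT), 1,
                       fsid, nodeid, gen, pnodeid, pgen)

# Hypothetical kernel-address-looking nodeids:
fh = pack_fh(fsid=7, nodeid=0xffff8800deadbeef, gen=2,
             pnodeid=0xffff8800cafef00d, pgen=1)
fields = struct.unpack(FH_LAYOUT, fh)

# Pinning fsid only fixes the third field; the two 8-byte nodeid
# fields still vary across servers because they are virtual addresses.
assert fields[2] == 7
assert fields[3] == 0xffff8800deadbeef
```

The point of the sketch: even with a constant fsid, most of the handle's identifying bytes are the nodeids, which is why identical handles across servers cannot be guaranteed this way.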

So, since Anand mentions that nodeids for glusterfs are just the inode_t
addresses on the servers, I can now relate to the fact that the file handles
might not survive failovers in every case, even with the fsid held constant.

That's why I was so confused: I have never yet faced an issue with stale file
handles during failover! Maybe it is something to do with the order in which
files were created on the replica server following heal commencement (our data
is quite static, btw) - like how, if you malloc identical things on two
identical platforms by running the same executable on each, you get
allocations at the exact same virtual addresses...

However, now that I understand at least in part how this works, the Glusterfs
NFS server does seem a lot cleaner... Will also try out NFS-Ganesha...


Thanks!


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-21 Thread Anirban Ghoshal
If somebody has an idea on how this could be done, could you please help out? I 
am still stuck on this, apparently...

Thanks,
Anirban




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-18 Thread Chalcogen

Hi everybody,

A few months back I joined a project where people want to replace their 
legacy fuse-based (twin-server) replicated file-system with GlusterFS. 
They also have a high-availability NFS server code tagged with the 
kernel NFSD that they would wish to retain (the nfs-kernel-server, I 
mean). The reason they wish to retain the kernel NFS and not use the NFS 
server that comes with GlusterFS is mainly because there's this bit of 
code that allows NFS IPs to be migrated from one host server to the 
other in the case that one happens to go down, and tweaks on the export 
server configuration allow the file-handles to remain identical on the 
new host server.


The solution was to mount gluster volumes using the mount.glusterfs 
native client program and then export the directories over the kernel 
NFS server. This seems to work most of the time, but on rare occasions, 
'stale file handle' is reported on certain clients, which really puts a 
damper on the 'high-availability' thing. After suitably instrumenting 
the nfsd/fuse code in the kernel, it seems that decoding of the 
file-handle fails on the server because the inode record corresponding 
to the nodeid in the handle cannot be looked up. Combining this with the 
fact that a second attempt by the client to execute lookup on the same 
file passes, one might suspect that the problem is identical to what 
many people attempting to export fuse mounts over the kernel's NFS 
server are facing; viz, fuse 'forgets' the inode records thereby causing 
ilookup5() to fail. Miklos and other fuse developers/hackers would point 
towards '-o noforget' while mounting their fuse file-systems.


I tried passing '-o noforget' to mount.glusterfs, but it does not seem 
to recognize it. Could somebody help me out with the correct syntax to 
pass noforget to gluster volumes? Or with something we could pass to 
glusterfs that would instruct fuse to allocate a bigger cache for our 
inodes?
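
For context, the setup being described amounts to something like this (volume and path names are hypothetical):

```
# Mount the volume with the glusterfs native client:
mount -t glusterfs server1:/testvol /mnt/testvol

# Export the mounted directory over the kernel NFS server:
exportfs -o rw,fsid=1 '*:/mnt/testvol'

# The failing attempt -- mount.glusterfs does not accept the FUSE option:
mount -t glusterfs -o noforget server1:/testvol /mnt/testvol
```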


Additionally, should you think that something else might be behind our 
problems, please do let me know.


Here's my configuration:

Linux kernel version: 2.6.34.12
GlusterFS version: 3.4.0
nfs.disable option for volumes: OFF on all volumes

Thanks a lot for your time!
Anirban

P.s. I found quite a few pages on the web that admonish users that 
GlusterFS is not compatible with the kernel NFS server, but do not 
really give much detail. Is this one of the reasons for saying so?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-18 Thread Chalcogen

P.s. I think I need to clarify this:

I am only reading from the mounts, and not modifying anything on the 
server, so the commonest causes of stale file handles do not apply.


Anirban



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users