Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-18 Thread Emmanuel Dreyfus
On Sat, Sep 13, 2014 at 01:34:58PM -0700, Anand Avati wrote:
> How does the NetBSD nfs server provide stable directory offsets, for the
> NFS client to resume reading from at a later point in time? Very similar
> problems are present in that scenario and it might be helpful to see what
> approaches are taken there (which are probably more tried and tested)

I still do not have an answer for this question, but I have a patch to 
fix the standard-violating seekdir() usage. As discussed in this threead,
I remove fd anonymity in afr-selfèheald.c:
http://review.gluster.org/8760

The resource cleanup code may be wrong, it needs review. At least the
patch lets NetBSD pass self-held.t without getting locked into an 
infinite loop.

For the NFS question, I suspect it is out of scope: seekdir() does not
operate at kernel interface level, but at libc level; the data we
see from readdir() are cached in userland, and adding a node will not 
invalidate them. I have to run tess to confirm, though.


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-15 Thread Emmanuel Dreyfus
J. Bruce Fields  wrote:

> Again, this is a requirement for any filesystem that wants to be
> exported over NFS.  

Not sure. We operate at readdir level, which is not even the system call
level. 

>What filesystem exactly are you testing on? 

FFS

> What is the NetBSD NFS server doing?

I do not have the answer yet, stay tuned.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-15 Thread J. Bruce Fields
On Mon, Sep 15, 2014 at 07:57:21PM +0200, Emmanuel Dreyfus wrote:
> J. Bruce Fields  wrote:
> 
> > 4) Report this as a bug to NetBSD.
> > 
> > You may be correct about the letter of the spec, but specs don't
> > necessarily capture all requirements.  And as others say NFS server code
> > at least will require that these actually work across reboots.
> > Filesystems go to great lengths to make that work.
> 
> If you intent GlusterFS to be portable, then you will not be able to
> "fix" all non Linux systems.  You have to avoid wandering outside of area
> specified by POSIX. This is a good practice anyway since you are never
> sure that Linux itself will stick to a given unspecified behavior.

Again, this is a requirement for any filesystem that wants to be
exported over NFS.  What filesystem exactly are you testing on?  What is
the NetBSD NFS server doing?

--b.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-15 Thread Emmanuel Dreyfus
J. Bruce Fields  wrote:

> 4) Report this as a bug to NetBSD.
> 
> You may be correct about the letter of the spec, but specs don't
> necessarily capture all requirements.  And as others say NFS server code
> at least will require that these actually work across reboots.
> Filesystems go to great lengths to make that work.

If you intent GlusterFS to be portable, then you will not be able to
"fix" all non Linux systems. You have to avoid wandering outside of area
specified by POSIX. This is a good practice anyway since you are never
sure that Linux itself will stick to a given unspecified behavior.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-15 Thread J. Bruce Fields
On Sat, Sep 13, 2014 at 09:02:55PM +0200, Emmanuel Dreyfus wrote:
> In <1lrx1si.n8tms1igmi5pm%m...@netbsd.org> I explained why NetBSD
> currently fails self-heald.t, but since the subjet is burried deep in a
> thread, it might be worth starting a new one to talk about how to fix.
> 
> In 3 places within glusterfs code (features/index,
> features/snapview-server and storage/posix), a server component answers
> readdir requests on a directory which may be split in mulitple calls.
> 
> To answer one call, we have the following library calls:
> - opendir()
> - seekdir() to resume where the previous request was
> - readdir()
> - telldir() to record where we are for the next request
> - closedir()
> 
> This relies on unspecified behavior, as POSIX says: "The value of loc
> should have been returned from an earlier call to telldir() using the
> same directory stream."
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html
> 
> Since we do opendir() and closedir() at each time, we do not use the
> same directory stream. It causes an infinite loop on NetBSD because it
> badly resume from previous request, and in the general case it will
> break badly if an entry is added in the directory between two requests.
> 
> How can we fix that?
> 
> 1) we can keep the directory stream open. The change is intrusive since
> we will need a chained list of open contexts, and we need to clean them
> if they timeout.
> 
> 2) in order to keep state between requests, we can use the entry index
> (first encoutered is 1, and so on) instead of values returned by
> telldir(). That works around the unspecified behavior, but it still
> breaks if directory content is changed between two requests
> 
> 3) make sure the readdir is done in a single request. That means trying
> with bigger buffers until it works. For instance  in
> xlator/cluster/afr/src/afr-self-heald.c we have:
>while ((ret = syncop_readdir (subvol, fd, 131072, offset, &entries)))

4) Report this as a bug to NetBSD.

You may be correct about the letter of the spec, but specs don't
necessarily capture all requirements.  And as others say NFS server code
at least will require that these actually work across reboots.
Filesystems go to great lengths to make that work.

--b.

> 
> We would use -1 instead of 131072 to tell that we want everything
> without a size limit, and the server component (here features/index)
> would either return everyting or fail, whithout playing with
> telldir/seekdir.
> 
> Opinions? The third solution seems the best to me since it is not very
> intrusive and it makes things simplier. Indeed we allow unbound data
> size to come back from the brick to glustershd, but we trust the brick,
> right?
> 
> -- 
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-14 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri  wrote:

> If we hit a dead-end there we should make this change in self-heald.c
> Keep us posted.

Sure. In the meantime I am stil linterested by your change about non
anonymous fd in features/index

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-14 Thread Pranith Kumar Karampuri


On 09/14/2014 10:41 PM, Emmanuel Dreyfus wrote:

'Pranith Kumar Karampuri  wrote:


I can do that.

That will teach me about that anonymous fd. Reading the code it seems
afr-self-heald.c code does opendir and use the fd for readdir syncop,
which suggest underlying xlator will use the same DIR *, but logging in
index.c I can see it calls opendir/closedir on each readdir.


Do we know the answer to Avati's question about how
readdir works for nfs handles?

Not yet. I asked in the relevant mailing list and I still await for a
reply. I suspect we hit a dark area within a grey area :-)
If we hit a dead-end there we should make this change in self-heald.c 
Keep us posted.


Pranith.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-14 Thread Emmanuel Dreyfus
'Pranith Kumar Karampuri  wrote:

> I can do that. 

That will teach me about that anonymous fd. Reading the code it seems
afr-self-heald.c code does opendir and use the fd for readdir syncop,
which suggest underlying xlator will use the same DIR *, but logging in
index.c I can see it calls opendir/closedir on each readdir. 

> Do we know the answer to Avati's question about how 
> readdir works for nfs handles?

Not yet. I asked in the relevant mailing list and I still await for a
reply. I suspect we hit a dark area within a grey area :-)

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-14 Thread Pranith Kumar Karampuri


On 09/14/2014 10:39 AM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri  wrote:


Just to make sure I understand the problem, the issue is happening
because self-heal-daemon uses anonymous fds to perform readdirs? i.e.
there is no explicit opendir on the directory. Everytime there is a
readdir it may lead to opendir/seekdir/readdir/closedir. Did I get that
right?

Yes, on the brick, it happens in xlator/features/index.


I believe posix xlator doesn't have this problem for non-anonymous fds
where the DIR* stream is open till the final unref on the fd.

Then perhaps the solution is to change xlator/features/index behavior to
match xlator/storage/posix? There is also
xlator/features/snapview-server that may be affected.

I can do that. Do we know the answer to Avati's question about how 
readdir works for nfs handles?


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-14 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri  wrote:

> I believe posix xlator doesn't have this problem for 
> non-anonymous fds where the DIR* stream is open till the final unref on
> the fd.

You must be right since reading a directory in a glusterfs volume does
not cause an infinite loop on NetBSD [1]. A quick inspection in the code
seems to suggest that we only have opendir/readdir/closedir in the same
function, and the opendir in posix_opendir.

snapview-server seems to suffer the same problem as index, though.

[1] reusing for seekdir() an offset obtained by telldir() on another DIR
* is unspecified. On Linux it seems to work (without any guarantee it is
reliable) NetBSD it is nilpotent, which cause readdir() to always
restart at the beginning of the directory.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-13 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri  wrote:

> Just to make sure I understand the problem, the issue is happening 
> because self-heal-daemon uses anonymous fds to perform readdirs? i.e.
> there is no explicit opendir on the directory. Everytime there is a 
> readdir it may lead to opendir/seekdir/readdir/closedir. Did I get that
> right?

Yes, on the brick, it happens in xlator/features/index. 

> I believe posix xlator doesn't have this problem for non-anonymous fds
> where the DIR* stream is open till the final unref on the fd.

Then perhaps the solution is to change xlator/features/index behavior to
match xlator/storage/posix? There is also
xlator/features/snapview-server that may be affected.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-13 Thread Pranith Kumar Karampuri


On 09/14/2014 12:32 AM, Emmanuel Dreyfus wrote:

In <1lrx1si.n8tms1igmi5pm%m...@netbsd.org> I explained why NetBSD
currently fails self-heald.t, but since the subjet is burried deep in a
thread, it might be worth starting a new one to talk about how to fix.

In 3 places within glusterfs code (features/index,
features/snapview-server and storage/posix), a server component answers
readdir requests on a directory which may be split in mulitple calls.

To answer one call, we have the following library calls:
- opendir()
- seekdir() to resume where the previous request was
- readdir()
- telldir() to record where we are for the next request
- closedir()

This relies on unspecified behavior, as POSIX says: "The value of loc
should have been returned from an earlier call to telldir() using the
same directory stream."
http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html

Since we do opendir() and closedir() at each time, we do not use the
same directory stream. It causes an infinite loop on NetBSD because it
badly resume from previous request, and in the general case it will
break badly if an entry is added in the directory between two requests.

How can we fix that?

1) we can keep the directory stream open. The change is intrusive since
we will need a chained list of open contexts, and we need to clean them
if they timeout.

2) in order to keep state between requests, we can use the entry index
(first encoutered is 1, and so on) instead of values returned by
telldir(). That works around the unspecified behavior, but it still
breaks if directory content is changed between two requests

3) make sure the readdir is done in a single request. That means trying
with bigger buffers until it works. For instance  in
xlator/cluster/afr/src/afr-self-heald.c we have:
while ((ret = syncop_readdir (subvol, fd, 131072, offset, &entries)))

We would use -1 instead of 131072 to tell that we want everything
without a size limit, and the server component (here features/index)
would either return everyting or fail, whithout playing with
telldir/seekdir.

Opinions? The third solution seems the best to me since it is not very
intrusive and it makes things simplier. Indeed we allow unbound data
size to come back from the brick to glustershd, but we trust the brick,
right?
I saw cases where the number of entries in the index directory was close 
to 10. If brick process tries to read everything it will be OOM 
killed by the kernel.


Just to make sure I understand the problem, the issue is happening 
because self-heal-daemon uses anonymous fds to perform readdirs? i.e. 
there is no explicit opendir on the directory. Everytime there is a 
readdir it may lead to opendir/seekdir/readdir/closedir. Did I get that 
right? I believe posix xlator doesn't have this problem for 
non-anonymous fds where the DIR* stream is open till the final unref on 
the fd.


Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-13 Thread Anand Avati
How does the NetBSD nfs server provide stable directory offsets, for the
NFS client to resume reading from at a later point in time? Very similar
problems are present in that scenario and it might be helpful to see what
approaches are taken there (which are probably more tried and tested)

Thanks

On Sat, Sep 13, 2014 at 12:02 PM, Emmanuel Dreyfus  wrote:

> In <1lrx1si.n8tms1igmi5pm%m...@netbsd.org> I explained why NetBSD
> currently fails self-heald.t, but since the subjet is burried deep in a
> thread, it might be worth starting a new one to talk about how to fix.
>
> In 3 places within glusterfs code (features/index,
> features/snapview-server and storage/posix), a server component answers
> readdir requests on a directory which may be split in mulitple calls.
>
> To answer one call, we have the following library calls:
> - opendir()
> - seekdir() to resume where the previous request was
> - readdir()
> - telldir() to record where we are for the next request
> - closedir()
>
> This relies on unspecified behavior, as POSIX says: "The value of loc
> should have been returned from an earlier call to telldir() using the
> same directory stream."
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html
>
> Since we do opendir() and closedir() at each time, we do not use the
> same directory stream. It causes an infinite loop on NetBSD because it
> badly resume from previous request, and in the general case it will
> break badly if an entry is added in the directory between two requests.
>
> How can we fix that?
>
> 1) we can keep the directory stream open. The change is intrusive since
> we will need a chained list of open contexts, and we need to clean them
> if they timeout.
>
> 2) in order to keep state between requests, we can use the entry index
> (first encoutered is 1, and so on) instead of values returned by
> telldir(). That works around the unspecified behavior, but it still
> breaks if directory content is changed between two requests
>
> 3) make sure the readdir is done in a single request. That means trying
> with bigger buffers until it works. For instance  in
> xlator/cluster/afr/src/afr-self-heald.c we have:
>while ((ret = syncop_readdir (subvol, fd, 131072, offset, &entries)))
>
> We would use -1 instead of 131072 to tell that we want everything
> without a size limit, and the server component (here features/index)
> would either return everyting or fail, whithout playing with
> telldir/seekdir.
>
> Opinions? The third solution seems the best to me since it is not very
> intrusive and it makes things simplier. Indeed we allow unbound data
> size to come back from the brick to glustershd, but we trust the brick,
> right?
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-13 Thread Joe Julian
Personally, I like the third option provided that doesn't cause memory issues.

In fact, read the whole thing, transfer it to the client and let the client 
handle the posix syntax.

Optionally add a path cache timeout client side that stores the directory 
listing for a period of time to mitigate the "php" dilemma for those types of 
use cases. 
 

On September 13, 2014 12:02:55 PM PDT, m...@netbsd.org wrote:
>In <1lrx1si.n8tms1igmi5pm%m...@netbsd.org> I explained why NetBSD
>currently fails self-heald.t, but since the subjet is burried deep in a
>thread, it might be worth starting a new one to talk about how to fix.
>
>In 3 places within glusterfs code (features/index,
>features/snapview-server and storage/posix), a server component answers
>readdir requests on a directory which may be split in mulitple calls.
>
>To answer one call, we have the following library calls:
>- opendir()
>- seekdir() to resume where the previous request was
>- readdir()
>- telldir() to record where we are for the next request
>- closedir()
>
>This relies on unspecified behavior, as POSIX says: "The value of loc
>should have been returned from an earlier call to telldir() using the
>same directory stream."
>http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html
>
>Since we do opendir() and closedir() at each time, we do not use the
>same directory stream. It causes an infinite loop on NetBSD because it
>badly resume from previous request, and in the general case it will
>break badly if an entry is added in the directory between two requests.
>
>How can we fix that?
>
>1) we can keep the directory stream open. The change is intrusive since
>we will need a chained list of open contexts, and we need to clean them
>if they timeout.
>
>2) in order to keep state between requests, we can use the entry index
>(first encoutered is 1, and so on) instead of values returned by
>telldir(). That works around the unspecified behavior, but it still
>breaks if directory content is changed between two requests
>
>3) make sure the readdir is done in a single request. That means trying
>with bigger buffers until it works. For instance  in
>xlator/cluster/afr/src/afr-self-heald.c we have:
>  while ((ret = syncop_readdir (subvol, fd, 131072, offset, &entries)))
>
>We would use -1 instead of 131072 to tell that we want everything
>without a size limit, and the server component (here features/index)
>would either return everyting or fail, whithout playing with
>telldir/seekdir.
>
>Opinions? The third solution seems the best to me since it is not very
>intrusive and it makes things simplier. Indeed we allow unbound data
>size to come back from the brick to glustershd, but we trust the brick,
>right?
>
>-- 
>Emmanuel Dreyfus
>http://hcpnet.free.fr/pubz
>m...@netbsd.org
>___
>Gluster-devel mailing list
>Gluster-devel@gluster.org
>http://supercolony.gluster.org/mailman/listinfo/gluster-devel

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] How to fix wrong telldir/seekdir usage

2014-09-13 Thread Emmanuel Dreyfus
In <1lrx1si.n8tms1igmi5pm%m...@netbsd.org> I explained why NetBSD
currently fails self-heald.t, but since the subjet is burried deep in a
thread, it might be worth starting a new one to talk about how to fix.

In 3 places within glusterfs code (features/index,
features/snapview-server and storage/posix), a server component answers
readdir requests on a directory which may be split in mulitple calls.

To answer one call, we have the following library calls:
- opendir()
- seekdir() to resume where the previous request was
- readdir()
- telldir() to record where we are for the next request
- closedir()

This relies on unspecified behavior, as POSIX says: "The value of loc
should have been returned from an earlier call to telldir() using the
same directory stream."
http://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html

Since we do opendir() and closedir() at each time, we do not use the
same directory stream. It causes an infinite loop on NetBSD because it
badly resume from previous request, and in the general case it will
break badly if an entry is added in the directory between two requests.

How can we fix that?

1) we can keep the directory stream open. The change is intrusive since
we will need a chained list of open contexts, and we need to clean them
if they timeout.

2) in order to keep state between requests, we can use the entry index
(first encoutered is 1, and so on) instead of values returned by
telldir(). That works around the unspecified behavior, but it still
breaks if directory content is changed between two requests

3) make sure the readdir is done in a single request. That means trying
with bigger buffers until it works. For instance  in
xlator/cluster/afr/src/afr-self-heald.c we have:
   while ((ret = syncop_readdir (subvol, fd, 131072, offset, &entries)))

We would use -1 instead of 131072 to tell that we want everything
without a size limit, and the server component (here features/index)
would either return everyting or fail, whithout playing with
telldir/seekdir.

Opinions? The third solution seems the best to me since it is not very
intrusive and it makes things simplier. Indeed we allow unbound data
size to come back from the brick to glustershd, but we trust the brick,
right?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel