[Gluster-devel] READDIR bug in NFS server (was: mount.t oddity)

2014-08-15 Thread Emmanuel Dreyfus
> Sorry for the missing subject, here it is.
> 
> On Thu, Aug 14, 2014 at 02:10:16PM +, Emmanuel Dreyfus wrote:
> > I observe a strange thing with tests/basic/mount.t on NetBSD.
> > It hangs on
> > TEST 23 (line 66): ! rm /mnt/glusterfs/1/newfile

I have come to the conclusion that this is a bug in the GlusterFS NFS server
component.

Here is the IP packet for the READDIR reply sent by the GlusterFS NFS server when
the only entry in the directory is a file called AAA (along with dot and dotdot):

0x:  4500 0110 2227 4000 4006  17fd ac40  E..."'@.@..@
0x0010:  17fd ac40 0801 03f4 1cf8 339a 1b0e 84e7  ...@..3.
0x0020:  8018 0100 897d  0101 080a  0002  .}..
0x0030:   0001 8000 00d8 746a 6647  0001  tjfG
0x0040:           
0x0050:     0001  0002  01ed  
0x0060:   0003    0064    ...d
0x0070:   0400    0800    
0x0080:    de01 7120 5cb3 7985    ..q.\.y.
0x0090:   0001 53ed 8b4d 0c78 b966 53ed 82e9  S..M.x.fS...
0x00a0:  29af 5df5 53ed 82e9 29af 5df5 f8e1 23bb  ).].S...).]...#.
0x00b0:     0001    0001  
0x00c0:   0001 2e00  7fff  ee68 2415  .h$.
0x00d0:   0001    0001  0002  
0x00e0:  2e2e  7fff  ee68 2419  0001  .h$.
0x00f0:  8aab 132e eed7 a537  0003 4141 4100  ...7AAA.
0x0100:  7fff  ee68 2421      .h$!

Note the trailing NUL byte. It is the eof boolean flag, and it should be set to 1.
For some reason the Linux NFS client copes with this error (I guess it uses
the packet length?), but the NetBSD NFS client keeps looping on the last
entry.
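To make the layout concrete, here is a minimal sketch of how a client could walk the tail of a READDIR3 reply, assuming the dirlist3 encoding from RFC 1813: each entry is introduced by a "value follows" boolean, and a final boolean eof comes after the last entry. The function names (xdr_get_u32, xdr_skip, dirlist3_walk) are made up for this illustration and do not come from GlusterFS or from any NFS client.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit XDR word and advance the cursor. */
static int xdr_get_u32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
    if ((size_t)(end - *p) < 4)
        return -1;
    *out = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16) |
           ((uint32_t)(*p)[2] << 8) | (uint32_t)(*p)[3];
    *p += 4;
    return 0;
}

/* Skip n bytes, failing if that would run past the end of the buffer. */
static int xdr_skip(const uint8_t **p, const uint8_t *end, size_t n)
{
    if ((size_t)(end - *p) < n)
        return -1;
    *p += n;
    return 0;
}

/*
 * Walk a dirlist3: a chain of entries, each preceded by a "value follows"
 * boolean, terminated by a zero boolean and then the eof boolean.
 * Returns the number of entries seen, or -1 if the buffer is malformed.
 */
int dirlist3_walk(const uint8_t *buf, size_t len, uint32_t *eof)
{
    const uint8_t *p = buf, *end = buf + len;
    uint32_t follows, namelen;
    size_t padded;
    int count = 0;

    for (;;) {
        if (xdr_get_u32(&p, end, &follows) < 0)
            return -1;
        if (!follows)
            break;                              /* no more entries */
        if (xdr_skip(&p, end, 8) < 0)           /* fileid3 */
            return -1;
        if (xdr_get_u32(&p, end, &namelen) < 0) /* filename3 length */
            return -1;
        padded = ((size_t)namelen + 3) & ~(size_t)3;
        if (xdr_skip(&p, end, padded) < 0)      /* name, padded to 4 bytes */
            return -1;
        if (xdr_skip(&p, end, 8) < 0)           /* cookie3 */
            return -1;
        count++;
    }
    if (xdr_get_u32(&p, end, eof) < 0)          /* the flag discussed above */
        return -1;
    return count;
}
```

A client that trusts this flag will ask for more entries whenever eof comes back as 0, which would be consistent with the looping seen on NetBSD.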

Fixing this is not straightforward. The eof field is set in the NFS reply
frame by nfs3_fill_readdir3res() when op_errno is ENOENT. Below is the kind of
backtrace to nfs3_fill_readdir3res() that I get when mounting the NFS
filesystem. Further debugging shows that op_errno is always 0. Obviously an
op_errno = ENOENT must be missing somewhere in the calling functions, but I
have trouble telling where. I do not see anything going to the posix xlator, as
I would have expected.

0xb9ac364a  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xb9abc528  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xb9abc758  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xb9a9ccb5  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xbb30e98e  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/debug/io-stats.so
0xbb7708ac  at /autobuild/install/lib/libglusterfs.so.0
0xb9b2be69  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/cluster/distribute.so
0xb9b57b52  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/cluster/stripe.so
0xb9b8119e  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/cluster/replicate.so
0xb9be00bf  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/protocol/client.so
0xbb73d45f  at /autobuild/install/lib/libgfrpc.so.0
0xbb73d757  at /autobuild/install/lib/libgfrpc.so.0
0xbb739c95  at /autobuild/install/lib/libgfrpc.so.0
0xbb38c81b <_init+28395> at
/autobuild/install/lib/glusterfs/3.7dev/rpc-transport/socket.so
0xbb38ccd5 <_init+29605> at
/autobuild/install/lib/glusterfs/3.7dev/rpc-transport/socket.so
0xbb7c5cac  at
/autobuild/install/lib/libglusterfs.so.0
0xbb7c5f00  at
/autobuild/install/lib/libglusterfs.so.0
0xbb798c5e  at /autobuild/install/lib/libglusterfs.so.0
0x80515a8  at /autobuild/install/sbin/glusterfs
0x804c505 <__start+309> at /autobuild/install/sbin/glusterfs



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] READDIR bug in NFS server (was: mount.t oddity)

2014-08-15 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Fixing this is not straightforward. The eof field is set in the NFS reply
> frame by nfs3_fill_readdir3res() when op_errno is ENOENT. Below is the kind of
> backtrace to nfs3_fill_readdir3res() that I get when mounting the NFS
> filesystem. Further debugging shows that op_errno is always 0. Obviously an
> op_errno = ENOENT must be missing somewhere in the calling functions, but I
> have trouble telling where. I do not see anything going to the posix xlator, as
> I would have expected.

But I was a bit confused, as the request must go to the bricks from the NFS
server, which acts as a gluster client. In the bricks the posix xlator is
involved. It indeed sets errno = ENOENT in posix_fill_readdir() when reaching
the end of the directory.

The backtrace leading to posix_fill_readdir() is below. The next question is:
once errno is set within an IO thread, how is it transmitted to the glusterfs
server part so that it has a chance to be seen by the client?

0xb9bef7ad  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
0xb9befe9d  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
0xb9bf0300  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
0xbb779b90  at /autobuild/install/lib/libglusterfs.so.0
0xbb30dc96  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/features/access-control.so
0xb9bc7ca3  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/features/locks.so
0xbb77742a  at
/autobuild/install/lib/libglusterfs.so.0
0xbb78f043  at
/autobuild/install/lib/libglusterfs.so.0
0xbb795cac  at /autobuild/install/lib/libglusterfs.so.0
0xb9bb2e05  at
/autobuild/install/lib/glusterfs/3.7dev/xlator/performance/io-threads.so
0xbb705783  at /usr/lib/libpthread.so.1
0xbb491ee0 <_lwp_exit+0> at /usr/lib/libc.so.12
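
On the question of how errno leaves the IO thread: errno is per-thread, so a value set in a worker thread is not visible to other threads and has to be copied into an explicit argument that travels with the callback. The following is a minimal sketch of that pattern under stated assumptions. The names fill_dir, do_readdir, readdir_cbk_t and record_result are hypothetical stand-ins, not GlusterFS functions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: result and error code travel as arguments. */
typedef void (*readdir_cbk_t)(void *cookie, int op_ret, int op_errno);

/* Hypothetical result holder used by the example callback below. */
struct readdir_result {
    int op_ret;
    int op_errno;
};

/* Example callback: stores what it was given so the caller can inspect it. */
void record_result(void *cookie, int op_ret, int op_errno)
{
    struct readdir_result *res = cookie;

    res->op_ret = op_ret;
    res->op_errno = op_errno;
}

/*
 * Hypothetical directory filler: returns up to `want` of the `remaining`
 * entries and sets errno to ENOENT once the directory is exhausted.
 */
static int fill_dir(int remaining, int want)
{
    int n = remaining < want ? remaining : want;

    errno = 0;
    if (n == remaining)
        errno = ENOENT;     /* end of directory reached */
    return n;
}

/*
 * Capture errno immediately after the call, before any other library call
 * can overwrite it, and pass it to the callback explicitly.
 */
void do_readdir(int remaining, int want, readdir_cbk_t cbk, void *cookie)
{
    int count = fill_dir(remaining, want);
    int op_errno = errno;

    cbk(cookie, count, op_errno);
}
```

In this sketch a successful call can still carry op_errno == ENOENT, which is how end-of-directory is signalled alongside a positive entry count.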

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-devel] READDIR bug in NFS server (was: mount.t oddity)

2014-08-15 Thread Niels de Vos
On Fri, Aug 15, 2014 at 04:32:56PM +0200, Emmanuel Dreyfus wrote:
> Emmanuel Dreyfus  wrote:
> 
> > Fixing this is not straightforward. The eof field is set in the NFS reply
> > frame by nfs3_fill_readdir3res() when op_errno is ENOENT. Below is the kind of
> > backtrace to nfs3_fill_readdir3res() that I get when mounting the NFS
> > filesystem. Further debugging shows that op_errno is always 0. Obviously an
> > op_errno = ENOENT must be missing somewhere in the calling functions, but I
> > have trouble telling where. I do not see anything going to the posix xlator, as
> > I would have expected.
> 
> But I was a bit confused, as the request must go to the bricks from the NFS
> server, which acts as a gluster client. In the bricks the posix xlator is
> involved. It indeed sets errno = ENOENT in posix_fill_readdir() when reaching
> the end of the directory.
> 
> The backtrace leading to posix_fill_readdir() is below. The next question is:
> once errno is set within an IO thread, how is it transmitted to the glusterfs
> server part so that it has a chance to be seen by the client?

I've just checked xlators/nfs/server/src/nfs3.c a little, and it seems 
that at least nfs3svc_readdir_fstat_cbk() tries to handle it:

        /* Check whether we encountered a end of directory stream while
         * readdir'ing.
         */
        if (cs->operrno == ENOENT) {
                gf_log (GF_NFS3, GF_LOG_TRACE, "Reached end-of-directory");
                is_eof = 1;
        }

is_eof is later on passed to nfs3_readdir_reply() or nfs3_readdirp_reply():

        nfs3_readdir_reply (cs->req, stat, &cs->parent,
                            (uintptr_t)cs->fd, buf, &cs->entries,
                            cs->dircount, is_eof);

        nfs3_readdirp_reply (cs->req, stat, &cs->parent,
                             (uintptr_t)cs->fd, buf,
                             &cs->entries, cs->dircount,
                             cs->maxcount, is_eof);

There are other callers of nfs3_readdir{,p}_reply() that do not pass a
conditional is_eof. Fixing these callers looks like a good place to 
start. I don't have time to look into this today or over the weekend, 
but I can plan to check it next week.

In any case, do file a bug for it (and add me on CC) so that I won't 
forget to follow up.

Thanks,
Niels


> 
> 0xb9bef7ad  at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
> 0xb9befe9d  at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
> 0xb9bf0300  at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
> 0xbb779b90  at /autobuild/install/lib/libglusterfs.so.0
> 0xbb30dc96  at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/features/access-control.so
> 0xb9bc7ca3  at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/features/locks.so
> 0xbb77742a  at
> /autobuild/install/lib/libglusterfs.so.0
> 0xbb78f043  at
> /autobuild/install/lib/libglusterfs.so.0
> 0xbb795cac  at /autobuild/install/lib/libglusterfs.so.0
> 0xb9bb2e05  at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/performance/io-threads.so
> 0xbb705783  at /usr/lib/libpthread.so.1
> 0xbb491ee0 <_lwp_exit+0> at /usr/lib/libc.so.12
> 


[Gluster-devel] GlusterFS new-features/project ideas

2014-08-15 Thread Vimal A R
Hello Gluster devel list,

I would like to know if there exists a to-do feature list somewhere on 
gluster.org, for features that are expected to be implemented sooner or 
later.

I am looking for a project idea for my undergraduate degree which can be 
completed in around 3-4 months. Are there any suggestions/ideas to help me 
further? 


Thanks a lot,


Re: [Gluster-devel] READDIR bug in NFS server (was: mount.t oddity)

2014-08-15 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> I've just checked xlators/nfs/server/src/nfs3.c a little, and it seems 
> that at least nfs3svc_readdir_fstat_cbk() tries to handle it:

cs->operrno is always 0 there. The value comes from nfs3svc_readdir_cbk, where it
is 0 as well. The backtrace tells me we go through:
nfs3svc_readdir_cbk
io_stats_readdirp_cbk
dht_readdirp_cbk
stripe_readdirp_cbk    <- errno == ENOENT is lost here
afr_readdir_cbk
client3_3_readdirp_cbk
In stripe_readdirp_cbk:
STRIPE_STACK_UNWIND (readdir, frame, local->op_ret,
 local->op_errno, &local->entries, NULL);

Here local->op_errno = 0 and op_errno = 2 (ENOENT). I suspect op_ret is not set
correctly. I will explore further later.
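
One pattern that would produce this symptom is an aggregating callback that copies the child's op_errno only when the call failed, so an ENOENT that accompanies a successful reply is dropped. The sketch below only illustrates that pattern and one way around it. It is not the stripe code and not necessarily the eventual fix, and the names agg_local, agg_cbk_lossy, agg_cbk_preserving and reached_eof are hypothetical.

```c
#include <errno.h>

/* Hypothetical per-call state kept by an aggregating translator. */
struct agg_local {
    int op_ret;
    int op_errno;
};

/* Lossy variant: remembers the child's errno only when the call failed. */
void agg_cbk_lossy(struct agg_local *local, int op_ret, int op_errno)
{
    local->op_ret = op_ret;
    if (op_ret == -1)
        local->op_errno = op_errno;
}

/*
 * Preserving variant: always forwards the child's errno, so an ENOENT sent
 * together with a successful reply survives on the way up.
 */
void agg_cbk_preserving(struct agg_local *local, int op_ret, int op_errno)
{
    local->op_ret = op_ret;
    local->op_errno = op_errno;
}

/* Same kind of end-of-directory test as the cs->operrno == ENOENT check. */
int reached_eof(const struct agg_local *local)
{
    return local->op_errno == ENOENT;
}
```

With a child reply of op_ret = 3 and op_errno = ENOENT, the lossy variant leaves op_errno at 0 and the upper layer never sees end-of-directory, while the preserving variant does.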

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-devel] READDIR bug in NFS server (was: mount.t oddity)

2014-08-15 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Here local->op_errno = 0 and op_errno = 2 (ENOENT). I suspect op_ret is
> not set correctly. I will explore further later.

Here is a possible fix:
http://review.gluster.org/8493

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-devel] Academic project in Distributed systems

2014-08-15 Thread Vipul Nayyar
Hi Justin,

I'd prefer to work in C or Python. By the way, apart from Java, the language 
is not a barrier for me as long as the problem is challenging. 

Also, I won't say I'm very good at maths, but I'm quite comfortable with 
pursuing any topic that might be relevant to the project.
Please do refer me to any relevant topics that I should look at.

Regards
Vipul Nayyar 



On Thursday, 7 August 2014 9:06 PM, Justin Clift  wrote:
 


- Original Message -
> Hello,
> 
> I'm Vipul. I'm currently in my final year of computer engineering and would
> like some guidance on choosing an academic project to spend 2-3 months
> working on. I have a little experience with some internal GlusterFS
> components. Overall, I'm interested in contributing to the field of
> distributed systems, OS or cloud. An opportunity to contribute back
> to an open source project, Gluster or another, would be great,
> but a research-oriented project in this domain would be really exciting to
> work on.

As a first thought before recommending stuff, which programming languages do
you have a good understanding of (and like to use)?

That will help figure things out.  Also, how good are your math skills?  If
they're very strong, then some of the clustering algorithm stuff might be
the thing to look at. :)



> On another note, I apologize if this email appears to you to be a misuse of
> the mailing list, but I'd really be grateful if I could get any pointers
> regarding this.

You're fine.  It's not abuse of the mailing list at all. :)

Regards and best wishes,

Justin Clift

-- 
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift