On Mar 29, 2007, at 1:51 PM, Sam Lang wrote:
Hi Phil,
I managed to try your remove dir test on a 2.4 kernel, and was able
to get a segfault with unpatched HEAD. It looks like returning the
actual dentry is unexpected for 2.4. If I just return NULL all the
time in the successful cases, it seems to work. This is what your
patch essentially does I think. I've attached a patch that does
pretty much the same thing. Seems to work on 2.4 and 2.6 (as yours
does).
On Mar 27, 2007, at 4:33 PM, Phil Carns wrote:
Just to clarify a little bit, there is actually a problem with the
2.6 code path here as well. I went back and ran some tests with
and without the namei.patch on a RHEL4 box (2.6.9-something) to
confirm. If I leave the patch out, then the following two LTP
tests (open08, statfs02) fail:
On 2.6.20 these tests seem to pass with unpatched HEAD. But yeah,
with your patch and the one attached, they pass as well.
Its odd that esp. the 2.6 kernel expects NULL instead of the actual
dentry. The particular semantics (and how its changed over the
different kernel versions) of lookup's expected return value aren't
well documented.
It looks like if a non-null dentry is returned from lookup, dput is
called on that dentry, which decrements the usage count. If null is
returned dput isn't called. Could it be that we're actually leaking
entries in the dcache with these patches?
-sam
In any case, let me know what you think of this patch.
-sam
<namei3.patch>
<<<test_start>>>
tag=open08 stime=1175025961
cmdline="open08"
contacts=""
analysis=exit
initiation_status="ok"
<<<test_output>>>
open08 1 PASS : expected failure - errno = 17 : File exists
open08 2 PASS : expected failure - errno = 21 : Is a
directory
open08 3 PASS : expected failure - errno = 20 : Not a
directory
open08 4 FAIL : unexpected error - 2 : No such file or
directory - expected 36
open08 5 PASS : expected failure - errno = 13 : Permission
denied
open08 6 PASS : expected failure - errno = 14 : Bad address
<<<execution_status>>>
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=1
<<<test_end>>>
<<<test_start>>>
tag=statfs02 stime=1175026039
cmdline="statfs02"
contacts=""
analysis=exit
initiation_status="ok"
<<<test_output>>>
statfs02 1 PASS : expected failure - errno = 20 : Not a
directory
statfs02 2 PASS : expected failure - errno = 2 : No such
file or directory
statfs02 3 FAIL : unexpected error - 2 : No such file or
directory - expected 36
statfs02 4 PASS : expected failure - errno = 14 : Bad address
statfs02 5 PASS : expected failure - errno = 14 : Bad address
<<<execution_status>>>
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=0
<<<test_end>>>
They pass without any trouble if I apply namei.patch.
I haven't dug into your technical comments below yet (inodes and
dentries make my head hurt), but I'll try and catch up on them
soon. From a high level, it does sound like there are some odd
things in the patch, but I don't know which part of it is
relevant. I just narrowed down cvsps patchsets until I found a
code snippet that seemed to be the source of my problems and
generated a patch to revert it :)
I wish that I could say that we could stop using 2.4 soon, but we
can't quite seem to get to that point yet :(
-Phil
Sam Lang wrote:
Hi Phil,
I have comments about the patch inline. Two general comments
though. Reading this code (at least for me) has been hard
because of all the feature check #ifdefs. I think trying to
support all these different kernel versions gives us a lot of
headache that maybe we don't need (I've seen this recently with
HAVE_AIO_VFS_SUPPORT). Any chance we could pair it down to more
recent versions? Do you guys expect to discontinue supporting
2.4 in the near future? Would it be possible to say that future
pvfs releases only support 2.6 (maybe even > 2.6.x)? Anyone
that has an older kernel has to use an older pvfs version?
-sam
On Mar 20, 2007, at 9:36 AM, Phil Carns wrote:
I am sending this patch in a separate email because it may need
some discussion to hash out. Sometime in the past several
months, the pvfs2_lookup() function in namei.c changed (I think
along with something not directly related, but I don't recall
exactly what happened now).
This change caused several directory related bugs to show up for
us on 2.4 and 2.6 kernels. The 2.4 one was more severe,
though, because it caused a kernel panic. It could be
triggered by the "rename01" test in LTP, or by the following
manual steps:
[EMAIL PROTECTED] pvfs2]# mkdir testdir
[EMAIL PROTECTED] pvfs2]# cd testdir
[EMAIL PROTECTED] testdir]# mkdir dir1
[EMAIL PROTECTED] testdir]# mv dir1 dir2
[EMAIL PROTECTED] testdir]# ls -alh
total 12K
drwxr-xr-x 1 root root 4.0K Dec 1 12:18 .
drwxrwxrwt 1 root root 4.0K Dec 1 12:17 ..
drwxr-xr-x 1 root root 4.0K Dec 1 12:17 dir2
[EMAIL PROTECTED] testdir]# rm dir2
<crash>
... so it had something to do with removing a directory that
had previously been renamed.
At any rate, I don't know enough about dentries and inodes
anymore to truly understand the old logic that used to work or
the newer logic that causes us problems. This patch just
naively reverts some of the logic in namei.c to the point that
it works again for us (without changing anything else that was
in that set of commits). With this in place, we don't see any
more test case failures or kernel panics on 2.6 or 2.4.
We have been using this patch for several months with success,
but it would probably be a good idea for someone more familiar
with this code to look at the change more carefully.
I don't have access to a 2.4 kernel with root at the moment, but
following the code paths I have some comments. Hopefully Murali
can chime in at some point and correct me where I'm wrong about
stuff.
-Phil
Index: pvfs2_src/src/kernel/linux-2.6/namei.c
===================================================================
--- pvfs2_src/src/kernel/linux-2.6/namei.c (revision 2909)
+++ pvfs2_src/src/kernel/linux-2.6/namei.c (revision 2910)
@@ -164,19 +164,24 @@
inode = pvfs2_iget(sb, &new_op->downcall.resp.lookup.refn);
if (inode && !is_bad_inode(inode))
{
- struct dentry *res;
+ found_pvfs2_inode = PVFS2_I(inode);
+ /* store the retrieved handle and fs_id */
+ found_pvfs2_inode->refn = new_op-
>downcall.resp.lookup.refn;
+
I think this is redundant. Unless you're using a _very_ old 2.4
kernel, the pvfs2_iget call sets the fsid and handle in the
pvfs2 inode pointer of the inode. Basically, pvfs2_iget
translates to a pvfs2_iget_common (with keep_locked == 0), which
means that pvfs2_set_inode will get called, which will do the
same thing as above. The only way I can see that this wouldn't
happen is if iget4_locked isn't supported by your kernel
version, but it appears to have been in 2.4.25 and up, so you'd
have to be running with something pretty old.
I think that Murali added pvfs2_iget to abstract out this manual
setting, which will still appear to do in some places.
/* update dentry/inode pair into dcache */
dentry->d_op = &pvfs2_dentry_operations;
- res = pvfs2_d_splice_alias(dentry, inode);
+ pvfs2_d_splice_alias(dentry, inode);
gossip_debug(GOSSIP_NAME_DEBUG, "Lookup success
(inode ct = %d)\n",
(int)atomic_read(&inode->i_count));
+#if 0
op_release(new_op);
if (res)
res->d_op = &pvfs2_dentry_operations;
return res;
+#endif
Here too, if you're running 2.4, then res is guaranteed to equal
dentry, so while setting the dentry_operations struct to d_op is
redundant for 2.4, its needed for 2.6 in the case where we found
a disconnected dentry and returned that instead. The only
difference here in the 2.4 code path that I can see is that if a
new entry _was_ added to the dcache, you return NULL now,
instead of returning the new entry. Since the new entry is in
the dcache, maybe that's ok (and expected for 2.4). For 2.6 it
seems clear that the new dentry is supposed to be returned.
}
else if (inode && is_bad_inode(inode))
{
@@ -227,7 +232,14 @@
}
op_release(new_op);
- return NULL;
+ if(ret != -ENOENT)
+ {
+ return ERR_PTR(ret);
+ }
+ else
+ {
+ return NULL;
+ }
This might be what's causing your failures with 2.4. It expects
NULL if a dentry was added to the dcache (which happens with
ENOENT), or a non-null error pointer for any other error. As
mentioned above, I think the semantics of the return from lookup
have changed in 2.6, so we might want to change this up a bit.
}
/* return 0 on success; non-zero otherwise */
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-
developers
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers