ld-elf/libc problem? (was: php3/php4 problem)

2003-09-01 Andrey Alekseyev
Hello,

Two months ago, after thoroughly tracking commits to STABLE, I built
a 4.8-STABLE distribution. The distribution was built from sources cvsup'd
on Jul 06, 2003 and was intended as an upgrade package for a large
production site.

While the customer was testing this distribution, a serious compatibility
problem was revealed: the php3+php4 Apache dynamic module combination stopped
working. Although it had been perfectly stable before, this configuration always
caused Apache to crash on 4.8-STABLE (Jul 06). Rebuilding Apache and php didn't
help. Even with the simplest config, Apache always failed when php3 and php4
were both enabled.

I spent long hours reading through commit logs since 4.6.2, and I found a
solution. It turns out that backing out the commits below and replacing
certain ld-elf and libc source files with the appropriate old versions
makes the php3+php4 dynamic module configuration work again.

Frankly speaking, I'm extremely surprised that nobody noticed this problem.
All I could find were two postings from people in Germany. The customer
I worked for contacted them, only to find they had given up and thrown
their php3+php4 combo away.

Btw, this commit
MFC: r1.69, support binaries with arbitrary number of PT_LOAD segments
probably also needs suitable modifications to the kernel RLIMIT_VMEM mechanism.
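The r1.69 change concerns objects with more PT_LOAD segments than the usual text and data pair. To see how many PT_LOAD segments the objects in a running process actually have, a small sketch like the one below can help. It assumes a system whose <link.h> provides dl_iterate_phdr(3), as glibc and newer FreeBSD releases do; the function names are made up for illustration and nothing here comes from the rtld-elf sources.

```c
#define _GNU_SOURCE     /* glibc only declares dl_iterate_phdr() with this */
#include <assert.h>
#include <link.h>       /* dl_iterate_phdr(), struct dl_phdr_info, PT_LOAD */
#include <stddef.h>

/* Called once per loaded object: the executable, rtld and each shared lib. */
static int
count_pt_load(struct dl_phdr_info *info, size_t size, void *data)
{
    int *max = data;
    int loads = 0;

    (void)size;
    assert(info != NULL && max != NULL);
    for (int i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_LOAD)
            loads++;
    if (loads > *max)
        *max = loads;
    return 0;           /* non-zero would stop the iteration */
}

/* Largest PT_LOAD segment count among the objects mapped into this process. */
int
max_pt_load_segments(void)
{
    int max = 0;

    dl_iterate_phdr(count_pt_load, &max);
    return max;
}
```

On current toolchains the main executable alone may report more than two, for example when code and read-only data end up in separate segments, which is the kind of object the r1.69 change is meant to let rtld-elf load.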

Comments?

Old versions checkout list:

cvs co -r1.10.2.2   include/dlfcn.h
cvs co -r1.20.2.1   include/link.h
cvs co -r1.6        lib/libc/gen/dlfcn.c
cvs co -r1.43.2.11  libexec/rtld-elf/rtld.c
cvs co -r1.7        libexec/rtld-elf/map_object.c
cvs co -r1.3.2.2    libexec/rtld-elf/malloc.c
cvs co -r1.15.2.5   libexec/rtld-elf/rtld.h
cvs co -r1.73.2.12  sys/kern/imgact_elf.c
cvs co -r1.20.2.1   sys/sys/link_elf.h
cvs co -r1.16       sys/vm/vm.h



Commits backed out:

kan 2002/11/29 07:22:17 PST

   Modified files:        (Branch: RELENG_4)
     libexec/rtld-elf     rtld.c
   Log:
   MFC: r1.68 symbool lookup order change.

   Revision    Changes    Path
   1.43.2.12   +12 -12    src/libexec/rtld-elf/rtld.c

kan 2002/11/29 08:05:14 PST

   Modified files:        (Branch: RELENG_4)
     libexec/rtld-elf     map_object.c rtld.c
   Log:
   MFC: r1.69, support binaries with arbitrary number of PT_LOAD segments

   Revision    Changes    Path
   1.7.2.1     +57 -39    src/libexec/rtld-elf/map_object.c
   1.43.2.13   +1 -5      src/libexec/rtld-elf/rtld.c

dillon  2002/12/28 11:49:41 PST

   Modified files:        (Branch: RELENG_4)
     sys/vm               vm.h
     sys/kern             imgact_elf.c
     libexec/rtld-elf     map_object.c
   Log:
   MFC ELF coredump handling fixes.  Do not skip read-only pages unrelated
   to the binary image.  Use NOCORE to differentiate between the two.
   Introduce a debug.elf_legacy_coredump sysctl which, if set, reverts to
   the old behavior.

   See the log message in sys/kern/imgact_elf.c 1.133.

   PR: kern/45994

   Revision    Changes    Path
   1.7.2.2     +23 -6     src/libexec/rtld-elf/map_object.c
   1.73.2.13   +31 -11    src/sys/kern/imgact_elf.c
   1.16.2.1    +2 -1      src/sys/vm/vm.h

kan 2003/02/20 12:42:46 PST

   Modified files:        (Branch: RELENG_4)
     include              dlfcn.h link.h
     lib/libc/gen         dlfcn.c
     libexec/rtld-elf     malloc.c rtld.c rtld.h
     sys/sys              link_elf.h
   Added files:           (Branch: RELENG_4)
     lib/libc/gen         dlinfo.3
   Log:
   MFC: Properly remove unloaded objects from all lists.
   Implement dlinfo function.

   Aproved by: re (murray)

   Revision    Changes    Path
   1.10.2.3    +35 -1     src/include/dlfcn.h
   1.20.2.2    +14 -3     src/include/link.h
   1.6.2.1     +9 -1      src/lib/libc/gen/dlfcn.c
   1.3.2.1     +266 -0    src/lib/libc/gen/dlinfo.3 (new)
   1.3.2.3     +18 -1     src/libexec/rtld-elf/malloc.c
   1.43.2.15   +288 -56   src/libexec/rtld-elf/rtld.c
   1.15.2.6    +3 -3      src/libexec/rtld-elf/rtld.h
   1.20.2.2    +14 -3     src/sys/sys/link_elf.h
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: open() and ESTALE error

2003-06-20 Andrey Alekseyev
Terry,

Thanks very much for your comments, but see below.

> The real problem here is that you know you did an operation
> on the file which would break the name/nfsnode relationship,
> but did not flush the cached name and nfsnode data.

nfs_request() actually calls cache_purge() on ESTALE, and vn_open()
frees the vnode with vput() if the lookup was successful but there was
an error from the underlying filesystem (like ESTALE resulting from
nfs_request(), which is eventually called from VOP_ACCESS or VOP_OPEN).

> A more correct solution would resync the nfsnode.

I think this is exactly what happens :) Actually, I believe I'm just
getting another namecache entry with another vnode/nfsnode/file handle.

> The main problem with your solution is that it doesn't work
> in the case that you don't know the name of the remote file
> (in which case, all you really have is a stale file handle,
> with no way to unstale it).

I think in this case (if the file was rm'd on the server) I'll just
get ENOENT from the second vn_open() attempt, which would be more
than appropriate. A real drawback is that for a stale current
directory it'll take another lookup to detect the true ESTALE.

> This would fix a lot more cases than the single failure you
> are fixing.

Actually, as I said, I played with different parts of the code to solve
this (including nfs_open(), nfs_access(), nfs_lookup() and vn_open()),
only to find the previously mentioned solution to be the simplest and
most suitable for all situations (for me!) :)


Re: open() and ESTALE error

2003-06-20 Andrey Alekseyev
Don,

> I can't get very enthusiastic about changing the file system independent
> code to fix a deficiency in the NFS implementation.

You're right. But it appears to be hard and inefficient to fix NFS for this
(I tried, though). It would certainly require passing names below VFS.
On the other hand, there are already NFS-related functions in the VFS; see
vfs_syscalls.c:getfh(), fhopen() and similar functions. There are things
related to the NFS server in the UFS/FFS code too. So I finally decided
that my fix doesn't do much harm to the above-mentioned concept :)

> current working directory, and your current working directory is nuked
> on the server, vn_open will return ESTALE, and your patch above will
> loop forever.

It won't loop forever :) The stale counter is there exactly for that
purpose :) In the case of a stale current directory, open() will still return
ESTALE. In the case of a file that was rm'd on the server, I believe
it'll return something different.

> If the rename on the server was done within the attribute validity time
> on the client, vn_open() will succeed even without your patch, but you
> may encounter the ESTALE error when you actually try to read or write
> the file.

Sure! But open() will succeed, and you may even be lucky enough to get the
file contents from the cache. But that's another story, related to
attribute cache tuning (I have another patch for that :) However, even with
the existing FreeBSD NFS attribute cache behaviour, it's OK for me.

> server could do the rename while the client has the file open, after
> which some I/O operation on the client will encounter ESTALE.

Sure. That's perfectly understood. I'm not trying to solve all the
NFS inefficiencies related to heavily shared files.

> acdirmin/acdirmax, or a new handle timeout parameter.  This may decrease
> performance, but nothing is free ...

In the normal situation, the namecache entry+vnode+nfsnode+file handle may
stay cached for a really long time (until reused? deleted or renamed
on the *client*). Expiring file handles (a new mechanism?) means much the
same to me as simply obtaining a new name cache entry+other data
on ESTALE :) I may be wrong, though.

Anyway, thanks for the comments.

See also:
http://www.blackflag.ru/patches/nfs_attr.txt


Re: open() and ESTALE error

2003-06-20 Andrey Alekseyev
Don,

> One case where there is a difference between timing out old file handles
> and just invalidating them on ESTALE:

Frankly, I just didn't find any mechanism in the STABLE kernel that
times out file handles. Do you mean it would be nice to have,
or are you trying to point one out to me? ;-P

> client% cmd1 > file1; cmd2 > file2
> server% mv file1 tmpfile; mv file2 file1; mv tmpfile file1
>
> wait an hour
>
> client% cat /dev/null > file1
>
> If file handles are cached indefinitely, and the client didn't recycle
> the vnode for file1, which file on the server got truncated?  Since
> neither file was deleted on the server, you can't rely on ESTALE to
> detect this situation.

Eh, but the generation number for file1 should have been changed! This will
result in a definite ESTALE error for file1 from the server. That is, I
believe that if you attempt to open(file1, O_CREAT) after an hour, you'll
get ESTALE from the server (on which nfs_request() will invalidate file1
namecache entry and vnode+nfsnode+old-file-handle) and the second vn_open()
will re-lookup file1 and get a valid new file handle.

Actually, this is exactly what happens if the second open() comes from the
userland application :) I'm just trying to eliminate the need to modify
a generic application.

For my example with moves, the next cat will always(!) succeed.

> Question: does the timeout of the directory attributes cause open() to
> do an NFS lookup on the file, or does open() just find the vnode in the
> cache and use its cached handle?

Well, for open() without O_CREAT the sequence is this:
open() -> vn_open() -> namei() -> lookup() -> VOP_LOOKUP() -> nfs_lookup()
             |
             +-> VOP_ACCESS() -> nfs_access() [ -> nfs3_access_otw() possibly]
             |
             +-> VOP_OPEN() -> nfs_open()

Lookup is always done first (obviously). It may return a cached name that
contains a pointer to a cached vnode/nfsnode. The cached vnode/nfsnode is then
used in VOP_ACCESS() and VOP_OPEN(). Either function may or may not
update the file attributes cached inside the nfsnode. Neither VOP_ACCESS() nor
VOP_OPEN() ever updates the *file handle*. The file handle comes from
VOP_LOOKUP(), and VOP_LOOKUP() only places it there if the vnode/nfsnode isn't
cached, which I believe happens only if there is no cached filename in
the namecache. I really tried to do my best to describe everything in:
http://www.blackflag.ru/patches/nfs_attr.txt
Please take a look.

Whether ESTALE comes from VOP_ACCESS() or VOP_OPEN() depends on several
factors, namely the value of the nfsaccess_cache_timeout sysctl, acmin/acmax,
and the age of the file in question.

Generally speaking, if nfsaccess_cache_timeout is less than acmin,
VOP_ACCESS(), which comes right before VOP_OPEN() in vn_open(), will try to do
an access RPC request, and it'll fail if the file handle is stale. If
nfsaccess_cache_timeout is greater than acmin, then it's possible that
VOP_ACCESS() will answer yes based on the cached attributes, but
VOP_GETATTR(), which is called from nfs_open() (the VOP_OPEN() for
NFS), will in turn go to the wire, and nfs_request() will still fail with
ESTALE.
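The reasoning above can be condensed into a small decision function. It is a hypothetical, simplified model, not kernel code: it assumes a single elapsed time since the last RPC refreshed both the access cache and the attribute cache, and it assumes the attribute timeout is the file's age divided by 10, clamped to [acmin, acmax], which is how I understand the heuristic in nfs_getattrcache(). The enum and function names are invented.

```c
#include <assert.h>

/* Which step of vn_open() is the first to send an RPC with the stale handle. */
enum stale_site {
    STALE_IN_ACCESS,    /* VOP_ACCESS(): nfs_access() sends an ACCESS RPC */
    STALE_IN_OPEN,      /* VOP_OPEN(): nfs_open() -> VOP_GETATTR() RPC     */
    STALE_NOT_SEEN      /* both answered from cache; open() succeeds       */
};

/*
 * Hypothetical, simplified model (all values in seconds):
 *   elapsed        time since the last RPC refreshed both caches
 *   file_age       time since the file's mtime
 *   access_timeout the nfsaccess_cache_timeout sysctl
 *   acmin, acmax   the mount's attribute cache bounds
 */
enum stale_site
first_stale_rpc(long elapsed, long file_age, long access_timeout,
    long acmin, long acmax)
{
    long attr_timeout;

    assert(acmin <= acmax);
    if (elapsed >= access_timeout)
        return STALE_IN_ACCESS;         /* access cache expired first */

    /* Assumed heuristic: attribute timeout = age / 10, clamped. */
    attr_timeout = file_age / 10;
    if (attr_timeout < acmin)
        attr_timeout = acmin;
    else if (attr_timeout > acmax)
        attr_timeout = acmax;

    if (elapsed >= attr_timeout)
        return STALE_IN_OPEN;           /* GETATTR goes to the wire */
    return STALE_NOT_SEEN;
}
```

With access_timeout below acmin the function can never return STALE_IN_OPEN, which is the first case above; with access_timeout above acmin there is a window in which the GETATTR issued from nfs_open() is the first RPC to hit the stale handle. STALE_NOT_SEEN is the case where open() succeeds and ESTALE only shows up on later I/O.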

I hope I'm making this clear :)


Re: open() and ESTALE error

2003-06-20 Andrey Alekseyev
> Eh, but the generation number for file1 should have been changed! This will

Sorry, the generation number is not changed in your scenario. Thus,
I believe that if the sequence of actions on the server is

mv file1 tmpfile
mv file2 file1
mv tmpfile file1

as you described, it's safe to keep using the cached file handle
for file1, since on the server it still references the original file.
And file2 just disappears from the server.
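To make the generation-number argument concrete, here is a toy model of the server side. It is a hypothetical sketch, not FFS code: a file handle is reduced to an (inode, generation) pair, and the validity check loosely follows what ufs_fhtovp() does, accepting a handle only while the inode is still allocated and its generation matches. Renames only move names around, so they never change a handle; a handle goes stale only when its inode loses its last link, and a reused inode gets a new generation. All names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define NINO 8                  /* toy inode table size */
#define NENT 8                  /* toy directory size   */

struct inode   { int nlink; unsigned gen; };
struct dent    { char name[16]; int ino; };  /* ino == -1: free slot */
struct fhandle { int ino; unsigned gen; };   /* what the client caches */
struct server  { struct inode ino[NINO]; struct dent dir[NENT]; };

void
srv_init(struct server *s)
{
    memset(s, 0, sizeof(*s));
    for (int i = 0; i < NENT; i++)
        s->dir[i].ino = -1;
}

static struct dent *
srv_find(struct server *s, const char *name)
{
    for (int i = 0; i < NENT; i++)
        if (s->dir[i].ino >= 0 && strcmp(s->dir[i].name, name) == 0)
            return &s->dir[i];
    return NULL;
}

/* "echo > name": take the lowest free inode and bump its generation. */
int
srv_create(struct server *s, const char *name)
{
    int i = 0, d = 0;

    assert(srv_find(s, name) == NULL);
    while (i < NINO && s->ino[i].nlink > 0)
        i++;
    while (d < NENT && s->dir[d].ino >= 0)
        d++;
    if (i == NINO || d == NENT)
        return ENOSPC;
    s->ino[i].nlink = 1;
    s->ino[i].gen++;            /* a reused inode gets a new generation */
    snprintf(s->dir[d].name, sizeof(s->dir[d].name), "%s", name);
    s->dir[d].ino = i;
    return 0;
}

/* LOOKUP: hand out a file handle for a name. */
int
srv_lookup(struct server *s, const char *name, struct fhandle *fh)
{
    struct dent *e = srv_find(s, name);

    if (e == NULL)
        return ENOENT;
    fh->ino = e->ino;
    fh->gen = s->ino[e->ino].gen;
    return 0;
}

/* "mv from to": only names move; a replaced target loses one link. */
int
srv_rename(struct server *s, const char *from, const char *to)
{
    struct dent *src = srv_find(s, from), *dst = srv_find(s, to);

    if (src == NULL)
        return ENOENT;
    if (src == dst)
        return 0;
    if (dst != NULL) {
        s->ino[dst->ino].nlink--;
        dst->ino = -1;
    }
    snprintf(src->name, sizeof(src->name), "%s", to);
    return 0;
}

/* Loosely like ufs_fhtovp(): live inode and matching generation, else ESTALE. */
int
srv_fhcheck(const struct server *s, const struct fhandle *fh)
{
    const struct inode *ip = &s->ino[fh->ino];

    return (ip->nlink > 0 && ip->gen == fh->gen) ? 0 : ESTALE;
}
```

Replaying the three mv commands above leaves the handle cached for file1 valid while file2 no longer resolves. Replaying the sequence from the first message of the thread (echo > file02 && mv file02 file01) makes the old file01 handle fail with ESTALE, while a fresh lookup of file01 returns a handle that works, which is what the open() patch relies on.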


Re: open() and ESTALE error

2003-06-20 Andrey Alekseyev
Terry,

> The place to correct this is probably the underlying FS.  I'd
> argue that getting ESTALE is a poke with a sharp stick that
> makes this more likely to happen.  ;^).

Initially I was going to fix the underlying FS (that is, the NFS code).
But it's extremely hard to do nicely, because I need to look up the name
again(!), and the name is not available (easily? at all?) below VFS.

> > I think this is exactly what happens :) Actually, I believe, I'm just
> > getting another namecache entry with another vnode/nfsnode/file handle.
>
> You can't have this for other reasons; specifically, if you have
> the file open at the time of the rename, and it becomes a .#nfs...
> file (or whatever) on the server.

I didn't trace the sillyrename scenario much, but I believe nfs_sillyrename()
keeps it tight. At least it uses nfs_lookitup(), which may actually
*update* the file handle, and it does some name cache purging as well.
So I don't consider it a real problem.

However, for opening a file for reading/writing, the scenario looks quite
clear to me. As I said in my previous message to Don, I'm just trying to
eliminate the need to modify an otherwise generic application so that it
retries open() immediately when the first open() fails with ESTALE, for a
certain, more or less common situation :) And I know that a second open()
from the userland application always works in the case I've described.

> Don points out that Solaris tries to fix this via the noac mount
> option for client NFS.

It does bad things to performance, though :) I'm not trying to uncache
everything. It's safe for me to use the file page cache if open() succeeds.
I'm not trying to achieve absolute shared-file integrity with NFS, believe
me :)

>   { A, B, C }
> fd1 open on B
> fd2 open on C
> rename B -> C
> rename A -> B
>
> ?  With your patch, I think we would potentially convert fd2 to point
> to B when it really *should* be ESTALE, which is wrong (think in
> terms of 2 or more clients doing the operations).

You didn't specify whether this happens on the client or the server side,
though. The result heavily depends on the exact scenario.

With a single client, a new open() for C will result in fd2 if the
original C is still open (because of sillyrename?).
Without fd2, any new open() for C will get a valid file handle for what
originally was B, and that's correct behaviour.

If the renames were done on the server, then fd1 will be valid until the last
client's close. However, any reference to the original C will fail.
Re-opening C should result in a new file handle for what originally was B.

Am I wrong?


Re: open() and ESTALE error

2003-06-20 Andrey Alekseyev
Don,

> old vnode and its associated file handle.  If the file on the server was
> renamed and not deleted, the server won't return ESTALE for the handle

I'm all confused and messed up :) Actually, a rename on the server is not
the same as a sillyrename on the client. If you rename a file on the
server for which there is a cached file handle on the client, the next time
the client uses its cached file handle, it'll get ESTALE from the server.
I don't know how this happens, though, and I won't know until I dig more
into all the rename paraphernalia. If someone can clear this up, please
do; it'll be much appreciated. At this time I can't link this with
inode generation number changes (as no new inode is allocated when
the file is renamed).

I'm not strong in rename and sillyrename alchemy; I can only deduce a little
from the code. However, I've just tested my patch with the
rename-to-another-name-on-the-server scenario, and it seems to return
ENOENT to the application after the local file page cache is invalidated and
the client tries to actually read the file from the server using the old name
and the old file handle.

> Also, fixing open() doesn't fix similar problems that can occur with
> other syscalls that take path names, such as stat() and readlink().

That's a good point. However, if the patch for open() works out, it can be
extended to other syscalls as well.

> If the lookup code is changed so that it more frequently revalidates the
> name-vnode-handle entries, then the window where open() can fail due
> to ESTALE would be greatly reduced.

Sorry, I've got no time for that :) I don't generally work in this area,
and for at least the next few years I'm an extremely busy man :)

Again, thanks a lot for your comments.


open() and ESTALE error

2003-06-19 Andrey Alekseyev
Hello,

I've lately been trying to develop a solution for a problem with open()
that manifests itself as an ESTALE error in the following situation:

1. NFS server: echo > file01
2. NFS client: cat file01
3. NFS server: echo > file02 && mv file02 file01
4. NFS client: cat file01 (either old file01 contents or ESTALE)

My study shows that the problem actually appears to be in VOP_ACCESS(),
which is called from vn_open(). If nfs_access() decides to go to the wire
in #4, it then uses a cached file handle which is indeed stale. Thus,
open() eventually fails with ESTALE too (ESTALE comes from the underlying
nfs_request()).

I understand all the fundamental NFS-related integrity problems, but not
this one :) That is, I see no reason for open() to fail to open a file for
reading or writing if the system knows the problem is its own. Why not
just do another lookup and try to obtain a valid file handle?

I was playing with different parts of the kernel while fixing this for
myself. However, I believe the simplest patch is one for
vfs_syscalls.c:open() (I've also made a working patch against vn_open(),
though).

Could anyone please be so kind as to comment on this issue?

TIA

--- kern/vfs_syscalls.c.orig    Thu Jun 19 13:22:50 2003
+++ kern/vfs_syscalls.c         Thu Jun 19 13:29:11 2003
@@ -1008,6 +1008,7 @@
         int type, indx, error;
         struct flock lf;
         struct nameidata nd;
+        int stale = 0;
 
         oflags = SCARG(uap, flags);
         if ((oflags & O_ACCMODE) == O_ACCMODE)
@@ -1025,8 +1026,15 @@
          * the descriptor while we are blocked in vn_open()
          */
         fhold(fp);
+again:
         error = vn_open(&nd, flags, cmode);
         if (error) {
+                /*
+                 * if the underlying filesystem returns ESTALE
+                 * we must have used a cached file handle.
+                 */
+                if (error == ESTALE && stale++ == 0)
+                        goto again;
                 /*
                  * release our own reference
                  */


Re: open() and ESTALE error

2003-06-19 Andrey Alekseyev
Corrections:
- the patch is against STABLE
- I know the second lookup will fail if the *current* directory itself
  is stale :)

> Could anyone please be so kind as to comment on this issue?

In particular, I'd like to know whether I need NDINIT before entering vn_open()
again, as there are several comments throughout the code about struct nd
not being safe after namei() and lookup(). However, the patch seems to
work well without NDINIT. Another thing that interests me is whether any vnode
leakage is possible when re-entering vn_open() a second time. It seems to
me that it isn't (as nd.ni_vp is vput()'d inside vn_open() in case of any
error after a successful lookup with namei()).

> --- kern/vfs_syscalls.c.orig    Thu Jun 19 13:22:50 2003
> +++ kern/vfs_syscalls.c         Thu Jun 19 13:29:11 2003
> @@ -1008,6 +1008,7 @@
>          int type, indx, error;
>          struct flock lf;
>          struct nameidata nd;
> +        int stale = 0;
>  
>          oflags = SCARG(uap, flags);
>          if ((oflags & O_ACCMODE) == O_ACCMODE)
> @@ -1025,8 +1026,15 @@
>           * the descriptor while we are blocked in vn_open()
>           */
>          fhold(fp);
> +again:
>          error = vn_open(&nd, flags, cmode);
>          if (error) {
> +                /*
> +                 * if the underlying filesystem returns ESTALE
> +                 * we must have used a cached file handle.
> +                 */
> +                if (error == ESTALE && stale++ == 0)
> +                        goto again;
>                  /*
>                   * release our own reference
>                   */


Re: open() and ESTALE error

2003-06-19 Andrey Alekseyev
Another correction:
- the statement below is valid for a configuration where nfsaccess_cache_timeout
  is generally less than acmin; otherwise, chances are the failure will occur
  in VOP_OPEN, when it requests new attributes by calling VOP_GETATTR

> which is called from vn_open(). If nfs_access() decides to go to the wire
> in #4, it then uses a cached file handle which is indeed stale. Thus,
> open() eventually fails with ESTALE too (ESTALE comes from underlying
> nfs_request()).


Re: open() and ESTALE error

2003-06-19 Andrey Alekseyev
In case anyone is interested, I wrote a paper, for my own reference, on
the FreeBSD NFS open() and attribute cache behavior.
It can be found here:
http://www.blackflag.ru/patches/nfs_attr.txt


Re: Creating a sysctl? (mission impossible)

2002-08-22 Andrey Alekseyev

> with a 2.5-yr-old port to 4-STABLE.
>   http://www-2.cs.cmu.edu/~dpetrou/research.html

Btw, we still use it quite successfully, with minor modifications,
on mass hosting backend servers, currently with 4.6-STABLE.
It really makes a difference.


-- 
Andrey Alekseyev. Zenon N.S.P.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: swapoff?

2002-07-13 Andrey Alekseyev

Also, this would probably be useful when you need to change the swap
device on a running system. We had to do this once or twice on a very
busy commercial mail server running Solaris. We needed to dismount the
current swap device and use it for another purpose after switching
paging/swapping to another disk.

> I wouldn't worry about it.  Nobody turns off swap on a running system
> at a whim.  It just needs to prevent stupid mistakes like trying to
> remove a swap device without having adequate memory + other swap to
> take care of the data.
>
>   -Matt
>   Matthew Dillon
>   [EMAIL PROTECTED]


-- 
Andrey Alekseyev. Zenon N.S.P.




Re: File locking, closes and performance in a distributed file system env

2002-05-15 Andrey Alekseyev

Btw, Terry's implementation of this, ported to 4.5-STABLE, can be found here:

http://www.blackflag.ru/patches/nfs-client-and-server-locking-4.5-STABLE-20020312.diff

I've been testing it continuously for a month or so with an NFS server
on Solaris, specifically with a combination of the Connectathon NFS test
suite and several hundred Perl scripts doing flock on remote and local
files. So far I have found no problems with it.

:)

JFYI

> I've actually wanted the VOP_ADVLOCK to be veto-based for going
> on 6 years now, to avoid precisely the type of problems you are
> now facing.  If the upper layer code did local assertion on vnodes,
> and called the lower layer code only in the success cases, then the
> implementation would actually be done for you already.
>
> -- Terry
 


-- 
Andrey Alekseyev. Zenon N.S.P.
Senior Unix systems administrator
