Re: [OpenAFS] best practice for salvage

2008-04-10 Thread Esther Filderman
On Thu, Apr 3, 2008 at 2:34 PM, Chas Williams (CONTRACTOR)
<[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>,Jeffrey Altman writes:
>  >What normal successfully completed operation is leaving unreferenced
>  >.__afs files behind?
>  >
>  >Let's fix the bug.
>
>  good idea.  i don't know how you fix machines not under your control
>  running older (broken) clients.  and the primary cause for these files is
>  typically crashing of the host operating system while holding a deleted
>  file reference.  i doubt you can fix this with a patch.
>
>

I always thought the problem was that Fortran compilers are horrendous
pieces of @*(%)[EMAIL PROTECTED]  But maybe I'm just a biased old fart.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] best practice for salvage

2008-04-05 Thread Todd M. Lewis



Jeffrey Altman wrote:

The way I would have implemented this functionality would be for the
file to be moved into the local client's cache and removed from the
file server since the file has now been unlinked and can therefore
not be referenced by other clients.  It would then be the client's
responsibility to clean up after itself.


Some files are larger than local client caches.


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz


On Apr 3, 2008, at 1:11 PM, Jeffrey Altman wrote:
> Robert Banz wrote:
>> That wouldn't work, because the file could have been open()'d by
>> two different cache managers, unlinked by one, but should still be
>> able to be written to.
>
> That doesn't work.  Eventually the cache manager on the machine on
> which the unlink() was executed is going to call RXAFS_RemoveFile().
> When that happens the other client that has the file open locally is
> going to lose.  Next time it calls RXAFS_StoreFile() it will get
> VNOVNODE.

Only if one of them closes the file will that occur ;)

-rob


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman

Robert Banz wrote:
> That wouldn't work, because the file could have been open()'d by two
> different cache managers, unlinked by one, but should still be able to
> be written to.

That doesn't work.  Eventually the cache manager on the machine on which
the unlink() was executed is going to call RXAFS_RemoveFile().  When
that happens the other client that has the file open locally is going to
lose.  Next time it calls RXAFS_StoreFile() it will get VNOVNODE.






Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Derrick Brashear
On Thu, Apr 3, 2008 at 3:32 PM, John Hascall <[EMAIL PROTECTED]> wrote:
>
>  > >> Since the file server has no way of knowing if the file is still in
>  > >> use it can't delete it.
>
>  > >Why not?  Is there no way for the file server to query the
>  > >cache manager and ask?
>
>  > The fact that the file is considered temporary is only known to the
>  > client.
>
>  And to salvager :)
>
>  So, the client has opened the file.  Doesn't this mean
>  the fileserver has a callback for this file/client?  Then
>  the fileserver sees a RENAME op to the magic .__afs
>  name.  Seems to me like it might be possible to pervert
>  the callback to check up on the client/file.

Yuck. No.

Our hit squads are on the way now. Stay right there, ok?


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz



> The way I would have implemented this functionality would be for the
> file to be moved into the local client's cache and removed from the
> file server since the file has now been unlinked and can therefore
> not be referenced by other clients.  It would then be the client's
> responsibility to clean up after itself.


That wouldn't work, because the file could have been open()'d by two
different cache managers, unlinked by one, but should still be able to
be written to. AFS is basically handling the problem the same way NFS
did, and it has always been common to have .nfs files stick around
after some badness -- if you're sure you don't have long-running
applications sitting around, you could easily craft a low-intensity
find job to remove these. I recall running similar things on NFS
servers periodically, which used atime as a guide.


Unfortunately, AFS gives us no atime to go by, so the job would
probably have to keep state, remembering which .__afs files it has
seen before and removing them only after a suitable period has
elapsed. Sounds like a rather trivial perl script to throw together.
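
A minimal sketch of such a stateful job, in Python rather than Perl. Everything
in it is an assumption made for illustration: the scan() function, the JSON
state file, and the 14-day grace period are hypothetical, and the plain prefix
test will also catch unrelated files whose names merely start with .__afs. It
defaults to a dry run that only reports candidates.

```python
import json
import os
import time

PREFIX = ".__afs"            # prefix named in this thread
GRACE_SECONDS = 14 * 86400   # hypothetical grace period; choose your own


def scan(root, state_path, now=None, grace=GRACE_SECONDS, dry_run=True):
    """Return .__afs paths first seen at least `grace` seconds ago.

    The first-seen time of every candidate is kept in a JSON state file,
    standing in for the atime that AFS does not provide.
    """
    now = time.time() if now is None else now
    try:
        with open(state_path) as fh:
            first_seen = json.load(fh)
    except FileNotFoundError:
        first_seen = {}

    still_present = {}
    expired = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.startswith(PREFIX):
                continue
            path = os.path.join(dirpath, name)
            seen = first_seen.get(path, now)   # new files start their clock now
            if now - seen >= grace:
                expired.append(path)
                if not dry_run:
                    os.remove(path)
                    continue                   # forget files we removed
            still_present[path] = seen

    # Files that vanished on their own simply drop out of the state.
    with open(state_path, "w") as fh:
        json.dump(still_present, fh)
    return sorted(expired)
```

Run periodically, the first pass only records what it sees; later passes report
(or, with dry_run=False, remove) files that have outlived the grace period.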


-rob



Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman

John Hascall wrote:
>> Since the file server has no way of knowing if the file is still in
>> use it can't delete it.
>
> Why not?  Is there no way for the file server to query the
> cache manager and ask?
>
>> The fact that the file is considered temporary is only known to the
>> client.
>
> And to salvager :)

No, the salvager does not know it is temporary.  The salvager simply
deletes any files that begin with .__afs, which means that if I happen
to be using trading software from the American Foundry Society or
Associated Food Stores, and the application happens to use hidden
configuration files called .__afsConfig, you are erasing the
configuration file each time you run the salvager.
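
The collision can be seen with a plain prefix test. The sketch below is only an
illustration: the stricter pattern assumes, without confirmation from this
thread, that generated names are the prefix followed by one to four uppercase
hex digits. Check the cache manager source before relying on any such pattern.

```python
import re

PREFIX = ".__afs"

# Assumption for illustration only: prefix plus 1-4 uppercase hex digits.
STRICT = re.compile(r"^\.__afs[0-9A-F]{1,4}$")


def prefix_match(name):
    """Prefix-only test, as described for the salvager in this thread."""
    return name.startswith(PREFIX)


def strict_match(name):
    """Hypothetical narrower test; still a guess about intent, not proof."""
    return STRICT.match(name) is not None


for name in (".__afs1F3A", ".__afsConfig", "notes.txt"):
    print(name, prefix_match(name), strict_match(name))
```

Even the narrower test cannot tell whether a matching file is still in use, or
whether a user deliberately created a file with such a name.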



> So, the client has opened the file.  Doesn't this mean
> the fileserver has a callback for this file/client?  Then
> the fileserver sees a RENAME op to the magic .__afs
> name.  Seems to me like it might be possible to pervert
> the callback to check up on the client/file.


All it means is that at one point the client registered a callback.
It does not mean that there always will be a callback.   Callbacks
are flushed by the file server when the callback table runs out of
room.  The status cache objects might be flushed by the cache manager
even if the file is still open according to the operating system
because it hasn't been written to in a while.
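
A toy model of why a missing callback proves nothing. This is not the file
server's data structure: the CallbackTable class and its oldest-first eviction
are assumptions chosen only to show a bounded table dropping an entry for a
file that a client still has open.

```python
from collections import OrderedDict


class CallbackTable:
    """Toy bounded table; the oldest entry is dropped when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def register(self, client, fid):
        key = (client, fid)
        self.entries.pop(key, None)       # re-registering refreshes the entry
        self.entries[key] = True
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def has_callback(self, client, fid):
        return (client, fid) in self.entries


table = CallbackTable(capacity=2)
table.register("clientA", "fid1")   # clientA opens fid1 and keeps it open
table.register("clientB", "fid2")
table.register("clientC", "fid3")   # table is full, clientA's entry is dropped
print(table.has_callback("clientA", "fid1"))
```

In the model, clientA still holds the file open after its entry is gone, so the
table's contents cannot be used to decide that the file is unused.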


> Your idea seems less icky. Besides typically the client
> just wrote the file, so it probably has most or all of it
> in its cache anyway.  BUT, what do you do when it is
> too big to be entirely cached by the client?


Don't cache it.  Leave it on the file server.

The point is that only the client knows what is safe to delete.
It would be exactly the same if an application left temporary files
behind on the local disk if it crashed before they were deleted.

Only someone with knowledge of the application and the context
in which the files were left behind could 100% know that it was
safe to delete them.







Re: [OpenAFS] best practice for salvage

2008-04-03 Thread John Hascall

> >> Since the file server has no way of knowing if the file is still in
> >> use it can't delete it.

> >Why not?  Is there no way for the file server to query the
> >cache manager and ask?

> The fact that the file is considered temporary is only known to the
> client.

 And to salvager :)

 So, the client has opened the file.  Doesn't this mean
 the fileserver has a callback for this file/client?  Then
 the fileserver sees a RENAME op to the magic .__afs
 name.  Seems to me like it might be possible to pervert
 the callback to check up on the client/file.

 Your idea seems less icky. Besides typically the client
 just wrote the file, so it probably has most or all of it
 in its cache anyway.  BUT, what do you do when it is
 too big to be entirely cached by the client?

John

> The only reason that the files are deleted at all by the Salvager
> is because the files were renamed to .__afs.  If the application
> had created temporary files with some other name and then disappeared
> without deleting them they would not be deleted by the salvager.
> 
> The way I would have implemented this functionality would be for the
> file to be moved into the local client's cache and removed from the
> file server since the file has now been unlinked and can therefore
> not be referenced by other clients.  It would then be the client's
> responsibility to clean up after itself.
> 
> Of course, even if we do implement this functionality in a future client
> revision we can't fix the clients that are already deployed.


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman

John Hascall wrote:
>> Since the file server has no way of knowing if the file is still in
>> use it can't delete it.
>
> Why not?  Is there no way for the file server to query the
> cache manager and ask?


The fact that the file is considered temporary is only known to the
client.  AFS is not like SMB servers that issue file handles to
clients in response to an "open" call and destroy them whenever the
client becomes unresponsive.  AFS is designed to work across WANs
in which it is expected that network connectivity will be interrupted
or that a client will migrate.

The only reason that the files are deleted at all by the Salvager
is because the files were renamed to .__afs.  If the application
had created temporary files with some other name and then disappeared
without deleting them they would not be deleted by the salvager.

The way I would have implemented this functionality would be for the
file to be moved into the local client's cache and removed from the
file server since the file has now been unlinked and can therefore
not be referenced by other clients.  It would then be the client's
responsibility to clean up after itself.

Of course, even if we do implement this functionality in a future client
revision we can't fix the clients that are already deployed.






Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Derrick Brashear
Fileserver has no idea which client has it open(*), so... query who?

* Not as such. You could guess. There's no mechanism to query though.
And what if the client has gone offline now, but will come back
shortly? Or is at a new address?

On Thu, Apr 3, 2008 at 2:56 PM, John Hascall <[EMAIL PROTECTED]> wrote:
>
>
>  [EMAIL PROTECTED] writes:
>
>
> > In other words, the .__afs files are unnamed files that as far
>  > as the file server is concerned are still in use by some client.
>  > The reason the files are left behind is because the AFS cache manager
>  > that renamed the file did not delete it before it lost contact with
>  > the file server (network dropped, cache manager was stopped, machine
>  > crashed, ...).
>
>  > Since the file server has no way of knowing if the file is still in
>  > use it can't delete it.
>
>Why not?  Is there no way for the file server to query the
>cache manager and ask?
>
>  John
>
>


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread John Hascall


[EMAIL PROTECTED] writes:

> In other words, the .__afs files are unnamed files that as far
> as the file server is concerned are still in use by some client.
> The reason the files are left behind is because the AFS cache manager
> that renamed the file did not delete it before it lost contact with
> the file server (network dropped, cache manager was stopped, machine
> crashed, ...).

> Since the file server has no way of knowing if the file is still in
> use it can't delete it.

   Why not?  Is there no way for the file server to query the
   cache manager and ask?

John


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman

Russ Allbery wrote:
> Jeffrey Altman <[EMAIL PROTECTED]> writes:
>> What normal successfully completed operation is leaving unreferenced
>> .__afs files behind?
>>
>> Let's fix the bug.
>
> Good question.  I know we accumulate a ton of them that get cleaned up on
> each salvage, but I have no idea how to figure out what's creating them
> and leaving them behind.


I read some code and spoke with Derrick a bit about this.
Here is what is going on.  There isn't a bug here.

If you look at the AFS UNIX cache manager code you will see that
the .__afs files are the result of an existing file being
renamed by the cache manager in src/afs/VNOPS/afs_vnop_remove.c
when unlink() has been called by an application on a file that
is currently in use.

In other words, the .__afs files are unnamed files that as far
as the file server is concerned are still in use by some client.
The reason the files are left behind is because the AFS cache manager
that renamed the file did not delete it before it lost contact with
the file server (network dropped, cache manager was stopped, machine
crashed, ...).
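
The local-filesystem behaviour that the rename emulates can be shown in a few
lines of Python on a POSIX system. This only illustrates unlink-while-open
semantics on a local disk; it is not AFS code, and the function name
write_after_unlink is made up for the example.

```python
import os
import tempfile


def write_after_unlink():
    """Unlink a file that is still open, then keep using the open handle."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "scratch.dat")
    with open(path, "w+") as fh:
        fh.write("before unlink\n")
        os.unlink(path)                   # the name is gone...
        gone = not os.path.exists(path)
        fh.write("after unlink\n")        # ...but the open file still works
        fh.seek(0)
        data = fh.read()
    os.rmdir(workdir)                     # directory is empty once unlinked
    return gone, data


gone, data = write_after_unlink()
print(gone)
print(data, end="")
```

On a local disk the kernel keeps the unlinked file alive until the last close.
Over AFS the data lives on the file server, so the cache manager keeps it alive
under a .__afs name instead, and that name is what is left behind when the
client disappears before the final close.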

Since the file server has no way of knowing if the file is still in
use it can't delete it.  Or we should say it doesn't delete them.
It can be argued that it isn't even safe to delete them by taking
the volume temporarily offline and performing the salvage operation
because the salvager has no knowledge of whether or not the files
are in use.

It could certainly be the case that a policy could be put in place
to delete them after some period of time.  However nothing would
be absolutely safe for all circumstances.

Jeffrey Altman




Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Chas Williams (CONTRACTOR)
In message <[EMAIL PROTECTED]>,Jeffrey Altman writes:
>What normal successfully completed operation is leaving unreferenced 
>.__afs files behind?
>
>Let's fix the bug.

good idea.  i don't know how you fix machines not under your control
running older (broken) clients.  and the primary cause for these files is
typically crashing of the host operating system while holding a deleted
file reference.  i doubt you can fix this with a patch.


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Derrick Brashear
On Thu, Apr 3, 2008 at 1:33 PM, Andrew Bacchi <[EMAIL PROTECTED]> wrote:
>
>  From what I see of the 1.4.6 SPEC file, fast-restart fileserver is enabled
> by default.  Do I need to start the server with any added options?  Is there
> documentation to read?
>
>  config_opts="--enable-redhat-buildsys \
>  %{?_with_bitmap_later:--enable-bitmap-later} \
>  %{?_with_bos_restricted:--enable-bos-restricted-mode} \
>  %{?_with_fast_restart:--enable-fast-restart} \
>
>
Actually, it's not, and it won't be, either. You need to rebuild the
RPMs if you want it.
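
The quoted spec lines use RPM's conditional macro form, %{?name:value}, which
expands to value only when the macro name is defined at build time. Seeing
--enable-fast-restart in the spec file therefore does not mean it is switched
on. The toy expander below handles only that one form and is not rpm's macro
engine; expand() is a hypothetical helper written for this illustration.

```python
import re

# Matches %{?name:value} only; other RPM macro forms are left untouched.
COND = re.compile(r"%\{\?(\w+):([^}]*)\}")


def expand(text, defined):
    """Replace %{?name:value} with value if name is defined, else with ''."""
    return COND.sub(
        lambda m: m.group(2) if m.group(1) in defined else "", text
    )


line = "%{?_with_fast_restart:--enable-fast-restart}"
print(repr(expand(line, set())))                     # macro not defined
print(repr(expand(line, {"_with_fast_restart"})))    # macro defined
```

With the macro undefined, the configure option vanishes from config_opts, which
matches the reply above that the packages must be rebuilt to get fast-restart.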


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Russ Allbery
Jeffrey Altman <[EMAIL PROTECTED]> writes:

> What normal successfully completed operation is leaving unreferenced
> .__afs files behind?
>
> Let's fix the bug.

Good question.  I know we accumulate a ton of them that get cleaned up on
each salvage, but I have no idea how to figure out what's creating them
and leaving them behind.

-- 
Russ Allbery ([EMAIL PROTECTED]) 


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Andrew Bacchi




From what I see of the 1.4.6 SPEC file, fast-restart fileserver is
enabled by default.  Do I need to start the server with any added
options?  Is there documentation to read?

config_opts="--enable-redhat-buildsys \
    %{?_with_bitmap_later:--enable-bitmap-later} \
    %{?_with_bos_restricted:--enable-bos-restricted-mode} \
    %{?_with_fast_restart:--enable-fast-restart} \


Robert Banz wrote:
> Just curious,
>
> What makes you think running salvage is a good thing? I had
> gotten to the point where I would avoid running it like the plague --
> using tools such as fast-restart -- and in the time I was running
> fast-restart, which included some rather nasty power events which took
> things down hard. And, believe it or not, even in those incidents I
> only had one or two volumes that I had to hand-salvage.
>
> -rob
>
> On Apr 3, 2008, at 6:48 AM, Andrew Bacchi wrote:
>> Thanks, Esther.   I can always count on you for good advice.
>>
>> I usually run salvage by hand once or twice a year, but my gut says run
>> it more often.  I'll write a script that runs on odd months and call it
>> from either linux-cron or afs-cron.  One drawback of afs-cron is it
>> only knows a weekly time schedule.  Could we put that on a wish list?
>>
>> Esther Filderman wrote:
>>> On Wed, Apr 2, 2008 at 1:43 PM, Andrew Bacchi <[EMAIL PROTECTED]> wrote:
>>>> I'm considering running a weekly salvage on all file servers from BosConfig.
>>>> Is this too often?  Any reason not to?  What are others doing?  Thanks.
>>>
>>> At my last *cough* site, we ran with fast-restart.  Because of the
>>> cruft that would sometimes get left behind in volumes due to things
>>> like crappy fortran compilers, I would run a salvage on each server
>>> every 2-3 months.   As there were rarely any real errors, it ran
>>> pretty quickly and would fit in my "official downtime" window.
>>>
>>> I used to run 'em by hand because, well, I only had like 6 servers
>>> (and I'm a hands-on kinda Moose), but it easily could have been
>>> automated.
>>>
>>> Moose

-- 
veritatis simplex oratio est

Andrew Bacchi
Staff Systems Programmer
Rensselaer Polytechnic Institute
phone: 518 276-6415  fax: 518 276-2809

http://www.rpi.edu/~bacchi/




Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Chas Williams (CONTRACTOR)
In message <[EMAIL PROTECTED]>,"Christopher D. Clausen" 
writes:
>Would a find command execing rm do the same thing?  Or does the salvager 
>actually need to be run for a "correct" cleanup?

you could do it with rm but users tend to change their permissions
so the script would need to also change permissions on directories.
also iterating the entire cell in afs is kind of tedious.

>Also, is it not possible to have a volume "salvaged" during a vos move? 
>(I realize this may not happen in the code now, just if such a thing is 
>indeed possible.)

don't know.  i suppose vos move could certainly choose not to move
.__afs files.


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Jeffrey Altman

Chas Williams (CONTRACTOR) wrote:
> In message <[EMAIL PROTECTED]>,Robert Banz writes:
>> What makes you think running salvage is a good thing? I had gotten to
>> the point where I would avoid running it like the plague -- using
>
> running salvage once in a while is a good way to clean up .__afs
> files.

What normal successfully completed operation is leaving unreferenced
.__afs files behind?

Let's fix the bug.

Jeffrey Altman





Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Christopher D. Clausen
Chas Williams (CONTRACTOR) <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>,Robert Banz writes:
>> What makes you think running salvage is a good thing? I had gotten to
>> the point where I would avoid running it like the plague -- using
>
> running salvage once in a while is a good way to clean up .__afs
> files.

Would a find command execing rm do the same thing?  Or does the salvager 
actually need to be run for a "correct" cleanup?

Also, is it not possible to have a volume "salvaged" during a vos move? 
(I realize this may not happen in the code now, just if such a thing is 
indeed possible.)



Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz


On Apr 3, 2008, at 10:06 AM, Chas Williams (CONTRACTOR) wrote:
> In message <[EMAIL PROTECTED]>,Robert Banz writes:
>> What makes you think running salvage is a good thing? I had gotten to
>> the point where I would avoid running it like the plague -- using
>
> running salvage once in a while is a good way to clean up .__afs
> files.


Perhaps we should build in a procedure to do this, and just this.  
Taking the volume off-line just to clear out a little cruft is not  
something I'd consider operationally acceptable.


-rob


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Chas Williams (CONTRACTOR)
In message <[EMAIL PROTECTED]>,Robert Banz writes:
>What makes you think running salvage is a good thing? I had gotten to  
>the point where I would avoid running it like the plague -- using  

running salvage once in a while is a good way to clean up .__afs
files.


Re: [OpenAFS] best practice for salvage

2008-04-03 Thread Robert Banz


Just curious,

What makes you think running salvage is a good thing? I had gotten to
the point where I would avoid running it like the plague -- using
tools such as fast-restart -- and in the time I was running
fast-restart, which included some rather nasty power events which took
things down hard. And, believe it or not, even in those incidents I
only had one or two volumes that I had to hand-salvage.


-rob

On Apr 3, 2008, at 6:48 AM, Andrew Bacchi wrote:
> Thanks, Esther.   I can always count on you for good advice.
>
> I usually run salvage by hand once or twice a year, but my gut says
> run it more often.  I'll write a script that runs on odd months and
> call it from either linux-cron or afs-cron.  One drawback of
> afs-cron is it only knows a weekly time schedule.  Could we put that
> on a wish list?
>
> Esther Filderman wrote:
>> On Wed, Apr 2, 2008 at 1:43 PM, Andrew Bacchi <[EMAIL PROTECTED]> wrote:
>>> I'm considering running a weekly salvage on all file servers from
>>> BosConfig.  Is this too often?  Any reason not to?  What are others
>>> doing?  Thanks.
>>
>> At my last *cough* site, we ran with fast-restart.  Because of the
>> cruft that would sometimes get left behind in volumes due to things
>> like crappy fortran compilers, I would run a salvage on each server
>> every 2-3 months.   As there were rarely any real errors, it ran
>> pretty quickly and would fit in my "official downtime" window.
>>
>> I used to run 'em by hand because, well, I only had like 6 servers
>> (and I'm a hands-on kinda Moose), but it easily could have been
>> automated.
>>
>> Moose




Re: [OpenAFS] best practice for salvage

2008-04-02 Thread Robert Banz


That shouldn't be necessary at all.

On Apr 2, 2008, at 10:43 AM, Andrew Bacchi wrote:
> I'm considering running a weekly salvage on all file servers from
> BosConfig.  Is this too often?  Any reason not to?  What are others
> doing?  Thanks.
>
> --
> veritatis simplex oratio est
>
> Andrew Bacchi
> Staff Systems Programmer
> Rensselaer Polytechnic Institute
> phone: 518 276-6415  fax: 518 276-2809
>
> http://www.rpi.edu/~bacchi/
