RE: [OpenAFS] Speeding up Salvage

2007-12-19 Thread Jerry Normandin
Sorry, left work just before I got your email.  

 

Yes I looked in the Boslog, this is what I found:

 

Tue Dec 18 11:04:38 2007: fs:file exited on signal 3

Tue Dec 18 11:05:41 2007: bos shutdown: volserver failed to shutdown
within 60 seconds

 

So the AFS server did not shut down cleanly. It was killed by bos restart,
which triggered the salvage and some anxious AFS users.

 

Are there any precautions I can take to prevent this?



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Derrick Brashear
Sent: Tuesday, December 18, 2007 4:22 PM
To: OpenAFS Info
Subject: Re: [OpenAFS] Speeding up Salvage

 

 

On Dec 18, 2007 4:17 PM, Jerry Normandin <[EMAIL PROTECTED]>
wrote:

 

The servers are running salvage after a bos restart  -all


do the logs give you any hint why?
 

 



Re: [OpenAFS] Speeding up Salvage

2007-12-19 Thread Derrick Brashear
On Dec 19, 2007 9:50 AM, Jerry Normandin <[EMAIL PROTECTED]> wrote:

>  Sorry, left work just before I got your email.
>
>
>
> Yes I looked in the Boslog, this is what I found:
>
>
>
> Tue Dec 18 11:04:38 2007: fs:file exited on signal 3
>

This is fine, kill -QUIT is the normal clean shutdown.


> Tue Dec 18 11:05:41 2007: bos shutdown: volserver failed to shutdown
> within 60 seconds

This was your issue; I have no idea why based on only that log message.


[OpenAFS] can someone point me in the right direction on cleaning up RO volumes?

2007-12-19 Thread Jerry Normandin
Hi,

 

   My AFS deployment is much faster now that I've been cleaning up the
vldb.

One problem is that I noticed that many of the RO and RW volumes are on
the same server. (I didn't do it, I inherited this.)

So I want to migrate the RO volumes from ENG03 to ENG02.  But there is
a catch: when I list my VLDB I see this:

 

home.shimona_verma
    RWrite: 536871074 ROnly: 536871075 RClone: 536871075
    number of sites -> 2
       server eng02.dafca.local partition /vicepa RO Site  -- Old release
       server eng03.dafca.local partition /vicepa RW Site  -- New release

home.susann_flowers
    RWrite: 536871116 ROnly: 536871117 RClone: 536871117
    number of sites -> 2
       server eng02.dafca.local partition /vicepa RO Site  -- Old release
       server eng03.dafca.local partition /vicepa RW Site  -- New release

 

Should I issue a vos release to sync up the RO volumes?  Why wouldn't
the RO copies stay in sync?  Did my restart cause this issue?

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Derrick Brashear
Sent: Wednesday, December 19, 2007 10:45 AM
To: OpenAFS Info
Subject: Re: [OpenAFS] Speeding up Salvage

 

 

On Dec 19, 2007 9:50 AM, Jerry Normandin <[EMAIL PROTECTED]>
wrote:

Sorry, left work just before I got your email.  

 

Yes I looked in the Boslog, this is what I found:

 

Tue Dec 18 11:04:38 2007: fs:file exited on signal 3


This is fine, kill -QUIT is the normal clean shutdown.
 

Tue Dec 18 11:05:41 2007: bos shutdown: volserver failed to
shutdown within 60 seconds

This was your issue; I have no idea why based on only that log message.

 



Re: [OpenAFS] can someone point me in the right direction on cleaning up RO volumes?

2007-12-19 Thread Jeffrey Altman
Jerry Normandin wrote:
> Hi,
> 
>  
> 
>My AFS deployment is much faster now that I’ve been cleaning up the vldb.
> 
> One problem is that I noticed that many of the RO and RW volumes are on
> the same server. ( I didn’t do it, I inherited this)

You want a RO on the same server as the RW and then replicas of the RO
on other servers.






Re: [OpenAFS] can someone point me in the right direction on cleaning up RO volumes?

2007-12-19 Thread Derrick Brashear
On Dec 19, 2007 11:33 AM, Jerry Normandin <[EMAIL PROTECTED]> wrote:

>  Hi,
>
>
>
>My AFS deployment is much faster now that I've been cleaning up the
> vldb.
>
> One problem is that I noticed that many of the RO and RW volumes are on
> the same server. ( I didn't do it, I inherited this)
>
It's good practice to have at least one RO on the RW site, so leave it
be.

> So I want to migrate the RO volumes from ENG03 to ENG02.   But there is a
> catch, When I list my VLDB I see this:
>
>
>
add a second site, keep the first, release the volume.
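
For one of the volumes above, that might look roughly like this (a sketch,
assuming the second RO site goes on the eng03 RW server and the existing
eng02 RO is kept in place):

   vos addsite eng03.dafca.local /vicepa home.shimona_verma   # add an RO site at the RW server
   vos release home.shimona_verma                             # push a current snapshot to both RO sites
   vos listvldb -name home.shimona_verma                      # confirm both sites now show as current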


RE: [OpenAFS] can someone point me in the right direction on cleaning up RO volumes?

2007-12-19 Thread Jerry Normandin
I thought the initial RO volume should be on the same server as the RW
volume, for performance.

The person who had my position before I got here had an AFS tools server
(development environment, RW and RO), a home RW server, a home RO server,
and two VL servers.  I'm going to request an additional server, split the
home directories across eng02 (RW and RO) and eng03 (RW and RO), and add
an additional server for RO and backup volumes only.

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Derrick Brashear
Sent: Wednesday, December 19, 2007 12:06 PM
To: OpenAFS Info
Subject: Re: [OpenAFS] can someone point me in the right direction on
cleaning up RO volumes?

 

 

On Dec 19, 2007 11:33 AM, Jerry Normandin <[EMAIL PROTECTED]>
wrote:

Hi,

 

   My AFS deployment is much faster now that I've been cleaning up the
vldb.

One problem is that I noticed that many of the RO and RW volumes are on
the same server. ( I didn't do it, I inherited this)

It's good practice to have at least one RO on the RW site, so leave it
be.

So I want to migrate the RO volumes from ENG03 to ENG02.   But
there is a catch, When I list my VLDB I see this:

 

add a second site, keep the first, release the volume.
 

 



Re: [OpenAFS] can someone point me in the right direction on cleaning up RO volumes?

2007-12-19 Thread Kim Kimball




While it's true that putting an RO on the same server and partition as
the RW will save some disk space, it doesn't protect against failure of
the RW storage device (LUN, drive, whatever.)

I therefore put some critical ROs on separate LUNs on the RW server.

In rampant paranoia,

Kim


Jeffrey Altman wrote:
> Jerry Normandin wrote:
>> Hi,
>>
>>    My AFS deployment is much faster now that I’ve been cleaning up the vldb.
>>
>> One problem is that I noticed that many of the RO and RW volumes are on
>> the same server. (I didn’t do it, I inherited this)
>
> You want a RO on the same server as the RW and then replicas of the RO
> on other servers.




Re: [OpenAFS] can someone point me in the right direction on cleaning up RO volumes?

2007-12-19 Thread Christopher D. Clausen
Kim Kimball <[EMAIL PROTECTED]> wrote:
> While it's true that putting an RO on the same server and partition
> as the RW will save some disk space, it doesn't protect against
> failure of the RW storage device (LUN, drive, whatever.)
>
> I therefore put some critical ROs on separate LUNs on the RW server.

I thought the point was to save some time during the vos release process,
and as such the RO clones MUST be on the same partition as the RW in
order for the copy-on-write benefit to work correctly.



Re: [OpenAFS] Puzzled about tracking down a bunch of locks

2007-12-19 Thread Brian Gallew

Derrick Brashear wrote:
> vos backup failed and left them locked.
>
> however, turn on auditlogs and collect the info.

I'll do that, thanks.  Failed backup reports are what got me looking at
this to begin with.
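
For anyone chasing the same thing: once the cause is understood, the leftover
VLDB locks from a failed vos backup can also be cleared by hand; a rough
sketch, with the volume name as a placeholder:

   vos unlock home.someuser       # unlock a single locked VLDB entry
   vos unlockvldb                 # or clear every locked entry in the VLDB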



[OpenAFS] Apache/Kerberos/AFS k5start question

2007-12-19 Thread John Hammond



I'm hoping someone might have some insight on a problem I'm having. I'm 
running Apache/2.0.52, Kerberos5 and OpenAFS/1.4.5. Kerberos, AFS and 
Apache are initiated in the following manner in /etc/init.d/httpd:


/usr/bin/pagsh -c "/usr/local/bin/k5start -b -K 30 -l 10h -p 
/var/run/httpd.k5start.pid -f /etc/keytabs/krb5.wwwadmin -t wwwadmin; 
LANG=$HTTPD_LANG $httpd $OPTIONS"


The Apache server is run as user apache but credentials are under user 
wwwadmin. /tmp/krb5cc_0 permissions are as follows:

-rw-------   1 root     root       787 Dec 13 08:30 krb5cc_0

I get the following error when certain CGIs are run. It does not appear
to happen every time the CGIs are run:
as-prod-web-2 kernel: afs: Tokens for user of AFS id 0 for cell 
cats.ucsc.edu are discarded (rxkad error=19270408)


klist gives the following:
# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: [EMAIL PROTECTED]

Valid starting     Expires            Service principal
12/13/07 08:30:05  12/13/07 18:30:05  krbtgt/[EMAIL PROTECTED]
12/13/07 08:30:05  12/13/07 18:30:05  afs/[EMAIL PROTECTED]


Kerberos 4 ticket cache: /tmp/tkt0
klist: You have no tickets cached


Any ideas why I might be getting this error? Places to look? Debugging 
tips?


thanks
John


PS some data:
uname -a -> Linux as-prod-web-2.ucsc.edu 2.6.9-42.0.10.ELsmp #1 SMP Fri 
Feb 16 17:17:21 EST 2007 i686 i686 i386 GNU/Linux


/usr/sbin/httpd -V
Server version: Apache/2.0.52
Server built:   Jun 29 2007 05:07:13
Server's Module Magic Number: 20020903:9
Architecture:   32-bit
Server compiled with
-D APACHE_MPM_DIR="server/mpm/prefork"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D HTTPD_ROOT="/etc/httpd"
-D SUEXEC_BIN="/usr/sbin/suexec"
-D DEFAULT_PIDLOG="logs/httpd.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_LOCKFILE="logs/accept.lock"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"



[OpenAFS] OpenAFS version

2007-12-19 Thread Randy Fiskum
I have an OpenAFS installation running on Solaris 8.  How do I tell
which version it is?

 

Thank you



Re: [OpenAFS] OpenAFS benchmark improvements

2007-12-19 Thread anne salemme
this is great practical advice. another useful thing to look at is cron 
jobs running on the afs servers, or cron jobs that affect the afs servers.
you can find things like really inefficient creation of backup volumes 
with respect to the actual backups you run, really inefficient volume
replication (multiple jobs trying to release the same volume, etc.), or 
afs restarts in the middle of other jobs that depend on afs. in other words,

you want to make sure the cron jobs aren't fighting with each other.

i did this professionally for a year...you see a lot of improvements 
that can be made that way. ok, it's not as exciting as finding low-level 
bugs in the code,
but if it suits your personality...cleaning up can't hurt, and it might 
help.


anne


Steve Simmons wrote:
I'm going to second a big chunk of what Jerry wrote. About five years
ago I inherited an AFS cell that had been through some rough times, and I
spent more than a little time cleaning it up. The end result was much faster
service. Our performance was never as bad as Jerry's, but it was still
nothing to write home about.


I did some of the same things, didn't have to do others. Of the things 
done, two surprised me in that they made a difference.  One was the 
same as Jerry's - getting rid of all the bogus values returned by vos
listaddrs.  It didn't seem to make much difference to the users, but by
god, anything that tried to look at all the servers got orders of 
magnitude better.


The other was to salvage every single volume in the cell, attaching 
the orphans:


   bos salvage <server> -orphans attach

Sonofagun if my salvages on reboot didn't stop peppering me with 
complaints. We deleted all the dead files it found, thus reclaiming 
some of the 'missing' disk space and giving it back to the users. As a 
side effect, now if I get a salvage message, I know it's something to 
look at. On the other hand, note that if you restore a volume from 
before you forced the attach, it will need a salvage.


Another thing which helped (and unlike the other two, I expected this 
to help) was to get the vldb and the on-server volumes back into sync. 
I don't recall the precise steps I had to go through, but it took more 
than just doing a set of vos syncvldb/vos syncserv commands on the 
various machines involved. I do recall generating a vldb list and 
comparing that to the output of vos listvol from all the servers, and 
that had to be followed by some vos zap and vos remove and vos remsite 
commands. The whole cell got a lot snappier after that.
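
Roughly, and only as a sketch rather than the exact sequence used here, that
kind of cleanup tends to involve commands like:

   vos syncvldb <server> -verbose       # make the VLDB reflect what the server actually holds
   vos syncserv <server> -verbose       # make the server reflect what the VLDB says
   vos listvldb > vldb.txt              # dump the VLDB ...
   vos listvol <server> > vols.txt      # ... and each server's volumes, then compare the two
   vos zap <server> <partition> <id>    # remove a server volume the VLDB no longer references
   vos remsite <server> <partition> <volume>   # drop a stale RO site from a VLDB entry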


Once we were confident the cell was stable, we upgraded from Transarc
to OAFS. That made a big difference too, but it wasn't done until after
the cleanup.

On Dec 12, 2007, at 9:52 AM, Jerry Normandin wrote:

> So.. any of you out there that are experiencing AFS slowness, do a
> sanity check to see what you come up with.  You might just say, WTF!

Absolutely.


Re: [OpenAFS] Apache/Kerberos/AFS k5start question

2007-12-19 Thread Jeffrey Altman
John Hammond wrote:

> cats.ucsc.edu are discarded (rxkad error=19270408)

translate_et 19270408
19270408 = ticket contained unknown key version number

Possible answers:

1. One of your file servers does not have support for Kerberos v5

2. One of your file servers is missing a key that your KDC is issuing





Re: [OpenAFS] OpenAFS version

2007-12-19 Thread Mike Garrison

On Dec 18, 2007, at 8:04 AM, Randy Fiskum wrote:

I have an OpenAFS installation running on Solaris 8.  How do I tell  
which version it is?


Thank you


rxdebug -servers localhost -port 7001 -version

or replace localhost with the machine ip if you're running it on a  
remote machine.


--
Mike Garrison

Re: [OpenAFS] OpenAFS version

2007-12-19 Thread Jeffrey Altman
Randy Fiskum wrote:
> I have an OpenAFS installation running on Solaris 8.  How do I tell
> which version it is?
> 
>  
> 
> Thank you

One way is to use rxdebug <host> <port> -version

where 7000 is the file server
  7001 is the cache manager
  7002 is the protection service
  7003 is the volume location service
  ...
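
For example, to query a file server (the host name here is just a placeholder):

   rxdebug fs1.yourcell.example 7000 -version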







Re: [OpenAFS] OpenAFS version

2007-12-19 Thread Avinesh Kumar
Try the following command

strings /usr/vice/etc/afsd | grep OpenAFS

On Dec 18, 2007 8:04 AM, Randy Fiskum <[EMAIL PROTECTED]> wrote:

>  I have an OpenAFS installation running on Solaris 8.  How do I tell which
> version it is?
>
>
>
> Thank you
>


Re: [OpenAFS] can someone point me in the right direction on cleaning up RO volumes?

2007-12-19 Thread Russ Allbery
Kim Kimball <[EMAIL PROTECTED]> writes:

> While it's true that putting an RO on the same server and partition as the RW
> will save some disk space, it doesn't protect against failure of the RW
> storage device (LUN, drive, whatever.)
>
> I therefore put some critical ROs on separate LUNs on the RW server.

There's usually no reason not to *also* have an RO replica on the RW
server, since you can have lots of them.  The first one is basically free.

-- 
Russ Allbery ([EMAIL PROTECTED]) 