Re: [OpenAFS] Re: nightly failure since upgrading to 1.6.5

2014-02-12 Thread Tracy Di Marco White
On Mon, Feb 10, 2014 at 2:23 PM, Andrew Deason wrote:
>
> On Mon, 10 Feb 2014 00:27:59 -0600
> Tracy Di Marco White  wrote:
>
> > VolserLog
> > Sat Feb  8 00:02:42 2014 SYNC_ask:  length field in response inconsistent
> > on circuit 'FSSYNC'
> > Sat Feb  8 00:02:42 2014 SYNC_ask: protocol communications failure on
> > circuit 'FSSYNC'; attempting reconnect to server
>
> This message says what one of the problems is, but isn't providing a lot
> of information. If it's convenient for you to apply a patch and rebuild,
> the following patch would give us a little more information in this
> situation (from gerrit 10829):
>
> <http://git.openafs.org/?p=openafs.git;a=patch;h=9604a45e94ed23a2941d0a7e11bfd892a0bd0bf7>

VolserLog (yesterday)
Wed Feb 12 01:04:48 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
Wed Feb 12 01:04:48 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server
Wed Feb 12 01:04:48 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
Wed Feb 12 01:04:48 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server
Wed Feb 12 01:04:49 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
Wed Feb 12 01:04:49 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server
Wed Feb 12 01:04:49 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
Wed Feb 12 01:04:49 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server
Wed Feb 12 01:04:52 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
 (continued for a while)

FileLog (yesterday)
Wed Feb 12 01:04:48 2014 SYNC_getCom:  error receiving command
Wed Feb 12 01:04:48 2014 FSYNC_com:  read failed; dropping connection
(cnt=89505)
Wed Feb 12 01:04:48 2014 SYNC_getCom:  error receiving command
Wed Feb 12 01:04:48 2014 FSYNC_com:  read failed; dropping connection
(cnt=89537)
Wed Feb 12 01:04:49 2014 SYNC_getCom:  error receiving command
Wed Feb 12 01:04:49 2014 FSYNC_com:  read failed; dropping connection
(cnt=90013)
Wed Feb 12 01:04:49 2014 SYNC_getCom:  error receiving command
Wed Feb 12 01:04:49 2014 FSYNC_com:  read failed; dropping connection
(cnt=90459)
Wed Feb 12 01:04:52 2014 SYNC_getCom:  error receiving command
Wed Feb 12 01:04:52 2014 FSYNC_com:  read failed; dropping connection
(cnt=94010)
(continued for a while)

VolserLog (today)
Thu Feb 13 00:04:26 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
Thu Feb 13 00:04:26 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server

FileLog (today)
Thu Feb 13 00:04:26 2014 SYNC_getCom:  error receiving command
Thu Feb 13 00:04:26 2014 FSYNC_com:  read failed; dropping connection
(cnt=923666)
Thu Feb 13 00:04:26 2014 _VLockFd: conflicting lock held on fd 29, offset
537170029 by pid 6070 (locktype=1)
Thu Feb 13 00:04:26 2014 VAttachVolume: another program has vol 537170029
locked
Thu Feb 13 00:04:29 2014 fssync: breaking all call backs for volume
537170031
Thu Feb 13 00:04:29 2014 VPreattachVolumeByVp_r: volume 537170029 not in
quiescent state (state 2 flags 0x18)

-Tracy
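
The extra fields that the patched build adds to the SYNC_ask message (the command number and the two lengths being compared) can be pulled out of a VolserLog mechanically. A minimal Python sketch, assuming the message format is exactly as it appears in the excerpt above; the name `parse_sync_ask` and the dictionary keys are made up for illustration and are not part of OpenAFS:

```python
import re

# Pattern for the patched SYNC_ask message as it appears in the excerpt.
# The log wraps each entry across two lines, so whitespace is flattened first.
SYNC_ASK = re.compile(
    r"SYNC_ask:\s+length field in response inconsistent on circuit "
    r"'(?P<circuit>\w+)' command (?P<command>\d+), (?P<left>\d+) != (?P<right>\d+)"
)

def parse_sync_ask(log_text):
    """Return one dict per 'length field ... inconsistent' entry (hypothetical helper)."""
    flat = " ".join(log_text.split())
    return [
        {
            "circuit": m.group("circuit"),
            "command": int(m.group("command")),
            "lengths": (int(m.group("left")), int(m.group("right"))),
        }
        for m in SYNC_ASK.finditer(flat)
    ]

# Entries taken from the VolserLog excerpts in this message.
SAMPLE = """\
Wed Feb 12 01:04:48 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
Wed Feb 12 01:04:48 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server
Thu Feb 13 00:04:26 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC' command 65543, 200 != 292
"""

for event in parse_sync_ask(SAMPLE):
    print(event["circuit"], hex(event["command"]), event["lengths"])
```

On these entries the command is 65543 each time, which is 0x10007 in hex (65536 + 7), and the length pair is always 200 and 292. The sketch does not try to decide which side of the comparison each number belongs to.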


Re: [OpenAFS] Re: nightly failure since upgrading to 1.6.5

2014-02-10 Thread Tracy Di Marco White
On Mon, Feb 10, 2014 at 3:22 PM, Andrew Deason wrote:

> On Mon, 10 Feb 2014 15:09:25 -0600
> Tracy Di Marco White  wrote:
>


> I may have misinterpreted something up there. Were you running a prior
> 1.6 release with DAFS before, and this just started happening with
> 1.6.5? Or did you "switch" to DAFS and this started happening? Or did
> you upgrade from 1.4 and switch to DAFS at the same time?


I've had a single fileserver running DAFS with less valuable data for more
than a year, but as the only issue it saw was some interaction issues with
an AFS client of Harald's, I had no fear of finally upgrading the rest. That
server was running 1.6.2. The rest were running 1.4.something. I emptied
three servers, upgraded them to NetBSD 6.1.3 and OpenAFS 1.6.5 from
pkgsrc, adding a patch for davolserver. (I'll update the package to 1.6.6
in my copious free time this week, maybe, unless I'm beaten to that.)
Then I dumped another fileserver at them. As far as I can tell the other
two are working flawlessly. At least by comparison. The oldest is fine.


> > It happens on one server, of four, and it's most of the way through
> > creating backup volumes on this particular server. It is consistently
> > happening on one, and only one, server.
>
> Oh okay, well that makes me feel a little better :)


I will note that the volumes on the server that's falling over at
midnight:02 every night were previously on a different server that was
also not staying up more than a few days at a time. So there may be
something odd with a volume, I just don't know which one yet.

For what it's worth, when I did the restart this morning, the backupsys
continued on its merry way.

-Tracy
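
One way to check the suspicion that a single volume is at fault is to pull the locked volume ID out of each night's FileLog and see whether any ID repeats. A rough Python sketch, assuming the `VAttachVolume: another program has vol N locked` wording from the FileLog excerpts posted in this thread; `locked_volumes` is a hypothetical helper name:

```python
import re
from collections import Counter

# Matches the FileLog line posted in this thread. The log wraps before
# "locked", so whitespace is flattened before matching.
LOCKED = re.compile(r"VAttachVolume: another program has vol (\d+) locked")

def locked_volumes(filelog_text):
    """Count how often each volume ID is reported as locked by another program."""
    flat = " ".join(filelog_text.split())
    return Counter(int(vol) for vol in LOCKED.findall(flat))

# Lines taken from the Feb 8, 9 and 10 FileLog excerpts in this thread.
SAMPLE = """\
Sat Feb  8 00:02:42 2014 VAttachVolume: another program has vol 537011871
locked
Sun Feb  9 00:00:03 2014 VAttachVolume: another program has vol 538046785
locked
Mon Feb 10 00:00:22 2014 VAttachVolume: another program has vol 538316382
locked
"""

counts = locked_volumes(SAMPLE)
repeats = [vol for vol, n in counts.items() if n > 1]
print(sorted(counts), repeats)
```

For the Feb 8, 9 and 10 excerpts the volume ID is different each night, so the list of repeats comes back empty. That neither confirms nor rules out a bad volume; a longer run of logs might show one.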


Re: [OpenAFS] Re: nightly failure since upgrading to 1.6.5

2014-02-10 Thread Tracy Di Marco White
On Mon, Feb 10, 2014 at 2:23 PM, Andrew Deason wrote:

> On Mon, 10 Feb 2014 00:27:59 -0600
> Tracy Di Marco White  wrote:
>
> > Every night at midnight, we run 'vos backupsys'. For three nights in a
> > row, on one of the servers I've upgraded to 1.6.5 and dafs, I've been
> > getting the following errors, and it mostly stops being a fileserver.
> > Is this fixed in 1.6.6? Anyone else seeing it? This is on NetBSD
> > 6.1.3.
>
> I would guess you are the only one using NetBSD for a "real" fileserver,
> at least for DAFS. The errors you've posted indicate there are some
> problems with the mechanism that the fileserver and other processes
> use to communicate with each other, so it may be advisable to not trust
> DAFS on NetBSD with "real" data until it's known what's going on, as
> errors like this could possibly lead to corrupted volumes.
>

That's possible, certainly, depending on your definition of 'real'. I know
other people are using DAFS on NetBSD for fileservers. Personally,
I've only been doing it for a year or two.


> Do you know if this seems to happen immediately, or if 'vos backupsys'
> seems to correctly create some backup clones, and then eventually
> triggers this error? I (or someone else) will probably need to reproduce
> this to get a better idea of what's going on, but you can maybe save us
> some time with some more info:


It happens on one server, of four, and it's most of the way through creating
backup volumes on this particular server. It is consistently happening on
one, and only one, server.


> > VolserLog
> > Sat Feb  8 00:02:42 2014 SYNC_ask:  length field in response inconsistent
> > on circuit 'FSSYNC'
> > Sat Feb  8 00:02:42 2014 SYNC_ask: protocol communications failure on
> > circuit 'FSSYNC'; attempting reconnect to server
>
> This message says what one of the problems is, but isn't providing a lot
> of information. If it's convenient for you to apply a patch and rebuild,
> the following patch would give us a little more information in this
> situation (from gerrit 10829):
>
> <http://git.openafs.org/?p=openafs.git;a=patch;h=9604a45e94ed23a2941d0a7e11bfd892a0bd0bf7>
>


Sure, since I'm restarting just after midnight every night anyway.

> On Mon, 10 Feb 2014 12:15:08 -0600
> Tracy Di Marco White  wrote:
>
> > root  4129  0.0  0.2  46288   5124 ? Sl    7:46AM  0:00.02
> > /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
> > root  7155  0.0  1.2  85200  42424 ? Il    8:06AM  1:27.36
> > /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
>
> Do you have any idea why you have multiple davolserver processes running
> at once? Does BosLog maybe say anything about processes dying or
> anything? Could you provide a 'ps' listing of all afs server processes
> on that machine?
>

It's not. Those are three different days, three different restarts.
Restarting afs is the only way I know of to make the fileserver work again.

-Tracy


Re: [OpenAFS] nightly failure since upgrading to 1.6.5

2014-02-10 Thread Tracy Di Marco White
Sorry, no need to guess, it was in my monitoring client.
21378  4248 root 4:17PM Sl43  0.0 0:00.12  0.2   5184  46288
/usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo



On Mon, Feb 10, 2014 at 12:15 PM, Tracy Di Marco White wrote:

> Somehow, I still have two of them in my scroll back.
> root  4129  0.0  0.2  46288   5124 ? Sl    7:46AM  0:00.02
> /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
> root  7155  0.0  1.2  85200  42424 ? Il    8:06AM  1:27.36
> /usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
>
> I'd assume that means you can guess the third.
>
>
> On Mon, Feb 10, 2014 at 7:00 AM, Peter Grandi wrote:
>
>> > Every night at midnight, we run 'vos backupsys'. For three
>> > nights in a row, on one of the servers I've upgraded to 1.6.5
>> > and dafs, I've been getting the following errors, and it
>> > mostly stops being a fileserver.
>>
>> [ ... ]
>> > Sun Feb  9 00:00:03 2014 SYNC_getCom:  error receiving command
>> > Sun Feb  9 00:00:03 2014 FSYNC_com:  read failed; dropping connection
>> > (cnt=493489)
>> > Sun Feb  9 00:00:03 2014 _VLockFd: conflicting lock held on fd 225,
>> > offset 538046785 by pid 4129 (locktype=1)
>> > Sun Feb  9 00:00:03 2014 VAttachVolume: another program has vol
>> > 538046785 locked
>> > Sun Feb  9 00:00:03 2014 VPreattachVolumeByVp_r: volume 538046785 not
>> > in quiescent state (state 2 flags 0x18)
>> [ ... ]
>> > Sun Feb  9 00:00:03 2014 1 Volser: Clone: Recloning volume 538046785 to
>> > volume 538046787
>> > Sun Feb  9 00:00:03 2014 SYNC_ask:  length field in response
>> > inconsistent on circuit 'FSSYNC'
>> > Sun Feb  9 00:00:03 2014 SYNC_ask: protocol communications failure on
>> > circuit 'FSSYNC'; attempting reconnect to server
>> [ ... ]
>>
>> That " _VLockFd: conflicting lock held" and "VAttachVolume:
>> another program has vol  locked" looks vaguely familiar, and
>> in a case that I have seen it was because a DB server was
>> offline, and 'vos' took a very very long time to switch to an
>> online one. But this was with 1.4 and supposedly 1.6 should have
>> a shorter timeout.
>>
>> In another case that vaguely resembles this there was a race
>> between creating a clone and registering it in the VLDB:
>>
>>   http://rt.central.org/rt/Ticket/Display.html?id=131797
>>
>> It would be interesting to know what processes 21378, 4129, 7155
>> were doing and why they held a lock on the RW original.
>> ___
>> OpenAFS-info mailing list
>> OpenAFS-info@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>
>
>


Re: [OpenAFS] nightly failure since upgrading to 1.6.5

2014-02-10 Thread Tracy Di Marco White
Somehow, I still have two of them in my scroll back.
root  4129  0.0  0.2  46288   5124 ? Sl    7:46AM  0:00.02
/usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
root  7155  0.0  1.2  85200  42424 ? Il    8:06AM  1:27.36
/usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo

I'd assume that means you can guess the third.


On Mon, Feb 10, 2014 at 7:00 AM, Peter Grandi wrote:

> > Every night at midnight, we run 'vos backupsys'. For three
> > nights in a row, on one of the servers I've upgraded to 1.6.5
> > and dafs, I've been getting the following errors, and it
> > mostly stops being a fileserver.
>
> [ ... ]
> > Sun Feb  9 00:00:03 2014 SYNC_getCom:  error receiving command
> > Sun Feb  9 00:00:03 2014 FSYNC_com:  read failed; dropping connection
> > (cnt=493489)
> > Sun Feb  9 00:00:03 2014 _VLockFd: conflicting lock held on fd 225,
> > offset 538046785 by pid 4129 (locktype=1)
> > Sun Feb  9 00:00:03 2014 VAttachVolume: another program has vol
> > 538046785 locked
> > Sun Feb  9 00:00:03 2014 VPreattachVolumeByVp_r: volume 538046785 not in
> > quiescent state (state 2 flags 0x18)
> [ ... ]
> > Sun Feb  9 00:00:03 2014 1 Volser: Clone: Recloning volume 538046785 to
> > volume 538046787
> > Sun Feb  9 00:00:03 2014 SYNC_ask:  length field in response
> > inconsistent on circuit 'FSSYNC'
> > Sun Feb  9 00:00:03 2014 SYNC_ask: protocol communications failure on
> > circuit 'FSSYNC'; attempting reconnect to server
> [ ... ]
>
> That " _VLockFd: conflicting lock held" and "VAttachVolume:
> another program has vol  locked" looks vaguely familiar, and
> in a case that I have seen it was because a DB server was
> offline, and 'vos' took a very very long time to switch to an
> online one. But this was with 1.4 and supposedly 1.6 should have
> a shorter timeout.
>
> In another case that vaguely resembles this there was a race
> between creating a clone and registering it in the VLDB:
>
>   http://rt.central.org/rt/Ticket/Display.html?id=131797
>
> It would be interesting to know what processes 21378, 4129, 7155
> were doing and why they held a lock on the RW original.
>

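To answer the question of which process held the lock, the pid in each `_VLockFd` line can be matched against a process listing. A minimal Python sketch, assuming the FileLog wording and the wrapped `ps` output shown in this thread; the helper names, and the assumption that every listed process is owned by root, are mine:

```python
import re

LOCK = re.compile(
    r"_VLockFd: conflicting lock held on fd (\d+), offset (\d+) by pid (\d+)"
)
# Assumes each ps entry starts with "root <pid>" and that the command is the
# first token beginning with "/" (the listing wraps the command onto its own line).
PS = re.compile(r"\broot\s+(\d+)\s.*?(/\S+)")

def flatten(text):
    """Collapse line wraps and runs of spaces into single spaces."""
    return " ".join(text.split())

def lock_holders(filelog_text, ps_text):
    """Return (pid, offset, command) for every conflicting-lock line."""
    commands = {int(pid): cmd for pid, cmd in PS.findall(flatten(ps_text))}
    return [
        (int(pid), int(offset), commands.get(int(pid), "unknown"))
        for _fd, offset, pid in LOCK.findall(flatten(filelog_text))
    ]

# Based on the FileLog excerpts and the two ps entries in this thread.
FILELOG = """\
Sat Feb  8 00:02:42 2014 _VLockFd: conflicting lock held on fd 222, offset
537011871 by pid 21378 (locktype=1)
Sun Feb  9 00:00:03 2014 _VLockFd: conflicting lock held on fd 225, offset
538046785 by pid 4129 (locktype=1)
Mon Feb 10 00:00:22 2014 _VLockFd: conflicting lock held on fd 40, offset
538316382 by pid 7155 (locktype=1)
"""
PS_LISTING = """\
root  4129  0.0  0.2  46288   5124 ? Sl 7:46AM  0:00.02
/usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
root  7155  0.0  1.2  85200  42424 ? Il 8:06AM  1:27.36
/usr/pkg/libexec/openafs/davolserver -sleep 5/60 -nojumbo
"""

for pid, offset, command in lock_holders(FILELOG, PS_LISTING):
    print(pid, offset, command)
```

With the two entries from the scrollback, pids 4129 and 7155 both resolve to davolserver. Pid 21378 falls through to "unknown" here only because its listing, posted in the follow-up message, uses a different column layout that this sketch does not parse.
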

[OpenAFS] nightly failure since upgrading to 1.6.5

2014-02-09 Thread Tracy Di Marco White
Every night at midnight, we run 'vos backupsys'. For three nights in a row,
on one of the servers I've upgraded to 1.6.5 and dafs, I've been getting
the following errors, and it mostly stops being a fileserver. Is this fixed
in 1.6.6? Anyone else seeing it? This is on NetBSD 6.1.3.

Thanks,
Tracy

Feb 8:
FileLog
Sat Feb  8 00:02:42 2014 fssync: breaking all call backs for volume
537054876
Sat Feb  8 00:02:42 2014 SYNC_getCom:  error receiving command
Sat Feb  8 00:02:42 2014 FSYNC_com:  read failed; dropping connection
(cnt=1372738)
Sat Feb  8 00:02:42 2014 _VLockFd: conflicting lock held on fd 222, offset
537011871 by pid 21378 (locktype=1)
Sat Feb  8 00:02:42 2014 VAttachVolume: another program has vol 537011871
locked
Sat Feb  8 00:02:42 2014 fssync: breaking all call backs for volume
537011873
Sat Feb  8 00:02:42 2014 VPreattachVolumeByVp_r: volume 537011871 not in
quiescent state (state 2 flags 0x18)
Sat Feb  8 00:05:57 2014 CB: ProbeUuid for host B1EB0B00 (
173.30.18.151:11887) failed -1

VolserLog
Sat Feb  8 00:02:42 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC'
Sat Feb  8 00:02:42 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server

Feb 9:
FileLog
Sun Feb  9 00:00:03 2014 SYNC_getCom:  error receiving command
Sun Feb  9 00:00:03 2014 FSYNC_com:  read failed; dropping connection
(cnt=493489)
Sun Feb  9 00:00:03 2014 _VLockFd: conflicting lock held on fd 225, offset
538046785 by pid 4129 (locktype=1)
Sun Feb  9 00:00:03 2014 VAttachVolume: another program has vol 538046785
locked
Sun Feb  9 00:00:03 2014 VPreattachVolumeByVp_r: volume 538046785 not in
quiescent state (state 2 flags 0x18)

VolserLog
Sun Feb  9 00:00:03 2014 1 Volser: Clone: Recloning volume 538046785 to
volume 538046787
Sun Feb  9 00:00:03 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC'
Sun Feb  9 00:00:03 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server

Feb 10:
FileLog
Mon Feb 10 00:00:21 2014 fssync: breaking all call backs for volume
538410173
Mon Feb 10 00:00:22 2014 SYNC_getCom:  error receiving command
Mon Feb 10 00:00:22 2014 FSYNC_com:  read failed; dropping connection
(cnt=542873)
Mon Feb 10 00:00:22 2014 _VLockFd: conflicting lock held on fd 40, offset
538316382 by pid 7155 (locktype=1)
Mon Feb 10 00:00:22 2014 VAttachVolume: another program has vol 538316382
locked
Mon Feb 10 00:00:22 2014 fssync: breaking all call backs for volume
538316384
Mon Feb 10 00:00:22 2014 VPreattachVolumeByVp_r: volume 538316382 not in
quiescent state (state 2 flags 0x18)

VolserLog
Mon Feb 10 00:00:21 2014 1 Volser: Clone: Recloning volume 538410171 to
volume 538410173
Mon Feb 10 00:00:22 2014 SYNC_ask:  length field in response inconsistent
on circuit 'FSSYNC'
Mon Feb 10 00:00:22 2014 SYNC_ask: protocol communications failure on
circuit 'FSSYNC'; attempting reconnect to server
Mon Feb 10 00:00:22 2014 1 Volser: Clone: Recloning volume 538316382 to
volume 538316384


Re: [OpenAFS] Automatic move of volumes

2007-11-06 Thread Tracy Di Marco White
On 10/24/07, Steven Jenkins <[EMAIL PROTECTED]> wrote:
> On 10/24/07, Derrick Brashear <[EMAIL PROTECTED]> wrote:
> ...
> > perl scripts exist to do it and I think have been posted here in the past;
> > they may even deal with the "RO already exists" case.
> >
> It would be nice if there were a repository of publicly available
> contrib stuff like that.

http://www.eyrie.org/~eagle/software/ is one of my favorite sources
of Russ's software... mvto is quite useful for moving RO volumes around,
although I'd already written all my own scripts for moving volumes when
I found it.  Also, I still use balance.

-Tracy


Re: [OpenAFS] renaming principals (Was: One of my users has married - what to do? )

2007-04-29 Thread Tracy Di Marco White

On 4/29/07, Ken Hornstein <[EMAIL PROTECTED]> wrote:

> And I think you're being rather optimistic about the user experiencing
> a service outage.  Unless you're able to change their Unix account,
> any ACLs, pts entry, etc etc, all at once, the user is going to have
> some kind of outage.  You could shorten it, but I don't see how you're
> going to make it zero without having everything using one mega database
> backend (I'm not talking about Moira ... this would have to handle
> every authorization request).


For us (iastate), they can certainly log into the unix account within a
few minutes, if moira's incrementals aren't sadly swamped. Windows
access would be a few minutes too, I think. We have moira send the
incrementals off to trigger all the updates to all our directories pretty
quickly.  LDAP & MIT KDC takes care of the OS X, Active Directory
takes care of the windows, and hesiod & MIT KDC for unix, and all
of those are triggered from moira very quickly. The user would even
be able to get their mail to their new username immediately, I believe,
just any mail they hadn't fetched to their old username may get
batched to them at the end of the day, when the old username
becomes a list. Looking at one rename, it seems to have taken
10 seconds for all the changes that moira pushes out to happen.

That's not zero time, but it's not bad. moira wasn't very busy then,
either.

-Tracy


Re: [OpenAFS] One of my users has married - what to do?

2007-04-29 Thread Tracy Di Marco White

On 4/29/07, Ken Hornstein <[EMAIL PROTECTED]> wrote:

>If I recall correctly, our method for handling the salt correctly for
>any enctype now involves having the person set a new password
>when they change their username.

> If you're going to do this anyway, and assuming you aren't doing
> the right magic to preserve the password history correctly (from what I
> remember, that old code in kadmind didn't do that), then why are you
> adding the code for rename_principal back into kadmind?  It sounds
> like you could do everything you are talking about with a delete
> and an add.


We started having users set a new password when they change
their username within the last year.  We've been putting the
rename code back in for a lot longer.  John would have to say
if we do anything with password history, though I think we
don't.

-Tracy


Re: [OpenAFS] One of my users has married - what to do?

2007-04-29 Thread Tracy Di Marco White

I keep seeing this subject in my mail, and I've kept wanting to
reply "Congratulate them!"... Since I was replying anyway, I
decided not to restrain myself.

On 4/29/07, Marcus Watts <[EMAIL PROTECTED]> wrote:

> John Hascall <[EMAIL PROTECTED]> writes:
> > > > On Thu, 19 Apr 2007, Helmut Jarausch wrote:
> > > >> what do I have to do to rename a user.
> > > >> It was easy with pts but how to rename a user
> > > >> with kas.
> >
> > > > You can't. My old trick was to use a tool which we had hacked up to
> > > > pull a key from the database, and reinject that key for the new
> > > > username, then delete the old one.
> >
> > > Is it possible to perform a similar trick directly on true Kerberos 5
> > > principals?
> >
> > Not in any recent from-MIT version.  There used to be a
> >
> >    rename_principal ${oldname} ${newname}
> >
> > command in kadmin[.local] but it vanished at some point.
> > We've been adding it back in ever since here as we end
> > up doing a couple hundred renames a year.
> ...
>
> Oddly enough, we also add in support for rename_principal to our copy
> of MIT kerberos (umich.edu).  The main interesting complication is
> handling salt right.  We probably do several hundred of these a year.
> In addition to handling kerberos and pts, it's also necessary (in our
> environment) to rename the user volume, its mount point, the entry in
> the password file, the imap mailbox, the ldap directory entry, and to
> locate and change any ldap directory attributes that point to that
> directory entry.  Also there's a local oracle database with billing
> information, and some data in peoplesoft, and an entry in MS active
> directory, and another directory entry in Novell eDir, and...


We use moira to handle all the changes in our central services,
which include kerberos, moira mailing lists, nfs groups if there is
one, creating a moira mailing list for the old name that gets
forwarded for a year, rename disk & print quota grants, afs
filesystem, pts entry, groups & mountpoint, updating locker
ownership, updating ldap attributes, updating active directory & novell,
change webct, update name servers (we do username.mail.iastate.edu
for mail servers, we use hesiod directory services, and we provide
username.public.iastate.edu for web services), update majordomo
lists, update mailbox names, change the finger server with .plan &
.project files, rename all the possible kerberos instances as well as
the base instance, update the online phone book, propagating the
username changes off to everyone else's databases and there's
probably more that I'm missing. All of that is automated, and I'm
working on giving privileges to do it to all of the full time employees
at our help desk, rather than just a couple. John has done most of
the heavy lifting on making it work though, I just ask for what I want it
to do.


> Needless to say we also discourage login changes.
> We don't yet have a way to change cached data in meatware.


We do a couple hundred every year.  We used to require proof of
name change, or a fee. Now we don't. The most we've ever done
in a year was 1644, and the number we do now is down probably
because everyone comes in more savvy about usernames, and
we point out fairly obviously that this is going on your placement
account, therefore companies you are going to be applying to work
for will see this.

If I recall correctly, our method for handling the salt correctly for
any enctype now involves having the person set a new password
when they change their username.

-Tracy


Re: [OpenAFS] AFS-Backup-Limits

2005-12-27 Thread Tracy Di Marco White
On 12/27/05, Chris Huebsch <[EMAIL PROTECTED]> wrote:
> On Tue, 27 Dec 2005, Tracy Di Marco White wrote:
>
> > We've been adding several 1.2+ TB servers, and it has become no longer
> > reasonable to put a tape drive on every server, as we had been doing.
>
> You do not need a tape drive on every server. AFS Backup can send its
> backup via the network to another afs-backup-server.

Right.  I started using that on our new servers that we added before
the new backup server was in production.

> > Our full backups were taking longer than a day, sometimes three or
> > four days, and things were set up so that it was more complicated to
> > do incremental backups while the full backups were running.
>
> This is really ugly. Did you evaluate the reason for that? Are the disks
> too slow, or the tape-drives or the system-bus of your server machines?

AFS seemed to be our bottleneck.

-Tracy


Re: [OpenAFS] AFS-Backup-Limits

2005-12-27 Thread Tracy Di Marco White
On 12/27/05, Frank Burkhardt <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Mon, Dec 26, 2005 at 02:31:34PM -0600, Tracy Di Marco White wrote:
>
> [snip]
>
> > We stopped using the AFS backup system two weeks ago.
>
> What were the reasons?

We've been adding several 1.2+ TB servers, and it has become no longer
reasonable to put a tape drive on every server, as we had been doing. 
Our full backups were taking longer than a day, sometimes three or
four days, and things were set up so that it was more complicated to
do incremental backups while the full backups were running. Moving to
a single backup server that can backup AFS, Unix, Windows, and Novell
may make my life easier.  Eventually.


Re: [OpenAFS] AFS-Backup-Limits

2005-12-26 Thread Tracy Di Marco White
On 12/26/05, Frank Burkhardt <[EMAIL PROTECTED]> wrote:
> Hi,
>
> are there any known limits to OpenAFS' backup database? I'm most
> interested in:
>
>   * max number of volume sets
>   * max number of tapes
>   * max number of dumps

I think we were doing 60 full tapes a week, with the disk partitions
sized so that a whole number of disk partitions fit on each tape.
There were a minimum of 60 incremental tapes, as they could ask
for additional tapes. We kept tapes on a three week rotation, but
occasionally would end up with double that number in the
database.  We had something under 60k volumes, and almost
everything got backed up.  We did one full tape per volume set.
We stopped using the AFS backup system two weeks ago.

-Tracy
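
For a sense of scale against the question about database limits, the figures above can be turned into a rough tape count. A back-of-the-envelope Python sketch, assuming the 60 incremental tapes are also per week (the message does not say so explicitly), reading "double that number" as twice the tapes in one rotation, and ignoring any extra tapes the incrementals asked for:

```python
FULL_TAPES_PER_WEEK = 60
INCR_TAPES_PER_WEEK = 60  # assumption: "a minimum of 60" taken as a weekly figure
ROTATION_WEEKS = 3

# Tapes tracked by the backup database during one full rotation.
tapes_in_rotation = (FULL_TAPES_PER_WEEK + INCR_TAPES_PER_WEEK) * ROTATION_WEEKS

# "Occasionally ... double that number in the database" (one possible reading).
occasional_peak = 2 * tapes_in_rotation

print(tapes_in_rotation, occasional_peak)
```

Under those assumptions the database held on the order of a few hundred tapes at a time, alongside something under 60k volumes.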



Re: [OpenAFS] OpenAFS in a production environment

2005-09-01 Thread Tracy Di Marco White
On 9/1/05, Lester Barrows <[EMAIL PROTECTED]> wrote:
> Hi Jeffrey,
>
> On Thursday 01 September 2005 6:43 pm, you wrote:
> > OpenAFS _clients_ work fine behind a NAT that provides reasonable
> > connection tracking and does not time out UDP port associations too
> > quickly.  For those that do time out such associations quickly, it is
> > possible to increase the frequency with which the cache manager polls the
> > fileserver, resulting in a "keep-alive" effect, but this has the
> > disadvantage of additional load on the network and fileservers.
>
> OpenAFS clients in excess of one system work poorly behind any NAT I've ever
> put them behind, be that hardware such as those on Cisco or Foundry routers,
> or software such as iptables with the Linux kernel. There may be a few types
> of NATs which work properly, and increasing polling frequency may indeed
> help, but from an architectural standpoint I wouldn't recommend placing
> several AFS clients behind a NAT. It's simply asking for trouble from my
> experience, which is the context in which my response was written.

I have three clients in my living room and five more clients in my home
office that all do AFS quite happily through a NAT.  Only two of
them are OpenAFS, the rest are arla, and the only drawback I have seen
is that reads are somewhat slow with OpenAFS through the NAT. 
Reads are fine with arla and writes are close enough to wire/disk
speeds for both OpenAFS & arla.

-Tracy



Re: [OpenAFS] Large volumes -- anyone using?

2005-08-24 Thread Tracy Di Marco White
On 8/24/05, Russ Allbery <[EMAIL PROTECTED]> wrote:
> Tracy Di Marco White <[EMAIL PROTECTED]> writes:
> 
> > I've had a vos move run about 36 hours without timing out running recent
> > versions of OpenAFS on the servers, and I have had very small volumes
> > time out on vos moves using Transarc AFS on the servers.  We also had
> > vos releases failing when vos moves were failing.
> 
> The latter is the standard problem with a single-threaded volserver.  I
> have great hopes for 1.4 finally putting that one to bed.

Things got much better when I upgraded all my general afs fileservers
to openafs 1.2.11, actually.

-Tracy


Re: [OpenAFS] Large volumes -- anyone using?

2005-08-24 Thread Tracy Di Marco White
On 8/24/05, David Thompson <[EMAIL PROTECTED]> wrote:
> "Dexter 'Kim' Kimball" wrote:
> >If you've got experience with large volumes (tens to hundreds of GB) I'd
> >much appreciate any experiences you may have had, good bad or indifferent.
> 
> We run a mirror site for software distributions (mirror.cs.wisc.edu) that is
> backed in afs, with many volumes in the 10-50 GB range.  We've seen very few
> operational issues, although we do have annoyance problems with 'vos move's
> timing out.

I've had a vos move run about 36 hours without timing out running
recent versions of OpenAFS on the servers, and I have had very small
volumes time out on vos moves using Transarc AFS on the servers.  We
also had vos releases failing when vos moves were failing.

-Tracy


Re: [OpenAFS] 8gb limit?

2005-08-19 Thread Tracy Di Marco White
On 8/16/05, Todd T. Fries <[EMAIL PROTECTED]> wrote:
> I believe I've run into an 8gb volume limit on OpenBSD/i386 3.8-beta.
> 
> I'm running cvs head, and found that 'file too large' errors were being
> given when trying to write files to my root.cell volume.. I removed a
> 300mb file and was able to write several smaller files.  Then I noticed
> this and someone on #openafs on freenode suggested there was a historic
> 8gb volume limit that should be gone by now.
> 
> $ df -ih /vicepa
> Filesystem    Size   Used  Avail Capacity  iused    ifree %iused  Mounted on
> /dev/wd2g    21.7G   7.8G  12.8G    38%    26575  2879151     1%   /vicepa
> $
> 
> At the moment, I'm only `testing' OpenAFS on OpenBSD so I have all this
> in the root.cell volume, will refactor things into more reasonable
> chunks when I set things up for real.
> 
> Thoughts on this?

My largest volume so far had around 500GB in it, on a NetBSD afs
server. I've been using 70GB volumes for a few years now, on Digital 
Unix Transarc afs servers, so it isn't that new.
-Tracy


Re: [OpenAFS] [1.3.86] heimdal/krb5 auth for BOS requests fails during initial cell setup

2005-08-08 Thread Tracy Di Marco White
On 8/8/05, Brandon S. Allbery KF8NH <[EMAIL PROTECTED]> wrote:
> On Mon, 2005-08-08 at 19:07 -0500, Tracy Di Marco White wrote:
> > On 8/8/05, Brandon S. Allbery KF8NH <[EMAIL PROTECTED]> wrote:
> > > On Tue, 2005-08-09 at 01:39 +0200, scorch wrote:
> > > > -r /on my heimdal install. doing a klist -T  hangs though.
> >
> > You can't get tokens without afs/arla kernel module loaded, and maybe
> > arla/afsd running.
> 
> But in that case klist will quickly discover that there's no pioctl() in
> the kernel and not try to list tokens.  Except on Solaris when built
> with optimization, in which case it still does weird stuff because kafs
> returns trash instead of a meaningful response (valid token or error).

This is what you get on a NetBSD box when you try to do most any
command without arla running and without using -local (with KeyFile
access):
# bos status afs-5
Bad system call(core dumped)

Loading the arla kernel module makes it much happier.  klist doesn't
hang for me, I just don't get tokens.  I have seen klist -T hang
when it can't find the CellServDB on NetBSD.

-Tracy


Re: [OpenAFS] [1.3.86] heimdal/krb5 auth for BOS requests fails during initial cell setup

2005-08-08 Thread Tracy Di Marco White
On 8/8/05, Brandon S. Allbery KF8NH <[EMAIL PROTECTED]> wrote:
> On Tue, 2005-08-09 at 01:39 +0200, scorch wrote:
> > -r /on my heimdal install. doing a klist -T  hangs though.

You can't get tokens without afs/arla kernel module loaded, and maybe
arla/afsd running.

> This wouldn't happen to be Solaris, would it?  The kafs library needs to
> be compiled without optimization for some reason, at least with gcc, or
> the attempt to read and parse tokens from the kernel will access random
> memory leading to either very slow operation as it uselessly scans large
> tracts of memory, or core dumps.
>
> > libprot: AFS kernel pioctl doesn't exist Could not get afs tokens, running 
> > unauthenticated.
> 
> Your libafs kernel module isn't there for some reason, or isn't where
> the AFS libraries expect it.  If this is Solaris, make sure you have the
> correct entry in /etc/name_to_sysnum and reboot to activate it.

Oh, this reminds me that you don't get tokens, and can't do things that
require tokens, unless you're running either the afs client or the arla client.
IIRC you're running OpenBSD, which means either should work.

-Tracy


Re: [OpenAFS] [1.3.86] heimdal/krb5 auth for BOS requests fails during initial cell setup

2005-08-08 Thread Tracy Di Marco White
On 8/8/05, scorch <[EMAIL PROTECTED]> wrote:
> Tracy Di Marco White said the following on 2005-08-05 03:58:

> >If he's using the instructions we wrote, he's likely using heimdal, and so
> >kinit will get tokens magically if he has "afslog = yes" in "[appdefaults]"
> >in his /etc/krb5.conf.  (Sample krb5.conf on page 13, same instructions.)
> >I don't see appdefaults in his krb5.conf snippet, so I don't know if he has
> >that, but I don't see tokens in his klist, so probably not.
> >
> 
> I added the /afslog=yes/ & now I get:
> 
> [EMAIL PROTECTED]:/home/wavey $ klist
> Credentials cache: FILE:/tmp/krb5cc_1000
> Principal: wavey/[EMAIL PROTECTED]
> 
> Issued   Expires  Principal
> Aug  9 00:25:51  Aug  9 10:25:51  krbtgt/[EMAIL PROTECTED]
> Aug  9 00:25:51  Aug  9 10:25:51  afs/[EMAIL PROTECTED]
> 
> which is clearly an improvement with the AFS tickets. NB "add
> -random-key afs/example.com" has to be written as "--random-key", or
> "-r", on my heimdal install. Doing a klist -T hangs though.

You should probably ktrace it and see why it hangs.  It's likely all
the rest of your problems will go away once that's fixed.  Do you have
a CellServDB wherever it is you compiled it to go?

> I'm OK up to 'Installing the initial AFS DB server'
> 
> * Copy KeyFile created above to /usr/pkg/etc/openafs/server/KeyFile
> 
> I've not got a /usr/pkg/etc/openafs/server/KeyFile, I put it in
> /usr/afs/etc/KeyFile
> 
> But this isn't enough to restart the BOSS with just my tickets for
> authentication:
> 
> [EMAIL PROTECTED]:/usr/afs/bin $ /usr/afs/bin/bosserver -log
> [EMAIL PROTECTED]:/usr/afs/bin $ klist
> Credentials cache: FILE:/tmp/krb5cc_0
> Principal: wavey/[EMAIL PROTECTED]
> 
>   Issued   Expires  Principal
> Aug  9 00:34:11  Aug  9 10:34:11  krbtgt/[EMAIL PROTECTED]
> Aug  9 00:34:11  Aug  9 10:34:11  afs/[EMAIL PROTECTED]
> 
> [EMAIL PROTECTED]:/usr/afs/bin $ ./pts examine wavey.afs
> libprot: AFS kernel pioctl doesn't exist Could not get afs tokens, running 
> unauthenticated.
> Name: wavey.afs, id: 1, owner: system:administrators, creator: anonymous,
>   membership: 1, flags: S, group quota: unlimited.
> 
> [EMAIL PROTECTED]:/usr/afs/bin $ ./bos restart -server scorch.muse.net.nz
> bos: AFS kernel pioctl doesn't exist (getting tickets)
> bos: running unauthenticated
> bos: failed to restart servers (you are not authorized for this operation)
> 
> 
> & yet under -localauth it works. I've got my
> /usr/pkg/etc/openafs/server/KeyFile stored in /usr/afs/etc/KeyFile
> -- I assume this is the correct place based on info in the Wiki. Do you
> have any other suggestions for me?

-localauth working means you put your KeyFile in the right place.

-Tracy


Re: [OpenAFS] [1.3.86] heimdal/krb5 auth for BOS requests fails during initial cell setup

2005-08-04 Thread Tracy Di Marco White
On 8/4/05, zeroguy <[EMAIL PROTECTED]> wrote:
> On Thu, 04 Aug 2005 07:40:35 +0200
> scorch <[EMAIL PROTECTED]> wrote:
> [...]
> > -- thanks :-) but I'm stuck after switching out of -noauth, despite
> > having seeming correct k5 tickets. My guess is that I need something
> > like aklog, or my krb configuration but I am lost for the obvious
> > answer.
> 
> You need to run aklog. There's not a whole lot else you need to know
> (it just grants you your afs token from your krb tickets). Just 'aklog',
> no arguments, immediately after you run a successful kinit. Unless I'm
> missing something and there's something special about your setup, that
> is all you are missing.

If he's using the instructions we wrote, he's likely using heimdal, and so
kinit will get tokens magically if he has "afslog = yes" in "[appdefaults]"
in his /etc/krb5.conf.  (Sample krb5.conf on page 13, same instructions.)
I don't see appdefaults in his krb5.conf snippet, so I don't know if he has
that, but I don't see tokens in his klist, so probably not.
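
For reference, the setting in question would look roughly like this in
/etc/krb5.conf.  This is only a minimal fragment showing the one option
mentioned above; the rest of the file ([libdefaults], [realms] and so on)
is site-specific and left out here.

```
[appdefaults]
    afslog = yes
```

With that in place, a fresh kinit should also try to fetch an AFS token.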

-Tracy


Re: [OpenAFS] [1.3.86] heimdal/krb5 auth for BOS requests fails during initial cell setup

2005-08-04 Thread Tracy Di Marco White
On 8/4/05, scorch <[EMAIL PROTECTED]> wrote:
> hi,
> 
> I've been following a number of how-to guides, the best being
> http://kula.public.iastate.edu/talks/afs-bpw-2005/afs-bpw-2005-iowa.pdf
> -- thanks :-) but I'm stuck after switching out of -noauth, despite
> having seeming correct k5 tickets. My guess is that I need something
> like aklog, or my krb configuration but I am lost for the obvious answer.

Thanks! Glad it's helpful.

> After page 33, I switch after running in -noauth to 'restart BOS server
> with authentication'. I always receive the following error:
> [EMAIL PROTECTED]:/usr/afs/bin $ ./bos shutdown mercury.muse.net.nz -noauth
> bos: failed to shutdown servers (you are not authorized for this operation)
> despite all my best kinit efforts. I'm sure I am missing something
> obvious but I can't find info in the logs. Any suggestions on how to
> proceed?

If you're getting that message, I suspect bosserver isn't running with
-noauth anymore, so bos shutdown can't be run with -noauth either.  You
may still be able to use -localauth if you're running bos shutdown on the
fileserver itself and your shell can read the KeyFile.

-Tracy


Re: [OpenAFS] San Storage & AFS servers

2005-06-17 Thread Tracy Di Marco White
On 6/17/05, Steve Devine <[EMAIL PROTECTED]> wrote:
> I question how best to configure our servers. Currently we have Dell
> Poweredges with dual controllers attached to split bus disk shelves
> with  ( 10 )36gig drives mirrored to provide (5 )36Gig volumes that we
> mount as /vicepa  /vicepb ; etc.
> Do we set up one lun as /vicepa and make it say 180 Gig as this would
> represent the aggregate size of the old disk shelves?
> Or can I go up to 256/ 512 Gig?  I realize this is not something that
> can be objectively answered (My mileage may vary).
> I am interested however in how others have configured their file servers
> and what size they are setting their volumes.

We have volume sizes up to 75GB, although we don't have very many of those.
Our user volumes (students/staff/faculty) have a base quota of 1GB, and users
can raise it at whim, with a charge for usage, not quota.  Our partition sizes
range from 1GB (old disks, going away, mounted on a for-pay basis for cheaper
disk at the time) to 25GB (older servers, based on what could fit on a 20/40
DLT), to 50GB (three/four partitions on a 160/320 SDLT), and our new servers
are currently partitioned with 300GB & 600GB partitions, although that's still
somewhat in flux.

-Tracy


Re: [OpenAFS] AFS over high(ish) latency link

2005-05-18 Thread Tracy Di Marco White

In message <[EMAIL PROTECTED]>, Peter Nelson writes:
>Hi, I'm wondering if anyone has experience using AFS over higher latency 
>links, where by higher I mean residential broadband.  I am only able to 
>achieve rates of about 150k/sec across my cable connection using AFS 
>while I can download at about 500k/sec using http from the same server.  
>I believe the culprit is the fact that my ping times are almost 100ms 
>and somewhere the rx window size is limiting the amount of in-flight 
>data.  Does anyone have suggestions as to what to tune, including 
>#DEFINE's in the source?

I access my afs cell at work over my broadband link while at home regularly,
and I access my afs cell at home from work, pretty much just as regularly.
The only problems I've really noticed are NAT related, rather than broadband
related.  And even then, things work through my NAT; it's just that reads are
slow, while writes seem to happen at wire/disk speeds.  (Slow meaning
I can play mp3s through my NAT out of AFS, but not videos.  Particularly
bad were the Star Wars trailers at high resolutions.)  I'm not sure, but I
think most people have more problems than I do with NATs & AFS (in part
because arla doesn't have the slow read problem through a NAT, and most
of my machines are NetBSD, and so the AFS client I use most must be Arla).
Experiments with OpenAFS for Windows & Mac OS X show the problem with
very slow reads on both.
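
As a rough sanity check on the window theory in the question, the amount of
data that has to be in flight is just rate times round-trip time.  A
back-of-the-envelope sketch in shell, assuming the quoted rates are kilobytes
per second and taking the round trip as a flat 100ms (inflight_kb is a made-up
helper name, not an AFS tool):

```shell
# inflight_kb RATE_KB_PER_SEC RTT_MS
# Kilobytes that must be in flight to sustain RATE at the given round-trip time.
# Hypothetical helper for illustration only; uses integer arithmetic.
inflight_kb() {
    echo $(( $1 * $2 / 1000 ))
}

inflight_kb 150 100   # the rate seen over AFS
inflight_kb 500 100   # the rate seen over http
```

That works out to about 15KB in flight for the AFS transfer and 50KB for http.
It says nothing about whether the limit comes from the rx window or from
something else, such as a NAT.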

-Tracy


Re: [OpenAFS] 1.3.80 server strangeness (kernel 2.6.11-gentoo-r3)

2005-03-29 Thread Tracy Di Marco White
On Tue, 29 Mar 2005 18:17:38 -0500, Kevin <[EMAIL PROTECTED]> wrote:
> But without explanation this morning at 0400, all of the server
> instances on this machine just shut down.  The bosserver was still up,
> but log entries in /usr/afs/logs showed what seemed to be a normal
> termination for all the others; only I didn't order it.  I'm 100%
> certain that foul play is not a factor because the network hosting the
> the cell is (for the time being), not even accessible from without.  I
> tried a:
> # bos status server -long -localauth
> bos: failed to contact host's bosserver (communications failure (-1)).
> 
> but no explanation there.
> 
> Similarly, nothing in /var/log/messages.
> 
> Any ideas on where else to look for a reason for the shutdown?

4am sounds like the weekly restart, but bos should still respond.
bos getrestart -server 
will tell you about the scheduled restarts.

-Tracy


Re: [OpenAFS] adding a group to a group?

2005-03-07 Thread Tracy Di Marco White
On Sat, 05 Mar 2005 16:31:15 +0100, Lars Schimmer
<[EMAIL PROTECTED]> wrote:
> I just wanted to create a group named "all" in which all other groups should
> be member in OpenAFS.
> But the pts adduser only let me add users to groups. So how can I add a group
> named "fooo" to a group named "all" ?

Not answering the question you asked, but have you looked at system:authuser
if all you want is an ACL that contains all your authenticated users?  Or
system:anyuser if you want everyone everywhere with AFS access on an ACL.


Re: [OpenAFS] OpenAFS FreeBSD Info...

2005-03-04 Thread Tracy Di Marco White

In message <[EMAIL PROTECTED]>, Esther Filderman writes:
>You should be able to run the NetBSD server port on FreeBSD, i seem to recall.
>
>As for clients, well, you could run Arla, but it's not a fully
>functional client -- you can't do much fileserver controlling
>remotely.

For NetBSD I install Arla & OpenAFS (without the kernel module) on any
machine where I need any basic fileserver control, and just Arla on
any other client machine.  I just make sure they install into different
areas.  Should be essentially the same for FreeBSD.

-Tracy


Re: [OpenAFS] MacOSX with reliable AFS homedirs?

2005-02-03 Thread Tracy Di Marco White

In message <[EMAIL PROTECTED]>, Troy Benjegerdes writes:
>Has anyone gotten Krb5, ldap, and AFS homedirs working reliably?

Have you looked at the ISU OS X documentation?
http://tech.ait.iastate.edu/macosx/

I'm just using krb5 & AFS, no LDAP, but mine is mostly a single user
machine.

>We've had to resort to setting up each individual users with a startup
>items script to run aklog.

I know the ISU lab documentation talks about using LDAP:
http://tech.ait.iastate.edu/macosx/how-to/labs-10.3.shtml

>I've tried the 'kfm_aklog' plugin, but it doesn't seem to work, and none
>of the apple login hook stuff seems to work. 
>
>What is the equivalent of a linux PAM line like:
>
>sessionlibpam-openafs-session.so debug

PAM I'm not really using yet, so I can't help there.

-Tracy