Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-05 Thread Jose M Calhariz
On Sun, Feb 04, 2018 at 05:21:16PM -0500, Jeffrey Altman wrote:
> On 2/4/2018 7:29 AM, Jose M Calhariz wrote:
> > I am chasing the root problem in my infra-structure of afsdb and
> > afs-fileservers.  Sometimes my afsdb loses quorum in the middle of a
> > vos operation or the Linux clients time out talking to the
> > file servers.  To help diagnose the problem I would like to know how
> > long is the timeout and if I can change the time out connections in
> > the Debian clients and for the vos operations.
> >[...]
> > The core of my infra-structure are 4 afsdb running Debian 9, and using
> > OpenAFS from Debian 1.6.20, on a shared virtualization platform.  The
> > file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> > are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.
> 
> Jose,
>

(...)

Thank you for your report.  I will read it very carefully tonight and
again tomorrow.  I am travelling home from FOSDEM.

> 
> Jeffrey Altman
> AuriStor, Inc.


Kind regards
Jose M Calhariz


-- 
--

De cem favoritos dos reis, noventa e cinco foram enforcados

--Napoleão Bonaparte
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Jeffrey Altman
On 2/4/2018 7:54 AM, Dirk Heinrichs wrote:
> On 04.02.2018 at 13:29, Jose M Calhariz wrote:
> 
>> The core of my infra-structure are 4 afsdb
> 
> Wasn't it so that it's better to have an odd number of DB servers (with
> a max. of 5)?

The maximum number of ubik servers in an AFS3 cell is 20.  This is a
protocol constraint.  However, due to performance characteristics it is
unlikely that anyone could run that number of servers in a production
cell.  As the server count increases, so does the number of messages that
must be exchanged to conduct an election, complete database
synchronization recovery, maintain quorum, and complete remote
transactions.  These messages compete with the application level requests
arriving from clients.  As the application level calls (vl, pt, ...)
increase, the risk of delayed processing of disk and vote calls increases,
which can lead to loss of quorum or remote transaction failures.

The reason that odd numbers of servers are preferred is because of the
failover properties.

one server - single point of failure.  outage leads to read and write
failures.

two servers - single point of failure for writes.  only the lowest ipv4
address server can be elected coordinator.  if it fails, writes are
blocked.  If it fails during a write transaction, read transactions on
the second server are blocked until the first server recovers.

three or four servers - either the first or second lowest ipv4 address
servers can be elected coordinator.  any one server can fail without
loss of write or read.

five or six servers - any of the first three lowest ipv4 address servers
can be elected coordinator.  any two servers can fail without loss of
write or read.

Although adding a fourth server increases the number of servers that can
satisfy read requests, the lack of improved resiliency to failure and
the increased risk of quorum loss make it less desirable.
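
The failover figures above can be tabulated with a short sketch.  Note
that the two formulas below are an assumption: they are simply fitted to
the numbers quoted in this message for one to six servers and are not
taken from the ubik source.  The function name is hypothetical.

```python
def ubik_failover(n_servers: int) -> tuple:
    """Return (eligible_coordinators, tolerated_failures) for a cell size.

    Assumption: both formulas are fitted to the figures quoted in this
    message for one to six servers; they are not taken from the ubik source.
    """
    if n_servers < 1:
        raise ValueError("a cell needs at least one database server")
    eligible = (n_servers + 1) // 2   # lowest-address servers that can be elected
    tolerated = (n_servers - 1) // 2  # servers that may fail in the worst case
    return eligible, tolerated


for n in range(1, 7):
    eligible, tolerated = ubik_failover(n)
    print(f"{n} servers: {eligible} eligible coordinator(s), "
          f"{tolerated} tolerated failure(s)")
```

Under these assumptions a fourth server gives the same result as three,
which is the point made above.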


The original poster indicated that his ubik servers are virtual
machines.  The OpenAFS Rx stack throughput is limited by the clock speed
of a single processor core.  The 1.6 ubik stack is further limited by
the need to share a single processor core with all of the vote, disk and
application call processing.  As a result, anything that increases the
overhead increases the risk of quorum failures.

This includes virtualization as well as the overhead imposed as a result
of Meltdown and Spectre fixes.  Meltdown and Spectre can deliver a
double whammy as a result of increased overhead both within the virtual
machine and within the host's virtualization layer.

AuriStor's UBIK variant does not suffer the scaling problems of AFS3
UBIK.  AuriStor's UBIK has been successfully tested with 80 ubik servers
in a cell.  This is possible because of a more efficient protocol that is
incompatible with AFS3 UBIK and the efficiencies in AuriStor's Rx
implementation.

Jeffrey Altman
AuriStor, Inc.


Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Jeffrey Altman
On 2/4/2018 7:29 AM, Jose M Calhariz wrote:
> I am chasing the root problem in my infra-structure of afsdb and
> afs-fileservers.  Sometimes my afsdb loses quorum in the middle of a
> vos operation or the Linux clients time out talking to the
> file servers.  To help diagnose the problem I would like to know how
> long is the timeout and if I can change the time out connections in
> the Debian clients and for the vos operations.
>[...]
> The core of my infra-structure are 4 afsdb running Debian 9, and using
> OpenAFS from Debian 1.6.20, on a shared virtualization platform.  The
> file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.

Jose,

There is unlikely to be a single problem but since I'm procrastinating
and curious I decided to perform some research on your cell.  This
research is the type of analysis that AuriStor performs on behalf of our
support customers.  Many of the problems you are experiencing with
OpenAFS are likely due to or exacerbated by architectural limitations
that are simply not present in AuriStorFS.

Your cell has four db servers afs01 through afs04 with associated IP
addresses that rank the servers from afs01 through afs04.  Therefore
afs01 is the preferred coordinator (sync site) and if it's not running
afs02 will be elected.  Given there are four servers it is not possible
for afs03 or afs04 to be elected.

There are of course multiple independent ubik database services (vl, pt,
and bu) and it is possible for quorum to exist for one and not for others.

The vl service is used to store volume location information as well as
fileserver/volserver location information.  vl entries are modified when
a fileserver restarts, when a vos command locks and unlocks an entry, or
creates, updates or deletes an entry.   Its primary consumer is the afs
client which queries volume and file server location information.

The pt service stores user and group entries.  pt entries are modified
by pts when new user entries are created, modified or deleted; and when
groups are created, modified or deleted; or when group membership
information is modified.  The primary consumer is the fileserver which
queries the pt service for user and host current protection sets each
time a client establishes an rxkad connection to the fileserver.

The vl and pt services are of course ubik services.  Therefore each
vlserver and ptserver process also offers the ubik disk and vote
services which are critical.  The vote service is used to hold
elections, distribute current database version info, and maintain
quorum.  The disk service is used to distribute the database, update the
database, and maintain database consistency.  It should be noted that
the vote service is time sensitive in that packets that are used to
request votes from peers and the responses only have a limited valid
lifetime.

Some statistics regarding your vl service.  Each server is configured
with 16 LWP threads.  afs03 and afs04 have both failed to service calls
in a timely fashion since the last restart.  If those failures were vote
or disk calls then the coordinator would mark afs03 and afs04 as
unreachable, force a recovery operation, and if both were marked down
across an election the result could be loss of quorum.

Since the last restart afs01 has processed 1894352 vl transactions,
afs02 1075698 transactions, afs03 2059186 transactions, and afs04
1403592 transactions.  That will provide you some idea of the load
balancing across your cache managers. The coordinator of course is the
only one to handle write transactions; the rest are read transactions.

For the pt service the transaction counts are afs01 1818212, afs02
1619962, afs03 1554918, and afs04 1075620.  Roughly on par with the vl
service load.  Like the vl service each server has 16 LWP threads.
However, unlike the vl service the pt service is not keeping up with the
requests.  Since the last restart all four servers have failed to
service incoming calls in a timely manner thousands of times each.
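
To make the load distribution easier to see, the transaction counts above
can be turned into per-server shares.  This is only arithmetic on the
numbers quoted in this message; the variable and function names are
illustrative.

```python
# Transaction counts since the last restart, as quoted above.
vl = {"afs01": 1894352, "afs02": 1075698, "afs03": 2059186, "afs04": 1403592}
pt = {"afs01": 1818212, "afs02": 1619962, "afs03": 1554918, "afs04": 1075620}


def shares(counts):
    """Return each server's percentage of the total transaction count."""
    total = sum(counts.values())
    return {host: 100.0 * n / total for host, n in counts.items()}


for name, counts in (("vl", vl), ("pt", pt)):
    print(f"{name}: {sum(counts.values())} transactions in total")
    for host, pct in shares(counts).items():
        print(f"  {host}: {pct:.1f}%")
```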

The pt service failing to be responsive is a problem because it has
ripple effects on the file servers.  The longer it takes a fileserver to
query the CPS data the longer it takes to accept a new connection from a
cache manager.

The ubik services in all versions of OpenAFS prior to the 1.8 branch
have been built as LWP (cooperatively threaded) processes.  There is
only a single thread in the process that swaps context state.  The rx
threads (listener, event, ...), the vote, disk, and application (vl, pt,
bu, ...) contexts are swapped in either upon a blocking event or a
yield.  Failure of a context to yield blocks other activities including
reading packets, processing requests, etc.  Like AuriStorFS the OpenAFS
1.8 series converts the ubik services (vl, pt, bu) to native threading.
This will permit the vote and disk services and the rx threads
(listener, event,...) to operate with greater parallelism.  Unlike

Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Jose M Calhariz
On Sun, Feb 04, 2018 at 01:27:07PM -0600, Benjamin Kaduk wrote:
> On Sun, Feb 04, 2018 at 12:29:30PM +, Jose M Calhariz wrote:
> > 
> > Hi,
> > 
> > I am chasing the root problem in my infra-structure of afsdb and
> > afs-fileservers.  Sometimes my afsdb loses quorum in the middle of a
> 
> It is a pretty disruptive event to lose quorum; do you have any idea
> what might be responsible for that happening?

Recently I have twice seen a "vos release" of a critical volume fail.
I may have misinterpreted the error message, so I paste the most
recent one here:

Could not release lock on the VLDB entry for volume XXX
u: major synchronization error
Error in vos release command.
u: major synchronization error



> 
> > vos operation or the Linux clients time out talking to the
> > file servers.  To help diagnose the problem I would like to know how
> > long is the timeout and if I can change the time out connections in
> > the Debian clients and for the vos operations.  My plan is to increase and
> 
> The ubik election to determine quorum happens every SMALLTIME (60)
> seconds, but normally the current coordinator will retain that role
> and operations can span multiple election cycles.
> 
> Most of the timeouts involved (e.g., RX_IDLE_DEAD_TIME and
> AFS_RXDEADTIME) are also on the order of a minute.
> 
> I think you'd need to recompile in order to adjust these timeouts,
> though.  And I really would recommend tracking down why you're
> losing quorum before trying to paper over things with longer
> timeouts.

I am also chasing a second problem, where a Debian OpenAFS client fails
to communicate with the fileserver, and this problem is frequent.  Can I
assume that this timeout is about 60 seconds, and that I need to
recompile the client to increase or decrease it?




> 
> -Ben
> 
> > decrease the timeouts in OpenAFS and other timeouts in Linux to
> > identify if I have a possible problem with the data network, iSCSI
> > network, overload on the hosts of VM, overload on the file servers or
> > other possible problem.
> > 
> > The core of my infra-structure are 4 afsdb running Debian 9, and using
> > OpenAFS from Debian 1.6.20, on a shared virtualization platform.  The
> > file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> > are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.
> > 
> > 
> > Kind regards
> > Jose M Calhariz
> > 
> 

Kind regards
Jose M Calhariz


-- 
--
.adanibober odnes enilgaT .edraugA


Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Benjamin Kaduk
On Sun, Feb 04, 2018 at 12:29:30PM +, Jose M Calhariz wrote:
> 
> Hi,
> 
> I am chasing the root problem in my infra-structure of afsdb and
> afs-fileservers.  Sometimes my afsdb loses quorum in the middle of a

It is a pretty disruptive event to lose quorum; do you have any idea
what might be responsible for that happening?

> vos operation or the Linux clients time out talking to the
> file servers.  To help diagnose the problem I would like to know how
> long is the timeout and if I can change the time out connections in
> the Debian clients and for the vos operations.  My plan is to increase and

The ubik election to determine quorum happens every SMALLTIME (60)
seconds, but normally the current coordinator will retain that role
and operations can span multiple election cycles.

Most of the timeouts involved (e.g., RX_IDLE_DEAD_TIME and
AFS_RXDEADTIME) are also on the order of a minute.

I think you'd need to recompile in order to adjust these timeouts,
though.  And I really would recommend tracking down why you're
losing quorum before trying to paper over things with longer
timeouts.
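
If it helps to measure the effective client-side timeout rather than read
it from the source, something like the following could be run on an
affected client.  This is a rough sketch: the path is hypothetical, and
the only assumption is that the failure surfaces as ETIMEDOUT
("Connection timed out"), as in the reported errors.

```python
import errno
import os
import time


def time_operation(path):
    """Stat `path` and return (elapsed_seconds, errno or None)."""
    start = time.monotonic()
    try:
        os.stat(path)
        err = None
    except OSError as exc:
        err = exc.errno
    return time.monotonic() - start, err


if __name__ == "__main__":
    # Hypothetical path; substitute a file in the affected volume.
    elapsed, err = time_operation("/afs/example.org/some/file")
    if err == errno.ETIMEDOUT:
        print(f"timed out after {elapsed:.1f} s")
    elif err is not None:
        print(f"failed after {elapsed:.1f} s: {os.strerror(err)}")
    else:
        print(f"ok in {elapsed:.3f} s")
```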

-Ben

> decrease the timeouts in OpenAFS and other timeouts in Linux to
> identify if I have a possible problem with the data network, iSCSI
> network, overload on the hosts of VM, overload on the file servers or
> other possible problem.
> 
> The core of my infra-structure are 4 afsdb running Debian 9, and using
> OpenAFS from Debian 1.6.20, on a shared virtualization platform.  The
> file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.
> 
> 
> Kind regards
> Jose M Calhariz
> 
> -- 
> --
> 
> A Coca-Cola encarna a verdadeira beleza do capitalismo. Ela é uma espécie de 
> religião secular, sem ensinamento moral nem outro mandamento que não seja o 
> aumento do consumo de sua bebida
> 
> --Mark Pendergrast


Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Jose M Calhariz
On Sun, Feb 04, 2018 at 01:54:26PM +0100, Dirk Heinrichs wrote:
> On 04.02.2018 at 13:29, Jose M Calhariz wrote:
> 
> > The core of my infra-structure are 4 afsdb
> 
> Wasn't it so that it's better to have an odd number of DB servers (with
> a max. of 5)?

Yes, it would be better with an odd number.  For historical reasons it
is stuck at 4.  But I think this is not the root cause of my problem.

> 
> Bye...
> 
>     Dirk
> 

Kind regards
Jose M Calhariz


-- 
--

A Coca-Cola encarna a verdadeira beleza do capitalismo. Ela é uma espécie de 
religião secular, sem ensinamento moral nem outro mandamento que não seja o 
aumento do consumo de sua bebida

--Mark Pendergrast


Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Dirk Heinrichs
On 04.02.2018 at 13:29, Jose M Calhariz wrote:

> The core of my infra-structure are 4 afsdb

Wasn't it so that it's better to have an odd number of DB servers (with
a max. of 5)?

Bye...

    Dirk

-- 
Dirk Heinrichs 
GPG Public Key: D01B367761B0F7CE6E6D81AAD5A2E54246986015
Sichere Internetkommunikation: http://www.retroshare.org
Privacy Handbuch: https://www.privacy-handbuch.de






[OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Jose M Calhariz

Hi,

I am chasing the root problem in my infra-structure of afsdb and
afs-fileservers.  Sometimes my afsdb loses quorum in the middle of a
vos operation or the Linux clients time out talking to the
file servers.  To help diagnose the problem I would like to know how
long is the timeout and if I can change the time out connections in
the Debian clients and for the vos operations.  My plan is to increase and
decrease the timeouts in OpenAFS and other timeouts in Linux to
identify if I have a possible problem with the data network, iSCSI
network, overload on the hosts of VM, overload on the file servers or
other possible problem.

The core of my infra-structure are 4 afsdb running Debian 9, and using
OpenAFS from Debian 1.6.20, on a shared virtualization platform.  The
file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.


Kind regards
Jose M Calhariz

-- 
--

A Coca-Cola encarna a verdadeira beleza do capitalismo. Ela é uma espécie de 
religião secular, sem ensinamento moral nem outro mandamento que não seja o 
aumento do consumo de sua bebida

--Mark Pendergrast


Re: [OpenAFS] Connection timed out on new mount point

2016-12-02 Thread Dirk Heinrichs
On 02.12.2016 at 17:48, Jeffrey Altman wrote:

> The client has cached information for the volume group that indicates
> that no backup volume exists.
>
>   fs checkvolumes

That solved it, indeed.

Thanks a lot.

Bye...

Dirk

-- 
Dirk Heinrichs 
GPG Public Key CB614542 | Jabber: dirk.heinri...@altum.de
Tox: he...@toxme.se
Sichere Internetkommunikation: http://www.retroshare.org
Privacy Handbuch: https://www.privacy-handbuch.de



Re: [OpenAFS] Connection timed out on new mount point

2016-12-02 Thread Jeffrey Altman
On 12/2/2016 11:35 AM, Dirk Heinrichs wrote:
> Hi,
> 
> I'm currently facing a strange problem with connection timeouts after
> creating a mount point (fs mkm) for a new volume:
> 
> # fs mkm tester home.tester.backup
> #  ll
> ls: cannot access 'tester': Connection timed out
> total 132K
> ...
> ??   ? ?  ? ?? tester
> 
> The mount point has been created from a client workstation and only
> becomes available there after reboot or cache manager restart. OTOH,
> it's accessible immediately on the server (where /afs is usually not
> accessed):
> 
> # ll
> total 134K
> ...
> drwx--   2  1005  1001 2.0K Dec  1 21:49 tester
> 
> Both server and client are up-to-date Debian Stretch systems running
> OpenAFS 1.6.18.3.
> 
> Any ideas what could be causing the problem?
> 
> Thanks...
> 
> Dirk

The client has cached information for the volume group that indicates
that no backup volume exists.

  fs checkvolumes
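
For scripted use, the same recovery step could be wrapped roughly as
follows.  This is a sketch under assumptions: it assumes the stale entry
shows up as ETIMEDOUT ("Connection timed out") as in the report above, and
the helper names are hypothetical.  The only external command used is
"fs checkvolumes".

```python
import errno
import os
import subprocess


def fs_checkvolumes():
    """Run `fs checkvolumes` so the client discards cached volume info."""
    subprocess.run(["fs", "checkvolumes"], check=True)


def stat_with_refresh(path, refresh=fs_checkvolumes):
    """Stat `path`; on ETIMEDOUT run `refresh` once and try again."""
    try:
        return os.stat(path)
    except OSError as exc:
        if exc.errno != errno.ETIMEDOUT:
            raise
    refresh()
    return os.stat(path)
```

Any other error, or a second timeout after the refresh, is passed on to
the caller unchanged.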

Jeffrey Altman



[OpenAFS] Connection timed out on new mount point

2016-12-02 Thread Dirk Heinrichs
Hi,

I'm currently facing a strange problem with connection timeouts after
creating a mount point (fs mkm) for a new volume:

# fs mkm tester home.tester.backup
#  ll
ls: cannot access 'tester': Connection timed out
total 132K
...
??   ? ?  ? ?? tester

The mount point has been created from a client workstation and only
becomes available there after reboot or cache manager restart. OTOH,
it's accessible immediately on the server (where /afs is usually not
accessed):

# ll
total 134K
...
drwx--   2  1005  1001 2.0K Dec  1 21:49 tester

Both server and client are up-to-date Debian Stretch systems running
OpenAFS 1.6.18.3.

Any ideas what could be causing the problem?

Thanks...

Dirk

-- 
Dirk Heinrichs 
GPG Public Key CB614542 | Jabber: dirk.heinri...@altum.de
Tox: he...@toxme.se
Sichere Internetkommunikation: http://www.retroshare.org
Privacy Handbuch: https://www.privacy-handbuch.de




Re: [OpenAFS] Connection timed out - problem with cache manager?

2016-11-30 Thread Andreas Ladanyi
I am not sure.  I don't know your kernel version.

Maybe the reason is the old afs client module version. There was a
problem with the splice kernel function since kernel 4.4 and backports.

We are using the OpenAFS PPA repository
(https://launchpad.net/~openafs/+archive/ubuntu/stable) on Ubuntu releases
older than 16.10, because this problem is solved in openafs >= 1.6.18,
which isn't part of the Ubuntu repositories before 16.10.

I hope this helps.

regards,
Andreas

> Some users at our site reports problems with downloading files
> directly to AFS (and this problem has existed for years).
>
> I'm now working to try to find the cause. Just to eliminate the
> server, we have moved the user's volume to our YFS server, but we
> experience exactly the same problem.
>
> I can't seem to reproduce it on my own machine (Ubuntu 14.04.1 LTS
> with openafs client 1.6.7-1ubuntu1.1).
>
> However, the machine where I have managed to reproduce the problem is
> a terminal server (with lots of users). It's a Ubuntu 12.04.5 LTS with
> openafs version 1.6.1-1+ubuntu0.7.
>
> The AFS cache is set to:
> > cat /etc/openafs/cacheinfo
> /afs:/cache/openafs:500
>
>
> What happens is this:
> I run a wget (from siemens in this case, but probably not important).
> The wget either aborts at 70% or so, with a "Connection timed out",
> or, as happened for me just now:
>
> HTTP request sent, awaiting response... 200 OK
> Length: 1983588866 (1,8G) [application/zip]
> Saving to: `nx-9.0.3.zip.1'
>
> 100%[>] 1 983 588 866 17,7M/s   in
> 1m 50s
>
> utime(nx-9.0.3.zip.1): Connection timed out
> 2016-11-30 11:33:39 (17,3 MB/s) - `nx-9.0.3.zip.1' saved
> [1983588866/1983588866]
>
> So, the file downloaded 100% (to the AFS cache). Then there was a
> delay for some time before the error popped up (while flushing the
> cache, I would guess).
>
> If I look at the resulting file, I see that it's corrupt.
>
> Downloading to local disk first, and then copy to AFS seems to work
> every time.
>
> Does anyone recognize this problem?
>
> /Staffan
>






[OpenAFS] Connection timed out - problem with cache manager?

2016-11-30 Thread Staffan Hämälä
Some users at our site report problems with downloading files directly
to AFS (and this problem has existed for years).


I'm now working to try to find the cause. Just to eliminate the server, 
we have moved the user's volume to our YFS server, but we experience 
exactly the same problem.


I can't seem to reproduce it on my own machine (Ubuntu 14.04.1 LTS with 
openafs client 1.6.7-1ubuntu1.1).


However, the machine where I have managed to reproduce the problem is a 
terminal server (with lots of users). It's a Ubuntu 12.04.5 LTS with 
openafs version 1.6.1-1+ubuntu0.7.


The AFS cache is set to:
> cat /etc/openafs/cacheinfo
/afs:/cache/openafs:500


What happens is this:
I run a wget (from siemens in this case, but probably not important). 
The wget either aborts at 70% or so, with a "Connection timed out", or, 
as happened for me just now:


HTTP request sent, awaiting response... 200 OK
Length: 1983588866 (1,8G) [application/zip]
Saving to: `nx-9.0.3.zip.1'

100%[>] 1 983 588 866 17,7M/s   in 1m 50s


utime(nx-9.0.3.zip.1): Connection timed out
2016-11-30 11:33:39 (17,3 MB/s) - `nx-9.0.3.zip.1' saved 
[1983588866/1983588866]


So, the file downloaded 100% (to the AFS cache). Then there was a delay 
for some time before the error popped up (while flushing the cache, I 
would guess).


If I look at the resulting file, I see that it's corrupt.

Downloading to local disk first, and then copy to AFS seems to work 
every time.
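
The workaround of downloading locally and then copying can be scripted
along these lines.  This is a minimal sketch, not a fix for the underlying
problem: the function names are hypothetical, and reading the copy back
may be served from the client cache, so a matching checksum is a sanity
check rather than proof that the data reached the file server.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(local_path, afs_path):
    """Copy a fully downloaded local file to AFS and compare checksums."""
    expected = sha256_of(local_path)
    with open(local_path, "rb") as src, open(afs_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        # Force the data out before closing so write errors are raised here.
        os.fsync(dst.fileno())
    actual = sha256_of(afs_path)
    if actual != expected:
        raise IOError(f"checksum mismatch after copy to {afs_path}")
    return actual
```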


Does anyone recognize this problem?

/Staffan





[OpenAFS] Connection timed out and device doesn't exist finally solved

2013-12-24 Thread Timothy Balcer
Very, very odd behavior. In short: an entire fileserver's RW volumes
became unavailable to our colo sites, but not to the local site. Every
effort to determine the cause was met with frustration (all sorts of
cache manager operations yielded nothing).

That is, until I did an fs whereis on the affected volume, on the
fileserver machine itself...

It told me the RW volume was available on host 192.168.122.1. Formerly a
virtual host bridge interface, but no longer used.

The VLDB did not show this, and runs of syncserv and syncvldb had not
fixed the problem. Restarting the fileserver process did not release it,
even though the IP was no longer active.

So I moved one volume. That worked. But I didn't want to do that for the
entire fileserver.

So I added -rxbind to the fileserver process options and restarted it.

Voila. Problem solved.

-- 
Timothy Balcer / IT Services
Telmate / San Francisco, CA
Direct / (415) 300-4313
Customer Service / (800) 205-5510


Re: [OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-19 Thread Ken Elkabany
I upgraded our server and client to 1.4.10. Unfortunately, I am still
receiving Connection Timed Out errors. They rarely occur, but when
they do they are a severe hindrance. My use case is as follows:

Three different unix user accounts (root, www-data, aux) are all
running multiple background processes (~9 total) which access the afs
mount. They each automatically acquire, or re-acquire tickets and
tokens, and then proceed to read, copy, and write files. Occasionally,
upon creating a directory using a python os command similar to mkdir
-p (os.makedirs), I receive a Connection Timed Out error. The
processes must then be restarted.
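
As a stopgap while the cause is unknown, the directory creation could be
retried instead of failing the whole process.  This is a sketch only: the
function name, attempt count, and delay are assumptions, it relies on
os.makedirs(..., exist_ok=True) from Python 3, and it does nothing about
the underlying timeout.

```python
import errno
import os
import time


def makedirs_with_retry(path, attempts=3, delay=5.0):
    """Like os.makedirs(path, exist_ok=True), retrying on ETIMEDOUT."""
    for attempt in range(1, attempts + 1):
        try:
            os.makedirs(path, exist_ok=True)
            return
        except OSError as exc:
            # Give up on any other error, or once the attempts are used up.
            if exc.errno != errno.ETIMEDOUT or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # wait longer after each failure
```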

Any other suggestions?

Ken

On Sun, May 10, 2009 at 7:41 PM, Derrick Brashear sha...@gmail.com wrote:
 it probably matters in the server here, but both.

 Derrick


 On May 10, 2009, at 10:35 PM, Ken Elkabany k...@elkabany.com wrote:

 Is this bug fixed in the client or the server? Thanks.

 Ken

 On Sun, May 10, 2009 at 7:22 PM, Derrick Brashear sha...@gmail.com
 wrote:

 I'd venture this is a bug fixed in 1.4.10, with idle dead time
 computation
 in rx.

 Derrick


 On May 10, 2009, at 9:53 PM, Ken Elkabany k...@elkabany.com wrote:

 Hello,

 I have openafs 1.4.9 client and server running on two separate
 machines across a WAN. The client has scripts that access the
 /afs/our.cell/ directory. Occasionally, the script will fail to
 complete, and the logs will say that the Connection Timed Out on a
 mkdir -p /afs/our.cell/x/y/z command. The frequency of the errors
 are approximately 1 in 100, small enough to not be easily reproducible
 manually, but enough to hamper our project. The scripts run as the
 root user, and is guaranteed to have the proper ticket and token. It's
 also important to note that these scripts often run in parallel (4 at
 a time, all root, modifying our cell). When one fails, all scripts
 running concurrently will fail with the same error, and I typically
 either unlog;kdestroy or restart the openafs-client (I am unsure which
 of those solutions is necessary or sufficient). I will soon have an
 additional LAN setup, and will determine if the same error occurs. Has
 anyone dealt with this issue before?

 Thank you for the assistance,

 Ken


[OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-10 Thread Ken Elkabany
Hello,

I have openafs 1.4.9 client and server running on two separate
machines across a WAN. The client has scripts that access the
/afs/our.cell/ directory. Occasionally, the script will fail to
complete, and the logs will say that the Connection Timed Out on a
mkdir -p /afs/our.cell/x/y/z command. The frequency of the errors
is approximately 1 in 100, small enough to not be easily reproducible
manually, but enough to hamper our project. The scripts run as the
root user, and are guaranteed to have the proper ticket and token. It's
also important to note that these scripts often run in parallel (4 at
a time, all root, modifying our cell). When one fails, all scripts
running concurrently will fail with the same error, and I typically
either unlog;kdestroy or restart the openafs-client (I am unsure which
of those solutions is necessary or sufficient). I will soon have an
additional LAN setup, and will determine if the same error occurs. Has
anyone dealt with this issue before?

Thank you for the assistance,

Ken


Re: [OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-10 Thread Derrick Brashear
I'd venture this is a bug fixed in 1.4.10, with idle dead time  
computation in rx.


Derrick


On May 10, 2009, at 9:53 PM, Ken Elkabany k...@elkabany.com wrote:


Hello,

I have openafs 1.4.9 client and server running on two separate
machines across a WAN. The client has scripts that access the
/afs/our.cell/ directory. Occasionally, the script will fail to
complete, and the logs will say that the Connection Timed Out on a
mkdir -p /afs/our.cell/x/y/z command. The frequency of the errors
are approximately 1 in 100, small enough to not be easily reproducible
manually, but enough to hamper our project. The scripts run as the
root user, and is guaranteed to have the proper ticket and token. It's
also important to note that these scripts often run in parallel (4 at
a time, all root, modifying our cell). When one fails, all scripts
running concurrently will fail with the same error, and I typically
either unlog;kdestroy or restart the openafs-client (I am unsure which
of those solutions is necessary or sufficient). I will soon have an
additional LAN setup, and will determine if the same error occurs. Has
anyone dealt with this issue before?

Thank you for the assistance,

Ken


Re: [OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-10 Thread Ken Elkabany
Is this bug fixed in the client or the server? Thanks.

Ken

On Sun, May 10, 2009 at 7:22 PM, Derrick Brashear sha...@gmail.com wrote:
 I'd venture this is a bug fixed in 1.4.10, with idle dead time computation
 in rx.

 Derrick


 On May 10, 2009, at 9:53 PM, Ken Elkabany k...@elkabany.com wrote:

 Hello,

 I have openafs 1.4.9 client and server running on two separate
 machines across a WAN. The client has scripts that access the
 /afs/our.cell/ directory. Occasionally, the script will fail to
 complete, and the logs will say that the Connection Timed Out on a
 mkdir -p /afs/our.cell/x/y/z command. The frequency of the errors
 are approximately 1 in 100, small enough to not be easily reproducible
 manually, but enough to hamper our project. The scripts run as the
 root user, and is guaranteed to have the proper ticket and token. It's
 also important to note that these scripts often run in parallel (4 at
 a time, all root, modifying our cell). When one fails, all scripts
 running concurrently will fail with the same error, and I typically
 either unlog;kdestroy or restart the openafs-client (I am unsure which
 of those solutions is necessary or sufficient). I will soon have an
 additional LAN setup, and will determine if the same error occurs. Has
 anyone dealt with this issue before?

 Thank you for the assistance,

 Ken



Re: [OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-10 Thread Derrick Brashear

it probably matters in the server here, but both.

Derrick


On May 10, 2009, at 10:35 PM, Ken Elkabany k...@elkabany.com wrote:


Is this bug fixed in the client or the server? Thanks.

Ken

On Sun, May 10, 2009 at 7:22 PM, Derrick Brashear sha...@gmail.com wrote:
I'd venture this is a bug fixed in 1.4.10, with idle dead time computation
in rx.

Derrick


On May 10, 2009, at 9:53 PM, Ken Elkabany k...@elkabany.com wrote:


Hello,

I have openafs 1.4.9 client and server running on two separate
machines across a WAN. The client has scripts that access the
/afs/our.cell/ directory. Occasionally, the script will fail to
complete, and the logs will say that the Connection Timed Out on a
mkdir -p /afs/our.cell/x/y/z command. The frequency of the errors
are approximately 1 in 100, small enough to not be easily reproducible
manually, but enough to hamper our project. The scripts run as the
root user, and is guaranteed to have the proper ticket and token. It's
also important to note that these scripts often run in parallel (4 at
a time, all root, modifying our cell). When one fails, all scripts
running concurrently will fail with the same error, and I typically
either unlog;kdestroy or restart the openafs-client (I am unsure which
of those solutions is necessary or sufficient). I will soon have an
additional LAN setup, and will determine if the same error occurs. Has
anyone dealt with this issue before?

Thank you for the assistance,

Ken





Re: [OpenAFS] Connection timed out?

2009-03-11 Thread Harald Barth
 During this test we encounter 'Permission denied' errors, which seem to
 coincide with 'kernel: afs: failed to store file (110)' entries in
 /var/log/messages. 110=Connection timed out. The fileserver is busy but
 responsive, about 25 builds (out of 50) complete normally.

I don't know if this is a coincidence or not. I have 1.4.8 clients
that do not behave against a 1.4.2 (yeah, I know...) server:

Mar 11 13:21:18 a03c11n14 kernel: afs: Waiting for busy volume 537086116 (prj.sbc.aronh.13) in cell pdc.kth.se
Mar 11 13:21:20 a03c11n14 kernel: afs: failed to store file (network problems)
Mar 11 13:23:33 a03c11n14 last message repeated 3 times
Mar 11 13:25:26 a03c11n14 last message repeated 4 times
Mar 11 13:27:23 a03c11n14 last message repeated 4 times
Mar 11 13:29:30 a03c11n14 last message repeated 4 times
Mar 11 13:31:37 a03c11n14 last message repeated 4 times
Mar 11 13:33:39 a03c11n14 last message repeated 4 times
Mar 11 13:35:36 a03c11n14 last message repeated 4 times
Mar 11 13:37:34 a03c11n14 last message repeated 4 times
Mar 11 13:39:38 a03c11n14 last message repeated 4 times

Then silence.

Console said something like:
Call Trace: ... system_call+0x7e/0x83
 do_sys_open+0x5c/0xbe
.. Kernel panic - not syncing: Fatal exception

As this is (eh, was) a parallel job, several but not all of the clients
involved crashed like this. Unfortunately, I have no way to
reproduce it. I have moved the volume to a 1.4.8 server to start with.

Harald.




[OpenAFS] Connection timed out?

2009-03-10 Thread Robbert Eggermont
L.S.,

We are evaluating OpenAFS for use with 50 clients. One of the tests is a
kernel build on 50 clients at the same time.

During this test we encounter 'Permission denied' errors, which seem to
coincide with 'kernel: afs: failed to store file (110)' entries in
/var/log/messages. 110=Connection timed out. The fileserver is busy but
responsive, about 25 builds (out of 50) complete normally.
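
For reference, the number in the kernel message can be decoded with the standard errno table. This is only a minimal Python sketch; the helper name describe_errno is made up for illustration, and the numeric value of ETIMEDOUT is platform-specific (110 is the Linux value quoted above).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    # errno.errorcode maps numeric values to names such as "ETIMEDOUT".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Look up the timeout errno by name so the lookup works on any platform.
name, message = describe_errno(errno.ETIMEDOUT)
print(errno.ETIMEDOUT, name, message)
```

On a Linux client this should report the same "Connection timed out" text that the cache manager logs as (110).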

We are running 1.4.8 client and server, kernel  2.6.18 64-bit. Currently
all server processes run on the same server. Fileserver settings:
/usr/afs/bin/fileserver -p 128 -b 512 -l 3072 -s 3072 -vc 3072 -cb 65536
-busyat 1536 -rxpck 1024 -nojumbo

What are we doing wrong (except for the way we test;-))?

Regards,

Robbert

-- 
Robbert Eggermont   Information  Communication Theory
r.eggerm...@tudelft.nl Electr.Eng., Mathematics  Comp.Science
+31 (15) 2783234Delft University of Technology


Re: [OpenAFS] Connection timed out?

2009-03-10 Thread Felix Frank

On Tue, 10 Mar 2009, Robbert Eggermont wrote:


L.S.,

We are evaluating OpenAFS for use with 50 clients. One of the tests is a
kernel build on 50 clients at the same time.

During this test we encounter 'Permission denied' errors, which seem to
coincide with 'kernel: afs: failed to store file (110)' entries in
/var/log/messages. 110=Connection timed out. The fileserver is busy but
responsive, about 25 builds (out of 50) complete normally.

We are running 1.4.8 client  server, kernel  2.6.18 64-bits. Currently
all server processes run on the same server. Fileserver settings:
/usr/afs/bin/fileserver -p 128 -b 512 -l 3072 -s 3072 -vc 3072 -cb 65536
-busyat 1536 -rxpck 1024 -nojumbo


The number of threads seems to be more than appropriate for 50 clients.
It might be interesting to look at the output of rxdebug server 7000
during a build, especially the top, where it tells you about waiting calls
and idle threads.

Regards
Felix


Re: [OpenAFS] Connection timed out?

2009-03-10 Thread Robbert Eggermont
Felix Frank wrote:
 The number of threads seems to be more than appropriate for 50 clients.
 It might be interesting to look at the output of rxdebug server 7000
 during a build, especially the top, where it tells you about waiting calls
 and idle threads.

The test consists of an untar, make -j2, and rm. The connection timeouts
started at about 22:05 (during the make).

rxde...@server:
 2009-03-09T21:15+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2891, packet reclaims: 10968, calls: 14533306, used FDs: 20
 not waiting for packets.
 0 calls waiting for a thread
 123 threads are idle
 2009-03-09T21:20+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2496, packet reclaims: 10968, calls: 14806865, used FDs: 61
 not waiting for packets.
 0 calls waiting for a thread
 78 threads are idle
 2009-03-09T21:25+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2067, packet reclaims: 10968, calls: 15155769, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 86 threads are idle
 2009-03-09T21:30+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2361, packet reclaims: 10968, calls: 15451575, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 87 threads are idle
 2009-03-09T21:35+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2361, packet reclaims: 10968, calls: 15888390, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 99 threads are idle
 2009-03-09T21:40+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2382, packet reclaims: 10968, calls: 16312797, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 96 threads are idle
 2009-03-09T21:45+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2551, packet reclaims: 10968, calls: 17050004, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 105 threads are idle
 2009-03-09T21:50+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2697, packet reclaims: 10968, calls: 17827397, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 99 threads are idle
 2009-03-09T21:55+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2574, packet reclaims: 10968, calls: 18517191, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 103 threads are idle
 2009-03-09T22:00+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2562, packet reclaims: 10968, calls: 19140482, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 90 threads are idle
 2009-03-09T22:05+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 1466, packet reclaims: 11269, calls: 19335878, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 40 threads are idle
 2009-03-09T22:10+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 1219, packet reclaims: 12979, calls: 19414589, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 43 threads are idle
 2009-03-09T22:15+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2484, packet reclaims: 14897, calls: 19466551, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 84 threads are idle
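
Samples like the ones above are easier to compare once the idle-thread and waiting-call counts are pulled out per timestamp. The following is a rough Python sketch, assuming the log keeps the exact layout shown here (a timestamp prefixed to the "Trying" line by whatever script collected it); the function names are invented for this example and are not part of rxdebug.

```python
import re

# Patterns for the three lines of interest in each collected sample.
STAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}[+-]\d{4}): Trying")
WAIT_RE = re.compile(r"(\d+) calls waiting for a thread")
IDLE_RE = re.compile(r"(\d+) threads are idle")

# Two samples copied from the log above.
SAMPLE = """\
 2009-03-09T22:00+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 2562, packet reclaims: 10968, calls: 19140482, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 90 threads are idle
 2009-03-09T22:05+0100: Trying 127.0.0.1 (port 7000):
 Free packets: 1466, packet reclaims: 11269, calls: 19335878, used FDs: 64
 not waiting for packets.
 0 calls waiting for a thread
 40 threads are idle
"""


def parse_samples(text):
    """Split the log into one dict per timestamped sample."""
    samples = []
    current = None
    for line in text.splitlines():
        stamp = STAMP_RE.search(line)
        if stamp:
            current = {"time": stamp.group(1), "waiting": None, "idle": None}
            samples.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first timestamp
        waiting = WAIT_RE.search(line)
        if waiting:
            current["waiting"] = int(waiting.group(1))
        idle = IDLE_RE.search(line)
        if idle:
            current["idle"] = int(idle.group(1))
    return samples


def busiest(samples):
    """Return the sample with the fewest idle threads."""
    known = [s for s in samples if s["idle"] is not None]
    return min(known, key=lambda s: s["idle"])


print(busiest(parse_samples(SAMPLE)))
```

Run over the full log, this would single out the samples taken while the timeouts were occurring, where the idle count drops to about 40 but no calls are waiting for a thread.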

upt...@server:
  21:20:02 up 27 days,  4:34,  9 users,  load average: 6.14, 2.46, 0.95
  21:25:01 up 27 days,  4:39,  9 users,  load average: 3.72, 3.92, 2.05
  21:30:01 up 27 days,  4:44,  9 users,  load average: 5.04, 3.94, 2.50
  21:35:02 up 27 days,  4:49,  9 users,  load average: 5.72, 4.82, 3.26
  21:40:01 up 27 days,  4:54,  9 users,  load average: 7.06, 5.53, 3.95
  21:45:01 up 27 days,  4:59,  9 users,  load average: 10.97, 8.74, 5.73
  21:50:02 up 27 days,  5:04, 10 users,  load average: 4.00, 7.05, 5.94
  21:55:02 up 27 days,  5:09, 10 users,  load average: 4.29, 5.32, 5.46
  22:00:02 up 27 days,  5:14, 10 users,  load average: 8.73, 8.09, 6.68
  22:05:02 up 27 days,  5:19, 10 users,  load average: 2.99, 5.27, 5.89
  22:10:02 up 27 days,  5:24, 10 users,  load average: 2.38, 3.75, 5.07
  22:15:02 up 27 days,  5:29, 10 users,  load average: 4.29, 3.44, 4.51

The first peak is during the untar, the second during the make.
After ~10 clients timed out, the load went down a bit.

rxdebug localhost -rxstats -long (from this morning):
 Trying 127.0.0.1 (port 7000):
 Free packets: 2895, packet reclaims: 18020, calls: 22235421, used FDs: 13
 not waiting for packets.
 0 calls waiting for a thread
 123 threads are idle
 rx stats: free packets 2895, allocs 367120898, alloc-failures(rcv 0/0,send 0/0,ack 0)
    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
    packets read: data 327835144 ack 33311295 busy 0 abort 3 ackall 0 challenge 1066 response 610 debug 654 params 0 unused 0 unused 0 unused 0 version 0
    other read counters: data 327835144, ack 33311295, dup 3574 spurious 0 dally 0
    packets sent: data 38234254 ack 206138290 busy 0 abort 3072 ackall 0 challenge 626 response 1066 debug 0 params 0 unused 0 unused 0 unused 0 version 0
    other send counters: ack 206138290, data 76468508 (not resends), resends 18183, pushed 0, 
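
One quick sanity check on counters like these is the share of data packets that had to be resent. The sketch below is only an illustration in Python: the function name is made up, and using the "packets sent: data" counter as the denominator is an assumption, not something rxdebug defines as a resend rate.

```python
import re

# Send-side counters copied from the output above (line wrap preserved).
STATS = """\
packets sent: data 38234254 ack 206138290 busy 0 abort 3072 ackall 0
other send counters: ack 206138290, data 76468508 (not resends), resends
 18183, pushed 0,
"""


def resend_ratio(stats_text):
    """Return (data packets sent, resends, resends / data packets sent)."""
    data = re.search(r"packets sent:\s+data\s+(\d+)", stats_text)
    # "(not resends)" is followed by ")", so only the real counter matches.
    resends = re.search(r"\bresends\s+(\d+)", stats_text)
    if data is None or resends is None:
        raise ValueError("expected counters not found in rxdebug output")
    sent = int(data.group(1))
    resent = int(resends.group(1))
    return sent, resent, resent / sent


sent, resent, ratio = resend_ratio(STATS)
print(f"{resent} resends out of {sent} data packets ({ratio:.4%})")
```

With the numbers posted here the ratio comes out well below 0.1%, whichever of the two data counters is used as the denominator.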

Re: [OpenAFS] Connection timed out?

2009-03-10 Thread Hartmut Reuter
Robbert Eggermont wrote:
 L.S.,
 
 We are evaluating OpenAFS for use with 50 clients. One of the tests is a
 kernel build on 50 clients at the same time.
 
 During this test we encounter 'Permission denied' errors, which seem to
 coincide with 'kernel: afs: failed to store file (110)' entries in
 /var/log/messages. 110=Connection timed out. The fileserver is busy but
 responsive, about 25 builds (out of 50) complete normally.
 
 We are running 1.4.8 client  server, kernel  2.6.18 64-bits. Currently
 all server processes run on the same server. Fileserver settings:
 /usr/afs/bin/fileserver -p 128 -b 512 -l 3072 -s 3072 -vc 3072 -cb 65536
 -busyat 1536 -rxpck 1024 -nojumbo
 
 What are we doing wrong (except for the way we test;-))?
 
 Regards,
 
 Robbert
 

My feeling is that the famous new (with 1.4.8) idleDead mechanism
plays a role here. It would be interesting to know whether the same
happens on 1.4.7 clients.

Hartmut




[OpenAFS] connection timed out after salvage completes

2006-07-27 Thread Adam Megacz

My fileserver seems to want to salvage every time the machine boots,
but that's another story...

It seems that if I access a volume being salvaged (OpenAFS 1.4.1 Linux
client), I get the usual connection timed out error... but once the
volume finishes salvaging and comes on-line (and other clients can
access it), the client that got the error continues getting the error
for several minutes.

Is this the expected behavior, or should I narrow down the problem
further and file a bug report?

  - a

-- 
PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380



Re: [OpenAFS] connection timed out after salvage completes

2006-07-27 Thread Russ Allbery
Adam Megacz [EMAIL PROTECTED] writes:

 My fileserver seems to want to salvage every time the machine boots, but
 that's another story...

Make sure your system shutdown process is cleanly shutting down the file
server.

 It seems that if I access a volume being salvaged (OpenAFS 1.4.1 Linux
 client), I get the usual connection timed out error... but once the
 volume finishes salvaging and comes on-line (and other clients can
 access it), the client that got the error continues getting the error
 for several minutes.

 Is this the expected behavior, or should I narrow down the problem
 further and file a bug report?

It's expected; when a file server is down, the cache manager will mark the
host as down and won't retry for some interval (five minutes sticks in my
head).  You can force an immediate check with fs checkservers.
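
For scripts that hit this window, one possible approach is to force the probe and retry once when a timeout is seen. This is a hedged Python sketch, not an OpenAFS interface: the helper names are invented, it assumes the OpenAFS fs binary is on PATH, and it assumes the failure surfaces to the script as an OSError with errno ETIMEDOUT.

```python
import errno
import shutil
import subprocess


def probe_file_servers(timeout=120):
    """Run `fs checkservers` if an `fs` binary is on PATH.

    Returns the command's output, or None when no `fs` binary is found.
    """
    fs_path = shutil.which("fs")
    if fs_path is None:
        return None
    done = subprocess.run(
        [fs_path, "checkservers"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return done.stdout.strip()


def retry_after_probe(action, probe=probe_file_servers):
    """Call action(); on a timeout, probe the servers and try once more."""
    try:
        return action()
    except OSError as exc:
        if exc.errno != errno.ETIMEDOUT:
            raise  # not a timeout, so let the caller see it
    probe()
    return action()


# Hypothetical usage (the path is a placeholder):
#   retry_after_probe(lambda: os.listdir("/afs/our.cell/x"))
```

The second attempt is not wrapped, so a server that is still down raises the timeout to the caller instead of looping.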

-- 
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/


[OpenAFS] Connection timed out

2006-01-23 Thread Amir Saad
Hello,
I use OpenAFS 1.4 and MIT Kerberos.
I can successfully acquire a ticket, and aklog runs.
I got the error fs:'/afs': Connection timed out when I tried to run
fs setacl /afs system:anyuser rl
Can anyone help?
Thanks
Amir Saad
Software Engineer


Re: [OpenAFS] Connection timed out

2006-01-23 Thread Derrick J Brashear

On Mon, 23 Jan 2006, Amir Saad wrote:


Hallo,
i use OpenAFS 1.4 , MIT Kerberos
i can successfully acquire a ticket and aklog run
i got the error  fs:'/afs': Connection timed out when i tried to run fs setacl 
/afs system:anyuser rl
can anyone help?


Just a guess, but...

Turn off dynroot, or stop trying to set an ACL on a fake directory. You 
shouldn't need to anyway.


Derrick


Re: [OpenAFS] Connection timed out?

2003-04-02 Thread Michael Robokoff
We are having a similar problem on some of our machines. It seems
some of our machines time out on file transfers but lookup access
seems fine.
--Mike

John Koyle wrote:
I have about 6 volumes on a server and have a separate server that has
readonly replicas of those volumes.   Call them a, a.b, a.c, a.d, etc.
I can access all volumes just fine from two different clients, however
one volume, a.c, keeps getting a connection timed out error on the
clients.  It happens roughly at the same time on both client systems,
but does not happen with any other volumes - I can access them just
fine.
Running fs checkv clears up the problem for a while, but several hours
later (6-8), the problem crops up again.  I've tried doing a backup of
the volume, deleting it from the servers, then restoring and it still
happens.
Does anyone have any ideas for this?  Running v1.2.8 on solaris9 servers
and RH linux 7.x clients.
Thanks!
John




Re: [OpenAFS] Connection timed out

2002-04-03 Thread Turbo Fredriksson

 Torbjorn == Torbjorn Pettersson [EMAIL PROTECTED] writes:

 It seems that there is no space on the device (No space left on
 device), but why would the client stop responding because of
 this?

Torbjorn  You checked so you don't run out of diskspace on the
Torbjorn cache?

It did, but why did that force a restart of the client? Seems kind'a
dumb, doesn't it?

Torbjorn I'm using the debian testing openafs packages,
Torbjorn v1.2.3final2-3, with a kerberos 5 server, on amd
Torbjorn cpu;s...

Me too (Debian and all), just recompiled for my 'semi-potato' box...
-- 
Serbian Legion of Doom $400 million in gold bullion attack Albanian
ammunition FBI Peking nitrate ammonium Mossad FSF KGB Waco, Texas
Semtex
[See http://www.aclu.org/echelonwatch/index.html for more about this]



Re: [OpenAFS] Connection timed out

2002-04-03 Thread Torbjorn Pettersson

Turbo Fredriksson [EMAIL PROTECTED] writes:

  Torbjorn == Torbjorn Pettersson [EMAIL PROTECTED] writes:
 
  It seem that there is no space on the device (No space left on
  device), but why would the client stop responding because of
  this?
 
 Torbjorn  You checked so you don't run out of diskspace on the
 Torbjorn cache?
 
 It did, but why did that force a restart of the client? Seems kind'a
 dumb, doesn't it?
 

 I seem to remember that there are no actual consistency checks
on the cache, so I think you are entering a kind of undefined
state when you do trash it... I would recommend that you adjust
your cachesize settings to make sure that it doesn't happen
again. Also, having a separate partition for the cache is a good
thing(tm).  
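
A quick way to check the cachesize setting against the partition is sketched below in Python. It assumes the usual cacheinfo layout of mountpoint:cachedir:size with the size in 1 KB blocks; the example line, the 0.9 headroom factor, and the function names are illustrative assumptions, not values taken from this thread.

```python
import shutil


def parse_cacheinfo(line):
    """Split a cacheinfo line into (mount point, cache dir, size in KB)."""
    mount_point, cache_dir, size_kb = line.strip().split(":")
    return mount_point, cache_dir, int(size_kb)


def cache_size_ok(cache_dir, size_kb, headroom=0.9, disk_usage=shutil.disk_usage):
    """True if the configured cache fits within `headroom` of the partition.

    `headroom` is an assumed safety margin; adjust it to local policy.
    """
    total_bytes = disk_usage(cache_dir).total
    return size_kb * 1024 <= total_bytes * headroom


# Hypothetical cacheinfo content, for illustration only.
mount_point, cache_dir, size_kb = parse_cacheinfo("/afs:/var/cache/openafs:50000")
print(mount_point, cache_dir, size_kb)
```

On a real client, cache_size_ok(cache_dir, size_kb) would be called with the values read from the local cacheinfo file; on a dedicated cache partition a False result suggests lowering the configured size.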

 Torbjorn I'm using the debian testing openafs packages,
 Torbjorn v1.2.3final2-3, with with a kerberos 5 server, on amd
 Torbjorn cpu;s...
 
 Me to (Debian and all), just recompiled for my 'semi-potato' box...
 -- 
 Serbian Legion of Doom $400 million in gold bullion attack Albanian
 ammunition FBI Peking nitrate ammonium Mossad FSF KGB Waco, Texas
 Semtex
 [See http://www.aclu.org/echelonwatch/index.html for more about this]

//Tobbe
-- 
##
Torbjörn Pettersson   #  Email   [EMAIL PROTECTED]
Vattugatan 5  #  Web www.strul.nu/~tobbe
S-111 52  Stockholm, Sweden   #
##