Re: [OpenAFS] CopyOnWrite failed - orphaned files

2002-03-05 Thread hoffman

 I guess these problems have nothing to do with your hardware.
 
 Have you checked that the file it tries to read exists in the /vicepX partition
 and has the correct size?

I never thought to try that.  I will do so the next time it happens.
I'm used to the older AFS architectures where /vicepX contained only
the volume headers.

 If so, I would try running the non-threaded version of the fileserver, which
 is built in src/viced. Also, the problem of processes not going away looks
 to me like a problem in the pthread environment.

This sounds like a good thing to try in any case.  I will do that and
report back.

Many thanks!

---Bob.
--
Bob Hoffman, N3CVL  University of Pittsburgh   Tel: +1 412 624 8404
[EMAIL PROTECTED] Department of Computer Science Fax: +1 412 624 8854
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] CopyOnWrite failed - orphaned files

2002-03-05 Thread Derrick J Brashear

On Tue, 5 Mar 2002 [EMAIL PROTECTED] wrote:

  If so, I would try running the non-threaded version of the fileserver, which
  is built in src/viced. Also, the problem of processes not going away looks
  to me like a problem in the pthread environment.
 
 This sounds like a good thing to try in any case.  I will do that and
 report back.

My suspicion is that it won't recur with that, making it a workaround but not
a useful way to debug, sadly. I can't provide any more detail than that
yet, though.

-D





Re: [OpenAFS] CopyOnWrite failed - orphaned files

2002-03-05 Thread Hartmut Reuter


I guess these problems have nothing to do with your hardware.

Have you checked that the file it tries to read exists in the /vicepX partition
and has the correct size?

If so, I would try running the non-threaded version of the fileserver, which
is built in src/viced. Also, the problem of processes not going away looks
to me like a problem in the pthread environment.

Hartmut


On Mon, 4 Mar 2002 [EMAIL PROTECTED] wrote:

 Marco Foglia ([EMAIL PROTECTED]) wrote:
 
  We are having continuous trouble with CopyOnWrite failures on a Linux
  file server and, as a result, orphaned files. We have tried several setups
  and even changed hardware, but the problem is still there. The current setup
  is RedHat 6.2 with kernel 2.2.19-6.2.12smp and OpenAFS 1.2.2.
 
  It only happens after a full backup (vos dump volume.backup -time 0) of
  a backup volume. You can find the following message in the file server
  log file:
 
  CopyOnWrite failed: volume 536872685 in partition /vicepa  (tried
  reading 8192, read 0, wrote 0, errno 4) volume needs salvage
 
  Does anybody have the same problems? Solutions? 
 
 
 I have been fighting this problem for almost a year with no solution.
 I didn't post anything to this list before because I wasn't convinced
 that the problem wasn't in my hardware.  By now, I've tried enough
 different combinations that I'm convinced that it's a software problem.
 
 My hardware:   Dual 800 MHz Pentium IIIs on two different motherboards,
                1 GB SDRAM
                DPT Century VI RAID controller
                 or
                Adaptec 3200S RAID controller
                 or
                Mylex AcceleRAID 352 RAID controller
                 or
                non-RAID Symbios SCSI controller
 
 My software:   RedHat 6.2, 2.2.16-3
                Transarc AFS Base configuration afs3.6 2.0
                 or
                RedHat 6.2, 2.2.16-3
                Transarc AFS Base configuration afs3.6 2.3
                 or
                RedHat 6.2, 2.2.16-3
                OpenAFS 1.0.3
                 or
                [ many different versions in between ]
                 or
                RedHat 7.2, kernel 2.4.9-21
                OpenAFS 1.2.3
 
 As you can see, I've tried many different hardware and software
 combinations and I still get corrupted volumes.  It can happen
 on any backup, not just full backups.  I have tried running the
 servers in uniprocessor mode and while that seemed to help, it
 did not eliminate the problem.
 
 
 Here are some excerpts from the log files on the most recent
 corruption, using the last software configuration above:
 
 FileLog:
 Thu Feb 28 15:47:19 2002 CopyOnWrite failed: volume 536877621 in partition /vicepc  (tried reading 8192, read 0, wrote 0, errno 4) volume needs salvage
 Thu Feb 28 15:58:46 2002 VAttachVolume: volume salvage flag is ON for /vicepc//V0536877621.vol; volume needs salvage
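
For anyone triaging similar FileLog entries, here is a minimal Python sketch that pulls the volume ID, partition, byte counts, and errno out of a CopyOnWrite failure message and decodes the errno symbolically. The function name `parse_cow_failures` and the regular expression are illustrative assumptions modelled only on the message quoted in this thread, not on the fileserver source. On Linux, errno 4 is EINTR (interrupted system call); whether that value reflects the real cause of the failed read is not established in this thread.

```python
import errno
import re

# Pattern modelled on the FileLog message quoted in this thread (an
# assumption about the format, not taken from the fileserver source).
# \s+ tolerates the double space and the mail-client line wrap that
# appear after the partition name and after "(tried".
COW_RE = re.compile(
    r"CopyOnWrite failed: volume (?P<volume>\d+) "
    r"in partition (?P<partition>\S+)\s+"
    r"\(tried\s+reading (?P<tried>\d+), read (?P<read>\d+), "
    r"wrote (?P<wrote>\d+), errno (?P<errno>\d+)\)"
)


def parse_cow_failures(text):
    """Return one dict per CopyOnWrite failure message found in `text`."""
    failures = []
    for match in COW_RE.finditer(text):
        code = int(match.group("errno"))
        failures.append({
            "volume": int(match.group("volume")),
            "partition": match.group("partition"),
            "tried": int(match.group("tried")),
            "read": int(match.group("read")),
            "wrote": int(match.group("wrote")),
            "errno": code,
            # Symbolic name on the platform running this script.
            "errno_name": errno.errorcode.get(code, "unknown"),
        })
    return failures


sample = (
    "Thu Feb 28 15:47:19 2002 CopyOnWrite failed: volume 536877621 "
    "in partition /vicepc  (tried reading 8192, read 0, wrote 0, errno 4) "
    "volume needs salvage"
)

for failure in parse_cow_failures(sample):
    print(failure["volume"], failure["partition"], failure["errno_name"])
```

Running this over a whole FileLog (read into one string) would give a quick list of affected volumes to compare against the backup schedule.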
 
 VolserLog:
 Thu Feb 28 15:35:09 2002 1 Volser: Clone: Recloning volume 536876261 to volume 536876263
 [...]
 Thu Feb 28 15:58:46 2002 VAttachVolume: volume salvage flag is ON for /vicepc/V0536877621.vol; volume needs salvage
 Thu Feb 28 15:58:46 2002 1 Volser: ListVolumes: Could not attach volume 536877621 (V0536877621.vol) error=101
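
Since the failures are reported to follow backup activity, it can help to line up timestamps across FileLog and VolserLog. The sketch below is only an illustration of that arithmetic: the helper names `log_time` and `seconds_between` are hypothetical, and the timestamp layout is assumed from the excerpts in this thread. Note that the one Recloning line quoted here is for a different volume than the one that failed (the intervening lines are elided), so the example shows the method, not a proven link between the two events.

```python
from datetime import datetime

# Timestamp layout assumed from the FileLog/VolserLog lines in this
# thread, e.g. "Thu Feb 28 15:47:19 2002". Day and month names are
# parsed according to the current locale (English in the C locale).
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_LENGTH = len("Thu Feb 28 15:47:19 2002")


def log_time(line):
    """Parse the leading timestamp of a FileLog or VolserLog line."""
    return datetime.strptime(line.strip()[:STAMP_LENGTH], STAMP_FORMAT)


def seconds_between(earlier_line, later_line):
    """Elapsed seconds from the first log line to the second."""
    return (log_time(later_line) - log_time(earlier_line)).total_seconds()


reclone = (
    "Thu Feb 28 15:35:09 2002 1 Volser: Clone: "
    "Recloning volume 536876261 to volume 536876263"
)
cow_failure = (
    "Thu Feb 28 15:47:19 2002 CopyOnWrite failed: "
    "volume 536877621 in partition /vicepc"
)

print(seconds_between(reclone, cow_failure))
```

The same two helpers can be applied to every clone and failure line in a pair of logs to see whether failures consistently fall inside the backup window.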
 
 SalvageLog.old:
 @(#) OpenAFS 1.2.3 built  2002-02-01 
 02/28/2002 16:00:48 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepc 536877621)
 02/28/2002 16:00:48 CHECKING CLONED VOLUME 536877622.
 02/28/2002 16:00:48 cs.usr0.naveen.backup (536877622) updated 02/28/2002 15:30
 02/28/2002 16:00:48 Vnode 1: length incorrect; (is 8192 should be 0)
 02/28/2002 16:00:48 SALVAGING VOLUME 536877621.
 02/28/2002 16:00:48 cs.usr0.naveen (536877621) updated 02/28/2002 15:46
 02/28/2002 16:00:48 Vnode 1198: version < inode version; fixed (old status)
 02/28/2002 16:00:48 Vnode 1514: version < inode version; fixed (old status)
 02/28/2002 16:00:48 Vnode 2754: version < inode version; fixed (old status)
 [ similar lines deleted ]
 02/28/2002 16:00:48 Vnode 10078: version < inode version; fixed (old status)
 02/28/2002 16:00:48 Vnode 1: length incorrect; changed from 8192 to 0
 02/28/2002 16:02:19 First page in directory does not exist.
 02/28/2002 16:02:19 Directory bad, vnode 1; salvaging...
 02/28/2002 16:02:19 Salvaging directory 1...
 02/28/2002 16:02:19 Failed to read first page of fromDir!
 02/28/2002 16:02:19 Checking the results of the directory salvage...
 02/28/2002 16:02:20 dir vnode 601: special old unlink-while-referenced file .__afsD8D is deleted (vnode 1120)
 02/28/2002 16:02:20 dir vnode 601: special old unlink-while-referenced file .__afs8036 is deleted (vnode 780)
 02/28/2002 16:02:20 dir vnode 623: special old unlink-while-referenced file .__afsDED1 is deleted (vnode 7988)
 [ similar lines deleted ]
 02/28/2002 16:02:20 dir vnode 623: special old unlink-while-referenced file .__afs4975 is deleted (vnode 5862)
 02/28/2002 16:02:20 Vnode 1: link count incorrect (was 45, now 2)
 02/28/2002 16:02:30 Found 4784