I am trying out the new Coda release, without success. Here's a description
of my failed effort.
- create a vice setup on a debian x86 box. The box, lambda.csail.mit.edu,
lives in a real machine room with a fat pipe to the net.
- create a venus setup on an ubuntu x86_64 box with
+ 10Gb of cache
+ big DATA & LOG:
% ls -l DATA LOG
-rw------- 1 root root 1201865464 2007-05-01 12:07 DATA
-rw------- 1 root root 300468736 2007-05-01 12:09 LOG
This box lives in my office at Northeastern, about 2 miles away from the
server over at MIT. It also has a university-grade net connection. It has
4Gb of ram and 10Gb of (encrypted) swap space on 3 spindles.
Here is the /etc/venus.conf:
realm="lambda.csail.mit.edu"
# 10 Gb of local file caching
cacheblocks=10000000
errorlog="/var/log/coda/venus.err"
logfile="/var/log/coda/venus.log"
rvm_log="/var/lib/coda/LOG"
rvm_data="/var/lib/coda/DATA"
cachedir="/var/lib/coda/cache"
checkpointdir="/var/lib/coda/spool"
pid_file="/var/run/coda-client.pid"
run_control_file="/var/run/coda-client.ctrl"
marinersocket="/var/run/coda-client.mariner"
mapprivate=1
DATA, LOG and the cache tree all live in my ext3 /var file system --
no raw partitions of any kind being used here.
- copy 2.6 Gb / 26k files of my home dir into my coda filesys, using
cp -pr <stuff> /coda/lambda.csail.mit.edu/user/shivers/.
This runs for a while, then completes w/no problem.
I can poke around in the coda dir from the shell using cd, ls & more,
no problem.
Meanwhile, codacon is scrolling writeback messages like crazy. Eventually,
everything is copied back to the server, cfs lv shows no pending CML
entries, and a du -sk of the vice directory shows that it's got all 2.6Gb
of the bits.
- hoard it all with
hoard add /coda/lambda.csail.mit.edu d+
Note that my 2.6Gb hoard should fit entirely within my 10Gb cache.
This runs for a while, then terminates successfully.
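The "should fit" claim can be sanity-checked with a few lines of shell. This is only a sketch: it assumes cacheblocks is counted in 1 KB blocks (which is what the "10 Gb" comment in the venus.conf above implies), and cache_fits is a made-up helper name, not part of Coda. It compares block counts only and says nothing about any other limit venus may apply.

```shell
#!/bin/sh
# cache_fits CACHEBLOCKS TREE_KB
# Hypothetical helper: prints "fits" if a tree of TREE_KB kilobytes is no
# larger than a cache of CACHEBLOCKS blocks, else "too big".
# Assumption: one cache block is 1 KB, so the two numbers compare directly.
cache_fits() {
    cacheblocks=$1
    tree_kb=$2
    if [ "$tree_kb" -le "$cacheblocks" ]; then
        echo "fits"
    else
        echo "too big"
    fi
}

# Example use against the real tree (du -sk prints "<kilobytes><TAB><path>"):
#   cache_fits 10000000 "$(du -sk /coda/lambda.csail.mit.edu/user/shivers | cut -f1)"
```

With the numbers from this report (roughly 2600000 KB of data against cacheblocks=10000000), the check prints "fits".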
- Test it by walking the whole tree & reading every file, using find(1):
find /coda/lambda.csail.mit.edu/user/shivers -type f -exec /tmp/eat {} \; -print
where /tmp/eat is a simple shell script that cats its args to /dev/null:
#!/bin/sh -
exec cat "$@" > /dev/null
This runs & completes successfully. Codacon shows no server->client file
transfer during this. So far, so good! Now for the trouble.
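For comparing a connected run against a disconnected one, it may help to keep the diagnostics from the walk rather than let them scroll by. The following is a sketch of such a variant, not what was actually run here; walk_tree and the error-log path are names invented for illustration.

```shell
#!/bin/sh
# walk_tree DIR ERRLOG
# Hypothetical variant of the find + eat walk: reads every regular file
# under DIR, throws the data away, and saves everything find and cat write
# to stderr in ERRLOG. Prints the number of diagnostic lines collected,
# so 0 means the walk was clean.
walk_tree() {
    dir=$1
    errlog=$2
    # "-exec cat {} +" batches many files into each cat invocation.
    find "$dir" -type f -exec cat {} + >/dev/null 2>"$errlog"
    wc -l <"$errlog" | tr -d ' '
}

# Example use:
#   walk_tree /coda/lambda.csail.mit.edu/user/shivers /tmp/walk.err
```

Running it once connected and once after "cfs disconnect", with different log files, gives two logs that can be diffed.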
- Test it again by saying "cfs disconnect" and then redoing the find
tree-walk to read the whole subdir a second time.
This runs along fine for a while, with a silent codacon, then codacon
suddenly outputs
ValidateAttrsPlusSHA CVS(4.7f000000.baf.28f8) [0] ( 11:56:43 )
Probe ( 11:57:21 )
and the find tree walk hangs. After a minute or two, codacon says
unreachable lambda.csail.mit.edu ( 11:58:10 )
and the find walk resumes with the following output:
find: ./research/mrlc/mrlc/spim/CVS: Permission denied
./research/mrlc/mrlc/spim/mips-syscall.h
./research/mrlc/mrlc/spim/endian.c
./research/mrlc/mrlc/spim/buttons.h
./research/mrlc/mrlc/spim/mips-syscall.o
./research/mrlc/mrlc/spim/display-utils.c
find: ./research/mrlc/mrlc/confpaper: Permission denied
find: ./research/mrlc/mrlc/paper: Permission denied
find: ./research/mrlc/mrlc/CVS: Permission denied
find: ./research/mrlc/mrlc/source: Permission denied
find: ./research/mrlc/mrlc/harness: Permission denied
find: ./research/mrlc/mrlc/paper1: Permission denied
./research/mrlc/okasaki-msg
.
.
.
and the rest of the find walk has these "Permission denied" messages
scattered throughout the transcript.
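To get just the list of directories that failed, the "Permission denied" lines can be filtered out of a saved transcript. A minimal sketch, assuming find reports errors in exactly the format shown above (other find versions may quote the path differently); denied_dirs is a hypothetical name.

```shell
#!/bin/sh
# denied_dirs
# Hypothetical filter: reads a find transcript on stdin and prints only the
# paths from lines of the form "find: PATH: Permission denied".
# Ordinary -print output lines are dropped.
denied_dirs() {
    sed -n 's/^find: \(.*\): Permission denied$/\1/p'
}

# Example use, with the transcript saved in a file:
#   denied_dirs < find.out
```

Fed the transcript above, this would print ./research/mrlc/mrlc/spim/CVS, ./research/mrlc/mrlc/confpaper, and so on, one per line.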
- Then I go poke around in the file system. I now have trouble accessing
the problem directories. For example:
% ls -ld research/mrlc/mrlc/spim/CVS
drwxr-xr-x 1 shivers nogroup 2048 2007-04-30 16:04 research/mrlc/mrlc/spim/CVS
% ls research/mrlc/mrlc/spim/CVS
ls: research/mrlc/mrlc/spim/CVS: Permission denied
% cfs la research/mrlc/mrlc/spim/CVS
research/mrlc/mrlc/spim/CVS: Connection timed out
%
Timed out? Hey, I hoarded the whole tree and *disconnected*. Why is venus
even trying to connect at all?
Here is what cfs lv says while I'm in the disconnected state:
% cfs lv /coda/lambda.csail.mit.edu/
Status of volume 7f000000 (2130706432) named "/"
Volume type is Replicated
Connection State is Unreachable
Reintegration age: 0 sec, time 5.000 sec
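When repeating this experiment, the connection state can be pulled out of the cfs lv output and logged alongside each walk. This sketch assumes the output keeps the "Connection State is ..." line exactly as shown above; conn_state is a hypothetical helper, not a cfs subcommand.

```shell
#!/bin/sh
# conn_state
# Hypothetical filter: reads "cfs lv" output on stdin and prints whatever
# follows "Connection State is" (for example "Unreachable").
# Prints nothing if no such line is present.
conn_state() {
    sed -n 's/^[[:space:]]*Connection State is \(.*\)$/\1/p'
}

# Example use:
#   cfs lv /coda/lambda.csail.mit.edu/ | conn_state
```

On the output above it prints "Unreachable".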
- Then I reconnect with
cfs reconnect
Now I can see the problem directories with no trouble. I redo the find
tree-walk a third time and it completes with no problems.
+ codacon shows *no* server->client file motion.
+ my network load meter shows only minor traffic.
So I have reason to believe the whole tree walk ran entirely out of the
cache.
I'm mystified. By the way, I don't think it's because I'm running on an
x86_64 client. I have seen similar problems this week when using my plain
x86 notebook as the client.
-Olin