On Tue, May 03, 2005 at 08:38:36AM -0600, Patrick Walsh wrote:
> What would you suggest would be the best way to detect when a conflict
> occurs so that an administrator can be notified? Is there a particular
> message we can monitor one of the logs for? Or perhaps a cron job with
> a find command similar to this:
>
> find . -type l -a -not \( -xtype f -o -xtype d \)
I typically use

    find . -lname '@*'

The problem is really that local-global conflicts only appear on the
client that failed to reintegrate. Server-server conflicts are seen
first by the client that noticed the version-vector differences when it
called getattr.
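
A minimal sketch of how the find above could be wrapped for a cron job.
The function names are made up for illustration and are not part of
Coda. It assumes a find that supports -lname (GNU and BSD find do), and
it would also match any ordinary symlink whose target happens to start
with '@'. Because a conflict is only visible on the client that ran
into it, something like this would have to run on every client.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Coda.

# list_conflicts DIR: print every symlink under DIR whose target
# starts with '@' (the same test as the find command above).
list_conflicts() {
    find "$1" -lname '@*' -print
}

# report_conflicts DIR: print a short report and return 1 when at
# least one match is found; print nothing and return 0 otherwise.
report_conflicts() {
    conflicts=$(list_conflicts "$1")
    if [ -n "$conflicts" ]; then
        printf 'Possible Coda conflicts under %s:\n%s\n' "$1" "$conflicts"
        return 1
    fi
    return 0
}
```

A cron job could source this file and call report_conflicts on the
directory of interest (for example a path below /coda). Since the
function is silent when nothing is found, cron's usual behaviour of
mailing a job's output means the administrator only gets mail when
something matched.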
> Or perhaps monitoring the /usr/coda/spool directory? How is this
> managed in other places?
Not sure how others are doing it, but for most conflicts it is typically
a user that alerts me. I don't get all that many conflicts on our
'backend' servers because things like the hypermail mailing list
archives are actually built on the local disk and rsync'd over to /coda.
If the rsync gets stuck it only affects the client that is writing the
update, so users just don't see the new mails.
> Also, the repair utility doesn't seem to have a way to list what
> objects are in conflict -- you have to already know the full path to
> them. Are there any undocumented commands or shortcuts for using this
> utility?
One shortcut, in combination with the previous find, is

    find . -lname '@*' -exec repair {} /tmp/fix -owner 7768 -mode 755 \;

However, this only works reliably for directory conflicts. If there are
any file conflicts, this would overwrite them with the contents of the
fix-file that was written by the previous directory repair.
> Would it help my situation if there was a minimum for the RTT estimate
> in the case where the estimate is near zero? That would make it so the
> server can take a moment to flush a file without the client write
> disconnecting.
There is a minimum RTT value which I think is 300ms. That should be
pretty conservative, especially since even a 10baseT network tends to
have <10ms RTTs. I think this value was picked because it was 50% more
than a typical roundtrip on a ppp link.
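
To illustrate what such a floor does to a near-zero estimate, here is a
small sketch. It is not the RPC2 code. The names are hypothetical, and
the 300 value is only the "I think" figure from the paragraph above.

```shell
#!/bin/sh
# Hypothetical floor in milliseconds, taken from the figure above.
MIN_RTT_MS=300

# clamp_rtt ESTIMATE_MS: print the estimate, raised to the floor
# when it is lower than MIN_RTT_MS.
clamp_rtt() {
    if [ "$1" -lt "$MIN_RTT_MS" ]; then
        echo "$MIN_RTT_MS"
    else
        echo "$1"
    fi
}
```

With this floor, `clamp_rtt 2` (a LAN-like estimate) gives 300, while
`clamp_rtt 450` (a slow link) stays at 450. If 300ms really is 50% more
than the ppp roundtrip, that roundtrip works out to 300 / 1.5 = 200ms.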
> > At the same time, the poor server is still
> > stuck waiting for the disk, and can't even dash off a quick ack telling
> > the client that it did get the request and is working on it.
>
> Are there any plans to make the server multi-threaded to avoid these
> sorts of bottle-necks?
There is a version of LWP that runs on top of pthreads. If that is used
when building RVM, the flush/truncate daemon thread runs fully
concurrently. But the RPC2 socket listener still runs as a
non-concurrent thread. I have used a venus built this way for a bit when
I was trying to catch some memory leaks with valgrind. But overall it
isn't totally reliable and it is a bit slower. It is also not really
possible to go completely multi-threaded, because a lot of the code
expects that threads are cooperative and that concurrency is limited to
places where we explicitly yield control.
Jan