On Fri, 31 Jul 2009, Jeff Garzik wrote:
> You have outlined my ideas for CLD version 2.0, essentially:  create a
> libpaxos and libpaxos_db, and use those in CLD.
> 
> So, 100% agreed...
> 
> Unfortunately that is a lower priority for me than hammering out a rock solid
> CLD <-> cldc network protocol, and getting out a version 1.0 of the CLD
> service with _some_ form of solid, working replication and master fail-over.
> 
> I figured, for CLD version 1.0, db4 already went through the pain of debugging
> a replicated database.  Avoiding that myself would help get CLD up and running
> much more rapidly.

Definitely!

> > I'm going to look a bit more closely at what it'll take to move ceph to
> > cld, then.  Among other things, it'll mean part of cldc in the kernel, but
> > should be a net architectural improvement.
> 
> Cool!  A couple kernel-related comments:
> 
> * some operations involved in master-discovery and master-failover, most
> notably DNS SRV lookups, you probably want to do in userspace

Right.  Currently the DNS lookups are done by mount.ceph, so that won't 
change.  I think the only real requirement here as far as the protocol 
goes is that cldc be able to discover future cell changes from the servers 
(in resolved form).

> * libcldc is intentionally written such that you should be able to use
> lib/cldc.c in embedded applications (such as the kernel), and successfully
> ignore related modules cldc-udp.c and cldc-dns.c.

Sounds good.

One question about the choice of UDP.  I'm not sure how closely you're 
following the Chubby design, but if it's a similar liveness/notification 
model, the server is delaying keepalive RPC responses and piggybacking 
notification of updates.  If the replies are lossy, are you just planning 
on a conservative client timeout/retry, and keeping the normal keepalive 
round trip a healthy factor shorter than the client lease length?  
Shorter timeouts mean higher server load (more frequent keepalives)... and 
longer timeouts mean frequent stalls on writes when doing the cache 
invalidation if there is any packet loss on the network...
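
To put rough numbers on that tradeoff, here is a back-of-the-envelope 
sketch.  It is a simplified model, not anything taken from CLD or Chubby: 
the struct and function names are made up for illustration, each keepalive 
round trip is assumed to take one full retry interval, and replies are 
assumed to be lost independently with a fixed probability.

```c
/* Hypothetical model parameters; none of these names exist in CLD. */
struct keepalive_cfg {
    double lease_s;  /* client lease length, in seconds */
    double retry_s;  /* keepalive round trip / client retry timeout, seconds */
    double loss;     /* assumed independent loss probability per round trip */
};

/* Keepalive RPCs per second the server handles for n_clients clients. */
double server_rate(const struct keepalive_cfg *c, int n_clients)
{
    return (double)n_clients / c->retry_s;
}

/* Number of keepalive attempts that fit inside one lease. */
int attempts_per_lease(const struct keepalive_cfg *c)
{
    return (int)(c->lease_s / c->retry_s);
}

/* Probability that every attempt within one lease is lost. */
double p_lease_lost(const struct keepalive_cfg *c)
{
    double p = 1.0;
    int i, n = attempts_per_lease(c);

    for (i = 0; i < n; i++)
        p *= c->loss;
    return p;
}

/*
 * Expected extra delay before a piggybacked notification gets through,
 * treating each lost reply as costing one retry interval (geometric
 * series, ignoring the lease cutoff).  Requires loss < 1.
 */
double expected_stall(const struct keepalive_cfg *c)
{
    return c->retry_s * c->loss / (1.0 - c->loss);
}
```

In this model, shrinking retry_s raises server_rate() linearly, while 
growing it raises expected_stall() linearly and leaves fewer attempts per 
lease, so p_lease_lost() goes up as well.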

sage
