On Thu, 8 Aug 2013 15:40:08 -0400 (EDT) Benjamin Kaduk <[email protected]> wrote:
> However, the bosserver is currently using LWP for parallelism, and
> GSSAPI libraries which are compatible with LWP are hard to come by;
> the obvious solution is to convert the bosserver to pthreads.

Just mentioning... you can have a pthread process with lwp emulation
that implements all of the lwp primitives in terms of pthreads. This is
like having just a giant anti-preemption lock around everything, and I
thought this already existed somewhere. I don't think that helps with
signalling, but it can help with locking issues.

Not that that's a good general pthread-ifying solution, but since
bosserver doesn't need to run very fast, consistency seems more
important than actual parallelism.

> First off: do we need to keep an LWP version of the bosserver around
> as well as a pthreaded one? I don't think so, and I believe Simon
> agrees, but it would be good to get consensus.

It does not need to stay very long; lwp fileserver has already been
removed. If you're asking if you can get rid of the LWP bosserver at the
same time as introducing a pthreaded bosserver, I think that depends on
how sure you are that it functions correctly. I would vote for a tbozo
directory, but if the changes are not complex and you're very confident,
it may not be necessary. But I think it's easier to implement a tbozo,
and then remove bozo (and move tbozo into it) when it's just as good.

> Second, how strong of an integrity guarantee do we need for the bos
> config? My understanding is that configuration changes (adding or
> removing or en/disabling bnodes) are rare events, and it is highly
> unlikely that multiple administrator connections changing things will
> be made concurrently.

We can assume they are infrequent, but we must assume that they will
happen. That is, there needs to be locking, but it doesn't need to be
very granular. It can be slow, but it cannot cause something to break or
behave weirdly.
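
To make "locking, but not very granular" concrete, here is a rough
sketch of one coarse pthread mutex guarding all config state. Everything
in it (the toy_bnode struct, the toy_* functions, the fixed-size table)
is made up for illustration and is not the actual bozo code or its bnode
structure; it only assumes that config changes are rare enough that
serializing all of them is acceptable.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_BNODES 16
#define NAME_LEN   32

/* Hypothetical stand-in for a configured bnode; not the real struct. */
struct toy_bnode {
    char name[NAME_LEN];
    int enabled;
};

static struct toy_bnode toy_bnodes[MAX_BNODES];
static int toy_nbnodes = 0;

/* One coarse lock covers every read and write of the config table. */
static pthread_mutex_t toy_config_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold toy_config_lock. Returns the index, or -1 if absent. */
static int toy_find_locked(const char *name)
{
    int i;
    for (i = 0; i < toy_nbnodes; i++) {
        if (strcmp(toy_bnodes[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* Add a bnode. Returns 0 on success, -1 if it already exists,
 * -2 if the table is full, -3 if the name is too long. */
int toy_create(const char *name)
{
    int code = 0;

    assert(name != NULL);
    if (strlen(name) >= NAME_LEN)
        return -3;

    pthread_mutex_lock(&toy_config_lock);
    if (toy_find_locked(name) >= 0) {
        code = -1;
    } else if (toy_nbnodes >= MAX_BNODES) {
        code = -2;
    } else {
        struct toy_bnode *b = &toy_bnodes[toy_nbnodes++];
        strcpy(b->name, name);   /* length checked above */
        b->enabled = 1;
    }
    pthread_mutex_unlock(&toy_config_lock);
    return code;
}

/* Enable or disable a bnode. Returns 0 on success, -1 if not found. */
int toy_set_enabled(const char *name, int enabled)
{
    int idx, code = 0;

    assert(name != NULL);
    pthread_mutex_lock(&toy_config_lock);
    idx = toy_find_locked(name);
    if (idx < 0)
        code = -1;
    else
        toy_bnodes[idx].enabled = enabled ? 1 : 0;
    pthread_mutex_unlock(&toy_config_lock);
    return code;
}

/* Count enabled bnodes; readers take the same lock as writers. */
int toy_count_enabled(void)
{
    int i, n = 0;

    pthread_mutex_lock(&toy_config_lock);
    for (i = 0; i < toy_nbnodes; i++)
        n += toy_bnodes[i].enabled;
    pthread_mutex_unlock(&toy_config_lock);
    return n;
}
```

Two administrators calling toy_create() for the same name at the same
time just get serialized: one succeeds and the other sees -1. That is
slow but never inconsistent, which is the property that matters here.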
> If this is true, then we can rely on time-domain "locking" for
> synchronization and eliminate some aspects of code-level locking. For
> example, a per-bnode lock acquired before writing any bnode state
> would not be needed, and a single global lock would be sufficient.

I don't really see how one of these is offering integrity but the other
isn't, but... A single lock is fine, if I understand this correctly.
You've never been able to do certain bozo things in parallel, but I
haven't heard anyone complain about it. In any case, rxgk is more
important than improving that.

> Relatedly, is it okay to assume that shutdown/restart/etc. will not be
> issued concurrently with config changes? A "fully correct"
> implementation would seem to need to only shutdown/restart the bnodes
> which were configured when the command was issued, and ignore any new
> nodes created since then. Because the implementation of
> shutdown/restart must drop locks, making this guarantee seems to
> require additional synchronization effort, whether via a temporary
> queue to store the bnodes being acted upon, or a higher-level lock.

Are you talking about a 'bos create' racing with a 'bos restart -all'? I
would think you'd block out all modifications during a restart. While
the ordering may not matter for 'bos restart -all', it may matter for
'bos restart -bosserver', just so it doesn't leave behind a running
process and then re-exec itself or something.

> I haven't been able to convince myself that the additional complexity
> of the extra watcher threads is necessary, but if someone else could
> convince me, that would be good.

My opinion is that we should explicitly drop LINUX24 support on servers
(or at least tbozo, if we eventually provide both tbozo and bozo). I
have never heard of demand for LINUX24 servers, and it's easy to migrate
off of them. The thing I have heard demand for, and that is not easy to
migrate off of, is LINUX24 clients, which we could still keep.
I mean, regardless of what solution we end up with, how much testing is
anyone really going to do for bozo on LINUX24? We're just going to end
up with something that theoretically works but we're not very confident
has solved various possible race conditions or whatnot. If we want to
keep LINUX24 for this, we should at least put a big warning on it that
mentions something involving the relevant issues.

That doesn't deal with any signalling specifics, but keep in mind our
current bozo signal handling is not always great, and does not
necessarily need to be fixed at the same time. I've always seen bozo
misidentify core dumps, which I thought was due to this, but I've never
really cared.

-- 
Andrew Deason
[email protected]
