On Thu, 8 Aug 2013, Andrew Deason wrote:
On Thu, 8 Aug 2013 15:40:08 -0400 (EDT)
Benjamin Kaduk <[email protected]> wrote:
First off: do we need to keep an LWP version of the bosserver around
as well as a pthreaded one? I don't think so, and I believe Simon
agrees, but it would be good to get consensus.
It does not need to stay very long; lwp fileserver has already been
removed. If you're asking if you can get rid of the LWP bosserver at the
same time as introducing a pthreaded bosserver, I think that depends on
how sure you are that it functions correctly. I would vote for a tbozo
directory, but if the changes are not complex and you're verify
confident, it may not be necessary. But I think it's easier to implement
a tbozo, and then remove bozo (and move tbozo into it) when it's just as
good.
I guess part of the reason I was pushing for skipping the "separate tbozo"
step with my implementation is that it was a pretty straightforward
translation of the old code to pthreaded primitives. I guess you're
right, and I would be less comfortable skipping the tbozo step with the
extra changes in the earlier patchset.
Second, how strong of an integrity guarantee do we need for the bos
config? My understanding is that configuration changes (adding or
removing or en/disabling bnodes) are rare events, and it is highly
unlikely that multiple administrator connnections changing things will
be made concurrently.
We can assume they are infrequent, but we must assume that they will
happen. That is, there needs to be locking, but it doesn't need to be
very granular. That is, it can be slow, but it cannot cause something to
break or behave weirdly.
Hmm, okay. I think that the current LWP code and my tree do not have any
explicit locking when writing to bnode fields; Chas has a per-bnode lock
which protects that state. I don't remember offhand whether the global
lock in my tree can safely be extended to cover such bnode state. I guess
this makes my current CV usage a bit bogus, as it sleeps on the global
mutex but is not necessarily using it to protect anything.
Relatedly, is it okay to assume that shutdown/restart/etc. will not be
issued concurrently with config changes? A "fully correct"
implementation would seem to need to only shutdown/restart the bnodes
which were configured when the command was issued, and ignore any new
nodes created since then. Because the implementation of
shutdown/restart must drop locks, making this guarantee seems to
require additional sychronization effort, whether via a temporary
queue to store the bnodes being acted upon, or a higher-level lock.
Are you talking about a 'bos create' racing with a 'bos restart -all'? I
would think you'd block out all modifications during a restart. While
the ordering may not matter for 'bos restart -all', it may matter for
'bos restart -bosserver', just so it doesn't leave behind a running
process and then re-exec itself or something.
Well, that's the use case that I backsolved to from reading the code;
maybe Chas had something else in mind.
Basically, in things like WaitAll, the code takes the global locks and
holds it while copying the allBnode queue to a temporary queue. It then
releases the "main" global lock and holds the "tempq" global lock across
the blocking operations that make up the wait in the WaitAll. The main
global lock can't be held like that, as holding it while blocking would
tie up all action. My intuition is telling me that a flag variable
protected by the global lock could offer the same protection with less
code complexity, but I didn't try to implement such a scheme or analyze it
fully.
I haven't been able to convince myself that the additional complexity
of the extra watcher threads is necessary, but if someone else could
convince me, that would be good.
My opinion is that we should explicitly drop LINUX24 support on servers
(or at least tbozo, if we eventually provide both tbozo and bozo). I
have never heard of demand for LINUX24 servers, and it's easy to migrate
off of them. The thing I have heard demand for and is not easy to
migrate off of is LINUX24 clients, which we could still keep.
I mean, regardless of what solution we end up with, how much testing is
anyone really going to do for bozo on LINUX24? We're just going to end
up with something that theoretically works but we're not very confident
has solved various possible race conditions or whatnot. If we want to
keep LINUX24 for this, we should at least put a big warning on it that
mentions something involving the relevant issues.
I'm not going to argue with this proposal :)
That doesn't deal with any signalling specifics, but keep in mind our
current bozo signal handling is not always great, and does not
necessarily need to be fixed at the same time. I've always seen bozo
misidentify core dumps, which I thought was due to this, but I've never
really cared.
I think the really important thing is to not miss any SIGCHLD events.
Misidentifying core dumps is not as important.
I also failed to note that Chas fixed up the bnode reference counting,
which is totally bogus in master. I had deferred that to a follow-up
commit, which is not currently implemented. (It should not be hard to
do.)
-Ben
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel