On Wed, 4 Sep 2013, chas williams - CONTRACTOR wrote:
On Wed, 4 Sep 2013 12:58:22 -0400 (EDT)
Benjamin Kaduk <[email protected]> wrote:
On Fri, 9 Aug 2013, Benjamin Kaduk wrote:
On Fri, 9 Aug 2013, Benjamin Kaduk wrote:
Maybe, I should just try to implement my idea and we will see if it works
or not.
A sketch of what I had in mind is up at
https://github.com/kaduk/openafs/commits/chas-bozo ; it passes minimal
smoke-testing on my dev box (running a ptserver that doesn't have a prdb).
It does not have the extra "shutting down" flag value that jhutz mentioned,
for now.
At 72 insertions/75 deletions, it doesn't really have a "smaller code"
endorsement going for it ... what do people think?
I'm assembling a changeset merging the still-useful bits of my patchset
with Chas's heavy lifting on the pthreading front, at
https://github.com/kaduk/openafs/commits/newbozo . I changed the
commit-IDs for everything past the first commit (which was verified by
buildbot already), so as to not cause confusion on gerrit. (I also
abandoned my gerrit changes for my old patchset.)
Once I have squahsed, reviewed and tested things, I plan to push it to
gerrit for general review.
I wrote what I think is a general fix for the problems mentioned in
this thread but haven't had much time to test it. It is attached. It
tries to match Jeff's wish list as much as possible. Right now it lets
all apply operations run at the same time on a fixed list of inodes.
If you are in the middle of a restart, someone else can begin a restart
as well. The side effect is that someone else can also run a status
as well. If you wanted to serialize these, it would simply be changing
the newBnodes_lock to a write lock in all cases.
Thanks for pulling that together, I'll look at that after I've finished
looking over the commits so far.
I am not thrilled about your code introducing what looks like C
primitives with BNODE_LOCK/BNODE_UNLOCK. Atleast put some ()'s on the
end of the macros. I am not sure this is entirely correct either:
I first wrote it as a proof-of-concept, and haven't done any cleanup since
then. (Your version is probably better ;) )
for (opr_queue_ScanSafe(&allBnodes, cursor, store )) {
struct bnode *tb = opr_queue_Entry(cursor, struct bnode, q);
bnode_Hold(tb);
BNODE_UNLOCK;
code = (*aproc) (tb, arock);
bnode_Release(tb);
BNODE_LOCK;
While store holds the pointer to the next bnode, you didn't hold a
reference to that bnode so it could potentially go away after
BNODE_UNLOCK.
That sounds right. I have not even really convinced myself that the
queue_ScanSafe is necessary; regular queue_Scan might be sufficient.
I'll keep that in mind when I am at the polishing up stage.
Thanks again,
Ben
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel