Hello,
Some kind of refcount assertion failure we have been having is "refcount
detected use-after-free!" from the call to refcounts_ref inside
_ports_bucket_class_iterate, called for instance by the periodic sync.
I'm thinking: what about this scenario:
- thread A calls diskfs_nput()
- that calls refcounts_deref_weak(), which gets both hard and weak
counts to 0
- thread B calls diskfs_sync_everything, which calls
ports_bucket_iterate, which calls _ports_bucket_class_iterate
- _ports_bucket_class_iterate iterates over nodes, and calls
refcounts_ref for the node deref-ed above. It then traps on the
assertion.
So the situation is that diskfs_nput is releasing a node while
sync_everything is trying to sync it.
Just using refcounts_unsafe_ref is most probably not a solution since
the node is on its way out, the function called by the iterator will
most probably get it all wrong on a node being destroyed.
Another way would be as attached: just skip nodes which didn't have any
reference. I however don't know libports enough to say whether that's
correct behavior for libports or not: AIUI callers of refcounts_deref*
are supposed to check for 0,0 and are then responsible for destroying
the node?
Samuel
diff --git a/libports/bucket-iterate.c b/libports/bucket-iterate.c
index b021b99..76dc3f7 100644
--- a/libports/bucket-iterate.c
+++ b/libports/bucket-iterate.c
@@ -58,7 +58,14 @@ _ports_bucket_class_iterate (struct hurd_ihash *ht,
if (class == 0 || pi->class == class)
{
- refcounts_ref (&pi->refcounts, NULL);
+ struct references result;
+ refcounts_unsafe_ref (&pi->refcounts, &result);
+ if (result.hard == 1 && result.weak == 0)
+ {
+ /* This one is on its way out, skip it. */
+ refcounts_deref (&pi->refcounts, NULL);
+ continue;
+ }
p[n] = pi;
n++;
}