On 09/10/2017 10:17, Andriy Gapon wrote: > > void > g_slice_orphan(struct g_consumer *cp) > { > struct g_slicer *gsp; > > g_trace(G_T_TOPOLOGY, "g_slice_orphan(%p/%s)", cp, > cp->provider->name); > g_topology_assert(); > > /* XXX: Not good enough we leak the softc and its suballocations */ > gsp = cp->geom->softc; > cp->geom->softc = NULL; <--- I think that doing this is unsafe > g_slice_free(gsp); <--- and this > g_wither_geom(cp->geom, ENXIO); > }
So, g_slice_orphan() is obviously unsafe as it destroys its softc even before it errors its geom and its providers. I have created a review request for my attempt to fix the problem: https://reviews.freebsd.org/D12809 I am still somewhat concerned about the lack of a lock-step mechanism between setting an error on a provider and checking nstart/nend (described below). But given the amount of inter-thread communication required before a provider is actually "finished", I do not think that any problem are likely because of that. > I do not see how g_slice_orphan() - or, in fact, the whole orphanization > process > - is synchronized with g_io_request() and g_io_schedule_down(). I think that > it > is quite possible for a geom to become orphaned after g_io_check() succeeds > and > before geom->start is executed (or while it's being executed). > I see that there is an attempt to prevent orphanization while a geom is in > use. > The code in one_event() does not call geom->orphan until pp->nstart == > pp->nend. > But there is no common lock to actually synchronize the check and updates of > the variables. Atomic operations are not used either. So, in my opinion, the > check is not really safe. > > It looks like r162200 attempted to fix the general problem but either it was > not > a complete solution or the I/O code was later changed: > https://svnweb.freebsd.org/base?view=revision&revision=162200 > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=102766 > > Back to g_slice, I see that g_slice_start() makes use of the softc. So, this > is > why I believe that destroying it right in g_slice_orphan is not correct. > > It seems that other GEOM classes go to greater lengths to ensure that there is > no race between orphanization and I/O processing. Typically an internal lock > and a flag seems to be used. > > I wonder if g_slice should follow that technique. > Or maybe it's better to use a lock or atomic operations at GEOM infrastructure > level to ensure that changes to pp->error, pp->flags, pp->nstart and pp->nend > are really seen by other threads and in the correct order. > -- Andriy Gapon _______________________________________________ freebsd-geom@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-geom To unsubscribe, send any mail to "freebsd-geom-unsubscr...@freebsd.org"