On 09/10/2017 10:17, Andriy Gapon wrote:
> 
> void
> g_slice_orphan(struct g_consumer *cp)
> {
>         struct g_slicer *gsp;
> 
>         g_trace(G_T_TOPOLOGY, "g_slice_orphan(%p/%s)", cp, 
> cp->provider->name);
>         g_topology_assert();
> 
>         /* XXX: Not good enough we leak the softc and its suballocations */
>         gsp = cp->geom->softc;
>         cp->geom->softc = NULL;       <--- I think that doing this is unsafe
>         g_slice_free(gsp);    <--- and this
>         g_wither_geom(cp->geom, ENXIO);
> }

So, g_slice_orphan() is obviously unsafe as it destroys its softc even before it
errors its geom and its providers.
I have created a review request for my attempt to fix the problem:
https://reviews.freebsd.org/D12809

I am still somewhat concerned about the lack of a lock-step mechanism between
setting an error on a provider and checking nstart/nend (described below).
But given the amount of inter-thread communication required before a provider is
actually "finished", I do not think that any problem are likely because of that.

> I do not see how g_slice_orphan() - or, in fact, the whole orphanization 
> process
> - is synchronized with g_io_request() and g_io_schedule_down().  I think that 
> it
> is quite possible for a geom to become orphaned after g_io_check() succeeds 
> and
> before geom->start is executed (or while it's being executed).
> I see that there is an attempt to prevent orphanization while a geom is in 
> use.
> The code in one_event() does not call geom->orphan until pp->nstart == 
> pp->nend.
>  But there is no common lock to actually synchronize the check and updates of
> the variables.  Atomic operations are not used either.  So, in my opinion, the
> check is not really safe.
> 
> It looks like r162200 attempted to fix the general problem but either it was 
> not
> a complete solution or the I/O code was later changed:
> https://svnweb.freebsd.org/base?view=revision&revision=162200
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=102766
> 
> Back to g_slice, I see that g_slice_start() makes use of the softc.  So, this 
> is
> why I believe that destroying it right in g_slice_orphan is not correct.
> 
> It seems that other GEOM classes go to greater lengths to ensure that there is
> no race between orphanization and I/O processing.  Typically an internal lock
> and a flag seems to be used.
> 
> I wonder if g_slice should follow that technique.
> Or maybe it's better to use a lock or atomic operations at GEOM infrastructure
> level to ensure that changes to pp->error, pp->flags, pp->nstart and pp->nend
> are really seen by other threads and in the correct order.
> 


-- 
Andriy Gapon
_______________________________________________
freebsd-geom@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-geom
To unsubscribe, send any mail to "freebsd-geom-unsubscr...@freebsd.org"

Reply via email to