On Sat, Nov 19, 2011 at 01:37:39AM +0000, Al Viro wrote:
> On Fri, Nov 18, 2011 at 08:20:56PM +0000, Al Viro wrote:
> > On Fri, Nov 18, 2011 at 02:38:54PM -0500, Josef Bacik wrote:
> > > Al pointed out that if we fail to start a worker for whatever reason
> > > (basically ENOMEM), we could leak our count for num_start_workers, and so
> > > we'd think we had more workers than we actually do.  This could cause us
> > > to shrink workers when we shouldn't, or not start workers when we should.
> > > So check the return value and, if we failed, fix num_start_workers and
> > > fall back.  Thanks,
> > 
> > It's actually uglier than that; consider check_pending_workers_create()
> > where we
> >     * bump the num_start_workers
> >     * call start_new_worker(), which can fail, and then we have the same
> > leak; if it doesn't fail, it schedules a call of start_new_worker_func()
> >     * when start_new_worker_func() runs, it does btrfs_start_workers(),
> > which can run into the same leak again (this time on another pool - one
> > we have as ->async_helper).
> 
> Nuts...  AFAICS, we _always_ leak ->num_start_workers here (i.e. when
> check_pending_workers_create() finds ->atomic_start_pending set).  In
> effect, we bump it once in check_pending_workers_create() itself, then
> another time (on the same pool) when start_new_worker_func() calls
> btrfs_start_workers().  That one will be dropped when we manage to 
> start the thread, but the first one won't.
> 
> Shouldn't we use __btrfs_start_workers() instead here?
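To make the double bump concrete, here's a throwaway userspace model of the
flow in question (plain C, invented stub names, the counter reduced to a bare
int with no locking; obviously not the real async-thread.c):

	#include <stdio.h>

	/* toy stand-in for struct btrfs_workers; only the counter under discussion */
	struct workers {
		int num_start_workers;
		int atomic_start_pending;
	};

	/* models btrfs_start_workers(): bump the in-flight count, try to start a
	 * thread, drop the count once the thread is up; if starting fails, the
	 * bump is leaked - the original bug Josef's patch is about */
	static int model_btrfs_start_workers(struct workers *w, int start_fails)
	{
		w->num_start_workers++;
		if (start_fails)
			return -1;		/* kthread creation failed: count leaked */
		w->num_start_workers--;		/* thread started, drop in-flight count */
		return 0;
	}

	/* models the path described above: check_pending_workers_create() bumps the
	 * counter once itself, then the deferred start_new_worker_func() calls
	 * btrfs_start_workers() and bumps the same pool again; only the second
	 * bump is ever dropped */
	static void model_check_pending_workers_create(struct workers *w)
	{
		if (!w->atomic_start_pending)
			return;
		w->atomic_start_pending = 0;
		w->num_start_workers++;			/* first bump, never undone */
		model_btrfs_start_workers(w, 0);	/* second bump, dropped on success */
	}

	int main(void)
	{
		struct workers w = { .num_start_workers = 0, .atomic_start_pending = 1 };

		model_check_pending_workers_create(&w);
		/* even on the fully successful path the counter ends up at 1, not 0 */
		printf("num_start_workers after a successful start: %d\n",
		       w.num_start_workers);
		return 0;
	}

Run it and num_start_workers ends up at 1 even though the (modelled) start
succeeded - the same permanent drift the real pool would see.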

OK, tentative fixes for that stuff are pushed into #btrfs in vfs.git; comments
would be welcome...
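
For reference, the error-handling shape Josef's patch text describes - check
the return value, undo the bump on failure, then fall back - extending the toy
above; this is a sketch of the pattern, not the actual patch:

	/* models the fixed caller: undo our bump if starting the thread fails, so
	 * num_start_workers can't drift, and let the caller fall back to queueing
	 * the work on an already-running worker */
	static int model_start_worker_checked(struct workers *w, int start_fails)
	{
		int ret;

		w->num_start_workers++;
		ret = start_fails ? -1 : 0;	/* stands in for the real start call */
		if (ret) {
			w->num_start_workers--;	/* failed: don't leak the count */
			return ret;		/* caller falls back to an existing worker */
		}
		return 0;
	}

Using __btrfs_start_workers() from start_new_worker_func(), as suggested above,
would presumably also skip the second bump, so the one taken in
check_pending_workers_create() is the only one in flight.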