Re: clear chunk_alloc flag on retryable failure

2013-02-22 Thread Josef Bacik
On Thu, Feb 21, 2013 at 02:15:14PM -0700, Alexandre Oliva wrote:
> I've experienced filesystem freezes with permanent spikes in the active
> process count for quite a while, particularly on filesystems whose
> available raw space has already been fully allocated to chunks.
> 
> While looking into this, I found a pretty obvious error in
> do_chunk_alloc: it sets space_info->chunk_alloc, but if
> btrfs_alloc_chunk returns an error other than ENOSPC, it returns leaving
> that flag set, which causes any other threads waiting for
> space_info->chunk_alloc to become zero to spin indefinitely.
> 
> I haven't double-checked that this patch fixes the failure I've observed
> fully (it's not exactly trivial to trigger), but it surely is a bug and
> the fix is trivial, so...  Please put it in :-)

Yup putting in btrfs-next, thanks.

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: collapse concurrent forced allocations (was: Re: clear chunk_alloc flag on retryable failure)

2013-02-22 Thread Josef Bacik
On Thu, Feb 21, 2013 at 06:15:49PM -0700, Alexandre Oliva wrote:
> On Feb 21, 2013, Alexandre Oliva  wrote:
> 
> > What I saw in that function also happens to explain why in some cases I
> > see filesystems allocate a huge number of chunks that remain unused
> > (leading to the scenario above, of not having more chunks to allocate).
> > It happens for data and metadata, but not necessarily both.  I'm
> > guessing some thread sets the force_alloc flag on the corresponding
> > space_info, and then several threads trying to get disk space end up
> > attempting to allocate a new chunk concurrently.  All of them will see
> > the force_alloc flag and bump their local copy of force up to the level
> > they see first, and they won't clear it even if another thread succeeds
> > in allocating a chunk, thus clearing the force flag.  Then each thread
> > that observed the force flag will, on its turn, force the allocation of
> > a new chunk.  And any threads that come in while it does that will see
> > the force flag still set and pick it up, and so on.  This sounds like a
> > problem to me, but...  what should the correct behavior be?  Clear
> > force_flag once we copy it to a local force?  Reset force to the
> > incoming value on every loop?
> 
> I think a slight variant of the following makes the most sense, so I
> implemented it in the patch below.
> 
> > Set the flag to our incoming force if we have it at first, clear our
> > local flag, and move it from the space_info when we determined that we
> > are the thread that's going to perform the allocation?
> 
> 

> From: Alexandre Oliva 
> 
> btrfs: consume force_alloc in the first thread to chunk_alloc
> 
> Even if multiple threads in do_chunk_alloc look at force_alloc and see
> a force flag, it suffices that one of them consumes the flag.  Arrange
> for an incoming force argument to make to force_alloc in case of
> concurrent calls, so that it is used only by the first thread to get
> to allocation after the initial request.
> 
> Signed-off-by: Alexandre Oliva 
> ---
>  fs/btrfs/extent-tree.c |8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 6ee89d5..66283f7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3574,8 +3574,12 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
> *trans,
>  
>  again:
>   spin_lock(&space_info->lock);
> +
> + /* Bring force_alloc to force and tentatively consume it.  */
>   if (force < space_info->force_alloc)
>   force = space_info->force_alloc;
> + space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
> +
>   if (space_info->full) {
>   spin_unlock(&space_info->lock);
>   return 0;
> @@ -3586,6 +3590,10 @@ again:
>   return 0;
>   } else if (space_info->chunk_alloc) {
>   wait_for_alloc = 1;
> + /* Reset force_alloc so that it's consumed by the
> +first thread that completes the allocation.  */
> + space_info->force_alloc = force;
> + force = CHUNK_ALLOC_NO_FORCE;

So I understand what you are getting at, but I think you are doing it wrong.  If
we're calling with CHUNK_ALLOC_FORCE, but somebody has already started to
allocate with CHUNK_ALLOC_NO_FORCE, we'll reset the space_info->force_alloc to
our original caller's CHUNK_ALLOC_FORCE.  So we only really care about making
sure a chunk is actually allocated, instead of doing this flag shuffling we
should just do

if (space_info->chunk_alloc) {
spin_unlock(&space_info->lock);
wait_event(!space_info->chunk_alloc);
return 0;
}

and that way we don't allocate more chunks than normal.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


collapse concurrent forced allocations (was: Re: clear chunk_alloc flag on retryable failure)

2013-02-21 Thread Alexandre Oliva
On Feb 21, 2013, Alexandre Oliva  wrote:

> What I saw in that function also happens to explain why in some cases I
> see filesystems allocate a huge number of chunks that remain unused
> (leading to the scenario above, of not having more chunks to allocate).
> It happens for data and metadata, but not necessarily both.  I'm
> guessing some thread sets the force_alloc flag on the corresponding
> space_info, and then several threads trying to get disk space end up
> attempting to allocate a new chunk concurrently.  All of them will see
> the force_alloc flag and bump their local copy of force up to the level
> they see first, and they won't clear it even if another thread succeeds
> in allocating a chunk, thus clearing the force flag.  Then each thread
> that observed the force flag will, on its turn, force the allocation of
> a new chunk.  And any threads that come in while it does that will see
> the force flag still set and pick it up, and so on.  This sounds like a
> problem to me, but...  what should the correct behavior be?  Clear
> force_flag once we copy it to a local force?  Reset force to the
> incoming value on every loop?

I think a slight variant of the following makes the most sense, so I
implemented it in the patch below.

> Set the flag to our incoming force if we have it at first, clear our
> local flag, and move it from the space_info when we determined that we
> are the thread that's going to perform the allocation?


From: Alexandre Oliva 

btrfs: consume force_alloc in the first thread to chunk_alloc

Even if multiple threads in do_chunk_alloc look at force_alloc and see
a force flag, it suffices that one of them consumes the flag.  Arrange
for an incoming force argument to make to force_alloc in case of
concurrent calls, so that it is used only by the first thread to get
to allocation after the initial request.

Signed-off-by: Alexandre Oliva 
---
 fs/btrfs/extent-tree.c |8 
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6ee89d5..66283f7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3574,8 +3574,12 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 
 again:
 	spin_lock(&space_info->lock);
+
+	/* Bring force_alloc to force and tentatively consume it.  */
 	if (force < space_info->force_alloc)
 		force = space_info->force_alloc;
+	space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
+
 	if (space_info->full) {
 		spin_unlock(&space_info->lock);
 		return 0;
@@ -3586,6 +3590,10 @@ again:
 		return 0;
 	} else if (space_info->chunk_alloc) {
 		wait_for_alloc = 1;
+		/* Reset force_alloc so that it's consumed by the
+		   first thread that completes the allocation.  */
+		space_info->force_alloc = force;
+		force = CHUNK_ALLOC_NO_FORCE;
 	} else {
 		space_info->chunk_alloc = 1;
 	}

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


clear chunk_alloc flag on retryable failure

2013-02-21 Thread Alexandre Oliva
I've experienced filesystem freezes with permanent spikes in the active
process count for quite a while, particularly on filesystems whose
available raw space has already been fully allocated to chunks.

While looking into this, I found a pretty obvious error in
do_chunk_alloc: it sets space_info->chunk_alloc, but if
btrfs_alloc_chunk returns an error other than ENOSPC, it returns leaving
that flag set, which causes any other threads waiting for
space_info->chunk_alloc to become zero to spin indefinitely.

I haven't double-checked that this patch fixes the failure I've observed
fully (it's not exactly trivial to trigger), but it surely is a bug and
the fix is trivial, so...  Please put it in :-)


What I saw in that function also happens to explain why in some cases I
see filesystems allocate a huge number of chunks that remain unused
(leading to the scenario above, of not having more chunks to allocate).
It happens for data and metadata, but not necessarily both.  I'm
guessing some thread sets the force_alloc flag on the corresponding
space_info, and then several threads trying to get disk space end up
attempting to allocate a new chunk concurrently.  All of them will see
the force_alloc flag and bump their local copy of force up to the level
they see first, and they won't clear it even if another thread succeeds
in allocating a chunk, thus clearing the force flag.  Then each thread
that observed the force flag will, on its turn, force the allocation of
a new chunk.  And any threads that come in while it does that will see
the force flag still set and pick it up, and so on.  This sounds like a
problem to me, but...  what should the correct behavior be?  Clear
force_flag once we copy it to a local force?  Reset force to the
incoming value on every loop?  Set the flag to our incoming force if we
have it at first, clear our local flag, and move it from the space_info
when we determined that we are the thread that's going to perform the
allocation?


btrfs: clear chunk_alloc flag on retryable failure

From: Alexandre Oliva 

If btrfs_alloc_chunk fails with e.g. ENOMEM, we exit do_chunk_alloc
without clearing chunk_alloc in space_info.  As a result, any further
calls to do_chunk_alloc on that filesystem will start busy-waiting for
chunk_alloc to be cleared, but it never will be.  This patch adjusts
do_chunk_alloc so that it clears this flag in case of an error.

Signed-off-by: Alexandre Oliva 
---
 fs/btrfs/extent-tree.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5a3327b..b597cdf 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3632,19 +3632,21 @@ again:
 	check_system_chunk(trans, extent_root, flags);
 
 	ret = btrfs_alloc_chunk(trans, extent_root, flags);
+
+	spin_lock(&space_info->lock);
+
 	if (ret < 0 && ret != -ENOSPC)
 		goto out;
 
-	spin_lock(&space_info->lock);
 	if (ret)
 		space_info->full = 1;
 	else
 		ret = 1;
 
 	space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
+out:
 	space_info->chunk_alloc = 0;
 	spin_unlock(&space_info->lock);
-out:
 	mutex_unlock(&fs_info->chunk_mutex);
 	return ret;
 }

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer