Re: collapse concurrent forced allocations

2013-03-03 Thread Alexandre Oliva
On Feb 23, 2013, Alexandre Oliva ol...@gnu.org wrote:

 On Feb 22, 2013, Josef Bacik jba...@fusionio.com wrote:
 So I understand what you are getting at, but I think you are doing it wrong. 
  If
 we're calling with CHUNK_ALLOC_FORCE, but somebody has already started to
 allocate with CHUNK_ALLOC_NO_FORCE, we'll reset the space_info-force_alloc 
 to
 our original caller's CHUNK_ALLOC_FORCE.

 But that's ok, do_chunk_alloc will set space_info-force_alloc to
 CHUNK_ALLOC_NO_FORCE at the end, when it succeeds allocating, and then
 anyone else waiting on the mutex to try to allocate will load the
 NO_FORCE from space_info.

 So we only really care about making sure a chunk is actually
 allocated, instead of doing this flag shuffling we should just do

 if (space_info-chunk_alloc) { spin_unlock(space_info-lock);
 wait_event(!space_info-chunk_alloc); return 0;

I looked a bit further into it.  I think I this would work if we had a
wait_queue for space_info-chunk_alloc.  We don't, so the mutex
interface is probably the best we can do.

OTOH, I found out we seem to get into an allocate spree when a large
file is being quickly created, such as when creating a ceph journal or
making a copy of a multi-GB file.  I suppose btrfs is just trying to
allocate contiguous space for the file, but unfortunately there doesn't
seem to be a fallback for allocation failure: as soon as data allocation
fails and space_info is set as full, the large write fails and the
filesystem becomes full, without even trying to use non-contiguous
storage.  Isn't that a bug?


I've also been trying to track down why, on a single-data filesystem,
(compressed?) data reads that fail because of bad blocks also spike the
CPU load and lock the file that failed to map in and the entire
filesystem, so that the only way to recover is to force a reboot.
Does this sound familiar to anyone?

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: collapse concurrent forced allocations

2013-02-23 Thread Alexandre Oliva
On Feb 22, 2013, Josef Bacik jba...@fusionio.com wrote:

 So I understand what you are getting at, but I think you are doing it wrong.  
 If
 we're calling with CHUNK_ALLOC_FORCE, but somebody has already started to
 allocate with CHUNK_ALLOC_NO_FORCE, we'll reset the space_info-force_alloc to
 our original caller's CHUNK_ALLOC_FORCE.

But that's ok, do_chunk_alloc will set space_info-force_alloc to
CHUNK_ALLOC_NO_FORCE at the end, when it succeeds allocating, and then
anyone else waiting on the mutex to try to allocate will load the
NO_FORCE from space_info.

 So we only really care about making sure a chunk is actually
 allocated, instead of doing this flag shuffling we should just do

 if (space_info-chunk_alloc) {
   spin_unlock(space_info-lock);
   wait_event(!space_info-chunk_alloc);
   return 0;
 }

Sorry, I don't follow.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: collapse concurrent forced allocations (was: Re: clear chunk_alloc flag on retryable failure)

2013-02-22 Thread Josef Bacik
On Thu, Feb 21, 2013 at 06:15:49PM -0700, Alexandre Oliva wrote:
 On Feb 21, 2013, Alexandre Oliva ol...@gnu.org wrote:
 
  What I saw in that function also happens to explain why in some cases I
  see filesystems allocate a huge number of chunks that remain unused
  (leading to the scenario above, of not having more chunks to allocate).
  It happens for data and metadata, but not necessarily both.  I'm
  guessing some thread sets the force_alloc flag on the corresponding
  space_info, and then several threads trying to get disk space end up
  attempting to allocate a new chunk concurrently.  All of them will see
  the force_alloc flag and bump their local copy of force up to the level
  they see first, and they won't clear it even if another thread succeeds
  in allocating a chunk, thus clearing the force flag.  Then each thread
  that observed the force flag will, on its turn, force the allocation of
  a new chunk.  And any threads that come in while it does that will see
  the force flag still set and pick it up, and so on.  This sounds like a
  problem to me, but...  what should the correct behavior be?  Clear
  force_flag once we copy it to a local force?  Reset force to the
  incoming value on every loop?
 
 I think a slight variant of the following makes the most sense, so I
 implemented it in the patch below.
 
  Set the flag to our incoming force if we have it at first, clear our
  local flag, and move it from the space_info when we determined that we
  are the thread that's going to perform the allocation?
 
 

 From: Alexandre Oliva ol...@gnu.org
 
 btrfs: consume force_alloc in the first thread to chunk_alloc
 
 Even if multiple threads in do_chunk_alloc look at force_alloc and see
 a force flag, it suffices that one of them consumes the flag.  Arrange
 for an incoming force argument to make to force_alloc in case of
 concurrent calls, so that it is used only by the first thread to get
 to allocation after the initial request.
 
 Signed-off-by: Alexandre Oliva ol...@gnu.org
 ---
  fs/btrfs/extent-tree.c |8 
  1 file changed, 8 insertions(+)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 6ee89d5..66283f7 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -3574,8 +3574,12 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
 *trans,
  
  again:
   spin_lock(space_info-lock);
 +
 + /* Bring force_alloc to force and tentatively consume it.  */
   if (force  space_info-force_alloc)
   force = space_info-force_alloc;
 + space_info-force_alloc = CHUNK_ALLOC_NO_FORCE;
 +
   if (space_info-full) {
   spin_unlock(space_info-lock);
   return 0;
 @@ -3586,6 +3590,10 @@ again:
   return 0;
   } else if (space_info-chunk_alloc) {
   wait_for_alloc = 1;
 + /* Reset force_alloc so that it's consumed by the
 +first thread that completes the allocation.  */
 + space_info-force_alloc = force;
 + force = CHUNK_ALLOC_NO_FORCE;

So I understand what you are getting at, but I think you are doing it wrong.  If
we're calling with CHUNK_ALLOC_FORCE, but somebody has already started to
allocate with CHUNK_ALLOC_NO_FORCE, we'll reset the space_info-force_alloc to
our original caller's CHUNK_ALLOC_FORCE.  So we only really care about making
sure a chunk is actually allocated, instead of doing this flag shuffling we
should just do

if (space_info-chunk_alloc) {
spin_unlock(space_info-lock);
wait_event(!space_info-chunk_alloc);
return 0;
}

and that way we don't allocate more chunks than normal.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


collapse concurrent forced allocations (was: Re: clear chunk_alloc flag on retryable failure)

2013-02-21 Thread Alexandre Oliva
On Feb 21, 2013, Alexandre Oliva ol...@gnu.org wrote:

 What I saw in that function also happens to explain why in some cases I
 see filesystems allocate a huge number of chunks that remain unused
 (leading to the scenario above, of not having more chunks to allocate).
 It happens for data and metadata, but not necessarily both.  I'm
 guessing some thread sets the force_alloc flag on the corresponding
 space_info, and then several threads trying to get disk space end up
 attempting to allocate a new chunk concurrently.  All of them will see
 the force_alloc flag and bump their local copy of force up to the level
 they see first, and they won't clear it even if another thread succeeds
 in allocating a chunk, thus clearing the force flag.  Then each thread
 that observed the force flag will, on its turn, force the allocation of
 a new chunk.  And any threads that come in while it does that will see
 the force flag still set and pick it up, and so on.  This sounds like a
 problem to me, but...  what should the correct behavior be?  Clear
 force_flag once we copy it to a local force?  Reset force to the
 incoming value on every loop?

I think a slight variant of the following makes the most sense, so I
implemented it in the patch below.

 Set the flag to our incoming force if we have it at first, clear our
 local flag, and move it from the space_info when we determined that we
 are the thread that's going to perform the allocation?


From: Alexandre Oliva ol...@gnu.org

btrfs: consume force_alloc in the first thread to chunk_alloc

Even if multiple threads in do_chunk_alloc look at force_alloc and see
a force flag, it suffices that one of them consumes the flag.  Arrange
for an incoming force argument to make to force_alloc in case of
concurrent calls, so that it is used only by the first thread to get
to allocation after the initial request.

Signed-off-by: Alexandre Oliva ol...@gnu.org
---
 fs/btrfs/extent-tree.c |8 
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6ee89d5..66283f7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3574,8 +3574,12 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 
 again:
 	spin_lock(space_info-lock);
+
+	/* Bring force_alloc to force and tentatively consume it.  */
 	if (force  space_info-force_alloc)
 		force = space_info-force_alloc;
+	space_info-force_alloc = CHUNK_ALLOC_NO_FORCE;
+
 	if (space_info-full) {
 		spin_unlock(space_info-lock);
 		return 0;
@@ -3586,6 +3590,10 @@ again:
 		return 0;
 	} else if (space_info-chunk_alloc) {
 		wait_for_alloc = 1;
+		/* Reset force_alloc so that it's consumed by the
+		   first thread that completes the allocation.  */
+		space_info-force_alloc = force;
+		force = CHUNK_ALLOC_NO_FORCE;
 	} else {
 		space_info-chunk_alloc = 1;
 	}

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer