Re: Still getting a lot of -28 (ENOSPC?) errors during balance

2013-04-02 Thread Roman Mamedov
On Tue, 2 Apr 2013 14:04:52 +0600
Roman Mamedov <r...@romanrm.ru> wrote:

 With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
 (the problem also occurred without this patch, and seemed to be even
 worse).
 
 At the start of balance:
 
 Data: total=31.85GB, used=9.96GB
 System: total=4.00MB, used=16.00KB
 Metadata: total=1.01GB, used=696.17MB
 
 btrfs balance start -musage=5 -dusage=5 has been running for about 50 minutes.
 
 Current situation:
 
 Balance on '/mnt/r1/' is running
 1 out of about 2 chunks balanced (20 considered),  50% left
 
 Data: total=30.85GB, used=10.04GB
 System: total=4.00MB, used=16.00KB
 Metadata: total=1.01GB, used=851.69MB

About 2 hours 10 minutes into the balance, it was still going, with:

Data: total=30.85GB, used=10.06GB
System: total=4.00MB, used=16.00KB
Metadata: total=1.01GB, used=909.16MB

The stream of -28 errors continues non-stop in dmesg.

At ~2hr20min it looks like it decided to allocate some more space for metadata:

Data: total=30.85GB, used=10.01GB
System: total=4.00MB, used=16.00KB
Metadata: total=2.01GB, used=748.56MB

And shortly after (~2hr25min) it was done. After the balance:

Data: total=29.85GB, used=10.01GB
System: total=4.00MB, used=16.00KB
Metadata: total=2.01GB, used=748.27MB
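
For reference, the figures above come from roughly these commands (mountpoint
/mnt/r1 as in the balance status output; listed just to make the report easier
to reproduce):

btrfs filesystem df /mnt/r1                        # the Data/System/Metadata lines above
btrfs balance start -musage=5 -dusage=5 /mnt/r1    # the filtered balance in question
btrfs balance status /mnt/r1                       # progress, run from another shell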

-- 
With respect,
Roman




Re: Still getting a lot of -28 (ENOSPC?) errors during balance

2013-04-02 Thread Josef Bacik
On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:
 Hello,
 
 With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
 (the problem also occurred without this patch, and seemed to be even
 worse).
 
 At the start of balance:
 
 Data: total=31.85GB, used=9.96GB
 System: total=4.00MB, used=16.00KB
 Metadata: total=1.01GB, used=696.17MB
 
 btrfs balance start -musage=5 -dusage=5 has been running for about 50 minutes.
 
 Current situation:
 
 Balance on '/mnt/r1/' is running
 1 out of about 2 chunks balanced (20 considered),  50% left
 
 Data: total=30.85GB, used=10.04GB
 System: total=4.00MB, used=16.00KB
 Metadata: total=1.01GB, used=851.69MB
 
 And a constant stream of these in dmesg:
 

Can you try this out and see if it helps?  Thanks,

Josef

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0d89ff0..9830e86 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2548,6 +2548,13 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 	list_for_each_entry(edge, &node->upper, list[LOWER]) {
 		cond_resched();
 
+		ret = btrfs_block_rsv_refill(rc->extent_root, rc->block_rsv,
+					     rc->extent_root->leafsize,
+					     BTRFS_RESERVE_FLUSH_ALL);
+		if (ret) {
+			err = ret;
+			break;
+		}
 		upper = edge->node[UPPER];
 		root = select_reloc_root(trans, rc, upper, edges, &nr);
 		BUG_ON(!root);
--


Re: Still getting a lot of -28 (ENOSPC?) errors during balance

2013-04-02 Thread Roman Mamedov
On Tue, 2 Apr 2013 09:46:26 -0400
Josef Bacik <jba...@fusionio.com> wrote:

 On Tue, Apr 02, 2013 at 02:04:52AM -0600, Roman Mamedov wrote:
  Hello,
  
  With kernel 3.7.10 patched with "Btrfs: limit the global reserve to 512mb"
  (the problem also occurred without this patch, and seemed to be even
  worse).
  
  At the start of balance:
  
  Data: total=31.85GB, used=9.96GB
  System: total=4.00MB, used=16.00KB
  Metadata: total=1.01GB, used=696.17MB
  
  btrfs balance start -musage=5 -dusage=5 has been running for about 50 minutes.
  
  Current situation:
  
  Balance on '/mnt/r1/' is running
  1 out of about 2 chunks balanced (20 considered),  50% left
  
  Data: total=30.85GB, used=10.04GB
  System: total=4.00MB, used=16.00KB
  Metadata: total=1.01GB, used=851.69MB
  
  And a constant stream of these in dmesg:
  
 
 Can you try this out and see if it helps?  Thanks,

Hello,

Well, that balance has now completed, and unfortunately I don't have a complete
image of the filesystem from before it, so I can't apply the patch and check
whether the same operation goes better this time.

I'll keep it in mind and will try to test it out if I run into a similar
situation again on some filesystem.

Generally, what seems to make me run into various problems with balance is the
following usage scenario: on an active filesystem (used as /home and the root FS),
a snapshot is made every 30 minutes with a unique (timestamped) name, and once
a day snapshots older than two days are purged. It goes on like this for months.

Another variant of this is a backup partition, where snapshots are made every six
hours, and all snapshots are kept for 1-3 months before being purged.

I guess this kind of usage causes a lot of internal fragmentation or
something, which makes it difficult for a balance to find enough free space to
work with.
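
For concreteness, the rotation looks roughly like this (the paths, name pattern,
and retention window here are illustrative assumptions, not my actual scripts):

#!/bin/bash
# Every 30 minutes (e.g. from cron): take a read-only snapshot with a
# timestamped name.
SUBVOL=/home
SNAPDIR=/home/.snapshots
btrfs subvolume snapshot -r "$SUBVOL" "$SNAPDIR/home-$(date +%Y%m%d-%H%M)"

# Once a day: purge snapshots whose timestamp is more than two days old.
shopt -s nullglob
cutoff=$(date -d '2 days ago' +%Y%m%d-%H%M)
for snap in "$SNAPDIR"/home-*; do
    name=${snap##*/home-}                 # timestamp part of the snapshot name
    if [[ "$name" < "$cutoff" ]]; then
        btrfs subvolume delete "$snap"
    fi
done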

 
 Josef
 
 diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
 index 0d89ff0..9830e86 100644
 --- a/fs/btrfs/relocation.c
 +++ b/fs/btrfs/relocation.c
 @@ -2548,6 +2548,13 @@ static int do_relocation(struct btrfs_trans_handle *trans,
  	list_for_each_entry(edge, &node->upper, list[LOWER]) {
  		cond_resched();
  
 +		ret = btrfs_block_rsv_refill(rc->extent_root, rc->block_rsv,
 +					     rc->extent_root->leafsize,
 +					     BTRFS_RESERVE_FLUSH_ALL);
 +		if (ret) {
 +			err = ret;
 +			break;
 +		}
  		upper = edge->node[UPPER];
  		root = select_reloc_root(trans, rc, upper, edges, &nr);
  		BUG_ON(!root);
 --


-- 
With respect,
Roman

