Re: nocow 'C' flag ignored after balance

2013-05-30 Thread Kyle Gates

On Wed, May 29, 2013 Miao Xie wrote:

On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:

On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:

From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance


[...]

Sorry for the long wait in replying.
This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
Raring kernel). I can probably try again on a newer version if you
think it will help.
This was my first kernel compile so I patched by hand and waited (10
hours on my old 32 bit single core machine).

I did move some of the files off and back on to the filesystem to
start fresh and compare but all seem to exhibit the same behavior
after a balance.



Thanks for testing the patch although it didn't help you.
Actually I tested it to be sure that it fixed the problems in my 
reproducer.


So anyway can you please apply this debug patch in order to nail it down?


Your patch can not fix the above problem is because we may 
update -last_snapshot

after we relocate the file data extent.

For example, there are two block groups which will be relocated, One is 
data block
group, the other is metadata block group. Then we relocate the data block 
group firstly,
and set the new generation for the file data extent item/the relative 
extent item and
set (new_generation - 1) for -last_snapshot. After the relocation of this 
block group,
we will end the transaction and drop the relocation tree. If we end the 
space balance now,
we won't break the nocow rule because -last_snapshot is less than the 
generation of the file
data extent item/the relative extent item. But there is still one block 
group which will be
relocated, when relocating the second block group, we will also start a 
new transaction,
and update -last_snapshot if need. So, -last_snapshot is greater than 
the generation of the file

data extent item we set before. And the nocow rule is broken.

Back to this above problem. I don't think it is a serious problem, we only 
do COW once after
the relocation, then we will still honour the nocow rule. The behaviour is 
similar to snapshot.

So maybe it needn't be fixed.


I would argue that for large vm workloads, running a balance or adding disks 
is a common practice that will result in a drastic drop in performance as 
well as massive increases in metadata writes and fragmentation.
In my case my disks were thrashing severely, performance was poor and ntp 
couldn't even hold my clock stable.

If the fix is nontrival please add this to the todo list.
Thanks,
Kyle

If we must fix this problem, I think the only way is that get the 
generation at the beginning
of the space balance, and then set it to -last_snapshot 
if -last_snapshot is less than it,
don't use (current_generation - 1) to update the -last_snapshot. Besides 
that, don't forget
to store the generation into btrfs_balance_item, or the problem will 
happen after we resume the

balance.

Thanks
Miao



thanks,
liubo

[...]






--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: nocow 'C' flag ignored after balance

2013-05-29 Thread Miao Xie
On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:
 On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:
 From: Liu Bo bo.li@oracle.com

 Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

 [...]

 Sorry for the long wait in replying.
 This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
 Raring kernel). I can probably try again on a newer version if you
 think it will help.
 This was my first kernel compile so I patched by hand and waited (10
 hours on my old 32 bit single core machine).

 I did move some of the files off and back on to the filesystem to
 start fresh and compare but all seem to exhibit the same behavior
 after a balance.

 
 Thanks for testing the patch although it didn't help you.
 Actually I tested it to be sure that it fixed the problems in my reproducer.
 
 So anyway can you please apply this debug patch in order to nail it down?

Your patch can not fix the above problem is because we may update 
-last_snapshot
after we relocate the file data extent.

For example, there are two block groups which will be relocated, One is data 
block
group, the other is metadata block group. Then we relocate the data block group 
firstly,
and set the new generation for the file data extent item/the relative extent 
item and
set (new_generation - 1) for -last_snapshot. After the relocation of this 
block group,
we will end the transaction and drop the relocation tree. If we end the space 
balance now,
we won't break the nocow rule because -last_snapshot is less than the 
generation of the file
data extent item/the relative extent item. But there is still one block group 
which will be
relocated, when relocating the second block group, we will also start a new 
transaction,
and update -last_snapshot if need. So, -last_snapshot is greater than the 
generation of the file
data extent item we set before. And the nocow rule is broken.

Back to this above problem. I don't think it is a serious problem, we only do 
COW once after
the relocation, then we will still honour the nocow rule. The behaviour is 
similar to snapshot.
So maybe it needn't be fixed.

If we must fix this problem, I think the only way is that get the generation at 
the beginning
of the space balance, and then set it to -last_snapshot if -last_snapshot is 
less than it,
don't use (current_generation - 1) to update the -last_snapshot. Besides that, 
don't forget
to store the generation into btrfs_balance_item, or the problem will happen 
after we resume the
balance.

Thanks
Miao

 
 thanks,
 liubo
 
  
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index df472ab..c12a11c 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct 
 btrfs_trans_handle *trans,
   goto out;
  
   if (btrfs_extent_generation(leaf, ei) =
 - btrfs_root_last_snapshot(root-root_item))
 + btrfs_root_last_snapshot(root-root_item)) {
 + printk(extent gen %llu last_snap %llu\n,
 + btrfs_extent_generation(leaf, ei),
 + btrfs_root_last_snapshot(root-root_item));
   goto out;
 + }
  
   iref = (struct btrfs_extent_inline_ref *)(ei + 1);
   if (btrfs_extent_inline_ref_type(leaf, iref) !=
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 23c596c..8cad6ee 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -1317,16 +1317,24 @@ next_slot:
   goto out_check;
   if (btrfs_file_extent_compression(leaf, fi) ||
   btrfs_file_extent_encryption(leaf, fi) ||
 - btrfs_file_extent_other_encoding(leaf, fi))
 + btrfs_file_extent_other_encoding(leaf, fi)) {
 + printk(special encoding\n);
   goto out_check;
 - if (extent_type == BTRFS_FILE_EXTENT_REG  !force)
 + }
 + if (extent_type == BTRFS_FILE_EXTENT_REG  !force) {
 + printk(BTRFS_FILE_EXTENT_REF\n);
   goto out_check;
 - if (btrfs_extent_readonly(root, disk_bytenr))
 + }
 + if (btrfs_extent_readonly(root, disk_bytenr)) {
 + printk(ro\n);
   goto out_check;
 + }
   if (btrfs_cross_ref_exist(trans, root, ino,
 found_key.offset -
 -   extent_offset, disk_bytenr))
 +   extent_offset, disk_bytenr)) {
 + printk(cross ref\n);
   goto out_check;
 + }
   disk_bytenr += extent_offset;
   disk_bytenr += 

Re: nocow 'C' flag ignored after balance

2013-05-28 Thread Kyle Gates

From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

So here we update file extent's generation while walking relocated
file extents in data reloc root, and use file extent's generation
instead for checking if we have cross refs for the file extent.

That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
fs/btrfs/ctree.h   |2 +-
fs/btrfs/extent-tree.c |   18 +-
fs/btrfs/inode.c   |   10 --
fs/btrfs/relocation.c  |1 +
4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct 
btrfs_root *root,

 u64 bytenr, u64 num_bytes);
int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr);
+   u64 objectid, u64 offset, u64 bytenr, u64 gen);
struct btrfs_block_group_cache *btrfs_lookup_block_group(
 struct btrfs_fs_info *info,
 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct btrfs_path *path,
- u64 objectid, u64 offset, u64 bytenr)
+ u64 objectid, u64 offset, u64 bytenr,
+ u64 fi_gen)
{
 struct btrfs_root *extent_root = root-fs_info-extent_root;
 struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct 
btrfs_trans_handle

*trans,
 btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 goto out;

- if (btrfs_extent_generation(leaf, ei) =
- btrfs_root_last_snapshot(root-root_item))
+ /*
+ * Usually generation in extent item is larger than that in file extent
+ * item because of delay refs.  But we don't want balance to break
+ * file's nocow behaviour, so use file_extent's generation which has
+ * been updates when we update fs root to point to relocated file
+ * extents in data reloc root.
+ */
+ fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+ if (fi_gen = btrfs_root_last_snapshot(root-root_item))
 goto out;

 iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:

int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr)
+   u64 objectid, u64 offset, u64 bytenr, u64 gen)
{
 struct btrfs_path *path;
 int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle 
*trans,


 do {
 ret = check_committed_ref(trans, root, path, objectid,
-   offset, bytenr);
+   offset, bytenr, gen);
 if (ret  ret != -ENOENT)
 goto out;

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2cfdd33..976b045 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1727,6 +1727,8 @@ next_slot:
 ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 if (extent_type == BTRFS_FILE_EXTENT_REG ||
 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+ u64 gen;
+ gen = btrfs_file_extent_generation(leaf, fi);
 disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 extent_offset = btrfs_file_extent_offset(leaf, fi);
 extent_end = found_key.offset +
@@ -1749,7 +1751,8 @@ next_slot:
 goto out_check;
 if (btrfs_cross_ref_exist(trans, root, ino,
   found_key.offset -
-   extent_offset, disk_bytenr))
+   extent_offset, disk_bytenr,
+   gen))
 goto out_check;
 disk_bytenr += extent_offset;
 disk_bytenr += cur_offset - found_key.offset;
@@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct 
btrfs_trans_handle

*trans,
 struct btrfs_key key;
 u64 disk_bytenr;
 u64 backref_offset;
+ u64 fi_gen;
 u64 extent_end;
 u64 num_bytes;
 int slot;
@@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct 
btrfs_trans_handle

*trans,
 }
 disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 backref_offset = btrfs_file_extent_offset(leaf, fi);
+ fi_gen = btrfs_file_extent_generation(leaf, fi);

 *orig_start = key.offset - backref_offset;
 *orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
@@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct 
btrfs_trans_handle

*trans,
 * find any we must cow
 */
 if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode),
-   key.offset - backref_offset, disk_bytenr))
+   key.offset - backref_offset, disk_bytenr,
+   fi_gen))
 goto out;

 /*
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 704a1b8..07faabf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1637,6 +1637,7 @@ int 

Re: nocow 'C' flag ignored after balance

2013-05-28 Thread Liu Bo
On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:
 From: Liu Bo bo.li@oracle.com
 
 Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
 
[...]
 
 Sorry for the long wait in replying.
 This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
 Raring kernel). I can probably try again on a newer version if you
 think it will help.
 This was my first kernel compile so I patched by hand and waited (10
 hours on my old 32 bit single core machine).
 
 I did move some of the files off and back on to the filesystem to
 start fresh and compare but all seem to exhibit the same behavior
 after a balance.


Thanks for testing the patch although it didn't help you.
Actually I tested it to be sure that it fixed the problems in my reproducer.

So anyway can you please apply this debug patch in order to nail it down?

thanks,
liubo

 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df472ab..c12a11c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct 
btrfs_trans_handle *trans,
goto out;
 
if (btrfs_extent_generation(leaf, ei) =
-   btrfs_root_last_snapshot(root-root_item))
+   btrfs_root_last_snapshot(root-root_item)) {
+   printk(extent gen %llu last_snap %llu\n,
+   btrfs_extent_generation(leaf, ei),
+   btrfs_root_last_snapshot(root-root_item));
goto out;
+   }
 
iref = (struct btrfs_extent_inline_ref *)(ei + 1);
if (btrfs_extent_inline_ref_type(leaf, iref) !=
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 23c596c..8cad6ee 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1317,16 +1317,24 @@ next_slot:
goto out_check;
if (btrfs_file_extent_compression(leaf, fi) ||
btrfs_file_extent_encryption(leaf, fi) ||
-   btrfs_file_extent_other_encoding(leaf, fi))
+   btrfs_file_extent_other_encoding(leaf, fi)) {
+   printk(special encoding\n);
goto out_check;
-   if (extent_type == BTRFS_FILE_EXTENT_REG  !force)
+   }
+   if (extent_type == BTRFS_FILE_EXTENT_REG  !force) {
+   printk(BTRFS_FILE_EXTENT_REF\n);
goto out_check;
-   if (btrfs_extent_readonly(root, disk_bytenr))
+   }
+   if (btrfs_extent_readonly(root, disk_bytenr)) {
+   printk(ro\n);
goto out_check;
+   }
if (btrfs_cross_ref_exist(trans, root, ino,
  found_key.offset -
- extent_offset, disk_bytenr))
+ extent_offset, disk_bytenr)) {
+   printk(cross ref\n);
goto out_check;
+   }
disk_bytenr += extent_offset;
disk_bytenr += cur_offset - found_key.offset;
num_bytes = min(end + 1, extent_end) - cur_offset;

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: nocow 'C' flag ignored after balance

2013-05-17 Thread Liu Bo
On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote:
 and mounted with autodefrag
 Am I actually just seeing large ranges getting split while remaining
 contiguous on disk? This would imply crc calculation on the two
 outside ranges. Or perhaps there is some data being inlined for each
 write. I believe writes on this file are 32KiB each.
 Does the balance produce persistent crc values in the metadata even
 though the files are nocow which implies nocrc?
 ...
 I ran this test again and here's filefrag -v after about a day of use:
 
[...]
 As you can see the 32KiB writes fit in the extents of size 9 and 55.
 Are those 9 block extents inlined?
 If I understand correctly, new extents are created for these nocow
 writes, then the old extents are basically hole punched producing
 three (four? because of inlining) separate extents.
 Something here begs for optimization. Perhaps balance should treat
 nocow files a little differently. That would be the time to remove
 the extra bits that prevent inplace overwrites. After the fact it
 becomes much more difficult, although removing a crc for the extent
 being written seems a little easier then iterating over the entire
 file.
 
 Thanks for taking the time to read,
 Kyle
 
 P.S. I'm CCing David as I believe he wrote the patch to get the 'C'
 flag working on empty files and directories.

Hi Kyle,

Can you please apply this patch and see if it helps?

thanks,
liubo


From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

So here we update file extent's generation while walking relocated
file extents in data reloc root, and use file extent's generation
instead for checking if we have cross refs for the file extent.

That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/ctree.h   |2 +-
 fs/btrfs/extent-tree.c |   18 +-
 fs/btrfs/inode.c   |   10 --
 fs/btrfs/relocation.c  |1 +
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root 
*root,
u64 bytenr, u64 num_bytes);
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
- u64 objectid, u64 offset, u64 bytenr);
+ u64 objectid, u64 offset, u64 bytenr, u64 gen);
 struct btrfs_block_group_cache *btrfs_lookup_block_group(
 struct btrfs_fs_info *info,
 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
 static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
-   u64 objectid, u64 offset, u64 bytenr)
+   u64 objectid, u64 offset, u64 bytenr,
+   u64 fi_gen)
 {
struct btrfs_root *extent_root = root-fs_info-extent_root;
struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct 
btrfs_trans_handle
*trans,
btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
goto out;
 
-   if (btrfs_extent_generation(leaf, ei) =
-   btrfs_root_last_snapshot(root-root_item))
+   /*
+* Usually generation in extent item is larger than that in file extent
+* item because of delay refs.  But we don't want balance to break
+* file's nocow behaviour, so use file_extent's generation which has
+* been updates when we update fs root to point to relocated file
+* extents in data reloc root.
+*/
+   fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+   if (fi_gen = btrfs_root_last_snapshot(root-root_item))
goto out;
 
iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:
 
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
- u64 objectid, u64 offset, u64 bytenr)
+ u64 objectid, u64 offset, u64 bytenr, u64 gen)
 {
struct btrfs_path *path;
int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct 

Re: nocow 'C' flag ignored after balance

2013-05-17 Thread Kyle Gates

On Fri, 17 May 2013 15:04:45 +0800, Liu Bo wrote:

On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote:

and mounted with autodefrag
Am I actually just seeing large ranges getting split while remaining
contiguous on disk? This would imply crc calculation on the two
outside ranges. Or perhaps there is some data being inlined for each
write. I believe writes on this file are 32KiB each.
Does the balance produce persistent crc values in the metadata even
though the files are nocow which implies nocrc?
...
I ran this test again and here's filefrag -v after about a day of use:

[...]
As you can see the 32KiB writes fit in the extents of size 9 and 55.
Are those 9 block extents inlined?
If I understand correctly, new extents are created for these nocow
writes, then the old extents are basically hole punched producing
three (four? because of inlining) separate extents.
Something here begs for optimization. Perhaps balance should treat
nocow files a little differently. That would be the time to remove
the extra bits that prevent inplace overwrites. After the fact it
becomes much more difficult, although removing a crc for the extent
being written seems a little easier then iterating over the entire
file.

Thanks for taking the time to read,
Kyle

P.S. I'm CCing David as I believe he wrote the patch to get the 'C'
flag working on empty files and directories.


Hi Kyle,

Can you please apply this patch and see if it helps?

thanks,
liubo


From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

So here we update file extent's generation while walking relocated
file extents in data reloc root, and use file extent's generation
instead for checking if we have cross refs for the file extent.

That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
fs/btrfs/ctree.h   |2 +-
fs/btrfs/extent-tree.c |   18 +-
fs/btrfs/inode.c   |   10 --
fs/btrfs/relocation.c  |1 +
4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct 
btrfs_root *root,

 u64 bytenr, u64 num_bytes);
int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr);
+   u64 objectid, u64 offset, u64 bytenr, u64 gen);
struct btrfs_block_group_cache *btrfs_lookup_block_group(
 struct btrfs_fs_info *info,
 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct btrfs_path *path,
- u64 objectid, u64 offset, u64 bytenr)
+ u64 objectid, u64 offset, u64 bytenr,
+ u64 fi_gen)
{
 struct btrfs_root *extent_root = root-fs_info-extent_root;
 struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct 
btrfs_trans_handle

*trans,
 btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 goto out;

- if (btrfs_extent_generation(leaf, ei) =
- btrfs_root_last_snapshot(root-root_item))
+ /*
+ * Usually generation in extent item is larger than that in file extent
+ * item because of delay refs.  But we don't want balance to break
+ * file's nocow behaviour, so use file_extent's generation which has
+ * been updates when we update fs root to point to relocated file
+ * extents in data reloc root.
+ */
+ fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+ if (fi_gen = btrfs_root_last_snapshot(root-root_item))
 goto out;

 iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:

int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr)
+   u64 objectid, u64 offset, u64 bytenr, u64 gen)
{
 struct btrfs_path *path;
 int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle 
*trans,


 do {
 ret = check_committed_ref(trans, root, path, objectid,
-   offset, bytenr);
+   offset, bytenr, gen);
 if (ret  ret != -ENOENT)
 goto out;

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2cfdd33..976b045 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1727,6 +1727,8 @@ next_slot:
 ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 if (extent_type == BTRFS_FILE_EXTENT_REG ||
 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+ u64 gen;
+ gen = btrfs_file_extent_generation(leaf, fi);
 disk_bytenr = 

Re: nocow 'C' flag ignored after balance

2013-05-16 Thread Kyle Gates

On Fri, May 10, 2013 Liu Bo wrote:

On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:

I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
series kernel so please disregard if this has been fixed in higher
versions. This is on a btrfs RAID1 with 3 then 4 disks.

My use case is to set the nocow 'C' flag on a directory and copy in
some files, then make lots of writes (same file sizes) and note that
the number of extents stays the same, good.
Then run a balance (I added a disk) and start making writes again,
now the number of extents starts climbing, boo.
Is this standard behavior? I realize a balance will cow the files.
Are they also being checksummed thereby breaking the nocow flag?

I have made no snapshots and made no writes to said files while the
balance was running.


Hi Kyle,

It's hard to say if it's standard, it is a side effect casued by balance.

During balance, our reloc root works like a snapshot, so we set
last_snapshot on the fs root, and this makes new nocow writes think that
we have to do cow as the extent is created before taking snapshot.

But the nocow 'C' flag on the file is still there, if you make new
writes on the new extent after balance, you still get the same number of
extents.

thanks,
liubo


Thank you for the explanation.
On my machine this didn't happen however. IIRC one ~10GiB file had 24 
extents before balance, 26 extents after balance, and 1000+ and growing 
when I checked the following day.
I'll add that I am running a relatively recent version of btrfs-tools from 
a ppa.

and mounted with autodefrag
Am I actually just seeing large ranges getting split while remaining 
contiguous on disk? This would imply crc calculation on the two outside 
ranges. Or perhaps there is some data being inlined for each write. I 
believe writes on this file are 32KiB each.
Does the balance produce persistent crc values in the metadata even though 
the files are nocow which implies nocrc?

...
I ran this test again and here's filefrag -v after about a day of use:

Filesystem type is: 9123683e
File size of /blah/blah/file is 10213265920 (2493474 blocks, blocksize 4096)
ext logical physical expected length flags
  0   0 675625629   9
  1   9 675621279 675625638 55
  2  64 674410131 675621334886
  3 950 675558303 674411017  9
  4 959 675583473 675558312 55
  51014 674411081 675583528708
  61722 675456318 674411789  9
  71731 675710934 675456327 55
  81786 674411853 675710989521
  92307 675424433 674412374  9
 102316 675471062 675424442 55
 112371 674412438 675471117984
 123355 676012018 674413422  9
 133364 676024295 676012027 55
 143419 674413486 676024350871
 154290 675681138 674414357  9
 164299 675618500 675681147 55
...
13986 2486955 671627059 675876382627
13987 2487582 675677542 671627686  9
13988 2487591 675700351 675677551 55
13989 2487646 671627750 675700406   1212
13990 2488858 675932037 671628962  9
13991 2488867 675990025 675932046 55
13992 2488922 671629026 675990080220
13993 2489142 675674447 671629246  9
13994 2489151 675687864 675674456 55
13995 2489206 671629310 675687919   1821
13996 2491027 676209288 671631131  9
13997 2491036 676260767 676209297 55
13998 2491091 671631195 676260822285
13999 2491376 675650278 671631480  9
14000 2491385 675678822 675650287 55
14001 2491440 671631544 675678877   1464
14002 2492904 675534255 671633008  9
14003 2492913 675503514 675534264 55
14004 2492968 671633072 675503569506 eof
/blah/blah/file: 14005 extents found

As you can see the 32KiB writes fit in the extents of size 9 and 55. Are 
those 9 block extents inlined?
If I understand correctly, new extents are created for these nocow writes, 
then the old extents are basically hole punched producing three (four? 
because of inlining) separate extents.
Something here begs for optimization. Perhaps balance should treat nocow 
files a little differently. That would be the time to remove the extra bits 
that prevent inplace overwrites. After the fact it becomes much more 
difficult, although removing a crc for the extent being written seems a 
little easier then iterating over the entire file.


Thanks for taking the time to read,
Kyle

P.S. I'm CCing David as I believe he wrote the patch to get the 'C' flag 
working on empty files and directories. 


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: nocow 'C' flag ignored after balance

2013-05-10 Thread Kyle Gates

On Fri, May 10, 2013 Liu Bo wrote:

On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:

I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
series kernel so please disregard if this has been fixed in higher
versions. This is on a btrfs RAID1 with 3 then 4 disks.

My use case is to set the nocow 'C' flag on a directory and copy in
some files, then make lots of writes (same file sizes) and note that
the number of extents stays the same, good.
Then run a balance (I added a disk) and start making writes again,
now the number of extents starts climbing, boo.
Is this standard behavior? I realize a balance will cow the files.
Are they also being checksummed thereby breaking the nocow flag?

I have made no snapshots and made no writes to said files while the
balance was running.


Hi Kyle,

It's hard to say if it's standard, it is a side effect casued by balance.

During balance, our reloc root works like a snapshot, so we set
last_snapshot on the fs root, and this makes new nocow writes think that
we have to do cow as the extent is created before taking snapshot.

But the nocow 'C' flag on the file is still there, if you make new
writes on the new extent after balance, you still get the same number of
extents.

thanks,
liubo


Thank you for the explanation.
On my machine this didn't happen however. IIRC one 10GiB file had 24 extents 
before balance, 26 extents after balance, and 1000+ and growing when I 
checked the following day.
I'll add that I am running a relatively recent version of btrfs-tools from a 
ppa. 


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


nocow 'C' flag ignored after balance

2013-05-09 Thread Kyle Gates
I'll preface that I'm running Ubuntu 13.04 with the standard 3.8 series 
kernel so please disregard if this has been fixed in higher versions. This 
is on a btrfs RAID1 with 3 then 4 disks.


My use case is to set the nocow 'C' flag on a directory and copy in some 
files, then make lots of writes (same file sizes) and note that the number 
of extents stays the same, good.
Then run a balance (I added a disk) and start making writes again, now the 
number of extents starts climbing, boo.
Is this standard behavior? I realize a balance will cow the files. Are they 
also being checksummed thereby breaking the nocow flag?


I have made no snapshots and made no writes to said files while the balance 
was running.


Thanks,
Kyle 


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: nocow 'C' flag ignored after balance

2013-05-09 Thread Liu Bo
On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:
 I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
 series kernel so please disregard if this has been fixed in higher
 versions. This is on a btrfs RAID1 with 3 then 4 disks.
 
 My use case is to set the nocow 'C' flag on a directory and copy in
 some files, then make lots of writes (same file sizes) and note that
 the number of extents stays the same, good.
 Then run a balance (I added a disk) and start making writes again,
 now the number of extents starts climbing, boo.
 Is this standard behavior? I realize a balance will cow the files.
 Are they also being checksummed thereby breaking the nocow flag?
 
 I have made no snapshots and made no writes to said files while the
 balance was running.

Hi Kyle,

It's hard to say if it's standard, it is a side effect casued by balance.

During balance, our reloc root works like a snapshot, so we set
last_snapshot on the fs root, and this makes new nocow writes think that
we have to do cow as the extent is created before taking snapshot.

But the nocow 'C' flag on the file is still there, if you make new
writes on the new extent after balance, you still get the same number of
extents.

thanks,
liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html