Please help me to contribute to btrfs project

2014-03-18 Thread Ajesh js
Hi,

I have used the btrfs filesystem in one of my projects and I have
added a small feature to it. I feel that the same feature will be
useful for others too. Hence I would like to contribute the same to
open source.

If everything works fine and this feature has not already been added by
somebody else, this will be my first contribution to open source.
I am excited to join the huge open source family :)

Please help me with the precise steps to do the same.

Thank you,
Ajesh
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6 EARLY RFC] Btrfs: Get rid of whole page I/O.

2014-03-18 Thread chandan
Hello David,

> I looked at previous postings of this patchset, but haven't found what
> are the expected supported block sizes.
>
> I assume powers of two starting with 512b, until 64k.

The earlier patchset posted by Chandra Seetharaman was to get 4k
blocksize to work with ppc64's 64k PAGE_SIZE. I chose to do 2k
blocksize on x86_64's 4k PAGE_SIZE since that would allow others in
the community to work/experiment with the subpagesize-blocksize feature.

The root node of the tree root tree, as written by make_btrfs() (in
btrfs-progs), takes 1957 bytes, which is too large for a 1k block.
Hence I chose to do 2k blocksize for the initial subpagesize-blocksize
work. So with this patchset the supported blocksizes would be in the
range 2k-64k.
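
To make the resulting range concrete, here is a minimal sketch in C of a
check that accepts the block sizes described above. It assumes block sizes
are powers of two (as the quoted question supposes). The names
is_supported_blocksize, MIN_BLOCKSIZE and MAX_BLOCKSIZE are hypothetical and
do not come from the patchset.

```c
#include <stdbool.h>

/* Hypothetical bounds taken from the message above: 2k up to 64k. */
#define MIN_BLOCKSIZE 2048u
#define MAX_BLOCKSIZE 65536u

/* Return true if size is a power of two within [2k, 64k]. */
bool is_supported_blocksize(unsigned int size)
{
	if (size < MIN_BLOCKSIZE || size > MAX_BLOCKSIZE)
		return false;
	/* A power of two has exactly one bit set. */
	return (size & (size - 1)) == 0;
}
```

Under these assumptions 2048, 4096 and so on up to 65536 pass, while 1024
(too small to hold the 1957-byte root node) and 3000 (not a power of two)
are rejected.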

Thanks,
chandan



Re: How to handle a RAID5 array with a failing drive?

2014-03-18 Thread Duncan
Marc MERLIN posted on Sun, 16 Mar 2014 15:20:26 -0700 as excerpted:

> Do I have other options?
> (data is not important at all, I just want to learn how to deal with
> such a case with the current code)

First just a note that you hijacked Mr Manana's patch thread.  Replying 
to a post and changing the topic (the usual cause of such hijacks) does 
NOT change the thread, as the References and In-Reply-To headers still 
include the Message-IDs from the original thread, and that's what good 
clients thread by, since the subject line isn't a reliable means of 
threading.  To start a NEW thread, don't reply to an existing thread; 
compose a NEW message instead. =:^)
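
The point about headers can be illustrated with a small sketch in C. It is
a simplified model, not how any particular mail client is implemented: it
follows only In-Reply-To (real clients also consult References), and the
struct mail type and the thread_root function are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal view of a message for threading purposes. */
struct mail {
	const char *message_id;
	const char *in_reply_to;	/* NULL for a message that starts a thread */
	const char *subject;		/* deliberately ignored when threading */
};

static const struct mail *find_mail(const struct mail *box, size_t n,
				    const char *id)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(box[i].message_id, id) == 0)
			return &box[i];
	return NULL;
}

/*
 * Follow In-Reply-To links upwards and return the Message-ID of the
 * thread root, or NULL if the starting message is unknown.
 */
const char *thread_root(const struct mail *box, size_t n, const char *id)
{
	const struct mail *m = find_mail(box, n, id);
	size_t hops = 0;

	/* The hop limit guards against cyclic headers. */
	while (m && m->in_reply_to && hops++ < n) {
		const struct mail *parent = find_mail(box, n, m->in_reply_to);

		if (!parent)
			return m->in_reply_to;	/* parent not in this mailbox */
		m = parent;
	}
	return m ? m->message_id : NULL;
}
```

In this model a reply with a rewritten subject still resolves to the root of
the thread it replied to, while a freshly composed message with the same
subject is its own root.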

Back on topic...

Since you don't have to worry about the data I'd suggest blowing it away 
and starting over.  Btrfs raid5/6 code is known to be incomplete at this 
point, to work in normal mode and write everything out, but with 
incomplete recovery code.  So I'd treat it like the raid-0 mode it 
effectively is, and consider it lost if a device drops.

There *IS* a post from an earlier thread where someone mentioned a 
recovery under some specific circumstance that worked for him, but I'd 
consider that the exception not the norm since the code is known to be 
incomplete and I think he just got lucky and didn't hit the particular 
missing code in his specific case.  Certainly you could try to go back 
and see what he did and under what conditions, and that might actually be 
worth doing if you had valuable data you'd be losing otherwise, but since 
you don't, while of course it's up to you, I'd not bother were it me.

Which I haven't.  My use-case wouldn't be looking at raid5/6 (or raid0) 
anyway, but even if it were, I'd not touch the current code unless it 
/was/ just for something I'd consider risking on a raid0.  Other than 
pure testing, the /only/ case I'd consider btrfs raid5/6 for right now 
is data I'd currently consider raid0-riskable, with the bonus that it 
upgrades for free to raid5/6 once the code is complete, without any 
further effort on my part.  It's actually being written as raid5/6 ATM; 
the recovery simply can't be relied upon as raid5/6, so in recovery 
terms you're effectively running raid0 until it can be.  Other than 
that and /pure/ testing, I just don't see the point of even thinking 
about raid5/6 at this point.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [PATCH] Btrfs: fix a crash of clone with inline extents's split

2014-03-18 Thread Liu Bo
On Mon, Mar 17, 2014 at 03:41:31PM +0100, David Sterba wrote:
> On Mon, Mar 10, 2014 at 06:56:07PM +0800, Liu Bo wrote:
> > xfstests's btrfs/035 triggers a BUG_ON, which we use to detect the split
> > of inline extents in __btrfs_drop_extents().
> >
> > For inline extents, we cannot duplicate another EXTENT_DATA item, because
> > it breaks the rule of inline extents, that is, 'start offset' needs to be 0.
> >
> > We have set limitations for the source inode's compressed inline extents,
> > because it needs to decompress and recompress.  Now the destination inode's
> > inline extents also need similar limitations.
>
> The limitation (by lack of implementation, not by design) of compressed
> inline extents is there, but it's impossible to reach. The inline
> extents are never longer than the 'inline limit' (the ~3916 size), so
> the comment is more a note to the future.
>
> You're adding another limitation to avoid a crash, but I don't agree
> that EINVAL is right here, due to the fact that it's lack of
> implementation, not a real error.
>
> There are enough EINVAL's that verify correctness of the input
> parameters and it's not always clear which one fails. The EOPNOTSUPP
> error code is close to the true reason of the failure, but it could be
> misinterpreted as if the whole clone operation is not supported, so it's
> not all correct but IMO better than EINVAL.

Yep, I was hesitating on these two errors while making the patch, but I
prefer EINVAL rather than EOPNOTSUPP because of the reason you've stated.

I think it'd be good to add one more btrfs_printk message to clarify what's
happening here, agree?

 
> The most common case of 'cp --reflink' is not affected by this.
>
> >
> > With this, xfstests btrfs/035 doesn't run into panic.
> >
> > Signed-off-by: Liu Bo <bo.li@oracle.com>
> > ---
> >  fs/btrfs/file.c  | 15 ---
> >  fs/btrfs/ioctl.c | 10 ++
> >  2 files changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 0165b86..2c34a04 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3090,8 +3090,9 @@ process_slot:
> >  new_key.offset + datal,
> >  1);
> > if (ret) {
> > -   btrfs_abort_transaction(trans, root,
> > -   ret);
> > +   if (ret != -EINVAL)
> > +   btrfs_abort_transaction(trans,
> > +   root, ret);
>
> The error comes from __btrfs_drop_extents and all callers would need to
> be updated (or at least reviewed) with the 'ret != ...' check as well,
> because it changes the semantics. And I'm not sure it is in the right
> direction.

Good point, Dave. I missed this part before. I just checked the callers of
__btrfs_drop_extents() and btrfs_drop_extents(); luckily EINVAL is not treated
specially at any of these places, the error is just returned to upper callers.

 
> > btrfs_end_transaction(trans, root);
> > goto out;
> > }
> > @@ -3175,8 +3176,9 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> >  *   decompress into destination's address_space (the file offset
> >  *   may change, so source mapping won't do), then recompress (or
> >  *   otherwise reinsert) a subrange.
>
> > -* - allow ranges within the same file to be cloned (provided
> > -*   they don't overlap)?
>
> True, but unrelated.

yep, that's right, will clean it up.

Thanks for the comments!

-liubo


[PATCH 3/6] Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree

2014-03-18 Thread Wang Shilong
Previously, we dealt with node blocks first and then leaf blocks, which
maximizes readahead. However, to rebuild the extent tree, we need to deal
with snapshots one by one.

This patch makes us deal with snapshots one by one when we need to rebuild
the extent tree; otherwise we fall back to the previous way.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-check.c | 248 +--
 1 file changed, 158 insertions(+), 90 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index b3f7e22..e40b806 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -123,10 +123,14 @@ struct inode_backref {
char name[0];
 };
 
-struct dropping_root_item_record {
+struct root_item_record {
struct list_head list;
-   struct btrfs_root_item ri;
-   struct btrfs_key found_key;
+   u64 objectid;
+   u64 bytenr;
+   u8 level;
+   u8 drop_level;
+   int level_size;
+   struct btrfs_key drop_key;
 };
 
 #define REF_ERR_NO_DIR_ITEM		(1 << 0)
@@ -3839,7 +3843,7 @@ static int run_next_block(struct btrfs_trans_handle *trans,
  struct rb_root *dev_cache,
  struct block_group_tree *block_group_cache,
  struct device_extent_tree *dev_extent_cache,
- struct btrfs_root_item *ri)
+ struct root_item_record *ri)
 {
struct extent_buffer *buf;
u64 bytenr;
@@ -4072,11 +4076,8 @@ static int run_next_block(struct btrfs_trans_handle *trans,
size = btrfs_level_size(root, level - 1);
btrfs_node_key_to_cpu(buf, &key, i);
if (ri != NULL) {
-   struct btrfs_key drop_key;
-   btrfs_disk_key_to_cpu(&drop_key,
- &ri->drop_progress);
if ((level == ri->drop_level)
-    && is_dropped_key(&key, &drop_key)) {
+    && is_dropped_key(&key, &ri->drop_key)) {
continue;
}
}
@@ -4117,7 +4118,7 @@ static int add_root_to_pending(struct extent_buffer *buf,
   struct cache_tree *pending,
   struct cache_tree *seen,
   struct cache_tree *nodes,
-  struct btrfs_key *root_key)
+  u64 objectid)
 {
if (btrfs_header_level(buf) > 0)
add_pending(nodes, seen, buf->start, buf->len);
@@ -4126,13 +4127,12 @@ static int add_root_to_pending(struct extent_buffer *buf,
add_extent_rec(extent_cache, NULL, 0, buf->start, buf->len,
   0, 1, 1, 0, 1, 0, buf->len);
 
-   if (root_key->objectid == BTRFS_TREE_RELOC_OBJECTID ||
+   if (objectid == BTRFS_TREE_RELOC_OBJECTID ||
btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
add_tree_backref(extent_cache, buf->start, buf->start,
 0, 1);
else
-   add_tree_backref(extent_cache, buf->start, 0,
-root_key->objectid, 1);
+   add_tree_backref(extent_cache, buf->start, 0, objectid, 1);
return 0;
 }
 
@@ -5695,6 +5695,99 @@ static int check_devices(struct rb_root *dev_cache,
return ret;
 }
 
+static int add_root_item_to_list(struct list_head *head,
+ u64 objectid, u64 bytenr,
+ u8 level, u8 drop_level,
+ int level_size, struct btrfs_key *drop_key)
+{
+
+   struct root_item_record *ri_rec;
+   ri_rec = malloc(sizeof(*ri_rec));
+   if (!ri_rec)
+   return -ENOMEM;
+   ri_rec->bytenr = bytenr;
+   ri_rec->objectid = objectid;
+   ri_rec->level = level;
+   ri_rec->level_size = level_size;
+   ri_rec->drop_level = drop_level;
+   if (drop_key)
+   memcpy(&ri_rec->drop_key, drop_key, sizeof(*drop_key));
+   list_add_tail(&ri_rec->list, head);
+
+   return 0;
+}
+
+static int deal_root_from_list(struct list_head *list,
+  struct btrfs_trans_handle *trans,
+  struct btrfs_root *root,
+  struct block_info *bits,
+  int bits_nr,
+  struct cache_tree *pending,
+  struct cache_tree *seen,
+  struct cache_tree *reada,
+  struct cache_tree *nodes,
+  struct cache_tree *extent_cache,
+  struct cache_tree *chunk_cache,
+  struct rb_root *dev_cache,
+  struct block_group_tree 

[PATCH 1/6] Btrfs-progs: fsck: don't free @seen cache until we finish searching

2014-03-18 Thread Wang Shilong
The @seen cache is used to avoid iterating over the same block more than
once, so we cannot free its entries until we have finished searching.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-check.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index d1cafe1..c0b7f8c 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -3892,12 +3892,6 @@ static int run_next_block(struct btrfs_trans_handle *trans,
remove_cache_extent(nodes, cache);
free(cache);
}
-   cache = lookup_cache_extent(seen, bytenr, size);
-   if (cache) {
-   remove_cache_extent(seen, cache);
-   free(cache);
-   }
-
cache = lookup_cache_extent(extent_cache, bytenr, size);
if (cache) {
struct extent_record *rec;
@@ -5914,6 +5908,7 @@ out:
free_device_cache_tree(dev_cache);
free_block_group_tree(block_group_cache);
free_device_extent_tree(dev_extent_cache);
+   free_extent_cache_tree(seen);
return ret;
 }
 
-- 
1.9.0
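
Why the @seen entries have to outlive the processing of a block can be
shown with a toy model in C. This is only a sketch under stated assumptions:
the four-block graph, the fixed-size stack and the count_visits function are
invented for illustration, whereas the real code keeps a cache_tree keyed by
byte number. Block 3 is shared by blocks 1 and 2, as a tree block can be
shared between snapshots.

```c
#include <stdbool.h>

#define NBLOCKS 4

/* children[i] lists the blocks referenced by block i; -1 terminates. */
static const int children[NBLOCKS][3] = {
	{ 1, 2, -1 },	/* block 0 points to blocks 1 and 2 */
	{ 3, -1, -1 },	/* block 1 points to block 3 */
	{ 3, -1, -1 },	/* block 2 points to the same block 3 */
	{ -1, -1, -1 },	/* block 3 is a leaf */
};

/*
 * Walk the graph from block 0 and count how many blocks get processed.
 * With forget_after_processing set, a block is dropped from the seen
 * set as soon as it is processed, which models the old behaviour.
 */
int count_visits(bool forget_after_processing)
{
	bool seen[NBLOCKS] = { false };
	int pending[16];
	int top = 0;
	int visits = 0;

	pending[top++] = 0;
	seen[0] = true;
	while (top > 0) {
		int cur = pending[--top];

		visits++;
		if (forget_after_processing)
			seen[cur] = false;
		for (int i = 0; children[cur][i] >= 0; i++) {
			int child = children[cur][i];

			if (seen[child])
				continue;
			seen[child] = true;
			pending[top++] = child;
		}
	}
	return visits;
}
```

In this model, keeping the seen set intact processes each of the four blocks
once, while forgetting entries early makes the shared block get queued and
processed a second time.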



[PATCH 5/6] Btrfs-progs: fsck: reduce memory usage of extent record struct

2014-03-18 Thread Wang Shilong
Two changes:
1. Use a bit field for @found_rec.
2. u32 is enough to count duplicate extents.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-check.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index e1238d7..34f8fa6 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -92,7 +92,6 @@ struct extent_record {
struct list_head list;
struct cache_extent cache;
struct btrfs_disk_key parent_key;
-   unsigned int found_rec;
u64 start;
u64 max_size;
u64 nr;
@@ -101,8 +100,9 @@ struct extent_record {
u64 generation;
u64 parent_generation;
u64 info_objectid;
-   u64 num_duplicates;
+   u32 num_duplicates;
u8 info_level;
+   unsigned int found_rec:1;
unsigned int content_checked:1;
unsigned int owner_ref_checked:1;
unsigned int is_root:1;
@@ -2742,7 +2742,10 @@ static int add_extent_rec(struct cache_tree *extent_cache,
rec->start = start;
rec->max_size = max_size;
rec->nr = max(nr, max_size);
-   rec->found_rec = extent_rec;
+   if (extent_rec)
+   rec->found_rec = 1;
+   else
+   rec->found_rec = 0;
rec->content_checked = 0;
rec->owner_ref_checked = 0;
rec->num_duplicates = 0;
-- 
1.9.0
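
The effect of the two changes on struct size can be sketched in C with a
cut-down pair of structs. These are not the real extent_record definitions:
only the fields touched by the patch plus a few neighbours are modelled, the
names rec_before and rec_after are hypothetical, and the exact sizes depend
on the compiler and ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down layout before the patch: full-width flag and 64-bit counter. */
struct rec_before {
	unsigned int found_rec;
	uint64_t start;
	uint64_t num_duplicates;
	uint8_t info_level;
	unsigned int content_checked:1;
	unsigned int owner_ref_checked:1;
	unsigned int is_root:1;
	unsigned int metadata:1;
};

/* Cut-down layout after the patch: flag folded into the bit fields. */
struct rec_after {
	uint64_t start;
	uint32_t num_duplicates;
	uint8_t info_level;
	unsigned int found_rec:1;
	unsigned int content_checked:1;
	unsigned int owner_ref_checked:1;
	unsigned int is_root:1;
	unsigned int metadata:1;
};

size_t rec_size_before(void)
{
	return sizeof(struct rec_before);
}

size_t rec_size_after(void)
{
	return sizeof(struct rec_after);
}
```

On common 64-bit ABIs the second layout comes out smaller, because the
standalone unsigned int and its alignment padding disappear and the counter
shrinks from eight bytes to four. Since fsck allocates one record per
extent, the saving is multiplied by the number of extents.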



[PATCH 6/6] Btrfs-progs: fsck: fix wrong index in pick_next_pending()

2014-03-18 Thread Wang Shilong
Though all tree blocks have the same size, we had better use the right
index here.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 34f8fa6..ebdb643 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2928,7 +2928,7 @@ static int pick_next_pending(struct cache_tree *pending,
cache = search_cache_extent(reada, 0);
if (cache) {
bits[0].start = cache->start;
-   bits[1].size = cache->size;
+   bits[0].size = cache->size;
*reada_bits = 1;
return 1;
}
-- 
1.9.0



[PATCH 2/6] Btrfs-progs: fsck: fix possible memory leaks in run_next_block()

2014-03-18 Thread Wang Shilong
We still need to free the allocated cache memory if an error happens.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-check.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index c0b7f8c..b3f7e22 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5909,6 +5909,9 @@ out:
free_block_group_tree(block_group_cache);
free_device_extent_tree(dev_extent_cache);
free_extent_cache_tree(seen);
+   free_extent_cache_tree(pending);
+   free_extent_cache_tree(reada);
+   free_extent_cache_tree(nodes);
return ret;
 }
 
-- 
1.9.0



[PATCH 4/6] Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots

2014-03-18 Thread Wang Shilong
This patch lets us rebuild a badly corrupted extent tree with snapshots.
To implement this, we have to verify whether a block is FULL BACKREF.

This idea comes from Josef Bacik:

1) We walk down the original tree, every eb we encounter has
btrfs_header_owner(eb) == root->objectid.  We add normal references
(BTRFS_TREE_BLOCK_REF_KEY) for this root.  World peace
is achieved.

2) We walk down the snapshotted tree.  Say we didn't change anything
at all, it was just a clean snapshot and then boom.  So the
btrfs_header_owner(root->node) == root->objectid, so normal backref.
We walk down to the next level, where btrfs_header_owner(eb) !=
root->objectid, but the level above did, so we add normal refs for all
of these blocks.  We go down the next level, now our
btrfs_header_owner(parent) != root->objectid and
btrfs_header_owner(eb) != root->objectid.  This is where we need to
now go back and see if btrfs_header_owner(eb) currently has a ref on
eb.  If it does we are done, move on to the next block in this same
level, we don't have to go further down.

3) Harder case, we snapshotted and then changed things in the original
root.  Do the same thing as in step 2, but now we get down to
btrfs_header_owner(eb) != root->objectid && btrfs_header_owner(parent)
!= root->objectid.  We lookup the references we have for eb and notice
that btrfs_header_owner(eb) no longer refers to eb.  So now we must
set FULL_BACKREF on this extent reference and add a
SHARED_BLOCK_REF_KEY for this eb using the parent->start as the
offset.  And we need to keep walking down and doing the same thing
until we either hit level 0 or btrfs_header_owner(eb) has a ref on the
block.
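
The three cases above boil down to a small decision per tree block. The C
sketch below is only a restatement of that description, not the code added
by this patch: the enum, the classify_block function and its parameters are
hypothetical, and looking up whether the owner still holds a reference is
left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of examining one extent buffer (eb). */
enum walk_action {
	ADD_NORMAL_REF,		/* cases 1 and 2: normal tree block ref */
	STOP_DESCENT,		/* owner still references eb: skip subtree */
	SET_FULL_BACKREF,	/* case 3: shared ref keyed by parent start */
};

/*
 * root_objectid:        root whose tree is being walked
 * eb_owner:             header owner of the block being examined
 * parent_owner:         header owner of the block one level up
 * owner_has_ref_on_eb:  whether eb_owner's root currently references eb
 */
enum walk_action classify_block(uint64_t root_objectid,
				uint64_t eb_owner,
				uint64_t parent_owner,
				bool owner_has_ref_on_eb)
{
	/* Case 1: the block belongs to the root being walked. */
	if (eb_owner == root_objectid)
		return ADD_NORMAL_REF;
	/* Case 2, first shared level: the parent still belongs to us. */
	if (parent_owner == root_objectid)
		return ADD_NORMAL_REF;
	/* Deeper shared level: nothing to do if the owner still has a ref. */
	if (owner_has_ref_on_eb)
		return STOP_DESCENT;
	/* Case 3: the original root no longer refers to this block. */
	return SET_FULL_BACKREF;
}
```

A caller following the description would keep walking down after
SET_FULL_BACKREF until it reaches level 0 or gets STOP_DESCENT.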

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-check.c | 132 +--
 1 file changed, 129 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index e40b806..e1238d7 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -107,6 +107,7 @@ struct extent_record {
unsigned int owner_ref_checked:1;
unsigned int is_root:1;
unsigned int metadata:1;
+   unsigned int flag_block_full_backref:1;
 };
 
 struct inode_backref {
@@ -3829,6 +3830,127 @@ static int is_dropped_key(struct btrfs_key *key,
return 0;
 }
 
+static int calc_extent_flag(struct btrfs_root *root,
+  struct cache_tree *extent_cache,
+  struct extent_buffer *buf,
+  struct root_item_record *ri,
+  u64 *flags)
+{
+   int i;
+   int nritems = btrfs_header_nritems(buf);
+   struct btrfs_key key;
+   struct extent_record *rec;
+   struct cache_extent *cache;
+   struct data_backref *dback;
+   struct tree_backref *tback;
+   struct extent_buffer *new_buf;
+   u64 owner = 0;
+   u64 bytenr;
+   u64 offset;
+   u64 ptr;
+   int size;
+   int ret;
+   u8 level;
+
+   /*
+* Except file/reloc tree, we can not have
+* FULL BACKREF MODE
+*/
+   if (ri->objectid < BTRFS_FIRST_FREE_OBJECTID)
+   goto normal;
+   /*
+* root node
+*/
+   if (buf->start == ri->bytenr)
+   goto normal;
+   if (btrfs_is_leaf(buf)) {
+   /*
+* we are searching from original root, world
+* peace is achieved, we use normal backref.
+*/
+   owner = btrfs_header_owner(buf);
+   if (owner == ri->objectid)
+   goto normal;
+   /*
+* we check every eb here, and if any of
+* eb doesn't have original root refers
+* to this eb, we set full backref flag for
+* this extent, otherwise normal backref.
+*/
+   for (i = 0; i < nritems; i++) {
+   struct btrfs_file_extent_item *fi;
+   btrfs_item_key_to_cpu(buf, &key, i);
+
+   if (key.type != BTRFS_EXTENT_DATA_KEY)
+   continue;
+   fi = btrfs_item_ptr(buf, i,
+   struct btrfs_file_extent_item);
+   if (btrfs_file_extent_type(buf, fi) ==
+   BTRFS_FILE_EXTENT_INLINE)
+   continue;
+   if (btrfs_file_extent_disk_bytenr(buf, fi) == 0)
+   continue;
+   bytenr = btrfs_file_extent_disk_bytenr(buf, fi);
+   cache = lookup_cache_extent(extent_cache, bytenr, 1);
+   if (!cache)
+   goto full_backref;
+   offset = btrfs_file_extent_offset(buf, fi);
+   rec = container_of(cache, struct extent_record, cache);
+   dback = find_data_backref(rec, 0, ri->objectid, owner,
+  

Re: Please help me to contribute to btrfs project

2014-03-18 Thread Ben Gamari
Ajesh js <coolajes...@gmail.com> writes:

> Hi,
>
> I have used the btrfs filesystem in one of my projects and I have
> added a small feature to it. I feel that the same feature will be
> useful for others too. Hence I would like to contribute the same to
> open source.

Excellent!

> If everything works fine and this feature has not already been added by
> somebody else, this will be my first contribution to open source.
> I am excited to join the huge open source family :)
>
> Please help me with the precise steps to do the same.

In general the way to contribute is to send a patch for review. You
should have a look at the code style guidelines[1] and patch submission
guidelines[2] in the kernel tree. For nontrivial changes the patch
should be accompanied by a cover letter describing the change and the
motivations for any non-obvious design decisions.

It is possible that your change is acceptable as-is. More likely,
however, is that there will be some discussion and requests for
changes. Eventually the review process will produce a merge-worthy
patch. The first step, however, is sending something concrete for
community review.

Cheers,

- Ben


[1] 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/CodingStyle
[2] 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches





[PATCH v3] Btrfs: part 2, fix incremental send's decision to delay a dir move/rename

2014-03-18 Thread Filipe David Borba Manana
For an incremental send, fix the process of determining whether the directory
inode we're currently processing needs to have its move/rename operation
delayed.

We were ignoring the fact that if the inode's new immediate ancestor has a
higher inode number than ours but wasn't renamed/moved, we might still need
to delay our move/rename, because some other ancestor directory higher in the
hierarchy might have an inode number higher than ours *and* was renamed/moved
too - in this case we have to wait for rename/move of that ancestor to happen
before our current directory's rename/move operation.

Simple steps to reproduce this issue:

  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdd /mnt

  $ mkdir -p /mnt/a/x1/x2
  $ mkdir /mnt/a/Z
  $ mkdir -p /mnt/a/x1/x2/x3/x4/x5

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1
  $ btrfs send /mnt/snap1 -f /tmp/base.send

  $ mv /mnt/a/x1/x2/x3 /mnt/a/Z/X33
  $ mv /mnt/a/x1/x2 /mnt/a/Z/X33/x4/x5/X22

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send

The incremental send caused the kernel code to enter an infinite loop when
building the path string for directory Z after its references are processed.
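
The rule described above can be modelled with a short C sketch. It is a
simplification for illustration only and not the logic in fs/btrfs/send.c:
the struct dir_state type and the must_delay_move function are hypothetical,
and a directory counts as moved when its parent or name differs between the
two snapshots. A parent_before of 0 marks a directory that did not exist in
the parent snapshot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-directory view of the two snapshots. */
struct dir_state {
	unsigned long long ino;
	unsigned long long parent_before;	/* 0 if absent in parent snapshot */
	unsigned long long parent_after;
	const char *name_before;
	const char *name_after;
};

static const struct dir_state *lookup(const struct dir_state *dirs, size_t n,
				      unsigned long long ino)
{
	for (size_t i = 0; i < n; i++)
		if (dirs[i].ino == ino)
			return &dirs[i];
	return NULL;
}

static bool was_moved(const struct dir_state *d)
{
	return d->parent_before != d->parent_after ||
	       strcmp(d->name_before, d->name_after) != 0;
}

/*
 * Walk up the new ancestors of cur_ino for as long as they have a higher
 * inode number. If any of them was moved or renamed, the move of cur_ino
 * has to wait for that ancestor.
 */
bool must_delay_move(const struct dir_state *dirs, size_t n,
		     unsigned long long cur_ino)
{
	const struct dir_state *cur = lookup(dirs, n, cur_ino);
	unsigned long long ino;
	size_t hops = 0;

	if (!cur)
		return false;
	ino = cur->parent_after;
	while (ino > cur_ino && hops++ < n) {
		const struct dir_state *anc = lookup(dirs, n, ino);

		if (!anc || anc->parent_before == 0)
			break;	/* ancestor is new in this snapshot */
		if (was_moved(anc))
			return true;
		ino = anc->parent_after;
	}
	return false;
}
```

Applied to the first reproducer, and assuming inode numbers are handed out
sequentially from 257 (a = 257, x1 = 258, x2 = 259, Z = 260, x3 = 261,
x4 = 262, x5 = 263), the move of x2 is delayed: its new parent x5 and
grandparent x4 were not moved, but x3 further up has a higher inode number
and was moved to Z/X33.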

A more complex scenario:

  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdd /mnt

  $ mkdir -p /mnt/a/b/c/d
  $ mkdir /mnt/a/b/c/d/e
  $ mkdir /mnt/a/b/c/d/f
  $ mv /mnt/a/b/c/d/e /mnt/a/b/c/d/f/E2
  $ mkdir /mnt/a/b/c/g
  $ mv /mnt/a/b/c/d /mnt/a/b/D2

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1
  $ btrfs send /mnt/snap1 -f /tmp/base.send

  $ mkdir /mnt/a/o
  $ mv /mnt/a/b/c/g /mnt/a/b/D2/f/G2
  $ mv /mnt/a/b/D2 /mnt/a/b/dd
  $ mv /mnt/a/b/c /mnt/a/C2
  $ mv /mnt/a/b/dd/f /mnt/a/o/FF
  $ mv /mnt/a/b /mnt/a/o/FF/E2/BB

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send

A test case for xfstests follows.

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---

V2: Added missing error handling and fixed typo in commit message.
V3: Updated the algorithm to deal with more complex cases, hopefully all
cases are nailed down now.

 fs/btrfs/send.c |   56 ---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d869079..5d757ee 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2916,7 +2916,7 @@ static void free_waiting_dir_move(struct send_ctx *sctx,
kfree(dm);
 }
 
-static int add_pending_dir_move(struct send_ctx *sctx, u64 parent_ino)
+static int add_pending_dir_move(struct send_ctx *sctx, u64 ino, u64 parent_ino)
 {
struct rb_node **p = &sctx->pending_dir_moves.rb_node;
struct rb_node *parent = NULL;
@@ -2929,7 +2929,7 @@ static int add_pending_dir_move(struct send_ctx *sctx, u64 parent_ino)
if (!pm)
return -ENOMEM;
pm->parent_ino = parent_ino;
-   pm->ino = sctx->cur_ino;
+   pm->ino = ino;
pm->gen = sctx->cur_inode_gen;
INIT_LIST_HEAD(&pm->list);
INIT_LIST_HEAD(&pm->update_refs);
@@ -3183,6 +3183,7 @@ static int wait_for_parent_move(struct send_ctx *sctx,
struct fs_path *path_before = NULL;
struct fs_path *path_after = NULL;
int len1, len2;
+   int register_upper_dirs;
 
if (is_waiting_for_move(sctx, ino))
return 1;
@@ -3242,6 +3243,54 @@ static int wait_for_parent_move(struct send_ctx *sctx,
}
ret = 0;
 
+   /*
+* Ok, our new most direct ancestor has a higher inode number but
+* wasn't moved/renamed. So maybe some of the new ancestors higher in
+* the hierarchy have a higher inode number too *and* were renamed
+* or moved - in this case we need to wait for the ancestor's rename
+* or move operation before we can do the move/rename for the current
+* inode.
+*/
+   register_upper_dirs = 0;
+again:
+   while ((ret == 0 || register_upper_dirs) &&
+  parent_ino_after > sctx->cur_ino) {
+   ino = parent_ino_after;
+   fs_path_reset(path_before);
+   fs_path_reset(path_after);
+
+   ret = get_first_ref(sctx->send_root, ino, &parent_ino_after,
+   NULL, path_after);
+   if (ret < 0)
+   goto out;
+   ret = get_first_ref(sctx->parent_root, ino, &parent_ino_before,
+   NULL, path_before);
+   if (ret == -ENOENT) {
+   ret = 0;
+   break;
+   } else if (ret < 0) {
+   goto out;
+   }
+
+   len1 = fs_path_len(path_before);
+   len2 = fs_path_len(path_after);
+   if (parent_ino_before != parent_ino_after || len1 != len2 ||
+   

[PATCH] Btrfs: remove unnecessary inode generation lookup in send

2014-03-18 Thread Filipe David Borba Manana
No need to search in the send tree for the generation number of the inode,
we already have it in the recorded_ref structure passed to us.

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---
 fs/btrfs/send.c |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 5d757ee..db4b10c 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3179,7 +3179,7 @@ static int wait_for_parent_move(struct send_ctx *sctx,
int ret;
u64 ino = parent_ref->dir;
u64 parent_ino_before, parent_ino_after;
-   u64 new_gen, old_gen;
+   u64 old_gen;
struct fs_path *path_before = NULL;
struct fs_path *path_after = NULL;
int len1, len2;
@@ -3198,12 +3198,7 @@ static int wait_for_parent_move(struct send_ctx *sctx,
else if (ret < 0)
return ret;
 
-   ret = get_inode_info(sctx->send_root, ino, NULL, &new_gen,
-NULL, NULL, NULL, NULL);
-   if (ret < 0)
-   return ret;
-
-   if (new_gen != old_gen)
+   if (parent_ref->dir_gen != old_gen)
return 0;
 
path_before = fs_path_alloc();
-- 
1.7.10.4



[PATCH v3] xfstests: add test for btrfs send regarding directory moves/renames

2014-03-18 Thread Filipe David Borba Manana
Regression test for a btrfs incremental send issue where the kernel entered
an infinite loop building a path string. This happened when either of the 2
following cases happened:

1) A directory was made a child of another directory which has a lower inode
   number and has a pending move/rename operation;

2) A directory was made a child of another directory which has a higher inode
   number, but the new parent wasn't moved nor renamed. Instead some other
   ancestor higher in the hierarchy, with a higher inode number too, was
   moved/renamed.

This issue is fixed by the following linux kernel btrfs patches:

   Btrfs: fix incremental send's decision to delay a dir move/rename
   Btrfs: part 2, fix incremental send's decision to delay a dir move/rename

Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
---

V2: Added more tests.
V3: Added more tests for more complex cases.

 tests/btrfs/045 |  214 +++
 tests/btrfs/045.out |1 +
 tests/btrfs/group   |1 +
 3 files changed, 216 insertions(+)
 create mode 100755 tests/btrfs/045
 create mode 100644 tests/btrfs/045.out

diff --git a/tests/btrfs/045 b/tests/btrfs/045
new file mode 100755
index 0000000..85201e3
--- /dev/null
+++ b/tests/btrfs/045
@@ -0,0 +1,214 @@
+#! /bin/bash
+# FS QA Test No. btrfs/045
+#
+# Regression test for a btrfs incremental send issue where the kernel entered
+# an infinite loop building a path string. This happened when either of the
+# 2 following cases happened:
+#
+# 1) A directory was made a child of another directory which has a lower inode
+#    number and has a pending move/rename operation;
+#
+# 2) A directory was made a child of another directory which has a higher inode
+#    number, but the new parent wasn't moved nor renamed. Instead some other
+#    ancestor higher in the hierarchy, with a higher inode number too, was
+#    moved/renamed.
+#
+# This issue is fixed by the following linux kernel btrfs patches:
+#
+#   Btrfs: fix incremental send's decision to delay a dir move/rename
+#   Btrfs: part 2, fix incremental send's decision to delay a dir move/rename
+#
+#---
+# Copyright (c) 2014 Filipe Manana.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=`mktemp -d`
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fssum
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+# case 1), mentioned above
+mkdir -p $SCRATCH_MNT/a/b
+mkdir $SCRATCH_MNT/a/c
+mkdir $SCRATCH_MNT/a/b/d
+touch $SCRATCH_MNT/a/file1
+touch $SCRATCH_MNT/a/b/file2
+mv $SCRATCH_MNT/a/file1 $SCRATCH_MNT/a/b/d/file3
+ln $SCRATCH_MNT/a/b/d/file3 $SCRATCH_MNT/a/b/file4
+mkdir $SCRATCH_MNT/a/b/f
+mv $SCRATCH_MNT/a/b $SCRATCH_MNT/a/c/b2
+touch $SCRATCH_MNT/a/c/b2/d/file5
+
+# case 2), mentioned above
+mkdir -p $SCRATCH_MNT/a/x1/x2
+mkdir $SCRATCH_MNT/a/Z
+mkdir -p $SCRATCH_MNT/a/x1/x2/x3/x4/x5
+
+# case 2) again, but a more complex scenario
+mkdir -p $SCRATCH_MNT/_a/_b/_c/_d
+mkdir $SCRATCH_MNT/_a/_b/_c/_d/_e
+mkdir $SCRATCH_MNT/_a/_b/_c/_d/_f
+mv $SCRATCH_MNT/_a/_b/_c/_d/_e $SCRATCH_MNT/_a/_b/_c/_d/_f/_E2
+mkdir $SCRATCH_MNT/_a/_b/_c/_g
+mv $SCRATCH_MNT/_a/_b/_c/_d $SCRATCH_MNT/_a/_b/_D2
+
+# Filesystem looks like:
+#
+# .                                       (ino 256)
+# |-- a/                                  (ino 257)
+# |   |-- c/                              (ino 259)
+# |   |   |-- b2/                         (ino 258)
+# |   |       |-- d/                      (ino 260)
+# |   |       |   |-- file3               (ino 261)
+# |   |       |   |-- file5               (ino 264)
+# |   |       |
+# |   |       |-- file2                   (ino 262)
+# |   |       |-- file4                   (ino 261)
+# |   |       |-- f/                      (ino 263)
+# |   |
+# |   |-- x1/                             (ino 265)
+# |   |   |-- x2/                         (ino 266)
+# |   |       |-- x3/                     (ino 268)
+# |   |           |-- x4/                 (ino 269)
+# |   |   

Re: [PATCH 5/6] Btrfs-progs: fsck: reduce memory usage of extent record struct

2014-03-18 Thread David Sterba
On Tue, Mar 18, 2014 at 08:02:46PM +0800, Wang Shilong wrote:
 @@ -2742,7 +2742,10 @@ static int add_extent_rec(struct cache_tree *extent_cache,
 -   rec->found_rec = extent_rec;
 +   if (extent_rec)
 +       rec->found_rec = 1;
 +   else
 +       rec->found_rec = 0;

I've modified this to avoid 'if'

rec->found_rec = !!extent_rec;



Re: [PATCH 5/6] Btrfs-progs: fsck: reduce memory usage of extent record struct

2014-03-18 Thread Wang Shilong

On 03/19/2014 02:18 AM, David Sterba wrote:

On Tue, Mar 18, 2014 at 08:02:46PM +0800, Wang Shilong wrote:

@@ -2742,7 +2742,10 @@ static int add_extent_rec(struct cache_tree *extent_cache,
-   rec->found_rec = extent_rec;
+   if (extent_rec)
+       rec->found_rec = 1;
+   else
+       rec->found_rec = 0;

I've modified this to avoid 'if'

rec->found_rec = !!extent_rec;

Dave, thanks for doing this.:-)



Please advise on repair action

2014-03-18 Thread Adam Khan
Hello,

I have a simple btrfs filesystem on a dm-crypt volume. I get a general
protection fault when I attempt to access a specific directory, both in the
Thunar file manager and in a Python program.

The trace for Thunar is attached.

btrfsck returns this:

Checking filesystem on /dev/mapper/xyz_crypt
UUID: ...
found 88316880601 bytes used err is 1
total csum bytes: 180423792
total tree bytes: 291459072
total fs tree bytes: 50192384
total extent tree bytes: 12898304
btree space waste bytes: 55087032
file data blocks allocated: 352826490880
 referenced 184697802752
Btrfs v3.12

How should I proceed to repair this fs?

Best regards,

Adam
[  313.491347] general protection fault:  [#1] SMP 
[  313.491387] Modules linked in: ccm xt_conntrack xt_LOG xt_limit xt_tcpudp 
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_filter ip_tables x_tables rfcomm bnep deflate ctr 
twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common 
camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic lrw 
gf128mul glue_helper blowfish_generic blowfish_x86_64 blowfish_common 
cast5_generic cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 
sha512_ssse3 sha512_generic hmac crypto_null af_key xfrm_algo nfsd auth_rpcgss 
oid_registry nfs_acl nfs lockd fscache sunrpc ext4 mbcache jbd2 fuse parport_pc 
ppdev lp parport hid_generic joydev hid_lenovo_tpkbd usbhid hid sg btusb 
bluetooth crc16 usb_storage iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant 
coretemp kvm_intel kvm psmouse serio_raw pcspkr evdev i2c_i801 lpc_ich mfd_core 
arc4 iwldvm mac80211 iwlwifi cfg80211 wmi battery thinkpad_acpi nvram rfkill ac 
snd_hda_intel snd_hda_codec tpm_tis snd_hwdep snd_pcm tpm snd_page_alloc 
snd_seq snd_seq_device snd_timer i915 snd video uhci_hcd ehci_pci 
drm_kms_helper button acpi_cpufreq ehci_hcd drm i2c_algo_bit e1000e i2c_core 
mei_me processor mei ptp pps_core soundcore usbcore usb_common btrfs crc32c 
libcrc32c xor raid6_pq sha256_ssse3 sha256_generic cbc dm_crypt dm_mod sd_mod 
crc_t10dif crct10dif_common ahci libahci libata scsi_mod thermal thermal_sys
[  313.492281] CPU: 1 PID: 3946 Comm: Thunar Not tainted 3.13-1-amd64 #1 Debian 
3.13.5-1
[  313.492313] Hardware name: LENOVO 7454CTO/7454CTO, BIOS 6DET71WW (3.21 ) 
12/13/2011
[  313.492345] task: 88022fe1c010 ti: 88022f6d8000 task.ti: 
88022f6d8000
[  313.492376] RIP: 0010:[8127c66d]  [8127c66d] 
memcpy+0xd/0x110
[  313.492414] RSP: 0018:88022f6d9970  EFLAGS: 00010206
[  313.492438] RAX: 8800aa2528b5 RBX: 034b RCX: 0069
[  313.492467] RDX: 0003 RSI: db738800 RDI: 8800aa2528b5
[  313.492496] RBP: 880225b9e9c0 R08:  R09: 1000
[  313.492525] R10:  R11:  R12: 6db6db6db6db6db7
[  313.492554] R13: 1600 R14: 8800aa252c00 R15: 034b
[  313.492584] FS:  7fe3282f7a00() GS:88023bc8() 
knlGS:
[  313.492620] CS:  0010 DS:  ES:  CR0: 80050033
[  313.492643] CR2: 7fe2e0029228 CR3: b7625000 CR4: 000407e0
[  313.492673] Stack:
[  313.492683]  a013f168  8800b8289000 
880225ac8c40
[  313.492724]   0c00 880225615330 
880227448658
[  313.492764]  a0125064 880225b9e8f0 1000 
8800aa252000
[  313.492804] Call Trace:
[  313.492836]  [a013f168] ? read_extent_buffer+0xc8/0x120 [btrfs]
[  313.492877]  [a0125064] ? btrfs_get_extent+0x8f4/0x950 [btrfs]
[  313.492917]  [a0138154] ? set_state_bits+0x34/0x70 [btrfs]
[  313.492957]  [a013b7b8] ? __do_readpage+0x378/0x730 [btrfs]
[  313.492995]  [a013a4dd] ? lock_extent_bits+0x6d/0x1c0 [btrfs]
[  313.493034]  [a0124770] ? btrfs_real_readdir+0x550/0x550 [btrfs]
[  313.493075]  [a013bf12] ? 
__extent_readpages.constprop.42+0x2d2/0x2f0 [btrfs]
[  313.493119]  [a0124770] ? btrfs_real_readdir+0x550/0x550 [btrfs]
[  313.493160]  [a013daa2] ? extent_readpages+0x182/0x190 [btrfs]
[  313.493201]  [a0124770] ? btrfs_real_readdir+0x550/0x550 [btrfs]
[  313.493234]  [811598a7] ? alloc_pages_current+0x97/0x150
[  313.493264]  [81121f03] ? __do_page_cache_readahead+0x193/0x240
[  313.493293]  [811223ba] ? ondemand_readahead+0x14a/0x280
[  313.493322]  [811186ee] ? generic_file_aio_read+0x4be/0x6e0
[  313.493350]  [81178d47] ? do_sync_read+0x57/0x90
[  313.493376]  [8117935b] ? vfs_read+0x8b/0x160
[  313.493399]  [81179e43] ? SyS_read+0x43/0xa0
[  313.493424]  [814adb39] ? system_call_fastpath+0x16/0x1b
[  313.493451] Code: fc ff ff 48 8b 43 58 48 2b 43 50 88 43 4e eb e9 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 
a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 
[