Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-08-01 Thread Alexander Block
On Mon, Jul 23, 2012 at 5:17 PM, Alex Lyakas
alex.bolshoy.bt...@gmail.com wrote:
 Hi Alexander,
 I did some testing of the case where same inode, but with a different
 generation, exists both in send_root and in parent_root.
 I know that this can happen primarily when inode_cache option is
 enabled. So first I just tested some differential sends, where parent
 and root are unrelated subvolumes. Here are some issues:

 1) The top subvolume inode (ino=BTRFS_FIRST_FREE_OBJECTID) is treated
 also as deleted + recreated. So the code goes into process_all_refs()
 path and does several strange things, such as trying to orphanize the
 top inode. Also get_cur_path() always returns  for the top subvolume
 (without checking whether it is an orphan).  Another complication for
 the top inode is that its parent dir is itself.
 I made the following fix:
 @@ -3782,7 +3972,13 @@ static int changed_inode(struct send_ctx *sctx,

 right_gen = btrfs_inode_generation(sctx-right_path-nodes[0],
 right_ii);
 -   if (left_gen != right_gen)
 +   if (left_gen != right_gen  sctx-cur_ino !=
 BTRFS_FIRST_FREE_OBJECTID)
 sctx-cur_inode_new_gen = 1;

 So basically, don't try to delete and re-create it, but treat it like
 a change. Since the top subvolume inode is S_IFDIR, and dir can have
 only one hardlink (and hopefully it is always ..), we will never
 need to change anything for this INODE_REF. I also added:

 @@ -2526,6 +2615,14 @@ static int process_recorded_refs(struct send_ctx *sctx)
 int did_overwrite = 0;
 int is_orphan = 0;

 +   BUG_ON(sctx-cur_ino = BTRFS_FIRST_FREE_OBJECTID);
I applied both fixes to for-chris now.

 2) After I fixed this, I hit another issue, where inodes under the top
 subvolume dir, attempt to rmdir() the top dir, while iterating over
 check_dirs in process_recorded_refs(), because (top_dir_ino,
 top_dir_gen) indicate that it was deleted. So I added:

 @@ -2714,10 +2857,19 @@ verbose_printk(btrfs: process_recorded_refs
 %llu\n, sctx-cur_ino);
  */
 ULIST_ITER_INIT(uit);
 while ((un = ulist_next(check_dirs, uit))) {
 +   /* Do not attempt to rmdir it the top subvolume dir */
 +   if (un-val == BTRFS_FIRST_FREE_OBJECTID)
 +   continue;
 +
 if (un-val  sctx-cur_ino)
 continue;
I applied a similar fix by adding a check to can_rmdir. The way you
suggested would also skip utime updates for the top dir.

 3) process_recorded_refs() always increments the send_progress:
 /*
  * Current inode is now at it's new position, so we must increase
  * send_progress
  */
 sctx-send_progress = sctx-cur_ino + 1;

 However, in the changed_inode() path I am testing, process_all_refs()
 is called twice with the same inode (once for deleted inode, once for
 the recreated inode), so after the first call, send_progress is
 incremented and doesn't match the inode anymore. I don't think I hit
 any issues because of this, just that it's confusing.
I fixed this issue some days ago.

 4)

 +/*
 + * Record and process all refs at once. Needed when an inode changes the
 + * generation number, which means that it was deleted and recreated.
 + */
 +static int process_all_refs(struct send_ctx *sctx,
 +   enum btrfs_compare_tree_result cmd)
 +{
 +   int ret;
 +   struct btrfs_root *root;
 +   struct btrfs_path *path;
 +   struct btrfs_key key;
 +   struct btrfs_key found_key;
 +   struct extent_buffer *eb;
 +   int slot;
 +   iterate_inode_ref_t cb;
 +
 +   path = alloc_path_for_send();
 +   if (!path)
 +   return -ENOMEM;
 +
 +   if (cmd == BTRFS_COMPARE_TREE_NEW) {
 +   root = sctx-send_root;
 +   cb = __record_new_ref;
 +   } else if (cmd == BTRFS_COMPARE_TREE_DELETED) {
 +   root = sctx-parent_root;
 +   cb = __record_deleted_ref;
 +   } else {
 +   BUG();
 +   }
 +
 +   key.objectid = sctx-cmp_key-objectid;
 +   key.type = BTRFS_INODE_REF_KEY;
 +   key.offset = 0;
 +   while (1) {
 +   ret = btrfs_search_slot_for_read(root, key, path, 1, 0);
 +   if (ret  0) {
 +   btrfs_release_path(path);
 +   goto out;
 +   }
 +   if (ret) {
 +   btrfs_release_path(path);
 +   break;
 +   }
 +
 +   eb = path-nodes[0];
 +   slot = path-slots[0];
 +   btrfs_item_key_to_cpu(eb, found_key, slot);
 +
 +   if (found_key.objectid != key.objectid ||
 +   found_key.type != key.type) {
 +   btrfs_release_path(path);
 +   break;
 +   }
 +
 +   ret = iterate_inode_ref(sctx, 

Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-25 Thread Alex Lyakas
Alexander,

 Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those
 also don't contain real data.
 So something like:
 if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) {
 ret = 1;
 goto out;
 }
 Do you mean || left_type == BTRFS_FILE_EXTENT_PREALLOC?

I see your point about bytenr==0, I missed that on the parent tree it
can be something else.

As for PREALLOC: can it happen that on differential send we see extent
of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had
some real data extent in that place? I don't know the answer, but if
yes, then we must treat PREALLOC as normal extent. So this case is
similar to bytenr==0.

Thanks,
Alex.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-25 Thread Alexander Block
On Wed, Jul 25, 2012 at 7:20 PM, Alex Lyakas
alex.bolshoy.bt...@gmail.com wrote:
 Alexander,

 Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those
 also don't contain real data.
 So something like:
 if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) {
 ret = 1;
 goto out;
 }
 Do you mean || left_type == BTRFS_FILE_EXTENT_PREALLOC?

 I see your point about bytenr==0, I missed that on the parent tree it
 can be something else.

 As for PREALLOC: can it happen that on differential send we see extent
 of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had
 some real data extent in that place? I don't know the answer, but if
 yes, then we must treat PREALLOC as normal extent. So this case is
 similar to bytenr==0.

I also don't know if that may happen. Currently, only REG extents are
checked by is_extent_unchanged. All other types are regarded as
changed and will be sent. So in the worst case the stream gets larget
then it should be, but we won't loose data. I need to leave in a few
minutes and will continue working on btrfs send/receive v2 later
today. We should probably postpone optimizations (actually bug
fixing) here for later...don't know if I find enough time to
investigate more.

 Thanks,
 Alex.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-23 Thread Alex Lyakas
Hi Alexander,
I did some testing of the case where same inode, but with a different
generation, exists both in send_root and in parent_root.
I know that this can happen primarily when inode_cache option is
enabled. So first I just tested some differential sends, where parent
and root are unrelated subvolumes. Here are some issues:

1) The top subvolume inode (ino=BTRFS_FIRST_FREE_OBJECTID) is treated
also as deleted + recreated. So the code goes into process_all_refs()
path and does several strange things, such as trying to orphanize the
top inode. Also get_cur_path() always returns  for the top subvolume
(without checking whether it is an orphan).  Another complication for
the top inode is that its parent dir is itself.
I made the following fix:
@@ -3782,7 +3972,13 @@ static int changed_inode(struct send_ctx *sctx,

right_gen = btrfs_inode_generation(sctx-right_path-nodes[0],
right_ii);
-   if (left_gen != right_gen)
+   if (left_gen != right_gen  sctx-cur_ino !=
BTRFS_FIRST_FREE_OBJECTID)
sctx-cur_inode_new_gen = 1;

So basically, don't try to delete and re-create it, but treat it like
a change. Since the top subvolume inode is S_IFDIR, and dir can have
only one hardlink (and hopefully it is always ..), we will never
need to change anything for this INODE_REF. I also added:

@@ -2526,6 +2615,14 @@ static int process_recorded_refs(struct send_ctx *sctx)
int did_overwrite = 0;
int is_orphan = 0;

+   BUG_ON(sctx-cur_ino = BTRFS_FIRST_FREE_OBJECTID);

2) After I fixed this, I hit another issue, where inodes under the top
subvolume dir, attempt to rmdir() the top dir, while iterating over
check_dirs in process_recorded_refs(), because (top_dir_ino,
top_dir_gen) indicate that it was deleted. So I added:

@@ -2714,10 +2857,19 @@ verbose_printk(btrfs: process_recorded_refs
%llu\n, sctx-cur_ino);
 */
ULIST_ITER_INIT(uit);
while ((un = ulist_next(check_dirs, uit))) {
+   /* Do not attempt to rmdir it the top subvolume dir */
+   if (un-val == BTRFS_FIRST_FREE_OBJECTID)
+   continue;
+
if (un-val  sctx-cur_ino)
continue;

3) process_recorded_refs() always increments the send_progress:
/*
 * Current inode is now at it's new position, so we must increase
 * send_progress
 */
sctx-send_progress = sctx-cur_ino + 1;

However, in the changed_inode() path I am testing, process_all_refs()
is called twice with the same inode (once for deleted inode, once for
the recreated inode), so after the first call, send_progress is
incremented and doesn't match the inode anymore. I don't think I hit
any issues because of this, just that it's confusing.

4)

 +/*
 + * Record and process all refs at once. Needed when an inode changes the
 + * generation number, which means that it was deleted and recreated.
 + */
 +static int process_all_refs(struct send_ctx *sctx,
 +   enum btrfs_compare_tree_result cmd)
 +{
 +   int ret;
 +   struct btrfs_root *root;
 +   struct btrfs_path *path;
 +   struct btrfs_key key;
 +   struct btrfs_key found_key;
 +   struct extent_buffer *eb;
 +   int slot;
 +   iterate_inode_ref_t cb;
 +
 +   path = alloc_path_for_send();
 +   if (!path)
 +   return -ENOMEM;
 +
 +   if (cmd == BTRFS_COMPARE_TREE_NEW) {
 +   root = sctx-send_root;
 +   cb = __record_new_ref;
 +   } else if (cmd == BTRFS_COMPARE_TREE_DELETED) {
 +   root = sctx-parent_root;
 +   cb = __record_deleted_ref;
 +   } else {
 +   BUG();
 +   }
 +
 +   key.objectid = sctx-cmp_key-objectid;
 +   key.type = BTRFS_INODE_REF_KEY;
 +   key.offset = 0;
 +   while (1) {
 +   ret = btrfs_search_slot_for_read(root, key, path, 1, 0);
 +   if (ret  0) {
 +   btrfs_release_path(path);
 +   goto out;
 +   }
 +   if (ret) {
 +   btrfs_release_path(path);
 +   break;
 +   }
 +
 +   eb = path-nodes[0];
 +   slot = path-slots[0];
 +   btrfs_item_key_to_cpu(eb, found_key, slot);
 +
 +   if (found_key.objectid != key.objectid ||
 +   found_key.type != key.type) {
 +   btrfs_release_path(path);
 +   break;
 +   }
 +
 +   ret = iterate_inode_ref(sctx, sctx-parent_root, path,
 +   found_key, 0, cb, sctx);

Shouldn't it be the root that you calculated eariler and not
sctx-parent_root? I guess in this case it doesn't matter, because
resolve is 0, and the passed root is only used for resolve. But
still confusing.

5) When I started testing with inode_cache 

Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-23 Thread Alex Lyakas
Hi Arne,

(pls don't take this as if I pretend to have understood the code
better than you, because I have a list of questions for Alexander
too).

 +/*
 + * Helper function to generate a file name that is unique in the root of
 + * send_root and parent_root. This is used to generate names for orphan 
 inodes.
 + */
 +static int gen_unique_name(struct send_ctx *sctx,
 +u64 ino, u64 gen,
 +struct fs_path *dest)
 +{
 + int ret = 0;
 + struct btrfs_path *path;
 + struct btrfs_dir_item *di;
 + char tmp[64];
 + int len;
 + u64 idx = 0;
 +
 + path = alloc_path_for_send();
 + if (!path)
 + return -ENOMEM;
 +
 + while (1) {
 + len = snprintf(tmp, sizeof(tmp) - 1, o%llu-%llu-%llu,
 + ino, gen, idx);

 wouldn't it be easier to just take a uuid? This would save you a lot
 of code and especially the need to verify that the name is really
 unique, saving seeks.

As far as I understand the logic of orphans, the unique name should
depend only on the send_root and parent_root contents, which are both
frozen. So when you re-generate this name for a particular (ino,gen),
you must receive the same exact name every time. If the user has kind
of oXXX-YY-Z file(s) in the top dir by accident, then they are the
same every time we recalculate the orhpan name, so we get the same
result every time. Does it make sense?
So did you mean to generate a uuid here, and save it somewhere
in-memory, and later look it up based on (ino,gen)? Or you mean some
other improvement?

Thanks,
Alex.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)

2012-07-10 Thread Alex Lyakas
Alexander,
this focuses on area of sending file extents:

 +static int is_extent_unchanged(struct send_ctx *sctx,
 +  struct btrfs_path *left_path,
 +  struct btrfs_key *ekey)
 +{
 +   int ret = 0;
 +   struct btrfs_key key;
 +   struct btrfs_path *path = NULL;
 +   struct extent_buffer *eb;
 +   int slot;
 +   struct btrfs_key found_key;
 +   struct btrfs_file_extent_item *ei;
 +   u64 left_disknr;
 +   u64 right_disknr;
 +   u64 left_offset;
 +   u64 right_offset;
 +   u64 left_len;
 +   u64 right_len;
 +   u8 left_type;
 +   u8 right_type;
 +
 +   path = alloc_path_for_send();
 +   if (!path)
 +   return -ENOMEM;
 +
 +   eb = left_path-nodes[0];
 +   slot = left_path-slots[0];
 +
 +   ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
 +   left_type = btrfs_file_extent_type(eb, ei);
 +   left_disknr = btrfs_file_extent_disk_bytenr(eb, ei);
 +   left_len = btrfs_file_extent_num_bytes(eb, ei);
 +   left_offset = btrfs_file_extent_offset(eb, ei);
 +
 +   if (left_type != BTRFS_FILE_EXTENT_REG) {
 +   ret = 0;
 +   goto out;
 +   }
 +
 +   key.objectid = ekey-objectid;
 +   key.type = BTRFS_EXTENT_DATA_KEY;
 +   key.offset = ekey-offset;
 +
 +   while (1) {
 +   ret = btrfs_search_slot_for_read(sctx-parent_root, key, 
 path,
 +   0, 0);
 +   if (ret  0)
 +   goto out;
 +   if (ret) {
 +   ret = 0;
 +   goto out;
 +   }
 +   btrfs_item_key_to_cpu(path-nodes[0], found_key,
 +   path-slots[0]);
 +   if (found_key.objectid != key.objectid ||
 +   found_key.type != key.type) {
 +   ret = 0;
 +   goto out;
 +   }
 +
 +   eb = path-nodes[0];
 +   slot = path-slots[0];
 +
 +   ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
 +   right_type = btrfs_file_extent_type(eb, ei);
 +   right_disknr = btrfs_file_extent_disk_bytenr(eb, ei);
 +   right_len = btrfs_file_extent_num_bytes(eb, ei);
 +   right_offset = btrfs_file_extent_offset(eb, ei);
 +   btrfs_release_path(path);
 +
 +   if (right_type != BTRFS_FILE_EXTENT_REG) {
 +   ret = 0;
 +   goto out;
 +   }
 +
 +   if (left_disknr != right_disknr) {
 +   ret = 0;
 +   goto out;
 +   }
 +
 +   key.offset = found_key.offset + right_len;
 +   if (key.offset = ekey-offset + left_len) {
 +   ret = 1;
 +   goto out;
 +   }
 +   }
 +
 +out:
 +   btrfs_free_path(path);
 +   return ret;
 +}
 +

Should we always treat left extent with bytenr==0 as not changed?
Because right now, it simply reads and sends data of such extent,
while bytenr==0 means no data allocated here. Since we always do
send_truncate() afterwards, file size will always be correct, so we
can just skip bytenr==0 extents.
Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those
also don't contain real data.
So something like:
if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) {
ret = 1;
goto out;
}
before we check for BTRFS_FILE_EXTENT_REG.

Now I have a question about the rest of the logic that decides that
extent is unchanged. I understand that if we see the same extent (same
disk_bytenr) shared between parent_root and send_root, then it must
contain the same data, even in nodatacow mode, because on a first
write to such shared extent, it is cow'ed even with nodatacow.

However, shouldn't we check btrfs_file_extent_offset(), to make sure
that both send_root and parent_root point at the same offset into
extent from the same file offset? Because if extent_offset values are
different, then the data of the file might different, even though we
are talking about the same extent.

So I am thinking about something like:

- ekey.offset points at data at logical address
left_disknr+left_offset (logical address within CHUNK_ITEM address
space) for left_len bytes
- found_key.offset points at data at logical address
right_disknr+right_offset for right_len
- we know that found_key.offset = ekey.offset

So we need to ensure that left_disknr==right_disknr and also:
right_disknr+right_offset + (ekey.offset - found_key.offset) ==
left_disknr+left_offset
or does this while loop somehow ensures this equation?

However, I must admit I don't fully understand the logic behind
deciding that extent is unchanged. Can you pls explain what this tries
to accomplish, and why it decides that extent is unchanged here: