Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
On Mon, Jul 23, 2012 at 5:17 PM, Alex Lyakas alex.bolshoy.bt...@gmail.com wrote: Hi Alexander, I did some testing of the case where same inode, but with a different generation, exists both in send_root and in parent_root. I know that this can happen primarily when inode_cache option is enabled. So first I just tested some differential sends, where parent and root are unrelated subvolumes. Here are some issues: 1) The top subvolume inode (ino=BTRFS_FIRST_FREE_OBJECTID) is treated also as deleted + recreated. So the code goes into process_all_refs() path and does several strange things, such as trying to orphanize the top inode. Also get_cur_path() always returns for the top subvolume (without checking whether it is an orphan). Another complication for the top inode is that its parent dir is itself. I made the following fix: @@ -3782,7 +3972,13 @@ static int changed_inode(struct send_ctx *sctx, right_gen = btrfs_inode_generation(sctx-right_path-nodes[0], right_ii); - if (left_gen != right_gen) + if (left_gen != right_gen sctx-cur_ino != BTRFS_FIRST_FREE_OBJECTID) sctx-cur_inode_new_gen = 1; So basically, don't try to delete and re-create it, but treat it like a change. Since the top subvolume inode is S_IFDIR, and dir can have only one hardlink (and hopefully it is always ..), we will never need to change anything for this INODE_REF. I also added: @@ -2526,6 +2615,14 @@ static int process_recorded_refs(struct send_ctx *sctx) int did_overwrite = 0; int is_orphan = 0; + BUG_ON(sctx-cur_ino = BTRFS_FIRST_FREE_OBJECTID); I applied both fixes to for-chris now. 2) After I fixed this, I hit another issue, where inodes under the top subvolume dir, attempt to rmdir() the top dir, while iterating over check_dirs in process_recorded_refs(), because (top_dir_ino, top_dir_gen) indicate that it was deleted. So I added: @@ -2714,10 +2857,19 @@ verbose_printk(btrfs: process_recorded_refs %llu\n, sctx-cur_ino); */ ULIST_ITER_INIT(uit); while ((un = ulist_next(check_dirs, uit))) { + /* Do not attempt to rmdir it the top subvolume dir */ + if (un-val == BTRFS_FIRST_FREE_OBJECTID) + continue; + if (un-val sctx-cur_ino) continue; I applied a similar fix by adding a check to can_rmdir. The way you suggested would also skip utime updates for the top dir. 3) process_recorded_refs() always increments the send_progress: /* * Current inode is now at it's new position, so we must increase * send_progress */ sctx-send_progress = sctx-cur_ino + 1; However, in the changed_inode() path I am testing, process_all_refs() is called twice with the same inode (once for deleted inode, once for the recreated inode), so after the first call, send_progress is incremented and doesn't match the inode anymore. I don't think I hit any issues because of this, just that it's confusing. I fixed this issue some days ago. 4) +/* + * Record and process all refs at once. Needed when an inode changes the + * generation number, which means that it was deleted and recreated. + */ +static int process_all_refs(struct send_ctx *sctx, + enum btrfs_compare_tree_result cmd) +{ + int ret; + struct btrfs_root *root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + struct extent_buffer *eb; + int slot; + iterate_inode_ref_t cb; + + path = alloc_path_for_send(); + if (!path) + return -ENOMEM; + + if (cmd == BTRFS_COMPARE_TREE_NEW) { + root = sctx-send_root; + cb = __record_new_ref; + } else if (cmd == BTRFS_COMPARE_TREE_DELETED) { + root = sctx-parent_root; + cb = __record_deleted_ref; + } else { + BUG(); + } + + key.objectid = sctx-cmp_key-objectid; + key.type = BTRFS_INODE_REF_KEY; + key.offset = 0; + while (1) { + ret = btrfs_search_slot_for_read(root, key, path, 1, 0); + if (ret 0) { + btrfs_release_path(path); + goto out; + } + if (ret) { + btrfs_release_path(path); + break; + } + + eb = path-nodes[0]; + slot = path-slots[0]; + btrfs_item_key_to_cpu(eb, found_key, slot); + + if (found_key.objectid != key.objectid || + found_key.type != key.type) { + btrfs_release_path(path); + break; + } + + ret = iterate_inode_ref(sctx,
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
Alexander, Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those also don't contain real data. So something like: if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) { ret = 1; goto out; } Do you mean || left_type == BTRFS_FILE_EXTENT_PREALLOC? I see your point about bytenr==0, I missed that on the parent tree it can be something else. As for PREALLOC: can it happen that on differential send we see extent of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had some real data extent in that place? I don't know the answer, but if yes, then we must treat PREALLOC as normal extent. So this case is similar to bytenr==0. Thanks, Alex. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
On Wed, Jul 25, 2012 at 7:20 PM, Alex Lyakas alex.bolshoy.bt...@gmail.com wrote: Alexander, Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those also don't contain real data. So something like: if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) { ret = 1; goto out; } Do you mean || left_type == BTRFS_FILE_EXTENT_PREALLOC? I see your point about bytenr==0, I missed that on the parent tree it can be something else. As for PREALLOC: can it happen that on differential send we see extent of type BTRFS_FILE_EXTENT_PREALLOC? And can it happen that parent had some real data extent in that place? I don't know the answer, but if yes, then we must treat PREALLOC as normal extent. So this case is similar to bytenr==0. I also don't know if that may happen. Currently, only REG extents are checked by is_extent_unchanged. All other types are regarded as changed and will be sent. So in the worst case the stream gets larget then it should be, but we won't loose data. I need to leave in a few minutes and will continue working on btrfs send/receive v2 later today. We should probably postpone optimizations (actually bug fixing) here for later...don't know if I find enough time to investigate more. Thanks, Alex. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
Hi Alexander, I did some testing of the case where same inode, but with a different generation, exists both in send_root and in parent_root. I know that this can happen primarily when inode_cache option is enabled. So first I just tested some differential sends, where parent and root are unrelated subvolumes. Here are some issues: 1) The top subvolume inode (ino=BTRFS_FIRST_FREE_OBJECTID) is treated also as deleted + recreated. So the code goes into process_all_refs() path and does several strange things, such as trying to orphanize the top inode. Also get_cur_path() always returns for the top subvolume (without checking whether it is an orphan). Another complication for the top inode is that its parent dir is itself. I made the following fix: @@ -3782,7 +3972,13 @@ static int changed_inode(struct send_ctx *sctx, right_gen = btrfs_inode_generation(sctx-right_path-nodes[0], right_ii); - if (left_gen != right_gen) + if (left_gen != right_gen sctx-cur_ino != BTRFS_FIRST_FREE_OBJECTID) sctx-cur_inode_new_gen = 1; So basically, don't try to delete and re-create it, but treat it like a change. Since the top subvolume inode is S_IFDIR, and dir can have only one hardlink (and hopefully it is always ..), we will never need to change anything for this INODE_REF. I also added: @@ -2526,6 +2615,14 @@ static int process_recorded_refs(struct send_ctx *sctx) int did_overwrite = 0; int is_orphan = 0; + BUG_ON(sctx-cur_ino = BTRFS_FIRST_FREE_OBJECTID); 2) After I fixed this, I hit another issue, where inodes under the top subvolume dir, attempt to rmdir() the top dir, while iterating over check_dirs in process_recorded_refs(), because (top_dir_ino, top_dir_gen) indicate that it was deleted. So I added: @@ -2714,10 +2857,19 @@ verbose_printk(btrfs: process_recorded_refs %llu\n, sctx-cur_ino); */ ULIST_ITER_INIT(uit); while ((un = ulist_next(check_dirs, uit))) { + /* Do not attempt to rmdir it the top subvolume dir */ + if (un-val == BTRFS_FIRST_FREE_OBJECTID) + continue; + if (un-val sctx-cur_ino) continue; 3) process_recorded_refs() always increments the send_progress: /* * Current inode is now at it's new position, so we must increase * send_progress */ sctx-send_progress = sctx-cur_ino + 1; However, in the changed_inode() path I am testing, process_all_refs() is called twice with the same inode (once for deleted inode, once for the recreated inode), so after the first call, send_progress is incremented and doesn't match the inode anymore. I don't think I hit any issues because of this, just that it's confusing. 4) +/* + * Record and process all refs at once. Needed when an inode changes the + * generation number, which means that it was deleted and recreated. + */ +static int process_all_refs(struct send_ctx *sctx, + enum btrfs_compare_tree_result cmd) +{ + int ret; + struct btrfs_root *root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + struct extent_buffer *eb; + int slot; + iterate_inode_ref_t cb; + + path = alloc_path_for_send(); + if (!path) + return -ENOMEM; + + if (cmd == BTRFS_COMPARE_TREE_NEW) { + root = sctx-send_root; + cb = __record_new_ref; + } else if (cmd == BTRFS_COMPARE_TREE_DELETED) { + root = sctx-parent_root; + cb = __record_deleted_ref; + } else { + BUG(); + } + + key.objectid = sctx-cmp_key-objectid; + key.type = BTRFS_INODE_REF_KEY; + key.offset = 0; + while (1) { + ret = btrfs_search_slot_for_read(root, key, path, 1, 0); + if (ret 0) { + btrfs_release_path(path); + goto out; + } + if (ret) { + btrfs_release_path(path); + break; + } + + eb = path-nodes[0]; + slot = path-slots[0]; + btrfs_item_key_to_cpu(eb, found_key, slot); + + if (found_key.objectid != key.objectid || + found_key.type != key.type) { + btrfs_release_path(path); + break; + } + + ret = iterate_inode_ref(sctx, sctx-parent_root, path, + found_key, 0, cb, sctx); Shouldn't it be the root that you calculated eariler and not sctx-parent_root? I guess in this case it doesn't matter, because resolve is 0, and the passed root is only used for resolve. But still confusing. 5) When I started testing with inode_cache
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
Hi Arne, (pls don't take this as if I pretend to have understood the code better than you, because I have a list of questions for Alexander too). +/* + * Helper function to generate a file name that is unique in the root of + * send_root and parent_root. This is used to generate names for orphan inodes. + */ +static int gen_unique_name(struct send_ctx *sctx, +u64 ino, u64 gen, +struct fs_path *dest) +{ + int ret = 0; + struct btrfs_path *path; + struct btrfs_dir_item *di; + char tmp[64]; + int len; + u64 idx = 0; + + path = alloc_path_for_send(); + if (!path) + return -ENOMEM; + + while (1) { + len = snprintf(tmp, sizeof(tmp) - 1, o%llu-%llu-%llu, + ino, gen, idx); wouldn't it be easier to just take a uuid? This would save you a lot of code and especially the need to verify that the name is really unique, saving seeks. As far as I understand the logic of orphans, the unique name should depend only on the send_root and parent_root contents, which are both frozen. So when you re-generate this name for a particular (ino,gen), you must receive the same exact name every time. If the user has kind of oXXX-YY-Z file(s) in the top dir by accident, then they are the same every time we recalculate the orhpan name, so we get the same result every time. Does it make sense? So did you mean to generate a uuid here, and save it somewhere in-memory, and later look it up based on (ino,gen)? Or you mean some other improvement? Thanks, Alex. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 7/7] Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (part 2)
Alexander, this focuses on area of sending file extents: +static int is_extent_unchanged(struct send_ctx *sctx, + struct btrfs_path *left_path, + struct btrfs_key *ekey) +{ + int ret = 0; + struct btrfs_key key; + struct btrfs_path *path = NULL; + struct extent_buffer *eb; + int slot; + struct btrfs_key found_key; + struct btrfs_file_extent_item *ei; + u64 left_disknr; + u64 right_disknr; + u64 left_offset; + u64 right_offset; + u64 left_len; + u64 right_len; + u8 left_type; + u8 right_type; + + path = alloc_path_for_send(); + if (!path) + return -ENOMEM; + + eb = left_path-nodes[0]; + slot = left_path-slots[0]; + + ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); + left_type = btrfs_file_extent_type(eb, ei); + left_disknr = btrfs_file_extent_disk_bytenr(eb, ei); + left_len = btrfs_file_extent_num_bytes(eb, ei); + left_offset = btrfs_file_extent_offset(eb, ei); + + if (left_type != BTRFS_FILE_EXTENT_REG) { + ret = 0; + goto out; + } + + key.objectid = ekey-objectid; + key.type = BTRFS_EXTENT_DATA_KEY; + key.offset = ekey-offset; + + while (1) { + ret = btrfs_search_slot_for_read(sctx-parent_root, key, path, + 0, 0); + if (ret 0) + goto out; + if (ret) { + ret = 0; + goto out; + } + btrfs_item_key_to_cpu(path-nodes[0], found_key, + path-slots[0]); + if (found_key.objectid != key.objectid || + found_key.type != key.type) { + ret = 0; + goto out; + } + + eb = path-nodes[0]; + slot = path-slots[0]; + + ei = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); + right_type = btrfs_file_extent_type(eb, ei); + right_disknr = btrfs_file_extent_disk_bytenr(eb, ei); + right_len = btrfs_file_extent_num_bytes(eb, ei); + right_offset = btrfs_file_extent_offset(eb, ei); + btrfs_release_path(path); + + if (right_type != BTRFS_FILE_EXTENT_REG) { + ret = 0; + goto out; + } + + if (left_disknr != right_disknr) { + ret = 0; + goto out; + } + + key.offset = found_key.offset + right_len; + if (key.offset = ekey-offset + left_len) { + ret = 1; + goto out; + } + } + +out: + btrfs_free_path(path); + return ret; +} + Should we always treat left extent with bytenr==0 as not changed? Because right now, it simply reads and sends data of such extent, while bytenr==0 means no data allocated here. Since we always do send_truncate() afterwards, file size will always be correct, so we can just skip bytenr==0 extents. Same is true for BTRFS_FILE_EXTENT_PREALLOC extents, I think. Those also don't contain real data. So something like: if (left_disknr == 0 || left_type == BTRFS_FILE_EXTENT_REG) { ret = 1; goto out; } before we check for BTRFS_FILE_EXTENT_REG. Now I have a question about the rest of the logic that decides that extent is unchanged. I understand that if we see the same extent (same disk_bytenr) shared between parent_root and send_root, then it must contain the same data, even in nodatacow mode, because on a first write to such shared extent, it is cow'ed even with nodatacow. However, shouldn't we check btrfs_file_extent_offset(), to make sure that both send_root and parent_root point at the same offset into extent from the same file offset? Because if extent_offset values are different, then the data of the file might different, even though we are talking about the same extent. So I am thinking about something like: - ekey.offset points at data at logical address left_disknr+left_offset (logical address within CHUNK_ITEM address space) for left_len bytes - found_key.offset points at data at logical address right_disknr+right_offset for right_len - we know that found_key.offset = ekey.offset So we need to ensure that left_disknr==right_disknr and also: right_disknr+right_offset + (ekey.offset - found_key.offset) == left_disknr+left_offset or does this while loop somehow ensures this equation? However, I must admit I don't fully understand the logic behind deciding that extent is unchanged. Can you pls explain what this tries to accomplish, and why it decides that extent is unchanged here: