[PATCH 1/6] btrfs-progs: check: enable repair in lowmem mode
From: Su Yue

Turn on the option --repair with --mode=lowmem in btrfsck.

Signed-off-by: Su Yue
---
 cmds-check.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index c5faa2b..829f7c5 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -12844,12 +12844,10 @@ int cmd_check(int argc, char **argv)
 	}
 
 	/*
-	 * Not supported yet
+	 * experimental and dangerous
 	 */
-	if (repair && check_mode == CHECK_MODE_LOWMEM) {
-		error("low memory mode doesn't support repair yet");
-		exit(1);
-	}
+	if (repair && check_mode == CHECK_MODE_LOWMEM)
+		printf("Low memory mode supports repair partially\n");
 
 	radix_tree_init();
 	cache_tree_init(&root_cache);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] btrfs-progs: check: Introduce repair_chunk_item()
From: Su Yue

Because this patchset concentrates on repairing the extent tree,
repair_chunk_item() currently only inserts the missing block group item
into the extent tree. Some things are left as TODO, for example dev_item.

Signed-off-by: Su Yue
---
 cmds-check.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 0f26394..726e330 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -11543,6 +11543,50 @@ out:
 }
 
 /*
+ * Add block group item to the extent tree if @err contains
+ * REFERENCER_MISSING.
+ * TO DO: repair error about dev_item.
+ *
+ * Returns error after repair.
+ */
+static int repair_chunk_item(struct btrfs_trans_handle *trans,
+			     struct btrfs_root *chunk_root,
+			     struct btrfs_path *path, int err)
+{
+	struct btrfs_chunk *chunk;
+	struct btrfs_key chunk_key;
+	struct extent_buffer *eb = path->nodes[0];
+	u64 length;
+	int slot = path->slots[0];
+	u64 type;
+	int ret = 0;
+
+	btrfs_item_key_to_cpu(eb, &chunk_key, slot);
+	if (chunk_key.type != BTRFS_CHUNK_ITEM_KEY)
+		return err;
+	chunk = btrfs_item_ptr(eb, slot, struct btrfs_chunk);
+	type = btrfs_chunk_type(path->nodes[0], chunk);
+	length = btrfs_chunk_length(eb, chunk);
+
+	if (err & REFERENCER_MISSING) {
+		ret = btrfs_make_block_group(trans, chunk_root->fs_info, 0,
+			type, chunk_key.objectid, chunk_key.offset, length);
+		if (ret) {
+			error("fail to add block group item[%llu %llu]",
+			      chunk_key.offset, length);
+			goto out;
+		} else {
+			err &= ~REFERENCER_MISSING;
+			printf("Added block group item[%llu %llu]\n",
+			       chunk_key.offset, length);
+		}
+	}
+
+out:
+	return err;
+}
+
+/*
  * Check a chunk item.
  * Including checking all referred dev_extents and block group
  */
@@ -11729,6 +11773,8 @@ again:
 		break;
 	case BTRFS_CHUNK_ITEM_KEY:
 		ret = check_chunk_item(fs_info, eb, slot);
+		if (repair && ret)
+			ret = repair_chunk_item(trans, root, path, ret);
 		err |= ret;
 		break;
 	case BTRFS_DEV_EXTENT_KEY:
-- 
2.7.4
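The repair helper above follows a simple convention: it takes the error bits found by the check, tries to fix one of them, and returns the bits that remain. A minimal userspace sketch of that convention follows. The bit values, the sk_ names and the callback are made up for illustration and are not taken from cmds-check.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real definitions live in cmds-check.c. */
#define SK_BACKREF_MISSING	(1 << 0)
#define SK_REFERENCER_MISSING	(1 << 1)

/* Stand-ins for a repair step: return 0 on success, non-zero on failure. */
static int sk_fix_ok(void)
{
	return 0;
}

static int sk_fix_fail(void)
{
	return -1;
}

/*
 * Try to repair one error bit.
 * If @bit is not set in @err, nothing is attempted.
 * If the repair succeeds the bit is cleared, otherwise it stays set.
 * Returns the error bits that remain after the attempt.
 */
static int sk_repair_bit(int err, int bit, int (*fix)(void))
{
	assert(fix != NULL);

	if (!(err & bit))
		return err;
	if (fix() == 0)
		err &= ~bit;
	return err;
}
```

With this shape, a caller can keep OR-ing the returned value into its accumulated error, as the `err |= ret;` line in the hunk above does, and a non-zero result after repair still signals unresolved problems.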
[PATCH 3/6] btrfs-progs: check: delete wrong items in lowmem repair
From: Su YueIntroduce delete_extent_tree_item() and repair_extent_item() to do delete only. while checking a extent tree, just delete wrong item. For extent item, free wrong backref. Otherwise, do delete. So the rest items in extent tree should be correct. Signed-off-by: Su Yue --- cmds-check.c | 151 ++- 1 file changed, 138 insertions(+), 13 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index 7c9036c..0f26394 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -11084,24 +11084,77 @@ out: } /* + * Only delete backref if REFERENCER_MISSING now + * + * Returns <0 the extent was deleted + * Returns >0 the backref was deleted but extent is still existed, + * returned value means err after repair + * Returns 0 nothing happened + */ +static int repair_extent_item(struct btrfs_trans_handle *trans, + struct btrfs_root *root, struct btrfs_path *path, + u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid, + u64 owner, u64 offset, int err) +{ + struct btrfs_key old_key; + int freed = 0; + int ret; + + btrfs_item_key_to_cpu(path->nodes[0], _key, path->slots[0]); + + if (err & (REFERENCER_MISSING | REFERENCER_MISMATCH)) { + /* delete the backref */ + ret = btrfs_free_extent(trans, root->fs_info->fs_root, bytenr, + num_bytes, parent, root_objectid, owner, offset); + if (!ret) { + freed = 1; + err &= ~REFERENCER_MISSING; + printf("Delete backref in extent [%llu %llu]\n", + bytenr, num_bytes); + } else { + error("fail to delete backref in extent [%llu %llu]\n", + bytenr, num_bytes); + } + } + + /* btrfs_free_extent may delete the extent */ + btrfs_release_path(path); + ret = btrfs_search_slot(NULL, root, _key, path, 0, 0); + + if (ret) + ret = -ENOENT; + else if (freed) + ret = err; + return ret; +} + +/* * This function will check a given extent item, including its backref and * itself (like crossing stripe boundary and type) * * Since we don't use extent_record anymore, introduce new error bit */ -static int check_extent_item(struct btrfs_fs_info *fs_info, -struct 
extent_buffer *eb, int slot) +static int check_extent_item(struct btrfs_trans_handle *trans, +struct btrfs_fs_info *fs_info, +struct btrfs_path *path) { struct btrfs_extent_item *ei; struct btrfs_extent_inline_ref *iref; struct btrfs_extent_data_ref *dref; + struct extent_buffer *eb = path->nodes[0]; unsigned long end; unsigned long ptr; + int slot = path->slots[0]; int type; u32 nodesize = btrfs_super_nodesize(fs_info->super_copy); u32 item_size = btrfs_item_size_nr(eb, slot); u64 flags; u64 offset; + u64 parent; + u64 num_bytes; + u64 root_objectid; + u64 owner; + u64 owner_offset; int metadata = 0; int level; struct btrfs_key key; @@ -11109,10 +11162,13 @@ static int check_extent_item(struct btrfs_fs_info *fs_info, int err = 0; btrfs_item_key_to_cpu(eb, , slot); - if (key.type == BTRFS_EXTENT_ITEM_KEY) + if (key.type == BTRFS_EXTENT_ITEM_KEY) { bytes_used += key.offset; - else + num_bytes = key.offset; + } else { bytes_used += nodesize; + num_bytes = nodesize; + } if (item_size < sizeof(*ei)) { /* @@ -11150,7 +11206,6 @@ static int check_extent_item(struct btrfs_fs_info *fs_info, level = key.offset; } end = (unsigned long)ei + item_size; - next: /* Reached extent item end normally */ if (ptr == end) @@ -11164,42 +11219,63 @@ next: goto out; } + parent = 0; + root_objectid = 0; + owner = 0; + owner_offset = 0; /* Now check every backref in this extent item */ iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_extent_inline_ref_type(eb, iref); offset = btrfs_extent_inline_ref_offset(eb, iref); switch (type) { case BTRFS_TREE_BLOCK_REF_KEY: + root_objectid = offset; + owner = level; ret = check_tree_block_backref(fs_info, offset, key.objectid, level); err |= ret; break; case BTRFS_SHARED_BLOCK_REF_KEY: + parent = offset; ret =
[PATCH 5/6] btrfs-progs: check: Introduce repair_tree_block_ref()
From: Su YueThe only thing repair_tree_block_ref() does is that adding backref of the tree_block. Just like what origin repair do: It first searches the correspond extent item then 1. If the extent item exists but backref is missing, add one backref to the extent. 2. Found nothing, just add an extent item and add one backref. Signed-off-by: Su Yue --- cmds-check.c | 147 +++ 1 file changed, 147 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index 726e330..deebc70 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -2323,6 +2323,150 @@ static void account_bytes(struct btrfs_root *root, struct btrfs_path *path, } } +/* + * This function only handles BACKREF_MISSING, + * If correspond extent item exists, increase the ref; + * else insert an extent item and backref. + * + * Returns error bits after repair. + */ +static int repair_tree_block_ref(struct btrfs_trans_handle *trans, +struct btrfs_root *root, +struct extent_buffer *node, +struct node_refs *nrefs, int level, int err) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_root *extent_root = fs_info->extent_root; + struct btrfs_path path; + struct btrfs_extent_item *ei; + struct btrfs_tree_block_info *bi; + struct btrfs_key key; + struct extent_buffer *eb; + u32 size = sizeof(*ei); + u32 node_size = root->fs_info->nodesize; + int insert_extent = 0; + int skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA); + int root_level = btrfs_header_level(root->node); + int generation; + int ret; + u64 owner; + u64 bytenr; + u64 flags = BTRFS_EXTENT_FLAG_TREE_BLOCK; + u64 parent = 0; + + if ((err & BACKREF_MISSING) == 0) + return err; + + WARN_ON(level > BTRFS_MAX_LEVEL); + WARN_ON(level < 0); + + btrfs_init_path(); + bytenr = btrfs_header_bytenr(node); + owner = btrfs_header_owner(node); + generation = btrfs_header_generation(node); + + key.objectid = bytenr; + key.type = (u8)-1; + key.offset = (u64)-1; + + /* Search for the extent item */ + ret = btrfs_search_slot(NULL, extent_root, , , 0, 0); + 
if (ret <= 0) { + ret = -EIO; + goto out; + } + + ret = btrfs_previous_extent_item(extent_root, , bytenr); + if (ret) + insert_extent = 1; + + /* calculate the extent item flag is full backref or not */ + if (nrefs->full_backref[level] != 0) + flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; + + /* insert an extent item */ + if (insert_extent) { + struct btrfs_disk_key copy_key; + + generation = btrfs_header_generation(node); + + if (level < root_level && nrefs->full_backref[level + 1] && + owner != root->objectid) { + flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; + } + + key.objectid = bytenr; + if (!skinny_metadata) { + key.type = BTRFS_EXTENT_ITEM_KEY; + key.offset = node_size; + size += sizeof(*bi); + } else { + key.type = BTRFS_METADATA_ITEM_KEY; + key.offset = level; + } + + btrfs_release_path(); + ret = btrfs_insert_empty_item(trans, extent_root, , , + size); + if (ret) + goto out; + + eb = path.nodes[0]; + ei = btrfs_item_ptr(eb, path.slots[0], + struct btrfs_extent_item); + + btrfs_set_extent_refs(eb, ei, 0); + btrfs_set_extent_generation(eb, ei, generation); + btrfs_set_extent_flags(eb, ei, flags); + + if (!skinny_metadata) { + bi = (struct btrfs_tree_block_info *)(ei + 1); + memset_extent_buffer(eb, 0, (unsigned long)bi, +sizeof(*bi)); + btrfs_set_disk_key_objectid(_key, + root->objectid); + btrfs_set_disk_key_type(_key, 0); + btrfs_set_disk_key_offset(_key, 0); + + btrfs_set_tree_block_level(eb, bi, level); + btrfs_set_tree_block_key(eb, bi, _key); + } + btrfs_mark_buffer_dirty(eb); + printf("Added a extent item [%llu %u]\n", bytenr, + node_size); + btrfs_update_block_group(trans, extent_root, bytenr, node_size, +
[PATCH 0/6] btrfs-progs: check: extent tree lowmem repair
From: Su Yue

This is part 2 of the lowmem repair patchsets:
1. Change the traversal in lowmem check to use walk_up_tree_v2() and
   walk_down_tree_v2(); it now scans all trees.
2. Repair cases: block group missing, tree block backref missing,
   extent item mismatch, extent data mismatch and data extent backref
   missing.

The method to repair the extent tree is similar to the original mode:
1. Delete all wrong extents (REFERENCER_MISSING or REFERENCER_MISMATCH).
2. Traverse all trees and extent data to rebuild the extent tree.

Some issues:
1. Because all trees are scanned, the check may be very slow.
2. Unlike the original mode, which gathers everything together and then
   checks, extent tree repair in lowmem mode may print some incorrect
   information after the check. However, the data on disk is fine and the
   next check should be OK.
3. After repair, some extent tree nodes may be reported like
   "Extent buffer leak ", but the next check is fine.
   (I don't know what's wrong)

The reason why lowmem check has to check all trees is listed in the
[patch 2/6] commit message.

Although I have tested the code with images whose tree level is 2 or 3 and
which have snapshots, using the option --init-extent-tree, I am still
worried about some corner cases.

Su Yue (6):
  btrfs-progs: check: enable repair in lowmem mode
  btrfs-progs: check: change traversal way of lowmem mode
  btrfs-progs: check: delete wrong items in lowmem repair
  btrfs-progs: check: Introduce repair_chunk_item()
  btrfs-progs: check: Introduce repair_tree_block_ref()
  btrfs-progs: check: Introduce repair_extent_data_item()

 cmds-check.c | 1262 +++---
 1 file changed, 943 insertions(+), 319 deletions(-)

-- 
2.7.4
[PATCH 6/6] btrfs-progs: check: Introduce repair_extent_data_item()
From: Su YueThe only thing repair_extent_data_item() does is that adding backref of the tree_block. Just like what origin repair do: It first searches the correspond extent item then 1. If the extent item exists but backref is missing, add one backref to the extent. 2. Found nothing, just add an extent item and add one backref. Signed-off-by: Su Yue --- cmds-check.c | 117 +++ 1 file changed, 117 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index deebc70..09c8d4d 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -10665,6 +10665,120 @@ out: } /* + * If @err contains BACKREF_MISSING then add extent of the + * file_extent_data_item. + * + * Returns error bits after reapir. + */ +static int repair_extent_data_item(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct btrfs_path *pathp, + struct node_refs *nrefs, + int err) +{ + struct btrfs_file_extent_item *fi; + struct btrfs_key fi_key; + struct btrfs_key key; + struct btrfs_extent_item *ei; + struct btrfs_path path; + struct btrfs_root *extent_root = root->fs_info->extent_root; + struct extent_buffer *eb; + u64 size; + u64 disk_bytenr; + u64 num_bytes; + u64 parent; + u64 offset; + u64 extent_offset; + u64 file_offset; + int generation; + int slot; + int ret = 0; + + eb = pathp->nodes[0]; + slot = pathp->slots[0]; + btrfs_item_key_to_cpu(eb, _key, slot); + fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); + + if (btrfs_file_extent_type(eb, fi) == BTRFS_FILE_EXTENT_INLINE || + btrfs_file_extent_disk_bytenr(eb, fi) == 0) + return err; + + file_offset = fi_key.offset; + generation = btrfs_file_extent_generation(eb, fi); + disk_bytenr = btrfs_file_extent_disk_bytenr(eb, fi); + num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi); + extent_offset = btrfs_file_extent_offset(eb, fi); + offset = file_offset - extent_offset; + + /* now repair only adds backref */ + if ((err & BACKREF_MISSING) == 0) + return err; + + /* search extent item */ + key.objectid = disk_bytenr; + key.type = 
BTRFS_EXTENT_ITEM_KEY; + key.offset = num_bytes; + + btrfs_init_path(); + ret = btrfs_search_slot(NULL, extent_root, , , 0, 0); + if (ret < 0) { + ret = -EIO; + goto out; + } + + /* insert an extent item */ + if (ret > 0) { + key.objectid = disk_bytenr; + key.type = BTRFS_EXTENT_ITEM_KEY; + key.offset = num_bytes; + size = sizeof(*ei); + + btrfs_release_path(); + ret = btrfs_insert_empty_item(trans, extent_root, , , + size); + if (ret) + goto out; + eb = path.nodes[0]; + ei = btrfs_item_ptr(eb, path.slots[0], + struct btrfs_extent_item); + + btrfs_set_extent_refs(eb, ei, 0); + btrfs_set_extent_generation(eb, ei, generation); + btrfs_set_extent_flags(eb, ei, BTRFS_EXTENT_FLAG_DATA); + + btrfs_mark_buffer_dirty(eb); + ret = btrfs_update_block_group(trans, extent_root, disk_bytenr, + num_bytes, 1, 0); + btrfs_release_path(); + } + + if (nrefs->full_backref[0]) + parent = btrfs_header_bytenr(eb); + else + parent = 0; + + ret = btrfs_inc_extent_ref(trans, root, disk_bytenr, num_bytes, parent, + root->objectid, + parent ? BTRFS_FIRST_FREE_OBJECTID : fi_key.objectid, + offset); + if (ret) { + error("failed to increase extent data backref[%llu %llu] root %llu", + disk_bytenr, num_bytes, root->objectid); + goto out; + } else { + printf("Add one extent data backref [%llu %llu]\n", + disk_bytenr, num_bytes); + } + + err &= ~BACKREF_MISSING; +out: + if (ret) + error("can't repair root %llu extent data item[%llu %llu]", + root->objectid, disk_bytenr, num_bytes); + return err; +} + +/* * Check EXTENT_DATA item, mainly for its dbackref in extent tree * * Return >0 any error found and output error message @@ -11905,6 +12019,9 @@ again: switch (type) { case BTRFS_EXTENT_DATA_KEY: ret = check_extent_data_item(root, path, nrefs, account_bytes); +
[PATCH 2/6] btrfs-progs: check: change traversal way of lowmem mode
From: Su Yue

This patch is a preparation for extent-tree repair in lowmem mode.
In lowmem mode, the tree blocks of the various trees are checked
recursively. During repair, however, adding or deleting item(s) may modify
upper nodes, which makes the repair complicated and dangerous.

Before this patch:
One problem of lowmem check is that it only checks the lowest node's
backref in check_tree_block_ref. This ensures that the checked tree blocks
are legal and, for the sake of speed, avoids traversing all trees.
However, it has one shortcoming: it cannot detect a backref mistake if an
extent has owner == offset but lacks other backref(s).

In check, correctness is more important than speed.
If errors cannot be detected, repair is impossible.

Change of the patch:
check_chunks_and_extents now has to check *ALL* trees, so lowmem check
will behave like the original mode.
Changing the traversal to be the same as for the fs tree, which calls
walk_down_tree_v2() and walk_up_tree_v2(), makes further repair easier.

Signed-off-by: Su Yue --- cmds-check.c | 695 +-- 1 file changed, 443 insertions(+), 252 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index 829f7c5..7c9036c 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -1878,10 +1878,15 @@ struct node_refs { u64 bytenr[BTRFS_MAX_LEVEL]; u64 refs[BTRFS_MAX_LEVEL]; int need_check[BTRFS_MAX_LEVEL]; + /* field for check all trees*/ + int checked[BTRFS_MAX_LEVEL]; + /* the correspond extent should mark as full backref or not */ + int full_backref[BTRFS_MAX_LEVEL]; }; static int update_nodes_refs(struct btrfs_root *root, u64 bytenr, -struct node_refs *nrefs, u64 level); +struct extent_buffer *eb, struct node_refs *nrefs, +u64 level, int check_all); static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path, unsigned int ext_ref); @@ -1943,7 +1948,7 @@ again: ret = update_nodes_refs(root, path->nodes[i]->start, - nrefs, i); + path->nodes[i], nrefs, i, 0); if (ret) goto out; @@ -2062,25 +2067,42 @@ static int need_check(struct btrfs_root *root, struct ulist *roots) return 1; } +static int calc_extent_flag_v2(struct btrfs_root *root, + struct extent_buffer *eb, + u64 *flags_ret); /* * for a tree node or leaf, we record its reference count, so later if we still * process this node or leaf, don't need to compute its reference count again. 
+ * + * @bytenr if @bytenr == (u64)-1, only update nrefs->full_backref[level] */ static int update_nodes_refs(struct btrfs_root *root, u64 bytenr, -struct node_refs *nrefs, u64 level) +struct extent_buffer *eb, struct node_refs *nrefs, +u64 level, int check_all) { - int check, ret; - u64 refs; struct ulist *roots; + u64 refs = 0; + u64 flags = 0; + int root_level = btrfs_header_level(root->node); + int check; + int ret; - if (nrefs->bytenr[level] != bytenr) { + if (nrefs->bytenr[level] == bytenr) + return 0; + + if (bytenr != (u64)-1) { + /* the return value of this function seems a mistake */ ret = btrfs_lookup_extent_info(NULL, root, bytenr, - level, 1, , NULL); - if (ret < 0) + level, 1, , ); + /* temporary fix */ + if (ret < 0 && !check_all) return ret; nrefs->bytenr[level] = bytenr; nrefs->refs[level] = refs; + nrefs->full_backref[level] = 0; + nrefs->checked[level] = 0; + if (refs > 1) { ret = btrfs_find_all_roots(NULL, root->fs_info, bytenr, 0, ); @@ -2091,13 +2113,56 @@ static int update_nodes_refs(struct btrfs_root *root, u64 bytenr, ulist_free(roots); nrefs->need_check[level] = check; } else { - nrefs->need_check[level] = 1; + if (!check_all) { + nrefs->need_check[level] = 1; + } else { + if (level == root_level) + nrefs->need_check[level] = 1; + else + /* +* the node refs may have not been updated +
RE: [PATCH] btrfs-progs: mkfs: Replace number with enum
> -----Original Message-----
> From: David Sterba [mailto:dste...@suse.cz]
> Sent: Tuesday, August 22, 2017 10:04 PM
> To: Gu, Jinxiang/顾 金香; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs-progs: mkfs: Replace number with enum
>
> On Mon, Aug 21, 2017 at 07:39:49PM +0200, David Sterba wrote:
> > > +/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
> > > +enum btrfs_mkfs_block {
> > > +	SUPER_BLOCK = 0,
> > > +	ROOT_TREE,
> > > +	EXTENT_TREE,
> > > +	CHUNK_TREE,
> > > +	DEV_TREE,
> > > +	FS_TREE,
> > > +	CSUM_TREE,
> > > +	BLOCK_COUNT
>
> BLOCK_COUNT is 7
>
> > > +};
> > > +
> > >  struct btrfs_mkfs_config {
> > >  	/* Label of the new filesystem */
> > >  	const char *label;
> > > @@ -43,7 +55,7 @@ struct btrfs_mkfs_config {
> > >  	/* Output fields, set during creation */
> > >
> > >  	/* Logical addresses of superblock [0] and other tree roots */
> > > -	u64 blocks[8];
> > > +	u64 blocks[BLOCK_COUNT];
>
> This replaces 8 with 7 then, so the fs_uuid gets overwritten, can be also
> caught by simply running 'make test-mkfs'.

I made this change because blocks[7] is never used.
I have run 'make test-mkfs' and got no error.
Why is it necessary to leave a u64 before fs_uuid?

> > >  	char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
> > >  	char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
> > >
> > > --
> > > 2.9.4
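To make the layout question in this exchange concrete, here is a small userspace sketch of the two struct layouts. It only shows where a write to blocks[7] would land if some code path still performed one; whether such a write exists is not established in this thread. All sk_ names are hypothetical, and SK_UUID_SIZE is an assumed stand-in for BTRFS_UUID_UNPARSED_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the enum from the quoted patch, with a prefix to mark it as a sketch. */
enum sk_mkfs_block {
	SK_SUPER_BLOCK = 0,
	SK_ROOT_TREE,
	SK_EXTENT_TREE,
	SK_CHUNK_TREE,
	SK_DEV_TREE,
	SK_FS_TREE,
	SK_CSUM_TREE,
	SK_BLOCK_COUNT	/* evaluates to 7 */
};

/* Assumption: stands in for BTRFS_UUID_UNPARSED_SIZE. */
#define SK_UUID_SIZE 37

/* Layout before the patch: eight slots ahead of fs_uuid. */
struct sk_config_old {
	uint64_t blocks[8];
	char fs_uuid[SK_UUID_SIZE];
};

/* Layout after the patch: seven slots ahead of fs_uuid. */
struct sk_config_new {
	uint64_t blocks[SK_BLOCK_COUNT];
	char fs_uuid[SK_UUID_SIZE];
};

/* Byte offset, from the start of blocks[], of the slot with this index. */
static size_t sk_block_offset(size_t index)
{
	assert(index < 8);	/* only indexes valid in the old layout */
	return index * sizeof(uint64_t);
}

/* Non-zero if a write to blocks[index] would fall into fs_uuid in the new layout. */
static int sk_index_overruns_new(size_t index)
{
	return sk_block_offset(index) >= offsetof(struct sk_config_new, fs_uuid);
}
```

In this sketch fs_uuid starts at byte 64 in the old layout and at byte 56 in the new one, so index 7 is the only index that is valid before the change and out of bounds after it.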
[PATCH v5 1/6] Btrfs: heuristic make use compression workspaces
Move heuristic to external file Implement compression workspaces support for heuristic resources Signed-off-by: Timofey Titovets--- fs/btrfs/Makefile | 2 +- fs/btrfs/compression.c | 18 + fs/btrfs/compression.h | 7 - fs/btrfs/heuristic.c | 70 ++ 4 files changed, 84 insertions(+), 13 deletions(-) create mode 100644 fs/btrfs/heuristic.c diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 128ce17a80b0..6fa8479dff43 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ - uuid-tree.o props.o hash.o free-space-tree.o + uuid-tree.o props.o hash.o free-space-tree.o heuristic.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 883ecc58fd0d..f0aaf27bcc95 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -704,6 +704,7 @@ static struct { static const struct btrfs_compress_op * const btrfs_compress_op[] = { _zlib_compress, _lzo_compress, + _heuristic, }; void __init btrfs_init_compress(void) @@ -1065,18 +1066,13 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start, */ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end) { - u64 index = start >> PAGE_SHIFT; - u64 end_index = end >> PAGE_SHIFT; - struct page *page; - int ret = 1; + int ret; + enum btrfs_compression_type type = BTRFS_HEURISTIC; + struct list_head *workspace = find_workspace(type); - while (index <= end_index) { - page = find_get_page(inode->i_mapping, index); - kmap(page); - kunmap(page); - put_page(page); - index++; - } + ret = btrfs_compress_op[type-1]->heuristic(workspace, inode, + start, end); + free_workspace(type, workspace); return ret; } diff --git 
a/fs/btrfs/compression.h b/fs/btrfs/compression.h index 3b1b0ac15fdc..10e9ffa6dfa4 100644 --- a/fs/btrfs/compression.h +++ b/fs/btrfs/compression.h @@ -99,7 +99,8 @@ enum btrfs_compression_type { BTRFS_COMPRESS_NONE = 0, BTRFS_COMPRESS_ZLIB = 1, BTRFS_COMPRESS_LZO = 2, - BTRFS_COMPRESS_TYPES = 2, + BTRFS_HEURISTIC = 3, + BTRFS_COMPRESS_TYPES = 3, }; struct btrfs_compress_op { @@ -123,10 +124,14 @@ struct btrfs_compress_op { struct page *dest_page, unsigned long start_byte, size_t srclen, size_t destlen); + + int (*heuristic)(struct list_head *workspace, +struct inode *inode, u64 start, u64 end); }; extern const struct btrfs_compress_op btrfs_zlib_compress; extern const struct btrfs_compress_op btrfs_lzo_compress; +extern const struct btrfs_compress_op btrfs_heuristic; int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end); diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c new file mode 100644 index ..96ae3e9334bc --- /dev/null +++ b/fs/btrfs/heuristic.c @@ -0,0 +1,70 @@ +/* + * Copyright (c) 2017 + * All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ */ + +#include +#include +#include +#include +#include +#include +#include "compression.h" + +struct workspace { + struct list_head list; +}; + +static void heuristic_free_workspace(struct list_head *ws) +{ + struct workspace *workspace = list_entry(ws, struct workspace, list); + kfree(workspace); +} + +static struct list_head *heuristic_alloc_workspace(void) +{ + struct workspace *workspace; + + workspace = kzalloc(sizeof(*workspace), GFP_KERNEL); + if (!workspace) + return ERR_PTR(-ENOMEM); + + INIT_LIST_HEAD(>list); + + return >list; +} + +static int heuristic(struct list_head *ws, struct inode *inode, +u64 start, u64 end) +{ + struct page *page; + u64 index, index_end; + u8 *input_data; + + index = start >> PAGE_SHIFT; + index_end = end >> PAGE_SHIFT; + + for (; index <= index_end; index++) { +
[PATCH v5 2/6] Btrfs: heuristic workspace add bucket and sample items
Heuristic workspace:
 - Add bucket for storing byte type counters
 - Add sample array for storing partial copy of
   input data range
 - Add counter for store current sample size to workspace

Signed-off-by: Timofey Titovets
---
 fs/btrfs/heuristic.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 96ae3e9334bc..2c2cadc9dfad 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -20,13 +20,36 @@
 #include
 #include "compression.h"
 
+#define READ_SIZE 16
+#define ITER_SHIFT 256
+#define BUCKET_SIZE 256
+
+/*
+ * While mapping 128KiB range into pages (with 4k PAGE_SIZE as ex),
+ * and iterate that with index <= index_end
+ * code get 0-32 items, that a 33 pages
+ */
+#define MAX_INPUT_PAGES ((BTRFS_MAX_UNCOMPRESSED >> PAGE_SHIFT)+1)
+#define MAX_SAMPLE_SIZE (MAX_INPUT_PAGES*PAGE_SIZE*READ_SIZE/ITER_SHIFT)
+
+struct bucket_item {
+	u32 count;
+};
+
 struct workspace {
+	u8  *sample;
+	/* Partial copy of input data */
+	u32 sample_size;
+	/* Bucket store counter for each byte type */
+	struct bucket_item bucket[BUCKET_SIZE];
 	struct list_head list;
 };
 
 static void heuristic_free_workspace(struct list_head *ws)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
+
+	kvfree(workspace->sample);
 	kfree(workspace);
 }
 
@@ -38,9 +61,16 @@ static struct list_head *heuristic_alloc_workspace(void)
 	if (!workspace)
 		return ERR_PTR(-ENOMEM);
 
+	workspace->sample = kvmalloc(MAX_SAMPLE_SIZE, GFP_KERNEL);
+	if (!workspace->sample)
+		goto fail;
+
 	INIT_LIST_HEAD(&workspace->list);
 
 	return &workspace->list;
+fail:
+	heuristic_free_workspace(&workspace->list);
+	return ERR_PTR(-ENOMEM);
 }
 
 static int heuristic(struct list_head *ws, struct inode *inode,
-- 
2.14.1
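The sizing macros in this patch can be checked with a little arithmetic. The sketch below redoes the calculation in userspace, assuming a 128KiB maximum uncompressed range (as the patch comment states) and taking the page shift as a parameter. The sk_ names are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Assumption for the sketch: the 128KiB range mentioned in the patch comment. */
#define SK_MAX_UNCOMPRESSED	(128 * 1024)

/* Same values as in the patch. */
#define READ_SIZE	16
#define ITER_SHIFT	256

/*
 * Number of pages touched when iterating with index <= index_end:
 * one more than the range divided by the page size.
 */
static unsigned int sk_max_input_pages(unsigned int page_shift)
{
	assert(page_shift <= 17);	/* page no larger than the range itself */
	return (SK_MAX_UNCOMPRESSED >> page_shift) + 1;
}

/*
 * Upper bound for the sample buffer: READ_SIZE bytes are kept out of
 * every ITER_SHIFT bytes of input, over all pages.
 */
static unsigned int sk_max_sample_size(unsigned int page_shift)
{
	unsigned int page_size = 1u << page_shift;

	return sk_max_input_pages(page_shift) * page_size * READ_SIZE / ITER_SHIFT;
}
```

With 4KiB pages (page_shift 12) this gives 33 pages and a sample buffer of 8448 bytes, i.e. 256 sampled bytes per page.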
[PATCH v5 3/6] Btrfs: implement heuristic sampling logic
Copy sample data from input data range to sample buffer
then calculate byte type count for that sample into bucket.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/heuristic.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 2c2cadc9dfad..5336638a3b7c 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -76,20 +76,47 @@ static struct list_head *heuristic_alloc_workspace(void)
 static int heuristic(struct list_head *ws, struct inode *inode,
 		     u64 start, u64 end)
 {
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
 	struct page *page;
 	u64 index, index_end;
-	u8 *input_data;
+	u8 *in_data;
+	u32 a, b;
+	u8 byte;
+
+	/*
+	 * Compression only handle first 128kb of input range
+	 * And just shift over range in loop for compressing it.
+	 * Let's do the same.
+	 */
+	if (end - start > BTRFS_MAX_UNCOMPRESSED)
+		end = start + BTRFS_MAX_UNCOMPRESSED;
 
 	index = start >> PAGE_SHIFT;
 	index_end = end >> PAGE_SHIFT;
 
+	b = 0;
 	for (; index <= index_end; index++) {
 		page = find_get_page(inode->i_mapping, index);
-		input_data = kmap(page);
+		in_data = kmap(page);
+		a = 0;
+		while (a < PAGE_SIZE-READ_SIZE && b < MAX_SAMPLE_SIZE) {
+			memcpy(&workspace->sample[b], &in_data[a], READ_SIZE);
+			a += ITER_SHIFT;
+			b += READ_SIZE;
+		}
 		kunmap(page);
 		put_page(page);
 	}
 
+	workspace->sample_size = b;
+
+	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
+
+	for (a = 0; a < workspace->sample_size; a++) {
+		byte = workspace->sample[a];
+		workspace->bucket[byte].count++;
+	}
+
 	return 1;
 }
 
-- 
2.14.1
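The sampling step can be tried outside the kernel. The following is a rough userspace sketch, not the patch itself: it works on one flat buffer instead of mapped pages, uses slightly different bounds checks (`<=` on both limits), and the sk_ function names are invented for the example. READ_SIZE, ITER_SHIFT and BUCKET_SIZE keep the values from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define READ_SIZE	16
#define ITER_SHIFT	256
#define BUCKET_SIZE	256

/*
 * Copy READ_SIZE bytes out of every ITER_SHIFT bytes of in[] into sample[].
 * Stops when either the input or the sample buffer is exhausted.
 * Returns the number of bytes stored in sample[].
 */
static size_t sk_take_sample(const uint8_t *in, size_t in_len,
			     uint8_t *sample, size_t sample_cap)
{
	size_t a = 0;	/* read position in the input */
	size_t b = 0;	/* write position in the sample */

	assert(in != NULL && sample != NULL);

	while (a + READ_SIZE <= in_len && b + READ_SIZE <= sample_cap) {
		memcpy(&sample[b], &in[a], READ_SIZE);
		a += ITER_SHIFT;
		b += READ_SIZE;
	}
	return b;
}

/* Count how often each byte value occurs in the sample. */
static void sk_count_bytes(const uint8_t *sample, size_t n,
			   uint32_t bucket[BUCKET_SIZE])
{
	size_t i;

	memset(bucket, 0, sizeof(uint32_t) * BUCKET_SIZE);
	for (i = 0; i < n; i++)
		bucket[sample[i]]++;
}
```

For a 4096-byte input this keeps 16 strides of 16 bytes, which matches the 256 sampled bytes per 4KiB page implied by MAX_SAMPLE_SIZE in the previous patch.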
[PATCH v5 4/6] Btrfs: heuristic add detection of zeroed sample
Use memcmp to check whether the sample data is all zeroes.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/heuristic.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 5336638a3b7c..4557ea1db373 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -73,6 +73,21 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }
 
+static bool sample_zeroed(struct workspace *workspace)
+{
+	u32 i;
+	u8 zero[READ_SIZE];
+
+	memset(&zero, 0, sizeof(zero));
+
+	for (i = 0; i < workspace->sample_size; i += sizeof(zero)) {
+		if (memcmp(&workspace->sample[i], &zero, sizeof(zero)))
+			return false;
+	}
+
+	return true;
+}
+
 static int heuristic(struct list_head *ws, struct inode *inode,
 		     u64 start, u64 end)
 {
@@ -110,6 +125,9 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 
 	workspace->sample_size = b;
 
+	if (sample_zeroed(workspace))
+		return 1;
+
 	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
 
 	for (a = 0; a < workspace->sample_size; a++) {
-- 
2.14.1
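For experimenting with the zero check in userspace, a minimal sketch might look like the one below. It is an illustration rather than the kernel code: the sk_ name is invented, the zero block is a static zero-initialized array instead of a memset buffer, and the last chunk is clamped so that a sample length that is not a multiple of READ_SIZE is still handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define READ_SIZE 16

/*
 * Return true if all n bytes of sample[] are zero.
 * The sample is compared against a zero block READ_SIZE bytes at a time.
 */
static bool sk_sample_zeroed(const uint8_t *sample, size_t n)
{
	static const uint8_t zero[READ_SIZE];	/* static storage is zero-filled */
	size_t i;

	assert(sample != NULL);

	for (i = 0; i < n; i += READ_SIZE) {
		/* Clamp the final chunk so we never read past the sample. */
		size_t chunk = (n - i < READ_SIZE) ? n - i : READ_SIZE;

		if (memcmp(&sample[i], zero, chunk) != 0)
			return false;
	}
	return true;
}
```

An empty sample counts as zeroed here, which is a choice made for the sketch only.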
[PATCH v5 5/6] Btrfs: heuristic add byte set calculation
Calculate byte set size for the data sample:
Calculate how many unique byte values occur in the sample
by counting all bytes in the bucket with count > 0.
If the byte set is low (~25%), data are easily compressible.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/heuristic.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 4557ea1db373..953428fde305 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -31,6 +31,7 @@
  */
 #define MAX_INPUT_PAGES ((BTRFS_MAX_UNCOMPRESSED >> PAGE_SHIFT)+1)
 #define MAX_SAMPLE_SIZE (MAX_INPUT_PAGES*PAGE_SIZE*READ_SIZE/ITER_SHIFT)
+#define BYTE_SET_THRESHOLD 64
 
 struct bucket_item {
 	u32 count;
@@ -73,6 +74,27 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }
 
+static int byte_set_size(const struct workspace *workspace)
+{
+	int a = 0;
+	int byte_set_size = 0;
+
+	for (; a < BYTE_SET_THRESHOLD; a++) {
+		if (workspace->bucket[a].count > 0)
+			byte_set_size++;
+	}
+
+	for (; a < BUCKET_SIZE; a++) {
+		if (workspace->bucket[a].count > 0) {
+			byte_set_size++;
+			if (byte_set_size > BYTE_SET_THRESHOLD)
+				return byte_set_size;
+		}
+	}
+
+	return byte_set_size;
+}
+
 static bool sample_zeroed(struct workspace *workspace)
 {
 	u32 i;
@@ -135,6 +157,10 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 		workspace->bucket[byte].count++;
 	}
 
+	a = byte_set_size(workspace);
+	if (a > BYTE_SET_THRESHOLD)
+		return 2;
+
 	return 1;
 }
 
-- 
2.14.1
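The byte set calculation is easy to exercise in isolation. Below is a userspace sketch under a few assumptions: the bucket is a plain array of counters rather than struct bucket_item, and the sk_ name is made up. The two-loop structure and the threshold of 64 follow the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKET_SIZE		256
#define BYTE_SET_THRESHOLD	64

/*
 * Count how many distinct byte values occur in the sample, given the
 * per-byte counters in bucket[].
 *
 * The first BYTE_SET_THRESHOLD entries can never push the result over
 * the threshold, so they are counted without checking. After that the
 * function returns as soon as the threshold is exceeded, which means the
 * result is exact only while it is <= BYTE_SET_THRESHOLD.
 */
static int sk_byte_set_size(const uint32_t bucket[BUCKET_SIZE])
{
	int a = 0;
	int set_size = 0;

	assert(bucket != NULL);

	for (; a < BYTE_SET_THRESHOLD; a++) {
		if (bucket[a] > 0)
			set_size++;
	}

	for (; a < BUCKET_SIZE; a++) {
		if (bucket[a] > 0) {
			set_size++;
			if (set_size > BYTE_SET_THRESHOLD)
				return set_size;
		}
	}

	return set_size;
}
```

A sample that uses only ten distinct byte values yields 10, while a sample in which every byte value occurs returns 65, because the function stops at the first value past the threshold.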
[PATCH v5 6/6] Btrfs: heuristic add byte core set calculation
Calculate the byte core set for the data sample:
sort the bucket's numbers in decreasing order and
count how many numbers use 90% of the sample.
If the core set is small (<=25%), the data is easily compressible.
If the core set is large (>=80%), the data is not compressible.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/heuristic.c | 51 ++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 953428fde305..14128f77d5ae 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include "compression.h"
 
 #define READ_SIZE 16
@@ -32,6 +33,8 @@
 #define MAX_INPUT_PAGES ((BTRFS_MAX_UNCOMPRESSED >> PAGE_SHIFT)+1)
 #define MAX_SAMPLE_SIZE (MAX_INPUT_PAGES*PAGE_SIZE*READ_SIZE/ITER_SHIFT)
 #define BYTE_SET_THRESHOLD 64
+#define BYTE_CORE_SET_LOW BYTE_SET_THRESHOLD
+#define BYTE_CORE_SET_HIGH 200 // ~80%
 
 struct bucket_item {
 	u32 count;
@@ -74,6 +77,45 @@ static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }
 
+/* For bucket sorting */
+static inline int bucket_compare(const void *lv, const void *rv)
+{
+	struct bucket_item *l = (struct bucket_item *)(lv);
+	struct bucket_item *r = (struct bucket_item *)(rv);
+
+	return r->count - l->count;
+}
+
+/*
+ * Byte Core set size
+ * How many bytes use 90% of sample
+ */
+static int byte_core_set_size(struct workspace *workspace)
+{
+	int a = 0;
+	u32 coreset_sum = 0;
+	struct bucket_item *bucket = workspace->bucket;
+	u32 core_set_threshold = workspace->sample_size*90/100;
+
+	/* Sort in reverse order */
+	sort(bucket, BUCKET_SIZE, sizeof(*bucket),
+	     &bucket_compare, NULL);
+
+	for (; a < BYTE_CORE_SET_LOW; a++)
+		coreset_sum += bucket[a].count;
+
+	if (coreset_sum > core_set_threshold)
+		return a;
+
+	for (; a < BYTE_CORE_SET_HIGH && bucket[a].count > 0; a++) {
+		coreset_sum += bucket[a].count;
+		if (coreset_sum > core_set_threshold)
+			break;
+	}
+
+	return a;
+}
+
 static int byte_set_size(const struct workspace *workspace)
 {
 	int a = 0;
@@ -161,7 +203,14 @@ static int heuristic(struct list_head *ws, struct inode *inode,
 	if (a > BYTE_SET_THRESHOLD)
 		return 2;
 
-	return 1;
+	a = byte_core_set_size(workspace);
+	if (a <= BYTE_CORE_SET_LOW)
+		return 3;
+
+	if (a >= BYTE_CORE_SET_HIGH)
+		return 0;
+
+	return 4;
 }
 
 const struct btrfs_compress_op btrfs_heuristic = {
-- 
2.14.1
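A userspace sketch of the core set calculation, for experimenting with the 90% rule. It assumes a 256-entry bucket and uses qsort() in place of the kernel's sort(); `cmp_desc` and `core_set_size` are illustrative names. It is not the patch's exact return convention: this version returns the number of byte values needed to exceed 90% of the sample and does not apply the BYTE_CORE_SET_LOW/HIGH shortcuts. The comparator avoids subtraction, so large unsigned counts cannot wrap when the result is converted to int.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUCKET_SIZE 256 /* assumption: one counter per byte value */

/* qsort comparator: descending order, without subtracting the counts. */
static int cmp_desc(const void *lv, const void *rv)
{
	uint32_t l = *(const uint32_t *)lv;
	uint32_t r = *(const uint32_t *)rv;

	return (l < r) - (l > r);
}

/*
 * Hypothetical helper: how many of the most frequent byte values are
 * needed to cover more than 90% of the sample.
 */
static int core_set_size(const uint8_t *sample, size_t len)
{
	uint32_t count[BUCKET_SIZE] = {0};
	uint64_t threshold = (uint64_t)len * 90 / 100;
	uint64_t sum = 0;
	size_t i;
	int a;

	for (i = 0; i < len; i++)
		count[sample[i]]++;

	qsort(count, BUCKET_SIZE, sizeof(count[0]), cmp_desc);

	for (a = 0; a < BUCKET_SIZE && count[a] > 0; a++) {
		sum += count[a];
		if (sum > threshold)
			return a + 1;
	}

	return a;
}
```

A heavily skewed sample (95 bytes of one value, 5 of another) gives a core set of 1, while a sample with every byte value occurring exactly once needs 231 values to pass the 90% mark.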
[PATCH v5 0/6] Btrfs: populate heuristic with code
Based on kdave for-next

Patches in short:
1. Move heuristic to use compression workspaces
   A bit tricky, but it works.
2. Add heuristic counters and buffer to workspaces
3. Implement simple input data sampling
   It takes 16-byte samples at 256-byte shifts over the input data
   and collects info about how many different bytes (symbols)
   were found in the sample data.
4. Implement check of the sample for zeroes
   Just checks whether all bytes in the sample are 0.
5. Add code to calculate how many unique bytes were found
   in the sample data
   That can quickly detect easily compressible data.
6. Add code to calculate the byte core set size,
   i.e. how many unique bytes use 90% of the sample data
   That code requires the numbers in the bucket to be sorted.
   It can detect easily compressible data with many repeated bytes.
   It can detect incompressible data with evenly distributed bytes.

Changes v1 -> v2:
  - Change input data iterator shift 512 -> 256
  - Replace magic macro numbers with direct values
  - Drop useless symbol population in bucket,
    as no one cares where and what symbol is stored
    in the bucket for now

Changes v2 -> v3 (only update #3 patch):
  - Fix u64 division problem by using u32 for input_size
  - Fix input size calculation start - end -> end - start
  - Add missing sort.h header

Changes v3 -> v4 (only update #1 patch):
  - Change counter type in bucket item u16 -> u32
  - Drop other fields from bucket item for now,
    no one uses them

Changes v4 -> v5:
  - Move heuristic code to external file
  - Make heuristic use compression workspaces
  - Add check of the sample for zeroes

Timofey Titovets (6):
  Btrfs: heuristic make use compression workspaces
  Btrfs: heuristic workspace add bucket and sample items
  Btrfs: Implement heuristic sampling logic
  Btrfs: heuristic add detection of zeroed sample
  Btrfs: heuristic add byte set calculation
  Btrfs: heuristic add byte core set calculation

 fs/btrfs/Makefile      |   2 +-
 fs/btrfs/compression.c |  18 ++--
 fs/btrfs/compression.h |   7 +-
 fs/btrfs/heuristic.c   | 220 +
 4 files changed, 234 insertions(+), 13 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c

-- 
2.14.1
Re: Btrfs Raid5 issue.
On 2017-08-23 00:37, Robert LeBlanc wrote:
> Thanks for the explanations. Chris, I don't think 'degraded' did
> anything to help the mounting, I just passed it in to see if it would
> help (I'm not sure if btrfs is "smart" enough to ignore a drive if it
> would increase the chance of mounting the volume even if it is
> degraded, but one could hope). I believe the key was 'nologreplay'.
>
> Here is some info about the corrupted fs:
>
> # btrfs fi show /tmp/root/
> Label: 'kvm-btrfs'  uuid: fef29f0a-dc4c-4cc4-b524-914e6630803c
>         Total devices 3 FS bytes used 3.30TiB
>         devid    1 size 2.73TiB used 2.09TiB path /dev/bcache32
>         devid    2 size 2.73TiB used 2.09TiB path /dev/bcache0
>         devid    3 size 2.73TiB used 2.09TiB path /dev/bcache16
>
> # btrfs fi usage /tmp/root/
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
> Overall:
>     Device size:                   8.18TiB
>     Device allocated:                0.00B
>     Device unallocated:            8.18TiB
>     Device missing:                  0.00B
>     Used:                            0.00B
>     Free (estimated):                0.00B      (min: 8.00EiB)
>     Data ratio:                       0.00
>     Metadata ratio:                   0.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,RAID5: Size:4.15TiB, Used:3.28TiB
>    /dev/bcache0    2.08TiB
>    /dev/bcache16   2.08TiB
>    /dev/bcache32   2.08TiB
>
> Metadata,RAID5: Size:22.00GiB, Used:20.69GiB
>    /dev/bcache0   11.00GiB
>    /dev/bcache16  11.00GiB
>    /dev/bcache32  11.00GiB
>
> System,RAID5: Size:64.00MiB, Used:400.00KiB
>    /dev/bcache0   32.00MiB
>    /dev/bcache16  32.00MiB
>    /dev/bcache32  32.00MiB
>
> Unallocated:
>    /dev/bcache0  655.00GiB
>    /dev/bcache16 655.00GiB
>    /dev/bcache32 656.49GiB
>
> So it looks like I set the metadata and system data to RAID5 and not
> RAID1. I guess that it could have been affected by the write hole
> causing the problem I was seeing.
>
> Since I get the same space usage with RAID1 and RAID5,

Well, RAID1 has larger space usage than 3-disk RAID5.

Space efficiency will be 50% for RAID1 while 66% for 3-disk RAID5.
So you may lose some available space.

> I think I'm just
> going to use RAID1. I don't need stripe performance or anything like
> that.

And RAID5/6 won't always improve performance.
Especially when the IO blocksize is smaller than the full stripe size (in
your case it's 128K).

When doing sequential IO with a blocksize smaller than 128K, there will be
an obvious performance drop due to the RMW cycle.
This is not limited to Btrfs RAID56 but applies to all RAID56.

> It would be nice if btrfs supported hotplug and re-plug a little
> better so that it is more "production" quality, but I just have to be
> patient. I'm familiar with Gluster and contributed code to Ceph, so
> I'm familiar with those types of distributed systems. I really like
> them, but the complexity is quite overkill for my needs at home.
>
> As far as bcache performance:
> I have two Crucial MX200 250GB drives that were md raid1 containing
> /boot (ext2), swap and then bcache. I have 2 WD Reds and a Seagate
> Barracuda Desktop drive all 3TB. With bcache in writeback, apt-get
> would be painfully slow. Running iostat, the SSDs would be doing a few
> hundred IOPs and the backing disks would be very busy and would be the
> limiting factor overall. Even though apt-get just downloaded the file
> (should be on the SSDs because of writeback), it still involved the
> backend disks way too much. The amount of dirty data was always less
> than 10% so there should have been plenty of space to free up cache
> without having to flush. I experimented with changing the size of
> contiguous IO to force more to cache, increasing the dirty ratio, etc,
> nothing seemed to provide the performance I was hoping. To be fair
> having a pair of SSDs (md raid1) caching three spindles (btrfs raid5)
> may not be an ideal configuration. If I had three SSDs, one for each
> drive, then it may have performed better?? I have also ~980 snapshots
> spread over a year's time, so I don't know how much that impacts
> things. I did use a btrfs utility to help find duplicate files/chunks
> and dedupe them so that updated system binaries between upgraded LXC
> containers would use the same space on disk and be more efficient in
> bcache cache usage.

Well, RAID1 ssd, offline dedupe, bcache, many snapshots, way more complex
than I thought.

So I'm uncertain where the bottleneck is.

> After restoring the root and LXC roots snapshots on the SSD (broke the
> md raid1 so I could restore to one of them), I ran apt-get and got
> upwards to 2,400 IOPs with it being sustained around 1,200 IOPs (btrfs
> single on md raid1 degraded). I know that btrfs has some performance
> challenges, but I don't think I was hitting those. It was most likely a
> very unusual set-up of bcache and btrfs raid that caused the problem.
> I have bcache on 10 year old desktop box with a
user snapshots
On Tue 2017-08-22 (19:36), Peter Grandi wrote:

> For somewhat good reasons subvolumes including snapshots cannot be
> deleted by users though unless mount option 'user_subvol_rm_allowed' is
> used.

Also in https://btrfs.wiki.kernel.org/index.php/Mount_options

"user_subvol_rm_allowed (...) Use with caution."

Why? What is the problem?

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<22940.31139.194399.982...@tree.ty.sabi.co.uk>
Re: netapp-alike snapshots?
On Tue 2017-08-22 (19:36), Peter Grandi wrote:

> Indeed and there is a fair description of some options for
> subvolume nesting policies here which may be interesting to the
> original poster:
>
>   https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Layout
>
> It is unsurprising to me that there are tradeoffs involved in
> every choice. I find the "Flat" layout particularly desirable.

My layout is already nearly "flat". It seems my decision was right :-)

> Btrfs snapshots can only be done for a whole subvolume.

I know this.

> Subvolumes and snapshots can be created by users, but too many snapshots
> (see below) can cause trouble. For somewhat good reasons subvolumes
> including snapshots cannot be deleted by users though unless mount option
> 'user_subvol_rm_allowed' is used.

Ooops, this is new to me!

framstag@fex:~: btrfs subvolume create xx
Create subvolume './xx'
framstag@fex:~: btrfs subvolume delete xx
Delete subvolume '/local/home/framstag/xx'
ERROR: cannot delete '/local/home/framstag/xx' - Operation not permitted

This means that root has to remove the subvolume.
Is it possible to disallow creation of subvolumes for normal users?

> >>> Because Netapp do it this way - for at least 20 years and we
> >>> have a multi-PB Netapp storage environment. No chance to change
> >>> this.
>
> Send patches :-).

For waffle or btrfs? :-)

> Assumptions that all Btrfs features such as snapshots are
> infinitely scalable at no cost may be optimistic:
>
>   https://btrfs.wiki.kernel.org/index.php/Gotchas#Having_many_subvolumes_can_be_very_slow

"when you do device removes on file systems with a lot of snapshots, it is
unbelievably slow ... took nearly a week to move 20GB of FS data from one
device to the other using that method"

"a balance on 2TB of data that was heavily snapshotted - it took 3 months"

ARGH!!
Thanks for this warning! I will rethink my multi-snapshots plan!

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<22940.31139.194399.982...@tree.ty.sabi.co.uk>
[josef-btrfs:slab-priority 4/6] fs//ntfs/attrib.c:2549:35: error: implicit declaration of function 'inode_to_bdi'
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git slab-priority
head:   a1be3b41415243d20c90e9e92e82808fe1ff91a0
commit: fe049b0156a10dd0bb3fbf3d4dad3ca943874f10 [4/6] remove mapping from balance_dirty_pages*()
config: i386-randconfig-a1-201734 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
        git checkout fe049b0156a10dd0bb3fbf3d4dad3ca943874f10
        # save the attached .config to linux build tree
        make ARCH=i386

All error/warnings (new ones prefixed by >>):

   fs//ntfs/attrib.c: In function 'ntfs_attr_set':
>> fs//ntfs/attrib.c:2549:35: error: implicit declaration of function 'inode_to_bdi' [-Werror=implicit-function-declaration]
      balance_dirty_pages_ratelimited(inode_to_bdi(inode),
                                      ^
>> fs//ntfs/attrib.c:2549:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion]
   In file included from include/linux/memcontrol.h:31:0,
                    from include/linux/swap.h:8,
                    from fs//ntfs/attrib.c:26:
   include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' but argument is of type 'int'
    void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
         ^
   fs//ntfs/attrib.c:2591:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion]
      balance_dirty_pages_ratelimited(inode_to_bdi(inode),
                                      ^
   In file included from include/linux/memcontrol.h:31:0,
                    from include/linux/swap.h:8,
                    from fs//ntfs/attrib.c:26:
   include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' but argument is of type 'int'
    void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
         ^
   fs//ntfs/attrib.c:2609:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion]
      balance_dirty_pages_ratelimited(inode_to_bdi(inode),
                                      ^
   In file included from include/linux/memcontrol.h:31:0,
                    from include/linux/swap.h:8,
                    from fs//ntfs/attrib.c:26:
   include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' but argument is of type 'int'
    void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
         ^
   cc1: some warnings being treated as errors

vim +/inode_to_bdi +2549 fs//ntfs/attrib.c

  2472
  2473	/**
  2474	 * ntfs_attr_set - fill (a part of) an attribute with a byte
  2475	 * @ni:		ntfs inode describing the attribute to fill
  2476	 * @ofs:	offset inside the attribute at which to start to fill
  2477	 * @cnt:	number of bytes to fill
  2478	 * @val:	the unsigned 8-bit value with which to fill the attribute
  2479	 *
  2480	 * Fill @cnt bytes of the attribute described by the ntfs inode @ni starting at
  2481	 * byte offset @ofs inside the attribute with the constant byte @val.
  2482	 *
  2483	 * This function is effectively like memset() applied to an ntfs attribute.
  2484	 * Note thie function actually only operates on the page cache pages belonging
  2485	 * to the ntfs attribute and it marks them dirty after doing the memset().
  2486	 * Thus it relies on the vm dirty page write code paths to cause the modified
  2487	 * pages to be written to the mft record/disk.
  2488	 *
  2489	 * Return 0 on success and -errno on error.  An error code of -ESPIPE means
  2490	 * that @ofs + @cnt were outside the end of the attribute and no write was
  2491	 * performed.
  2492	 */
  2493	int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
  2494	{
  2495		ntfs_volume *vol = ni->vol;
  2496		struct inode *inode = VFS_I(ni);
  2497		struct address_space *mapping;
  2498		struct page *page;
  2499		u8 *kaddr;
  2500		pgoff_t idx, end;
  2501		unsigned start_ofs, end_ofs, size;
  2502
  2503		ntfs_debug("Entering for ofs 0x%llx, cnt 0x%llx, val 0x%hx.",
  2504				(long long)ofs, (long long)cnt, val);
  2505		BUG_ON(ofs < 0);
  2506		BUG_ON(cnt < 0);
  2507		if (!cnt)
  2508			goto done;
  2509		/*
  2510		 * FIXME: Compressed and encrypted attributes are not supported when
  2511		 * writing and we should never have gotten here for them.
  2512		 */
  2513		BUG_ON(NInoCompressed(ni));
  2514		BUG_ON(NInoEncrypted(ni));
  2515		mapping = VFS_I(ni)->i_mapping;
  2516		/* Work out the starting index and page offset. */
  2517		idx = ofs >> PAGE_SHIFT;
  2518		start_ofs = ofs & ~PAGE_MASK;
  2519		/* Work out the ending
[PATCH][v2] btrfs: change how we decide to commit transactions during flushing
From: Josef Bacik

Nikolay reported that generic/273 was failing currently with ENOSPC.
Turns out this is because we get to the point where the outstanding
reservations are greater than the pinned space on the fs. This is a
mistake, previously we used the current reservation amount in
may_commit_transaction, not the entire outstanding reservation amount.
Fix this to find the minimum byte size needed to make progress in
flushing, and pass that into may_commit_transaction. From there we can
make a smarter decision on whether to commit the transaction or not.
This fixes the failure in generic/273.

Reported-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
v1->v2:
- check the ticket bytes in may_commit_transaction instead of copying bytes
  around.
- clean up may_commit_transaction to remove unused arguments

 fs/btrfs/extent-tree.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5d59dd..1464678 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4836,6 +4836,13 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
 	}
 }
 
+struct reserve_ticket {
+	u64 bytes;
+	int error;
+	struct list_head list;
+	wait_queue_head_t wait;
+};
+
 /**
  * maybe_commit_transaction - possibly commit the transaction if its ok to
  * @root - the root we're allocating for
@@ -4847,18 +4854,29 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
  * will return -ENOSPC.
  */
 static int may_commit_transaction(struct btrfs_fs_info *fs_info,
-				  struct btrfs_space_info *space_info,
-				  u64 bytes, int force)
+				  struct btrfs_space_info *space_info)
 {
+	struct reserve_ticket *ticket = NULL;
 	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
 	struct btrfs_trans_handle *trans;
+	u64 bytes;
 
 	trans = (struct btrfs_trans_handle *)current->journal_info;
 	if (trans)
 		return -EAGAIN;
 
-	if (force)
-		goto commit;
+	spin_lock(&space_info->lock);
+	if (!list_empty(&space_info->priority_tickets))
+		ticket = list_first_entry(&space_info->priority_tickets,
+					  struct reserve_ticket, list);
+	else if (!list_empty(&space_info->tickets))
+		ticket = list_first_entry(&space_info->tickets,
+					  struct reserve_ticket, list);
+	bytes = (ticket) ? ticket->bytes : 0;
+	spin_unlock(&space_info->lock);
+
+	if (!bytes)
+		return 0;
 
 	/* See if there is enough pinned space to make this reservation */
 	if (percpu_counter_compare(&space_info->total_bytes_pinned,
@@ -4873,8 +4891,12 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 		return -ENOSPC;
 
 	spin_lock(&delayed_rsv->lock);
+	if (delayed_rsv->size > bytes)
+		bytes = 0;
+	else
+		bytes -= delayed_rsv->size;
 	if (percpu_counter_compare(&space_info->total_bytes_pinned,
-				   bytes - delayed_rsv->size) < 0) {
+				   bytes) < 0) {
 		spin_unlock(&delayed_rsv->lock);
 		return -ENOSPC;
 	}
@@ -4888,13 +4910,6 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	return btrfs_commit_transaction(trans);
 }
 
-struct reserve_ticket {
-	u64 bytes;
-	int error;
-	struct list_head list;
-	wait_queue_head_t wait;
-};
-
 /*
  * Try to flush some data based on policy set by @state.  This is only advisory
  * and may fail for various reasons.  The caller is supposed to examine the
@@ -4944,8 +4959,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		ret = 0;
 		break;
 	case COMMIT_TRANS:
-		ret = may_commit_transaction(fs_info, space_info,
-					     num_bytes, 0);
+		ret = may_commit_transaction(fs_info, space_info);
 		break;
 	default:
 		ret = -ENOSPC;
-- 
2.7.4
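The third hunk replaces a plain `bytes - delayed_rsv->size` with a subtraction clamped at zero. A minimal userspace sketch of just that arithmetic follows; `bytes_still_needed` and its parameter names are illustrative, not kernel symbols. Both operands are unsigned 64-bit, so clamping keeps the value passed on to the comparison from wrapping when the delayed reserve is larger than the ticket.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the clamp in the patch: how many bytes
 * of the first waiting ticket remain once the delayed reserve has been
 * counted against it.
 */
static uint64_t bytes_still_needed(uint64_t ticket_bytes,
				   uint64_t delayed_rsv_size)
{
	if (delayed_rsv_size > ticket_bytes)
		return 0; /* the reserve alone covers the ticket */

	return ticket_bytes - delayed_rsv_size;
}
```

With a 4096-byte ticket and a 1024-byte reserve this yields 3072; with the sizes swapped it yields 0 rather than a wrapped value.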
Re: netapp-alike snapshots?
[ ... ]

>>>> It is beneficial to not have snapshots in-place. With a local
>>>> directory of snapshots, [ ... ]

Indeed and there is a fair description of some options for
subvolume nesting policies here which may be interesting to the
original poster:

  https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Layout

It is unsurprising to me that there are tradeoffs involved in
every choice. I find the "Flat" layout particularly desirable.

>>> Netapp snapshots are invisible for tools doing opendir()/
>>> readdir() One could simulate this with symlinks for the
>>> snapshot directory: store the snapshot elsewhere (not inplace)
>>> and create a symlink to it, in every directory.

More precisely in every subvolume root directory.

>>> My users want the snapshots locally in a .snapshot
>>> subdirectory.

Btrfs snapshots can only be done for a whole subvolume.
Subvolumes and snapshots can be created by users, but too many
snapshots (see below) can cause trouble. For somewhat good
reasons subvolumes including snapshots cannot be deleted by
users though unless mount option 'user_subvol_rm_allowed' is
used.

>>> Because Netapp do it this way - for at least 20 years and we
>>> have a multi-PB Netapp storage environment. No chance to change
>>> this.

Send patches :-).

> Not only du works recursively, but also find and with option
> also ls, grep, etc.

Note also that subvolume root directory inodes are indeed root
directory inodes so they can be 'mount'ed and therefore the
transition from a subvolume into a contained subvolume can be
detected at the mountpoint.

So 'find' has the '-xdev' option and 'du' has the '-x' option
and similarly nearly all other tools, so perhaps someone
expects that to happen :-).

> And it would require a bind mount for EVERY directory. There can
> be hundreds... thousands!

Assumptions that all Btrfs features such as snapshots are
infinitely scalable at no cost may be optimistic:

  https://btrfs.wiki.kernel.org/index.php/Gotchas#Having_many_subvolumes_can_be_very_slow
Re: [PATCH 0/5 v2] btrfs-progs: convert fixes + reiserfs support
On Thu, Jul 27, 2017 at 11:47:18AM -0400, je...@suse.com wrote:
> From: Jeff Mahoney
>
> Changes since v1:
> - reiserfs conversion:
>   - use bool instead of int
>   - catch 'impossible' condition of multiple discontiguous tails
>   - properly handle hole followed by tail
>   - add testing for combinations of real blocks, tails, and holes
>   - print error indicating filename (and key) that caused a failure
>   - constify buffer arg to btrfs_insert_inline_extent
>   - add tails=on to reiserfs mount options
>   - fixed absence of libreiserfscore to be non-fatal unless specifically
>     enabled
>
> - btrfs-progs: convert: use search_cache_extent in migrate_one_reserved_range
>   - In testing, we would not be able to roll back to part of the 0-1MB range
>     not being migrated.
>
> - btrfs-progs: tests: fix typo in convert-tests/008-readonly-image
>   - The test used ext2_save instead of ext2_saved as the filename

Oh crap, I noticed v2 after I had finished merging v1. The changes have
now been transferred, so the version in devel is v2 + my fixups. The 2
new patches ("use search_cache_extent in migrate_one_reserved_range" and
"tests: fix typo in convert-tests/008-readonly-image") have also been
added.
Re: netapp-alike snapshots?
On Tue 2017-08-22 (22:36), Roman Mamedov wrote:

> > My users want the snapshots locally in a .snapshot subdirectory.
> > Because Netapp do it this way - for at least 20 years and we have a
> > multi-PB Netapp storage environment.
>
> Just a side note, you do know that only subvolumes can be snapshotted on
> Btrfs, not any regular directory? And that snapshots are not recursive,
> i.e. if a subvolume "contains" other subvolumes (hint: it really doesn't),
> snapshots of the parent one will not include content of subvolumes below
> that in the tree.

Yes, I know this. But thanks for your hints! (Other readers here may not
be aware of this.)

> I don't know how Netapp does this

I am only a Netapp/waffle user, so I know no internals.
Netapp is not Linux based and definitely a lot older than btrfs.

> from the way you describe that setup it feels like with Btrfs you're
> still in for some bad surprises and a part of your expectations will not
> be met.

I will take care :-)

> Do you plan to make each and every directory and subdirectory a subvolume

No. My idea is to place a symlink in every subdirectory pointing to the
snapshot directory.
Not yet programmed... I was hoping someone had already implemented such a
feature.

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<2017083647.350ca27d@natsu>
Re: netapp-alike snapshots?
On Tue 2017-08-22 (19:19), A L wrote:

> Perhaps using a bind mount? It would look and work the same as an ordinary
> fs. Just need to make sure du uses one filesystem.
>
> From: Ulli Horlacher -- Sent: 2017-08-22 - 18:57
>
> > On Tue 2017-08-22 (21:45), Roman Mamedov wrote:
> >
> >> It is beneficial to not have snapshots in-place. With a local directory of
> >> snapshots, issuing things like "find", "grep -r" or even "du" will take an
> >> inordinate amount of time and will produce a result you do not expect.
> >
> > Netapp snapshots are invisible for tools doing opendir()/readdir()
> > One could simulate this with symlinks for the snapshot directory:
> > store the snapshot elsewhere (not inplace) and create a symlink to it, in
> > every directory.

Not only du works recursively, but also find and, with an option, also ls,
grep, etc.

And it would require a bind mount for EVERY directory. There can be
hundreds... thousands!

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:
Re: finding root filesystem of a subvolume?
[ ... ]

>> There is no fixed relationship between the root directory
>> inode of a subvolume and the root directory inode of any
>> other subvolume or the main volume.

> Actually, there is, because it's inherently rooted in the
> hierarchy of the volume itself. That root inode for the
> subvolume is anchored somewhere under the next higher
> subvolume.

This stupid point relies on ignoring that it is not mandatory to
mount the main volume, and that therefore "There is no fixed
relationship between the root directory inode of a subvolume and
the root directory inode of any other subvolume or the main
volume", because the "root directory inode" of the "main volume"
may not be mounted at all.

This stupid point also relies on ignoring that subvolumes can be
mounted *also* under another directory, even if the main volume
is mounted somewhere else. Suppose that the following applies:

  subvol=5    /local
  subvol=383  /local/.backup/home
  subvol=383  /mnt/home-backup

and you are given the mountpoint '/mnt/home-backup', how can you
find the main volume mountpoint '/local' from that?

Please explain how '/mnt/home-backup' is indeed "inherently
rooted in the hierarchy of the volume itself", because there is
always a "fixed relationship between the root directory inode of
a subvolume and the root directory inode of any other subvolume
or the main volume".

[ ... ]

> Again, it does, it's just not inherently exposed to userspace
> unless you mount the top-level subvolume (subvolid=5 and/or
> subvol=/ in mount options).

This extra stupid point is based on ignoring that to "mount the
top-level subvolume" relies on knowing already which one is the
"top-level subvolume", which is begging the question.

[ ... ]
Re: netapp-alike snapshots?
On Tue, 22 Aug 2017 18:57:25 +0200
Ulli Horlacher wrote:

> On Tue 2017-08-22 (21:45), Roman Mamedov wrote:
>
> > It is beneficial to not have snapshots in-place. With a local directory of
> > snapshots, issuing things like "find", "grep -r" or even "du" will take an
> > inordinate amount of time and will produce a result you do not expect.
>
> Netapp snapshots are invisible for tools doing opendir()/readdir()
> One could simulate this with symlinks for the snapshot directory:
> store the snapshot elsewhere (not inplace) and create a symlink to it, in
> every directory.
>
>
> > Personally I prefer to have a /snapshots directory on every FS
>
> My users want the snapshots locally in a .snapshot subdirectory.
> Because Netapp do it this way - for at least 20 years and we have a
> multi-PB Netapp storage environment.
> No chance to change this.

Just a side note, you do know that only subvolumes can be snapshotted on
Btrfs, not any regular directory? And that snapshots are not recursive, i.e.
if a subvolume "contains" other subvolumes (hint: it really doesn't),
snapshots of the parent one will not include content of subvolumes below that
in the tree.

I don't know how Netapp does this, but from the way you describe that setup
it feels like with Btrfs you're still in for some bad surprises and a part of
your expectations will not be met.

Do you plan to make each and every directory and subdirectory a subvolume (so
that it could have a trail of its own snapshots)? There will be performance
implications to that. Also, deleting subvolumes can only be done via the
"btrfs" tool; they won't delete like normal dirs, e.g. when trying to do that
remotely via an NFS or Samba share.

-- 
With respect,
Roman
Re: netapp-alike snapshots?
Perhaps using a bind mount? It would look and work the same as an ordinary fs.
Just need to make sure du uses one filesystem.

 From: Ulli Horlacher -- Sent: 2017-08-22 - 18:57

> On Tue 2017-08-22 (21:45), Roman Mamedov wrote:
>
>> It is beneficial to not have snapshots in-place. With a local directory of
>> snapshots, issuing things like "find", "grep -r" or even "du" will take an
>> inordinate amount of time and will produce a result you do not expect.
>
> Netapp snapshots are invisible for tools doing opendir()/readdir()
> One could simulate this with symlinks for the snapshot directory:
> store the snapshot elsewhere (not inplace) and create a symlink to it, in
> every directory.
>
>
>> Personally I prefer to have a /snapshots directory on every FS
>
> My users want the snapshots locally in a .snapshot subdirectory.
> Because Netapp do it this way - for at least 20 years and we have a
> multi-PB Netapp storage environment.
> No chance to change this.
>
> --
> Ullrich Horlacher              Server und Virtualisierung
> Rechenzentrum TIK
> Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
> Allmandring 30a                Tel:    ++49-711-68565868
> 70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
> REF:<20170822214531.44538589@natsu>
Re: [PATCH 3/3] btrfs-progs: convert: add support for converting reiserfs
On Tue, Jul 25, 2017 at 04:54:43PM -0400, je...@suse.com wrote:
> From: Jeff Mahoney
>
> This patch adds support to convert reiserfs file systems in-place to btrfs.
>
> It will convert extended attribute files to btrfs extended attributes,
> translate ACLs, coalesce tails that consist of multiple items into one item,
> and convert tails that are too big into indirect files.
>
> This requires that libreiserfscore 3.6.27 be available.
>
> Many of the test cases for convert apply regardless of what the source
> file system is and using ext4 is sufficient. I've included several
> test cases that are reiserfs-specific.
>
> Signed-off-by: Jeff Mahoney

Patches merged, with quite a few small fixups here and there. It took me less
time to fix them on the way than to take them through the mailing list. The
tests were split into a separate patch. I'm fine with keeping the reiserfs bits
in one patch as it's an isolated feature. The tests are now running, I'll let
them finish.

There's some code or test code duplication that can be cleaned up eventually,
but now I consider reiserfs conversion support done. Thanks.
Re: netapp-alike snapshots?
On Tue 2017-08-22 (21:45), Roman Mamedov wrote:

> It is beneficial to not have snapshots in-place. With a local directory of
> snapshots, issuing things like "find", "grep -r" or even "du" will take an
> inordinate amount of time and will produce a result you do not expect.

Netapp snapshots are invisible for tools doing opendir()/readdir()
One could simulate this with symlinks for the snapshot directory:
store the snapshot elsewhere (not inplace) and create a symlink to it, in
every directory.

> Personally I prefer to have a /snapshots directory on every FS

My users want the snapshots locally in a .snapshot subdirectory.
Because Netapp do it this way - for at least 20 years and we have a
multi-PB Netapp storage environment.
No chance to change this.

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20170822214531.44538589@natsu>
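The symlink idea in the message above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `link_snapshot_dirs`, its parameters, and the layout of the snapshot tree (mirroring the source tree below one root) are made up for the example and do not come from any existing tool. Note that, unlike Netapp's `.snapshot`, a symlink is still visible to readdir(); the benefit is only that find and du do not follow symlinks by default.

```python
import os


def link_snapshot_dirs(source_root, snapshot_root, link_name=".snapshot"):
    """Create a `link_name` symlink in every directory below source_root.

    Each link points at the directory with the same relative path below
    snapshot_root (hypothetical layout). Returns the links that were created.
    """
    created = []
    for dirpath, dirnames, _filenames in os.walk(source_root):
        # Do not descend into snapshot links left by an earlier run.
        dirnames[:] = [d for d in dirnames if d != link_name]
        rel = os.path.relpath(dirpath, source_root)
        target = os.path.normpath(os.path.join(snapshot_root, rel))
        link = os.path.join(dirpath, link_name)
        if not os.path.lexists(link):
            os.symlink(target, link)
            created.append(link)
    return created
```

Running it a second time creates nothing new, so it could be called after each snapshot run to cover newly created directories.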
Re: finding root filesystem of a subvolume?
On Tue, 22 Aug 2017 17:45:37 +0200 Ulli Horlacher wrote:

> In perl I have now:
>
> $root = $volume;
> while (`btrfs subvolume show "$root" 2>/dev/null` !~ /toplevel subvolume/) {
>   $root = dirname($root);
>   last if $root eq '/';
> }

If you are okay with rolling your own solutions like this, take a look at
"btrfs filesystem usage ". It will print the blockdevice used for mounting the
base FS. From that you can find the mountpoint via /proc/mounts.

Performance-wise it seems to work instantly on an almost full 2TB FS.

-- 
With respect,
Roman
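The second half of that suggestion (device name to mountpoint via /proc/mounts) might look like the sketch below. It assumes the device name has already been obtained by other means; parsing the output of the btrfs tool is deliberately left out. The function name and the sample device in the usage note are hypothetical.

```python
def mountpoints_for_device(mounts_text, device):
    """Return (mountpoint, options) pairs for `device`.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mountpoint, fstype, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != device:
            continue
        # /proc/mounts escapes spaces in paths as the octal sequence \040.
        mountpoint = fields[1].replace("\\040", " ")
        result.append((mountpoint, fields[3].split(",")))
    return result
```

A caller would read the file with `open("/proc/mounts").read()` and pass, for example, `/dev/sdb1`. A btrfs device can show up several times (once per mounted subvolume), which is why a list is returned; the entry whose options include `subvolid=5` would be the top-level subvolume.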
Re: netapp-alike snapshots?
On Tue 2017-08-22 (18:08), Peter Becker wrote: > This is possible. Use the -b or -B option. > > -b basedir places the snapshot in basedir with a directory structure > that mimics the mountpoint > -B basedir places the snapshots in basedir with NO additional > subdirectory structure > > 2017-08-22 16:24 GMT+02:00 Ulli Horlacher: > > On Tue 2017-08-22 (15:44), Peter Becker wrote: > >> Is use: https://github.com/jf647/btrfs-snap > >> > >> 2017-08-22 15:22 GMT+02:00 Ulli Horlacher : > >> > With Netapp/waffle you have automatic hourly/daily/weekly snapshots. > >> > You can find these snapshots in every local directory (readonly). > >> > Example: > >> > > >> > framstag@fex:/sw/share: ll .snapshot/ > >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > >> > .snapshot/daily.2017-08-15_0010 > >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > >> > .snapshot/daily.2017-08-16_0010 > >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > >> > .snapshot/daily.2017-08-17_0010 > >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > >> > .snapshot/daily.2017-08-18_0010 > >> > drwxr-xr-x framstag root - 2017-08-18 23:59:29 > >> > .snapshot/daily.2017-08-19_0010 > >> > drwxr-xr-x framstag root - 2017-08-19 21:01:25 > >> > .snapshot/daily.2017-08-20_0010 > >> > drwxr-xr-x framstag root - 2017-08-20 19:48:40 > >> > .snapshot/daily.2017-08-21_0010 > >> > drwxr-xr-x framstag root - 2017-08-20 02:50:18 > >> > .snapshot/hourly.2017-08-20_1210 > >> > drwxr-xr-x framstag root - 2017-08-20 02:50:18 > >> > .snapshot/hourly.2017-08-20_1610 > >> > drwxr-xr-x framstag root - 2017-08-20 19:48:40 > >> > .snapshot/hourly.2017-08-20_2010 > >> > drwxr-xr-x framstag root - 2017-08-21 00:42:28 > >> > .snapshot/hourly.2017-08-21_0810 > >> > drwxr-xr-x framstag root - 2017-08-21 00:42:28 > >> > .snapshot/hourly.2017-08-21_1210 > >> > drwxr-xr-x framstag root - 2017-08-21 13:05:28 > >> > .snapshot/hourly.2017-08-21_1610 > > > > btrfs-snap does not create local .snapshot/ sub-directories, but saves the 
> > snapshots in the toplevel root volume directory.

No, I want in EVERY directory of the source tree a subdirectory named
.snapshot, example:

framstag@fex:/sw/share: ll .snapshot a*/.snapshot a*/*/.snapshot
drwxrwxrwx root root - 2017-08-22 16:10:01 .snapshot
drwxrwxrwx root root - 2017-08-22 16:10:01 aggis-1.0/.snapshot
drwxrwxrwx root root - 2017-08-22 16:10:01 aggis-1.0/bin/.snapshot
drwxrwxrwx root root - 2017-08-22 16:10:01 aggis-1.0/man/.snapshot

(this is on a Netapp NFS volume)

btrfs-snap creates a snapshot directory tree on a different path.

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:
Re: netapp-alike snapshots?
On Tue, 22 Aug 2017 16:24:51 +0200 Ulli Horlacherwrote: > On Tue 2017-08-22 (15:44), Peter Becker wrote: > > Is use: https://github.com/jf647/btrfs-snap > > > > 2017-08-22 15:22 GMT+02:00 Ulli Horlacher : > > > With Netapp/waffle you have automatic hourly/daily/weekly snapshots. > > > You can find these snapshots in every local directory (readonly). > > > Example: > > > > > > framstag@fex:/sw/share: ll .snapshot/ > > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > > .snapshot/daily.2017-08-15_0010 > > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > > .snapshot/daily.2017-08-16_0010 > > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > > .snapshot/daily.2017-08-17_0010 > > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > > .snapshot/daily.2017-08-18_0010 > > > drwxr-xr-x framstag root - 2017-08-18 23:59:29 > > > .snapshot/daily.2017-08-19_0010 > > > drwxr-xr-x framstag root - 2017-08-19 21:01:25 > > > .snapshot/daily.2017-08-20_0010 > > > drwxr-xr-x framstag root - 2017-08-20 19:48:40 > > > .snapshot/daily.2017-08-21_0010 > > > drwxr-xr-x framstag root - 2017-08-20 02:50:18 > > > .snapshot/hourly.2017-08-20_1210 > > > drwxr-xr-x framstag root - 2017-08-20 02:50:18 > > > .snapshot/hourly.2017-08-20_1610 > > > drwxr-xr-x framstag root - 2017-08-20 19:48:40 > > > .snapshot/hourly.2017-08-20_2010 > > > drwxr-xr-x framstag root - 2017-08-21 00:42:28 > > > .snapshot/hourly.2017-08-21_0810 > > > drwxr-xr-x framstag root - 2017-08-21 00:42:28 > > > .snapshot/hourly.2017-08-21_1210 > > > drwxr-xr-x framstag root - 2017-08-21 13:05:28 > > > .snapshot/hourly.2017-08-21_1610 > > btrfs-snap does not create local .snapshot/ sub-directories, but saves the > snapshots in the toplevel root volume directory. It is beneficial to not have snapshots in-place. With a local directory of snapshots, issuing things like "find", "grep -r" or even "du" will take an inordinate amount of time and will produce a result you do not expect. 
For some of those tools the problem can be avoided (by always keeping in mind
to use "-x" with du, or "--one-file-system" with tar), but not for all of them.

Personally I prefer to have a /snapshots directory on every FS, and e.g. timed
snapshots of /home/username/src will live in /snapshots/home-username-src/. No
point in hiding it there with a dot either, as it's convenient to be able to
browse older snapshots with GUI file managers (which hide dot-files by default).

-- 
With respect,
Roman
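The naming scheme described above (/home/username/src becoming /snapshots/home-username-src/) can be written down as a small helper. This is a sketch only: the function names, the fallback name `root` for the path `/`, and the timestamp format (borrowed from the Netapp listing earlier in the thread) are assumptions, not something the message specifies.

```python
from datetime import datetime


def snapshot_dir(source_path, snapshots_root="/snapshots"):
    """Map a source path to its directory below snapshots_root."""
    # "root" is an assumed fallback name for the path "/".
    name = source_path.strip("/").replace("/", "-") or "root"
    return f"{snapshots_root}/{name}"


def timed_snapshot_path(source_path, when, snapshots_root="/snapshots"):
    """Path for one timed snapshot, e.g. .../home-username-src/2017-08-22_1610."""
    stamp = when.strftime("%Y-%m-%d_%H%M")
    return f"{snapshot_dir(source_path, snapshots_root)}/{stamp}"
```

Flattening the path with dashes keeps all snapshots of one source in a single, browsable directory, at the cost of ambiguity if directory names themselves contain dashes.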
Re: Btrfs Raid5 issue.
Thanks for the explanations. Chris, I don't think 'degraded' did
anything to help the mounting, I just passed it in to see if it would
help (I'm not sure if btrfs is "smart" enough to ignore a drive if it
would increase the chance of mounting the volume even if it is
degraded, but one could hope). I believe the key was 'nologreplay'.

Here is some info about the corrupted fs:

# btrfs fi show /tmp/root/
Label: 'kvm-btrfs'  uuid: fef29f0a-dc4c-4cc4-b524-914e6630803c
        Total devices 3 FS bytes used 3.30TiB
        devid    1 size 2.73TiB used 2.09TiB path /dev/bcache32
        devid    2 size 2.73TiB used 2.09TiB path /dev/bcache0
        devid    3 size 2.73TiB used 2.09TiB path /dev/bcache16

# btrfs fi usage /tmp/root/
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
Overall:
    Device size:                   8.18TiB
    Device allocated:                0.00B
    Device unallocated:            8.18TiB
    Device missing:                  0.00B
    Used:                            0.00B
    Free (estimated):                0.00B      (min: 8.00EiB)
    Data ratio:                       0.00
    Metadata ratio:                   0.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID5: Size:4.15TiB, Used:3.28TiB
   /dev/bcache0    2.08TiB
   /dev/bcache16   2.08TiB
   /dev/bcache32   2.08TiB

Metadata,RAID5: Size:22.00GiB, Used:20.69GiB
   /dev/bcache0   11.00GiB
   /dev/bcache16  11.00GiB
   /dev/bcache32  11.00GiB

System,RAID5: Size:64.00MiB, Used:400.00KiB
   /dev/bcache0   32.00MiB
   /dev/bcache16  32.00MiB
   /dev/bcache32  32.00MiB

Unallocated:
   /dev/bcache0  655.00GiB
   /dev/bcache16 655.00GiB
   /dev/bcache32 656.49GiB

So it looks like I set the metadata and system data to RAID5 and not
RAID1. I guess that it could have been affected by the write hole
causing the problem I was seeing.

Since I get the same space usage with RAID1 and RAID5, I think I'm
just going to use RAID1. I don't need stripe performance or anything
like that. It would be nice if btrfs supported hotplug and re-plug a
little better so that it is more "production" quality, but I just have
to be patient.
I'm familiar with Gluster and contributed code to Ceph, so I'm familiar
with those types of distributed systems. I really like them, but the
complexity is quite overkill for my needs at home.

As far as bcache performance: I have two Crucial MX200 250GB drives
that were md raid1 containing /boot (ext2), swap and then bcache. I
have 2 WD Reds and a Seagate Barracuda Desktop drive, all 3TB. With
bcache in writeback, apt-get would be painfully slow. Running iostat,
the SSDs would be doing a few hundred IOPS and the backing disks would
be very busy and would be the limiting factor overall. Even though
apt-get just downloaded the file (should be on the SSDs because of
writeback), it still involved the backend disks way too much. The
amount of dirty data was always less than 10% so there should have
been plenty of space to free up cache without having to flush. I
experimented with changing the size of contiguous IO to force more to
cache, increasing the dirty ratio, etc.; nothing seemed to provide the
performance I was hoping for. To be fair, having a pair of SSDs (md
raid1) caching three spindles (btrfs raid5) may not be an ideal
configuration. If I had three SSDs, one for each drive, then it may
have performed better? I also have ~980 snapshots spread over a year's
time, so I don't know how much that impacts things. I did use a btrfs
utility to help find duplicate files/chunks and dedupe them so that
updated system binaries between upgraded LXC containers would use the
same space on disk and be more efficient in bcache cache usage.

After restoring the root and LXC roots snapshots on the SSD (broke the
md raid1 so I could restore to one of them), I ran apt-get and got
upwards of 2,400 IOPS with it being sustained around 1,200 IOPS (btrfs
single on md raid1 degraded). I know that btrfs has some performance
challenges, but I don't think I was hitting those. It was most likely
a very unusual set-up of bcache and btrfs raid that caused the problem.
I have bcache on a 10-year-old desktop box with a single NVMe drive that
performs a little better, but it is hard to be certain because of its
age. It has bcache in write-around (since there is only a single NVMe)
and btrfs in raid1. I haven't watched that box as closely because it
is responsive enough. It also only has 4 GB of RAM so it constantly
has to swap (web pages are hogs these days), which is one of the reasons
to retrofit that box with NVMe rather than an MX200.

If you have any other questions, feel free to ask.

Thanks

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD  A904 C70E E654 3BB2 FA62 B9F1
Re: netapp-alike snapshots?
This is possible. Use the -b or -B option. -b basedir places the snapshot in basedir with a directory structure that mimics the mountpoint -B basedir places the snapshots in basedir with NO additional subdirectory structure 2017-08-22 16:24 GMT+02:00 Ulli Horlacher: > On Tue 2017-08-22 (15:44), Peter Becker wrote: >> Is use: https://github.com/jf647/btrfs-snap >> >> 2017-08-22 15:22 GMT+02:00 Ulli Horlacher : >> > With Netapp/waffle you have automatic hourly/daily/weekly snapshots. >> > You can find these snapshots in every local directory (readonly). >> > Example: >> > >> > framstag@fex:/sw/share: ll .snapshot/ >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 >> > .snapshot/daily.2017-08-15_0010 >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 >> > .snapshot/daily.2017-08-16_0010 >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 >> > .snapshot/daily.2017-08-17_0010 >> > drwxr-xr-x framstag root - 2017-08-14 10:21:47 >> > .snapshot/daily.2017-08-18_0010 >> > drwxr-xr-x framstag root - 2017-08-18 23:59:29 >> > .snapshot/daily.2017-08-19_0010 >> > drwxr-xr-x framstag root - 2017-08-19 21:01:25 >> > .snapshot/daily.2017-08-20_0010 >> > drwxr-xr-x framstag root - 2017-08-20 19:48:40 >> > .snapshot/daily.2017-08-21_0010 >> > drwxr-xr-x framstag root - 2017-08-20 02:50:18 >> > .snapshot/hourly.2017-08-20_1210 >> > drwxr-xr-x framstag root - 2017-08-20 02:50:18 >> > .snapshot/hourly.2017-08-20_1610 >> > drwxr-xr-x framstag root - 2017-08-20 19:48:40 >> > .snapshot/hourly.2017-08-20_2010 >> > drwxr-xr-x framstag root - 2017-08-21 00:42:28 >> > .snapshot/hourly.2017-08-21_0810 >> > drwxr-xr-x framstag root - 2017-08-21 00:42:28 >> > .snapshot/hourly.2017-08-21_1210 >> > drwxr-xr-x framstag root - 2017-08-21 13:05:28 >> > .snapshot/hourly.2017-08-21_1610 > > btrfs-snap does not create local .snapshot/ sub-directories, but saves the > snapshots in the toplevel root volume directory. 
Re: [PATCH 3/7] btrfs-progs: extent-cache: actually cache extent buffers
On Tue, Jul 25, 2017 at 04:51:34PM -0400, je...@suse.com wrote:
> From: Jeff Mahoney
>
> We have the infrastructure to cache extent buffers but we don't actually
> do the caching. As soon as the last reference is dropped, the buffer
> is dropped. This patch keeps the extent buffers around until the max
> cache size is reached (defaults to 25% of memory) and then it drops
> the last 10% of the LRU to free up cache space for reallocation. The
> cache size is configurable (for use by e.g. lowmem) when the cache is
> initialized.
>
> Signed-off-by: Jeff Mahoney

I've started to merge the series, changed code according to the review. In
this patch, test-mkfs and test-check fail (segfaults and assertions). A
debugging build or asan (make D=all,asan) does not reproduce the errors so
this will need to be found by other means.

I also fixed some trivial coding style issues, so the changes are now in the
branch

https://github.com/kdave/btrfs-progs/tree/ext/jeffm/extent-cache

Please use this as a starting point, I'm fine with resending just this patch
or an incremental.
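For readers unfamiliar with the caching policy described in the quoted patch, here is a toy model of it: keep buffers until a size limit is exceeded, then drop the least recently used ones in one batch. It is not the btrfs-progs code. The class name and fields are invented, reference counting is ignored, and "drops the last 10% of the LRU" is interpreted here as "evict oldest entries until usage is back to 90% of the limit", which is an assumption.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU cache that evicts in batches once it grows past max_size."""

    def __init__(self, max_size, evict_fraction=0.10):
        self.max_size = max_size
        self.evict_fraction = evict_fraction
        self.size = 0
        self._lru = OrderedDict()  # key -> bytes; oldest entries come first

    def get(self, key):
        buf = self._lru.get(key)
        if buf is not None:
            self._lru.move_to_end(key)  # mark as most recently used
        return buf

    def put(self, key, buf):
        old = self._lru.pop(key, None)
        if old is not None:
            self.size -= len(old)
        self._lru[key] = buf
        self.size += len(buf)
        if self.size > self.max_size:
            self._evict()

    def _evict(self):
        # Drop oldest entries until usage is at most
        # (1 - evict_fraction) of max_size.
        target = self.max_size * (1 - self.evict_fraction)
        while self._lru and self.size > target:
            _key, buf = self._lru.popitem(last=False)
            self.size -= len(buf)
```

Evicting a batch instead of a single entry means the cache does not hit the limit again on the very next insertion, which is presumably the point of freeing "cache space for reallocation".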
Re: finding root filesystem of a subvolume?
On Tue 2017-08-22 (11:03), Austin S. Hemmelgarn wrote:

> Or alternatively, repeatedly call `btrfs filesystem show` on the path,
> removing one component from the end each time until you get a zero
> return code. The path you called it on that got a zero return code is
> where the mount is (and thus what filesystem that subvolume is part of),
> and the output just gave you a list of devices it's on.

"btrfs filesystem show" is relatively slow (2.6 s), "btrfs subvolume show" is
MUCH faster (0.02 s).

In perl I have now:

use File::Basename;   # provides dirname()

$root = $volume;
while (`btrfs subvolume show "$root" 2>/dev/null` !~ /toplevel subvolume/) {
  $root = dirname($root);
  last if $root eq '/';
}

-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<62494c0c-0c27-5b36-3727-b8755eb2c...@gmail.com>
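The same walk-up loop, sketched in Python for comparison. The string match on "toplevel subvolume" is copied from the Perl snippet above and may depend on the btrfs-progs version in use, so treat it as an assumption. The check is passed in as a parameter (a choice made for this example) so the loop itself can be exercised without a btrfs filesystem.

```python
import os
import subprocess


def is_toplevel(path):
    """True if `btrfs subvolume show` reports path as the toplevel subvolume."""
    proc = subprocess.run(
        ["btrfs", "subvolume", "show", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return "toplevel subvolume" in proc.stdout


def find_root(volume, check=is_toplevel):
    """Strip trailing path components until `check` succeeds or "/" is reached."""
    root = volume
    while not check(root):
        parent = os.path.dirname(root)
        if parent == root:  # dirname("/") is "/", nothing left to strip
            break
        root = parent
    return root
```

Like the Perl version, this returns "/" when no component of the path is recognised as a toplevel subvolume, so the caller has to decide whether "/" is a real answer.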
Re: finding root filesystem of a subvolume?
On 2017-08-22 10:43, Peter Grandi wrote:
>> How do I find the root filesystem of a subvolume?
>> Example:
>> root@fex:~# df -T
>> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
>> -              -    1073740800 104244552 967773976  10% /local/.backup/home
> [ ... ]
>> I know, the root filesystem is /local,
>
> That question is somewhat misunderstood and uses the wrong
> concepts and terms. In UNIX filesystems a filesystem "root" is a
> directory inode with a number that is local to itself, and can
> be "mounted" anywhere, or left unmounted, and that is a property
> of the running system, not of the filesystem "root". Usually
> UNIX filesystems have a single "root" directory inode.
>
> In the case of Btrfs the main volume and its subvolumes all have
> filesystem "root" directory inodes, which may or may not be
> "mounted", anywhere the administrators of the running system
> please, as a property of the running system. There is no fixed
> relationship between the root directory inode of a subvolume and
> the root directory inode of any other subvolume or the main
> volume.

Actually, there is, because it's inherently rooted in the hierarchy of
the volume itself. That root inode for the subvolume is anchored
somewhere under the next higher subvolume. It's the same concept as
nested data-sets in ZFS, BTRFS just inherently exposes them at the
appropriate location in the hierarchy and allows intermediary
directories.

> Note: in Btrfs terminology "volume" seems to mean both the main
> volume and the collection of devices where it and subvolumes are
> hosted.

Standard terminology from what I've seen uses 'volume' in the same way
it's used for ext4, XFS, LVM, MD, and similar things, namely a BTRFS
'volume' is the collection of devices as well as the sum total of all
subvolumes on those devices. This ends up meaning that it implicitly
refers to the top-level subvolume when there are no other subvolumes,
and as a result it's come to sometimes mean the top-level subvolume
(though I rarely see that usage, and would actively discourage it).

>> but how can I show it by command?
>
> The system does not keep an explicit record of which Btrfs
> "root" directory inode is related to which other Btrfs "root"
> directory inode in the same volume, whether mounted or
> unmounted.

Again, it does, it's just not inherently exposed to userspace unless you
mount the top-level subvolume (subvolid=5 and/or subvol=/ in mount
options).

Mount the top level subvolume (once you know what volume the subvolume
is on), and call `btrfs subvolume list` on it. The `top level N` part
of the output from that tells you what the next subvolume up the
hierarchy is for each subvolume, and the `path` part at the end tells
you the location within that next higher subvolume where this one is
rooted. The output may not make sense though if you don't have the root
subvolume mounted (because it may be non trivial to trace things up the
tree).

> That relationship has to be discovered by using volume UUIDs,
> which are the same for the main subvolume and the other
> subvolumes, whether mounted or not, so one has to do:
>
>   * For the indicated mounted subvolume "root" read its UUID.
>   * For every mounted filesystem "root", check whether its type
>     is 'btrfs' and if it is obtain its UUID.
>   * If the UUID is the same, and the subvolume id is '5', that's
>     the main subvolume, and terminate.
>   * For every block device which is not mounted, check whether it
>     has a Btrfs superblock.
>   * If the type is 'btrfs' and the volume UUID is the same as
>     that of the subvolume, list the block device.
>
> In the latter case since the main volume is not mounted the only
> way to identify it is to list the block devices that host it.

Or alternatively, repeatedly call `btrfs filesystem show` on the path,
removing one component from the end each time until you get a zero
return code. The path you called it on that got a zero return code is
where the mount is (and thus what filesystem that subvolume is part of),
and the output just gave you a list of devices it's on.
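To make the `top level N` and `path` fields mentioned above concrete, here is a small sketch that parses `btrfs subvolume list` output and walks up the hierarchy. It assumes the common line format `ID <id> gen <gen> top level <id> path <path>`; the function names and the sample subvolumes in the usage note are made up for illustration.

```python
import re

LINE_RE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")


def parse_subvolume_list(output):
    """Map subvolume ID -> (top_level_id, path) from `btrfs subvolume list` text."""
    subvols = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            subvols[int(m.group(1))] = (int(m.group(3)), m.group(4))
    return subvols


def ancestors(subvols, subvol_id, top_id=5):
    """Return the IDs of the enclosing subvolumes, nearest first.

    The walk stops at top_id (5 is the top-level subvolume) or at an ID
    that is missing from the listing.
    """
    chain = []
    current = subvol_id
    while current != top_id and current in subvols:
        current = subvols[current][0]
        chain.append(current)
    return chain
```

For a listing containing `ID 257 gen 30 top level 5 path home` and `ID 262 gen 23 top level 257 path home/blubb`, `ancestors(subvols, 262)` yields 257 and then 5, i.e. blubb sits inside home, which sits in the top-level subvolume.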
Re: finding root filesystem of a subvolume?
> How do I find the root filesystem of a subvolume?
> Example:
> root@fex:~# df -T
> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
> -              -    1073740800 104244552 967773976  10% /local/.backup/home
[ ... ]
> I know, the root filesystem is /local,

That question is somewhat misunderstood and uses the wrong
concepts and terms. In UNIX filesystems a filesystem "root" is a
directory inode with a number that is local to itself, and can
be "mounted" anywhere, or left unmounted, and that is a property
of the running system, not of the filesystem "root". Usually
UNIX filesystems have a single "root" directory inode.

In the case of Btrfs the main volume and its subvolumes all have
filesystem "root" directory inodes, which may or may not be
"mounted", anywhere the administrators of the running system
please, as a property of the running system. There is no fixed
relationship between the root directory inode of a subvolume and
the root directory inode of any other subvolume or the main
volume.

Note: in Btrfs terminology "volume" seems to mean both the main
volume and the collection of devices where it and subvolumes are
hosted.

> but how can I show it by command?

The system does not keep an explicit record of which Btrfs
"root" directory inode is related to which other Btrfs "root"
directory inode in the same volume, whether mounted or
unmounted.

That relationship has to be discovered by using volume UUIDs,
which are the same for the main subvolume and the other
subvolumes, whether mounted or not, so one has to do:

  * For the indicated mounted subvolume "root" read its UUID.
  * For every mounted filesystem "root", check whether its type
    is 'btrfs' and if it is obtain its UUID.
  * If the UUID is the same, and the subvolume id is '5', that's
    the main subvolume, and terminate.
  * For every block device which is not mounted, check whether it
    has a Btrfs superblock.
  * If the type is 'btrfs' and the volume UUID is the same as
    that of the subvolume, list the block device.

In the latter case since the main volume is not mounted the only
way to identify it is to list the block devices that host it.
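The matching steps listed in this message can be sketched as a pure function over records that have already been collected. How those records are gathered (for example from /proc/mounts and a superblock probe) is outside the sketch, and the record layout and function name below are assumptions made for the example.

```python
def find_main_subvolume(target_uuid, mounts, unmounted_devices):
    """Locate the main subvolume of the filesystem with target_uuid.

    mounts: iterable of dicts with keys mountpoint, fstype, uuid, subvolid.
    unmounted_devices: iterable of dicts with keys device, fstype, uuid.
    Returns ("mounted", mountpoint) if the main subvolume (id 5) is mounted,
    otherwise ("devices", [device, ...]) listing the hosting block devices.
    """
    for m in mounts:
        if (m["fstype"] == "btrfs"
                and m["uuid"] == target_uuid
                and m["subvolid"] == 5):
            return ("mounted", m["mountpoint"])
    devices = [
        d["device"]
        for d in unmounted_devices
        if d["fstype"] == "btrfs" and d["uuid"] == target_uuid
    ]
    return ("devices", devices)
```

The two possible results mirror the two outcomes in the list: either a mountpoint of the main subvolume is found, or only the block devices hosting the volume can be named.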
Re: finding root filesystem of a subvolume?
On 2017-08-22 10:23, Hugo Mills wrote:
> On Tue, Aug 22, 2017 at 10:12:25AM -0400, Austin S. Hemmelgarn wrote:
>> On 2017-08-22 09:53, Ulli Horlacher wrote:
>>> On Tue 2017-08-22 (09:37), Austin S. Hemmelgarn wrote:
>>>
>>>>> root@fex:~# df -T /local/.backup/home
>>>>> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
>>>>> -              -    1073740800 104252160 967766336  10% /local/.backup/home
>>>>
>>>> Hmm, now I'm really confused, I just checked on the Ubuntu 17.04 and
>>>> 16.04.3 VM's I have (I only run current and the most recent LTS
>>>> version), and neither of them behave like this.
>>>
>>> I have this kind of output on all of my Ubuntu hosts:
>>>
>>> root@moep:~# grep PRETTY_NAME /etc/os-release
>>> PRETTY_NAME="Ubuntu 16.04.3 LTS"
>>>
>>> root@moep:~# df -T /usb/UF/tmp/blubb
>>> Filesystem     Type 1K-blocks    Used Available Use% Mounted on
>>> -              -     12581888 3690524   7253700  34% /usb/UF/tmp/blubb
>>>
>>> root@moep:~# btrfs subvolume show /usb/UF/tmp/blubb
>>> /usb/UF/tmp/blubb
>>>         Name:                   blubb
>>>         UUID:                   ecf8c804-d4a3-9948-89fe-b0c1971c25cb
>>>         Parent UUID:            -
>>>         Received UUID:          -
>>>         Creation time:          2017-08-22 12:54:16 +0200
>>>         Subvolume ID:           262
>>>         Generation:             23
>>>         Gen at creation:        22
>>>         Parent ID:              5
>>>         Top level ID:           5
>>>         Flags:                  -
>>>         Snapshot(s):
>>>
>>> root@moep:~# dpkg -l | grep btrfs
>>> ii  btrfs-tools  4.4-1ubuntu1  amd64  Checksumming Copy on Write Filesystem utilities
>>>
>> Hmm, interesting. Are you using qgroups by chance?
>
> I get this behaviour (the "- -") only if it's a non-mounted
> subvolume:
>
> hrm@amelia:~ $ df -T .
> Filesystem     Type  1K-blocks     Used Available Use% Mounted on
> /dev/sdb1      btrfs 117220284 95271852  18611060  84% /home
> hrm@amelia:~ $ sudo btrfs sub crea foo
> Create subvolume './foo'
> hrm@amelia:~ $ df -T ./foo
> Filesystem     Type  1K-blocks     Used Available Use% Mounted on
> -              -     117220284 95271880  18611032  84% /home/hrm/foo
> hrm@amelia:~ $ sudo mkdir foo/bar
> hrm@amelia:~ $ df -T foo/bar
> Filesystem     Type  1K-blocks     Used Available Use% Mounted on
> -              -     117220284 95271852  18611060  84% /home/hrm/foo
> hrm@amelia:~ $ mkdir foo2
> hrm@amelia:~ $ sudo mount /dev/sdb1 ./foo2 -o subvol=home/hrm/foo
> hrm@amelia:~ $ df -T foo2
> Filesystem     Type  1K-blocks     Used Available Use% Mounted on
> /dev/sdb1      btrfs 117220284 95272384  18610528  84% /home/hrm/foo2

Wait, I think I see what's up here. I was just calling `df -T` without
pointing at the subvolume (which correctly ignores it because it's not
actually mounted). It looks like this is a side effect of the (rather
irritating) fake mount-point behavior of subvolumes.
Re: [PATCH 2/2] btrfs-progs: Use named constants for common sizes
On Thu, Jul 27, 2017 at 11:17:00AM +0300, Nikolay Borisov wrote:
> There are multiple places where we use well-known sizes - 1,8,16,32 megabytes. We
> also have them defined as constants in the sizes.h header. So let's use them.
> No functional changes.

Both applied, thanks.
Re: [PATCH 2/2] btrfs-progs: Use named constants for common sizes
On Thu, Jul 27, 2017 at 09:02:12PM +, Duncan wrote:
> Nikolay Borisov posted on Thu, 27 Jul 2017 11:17:00 +0300 as excerpted:
>
> > diff --git a/convert/main.c b/convert/main.c
> > index c56382e915fd..49ab829b5641 100644
> > --- a/convert/main.c
> > +++ b/convert/main.c
>
> > @@ -1586,7 +1586,7 @@ next:
> >   * | RSV 1 | | Old | |RSV 2 | | Old | | RSV 3 |
> >   * | 0~1M| | Fs | | SB2 + 64K | | Fs | | SB3 + 64K |
> >   *
> > - * On the other hande, the converted fs image in btrfs is a completely
> > + * On the other hande, the converted fs image in btrfs is a completely
> >   * valid old fs.
> >   *
> >   * |<-Converted fs image in btrfs>|
>
> If you're going to kill the line-terminating space, you might as well
> do the spell-correct in the same line:
>
> s/hande/hand/

Fixed.
Re: netapp-alike snapshots?
On Tue 2017-08-22 (15:44), Peter Becker wrote: > Is use: https://github.com/jf647/btrfs-snap > > 2017-08-22 15:22 GMT+02:00 Ulli Horlacher: > > With Netapp/waffle you have automatic hourly/daily/weekly snapshots. > > You can find these snapshots in every local directory (readonly). > > Example: > > > > framstag@fex:/sw/share: ll .snapshot/ > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > .snapshot/daily.2017-08-15_0010 > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > .snapshot/daily.2017-08-16_0010 > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > .snapshot/daily.2017-08-17_0010 > > drwxr-xr-x framstag root - 2017-08-14 10:21:47 > > .snapshot/daily.2017-08-18_0010 > > drwxr-xr-x framstag root - 2017-08-18 23:59:29 > > .snapshot/daily.2017-08-19_0010 > > drwxr-xr-x framstag root - 2017-08-19 21:01:25 > > .snapshot/daily.2017-08-20_0010 > > drwxr-xr-x framstag root - 2017-08-20 19:48:40 > > .snapshot/daily.2017-08-21_0010 > > drwxr-xr-x framstag root - 2017-08-20 02:50:18 > > .snapshot/hourly.2017-08-20_1210 > > drwxr-xr-x framstag root - 2017-08-20 02:50:18 > > .snapshot/hourly.2017-08-20_1610 > > drwxr-xr-x framstag root - 2017-08-20 19:48:40 > > .snapshot/hourly.2017-08-20_2010 > > drwxr-xr-x framstag root - 2017-08-21 00:42:28 > > .snapshot/hourly.2017-08-21_0810 > > drwxr-xr-x framstag root - 2017-08-21 00:42:28 > > .snapshot/hourly.2017-08-21_1210 > > drwxr-xr-x framstag root - 2017-08-21 13:05:28 > > .snapshot/hourly.2017-08-21_1610 btrfs-snap does not create local .snapshot/ sub-directories, but saves the snapshots in the toplevel root volume directory. -- Ullrich Horlacher Server und Virtualisierung Rechenzentrum TIK Universitaet Stuttgart E-Mail: horlac...@tik.uni-stuttgart.de Allmandring 30aTel:++49-711-68565868 70569 Stuttgart (Germany) WWW:http://www.tik.uni-stuttgart.de/
Re: finding root filesystem of a subvolume?
On Tue, Aug 22, 2017 at 10:12:25AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-08-22 09:53, Ulli Horlacher wrote:
> >On Tue 2017-08-22 (09:37), Austin S. Hemmelgarn wrote:
> >
> >>>root@fex:~# df -T /local/.backup/home
> >>>Filesystem     Type  1K-blocks      Used Available Use% Mounted on
> >>>-              -    1073740800 104252160 967766336  10% /local/.backup/home
> >>
> >>Hmm, now I'm really confused, I just checked on the Ubuntu 17.04 and
> >>16.04.3 VM's I have (I only run current and the most recent LTS
> >>version), and neither of them behave like this.
> >
> >I have this kind of output on all of my Ubuntu hosts:
> >
> >root@moep:~# grep PRETTY_NAME /etc/os-release
> >PRETTY_NAME="Ubuntu 16.04.3 LTS"
> >
> >root@moep:~# df -T /usb/UF/tmp/blubb
> >Filesystem     Type 1K-blocks    Used Available Use% Mounted on
> >-              -     12581888 3690524   7253700  34% /usb/UF/tmp/blubb
> >
> >root@moep:~# btrfs subvolume show /usb/UF/tmp/blubb
> >/usb/UF/tmp/blubb
> >        Name:                   blubb
> >        UUID:                   ecf8c804-d4a3-9948-89fe-b0c1971c25cb
> >        Parent UUID:            -
> >        Received UUID:          -
> >        Creation time:          2017-08-22 12:54:16 +0200
> >        Subvolume ID:           262
> >        Generation:             23
> >        Gen at creation:        22
> >        Parent ID:              5
> >        Top level ID:           5
> >        Flags:                  -
> >        Snapshot(s):
> >
> >root@moep:~# dpkg -l | grep btrfs
> >ii  btrfs-tools  4.4-1ubuntu1  amd64  Checksumming Copy on Write Filesystem utilities
> >
> Hmm, interesting. Are you using qgroups by chance?

   I get this behaviour (the "- -") only if it's a non-mounted
subvolume:

hrm@amelia:~ $ df -T .
Filesystem     Type  1K-blocks     Used Available Use% Mounted on
/dev/sdb1      btrfs 117220284 95271852  18611060  84% /home
hrm@amelia:~ $ sudo btrfs sub crea foo
Create subvolume './foo'
hrm@amelia:~ $ df -T ./foo
Filesystem     Type  1K-blocks     Used Available Use% Mounted on
-              -     117220284 95271880  18611032  84% /home/hrm/foo
hrm@amelia:~ $ sudo mkdir foo/bar
hrm@amelia:~ $ df -T foo/bar
Filesystem     Type  1K-blocks     Used Available Use% Mounted on
-              -     117220284 95271852  18611060  84% /home/hrm/foo
hrm@amelia:~ $ mkdir foo2
hrm@amelia:~ $ sudo mount /dev/sdb1 ./foo2 -o subvol=home/hrm/foo
hrm@amelia:~ $ df -T foo2
Filesystem     Type  1K-blocks     Used Available Use% Mounted on
/dev/sdb1      btrfs 117220284 95272384  18610528  84% /home/hrm/foo2

   Hugo.

-- 
Hugo Mills             | "Your problem is that you have a negative
hugo@... carfax.org.uk | personality."
http://carfax.org.uk/  | "No, I don't!"
PGP: E2AB1DE4          |                      Londo and Vir, Babylon 5
Re: [PATCH] Btrfs-progs: Check root before printing item
On Mon, Aug 21, 2017 at 03:57:13PM +0800, zhangyu-f...@cn.fujitsu.com wrote:
> From: Zhang Yu
>
> [TEST/fuzz] case: 004-simple-dump-tree
>
> Since the wrong key(DATA_RELOC_TREE CHUNK_ITEM 0) in root tree,
> error calling print_chunk(), resulting in num_stripes == 0.
>
> ERROR:
> [TEST/fuzz] 004-simple-dump-tree
> ctree.h:317: btrfs_chunk_item_size: BUG_ON `num_stripes == 0`
> triggered, value 1
>
> failed (ignored, ret=134): /myproject/btrfs-progs/btrfs
> inspect-internal dump-tree
> /myproject/btrfs-progs/tests/fuzz-tests/images/
> bko-155201-wrong-chunk-item-in-root-tree.raw.restored
>
> test failed for case 004-simple-dump-tree
> Makefile:288: recipe for target 'test-fuzz' failed
> make: *** [test-fuzz] Error 1
>
> So, before printing item, determine the root is valid or not.

I don't think this is the right way to fix it. The print-tree function
should print everything that's found, possibly doing sanity checks and
then only skip the bad data. For debugging or other purposes, we want to
get exact state of the trees.

The original problem you found is wrong number of stripes, so it should
be dealt with in print_chunk.
Re: finding root filesystem of a subvolume?
On 2017-08-22 09:53, Ulli Horlacher wrote:
> On Tue 2017-08-22 (09:37), Austin S. Hemmelgarn wrote:
>
>>> root@fex:~# df -T /local/.backup/home
>>> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
>>> -              -    1073740800 104252160 967766336  10% /local/.backup/home
>>
>> Hmm, now I'm really confused, I just checked on the Ubuntu 17.04 and
>> 16.04.3 VM's I have (I only run current and the most recent LTS
>> version), and neither of them behave like this.
>
> I have this kind of output on all of my Ubuntu hosts:
>
> root@moep:~# grep PRETTY_NAME /etc/os-release
> PRETTY_NAME="Ubuntu 16.04.3 LTS"
>
> root@moep:~# df -T /usb/UF/tmp/blubb
> Filesystem     Type 1K-blocks    Used Available Use% Mounted on
> -              -     12581888 3690524   7253700  34% /usb/UF/tmp/blubb
>
> root@moep:~# btrfs subvolume show /usb/UF/tmp/blubb
> /usb/UF/tmp/blubb
>          Name:                   blubb
>          UUID:                   ecf8c804-d4a3-9948-89fe-b0c1971c25cb
>          Parent UUID:            -
>          Received UUID:          -
>          Creation time:          2017-08-22 12:54:16 +0200
>          Subvolume ID:           262
>          Generation:             23
>          Gen at creation:        22
>          Parent ID:              5
>          Top level ID:           5
>          Flags:                  -
>          Snapshot(s):
>
> root@moep:~# dpkg -l | grep btrfs
> ii  btrfs-tools  4.4-1ubuntu1  amd64  Checksumming Copy on Write Filesystem utilities

Hmm, interesting.  Are you using qgroups by chance?
Re: [PATCH] btrfs-progs: Make in-place exit to a common exit block
On Tue, Aug 22, 2017 at 01:35:06PM +0800, Gu Jinxiang wrote:
> As comment pointed out by David, make in-place exit
> to a common exit block of mkfs.
>
> v1:
> Add some close(fd) when error occures in mkfs.
> And add close(fd) when end use it.
>
> Signed-off-by: Gu Jinxiang

Applied, thanks.
Re: [PATCH] btrfs-progs: mkfs: Replace number with enum
On Mon, Aug 21, 2017 at 07:39:49PM +0200, David Sterba wrote:
> > +/* roots: root tree, extent tree, chunk tree, dev tree, fs tree, csum tree */
> > +enum btrfs_mkfs_block {
> > +	SUPER_BLOCK = 0,
> > +	ROOT_TREE,
> > +	EXTENT_TREE,
> > +	CHUNK_TREE,
> > +	DEV_TREE,
> > +	FS_TREE,
> > +	CSUM_TREE,
> > +	BLOCK_COUNT

BLOCK_COUNT is 7

> > +};
> > +
> >  struct btrfs_mkfs_config {
> >  	/* Label of the new filesystem */
> >  	const char *label;
> > @@ -43,7 +55,7 @@ struct btrfs_mkfs_config {
> >  	/* Output fields, set during creation */
> >
> >  	/* Logical addresses of superblock [0] and other tree roots */
> > -	u64 blocks[8];
> > +	u64 blocks[BLOCK_COUNT];

This replaces 8 with 7, so fs_uuid gets overwritten. This can also be
caught by simply running 'make test-mkfs'.

> >  	char fs_uuid[BTRFS_UUID_UNPARSED_SIZE];
> >  	char chunk_uuid[BTRFS_UUID_UNPARSED_SIZE];
> >
> > --
> > 2.9.4
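The off-by-one described in this reply can be illustrated with a minimal
sketch. It assumes, as the remark about fs_uuid implies, that some existing
code still writes to the eighth slot (index 7) of blocks[]. The names
OLD_BLOCKS_LEN and slot_in_range() are hypothetical and not part of
btrfs-progs.

```c
#include <stddef.h>

/* Same enumerators as in the quoted patch. */
enum btrfs_mkfs_block {
	SUPER_BLOCK = 0,
	ROOT_TREE,
	EXTENT_TREE,
	CHUNK_TREE,
	DEV_TREE,
	FS_TREE,
	CSUM_TREE,
	BLOCK_COUNT	/* seven entries precede it, so this is 7 */
};

/* Hypothetical: the array length before the patch (u64 blocks[8]). */
#define OLD_BLOCKS_LEN 8

/* Compile-time guard: the enum really has one slot fewer than the old array. */
_Static_assert(BLOCK_COUNT == OLD_BLOCKS_LEN - 1,
	       "BLOCK_COUNT is smaller than the old array length");

/* Returns nonzero if index is a valid slot of an array sized BLOCK_COUNT. */
static int slot_in_range(size_t index)
{
	return index < (size_t)BLOCK_COUNT;
}
```

With this layout, the last index the old array accepted (7) falls outside an
array of BLOCK_COUNT elements, so a write there would land in whatever member
follows blocks[], which is fs_uuid in the quoted struct.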
Re: finding root filesystem of a subvolume?
On Tue 2017-08-22 (09:37), Austin S. Hemmelgarn wrote:

> > root@fex:~# df -T /local/.backup/home
> > Filesystem     Type  1K-blocks      Used Available Use% Mounted on
> > -              -    1073740800 104252160 967766336  10% /local/.backup/home
>
> Hmm, now I'm really confused, I just checked on the Ubuntu 17.04 and
> 16.04.3 VM's I have (I only run current and the most recent LTS
> version), and neither of them behave like this.

I have this kind of output on all of my Ubuntu hosts:

root@moep:~# grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Ubuntu 16.04.3 LTS"

root@moep:~# df -T /usb/UF/tmp/blubb
Filesystem     Type 1K-blocks    Used Available Use% Mounted on
-              -     12581888 3690524   7253700  34% /usb/UF/tmp/blubb

root@moep:~# btrfs subvolume show /usb/UF/tmp/blubb
/usb/UF/tmp/blubb
        Name:                   blubb
        UUID:                   ecf8c804-d4a3-9948-89fe-b0c1971c25cb
        Parent UUID:            -
        Received UUID:          -
        Creation time:          2017-08-22 12:54:16 +0200
        Subvolume ID:           262
        Generation:             23
        Gen at creation:        22
        Parent ID:              5
        Top level ID:           5
        Flags:                  -
        Snapshot(s):

root@moep:~# dpkg -l | grep btrfs
ii  btrfs-tools  4.4-1ubuntu1  amd64  Checksumming Copy on Write Filesystem utilities

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<0c35fba9-a514-31dd-a703-17f4727ed...@gmail.com>
Re: finding root filesystem of a subvolume?
> Hmm, now I'm really confused, I just checked on the Ubuntu 17.04 and
> 16.04.3 VM's I have (I only run current and the most recent LTS
> version), and neither of them behave like this.

Was also shocked, but:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
$ df -T | grep /mnt/data/lxc
$ df -T /mnt/data/lxc
Filesystem     Type  1K-blocks     Used  Available Use% Mounted on
-              -    2907008836 90829848 2815107576   4% /mnt/data/lxc

--

With Best Regards,
Marat Khalili
Re: finding root filesystem of a subvolume?
> I have no subvol=/ option at all:

Probably depends on kernel, but I presume a missing subvol means the same
as subvol=/ .

> I am only interested in mounted volumes.

If your initial path (/local/.backup/home) is a subvolume but is not itself
present in /proc/mounts, then it is probably mounted as part of some
higher-level subvolume, but this higher-level subvolume does not have to be
the root. Do you need the volume root, or just some higher-level subvolume
that's mounted?

--

With Best Regards,
Marat Khalili
Re: netapp-alike snapshots?
I use: https://github.com/jf647/btrfs-snap

2017-08-22 15:22 GMT+02:00 Ulli Horlacher:
> With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> You can find these snapshots in every local directory (readonly).
> Example:
>
> framstag@fex:/sw/share: ll .snapshot/
> drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-15_0010
> drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-16_0010
> drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-17_0010
> drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-18_0010
> drwxr-xr-x framstag root - 2017-08-18 23:59:29 .snapshot/daily.2017-08-19_0010
> drwxr-xr-x framstag root - 2017-08-19 21:01:25 .snapshot/daily.2017-08-20_0010
> drwxr-xr-x framstag root - 2017-08-20 19:48:40 .snapshot/daily.2017-08-21_0010
> drwxr-xr-x framstag root - 2017-08-20 02:50:18 .snapshot/hourly.2017-08-20_1210
> drwxr-xr-x framstag root - 2017-08-20 02:50:18 .snapshot/hourly.2017-08-20_1610
> drwxr-xr-x framstag root - 2017-08-20 19:48:40 .snapshot/hourly.2017-08-20_2010
> drwxr-xr-x framstag root - 2017-08-21 00:42:28 .snapshot/hourly.2017-08-21_0810
> drwxr-xr-x framstag root - 2017-08-21 00:42:28 .snapshot/hourly.2017-08-21_1210
> drwxr-xr-x framstag root - 2017-08-21 13:05:28 .snapshot/hourly.2017-08-21_1610
>
> I would like to have something similar with btrfs.
> Programming such a feature is not a general problem for me, but I think I
> am not the first one who wants this kind of auto-snapshooting.
> Is there (where?) such a tool?
>
> I know snapper, but it has a totally different approach.
>
> --
> Ullrich Horlacher              Server und Virtualisierung
> Rechenzentrum TIK
> Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
> Allmandring 30a                Tel:    ++49-711-68565868
> 70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
> REF:<20170822132208.gd14...@rus.uni-stuttgart.de>
Re: finding root filesystem of a subvolume?
On 2017-08-22 09:30, Ulli Horlacher wrote:
> On Tue 2017-08-22 (09:27), Austin S. Hemmelgarn wrote:
>
>>>>> root@fex:~# df -T
>>>>> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
>>>>> -              -    1073740800 104244552 967773976  10% /local/.backup/home
>>>>
>>>> I've never seen the "- -" output from df before. Is this a bind
>>>> mount or something?
>>>
>>> No, /local/.backup/home is just a btrfs subvolume
>>
>> It arguably shouldn't be showing up here then if it's not been
>> explicitly mounted.  I'm betting you're running OpenSUSE or SLES
>
> No:
>
> root@fex:~# cat /etc/os-release
> NAME="Ubuntu"
> VERSION="14.04.5 LTS, Trusty Tahr"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 14.04.5 LTS"
> VERSION_ID="14.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
>
> root@fex:~# df -T /local/.backup/home
> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
> -              -    1073740800 104252160 967766336  10% /local/.backup/home
>
> root@fex:~# type df
> df is hashed (/bin/df)
>
> root@fex:~# dpkg -S /bin/df
> coreutils: /bin/df

Hmm, now I'm really confused, I just checked on the Ubuntu 17.04 and
16.04.3 VM's I have (I only run current and the most recent LTS version),
and neither of them behave like this.
Re: finding root filesystem of a subvolume?
On Tue 2017-08-22 (09:27), Austin S. Hemmelgarn wrote:

> >>> root@fex:~# df -T
> >>> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
> >>> -              -    1073740800 104244552 967773976  10% /local/.backup/home
> >>
> >> I've never seen the "- -" output from df before. Is this a bind
> >> mount or something?
> >
> > No, /local/.backup/home is just a btrfs subvolume
>
> It arguably shouldn't be showing up here then if it's not been
> explicitly mounted.  I'm betting you're running OpenSUSE or SLES

No:

root@fex:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.5 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

root@fex:~# df -T /local/.backup/home
Filesystem     Type  1K-blocks      Used Available Use% Mounted on
-              -    1073740800 104252160 967766336  10% /local/.backup/home

root@fex:~# type df
df is hashed (/bin/df)

root@fex:~# dpkg -S /bin/df
coreutils: /bin/df

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<16778020-9167-b7cc-4768-ee33dca2b...@gmail.com>
Re: finding root filesystem of a subvolume?
On 2017-08-22 08:50, Ulli Horlacher wrote:
> On Tue 2017-08-22 (12:40), Hugo Mills wrote:
>> On Tue, Aug 22, 2017 at 02:23:50PM +0200, Ulli Horlacher wrote:
>>
>>> How do I find the root filesystem of a subvolume?
>>> Example:
>>>
>>> root@fex:~# df -T
>>> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
>>> -              -    1073740800 104244552 967773976  10% /local/.backup/home
>>
>>    I've never seen the "- -" output from df before. Is this a bind
>> mount or something?
>
> No, /local/.backup/home is just a btrfs subvolume

It arguably shouldn't be showing up here then if it's not been explicitly
mounted.  I'm betting you're running OpenSUSE or SLES and they finally got
their df integration done, as that df output absolutely matches the type
of brain-dead handling of BTRFS I'm coming to expect out of them.

Note to SUSE people reading this:
You should be including actual information for at least the Type field,
and ideally the Filesystem field too.  People expect this to behave
reasonably, and not listing any info about where the 'mount' originated
or what it is is not reasonable.
netapp-alike snapshots?
With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
You can find these snapshots in every local directory (readonly).
Example:

framstag@fex:/sw/share: ll .snapshot/
drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-15_0010
drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-16_0010
drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-17_0010
drwxr-xr-x framstag root - 2017-08-14 10:21:47 .snapshot/daily.2017-08-18_0010
drwxr-xr-x framstag root - 2017-08-18 23:59:29 .snapshot/daily.2017-08-19_0010
drwxr-xr-x framstag root - 2017-08-19 21:01:25 .snapshot/daily.2017-08-20_0010
drwxr-xr-x framstag root - 2017-08-20 19:48:40 .snapshot/daily.2017-08-21_0010
drwxr-xr-x framstag root - 2017-08-20 02:50:18 .snapshot/hourly.2017-08-20_1210
drwxr-xr-x framstag root - 2017-08-20 02:50:18 .snapshot/hourly.2017-08-20_1610
drwxr-xr-x framstag root - 2017-08-20 19:48:40 .snapshot/hourly.2017-08-20_2010
drwxr-xr-x framstag root - 2017-08-21 00:42:28 .snapshot/hourly.2017-08-21_0810
drwxr-xr-x framstag root - 2017-08-21 00:42:28 .snapshot/hourly.2017-08-21_1210
drwxr-xr-x framstag root - 2017-08-21 13:05:28 .snapshot/hourly.2017-08-21_1610

I would like to have something similar with btrfs.
Programming such a feature is not a general problem for me, but I think I
am not the first one who wants this kind of auto-snapshooting.
Is there (where?) such a tool?

I know snapper, but it has a totally different approach.

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20170822132208.gd14...@rus.uni-stuttgart.de>
Re: finding root filesystem of a subvolume?
On Tue 2017-08-22 (15:58), Marat Khalili wrote:
> On 22/08/17 15:50, Ulli Horlacher wrote:
>
> > It seems, I have to scan the subvolume path upwards until I found a real
> > mount point,
>
> I think searching /proc/mounts for the same device and subvol=/ in
> options is most straightforward.

I have no subvol=/ option at all:

root@fex:~# grep btrfs /proc/mounts
/dev/sdf1 /backup btrfs rw,relatime,compress=zlib,space_cache 0 0
/dev/sde1 /local btrfs rw,relatime,compress=zlib,space_cache 0 0

> But what makes you think it's mounted at all?

I am only interested in mounted volumes.

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
Re: finding root filesystem of a subvolume?
On 22/08/17 15:50, Ulli Horlacher wrote:

> It seems, I have to scan the subvolume path upwards until I found a real
> mount point,

I think searching /proc/mounts for the same device and subvol=/ in options
is most straightforward. But what makes you think it's mounted at all?

--

With Best Regards,
Marat Khalili
Re: finding root filesystem of a subvolume?
On Tue 2017-08-22 (12:40), Hugo Mills wrote:
> On Tue, Aug 22, 2017 at 02:23:50PM +0200, Ulli Horlacher wrote:
>
> > How do I find the root filesystem of a subvolume?
> > Example:
> >
> > root@fex:~# df -T
> > Filesystem     Type  1K-blocks      Used Available Use% Mounted on
> > -              -    1073740800 104244552 967773976  10% /local/.backup/home
>
>    I've never seen the "- -" output from df before. Is this a bind
> mount or something?

No, /local/.backup/home is just a btrfs subvolume

> > I know, the root filesystem is /local, but how can I show it by command?
>
>    Probably in /proc/self/mountinfo

root@fex:~# grep home /proc/self/mountinfo
root@fex:~# grep btrfs /proc/self/mountinfo
31 22 0:23 / /backup rw,relatime - btrfs /dev/sdf1 rw,compress=zlib,space_cache
32 22 0:26 / /local rw,relatime - btrfs /dev/sde1 rw,compress=zlib,space_cache

No information about the subvolume /local/.backup/home

It seems I have to scan the subvolume path upwards until I find a real
mount point.

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20170822124036.ga32...@carfax.org.uk>
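The upward scan suggested at the end of this message amounts to finding the
longest mount point in /proc/self/mountinfo that is a prefix of the subvolume
path. A minimal sketch of that idea follows. The function names are
hypothetical, the mountinfo text is passed in as a string so the logic can be
tried without a btrfs filesystem, and mount points containing spaces (which
the kernel escapes as \040) are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if mnt equals path or is an ancestor directory of it. */
static int is_path_prefix(const char *mnt, const char *path)
{
	size_t n = strlen(mnt);

	if (strncmp(mnt, path, n) != 0)
		return 0;
	if (n == 1 && mnt[0] == '/')	/* "/" contains every absolute path */
		return 1;
	return path[n] == '\0' || path[n] == '/';
}

/*
 * Scan mountinfo-formatted text and copy the longest mount point that
 * contains path into out. Returns 0 on success, -1 if nothing matches.
 */
static int find_mount_point(const char *mountinfo, const char *path,
			    char *out, size_t outlen)
{
	const char *line = mountinfo;
	size_t best = 0;
	int found = 0;

	while (*line) {
		char mnt[4096];
		const char *eol = strchr(line, '\n');

		/* The fifth whitespace-separated field is the mount point. */
		if (sscanf(line, "%*s %*s %*s %*s %4095s", mnt) == 1 &&
		    is_path_prefix(mnt, path)) {
			size_t n = strlen(mnt);

			/* ">=" lets a later mount on the same point win. */
			if (n >= best && n < outlen) {
				memcpy(out, mnt, n + 1);
				best = n;
				found = 1;
			}
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return found ? 0 : -1;
}
```

Fed the two btrfs lines shown above and the path /local/.backup/home, this
picks /local, because /backup is not a prefix of the path. In a real tool the
text would be read from /proc/self/mountinfo first.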
Re: finding root filesystem of a subvolume?
On Tue, Aug 22, 2017 at 02:23:50PM +0200, Ulli Horlacher wrote:
> How do I find the root filesystem of a subvolume?
> Example:
>
> root@fex:~# df -T
> Filesystem     Type  1K-blocks      Used Available Use% Mounted on
> -              -    1073740800 104244552 967773976  10% /local/.backup/home

   I've never seen the "- -" output from df before. Is this a bind
mount or something?

> root@fex:~# btrfs subvolume show /local/.backup/home
> /local/.backup/home
>         Name:                   home
>         uuid:                   f86a2db0-6a82-124f-9a71-1cd4c20fd6fb
>         Parent uuid:            ba4d388f-44bf-7b46-b2b8-00e2a9a87181
>         Creation time:          2017-08-10 22:19:15
>         Object ID:              383
>         Generation (Gen):       148
>         Gen at creation:        148
>         Parent:                 5
>         Top Level:              5
>         Flags:                  readonly
>         Snapshot(s):
>
>
> I know, the root filesystem is /local, but how can I show it by command?

   Probably in /proc/self/mountinfo -- that should give you the full
set of applied mount options, plus the original source for the mount
(which will be a block device for most filesystem mounts, a path for
bind mounts, or something FS-specific for network filesystems).

   Hugo.

--
Hugo Mills             | And what rough beast, its hour come round at last /
hugo@... carfax.org.uk | slouches towards Bethlehem, to be born?
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                     W.B. Yeats, The Second Coming
finding root filesystem of a subvolume?
How do I find the root filesystem of a subvolume?
Example:

root@fex:~# df -T
Filesystem     Type  1K-blocks      Used Available Use% Mounted on
-              -    1073740800 104244552 967773976  10% /local/.backup/home

root@fex:~# btrfs subvolume show /local/.backup/home
/local/.backup/home
        Name:                   home
        uuid:                   f86a2db0-6a82-124f-9a71-1cd4c20fd6fb
        Parent uuid:            ba4d388f-44bf-7b46-b2b8-00e2a9a87181
        Creation time:          2017-08-10 22:19:15
        Object ID:              383
        Generation (Gen):       148
        Gen at creation:        148
        Parent:                 5
        Top Level:              5
        Flags:                  readonly
        Snapshot(s):


I know, the root filesystem is /local, but how can I show it by command?

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20170822122350.ga14...@rus.uni-stuttgart.de>
Re: [PATCH 3/3] btrfs: Add sanity check for EXTENT_DATA when reading out leaf
On 22.08.2017 14:23, Qu Wenruo wrote:
>
>
> On 2017-08-22 19:00, Nikolay Borisov wrote:
>>
>>
>> On 22.08.2017 13:57, Nikolay Borisov wrote:
>>>
>>>
>>> On 22.08.2017 10:37, Qu Wenruo wrote:
>>>> Add extra checker for item with EXTENT_DATA type.
>>>> This checks the following thing:
>>>> 1) Item size
>>>>    Plain text inline file extent size must match item size.
>>>>    (compressed inline file extent has no info about its on-disk size)
>>>>    Regular/preallocated file extent size must be a fixed value.
>>>>
>>>> 2) Every member of regular file extent item
>>>>    Including alignment for bytenr and offset, possible value for
>>>>    compression/encryption/type.
>>>>
>>>> 3) Type/compression/encode must be one of the valid values.
>>>>
>>>> This should be the most comprehensive and restrict check in the context
>>>> of btrfs_item for EXTENT_DATA.
>>>>
>>>> Signed-off-by: Qu Wenruo
>>>> ---
>>>>  fs/btrfs/disk-io.c              | 88 +
>>>>  include/uapi/linux/btrfs_tree.h |  1 +
>>>>  2 files changed, 89 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 59ee7b959bf0..557f9a520e2a 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -549,6 +549,83 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
>>>>  		   btrfs_header_level(eb) == 0 ? "leaf" : "node",	\
>>>>  		   reason, btrfs_header_bytenr(eb), root->objectid, slot)
>>>>
>>>> +static int check_extent_data_item(struct btrfs_root *root,
>>>> +				  struct extent_buffer *leaf, int slot)
>>>> +{
>>>> +	struct btrfs_file_extent_item *fi;
>>>> +	u32 sectorsize = root->fs_info->sectorsize;
>>>> +	u32 item_size = btrfs_item_size_nr(leaf, slot);
>>>> +
>>>> +	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
>>>> +
>>>> +	if (btrfs_file_extent_type(leaf, fi) >= BTRFS_FILE_EXTENT_LAST_TYPE) {
>>>> +		CORRUPT("invalid file extent type", leaf, root, slot);
>>>> +		return -EIO;
>>>> +	}
>>>> +	if (btrfs_file_extent_compression(leaf, fi) >= BTRFS_COMPRESS_LAST) {
>>>> +		CORRUPT("invalid file extent compression", leaf, root, slot);
>>>> +		return -EIO;
>>>> +	}
>>>> +	if (btrfs_file_extent_encryption(leaf, fi)) {
>>>> +		CORRUPT("invalid file extent encryption", leaf, root, slot);
>>>> +		return -EIO;
>>>> +	}
>>>> +	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
>>>> +		if (btrfs_file_extent_compression(leaf, fi) !=
>>>> +		    BTRFS_COMPRESS_NONE)
>>>> +			return 0;
>>>> +		/* Plaintext inline extent size must match item size */
>>>> +		if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
>>>> +		    btrfs_file_extent_ram_bytes(leaf, fi)) {
>>>> +			CORRUPT("plaintext inline extent has invalid size",
>>>> +				leaf, root, slot);
>>>> +			return -EIO;
>>>> +		}
>>>> +		return 0;
>>>> +	}
>>
>> One more thing - don't we really want to use -EUCLEAN rather than -EIO?
>
> Nice suggestion.
> Since it's not really something wrong with IO routine, EUCLEAN is better.

Yeah, I'm not saying it's wrong. But my mental model for -EIO vs -EUCLEAN
is the following:

- When we write data and something goes wrong, we should return -EIO
  (we basically cover this, since we always used -EIO).

- When we read data and our validity checks on it fail (as is the case
  with your patch), we should return -EUCLEAN.

Basically, the FS needs to ensure that it always feeds valid data to disk,
so the only error on that path can be -EIO. But if the same data is read
back some time later and our internal checks show that it is inconsistent,
we should say so rather than just return -EIO.

I've mentioned this before and as a result David created the following
wiki entry:
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Distinguish_EIO_and_EUCLEAN_types_of_errors

I guess we should start from somewhere :)

>
>>
>>>> +
>>>> +
>>>> +	/* regular or preallocated extent has fixed item size */
>>>> +	if (item_size != sizeof(*fi)) {
>>>> +		CORRUPT(
>>>> +		"regluar or preallocated extent data item size is invalid",
>>>> +			leaf, root, slot);
>>>> +		return -EIO;
>>>> +	}
>>>> +	if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) ||
>>>> +	    !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) ||
>>>> +	    !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi),
>>>> +			sectorsize) ||
>>>> +	    !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) ||
>>>> +	    !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) {
>>>> +		CORRUPT(
>>>> +	"regular or preallocated extent data item has unaligned value",
>>>> +			leaf, root, slot);
>>>> +		return -EIO;
>>>> +
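The -EIO versus -EUCLEAN distinction argued for in this message can be shown
with a small userspace sketch. It is not the kernel code from the patch:
check_aligned_on_read() is a hypothetical stand-in for one of the alignment
tests in check_extent_data_item(), and the fallback value 117 for EUCLEAN is
the Linux errno number, defined here only in case the libc headers lack it.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux: "Structure needs cleaning" */
#endif

/*
 * Read-time sanity check on a value that was successfully read from disk.
 * The I/O itself worked, so a failed check reports inconsistent metadata
 * (-EUCLEAN) rather than an I/O error (-EIO).
 * sectorsize is assumed to be a power of two.
 */
static int check_aligned_on_read(unsigned long long value,
				 unsigned int sectorsize)
{
	if (value & ((unsigned long long)sectorsize - 1))
		return -EUCLEAN;
	return 0;
}
```

A caller can then tell "the device failed" apart from "the device returned
data that does not make sense", which is the point of the wiki entry linked
above.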
Re: [PATCH 3/3] btrfs: Add sanity check for EXTENT_DATA when reading out leaf
On 2017年08月22日 19:00, Nikolay Borisov wrote: On 22.08.2017 13:57, Nikolay Borisov wrote: On 22.08.2017 10:37, Qu Wenruo wrote: Add extra checker for item with EXTENT_DATA type. This checks the following thing: 1) Item size Plain text inline file extent size must match item size. (compressed inline file extent has no info about its on-disk size) Regular/preallocated file extent size must be a fixed value. 2) Every member of regular file extent item Including alignment for bytenr and offset, possible value for compression/encryption/type. 3) Type/compression/encode must be one of the valid values. This should be the most comprehensive and restrict check in the context of btrfs_item for EXTENT_DATA. Signed-off-by: Qu Wenruo--- fs/btrfs/disk-io.c | 88 + include/uapi/linux/btrfs_tree.h | 1 + 2 files changed, 89 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 59ee7b959bf0..557f9a520e2a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -549,6 +549,83 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info, btrfs_header_level(eb) == 0 ? 
"leaf" : "node", \ reason, btrfs_header_bytenr(eb), root->objectid, slot) +static int check_extent_data_item(struct btrfs_root *root, + struct extent_buffer *leaf, int slot) +{ + struct btrfs_file_extent_item *fi; + u32 sectorsize = root->fs_info->sectorsize; + u32 item_size = btrfs_item_size_nr(leaf, slot); + + fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); + + if (btrfs_file_extent_type(leaf, fi) >= BTRFS_FILE_EXTENT_LAST_TYPE) { + CORRUPT("invalid file extent type", leaf, root, slot); + return -EIO; + } + if (btrfs_file_extent_compression(leaf, fi) >= BTRFS_COMPRESS_LAST) { + CORRUPT("invalid file extent compression", leaf, root, slot); + return -EIO; + } + if (btrfs_file_extent_encryption(leaf, fi)) { + CORRUPT("invalid file extent encryption", leaf, root, slot); + return -EIO; + } + if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) { + if (btrfs_file_extent_compression(leaf, fi) != + BTRFS_COMPRESS_NONE) + return 0; + /* Plaintext inline extent size must match item size */ + if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START + + btrfs_file_extent_ram_bytes(leaf, fi)) { + CORRUPT("plaintext inline extent has invalid size", + leaf, root, slot); + return -EIO; + } + return 0; + } One more thing - don't we really want to use -EUCLEAN rather than -EIO? Nice suggestion. Since it's not really something wrong with IO routine, EUCLEAN is better. 
+ + + /* regular or preallocated extent has fixed item size */ + if (item_size != sizeof(*fi)) { + CORRUPT( + "regluar or preallocated extent data item size is invalid", + leaf, root, slot); + return -EIO; + } + if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi), + sectorsize) || + !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) { + CORRUPT( + "regular or preallocated extent data item has unaligned value", + leaf, root, slot); + return -EIO; + } + + return 0; +} + +static int check_leaf_item(struct btrfs_root *root, + struct extent_buffer *leaf, int slot) +{ + struct btrfs_key key; + int ret = 0; + + btrfs_item_key_to_cpu(leaf, , slot); nit: We already have the key in the proper format in the caller of this function. Why not just pass in the type as an argument and save a redundant call for every item in a leaf? Perhaps it's a microoptimisation but for very densely populated trees the miniature cost might build up. Sounds valid. Considering how many times this item_key_to_cpu() get called in a large leaf, micro-optimization counts. I'll update this in next revision. Thanks for your review, Qu + /* +* Considering how overcrowded the code will be inside the switch, +* complex verification is better to moved its own function. +*/ + switch (key.type) { + case BTRFS_EXTENT_DATA_KEY: + ret = check_extent_data_item(root, leaf, slot); + break; +
Re: [PATCH 3/3] btrfs: Add sanity check for EXTENT_DATA when reading out leaf
On 22.08.2017 13:57, Nikolay Borisov wrote:
>
>
> On 22.08.2017 10:37, Qu Wenruo wrote:
>> Add extra checker for item with EXTENT_DATA type.
>> This checks the following things:
>> 1) Item size
>>    Plain text inline file extent size must match item size.
>>    (compressed inline file extent has no info about its on-disk size)
>>    Regular/preallocated file extent size must be a fixed value.
>>
>> 2) Every member of regular file extent item
>>    Including alignment for bytenr and offset, possible value for
>>    compression/encryption/type.
>>
>> 3) Type/compression/encode must be one of the valid values.
>>
>> This should be the most comprehensive and strict check in the context
>> of btrfs_item for EXTENT_DATA.
>>
>> Signed-off-by: Qu Wenruo
>> ---
>>  fs/btrfs/disk-io.c              | 88 +
>>  include/uapi/linux/btrfs_tree.h |  1 +
>>  2 files changed, 89 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 59ee7b959bf0..557f9a520e2a 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -549,6 +549,83 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
>>  		   btrfs_header_level(eb) == 0 ? "leaf" : "node",	\
>>  		   reason, btrfs_header_bytenr(eb), root->objectid, slot)
>>
>> +static int check_extent_data_item(struct btrfs_root *root,
>> +				  struct extent_buffer *leaf, int slot)
>> +{
>> +	struct btrfs_file_extent_item *fi;
>> +	u32 sectorsize = root->fs_info->sectorsize;
>> +	u32 item_size = btrfs_item_size_nr(leaf, slot);
>> +
>> +	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
>> +
>> +	if (btrfs_file_extent_type(leaf, fi) >= BTRFS_FILE_EXTENT_LAST_TYPE) {
>> +		CORRUPT("invalid file extent type", leaf, root, slot);
>> +		return -EIO;
>> +	}
>> +	if (btrfs_file_extent_compression(leaf, fi) >= BTRFS_COMPRESS_LAST) {
>> +		CORRUPT("invalid file extent compression", leaf, root, slot);
>> +		return -EIO;
>> +	}
>> +	if (btrfs_file_extent_encryption(leaf, fi)) {
>> +		CORRUPT("invalid file extent encryption", leaf, root, slot);
>> +		return -EIO;
>> +	}
>> +	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
>> +		if (btrfs_file_extent_compression(leaf, fi) !=
>> +		    BTRFS_COMPRESS_NONE)
>> +			return 0;
>> +		/* Plaintext inline extent size must match item size */
>> +		if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
>> +		    btrfs_file_extent_ram_bytes(leaf, fi)) {
>> +			CORRUPT("plaintext inline extent has invalid size",
>> +				leaf, root, slot);
>> +			return -EIO;
>> +		}
>> +		return 0;
>> +	}

One more thing - don't we really want to use -EUCLEAN rather than -EIO?

>> +
>> +
>> +	/* regular or preallocated extent has fixed item size */
>> +	if (item_size != sizeof(*fi)) {
>> +		CORRUPT(
>> +		"regular or preallocated extent data item size is invalid",
>> +			leaf, root, slot);
>> +		return -EIO;
>> +	}
>> +	if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) ||
>> +	    !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) ||
>> +	    !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi),
>> +			sectorsize) ||
>> +	    !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) ||
>> +	    !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) {
>> +		CORRUPT(
>> +		"regular or preallocated extent data item has unaligned value",
>> +			leaf, root, slot);
>> +		return -EIO;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int check_leaf_item(struct btrfs_root *root,
>> +			   struct extent_buffer *leaf, int slot)
>> +{
>> +	struct btrfs_key key;
>> +	int ret = 0;
>> +
>> +	btrfs_item_key_to_cpu(leaf, &key, slot);
>
> nit: We already have the key in the proper format in the caller of this
> function. Why not just pass in the type as an argument and save a
> redundant call for every item in a leaf? Perhaps it's a
> microoptimisation but for very densely populated trees the miniature
> cost might build up.
>
>> +	/*
>> +	 * Considering how overcrowded the code will be inside the switch,
>> +	 * complex verification is better moved to its own function.
>> +	 */
>> +	switch (key.type) {
>> +	case BTRFS_EXTENT_DATA_KEY:
>> +		ret = check_extent_data_item(root, leaf, slot);
>> +		break;
>> +	}
>> +	return ret;
>> +}
>> +
>>  static noinline int check_leaf(struct btrfs_root *root,
>>  			       struct extent_buffer *leaf)
>>  {
>> @@ -605,9 +682,13 @@ static noinline int
Re: [PATCH 0/3] Introduce comprehensive sanity check framework and
On 22.08.2017 10:37, Qu Wenruo wrote:
> The patchset introduces a new framework to do a more comprehensive (if not
> the most comprehensive) sanity check when reading out a leaf.
>
> The new sanity checker will include:
>
> 1) Key order
>    Existing code
>
> 2) Item boundary
>    Existing code, with an enhanced checker to ensure the item pointer
>    doesn't overlap with the item itself.
>
> 3) Key type based sanity checker
>    Only the EXTENT_DATA checker is implemented so far.
>    Each checker should go through review and tests, or it can easily
>    make a valid btrfs fail to mount.
>    So only one checker is implemented as an example.
>
>    Existing checkers like the INODE_REF checker can be moved to this
>    framework easily, and we can centralize all existing checkers, making
>    the rest of the code cleaner.
>
> Performance wise, it's just iterating a leaf.
> And it will only get triggered when reading out a leaf; a cached leaf
> will not go through such a checker.
> So it won't be a performance breaker.
>
> I tested with the patchset applied on v4.13-rc6 with fstests; no
> regression is detected.
>
> Qu Wenruo (3):
>   btrfs: Refactor check_leaf function for later expansion.
>   btrfs: Check if item pointer overlap with item itself
>   btrfs: Add sanity check for EXTENT_DATA when reading out leaf

I have one minor comment on 3/3, which I've sent separately, but otherwise
this series looks good and I like the direction it's steering future
code into. For the whole series:

Reviewed-by: Nikolay Borisov

>
>  fs/btrfs/disk-io.c              | 137 ++--
>  include/uapi/linux/btrfs_tree.h |   1 +
>  2 files changed, 119 insertions(+), 19 deletions(-)
>
Re: [PATCH 3/3] btrfs: Add sanity check for EXTENT_DATA when reading out leaf
On 22.08.2017 10:37, Qu Wenruo wrote:
> Add extra checker for item with EXTENT_DATA type.
> This checks the following things:
> 1) Item size
>    Plain text inline file extent size must match item size.
>    (compressed inline file extent has no info about its on-disk size)
>    Regular/preallocated file extent size must be a fixed value.
>
> 2) Every member of regular file extent item
>    Including alignment for bytenr and offset, possible value for
>    compression/encryption/type.
>
> 3) Type/compression/encode must be one of the valid values.
>
> This should be the most comprehensive and strict check in the context
> of btrfs_item for EXTENT_DATA.
>
> Signed-off-by: Qu Wenruo
> ---
>  fs/btrfs/disk-io.c              | 88 +
>  include/uapi/linux/btrfs_tree.h |  1 +
>  2 files changed, 89 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 59ee7b959bf0..557f9a520e2a 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -549,6 +549,83 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
>  		   btrfs_header_level(eb) == 0 ? "leaf" : "node",	\
>  		   reason, btrfs_header_bytenr(eb), root->objectid, slot)
>
> +static int check_extent_data_item(struct btrfs_root *root,
> +				  struct extent_buffer *leaf, int slot)
> +{
> +	struct btrfs_file_extent_item *fi;
> +	u32 sectorsize = root->fs_info->sectorsize;
> +	u32 item_size = btrfs_item_size_nr(leaf, slot);
> +
> +	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
> +
> +	if (btrfs_file_extent_type(leaf, fi) >= BTRFS_FILE_EXTENT_LAST_TYPE) {
> +		CORRUPT("invalid file extent type", leaf, root, slot);
> +		return -EIO;
> +	}
> +	if (btrfs_file_extent_compression(leaf, fi) >= BTRFS_COMPRESS_LAST) {
> +		CORRUPT("invalid file extent compression", leaf, root, slot);
> +		return -EIO;
> +	}
> +	if (btrfs_file_extent_encryption(leaf, fi)) {
> +		CORRUPT("invalid file extent encryption", leaf, root, slot);
> +		return -EIO;
> +	}
> +	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
> +		if (btrfs_file_extent_compression(leaf, fi) !=
> +		    BTRFS_COMPRESS_NONE)
> +			return 0;
> +		/* Plaintext inline extent size must match item size */
> +		if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
> +		    btrfs_file_extent_ram_bytes(leaf, fi)) {
> +			CORRUPT("plaintext inline extent has invalid size",
> +				leaf, root, slot);
> +			return -EIO;
> +		}
> +		return 0;
> +	}
> +
> +
> +	/* regular or preallocated extent has fixed item size */
> +	if (item_size != sizeof(*fi)) {
> +		CORRUPT(
> +		"regular or preallocated extent data item size is invalid",
> +			leaf, root, slot);
> +		return -EIO;
> +	}
> +	if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) ||
> +	    !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) ||
> +	    !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi),
> +			sectorsize) ||
> +	    !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) ||
> +	    !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) {
> +		CORRUPT(
> +		"regular or preallocated extent data item has unaligned value",
> +			leaf, root, slot);
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
> +static int check_leaf_item(struct btrfs_root *root,
> +			   struct extent_buffer *leaf, int slot)
> +{
> +	struct btrfs_key key;
> +	int ret = 0;
> +
> +	btrfs_item_key_to_cpu(leaf, &key, slot);

nit: We already have the key in the proper format in the caller of this
function. Why not just pass in the type as an argument and save a
redundant call for every item in a leaf? Perhaps it's a
microoptimisation but for very densely populated trees the miniature
cost might build up.

> +	/*
> +	 * Considering how overcrowded the code will be inside the switch,
> +	 * complex verification is better moved to its own function.
> +	 */
> +	switch (key.type) {
> +	case BTRFS_EXTENT_DATA_KEY:
> +		ret = check_extent_data_item(root, leaf, slot);
> +		break;
> +	}
> +	return ret;
> +}
> +
>  static noinline int check_leaf(struct btrfs_root *root,
>  			       struct extent_buffer *leaf)
>  {
> @@ -605,9 +682,13 @@ static noinline int check_leaf(struct btrfs_root *root,
>  	 * 1) key order
>  	 * 2) item offset and size
>  	 *    No overlap, no hole, all inside the leaf.
> +	 * 3) item content
> +
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
On Tue, 22 Aug 2017 11:31:23 +0200 g6094...@freenet.de wrote:

> So 1st should be investigating why did the disk not get removed
> correctly? Btrfs dev del should remove the device correctly, right? Is
> there a bug?

It should, and it probably did. To check that, we need to see the output of

  btrfs filesystem show

and the output of

  btrfs filesystem usage

If there are non-raid1 chunks, then you need to do a soft balance:

  btrfs balance start -mconvert=raid1,soft -dconvert=raid1,soft

The balance should finish very quickly, as you probably have only one
single chunk each of data and metadata. They appeared during writes when
the filesystem was mounted read-write in degraded mode.
RE: [PATCH 00/15] Btrfs-progs offline scrub
Ping

-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Gu Jinxiang
Sent: Tuesday, July 18, 2017 2:34 PM
To: linux-btrfs@vger.kernel.org
Cc: quwenruo.bt...@gmx.com
Subject: [PATCH 00/15] Btrfs-progs offline scrub

For anyone who wants to try it, it can be fetched from my repo:
https://github.com/gujx2017/btrfs-progs/tree/offline_scrub

This v5 just makes some small fixups to comments in the remaining 15
patches, according to problems pointed out by David when merging the
first 5 patches of this patchset.
It is also rebased to 93a9004dde410d920f08f85c6365e138713992d8.

Several reports of kernel scrub screwing up good data stripes have been
on the ML for some time.

And since kernel scrub won't account for P/Q corruption, it is quite hard
for us to detect errors like the kernel screwing up P/Q when scrubbing.

To get a comparable tool for kernel scrub, we need a user-space tool to
act as a benchmark to compare their different behaviors.

So here is the patchset for user-space scrub.

It can do:

1) All mirror/backup check for non-parity based stripe
   Which means for RAID1/DUP/RAID10, we can really check all mirrors
   other than the 1st good mirror.

   The current "--check-data-csum" option should finally be replaced by
   offline scrub.
   "--check-data-csum" doesn't really check all mirrors: if it hits
   a good copy, then the remaining copies will just be ignored.

   In the v4 update, the data check is further improved. Inspired by kernel
   behavior, a data extent is now checked sector by sector, so it can
   handle the following corruption case:

   Data extent A contains data from 0~28K.
   And |///| = corrupted  |   | = good

             0   4k  8k  12k 16k 20k 24k 28k
   Mirror 0  |///|   |///|   |///|   |   |
   Mirror 1  |   |///|   |///|   |///|   |

   Extent A should be RECOVERABLE, while in v3 we treat data extent A as
   a whole unit, so the above case is reported as CORRUPTED.

2) RAID5/6 full stripe check
   It will make full use of btrfs csum (both tree and data).
   It will only recover the full stripe if all recovered data matches
   its csum.

   NOTE: Due to the lack of good bitmap facilities, RAID56 sector by
   sector repair will be quite complex, especially when NODATASUM is
   involved.

   So current RAID56 doesn't support vertical sector recovery yet.

   Data extent A contains data from 0~64K
   And |///| = corrupted while |   | = good

                  0   8K  16K 24K 32K 40K 48K 56K 64K
   Data stripe 0  |///|   |///|   |///|   |///|   |
   Data stripe 1  |   |///|   |///|   |///|   |///|
   Parity         |   |   |   |   |   |   |   |   |

   The kernel will recover it, while the current scrub will report it as
   CORRUPTED.

3) Repair
   In the v4 update, repair is finally added.

This patchset also introduces a new btrfs_map_block() function, which is
more flexible than the current btrfs_map_block() and has a unified
interface for all profiles, not just an extra array for RAID56.

Check the 6th and 7th patches for details.

They are already used in RAID5/6 scrub, but can also be used for other
profiles.

The to-do list has been shortened, since repair was added in the v4 update.

1) Test cases
   Need to make the infrastructure able to handle multi-device first.

2) Make btrfsck able to handle RAID5 with a missing device
   Now it doesn't even open RAID5 btrfs with a missing device, even though
   scrub should be able to handle it.

3) RAID56 vertical sector repair
   I consider this case minor compared to RAID1 vertical sector repair.
   For RAID1, an extent can be as large as 128M, while for RAID56 one
   stripe will always be 64K, much smaller than the RAID1 case, making
   the possibility lower.

I prefer to add this function after the patchset gets merged, as no one
really likes getting 20 mails every time I update the patchset.

For those who want to review the patchset, there is a slide showing the
basic function relationships.
I hope this will reduce the time needed to understand what the patchset
is doing.
https://docs.google.com/presentation/d/1tAU3lUVaRUXooSjhFaDUeyW3wauHDSg9H-AiLBOSuIM/edit?usp=sharing

Changelog:
V0.8 RFC:
   Initial RFC patchset

v1:
   First formal patchset.
   RAID6 recovery support added, mainly copied from kernel raid6 lib.
   Cleaner recovery logic.

v2:
   More comments in both code and commit message, suggested by David.
   File re-arrangement, no check/ dir, raid56.[ch] moved to kernel-lib,
   suggested by David.

v3:
   Put the "--offline" option in scrub rather than in fsck.
   Use bitmap to read multiple csums in one run, to improve performance.
   Add --progress/--no-progress option, to tell the user we're not just
   wasting CPU and IO.

v4:
   Improve data check. Make data extents be checked sector by sector.
   And add support for repair.

Gu Jinxiang (1):
  btrfs-progs: Introduce new btrfs_map_block function which returns more
    unified result.

Qu Wenruo (14):
  btrfs-progs: Allow __btrfs_map_block_v2 to remove
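
The difference between the v3 whole-extent check and the v4 sector-by-sector check for mirrored profiles can be sketched as follows. This is only an illustration of the idea in the cover letter, not code from the patchset; the function names and the flat `bad[]` layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: bad[m * nr_sectors + s] is nonzero when sector s of
 * mirror m fails its checksum.
 */

/* v3-style rule: the extent is recoverable only if one mirror is fully good. */
static int demo_recoverable_whole_extent(const unsigned char *bad,
					 size_t nr_mirrors, size_t nr_sectors)
{
	size_t m, s;

	for (m = 0; m < nr_mirrors; m++) {
		int clean = 1;

		for (s = 0; s < nr_sectors; s++) {
			if (bad[m * nr_sectors + s]) {
				clean = 0;
				break;
			}
		}
		if (clean)
			return 1;
	}
	return 0;
}

/* v4-style rule: every sector needs at least one mirror holding a good copy. */
static int demo_recoverable_per_sector(const unsigned char *bad,
				       size_t nr_mirrors, size_t nr_sectors)
{
	size_t m, s;

	for (s = 0; s < nr_sectors; s++) {
		int have_good = 0;

		for (m = 0; m < nr_mirrors; m++) {
			if (!bad[m * nr_sectors + s]) {
				have_good = 1;
				break;
			}
		}
		if (!have_good)
			return 0;
	}
	return 1;
}
```

Fed the Mirror 0 / Mirror 1 pattern from the diagram (seven 4k sectors), the whole-extent rule reports the extent as lost while the per-sector rule reports it as recoverable.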
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Hi guys,

picking up this old topic because I'm running into a similar problem.
Running an Ubuntu 16.04 (HWE, kernel 4.8) server with 2 NVMe SSDs as
RAID1 as /.
Since one NVMe died, I had to replace it, which is where the trouble began.
I replaced the NVMe, booted degraded, added the new disk to the RAID
(btrfs dev add) and removed the missing/dead device (btrfs dev del).
Everything worked well. BUT when I rebooted, I ran into the "BTRFS RAID 1
not mountable: open_ctree failed, unable to find block group for 0"
because of a MISSING disk?!
I checked the btrfs list and found that there was a patch that enabled
stricter behavior in handling missing devices (at the moment I can't find
the related patch anymore), which was merged some kernels before 4.8 but
was NOT in 4.4.
So I managed to install the Ubuntu 4.4 kernel, and the system started
booting and working again.
The pity is that I can't update this production machine to anything
after 4.4. :-(

So, 1st, we should investigate why the disk did not get removed
correctly. Btrfs dev del should remove the device correctly, right? Is
there a bug?
2nd, was the restriction on handling missing devices too strict? Is
there a bug?
3rd, I saw https://patchwork.kernel.org/patch/9419189/ from Roman. Did
he receive any comments on his patch? This one could help with this
problem, too.

Regards
Sash
[PATCH 1/3] btrfs: Refactor check_leaf function for later expansion.
Current check_leaf() function does a good job checking key orders and
item offset/size.

However it only checks from slot 0 to the last but one slot, this is
good but makes later expansion hard.

So this refactoring iterates from slot 0 to the last slot.
For key comparison, it uses a key with all 0 as initial key, so all
valid key should be larger than it.

And for item size/offset check, it compares current item end with
previous item offset.
For slot 0, use leaf end as special case.

This makes later item/key offset check and item size check easier to be
implemented.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/disk-io.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 080e2ebb8aa0..919ddd4b774c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -553,8 +553,9 @@ static noinline int check_leaf(struct btrfs_root *root,
 			       struct extent_buffer *leaf)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
+	/* No valid key type is 0, so all key should be larger than this key */
+	struct btrfs_key prev_key = {0, 0, 0};
 	struct btrfs_key key;
-	struct btrfs_key leaf_key;
 	u32 nritems = btrfs_header_nritems(leaf);
 	int slot;
 
@@ -597,26 +598,21 @@ static noinline int check_leaf(struct btrfs_root *root,
 	if (nritems == 0)
 		return 0;
 
-	/* Check the 0 item */
-	if (btrfs_item_offset_nr(leaf, 0) + btrfs_item_size_nr(leaf, 0) !=
-	    BTRFS_LEAF_DATA_SIZE(fs_info)) {
-		CORRUPT("invalid item offset size pair", leaf, root, 0);
-		return -EIO;
-	}
-
 	/*
-	 * Check to make sure each items keys are in the correct order and their
-	 * offsets make sense. We only have to loop through nritems-1 because
-	 * we check the current slot against the next slot, which verifies the
-	 * next slot's offset+size makes sense and that the current's slot
-	 * offset is correct.
+	 * Check the following things to make sure this is a good leaf, and
+	 * leaf users won't need to bother similar sanity check:
+	 *
+	 * 1) key order
+	 * 2) item offset and size
+	 *    No overlap, no hole, all inside the leaf.
 	 */
-	for (slot = 0; slot < nritems - 1; slot++) {
-		btrfs_item_key_to_cpu(leaf, &leaf_key, slot);
-		btrfs_item_key_to_cpu(leaf, &key, slot + 1);
+	for (slot = 0; slot < nritems; slot++) {
+		u32 item_end_expected;
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
 
 		/* Make sure the keys are in the right order */
-		if (btrfs_comp_cpu_keys(&leaf_key, &key) >= 0) {
+		if (btrfs_comp_cpu_keys(&prev_key, &key) >= 0) {
 			CORRUPT("bad key order", leaf, root, slot);
 			return -EIO;
 		}
@@ -626,8 +622,12 @@ static noinline int check_leaf(struct btrfs_root *root,
 		 * item data starts at the end of the leaf and grows towards the
 		 * front.
 		 */
-		if (btrfs_item_offset_nr(leaf, slot) !=
-			btrfs_item_end_nr(leaf, slot + 1)) {
+		if (slot == 0)
+			item_end_expected = BTRFS_LEAF_DATA_SIZE(fs_info);
+		else
+			item_end_expected = btrfs_item_offset_nr(leaf,
+								 slot - 1);
+		if (btrfs_item_end_nr(leaf, slot) != item_end_expected) {
 			CORRUPT("slot offset bad", leaf, root, slot);
 			return -EIO;
 		}
@@ -642,6 +642,10 @@ static noinline int check_leaf(struct btrfs_root *root,
 			CORRUPT("slot end outside of leaf", leaf, root, slot);
 			return -EIO;
 		}
+
+		prev_key.objectid = key.objectid;
+		prev_key.type = key.type;
+		prev_key.offset = key.offset;
 	}
 
 	return 0;
-- 
2.14.1
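
For readers who want to experiment with the loop structure outside the kernel, here is a minimal user-space sketch of the same idea: start from an all-zero previous key, and compare each item's end with the previous item's offset (or with the leaf data size for slot 0). The `demo_*` types and the negative return convention are assumptions made for this example, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

struct demo_item {
	struct demo_key key;
	uint32_t offset;	/* start of the item data inside the data area */
	uint32_t size;		/* length of the item data */
};

/* Order keys by objectid, then type, then offset. */
static int demo_comp_keys(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Returns 0 for a sane leaf, or -(slot + 1) for the first bad slot. */
static int demo_check_leaf(const struct demo_item *items, size_t nritems,
			   uint32_t leaf_data_size)
{
	/* No valid key is all zero, so every valid key compares larger. */
	struct demo_key prev_key = {0, 0, 0};
	size_t slot;

	for (slot = 0; slot < nritems; slot++) {
		uint64_t item_end_expected;
		uint64_t item_end;

		/* 1) key order: strictly increasing */
		if (demo_comp_keys(&prev_key, &items[slot].key) >= 0)
			return -(int)(slot + 1);

		/* 2) item data grows from the leaf end towards the front */
		if (slot == 0)
			item_end_expected = leaf_data_size;
		else
			item_end_expected = items[slot - 1].offset;
		item_end = (uint64_t)items[slot].offset + items[slot].size;
		if (item_end != item_end_expected)
			return -(int)(slot + 1);

		prev_key = items[slot].key;
	}
	return 0;
}
```

Because every slot is handled by the same loop body, a later per-item check can be added at the end of the body without a special case for the first or last slot.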
[PATCH 3/3] btrfs: Add sanity check for EXTENT_DATA when reading out leaf
Add extra checker for item with EXTENT_DATA type.
This checks the following things:
1) Item size
   Plain text inline file extent size must match item size.
   (compressed inline file extent has no info about its on-disk size)
   Regular/preallocated file extent size must be a fixed value.

2) Every member of regular file extent item
   Including alignment for bytenr and offset, possible value for
   compression/encryption/type.

3) Type/compression/encode must be one of the valid values.

This should be the most comprehensive and strict check in the context
of btrfs_item for EXTENT_DATA.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/disk-io.c              | 88 +
 include/uapi/linux/btrfs_tree.h |  1 +
 2 files changed, 89 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 59ee7b959bf0..557f9a520e2a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -549,6 +549,83 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
 		   btrfs_header_level(eb) == 0 ? "leaf" : "node",	\
 		   reason, btrfs_header_bytenr(eb), root->objectid, slot)
 
+static int check_extent_data_item(struct btrfs_root *root,
+				  struct extent_buffer *leaf, int slot)
+{
+	struct btrfs_file_extent_item *fi;
+	u32 sectorsize = root->fs_info->sectorsize;
+	u32 item_size = btrfs_item_size_nr(leaf, slot);
+
+	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+	if (btrfs_file_extent_type(leaf, fi) >= BTRFS_FILE_EXTENT_LAST_TYPE) {
+		CORRUPT("invalid file extent type", leaf, root, slot);
+		return -EIO;
+	}
+	if (btrfs_file_extent_compression(leaf, fi) >= BTRFS_COMPRESS_LAST) {
+		CORRUPT("invalid file extent compression", leaf, root, slot);
+		return -EIO;
+	}
+	if (btrfs_file_extent_encryption(leaf, fi)) {
+		CORRUPT("invalid file extent encryption", leaf, root, slot);
+		return -EIO;
+	}
+	if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
+		if (btrfs_file_extent_compression(leaf, fi) !=
+		    BTRFS_COMPRESS_NONE)
+			return 0;
+		/* Plaintext inline extent size must match item size */
+		if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
+		    btrfs_file_extent_ram_bytes(leaf, fi)) {
+			CORRUPT("plaintext inline extent has invalid size",
+				leaf, root, slot);
+			return -EIO;
+		}
+		return 0;
+	}
+
+
+	/* regular or preallocated extent has fixed item size */
+	if (item_size != sizeof(*fi)) {
+		CORRUPT(
+		"regular or preallocated extent data item size is invalid",
+			leaf, root, slot);
+		return -EIO;
+	}
+	if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) ||
+	    !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) ||
+	    !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi),
+			sectorsize) ||
+	    !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) ||
+	    !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) {
+		CORRUPT(
+		"regular or preallocated extent data item has unaligned value",
+			leaf, root, slot);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int check_leaf_item(struct btrfs_root *root,
+			   struct extent_buffer *leaf, int slot)
+{
+	struct btrfs_key key;
+	int ret = 0;
+
+	btrfs_item_key_to_cpu(leaf, &key, slot);
+	/*
+	 * Considering how overcrowded the code will be inside the switch,
+	 * complex verification is better moved to its own function.
+	 */
+	switch (key.type) {
+	case BTRFS_EXTENT_DATA_KEY:
+		ret = check_extent_data_item(root, leaf, slot);
+		break;
+	}
+	return ret;
+}
+
 static noinline int check_leaf(struct btrfs_root *root,
 			       struct extent_buffer *leaf)
 {
@@ -605,9 +682,13 @@ static noinline int check_leaf(struct btrfs_root *root,
 	 * 1) key order
 	 * 2) item offset and size
 	 *    No overlap, no hole, all inside the leaf.
+	 * 3) item content
+	 *    If possible, do comprehensive sanity check.
+	 *    NOTE: All check must only rely on the item data itself.
 	 */
 	for (slot = 0; slot < nritems; slot++) {
 		u32 item_end_expected;
+		int ret;
 
 		btrfs_item_key_to_cpu(leaf, &key, slot);
 
@@ -650,6 +731,13 @@ static noinline int check_leaf(struct btrfs_root *root,
 			return -EIO;
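
The rules in the commit message can be tried out in user space with a small model. The sketch below mirrors the order of the checks in check_extent_data_item(), but the `demo_*` struct, the enum values and the two size constants are assumptions chosen for the example; they are not taken from the kernel headers. It returns -EIO like the patch does; whether -EUCLEAN would be a better fit is the question raised in review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative constants; names and values are assumptions for this sketch. */
enum {
	DEMO_EXTENT_INLINE = 0,
	DEMO_EXTENT_REG = 1,
	DEMO_EXTENT_PREALLOC = 2,
	DEMO_EXTENT_LAST_TYPE = 3,
};
enum {
	DEMO_COMPRESS_NONE = 0,
	DEMO_COMPRESS_LAST = 3,
};
#define DEMO_INLINE_DATA_START	21u	/* bytes of header before inline data */
#define DEMO_EXTENT_ITEM_SIZE	53u	/* size of a regular/prealloc item */

/* Hypothetical in-memory copy of the fields the checker looks at. */
struct demo_file_extent {
	uint8_t type;
	uint8_t compression;
	uint8_t encryption;
	uint64_t ram_bytes;
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;
	uint64_t num_bytes;
};

/* sectorsize is assumed to be a power of two. */
static int demo_is_aligned(uint64_t value, uint32_t sectorsize)
{
	return (value & ((uint64_t)sectorsize - 1)) == 0;
}

static int demo_check_extent_data(const struct demo_file_extent *fi,
				  uint32_t item_size, uint32_t sectorsize)
{
	/* Type, compression and encryption must be known values. */
	if (fi->type >= DEMO_EXTENT_LAST_TYPE)
		return -EIO;
	if (fi->compression >= DEMO_COMPRESS_LAST)
		return -EIO;
	if (fi->encryption)
		return -EIO;

	if (fi->type == DEMO_EXTENT_INLINE) {
		/* A compressed inline extent carries no on-disk size to verify. */
		if (fi->compression != DEMO_COMPRESS_NONE)
			return 0;
		/* A plaintext inline extent must exactly fill the item. */
		if (item_size != DEMO_INLINE_DATA_START + fi->ram_bytes)
			return -EIO;
		return 0;
	}

	/* Regular and preallocated extents have a fixed item size. */
	if (item_size != DEMO_EXTENT_ITEM_SIZE)
		return -EIO;
	/* Every byte-valued member must be sector aligned. */
	if (!demo_is_aligned(fi->ram_bytes, sectorsize) ||
	    !demo_is_aligned(fi->disk_bytenr, sectorsize) ||
	    !demo_is_aligned(fi->disk_num_bytes, sectorsize) ||
	    !demo_is_aligned(fi->offset, sectorsize) ||
	    !demo_is_aligned(fi->num_bytes, sectorsize))
		return -EIO;

	return 0;
}
```

Note that every check uses only the item's own bytes plus the sector size, which matches the "must only rely on the item data itself" rule stated in the check_leaf() comment.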
[PATCH 0/3] Introduce comprehensive sanity check framework and
The patchset introduces a new framework to do a more comprehensive (if not
the most comprehensive) sanity check when reading out a leaf.

The new sanity checker will include:

1) Key order
   Existing code

2) Item boundary
   Existing code, with an enhanced checker to ensure the item pointer
   doesn't overlap with the item itself.

3) Key type based sanity checker
   Only the EXTENT_DATA checker is implemented so far.
   Each checker should go through review and tests, or it can easily
   make a valid btrfs fail to mount.
   So only one checker is implemented as an example.

   Existing checkers like the INODE_REF checker can be moved to this
   framework easily, and we can centralize all existing checkers, making
   the rest of the code cleaner.

Performance wise, it's just iterating a leaf.
And it will only get triggered when reading out a leaf; a cached leaf
will not go through such a checker.
So it won't be a performance breaker.

I tested with the patchset applied on v4.13-rc6 with fstests; no
regression is detected.

Qu Wenruo (3):
  btrfs: Refactor check_leaf function for later expansion.
  btrfs: Check if item pointer overlap with item itself
  btrfs: Add sanity check for EXTENT_DATA when reading out leaf

 fs/btrfs/disk-io.c              | 137 ++--
 include/uapi/linux/btrfs_tree.h |   1 +
 2 files changed, 119 insertions(+), 19 deletions(-)

-- 
2.14.1
[PATCH 2/3] btrfs: Check if item pointer overlap with item itself
Function check_leaf() checks if any item pointer points outside of the
leaf, but it doesn't check if the pointer overlaps with the item itself.

Normally only the last item may be the victim, but adding such a check is
never a bad idea anyway.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/disk-io.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 919ddd4b774c..59ee7b959bf0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -643,6 +643,13 @@ static noinline int check_leaf(struct btrfs_root *root,
 			return -EIO;
 		}
 
+		/* Also check if the item pointer overlaps with btrfs item. */
+		if (btrfs_item_nr_offset(slot) + sizeof(struct btrfs_item) >
+		    btrfs_item_ptr_offset(leaf, slot)) {
+			CORRUPT("slot overlap with its data", leaf, root, slot);
+			return -EIO;
+		}
+
 		prev_key.objectid = key.objectid;
 		prev_key.type = key.type;
 		prev_key.offset = key.offset;
-- 
2.14.1
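
To see why only the last item is normally at risk, it helps to model the leaf layout: item headers are packed forward from the leaf header, while item data is packed backward from the leaf end, so the header of the highest slot sits closest to the lowest data offset. The sketch below models the comparison from the patch; the two size constants and the function name are assumptions for illustration and should not be read as the exact on-disk sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sizes; treat them as assumptions, not as the kernel's values. */
#define DEMO_HEADER_SIZE	101u	/* leaf header before the item array */
#define DEMO_ITEM_SIZE		25u	/* one item header (key, offset, size) */

/*
 * Returns nonzero when the item header of @slot would run into the data
 * that the same item points at. @data_offset is relative to the end of the
 * leaf header, like the offset stored in an item.
 */
static int demo_item_overlaps_data(uint32_t slot, uint32_t data_offset)
{
	/* First byte after this slot's item header, counted from leaf start. */
	uint64_t item_hdr_end = DEMO_HEADER_SIZE +
				(uint64_t)DEMO_ITEM_SIZE * ((uint64_t)slot + 1);
	/* First byte of this slot's data, counted from leaf start. */
	uint64_t data_start = DEMO_HEADER_SIZE + (uint64_t)data_offset;

	return item_hdr_end > data_start;
}
```

With these illustrative sizes, slot 0 needs its data to start at offset 25 or later, and slot 3 needs offset 100 or later; anything lower means the item header and its own data share bytes.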
Re: Btrfs Raid5 issue.
On 2017-08-22 13:19, Robert LeBlanc wrote:
> Chris and Qu, thanks for your help. I was able to restore the data off
> the volume. The only file I could not read when I tried to rsync was a
> MySQL bin log, but it wasn't critical, as I had an off-site
> snapshot from that morning and ownCloud could resync the files that
> were changed anyway. This turned out much better than the md RAID
> failure that I had a year ago. Much faster recovery thanks to
> snapshots.
>
> Is there anything you would like from this damaged filesystem to help
> determine what went wrong and to help make btrfs better? If I don't
> hear back from you in a day, I'll destroy it so that I can add the
> disks into the new btrfs volumes to restore redundancy.

Feel free to destroy the old images.

If nologreplay works, that's good enough.
The problem seems to be the extent tree, but it's too hard to locate the
real problem.

> Bcache wasn't providing the performance I was hoping for, so I'm
> putting the root and roots for my LXC containers on the SSDs (btrfs
> RAID1) and the bulk stuff on the three spindle drives (btrfs RAID1).

Well, I'm more interested in the bcache performance.

I was considering using my Intel 600P NVMe to cache one 2.5" HGST 1T
HDD (7200rpm) in my btrfs KVM host (also my daily machine).

Would you please share more details about the performance problem?

(Maybe it's about some btrfs performance problems, not bcache.
Btrfs is not good at workloads like DBs or metadata-heavy operations.)

> For some reason, it seemed that the btrfs RAID5 setup required one of
> the drives, but I thought I had data with RAID5 and metadata with 2
> copies. Was I missing something else that prevented mounting with that
> specific drive? I don't want to get into a situation where one drive
> dies and I can't get to any data.

The direct cause is that btrfs fails to replay its log, and it's the
corrupted extent tree that causes the log replay to fail.

Normally such a failure will definitely cause problems, so btrfs just
stops the mount procedure.

In your case, if "nologreplay" is specified, btrfs skips the problem, and
since you must specify RO for nologreplay, btrfs has nothing to do with
the extent tree at all.

So btrfs can be mounted.

Why the extent tree got corrupted is still unknown.
If your metadata is also RAID5, then the write hole may be the cause.
If your metadata profile is RAID1, then I don't know why this could happen.

So from this point of view, even if we fixed the btrfs scrub/race
problems, it's still not good enough to survive a disk removal in the
real world.

With a RAID1 setup, at least we don't need to care about the write hole,
and csum will help us determine which copy is correct, so I think it will
be much better than RAID56.

If you have spare time, you could try to hot-plug RAID1 devices to verify
how it works.
But please note that re-attaching an unplugged device may require
unmounting the fs and re-scanning btrfs.

And even if you're using 3 devices with RAID1, it's still 2 copies.
So you can lose at most 1 device.

Thanks,
Qu

> Thank you again.
>
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1