[btrfs-progs] btrfs fi df output
Hello,

I have a question regarding "btrfs filesystem df" output.

# btrfs fi df /mnt/test
Data: total=3.01GB, used=512.19MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00    <= What does this mean? What is it used for? I've never seen it incremented.
Metadata, DUP: total=2.50GB, used=676.00KB
Metadata: total=8.00MB, used=0.00  <= the same question

I have kernel 3.3.6 and btrfs-tools from git.

# mkfs.btrfs /dev/mapper/vg-lvtest
# mount /dev/mapper/vg-lvtest /mnt/test
# dd if=/dev/zero of=/mnt/test/test.file bs=1M count=512 conv=fdatasync
# btrfs fi df /mnt/test
Data: total=3.01GB, used=512.19MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=2.50GB, used=676.00KB
Metadata: total=8.00MB, used=0.00
ierdnac-hp ~ # umount /mnt/test

Are these two chunks the ones that appear below in btrfs-debug-tree? Which ones? In the tree there is one 4MB chunk and three 8MB chunks, one of them with 2 stripes.

# btrfs-debug-tree /dev/mapper/vg-lvtest
chunk tree
leaf 20979712 items 12 free space 2557 generation 5 owner 3
fs uuid 6accfaf3-c88a-462e-85fc-35513d0b43d6
chunk uuid 65f22206-a9dd-4053-a660-61bc4ee0be12
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
		dev item devid 1 total_bytes 116912029696 bytes used 8627683328
	item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3817 itemsize 80
		chunk length 4194304 owner 2 type 2 num_stripes 1
			stripe 0 devid 1 offset 0
	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3737 itemsize 80
		chunk length 8388608 owner 2 type 4 num_stripes 1
			stripe 0 devid 1 offset 4194304
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3657 itemsize 80
		chunk length 8388608 owner 2 type 1 num_stripes 1
			stripe 0 devid 1 offset 12582912
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 3545 itemsize 112
		chunk length 8388608 owner 2 type 34 num_stripes 2
			stripe 0 devid 1 offset 20971520
			stripe 1 devid 1 offset 29360128

Thanks
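For orientation, the "type" value printed for each chunk item is a bitmask of block-group flags, and decoding it shows which "fi df" line each chunk backs; the single-copy System and Metadata lines typically correspond to the small chunks created at mkfs time, which is why they tend to stay at used=0. The sketch below mirrors the BTRFS_BLOCK_GROUP_* bits from fs/btrfs/ctree.h and is only an illustration, not part of the original report.

/*
 * Illustrative decoder for the "type" field printed by btrfs-debug-tree.
 * The flag values mirror the BTRFS_BLOCK_GROUP_* bits in fs/btrfs/ctree.h;
 * this is a sketch for mapping chunk items to the "fi df" lines, not part
 * of the original report.
 */
#include <stdio.h>
#include <stdint.h>

#define BG_DATA     (1ULL << 0)
#define BG_SYSTEM   (1ULL << 1)
#define BG_METADATA (1ULL << 2)
#define BG_RAID0    (1ULL << 3)
#define BG_RAID1    (1ULL << 4)
#define BG_DUP      (1ULL << 5)
#define BG_RAID10   (1ULL << 6)

static void decode(uint64_t type)
{
	printf("type %llu =", (unsigned long long)type);
	if (type & BG_DATA)     printf(" DATA");
	if (type & BG_SYSTEM)   printf(" SYSTEM");
	if (type & BG_METADATA) printf(" METADATA");
	if (type & BG_RAID0)    printf(" RAID0");
	if (type & BG_RAID1)    printf(" RAID1");
	if (type & BG_DUP)      printf(" DUP");
	if (type & BG_RAID10)   printf(" RAID10");
	printf("\n");
}

int main(void)
{
	decode(2);   /* item 1: SYSTEM, the small 4MB chunk from mkfs        */
	decode(4);   /* item 2: METADATA, the single-copy 8MB chunk          */
	decode(1);   /* item 3: DATA                                         */
	decode(34);  /* item 4: SYSTEM | DUP, the two-stripe chunk           */
	return 0;
}

Running it prints, for example, "type 34 = SYSTEM DUP", matching the two-stripe chunk in item 4 and the "System, DUP" line in the df output.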
Re: atime and filesystems with snapshots (especially Btrfs)
On 05/25/2012 06:35 PM, Alexander Block wrote:
> Hello,
>
> (this is a resend with proper CC for linux-fsdevel and linux-kernel)
>
> I would like to start a discussion on atime in Btrfs (and other
> filesystems with snapshot support).
>
> As atime is updated on every access of a file or directory, we get
> many changes to the trees in btrfs that as always trigger cow
> operations. This is no problem as long as the changed tree blocks are
> not shared by other subvolumes. Performance is also not a problem, no
> matter if shared or not (thanks to relatime which is the default).
> The problems start when someone starts to use snapshots. If you for
> example snapshot your root and continue working on your root, after
> some time big parts of the tree will be cowed and unshared. In the
> worst case, the whole tree gets unshared and thus takes up the double
> space. Normally, a user would expect to only use extra space for a
> tree if he changes something.
> A worst case scenario would be if someone took regular snapshots for
> backup purposes and later greps the contents of all snapshots to find
> a specific file. This would touch all inodes in all trees and thus
> make big parts of the trees unshared.
>
> relatime (which is the default) reduces this problem a little bit, as
> it by default only updates atime once a day. This means, if anyone
> wants to test this problem, mount with relatime disabled or change the
> system date before you try to update atime (that's the way I tested it).
>
> As a solution, I would suggest to make noatime the default for btrfs.
> I'm however not sure if it is allowed in linux to have different
> default mount options for different filesystem types. I know this
> discussion pops up every few years (last time it resulted in making
> relatime the default). But this is a special case for btrfs. atime is
> already bad on other filesystems, but it's much much worse in btrfs.

Sounds like a real problem. I would suggest a few remedies.

1. Make a persistent filesystem parameter that says noatime/relatime/atime,
   so the default, if not specified on mount, is taken as a property of
   the FS (mkfs can set it).

2. The snapshot program should check and complain if atime is on, and
   recommend turning it off, since the problem only starts with a snapshot.

3. If space availability drops under some threshold, disable atime. As you
   said, this is catastrophic in this case, so the user can always search
   and delete files. In fact, if the IO was only because of atime, it should
   be a soft error: warned about and ignored.

But perhaps the true solution is to put atime in a side table, so only the
atime info gets COWed and not all the metadata.

Just my $0.017
Boaz

> Alex.
[PATCH] Btrfs: return value of btrfs_read_buffer is checked correctly
btrfs_read_buffer() has the possibility of returning an error. Therefore, I
add code in which the return value of btrfs_read_buffer() is checked.

Signed-off-by: Tsutomu Itoh
---
 fs/btrfs/ctree.c    |  6 +++++-
 fs/btrfs/tree-log.c | 16 ++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4106264..c1af717 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -739,7 +739,11 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
 			if (!cur)
 				return -EIO;
 		} else if (!uptodate) {
-			btrfs_read_buffer(cur, gen);
+			err = btrfs_read_buffer(cur, gen);
+			if (err) {
+				free_extent_buffer(cur);
+				return err;
+			}
 		}
 	}
 	if (search_start == 0)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index eb1ae90..6f22a4f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1628,7 +1628,9 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
 	int i;
 	int ret;
 
-	btrfs_read_buffer(eb, gen);
+	ret = btrfs_read_buffer(eb, gen);
+	if (ret)
+		return ret;
 
 	level = btrfs_header_level(eb);
 
@@ -1749,7 +1751,11 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
 
 			path->slots[*level]++;
 			if (wc->free) {
-				btrfs_read_buffer(next, ptr_gen);
+				ret = btrfs_read_buffer(next, ptr_gen);
+				if (ret) {
+					free_extent_buffer(next);
+					return ret;
+				}
 
 				btrfs_tree_lock(next);
 				btrfs_set_lock_blocking(next);
@@ -1766,7 +1772,11 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
 				free_extent_buffer(next);
 				continue;
 			}
-			btrfs_read_buffer(next, ptr_gen);
+			ret = btrfs_read_buffer(next, ptr_gen);
+			if (ret) {
+				free_extent_buffer(next);
+				return ret;
+			}
 
 			WARN_ON(*level <= 0);
 			if (path->nodes[*level-1])
Re: Newbie questions on some of btrfs code...
On Mon, May 28, 2012 at 20:45 (+0200), Alex Lyakas wrote:
> I have re-looked at btrfs_search_slot, and don't see how it would end
> up in leaf B. The bin_search() function will clearly return the slot
> *after* the slot of N that has key==5 (which is the parent slot of A).
> So then the following code:
>	if (level != 0) {
>		int dec = 0;
>		if (ret && slot > 0) {
>			dec = 1;
>			slot -= 1;
>		}
> will bring us back into the slot of N with key=5. And we will go to
> leaf A. While if key(N) of that slot was 10, we would never have ended
> up in that slot, unless there is no lesser key in the tree.

Yes, that's right. As already said in my previous mail (in the paragraph
you didn't quote), the key in the leaf must be an exact match. The key
in N pointing to A will be 10.

> Actually, it looks like "no lesser key" is the only case when we can
> get ret==1 and slot==0.

Correct.

> Except perhaps an empty leaf, which I am not sure can happen.

It can't.

-Jan
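The behaviour being discussed can be modelled on a plain sorted array: a lower-bound style bin_search() returns the slot of the first key greater than or equal to the target (with ret=1 when there is no exact match), and the interior-node adjustment quoted above then steps back one slot so the descent follows the largest key that is still <= the target. The toy below only mimics that logic on an array; it is not the btrfs code itself.

/*
 * Toy model of the slot arithmetic discussed above: a bin_search()-style
 * lookup over a node's keys, followed by the "if (ret && slot > 0) slot--"
 * step used on interior nodes.  Not the btrfs implementation.
 */
#include <stdio.h>

static int bin_search(const int *keys, int nr, int target, int *slot)
{
	int low = 0, high = nr;

	while (low < high) {
		int mid = (low + high) / 2;

		if (keys[mid] < target)
			low = mid + 1;
		else if (keys[mid] > target)
			high = mid;
		else {
			*slot = mid;
			return 0;       /* exact match */
		}
	}
	*slot = low;                    /* slot of the first key > target */
	return 1;
}

int main(void)
{
	int node_keys[] = { 5, 10, 20 };    /* keys of node N, pointing to leaves A, B, C */
	int slot, ret;

	ret = bin_search(node_keys, 3, 7, &slot);
	if (ret && slot > 0)                /* the interior-node adjustment */
		slot--;

	/* searching for key 7 descends through slot 0 (key 5), i.e. leaf A */
	printf("ret=%d, descend via slot %d (key %d)\n", ret, slot, node_keys[slot]);
	return 0;
}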
Re: Make existing snapshots read-only?
On Tue, May 29, 2012 at 08:40:10AM +0800, Li Zefan wrote:
> > Is there any way to mark existing snapshots as read-only? Making new
> > ones read-only is easy enough, but what about existing ones?
>
> We have code in the kernel side, so what we need to do is to update
> btrfs-progs, which is trivial.

Well, I don't like that it's even possible to turn a RO snapshot to a RW
one. What was the rationale behind this back then? Besides, I think that
it could break assumptions in the backref code.

If it's only a one-way operation from a regular subvol -> RO subvol,
this sounds reasonable to me. If the opposite direction is allowed, then
I'd not call it 'read-only' but "unwritable on-request".

david
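The kernel-side code referred to above is presumably the subvolume-flags ioctl pair; a minimal sketch of how a userspace tool could set the read-only bit on an existing subvolume follows. The ioctl numbers and the RDONLY bit are copied from btrfs's ioctl.h of this period, and this is only an illustration of the mechanism, not the btrfs-progs change being discussed (nor a position on whether clearing the bit should be allowed).

/*
 * Minimal sketch: mark an existing subvolume read-only via the
 * subvol-flags ioctls.  Constant values are taken from fs/btrfs/ioctl.h;
 * this is an illustration, not the actual btrfs-progs implementation.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

#define BTRFS_IOCTL_MAGIC         0x94
#define BTRFS_SUBVOL_RDONLY       (1ULL << 1)
#define BTRFS_IOC_SUBVOL_GETFLAGS _IOR(BTRFS_IOCTL_MAGIC, 25, __u64)
#define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64)

int main(int argc, char **argv)
{
	__u64 flags;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <subvolume path>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || ioctl(fd, BTRFS_IOC_SUBVOL_GETFLAGS, &flags) < 0) {
		perror("getflags");
		return 1;
	}
	flags |= BTRFS_SUBVOL_RDONLY;            /* mark the subvolume read-only */
	if (ioctl(fd, BTRFS_IOC_SUBVOL_SETFLAGS, &flags) < 0) {
		perror("setflags");
		return 1;
	}
	close(fd);
	return 0;
}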
Re: Newbie questions on some of btrfs code...
Thank you Jan, Hugo & Lio, for taking time answering my questions. Alex. P.S.: I have dug in some more, so probably more questions will arrive:) On Tue, May 29, 2012 at 12:13 PM, Jan Schmidt wrote: > On Mon, May 28, 2012 at 20:45 (+0200), Alex Lyakas wrote: >> I have re-looked at btrfs_search_slot, and don't see how it would end >> up in leaf B. The bin_search() function will clearly return the slot >> *after* the slot of N that has key==5 (which is the parent slot of A). >> So then the following code: >> if (level != 0) { >> int dec = 0; >> if (ret && slot > 0) { >> dec = 1; >> slot -= 1; >> } >> will bring us back into the slot of N with key=5. And we will go to >> leaf A. While if key(N) of that slot was 10, we would never have ended >> up in that slot, unless there is no lesser key in the tree. > > Yes, that's right. As already said in my previous mail (in the paragraph > you didn't quote), the key in the leaf must be an exact match. The key > in N pointing to A will be 10. > >> Actually, it looks like "no lesser key" is the only case when we can >> get ret==1 and slot==0. > > Correct. > >> Except perhaps an empty leaf, which I am not sure can happen. > > It can't. > > -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make existing snapshots read-only?
On Tue, May 29, 2012 at 5:18 AM, David Sterba wrote: > On Tue, May 29, 2012 at 08:40:10AM +0800, Li Zefan wrote: >> > Is there any way to mark existing snapshots as read-only? Making new >> > ones read-only is easy enough, but what about existing ones? >> >> We have code in the kernel side, so what we need to do is to update >> btrfs-progs, >> which is trivial. > > Well, I don't like that it's even possible to turn a RO snapshot to a RW > one. What was the rationale behind this back then? Besides, I think that > it could break assumptions in the backref code. > > If it's only a one-way operation from a regular subvol -> RO subvol, > this sounds reasonable to me. If the opposite direction is allowed, then > I'd not call it 'read-only' but "unwritable on-request". Is anyone actually expecting readonly-snapshots to be a worm implementation? And are they sane to expect it? So long as the permissions required to change it are sane (admin rights to change an arbitrary snapshot, possibly something like write-permission on the mountpoint to change otherwise), I don't see the gain. It's not like root can't modify the disk directly, so withholding an easy way to flip the readonly bit just strikes me as a nuisance feature. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make existing snapshots read-only?
2012-05-28 12:37:00 -0600, Bruce Guenter:
> Is there any way to mark existing snapshots as read-only? Making new
> ones read-only is easy enough, but what about existing ones?
[...]

you can always do

btrfs sub snap -r vol vol-ro
btrfs sub del vol
mv vol-ro vol

-- 
Stephane
Re: Will big metadata blocks fix # of hardlinks?
Thanks for noting this one. That is one very surprising and unexpected limit!... And a killer for some not completely rare applications... On 26/05/12 19:22, Sami Liedes wrote: > Hi! > > I see that Linux 3.4 supports bigger metadata blocks for btrfs. > > Will using them allow a bigger number of hardlinks on a single file > (i.e. the bug that has bitten at least git users on Debian[1,2], and > BackupPC[3])? As far as I understand correctly, the problem has been > that the hard links are stored in the same metadata block with some > other metadata, so the size of the block is an inherent limitation? > > If so, I think it would be worth for me to try Btrfs again :) > > Sami > > > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603 > [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603 > [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762 One example fail case is just 13 hard links. Even x4 that (16k blocks) only gives 52 links for that example fail case. The brief summary for those are: * It's a rare corner case that needs a format change to fix, so "won't-fix"; * There are real world problem examples noted in those threads for such as: BackupPC (backups); nnmaildir mail backend in Gnus (an Emacs package for reading news and email); and a web archiver. * Also, Bacula (backups) and Mutt (email client) are quoted as problem examples in: Btrfs File-System Plans For Ubuntu 12.10 http://www.phoronix.com/scan.php?page=news_item&px=MTEwMDE For myself, I have a real world example for deduplication of identical files from a proprietary data capture system where the filenames change (timestamp and index data stored in the filename) yet there are periods where the file contents change only occasionally... The 'natural' thing to do is hardlink together all the identical files to then just have the unique filenames... And you might have many files in a particular directory... Note that for long filenames (surprisingly commonly done!), one fail case noted above is just 13 hard links. Looks like I'm stuck on ext4 with an impoverished "cp -l" for a fast 'snapshot' for the time being still... (Or differently, LVM snapshot and copy.) For btrfs, rather than a "break everything" format change, can a neat and robust 'workaround' be made so that the problem-case hardlinks to a file within the same directory perhaps spawn their own transparent subdirectory for the hard links?... Worse case then is that upon a downgrade to an older kernel, the 'transparent' subdirectory of hard links becomes visible as a distinct subdirectory? (That is a 'break' but at least data isn't lost.) Or am I chasing the wrong bits? ;-) More seriously: The killer there for me is that running rsync or running a deduplication script might hit too many hard links that were perfectly fine when on ext4. Regards, Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
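The arithmetic behind those numbers: every hard link from one directory adds a btrfs_inode_ref entry (an 8-byte index plus a 2-byte name length, followed by the name itself) to a single item, and that item has to fit inside one leaf together with the leaf header and item bookkeeping. The sketch below gives a rough estimate of the ceiling for different leaf sizes and name lengths; the overhead figure is approximate and real limits are lower because the leaf also holds other items.

/*
 * Back-of-the-envelope estimate of the hard-link ceiling.  The 10-byte
 * figure is sizeof(struct btrfs_inode_ref); the leaf overhead is a rough
 * guess.  This illustrates the scaling, not exact numbers.
 */
#include <stdio.h>

int main(void)
{
	const int leaf_sizes[] = { 4096, 16384, 65536 };
	const int name_lens[]  = { 12, 100, 255 };
	const int ref_header   = 10;     /* index (8) + name_len (2) per link */
	const int leaf_overhead = 200;   /* leaf header + item metadata, roughly */

	for (int i = 0; i < 3; i++) {
		for (int j = 0; j < 3; j++) {
			int usable = leaf_sizes[i] - leaf_overhead;
			int links = usable / (ref_header + name_lens[j]);
			printf("leaf %5d bytes, %3d-char names: ~%d links\n",
			       leaf_sizes[i], name_lens[j], links);
		}
	}
	return 0;
}

With 255-character names a 4KB leaf tops out at barely more than a dozen links, which is in the same ballpark as the 13-link failure above, and a 16KB leaf only pushes that into the tens; this is why bigger metadata blocks alone don't really fix the problem.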
Re: Will big metadata blocks fix # of hardlinks?
On Tue, May 29, 2012 at 02:09:03PM +0100, Martin wrote: > Thanks for noting this one. That is one very surprising and unexpected > limit!... And a killer for some not completely rare applications... There have been substantially-complete patches posted to this list which fix the problem (see "extended inode refs" patches by Mark Fasheh in the archives). I don't think they're quite ready for inclusion yet, but work is ongoing to fix the issue. > On 26/05/12 19:22, Sami Liedes wrote: > > Hi! > > > > I see that Linux 3.4 supports bigger metadata blocks for btrfs. > > > > Will using them allow a bigger number of hardlinks on a single file > > (i.e. the bug that has bitten at least git users on Debian[1,2], and > > BackupPC[3])? As far as I understand correctly, the problem has been > > that the hard links are stored in the same metadata block with some > > other metadata, so the size of the block is an inherent limitation? > > > > If so, I think it would be worth for me to try Btrfs again :) > > > > Sami > > > > > > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603 > > [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603 > > [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762 > > One example fail case is just 13 hard links. Even x4 that (16k blocks) > only gives 52 links for that example fail case. > > > The brief summary for those are: > > * It's a rare corner case that needs a format change to fix, so "won't-fix"; Definitely not "won't-fix" (see above). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 7: The Simple Truth --- signature.asc Description: Digital signature
Re: atime and filesystems with snapshots (especially Btrfs)
On Tue, May 29, 2012 at 10:14 AM, Boaz Harrosh wrote: > > Sounds like a real problem. I would suggest a few remedies. > 1. Make a filesystem persistent parameter that says noatime/relatime/atime > So the default if not specified on mount is taken as a property of > the FS (mkfs can set it) That would be possible. But again, I'm not sure if it is allowed for one fs type to differ from all the other filesystems in its default behavior. > 2. The snapshot program should check and complain if it is on, and recommend > an off. Since the problem only starts with a snapshot. That would definitely cause awareness for the problem and many people would probably switch to noatime on mount. > 3. If space availability drops under some threshold, disable atime. As you > said > this is catastrophic in this case. So user can always search and delete > files. > In fact if the IO was only because of atime, it should be a soft error, > warned, > and ignored. It would be hard to determine a good threshold. This really depends on the way snapshots are used. > > But perhaps the true solution is to put atime on a side table, so only the > atime > info gets COW and not the all MetaData This would definitely reduce the problem to a minimum. But it may be harder to implement as it sounds. You would either have to keep 2 trees per subvolume (one for the fs and one for atime) or share one tree for all subvols. I don't think 2 trees per subvolume would be acceptable, but I'm not sure. A shared tree would require to implement some kind of custom refcounting for the items, as changes to one fs tree should not change atime of the other and thus create new items on demand. It would probably also require snapshot origin tracking, because on a freshly snapshotted subvolume, no atime entries would exist at all and must be read from the parent/origin. > > Just my $0.017 > Boaz > >> Alex. >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Decrease meta fragments by using a caterpillar band Method (btrfs-progs)
signed-off-by WeiFeng Liu 523f28f9b3d9c710cacc31dbba644efb1678cf62 --- diff -uprN btrfs-progs-120328-a/ctree.c btrfs-progs-120328-b/ctree.c --- btrfs-progs-120328-a/ctree.c2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/ctree.c2012-05-28 23:29:15.0 + @@ -334,6 +334,7 @@ int __btrfs_cow_block(struct btrfs_trans btrfs_set_header_flag(cow, BTRFS_HEADER_FLAG_RELOC); else btrfs_set_header_owner(cow, root->root_key.objectid); + btrfs_set_header_cater(cow, 0); write_extent_buffer(cow, root->fs_info->fsid, (unsigned long)btrfs_header_fsid(cow), diff -uprN btrfs-progs-120328-a/ctree.h btrfs-progs-120328-b/ctree.h --- btrfs-progs-120328-a/ctree.h2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/ctree.h2012-05-28 23:25:26.0 + @@ -292,6 +292,7 @@ struct btrfs_header { __le64 owner; __le32 nritems; u8 level; + u8 cater_index_factor; } __attribute__ ((__packed__)); #define BTRFS_NODEPTRS_PER_BLOCK(r) (((r)->nodesize - \ @@ -510,6 +511,7 @@ struct btrfs_extent_item { __le64 refs; __le64 generation; __le64 flags; + u8 cater_index_factor; } __attribute__ ((__packed__)); struct btrfs_extent_item_v0 { @@ -1246,6 +1248,8 @@ BTRFS_SETGET_FUNCS(extent_flags, struct BTRFS_SETGET_FUNCS(extent_refs_v0, struct btrfs_extent_item_v0, refs, 32); BTRFS_SETGET_FUNCS(tree_block_level, struct btrfs_tree_block_info, level, 8); +BTRFS_SETGET_FUNCS(extent_cater, struct btrfs_extent_item, + cater_index_factor, 8); static inline void btrfs_tree_block_key(struct extent_buffer *eb, struct btrfs_tree_block_info *item, @@ -1511,6 +1515,8 @@ BTRFS_SETGET_HEADER_FUNCS(header_owner, BTRFS_SETGET_HEADER_FUNCS(header_nritems, struct btrfs_header, nritems, 32); BTRFS_SETGET_HEADER_FUNCS(header_flags, struct btrfs_header, flags, 64); BTRFS_SETGET_HEADER_FUNCS(header_level, struct btrfs_header, level, 8); +BTRFS_SETGET_HEADER_FUNCS(header_cater, struct btrfs_header, + cater_index_factor, 8); static inline int btrfs_header_flag(struct extent_buffer *eb, u64 flag) { diff -uprN btrfs-progs-120328-a/extent-tree.c btrfs-progs-120328-b/extent-tree.c --- btrfs-progs-120328-a/extent-tree.c 2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/extent-tree.c 2012-05-28 20:06:06.0 + @@ -2584,6 +2584,7 @@ static int alloc_reserved_tree_block(str btrfs_set_extent_generation(leaf, extent_item, generation); btrfs_set_extent_flags(leaf, extent_item, flags | BTRFS_EXTENT_FLAG_TREE_BLOCK); + btrfs_set_extent_cater(leaf, extent_item, 0); block_info = (struct btrfs_tree_block_info *)(extent_item + 1); btrfs_set_tree_block_key(leaf, block_info, key); diff -uprN btrfs-progs-120328-a/utils.c btrfs-progs-120328-b/utils.c --- btrfs-progs-120328-a/utils.c2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/utils.c2012-05-28 23:22:20.0 + @@ -135,6 +135,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_generation(buf, 1); btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV); btrfs_set_header_owner(buf, BTRFS_ROOT_TREE_OBJECTID); + btrfs_set_header_cater(buf, 0); write_extent_buffer(buf, super.fsid, (unsigned long) btrfs_header_fsid(buf), BTRFS_FSID_SIZE); @@ -254,6 +255,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[2]); btrfs_set_header_owner(buf, BTRFS_EXTENT_TREE_OBJECTID); btrfs_set_header_nritems(buf, nritems); + btrfs_set_header_cater(buf, 0); csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); ret = pwrite(fd, buf->data, leafsize, blocks[2]); BUG_ON(ret != leafsize); @@ -338,6 +340,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[3]); btrfs_set_header_owner(buf, 
BTRFS_CHUNK_TREE_OBJECTID); btrfs_set_header_nritems(buf, nritems); + btrfs_set_header_cater(buf, 0); csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); ret = pwrite(fd, buf->data, leafsize, blocks[3]); @@ -373,6 +376,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[4]); btrfs_set_header_owner(buf, BTRFS_DEV_TREE_OBJECTID); btrfs_set_header_nritems(buf, nritems); + btrfs_set_header_cater(buf, 0); csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); ret = pwrite(fd, buf->data, leafsize, blocks[4]); @@ -382,6 +386,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[5]); btrfs_set_header_owner(buf, BTRFS_FS_TREE_OBJECTID); btrfs_set_header_nritems(buf, 0); + btrfs_set_header_cater(buf, 0);
[RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2)
This is a version with several bugs fixed since my first patch commit, and it adds the patch for btrfs-progs.

Introduction and brief speculation on the values and penalties:

When a tree block needs to be created, we offer, say, 2 or 3 blocks for it, then pick one from the continuous blocks. If this tree block needs a cow, another free block from these continuous blocks can be grabbed, and the old one is freed for the next cow. In the most ideal condition only 2 continuous blocks are kept for any number of cows of a tree block -- imagine a caterpillar band, from which my method takes its name.

Consider a scene in which there are two file systems, one COW and the other NOCOW, each with 1GB of space for metadata of which the first 100MB has been used, and let me focus only on ops that modify metadata and ignore deleting metadata, for simplicity.

As we can imagine, the NOCOW fs would likely keep the layout of its 100MB of meta at its most neat: changes can be overwritten in the original places and leave the rest of the 900MB untouched. But it lacks the excellent feature of assuring data integrity which only a COW fs has.

However, even when only modifying metadata, the COW fs would make holes in the first 100MB and write COWed meta into the remaining 900MB of space; in the extreme condition, the whole 1GB of space would finally be fragmented and scattered by that 100MB of metadata. I don't think btrfs will easily fall into such a bad state; as I understand it we have extents, chunks, clusters and maybe other methods (tell me please) to slow fragmenting, but it seems that there is no dedicated method (fix me) to help COW get rid of this type of fragmentation, which a NOCOW fs is not inclined to create.

I introduce the caterpillar band method as a compromise. It uses 300MB for meta to avoid such fragments without losing the COW feature; in this instance, that means three continuous blocks are used for a single tree block, and the tree block will be circularly updated (cowed) within its three continuous blocks.

Penalties? Yes, there are, thanks to Arne Jansen and Liu Bo. As Arne Jansen indicated, the biggest disadvantage of the method is that it will basically limit us to 1/3rd of platter speed when writing meta on spinning drives, and to 1/4th if using four continuous blocks per tree block.

About readahead, which will also be down to 1/3rd of the NOCOW fs rate: I would discreetly count it as an advantage rather than a penalty compared to the worse condition which COW would otherwise get -- nearly from the first COW, the newly cowed tree blocks would normally be 50MB away from their original neighbor blocks, and after frequent random modify ops, wouldn't the worst condition be that every dozen of newly cowed tree blocks are more than 50MB away from their original neighbor blocks, with equal probability?

So permit me to think readahead is only useful for the NOCOW fs in this scenario, because it always keeps its original 100MB contiguous, and my way would keep a 1/3 readahead rate vs maybe zero with pure COW in the worst case.

Of course, both penalties and values apply only to metadata and will not affect user data reads/writes; my patch applies only to cowing tree blocks. But if there are large numbers of small files (size<4k), the values and penalties will also affect those small user data R/W.

I have not made tests for my patch by now; it still needs some time to get more checking of both the strategy and the code, and to fix possible bugs before testing. Any comments are welcome.
Thanks signed-off-by WeiFeng Liu 523f28f9b3d9c710cacc31dbba644efb1678cf62 --- diff -uprN a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c --- a/fs/btrfs/ctree.c 2012-05-21 18:42:51.0 + +++ b/fs/btrfs/ctree.c 2012-05-29 23:08:19.0 + @@ -444,9 +444,21 @@ static noinline int __btrfs_cow_block(st } else parent_start = 0; - cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start, -root->root_key.objectid, &disk_key, -level, search_start, empty_size, 1); + if (root->fs_info->cater_factor > 1) { + if (btrfs_cater_factor(btrfs_header_cater(buf)) > 1) + cow = btrfs_grab_cater_block(trans, root, buf, parent_start, + root->root_key.objectid, &disk_key, + level, search_start, empty_size, 1); + else + cow = btrfs_alloc_free_block_cater(trans, root, buf->len, + parent_start, + root->root_key.objectid, &disk_key, + level, search_start, empty_size, 1); + } else { + cow = btrfs_alloc_free_block(trans, ro
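For readers skimming the thread, the allocation policy described above boils down to this: each logical tree block owns a small band of contiguous physical blocks, and every COW rotates to the next slot inside that band instead of allocating far away. A toy model of that rotation is sketched below; all names in it are invented for illustration and none of it is the interface the patch actually adds.

/*
 * Toy model of the "caterpillar band" idea: each logical tree block owns
 * cater_factor contiguous physical slots, and every COW rotates to the
 * next slot within the band.  Names are invented; this is not the patch.
 */
#include <stdio.h>

struct cater_block {
	unsigned long band_start;   /* first physical block of the band */
	int cater_factor;           /* slots reserved for this tree block */
	int cater_index;            /* slot currently holding the live copy */
};

/* COW: the new copy goes to the next slot in the band, the old one is freed */
static unsigned long cater_cow(struct cater_block *b)
{
	b->cater_index = (b->cater_index + 1) % b->cater_factor;
	return b->band_start + b->cater_index;
}

int main(void)
{
	struct cater_block b = { .band_start = 1000, .cater_factor = 3, .cater_index = 0 };

	/* repeated modifications stay inside physical blocks 1000..1002 */
	for (int i = 0; i < 6; i++)
		printf("cow %d -> physical block %lu\n", i, cater_cow(&b));
	return 0;
}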
Re: [RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2)
Hi Liu, On 05/29/2012 06:24 PM, WeiFeng Liu wrote: > This is a several bugs fixed version since my first patch commit, and added > patch of btrfs-prog > > > Introduction and brief speculate of values and penalties: > > When a tree block need to be created, we offer, say, 2 or 3 blocks for > it, > then pick one from the continuous blocks. If this tree block needs a cow, > another free block from these continuous blocks can be grabbed, and the old > one > is freed for next cow. What happens if the block is not COW-ed *and freed* but COW-ed only (think about a snapshot updated) ? I.e, what happens if the user makes 5-6 snapshot and the caterpillar-size is 3 ? > > In the most ideal condition only 2 continuous blocks are kept for any > times > of cowing a tree block -- image a caterpillar band by which my method is > named. Apart my doubt above, I am very interested on the performances. However I have some doubts about the space efficiency. I have the impression that today BTRFS consumes a lot of space for the meta-data. On my linux box: ghigo@venice:~$ sudo btrfs fi df / Data: total=19.01GB, used=14.10GB System, DUP: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=2.00GB, used=933.55MB Metadata: total=8.00MB, used=0.00 So basically the metadata are bout the 6% of the data (0.9GB / 14.1GB ). But with your idea, BTRFS should reserve 3 blocks for every metadata block. This means that the BTRFS ratio metadata/data will increase to 18% (6%*3). Which is not a so negligible value. GB > > Given a scene that there are two file systems: one with COW and the > other > NOCOW, each has 1GB space for metadata with it's first 100MB has been used, > and > let me focus only on ops of modifying metadata and ignore deleting metadata > for > simple reason. > > As we can image that the NOCOW fs would likely keep the layout of its > 100MB > meta to most neat: changes can be overwritten in the original places and leave > the rest 900MB untouched. But it is lack of the excellent feature to assure > data > integrity which owned only by COW fs. > > However, only modifying metadata though, the COW fs would make holes in > the first 100MB and write COWed meta into the rest 900MB space, in the extreme > condition, the whole 1GB space would be fragmented finally and scattered by > that > 100MB metadata. I don't think btrfs will be easily trap into such bad state, > as > I understood we have extent, chunk, cluster and maybe other methods(tell me > please) to slow fragmenting, but it seems that there are no dedicate method > (fix me) to help COW getting rid of this type of fragments which NOCOW fs does > not incline to make. > > I introduce the caterpillar band method as a compromise. It use 300MB > for > meta to avoid such fragments and without lost of COW feature, in the instance, > that means three countinues blocks are used for a single tree block, the tree > block will be circularly updated(cowed) within its three countinues blocks. > > Penalties? Yes there are, thanks to Arne Jansen and Liu Bo. As Arne > Jansen > indicated, the most disadvantage of the method will be that this will > basically > limit us to 1/3th of platter speed to write meta when using spinning drives > and > to 1/4th if using four countinues blocks for a tree block. 
> > About readahead, which will be also down to the 1/3th of NOCOW fs rate, > but > I would discreetly think it as an advantage rather than penalty comparing > worse > condition which COW would get -- nearly since the first COW, the new tree > blocks > cowed would be 50MB far away from their original neighbor blocks normally, and > after frequent random modify ops, would the worst conditition be that every > dozen of tree blocks newly cowed are more than 50MB far away from their > original > neighbor blocks if in equal probability? > > So permit me to think readahead is only usefull for NOCOW fs in this > scenario, because it always keeps its original 100MB continued, and my way > would > keep 1/3 readahead rate vs maybe-zero by pure COW if worstly. > > Of course, both penalties and values are only for metadata and will not > affect user date read/write, my patch is only applied for cow tree blocks. > But if there are large number of small files(size<4k), values and > penalties > will also affect those small user data R/W. > > I have not made tests for my patch by now, it still need some time to > get > more check for both strategy and code in patch and fix possible bugs before > test, any comments are welcome. > > > Thanks > > > signed-off-by WeiFeng Liu > 523f28f9b3d9c710cacc31dbba644efb1678cf62 > > --- > > diff -uprN a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > --- a/fs/btrfs/ctree.c2012-05-21 18:42:51.0 + > +++ b/fs/btrfs/ctree.c2012-05-29 23:08:19.0 + > @@ -444,
[PATCH] Btrfs: fix return code in drop_objectid_items
So dpkg fsync()'s the file and the directory containing the file whenever it
writes to a file, which is really slow in btrfs. This is partly because
fsync()'ing a directory _always_ committed the transaction instead of just
going to the tree log. This is because drop_objectid_items() would return 1,
since it does a btrfs_search_slot() which returns 1. In tree-log jargon this
means that we have to commit the transaction to be safe. So just check if ret
is greater than 0 and set it to 0 if it is. With this patch we now use the
tree-log instead of committing the entire transaction, which is twice as fast
on my box. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/tree-log.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 425014b..2017d0f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2667,6 +2667,8 @@ static int drop_objectid_items(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);
+	if (ret > 0)
+		ret = 0;
 	return ret;
 }
 
-- 
1.7.7.6
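The workload being described is the usual durability pattern of writing a file, fsync()ing it, and then fsync()ing the directory that holds it so the new entry is also on disk. A minimal reproduction of that access pattern (with made-up paths) looks like this; it only shows what dpkg-style updates do, it is not dpkg's actual code:

/*
 * Minimal sketch of the workload described above: write a file, fsync it,
 * then fsync the containing directory.  Paths are hypothetical.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
	const char *dir  = "/var/lib/example";
	const char *path = "/var/lib/example/status";
	const char *data = "some package state\n";
	int fd, dfd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) { perror("open file"); return 1; }
	if (write(fd, data, strlen(data)) < 0 || fsync(fd) < 0) {
		perror("write/fsync file");
		return 1;
	}
	close(fd);

	dfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dfd < 0) { perror("open dir"); return 1; }
	if (fsync(dfd) < 0) {            /* the directory fsync that used to force a commit */
		perror("fsync dir");
		return 1;
	}
	close(dfd);
	return 0;
}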
[PATCH] Btrfs: check to see if the inode is in the log before fsyncing
We have this check down in the actual logging code, but this is after we start
a transaction and all that good stuff. So move the helper inode_in_log() out so
we can call it in fsync() and avoid starting a transaction altogether, and just
exit if we've already fsync()'ed this file recently. You would notice this
issue if you fsync()'ed a file over and over again until the transaction
committed. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/btrfs_inode.h | 13 +++++++++++++
 fs/btrfs/file.c        |  3 ++-
 fs/btrfs/tree-log.c    | 17 +----------------
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ce2c9d6..e616f887 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -199,4 +199,17 @@ static inline bool btrfs_is_free_space_inode(struct btrfs_root *root,
 	return false;
 }
 
+static inline int btrfs_inode_in_log(struct inode *inode, u64 generation)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret = 0;
+
+	mutex_lock(&root->log_mutex);
+	if (BTRFS_I(inode)->logged_trans == generation &&
+	    BTRFS_I(inode)->last_sub_trans <= root->last_log_commit)
+		ret = 1;
+	mutex_unlock(&root->log_mutex);
+	return ret;
+}
+
 #endif
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5a525d0..70dc8ca 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1552,7 +1552,8 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * syncing
 	 */
 	smp_mb();
-	if (BTRFS_I(inode)->last_trans <=
+	if (btrfs_inode_in_log(inode, root->fs_info->generation) ||
+	    BTRFS_I(inode)->last_trans <=
 	    root->fs_info->last_trans_committed) {
 		BTRFS_I(inode)->last_trans = 0;
 		mutex_unlock(&inode->i_mutex);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6f22a4f..425014b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3038,21 +3038,6 @@ out:
 	return ret;
 }
 
-static int inode_in_log(struct btrfs_trans_handle *trans,
-			struct inode *inode)
-{
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	int ret = 0;
-
-	mutex_lock(&root->log_mutex);
-	if (BTRFS_I(inode)->logged_trans == trans->transid &&
-	    BTRFS_I(inode)->last_sub_trans <= root->last_log_commit)
-		ret = 1;
-	mutex_unlock(&root->log_mutex);
-	return ret;
-}
-
-
 /*
  * helper function around btrfs_log_inode to make sure newly created
  * parent directories also end up in the log. A minimal inode and backref
@@ -3093,7 +3078,7 @@ int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto end_no_trans;
 
-	if (inode_in_log(trans, inode)) {
+	if (btrfs_inode_in_log(inode, trans->transid)) {
 		ret = BTRFS_NO_LOG_SYNC;
 		goto end_no_trans;
 	}
-- 
1.7.7.6
Help with recover data
Hi Everyone,

I recently decided to use btrfs. It worked perfectly for a week, even under
heavy load. Yesterday I destroyed the backups, as I cannot afford to keep
~10TB in backups. I decided to switch to Btrfs because it was announced that
it is stable already.

I need to recover ~5TB of data; this data is important and I do not have
backups.

uname -a
Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

sudo mount -o recovery /dev/sdb /tank
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

dmesg:
[ 9612.971149] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb
[ 9613.048476] btrfs: enabling auto recovery
[ 9613.048482] btrfs: disk space caching is enabled
[ 9621.172540] parent transid verify failed on 5468060241920 wanted 9096 found 7621
[ 9621.181369] parent transid verify failed on 5468060241920 wanted 9096 found 7621
[ 9621.182167] btrfs read error corrected: ino 1 off 5468060241920 (dev /dev/sdd sector 2143292648)
[ 9621.182181] Failed to read block groups: -5
[ 9621.193680] btrfs: open_ctree failed

sudo /usr/local/bin/btrfs-find-root /dev/sdb
...
Well block 4455562448896 seems great, but generation doesn't match, have=9092, want=9096
Well block 4455568302080 seems great, but generation doesn't match, have=9091, want=9096
Well block 4848395739136 seems great, but generation doesn't match, have=9093, want=9096
Well block 4923796594688 seems great, but generation doesn't match, have=9094, want=9096
Well block 4923798065152 seems great, but generation doesn't match, have=9095, want=9096
Found tree root at 5532762525696

$ sudo btrfs-restore -v -t 4923798065152 /dev/sdb ./
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
Ignoring transid failure
Root objectid is 5
Restoring ./Irina
Restoring ./Irina/.idmapdir2
Skipping existing file ./Irina/.idmapdir2/4.bucket.lock
If you wish to overwrite use the -o option to overwrite
Skipping existing file ./Irina/.idmapdir2/7.bucket
Skipping existing file ./Irina/.idmapdir2/15.bucket
Skipping existing file ./Irina/.idmapdir2/12.bucket.lock
Skipping existing file ./Irina/.idmapdir2/cap.txt
Skipping existing file ./Irina/.idmapdir2/5.bucket
Restoring ./Irina/.idmapdir2/10.bucket.lock
Restoring ./Irina/.idmapdir2/6.bucket.lock
Restoring ./Irina/.idmapdir2/8.bucket
ret is -3

sudo btrfs-zero-log /dev/sdb
...
parent transid verify failed on 5468231311360 wanted 9096 found 7621
parent transid verify failed on 5468231311360 wanted 9096 found 7621
parent transid verify failed on 5468060102656 wanted 9096 found 7621
Ignoring transid failure
leaf parent key incorrect 59310080
btrfs-zero-log: extent-tree.c:2578: alloc_reserved_tree_block: Assertion `!(ret)' failed.

Help me please.

Max
Re: Help with data recovering
After command: sudo /usr/local/bin/btrfs device scan i got new lines in dmesg: 11329.598535] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb [11329.599885] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 3 transid 9095 /dev/sdd [11329.600840] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 1 transid 9096 /dev/sda [11329.602083] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 4 transid 9096 /dev/sde [11329.603036] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 5 transid 9096 /dev/sdf looks like /dev/sdd lost one transid. Is it possible to roll back on transid 9095? Thanks On 05/29/2012 06:14 PM, Maxim Mikheev wrote: Hi Everyone, I recently decided to use btrfs. It works perfectly for a week even under heavy load. Yesterday I destroyed backups as cannot afford to have ~10TB in backups. I decided to switch on Btrfs because it was announced that it stable already I need to recover ~5TB data, this data is important and I do not have backups uname -a Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux sudo mount -o recovery /dev/sdb /tank mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so dmesg: [ 9612.971149] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb [ 9613.048476] btrfs: enabling auto recovery [ 9613.048482] btrfs: disk space caching is enabled [ 9621.172540] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.181369] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.182167] btrfs read error corrected: ino 1 off 5468060241920 (dev /dev/sdd sector 2143292648) [ 9621.182181] Failed to read block groups: -5 [ 9621.193680] btrfs: open_ctree failed sudo /usr/local/bin/btrfs-find-root /dev/sdb ... Well block 4455562448896 seems great, but generation doesn't match, have=9092, want=9096 Well block 4455568302080 seems great, but generation doesn't match, have=9091, want=9096 Well block 4848395739136 seems great, but generation doesn't match, have=9093, want=9096 Well block 4923796594688 seems great, but generation doesn't match, have=9094, want=9096 Well block 4923798065152 seems great, but generation doesn't match, have=9095, want=9096 Found tree root at 5532762525696 $ sudo btrfs-restore -v -t 4923798065152 /dev/sdb ./ parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 Ignoring transid failure Root objectid is 5 Restoring ./Irina Restoring ./Irina/.idmapdir2 Skipping existing file ./Irina/.idmapdir2/4.bucket.lock If you wish to overwrite use the -o option to overwrite Skipping existing file ./Irina/.idmapdir2/7.bucket Skipping existing file ./Irina/.idmapdir2/15.bucket Skipping existing file ./Irina/.idmapdir2/12.bucket.lock Skipping existing file ./Irina/.idmapdir2/cap.txt Skipping existing file ./Irina/.idmapdir2/5.bucket Restoring ./Irina/.idmapdir2/10.bucket.lock Restoring ./Irina/.idmapdir2/6.bucket.lock Restoring ./Irina/.idmapdir2/8.bucket ret is -3 sudo btrfs-zero-log /dev/sdb ... 
parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468060102656 wanted 9096 found 7621 Ignoring transid failure leaf parent key incorrect 59310080 btrfs-zero-log: extent-tree.c:2578: alloc_reserved_tree_block: Assertion `!(ret)' failed. Help me please. Max -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with data recovering
I can't help much at the moment, but the following will help sort things out: Can you provide as much detail as possible about how things were configured at the time of the failure? Raid levels used, kernel versions at the time of the failure, how the disks are connected, general description of the activity on the disk and the nature of its contents (all large files? rootfs? mail spools?) What you were thinking at the time you decided that you couldn't afford backups? As much detail as possible on what all you've tried since the failure to recover things? It's likely the data is fine (if currently inaccessible), but obviously things are in a fragile state, and the important thing right now is to not make things worse: a recoverable situation may otherwise turn into an irrecoverable one. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with recover data
On 5/30/12 12:14 AM, Maxim Mikheev wrote: Hi Everyone, I recently decided to use btrfs. It works perfectly for a week even under heavy load. Yesterday I destroyed backups as cannot afford to have ~10TB in backups. I decided to switch on Btrfs because it was announced that it stable already I need to recover ~5TB data, this data is important and I do not have backups Just out of curiosity: Who announced that BTRFS is stable already?! The kernel says something different and there is still no 100% working fsck for btrfs. Imho it is far away from being stable :) And btw: Even it would be stable, allways keep backups for important data ffs! I don't understand why there are still technical experienced people who don't do backups :/ Imho if you don't do backups from a portion of data they are considered not to be important. uname -a Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux sudo mount -o recovery /dev/sdb /tank mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so dmesg: [ 9612.971149] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb [ 9613.048476] btrfs: enabling auto recovery [ 9613.048482] btrfs: disk space caching is enabled [ 9621.172540] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.181369] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.182167] btrfs read error corrected: ino 1 off 5468060241920 (dev /dev/sdd sector 2143292648) [ 9621.182181] Failed to read block groups: -5 [ 9621.193680] btrfs: open_ctree failed sudo /usr/local/bin/btrfs-find-root /dev/sdb ... Well block 4455562448896 seems great, but generation doesn't match, have=9092, want=9096 Well block 4455568302080 seems great, but generation doesn't match, have=9091, want=9096 Well block 4848395739136 seems great, but generation doesn't match, have=9093, want=9096 Well block 4923796594688 seems great, but generation doesn't match, have=9094, want=9096 Well block 4923798065152 seems great, but generation doesn't match, have=9095, want=9096 Found tree root at 5532762525696 $ sudo btrfs-restore -v -t 4923798065152 /dev/sdb ./ parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 Ignoring transid failure Root objectid is 5 Restoring ./Irina Restoring ./Irina/.idmapdir2 Skipping existing file ./Irina/.idmapdir2/4.bucket.lock If you wish to overwrite use the -o option to overwrite Skipping existing file ./Irina/.idmapdir2/7.bucket Skipping existing file ./Irina/.idmapdir2/15.bucket Skipping existing file ./Irina/.idmapdir2/12.bucket.lock Skipping existing file ./Irina/.idmapdir2/cap.txt Skipping existing file ./Irina/.idmapdir2/5.bucket Restoring ./Irina/.idmapdir2/10.bucket.lock Restoring ./Irina/.idmapdir2/6.bucket.lock Restoring ./Irina/.idmapdir2/8.bucket ret is -3 sudo btrfs-zero-log /dev/sdb ... 
parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468060102656 wanted 9096 found 7621 Ignoring transid failure leaf parent key incorrect 59310080 btrfs-zero-log: extent-tree.c:2578: alloc_reserved_tree_block: Assertion `!(ret)' failed. Help me please. Max -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with recover data
On Tue, May 29, 2012 at 5:14 PM, Felix Blanke wrote: > > > On 5/30/12 12:14 AM, Maxim Mikheev wrote: >> >> Hi Everyone, >> >> I recently decided to use btrfs. It works perfectly for a week even >> under heavy load. Yesterday I destroyed backups as cannot afford to have >> ~10TB in backups. I decided to switch on Btrfs because it was announced >> that it stable already >> I need to recover ~5TB data, this data is important and I do not have >> backups >> > > Just out of curiosity: Who announced that BTRFS is stable already?! The > kernel says something different and there is still no 100% working fsck for > btrfs. Imho it is far away from being stable :) > > And btw: Even it would be stable, allways keep backups for important data > ffs! I don't understand why there are still technical experienced people who > don't do backups :/ Imho if you don't do backups from a portion of data they > are considered not to be important. Some distros do offer support, but that's usually in the sense of "if you have a support contract (and are on qualified hardware and using it in a supported configuration), we'll help you fix what breaks (and we're confident we can)", rather than a claim that things will never break. I expect (but haven't actually checked recently) that such distros actively backport btrfs fixes to their supported kernels (btrfs in Distro X's 3.2 kernel may have fixes that Distro Y's 3.2 kernel does not, etc), which can lead to unfortunate misunderstandings; we don't have enough information yet to determine whether that's the case here though. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with data recovering
Thank you for your answer.

The system kernel was, and still is:
Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The raid was created by:
mkfs.btrfs /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

Disks are connected through a RocketRaid 2670.

For mounting I used this line in fstab:
UUID=c9776e19-37eb-4f9c-bd6b-04e8dde97682 /tank btrfs defaults,compress=lzo 0 1

On the machine were running several virtual machines. Only one was actively
using the disks.

The VM had several active threads:
1. 2 threads reading big files (50GB each)
2. reading from 50 files and writing one big file
3. The kernel panic happened when I ran another program with 30 threads of
   reading/writing of small files.

The virtual machine accessed the underlying btrfs through the 9p file system,
which actively used xattrs.

After reboot the system was in this state. I hope that btrfsck --repair will
not make it worse; it is now running.

Backups -- you always need them when you don't have them. We urgently needed
extra space and planned to buy new disks soon.

On 05/29/2012 07:11 PM, cwillu wrote:
> I can't help much at the moment, but the following will help sort things out:
>
> Can you provide as much detail as possible about how things were configured
> at the time of the failure? Raid levels used, kernel versions at the time of
> the failure, how the disks are connected, general description of the
> activity on the disk and the nature of its contents (all large files?
> rootfs? mail spools?) What you were thinking at the time you decided that
> you couldn't afford backups?
>
> As much detail as possible on what all you've tried since the failure to
> recover things?
>
> It's likely the data is fine (if currently inaccessible), but obviously
> things are in a fragile state, and the important thing right now is to not
> make things worse: a recoverable situation may otherwise turn into an
> irrecoverable one.
Re: Help with data recovering
I forgot to add. Btrfs-tools was build from: git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git On 05/29/2012 07:24 PM, Maxim Mikheev wrote: Thank you for your answer. The system kernel was and now: Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux the raid was created by: mkfs.btrfs /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf Disk are connected through RocketRaid 2670. for mounting I used line in fstab: UUID=c9776e19-37eb-4f9c-bd6b-04e8dde97682/tankbtrfs defaults,compress=lzo01 On machine was running several Virtual machines. Only one was actively using disks. VM has active several threads: 1. 2 threads reading big files (50GB each) 2. reading from 50 files and writing one big file 3. The kernel panic happens when I run another program with 30 threads of reading/writing of small files. Virtual Machine accessed to underline btrfs through 9-p file system which actively used xattr. After reboot system was in this stage. I hope that btrfsck --repair will not make it worse, It is now running. . Backups, you everytime need them when you don't have. We was urgently need extra space and planed to buy new disks soon. On 05/29/2012 07:11 PM, cwillu wrote: I can't help much at the moment, but the following will help sort things out: Can you provide as much detail as possible about how things were configured at the time of the failure? Raid levels used, kernel versions at the time of the failure, how the disks are connected, general description of the activity on the disk and the nature of its contents (all large files? rootfs? mail spools?) What you were thinking at the time you decided that you couldn't afford backups? As much detail as possible on what all you've tried since the failure to recover things? It's likely the data is fine (if currently inaccessible), but obviously things are in a fragile state, and the important thing right now is to not make things worse: a recoverable situation may otherwise turn into an irrecoverable one. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with data recovering
On Tue, May 29, 2012 at 5:24 PM, Maxim Mikheev wrote: > Thank you for your answer. > > > The system kernel was and now: > > Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 > x86_64 x86_64 x86_64 GNU/Linux > > the raid was created by: > mkfs.btrfs /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf > > Disk are connected through RocketRaid 2670. > > for mounting I used line in fstab: > UUID=c9776e19-37eb-4f9c-bd6b-04e8dde97682 /tank btrfs > defaults,compress=lzo 0 1 > > On machine was running several Virtual machines. Only one was actively using > disks. > > VM has active several threads: > 1. 2 threads reading big files (50GB each) > 2. reading from 50 files and writing one big file > 3. The kernel panic happens when I run another program with 30 threads of > reading/writing of small files. > > Virtual Machine accessed to underline btrfs through 9-p file system which > actively used xattr. > > After reboot system was in this stage. > > I hope that btrfsck --repair will not make it worse, It is now running. **twitch** Well, I also hope it won't make it worse. Do not cancel it now, let it finish (aborting it will make things worse), but I suggest waiting until a few more people have weighed in before attempting anything beyond that. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix return code in drop_objectid_items
On 05/30/2012 04:57 AM, Josef Bacik wrote: > So dpkg fsync()'s the file and the directory containing the file whenever it > writes to a file which is really slow in btrfs. This is partly because > fsync()'ing a directory _always_ committed the transaction instead of just > going to the tree log. This is because drop_objectid_items() would return 1 > since it does a btrfs_search_slot() which returns 1. In tree-log jargon > this means that we have to commit the transaction to be safe. So just check > if ret is greater than 0 and set it to 0 if it does. With this patch we now > use the tree-log instead of committing the entire transaction, which is > twice as fast on my box. Thanks, > Good catch. Reviewed-by: Liu Bo > Signed-off-by: Josef Bacik > --- > fs/btrfs/tree-log.c |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 425014b..2017d0f 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -2667,6 +2667,8 @@ static int drop_objectid_items(struct > btrfs_trans_handle *trans, > btrfs_release_path(path); > } > btrfs_release_path(path); > + if (ret > 0) > + ret = 0; > return ret; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html