[btrfs-progs] btrfs fi df output
Hello,

I have a question regarding "btrfs filesystem df" output.

# btrfs fi df /mnt/test
Data: total=3.01GB, used=512.19MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00    <= What does this mean? What is it used for? I've never seen it incremented.
Metadata, DUP: total=2.50GB, used=676.00KB
Metadata: total=8.00MB, used=0.00  <= the same question

I have kernel 3.3.6 and btrfs-tools from git.

# mkfs.btrfs /dev/mapper/vg-lvtest
# mount /dev/mapper/vg-lvtest /mnt/test
# dd if=/dev/zero of=/mnt/test/test.file bs=1M count=512 conv=fdatasync
# btrfs fi df /mnt/test
Data: total=3.01GB, used=512.19MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=2.50GB, used=676.00KB
Metadata: total=8.00MB, used=0.00
ierdnac-hp ~ # umount /mnt/test

Are these two chunks the ones that appear below in btrfs-debug-tree? Which ones? In the tree there is one 4MB chunk and three 8MB chunks, one of them with 2 stripes.

# btrfs-debug-tree /dev/mapper/vg-lvtest
chunk tree
leaf 20979712 items 12 free space 2557 generation 5 owner 3
fs uuid 6accfaf3-c88a-462e-85fc-35513d0b43d6
chunk uuid 65f22206-a9dd-4053-a660-61bc4ee0be12
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
		dev item devid 1 total_bytes 116912029696 bytes used 8627683328
	item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3817 itemsize 80
		chunk length 4194304 owner 2 type 2 num_stripes 1
			stripe 0 devid 1 offset 0
	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3737 itemsize 80
		chunk length 8388608 owner 2 type 4 num_stripes 1
			stripe 0 devid 1 offset 4194304
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3657 itemsize 80
		chunk length 8388608 owner 2 type 1 num_stripes 1
			stripe 0 devid 1 offset 12582912
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 3545 itemsize 112
		chunk length 8388608 owner 2 type 34 num_stripes 2
			stripe 0 devid 1 offset 20971520
			stripe 1 devid 1 offset 29360128

Thanks
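For orientation, the "type" value printed for each chunk item is a bitmask of block-group flags, and decoding it shows which "fi df" line each chunk backs; the single-copy System and Metadata lines typically correspond to the small chunks created at mkfs time, which is why they tend to stay at used=0. The sketch below mirrors the BTRFS_BLOCK_GROUP_* bits from fs/btrfs/ctree.h and is only an illustration, not part of the original report.

/*
 * Illustrative decoder for the "type" field printed by btrfs-debug-tree.
 * The flag values mirror the BTRFS_BLOCK_GROUP_* bits in fs/btrfs/ctree.h;
 * this is a sketch for mapping chunk items to the "fi df" lines, not part
 * of the original report.
 */
#include <stdio.h>
#include <stdint.h>

#define BG_DATA     (1ULL << 0)
#define BG_SYSTEM   (1ULL << 1)
#define BG_METADATA (1ULL << 2)
#define BG_RAID0    (1ULL << 3)
#define BG_RAID1    (1ULL << 4)
#define BG_DUP      (1ULL << 5)
#define BG_RAID10   (1ULL << 6)

static void decode(uint64_t type)
{
	printf("type %llu =", (unsigned long long)type);
	if (type & BG_DATA)     printf(" DATA");
	if (type & BG_SYSTEM)   printf(" SYSTEM");
	if (type & BG_METADATA) printf(" METADATA");
	if (type & BG_RAID0)    printf(" RAID0");
	if (type & BG_RAID1)    printf(" RAID1");
	if (type & BG_DUP)      printf(" DUP");
	if (type & BG_RAID10)   printf(" RAID10");
	printf("\n");
}

int main(void)
{
	decode(2);   /* item 1: SYSTEM, the small 4MB chunk from mkfs        */
	decode(4);   /* item 2: METADATA, the single-copy 8MB chunk          */
	decode(1);   /* item 3: DATA                                         */
	decode(34);  /* item 4: SYSTEM | DUP, the two-stripe chunk           */
	return 0;
}

Running it prints, for example, "type 34 = SYSTEM DUP", matching the two-stripe chunk in item 4 and the "System, DUP" line in the df output.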
Re: atime and filesystems with snapshots (especially Btrfs)
On 05/25/2012 06:35 PM, Alexander Block wrote:
> Hello,
>
> (this is a resend with proper CC for linux-fsdevel and linux-kernel)
>
> I would like to start a discussion on atime in Btrfs (and other
> filesystems with snapshot support).
>
> As atime is updated on every access of a file or directory, we get
> many changes to the trees in btrfs that as always trigger cow
> operations. This is no problem as long as the changed tree blocks are
> not shared by other subvolumes. Performance is also not a problem, no
> matter if shared or not (thanks to relatime which is the default).
> The problems start when someone starts to use snapshots. If you for
> example snapshot your root and continue working on your root, after
> some time big parts of the tree will be cowed and unshared. In the
> worst case, the whole tree gets unshared and thus takes up the double
> space. Normally, a user would expect to only use extra space for a
> tree if he changes something.
> A worst case scenario would be if someone took regular snapshots for
> backup purposes and later greps the contents of all snapshots to find
> a specific file. This would touch all inodes in all trees and thus
> make big parts of the trees unshared.
>
> relatime (which is the default) reduces this problem a little bit, as
> it by default only updates atime once a day. This means, if anyone
> wants to test this problem, mount with relatime disabled or change the
> system date before you try to update atime (that's the way I tested it).
>
> As a solution, I would suggest to make noatime the default for btrfs.
> I'm however not sure if it is allowed in linux to have different
> default mount options for different filesystem types. I know this
> discussion pops up every few years (last time it resulted in making
> relatime the default). But this is a special case for btrfs. atime is
> already bad on other filesystems, but it's much much worse in btrfs.

Sounds like a real problem. I would suggest a few remedies.

1. Make a persistent filesystem parameter that says noatime/relatime/atime,
   so the default, if not specified on mount, is taken as a property of
   the FS (mkfs can set it).

2. The snapshot program should check and complain if atime is on, and
   recommend turning it off, since the problem only starts with a snapshot.

3. If space availability drops under some threshold, disable atime. As you
   said, this is catastrophic in this case, so the user can always search
   and delete files. In fact, if the IO was only because of atime, it should
   be a soft error: warned about and ignored.

But perhaps the true solution is to put atime in a side table, so only the
atime info gets COWed and not all the metadata.

Just my $0.017
Boaz

> Alex.
[PATCH] Btrfs: return value of btrfs_read_buffer is checked correctly
btrfs_read_buffer() has the possibility of returning an error. Therefore, I
add code in which the return value of btrfs_read_buffer() is checked.

Signed-off-by: Tsutomu Itoh
---
 fs/btrfs/ctree.c    |  6 +++++-
 fs/btrfs/tree-log.c | 16 ++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4106264..c1af717 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -739,7 +739,11 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
 			if (!cur)
 				return -EIO;
 		} else if (!uptodate) {
-			btrfs_read_buffer(cur, gen);
+			err = btrfs_read_buffer(cur, gen);
+			if (err) {
+				free_extent_buffer(cur);
+				return err;
+			}
 		}
 	}
 	if (search_start == 0)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index eb1ae90..6f22a4f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1628,7 +1628,9 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
 	int i;
 	int ret;
 
-	btrfs_read_buffer(eb, gen);
+	ret = btrfs_read_buffer(eb, gen);
+	if (ret)
+		return ret;
 
 	level = btrfs_header_level(eb);
 
@@ -1749,7 +1751,11 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
 
 			path->slots[*level]++;
 			if (wc->free) {
-				btrfs_read_buffer(next, ptr_gen);
+				ret = btrfs_read_buffer(next, ptr_gen);
+				if (ret) {
+					free_extent_buffer(next);
+					return ret;
+				}
 
 				btrfs_tree_lock(next);
 				btrfs_set_lock_blocking(next);
@@ -1766,7 +1772,11 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
 				free_extent_buffer(next);
 				continue;
 			}
-			btrfs_read_buffer(next, ptr_gen);
+			ret = btrfs_read_buffer(next, ptr_gen);
+			if (ret) {
+				free_extent_buffer(next);
+				return ret;
+			}
 
 			WARN_ON(*level <= 0);
 			if (path->nodes[*level-1])
Re: Newbie questions on some of btrfs code...
On Mon, May 28, 2012 at 20:45 (+0200), Alex Lyakas wrote:
> I have re-looked at btrfs_search_slot, and don't see how it would end
> up in leaf B. The bin_search() function will clearly return the slot
> *after* the slot of N that has key==5 (which is the parent slot of A).
> So then the following code:
>	if (level != 0) {
>		int dec = 0;
>		if (ret && slot > 0) {
>			dec = 1;
>			slot -= 1;
>		}
> will bring us back into the slot of N with key=5. And we will go to
> leaf A. While if key(N) of that slot was 10, we would never have ended
> up in that slot, unless there is no lesser key in the tree.

Yes, that's right. As already said in my previous mail (in the paragraph
you didn't quote), the key in the leaf must be an exact match. The key
in N pointing to A will be 10.

> Actually, it looks like "no lesser key" is the only case when we can
> get ret==1 and slot==0.

Correct.

> Except perhaps an empty leaf, which I am not sure can happen.

It can't.

-Jan
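The behaviour being discussed can be modelled on a plain sorted array: a lower-bound style bin_search() returns the slot of the first key greater than or equal to the target (with ret=1 when there is no exact match), and the interior-node adjustment quoted above then steps back one slot so the descent follows the largest key that is still <= the target. The toy below only mimics that logic on an array; it is not the btrfs code itself.

/*
 * Toy model of the slot arithmetic discussed above: a bin_search()-style
 * lookup over a node's keys, followed by the "if (ret && slot > 0) slot--"
 * step used on interior nodes.  Not the btrfs implementation.
 */
#include <stdio.h>

static int bin_search(const int *keys, int nr, int target, int *slot)
{
	int low = 0, high = nr;

	while (low < high) {
		int mid = (low + high) / 2;

		if (keys[mid] < target)
			low = mid + 1;
		else if (keys[mid] > target)
			high = mid;
		else {
			*slot = mid;
			return 0;       /* exact match */
		}
	}
	*slot = low;                    /* slot of the first key > target */
	return 1;
}

int main(void)
{
	int node_keys[] = { 5, 10, 20 };    /* keys of node N, pointing to leaves A, B, C */
	int slot, ret;

	ret = bin_search(node_keys, 3, 7, &slot);
	if (ret && slot > 0)                /* the interior-node adjustment */
		slot--;

	/* searching for key 7 descends through slot 0 (key 5), i.e. leaf A */
	printf("ret=%d, descend via slot %d (key %d)\n", ret, slot, node_keys[slot]);
	return 0;
}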
Re: Make existing snapshots read-only?
On Tue, May 29, 2012 at 08:40:10AM +0800, Li Zefan wrote:
> > Is there any way to mark existing snapshots as read-only? Making new
> > ones read-only is easy enough, but what about existing ones?
>
> We have code in the kernel side, so what we need to do is to update
> btrfs-progs, which is trivial.

Well, I don't like that it's even possible to turn a RO snapshot to a RW
one. What was the rationale behind this back then? Besides, I think that
it could break assumptions in the backref code.

If it's only a one-way operation from a regular subvol -> RO subvol,
this sounds reasonable to me. If the opposite direction is allowed, then
I'd not call it 'read-only' but "unwritable on-request".

david
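The kernel-side code referred to above is presumably the subvolume-flags ioctl pair; a minimal sketch of how a userspace tool could set the read-only bit on an existing subvolume follows. The ioctl numbers and the RDONLY bit are copied from btrfs's ioctl.h of this period, and this is only an illustration of the mechanism, not the btrfs-progs change being discussed (nor a position on whether clearing the bit should be allowed).

/*
 * Minimal sketch: mark an existing subvolume read-only via the
 * subvol-flags ioctls.  Constant values are taken from fs/btrfs/ioctl.h;
 * this is an illustration, not the actual btrfs-progs implementation.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

#define BTRFS_IOCTL_MAGIC         0x94
#define BTRFS_SUBVOL_RDONLY       (1ULL << 1)
#define BTRFS_IOC_SUBVOL_GETFLAGS _IOR(BTRFS_IOCTL_MAGIC, 25, __u64)
#define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64)

int main(int argc, char **argv)
{
	__u64 flags;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <subvolume path>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || ioctl(fd, BTRFS_IOC_SUBVOL_GETFLAGS, &flags) < 0) {
		perror("getflags");
		return 1;
	}
	flags |= BTRFS_SUBVOL_RDONLY;            /* mark the subvolume read-only */
	if (ioctl(fd, BTRFS_IOC_SUBVOL_SETFLAGS, &flags) < 0) {
		perror("setflags");
		return 1;
	}
	close(fd);
	return 0;
}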
Re: Newbie questions on some of btrfs code...
Thank you Jan, Hugo & Lio, for taking time answering my questions. Alex. P.S.: I have dug in some more, so probably more questions will arrive:) On Tue, May 29, 2012 at 12:13 PM, Jan Schmidt wrote: > On Mon, May 28, 2012 at 20:45 (+0200), Alex Lyakas wrote: >> I have re-looked at btrfs_search_slot, and don't see how it would end >> up in leaf B. The bin_search() function will clearly return the slot >> *after* the slot of N that has key==5 (which is the parent slot of A). >> So then the following code: >> if (level != 0) { >> int dec = 0; >> if (ret && slot > 0) { >> dec = 1; >> slot -= 1; >> } >> will bring us back into the slot of N with key=5. And we will go to >> leaf A. While if key(N) of that slot was 10, we would never have ended >> up in that slot, unless there is no lesser key in the tree. > > Yes, that's right. As already said in my previous mail (in the paragraph > you didn't quote), the key in the leaf must be an exact match. The key > in N pointing to A will be 10. > >> Actually, it looks like "no lesser key" is the only case when we can >> get ret==1 and slot==0. > > Correct. > >> Except perhaps an empty leaf, which I am not sure can happen. > > It can't. > > -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make existing snapshots read-only?
On Tue, May 29, 2012 at 5:18 AM, David Sterba wrote: > On Tue, May 29, 2012 at 08:40:10AM +0800, Li Zefan wrote: >> > Is there any way to mark existing snapshots as read-only? Making new >> > ones read-only is easy enough, but what about existing ones? >> >> We have code in the kernel side, so what we need to do is to update >> btrfs-progs, >> which is trivial. > > Well, I don't like that it's even possible to turn a RO snapshot to a RW > one. What was the rationale behind this back then? Besides, I think that > it could break assumptions in the backref code. > > If it's only a one-way operation from a regular subvol -> RO subvol, > this sounds reasonable to me. If the opposite direction is allowed, then > I'd not call it 'read-only' but "unwritable on-request". Is anyone actually expecting readonly-snapshots to be a worm implementation? And are they sane to expect it? So long as the permissions required to change it are sane (admin rights to change an arbitrary snapshot, possibly something like write-permission on the mountpoint to change otherwise), I don't see the gain. It's not like root can't modify the disk directly, so withholding an easy way to flip the readonly bit just strikes me as a nuisance feature. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make existing snapshots read-only?
2012-05-28 12:37:00 -0600, Bruce Guenter:
> Is there any way to mark existing snapshots as read-only? Making new
> ones read-only is easy enough, but what about existing ones?
[...]

you can always do

btrfs sub snap -r vol vol-ro
btrfs sub del vol
mv vol-ro vol

-- 
Stephane
Re: Will big metadata blocks fix # of hardlinks?
Thanks for noting this one. That is one very surprising and unexpected limit!... And a killer for some not completely rare applications... On 26/05/12 19:22, Sami Liedes wrote: > Hi! > > I see that Linux 3.4 supports bigger metadata blocks for btrfs. > > Will using them allow a bigger number of hardlinks on a single file > (i.e. the bug that has bitten at least git users on Debian[1,2], and > BackupPC[3])? As far as I understand correctly, the problem has been > that the hard links are stored in the same metadata block with some > other metadata, so the size of the block is an inherent limitation? > > If so, I think it would be worth for me to try Btrfs again :) > > Sami > > > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603 > [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603 > [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762 One example fail case is just 13 hard links. Even x4 that (16k blocks) only gives 52 links for that example fail case. The brief summary for those are: * It's a rare corner case that needs a format change to fix, so "won't-fix"; * There are real world problem examples noted in those threads for such as: BackupPC (backups); nnmaildir mail backend in Gnus (an Emacs package for reading news and email); and a web archiver. * Also, Bacula (backups) and Mutt (email client) are quoted as problem examples in: Btrfs File-System Plans For Ubuntu 12.10 http://www.phoronix.com/scan.php?page=news_item&px=MTEwMDE For myself, I have a real world example for deduplication of identical files from a proprietary data capture system where the filenames change (timestamp and index data stored in the filename) yet there are periods where the file contents change only occasionally... The 'natural' thing to do is hardlink together all the identical files to then just have the unique filenames... And you might have many files in a particular directory... Note that for long filenames (surprisingly commonly done!), one fail case noted above is just 13 hard links. Looks like I'm stuck on ext4 with an impoverished "cp -l" for a fast 'snapshot' for the time being still... (Or differently, LVM snapshot and copy.) For btrfs, rather than a "break everything" format change, can a neat and robust 'workaround' be made so that the problem-case hardlinks to a file within the same directory perhaps spawn their own transparent subdirectory for the hard links?... Worse case then is that upon a downgrade to an older kernel, the 'transparent' subdirectory of hard links becomes visible as a distinct subdirectory? (That is a 'break' but at least data isn't lost.) Or am I chasing the wrong bits? ;-) More seriously: The killer there for me is that running rsync or running a deduplication script might hit too many hard links that were perfectly fine when on ext4. Regards, Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
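The arithmetic behind those numbers: every hard link from one directory adds a btrfs_inode_ref entry (an 8-byte index plus a 2-byte name length, followed by the name itself) to a single item, and that item has to fit inside one leaf together with the leaf header and item bookkeeping. The sketch below gives a rough estimate of the ceiling for different leaf sizes and name lengths; the overhead figure is approximate and real limits are lower because the leaf also holds other items.

/*
 * Back-of-the-envelope estimate of the hard-link ceiling.  The 10-byte
 * figure is sizeof(struct btrfs_inode_ref); the leaf overhead is a rough
 * guess.  This illustrates the scaling, not exact numbers.
 */
#include <stdio.h>

int main(void)
{
	const int leaf_sizes[] = { 4096, 16384, 65536 };
	const int name_lens[]  = { 12, 100, 255 };
	const int ref_header   = 10;     /* index (8) + name_len (2) per link */
	const int leaf_overhead = 200;   /* leaf header + item metadata, roughly */

	for (int i = 0; i < 3; i++) {
		for (int j = 0; j < 3; j++) {
			int usable = leaf_sizes[i] - leaf_overhead;
			int links = usable / (ref_header + name_lens[j]);
			printf("leaf %5d bytes, %3d-char names: ~%d links\n",
			       leaf_sizes[i], name_lens[j], links);
		}
	}
	return 0;
}

With 255-character names a 4KB leaf tops out at barely more than a dozen links, which is in the same ballpark as the 13-link failure above, and a 16KB leaf only pushes that into the tens; this is why bigger metadata blocks alone don't really fix the problem.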
Re: Will big metadata blocks fix # of hardlinks?
On Tue, May 29, 2012 at 02:09:03PM +0100, Martin wrote: > Thanks for noting this one. That is one very surprising and unexpected > limit!... And a killer for some not completely rare applications... There have been substantially-complete patches posted to this list which fix the problem (see "extended inode refs" patches by Mark Fasheh in the archives). I don't think they're quite ready for inclusion yet, but work is ongoing to fix the issue. > On 26/05/12 19:22, Sami Liedes wrote: > > Hi! > > > > I see that Linux 3.4 supports bigger metadata blocks for btrfs. > > > > Will using them allow a bigger number of hardlinks on a single file > > (i.e. the bug that has bitten at least git users on Debian[1,2], and > > BackupPC[3])? As far as I understand correctly, the problem has been > > that the hard links are stored in the same metadata block with some > > other metadata, so the size of the block is an inherent limitation? > > > > If so, I think it would be worth for me to try Btrfs again :) > > > > Sami > > > > > > [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603 > > [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603 > > [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762 > > One example fail case is just 13 hard links. Even x4 that (16k blocks) > only gives 52 links for that example fail case. > > > The brief summary for those are: > > * It's a rare corner case that needs a format change to fix, so "won't-fix"; Definitely not "won't-fix" (see above). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 7: The Simple Truth --- signature.asc Description: Digital signature
Re: atime and filesystems with snapshots (especially Btrfs)
On Tue, May 29, 2012 at 10:14 AM, Boaz Harrosh wrote: > > Sounds like a real problem. I would suggest a few remedies. > 1. Make a filesystem persistent parameter that says noatime/relatime/atime > So the default if not specified on mount is taken as a property of > the FS (mkfs can set it) That would be possible. But again, I'm not sure if it is allowed for one fs type to differ from all the other filesystems in its default behavior. > 2. The snapshot program should check and complain if it is on, and recommend > an off. Since the problem only starts with a snapshot. That would definitely cause awareness for the problem and many people would probably switch to noatime on mount. > 3. If space availability drops under some threshold, disable atime. As you > said > this is catastrophic in this case. So user can always search and delete > files. > In fact if the IO was only because of atime, it should be a soft error, > warned, > and ignored. It would be hard to determine a good threshold. This really depends on the way snapshots are used. > > But perhaps the true solution is to put atime on a side table, so only the > atime > info gets COW and not the all MetaData This would definitely reduce the problem to a minimum. But it may be harder to implement as it sounds. You would either have to keep 2 trees per subvolume (one for the fs and one for atime) or share one tree for all subvols. I don't think 2 trees per subvolume would be acceptable, but I'm not sure. A shared tree would require to implement some kind of custom refcounting for the items, as changes to one fs tree should not change atime of the other and thus create new items on demand. It would probably also require snapshot origin tracking, because on a freshly snapshotted subvolume, no atime entries would exist at all and must be read from the parent/origin. > > Just my $0.017 > Boaz > >> Alex. >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Decrease meta fragments by using a caterpillar band Method (btrfs-progs)
signed-off-by WeiFeng Liu 523f28f9b3d9c710cacc31dbba644efb1678cf62 --- diff -uprN btrfs-progs-120328-a/ctree.c btrfs-progs-120328-b/ctree.c --- btrfs-progs-120328-a/ctree.c2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/ctree.c2012-05-28 23:29:15.0 + @@ -334,6 +334,7 @@ int __btrfs_cow_block(struct btrfs_trans btrfs_set_header_flag(cow, BTRFS_HEADER_FLAG_RELOC); else btrfs_set_header_owner(cow, root->root_key.objectid); + btrfs_set_header_cater(cow, 0); write_extent_buffer(cow, root->fs_info->fsid, (unsigned long)btrfs_header_fsid(cow), diff -uprN btrfs-progs-120328-a/ctree.h btrfs-progs-120328-b/ctree.h --- btrfs-progs-120328-a/ctree.h2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/ctree.h2012-05-28 23:25:26.0 + @@ -292,6 +292,7 @@ struct btrfs_header { __le64 owner; __le32 nritems; u8 level; + u8 cater_index_factor; } __attribute__ ((__packed__)); #define BTRFS_NODEPTRS_PER_BLOCK(r) (((r)->nodesize - \ @@ -510,6 +511,7 @@ struct btrfs_extent_item { __le64 refs; __le64 generation; __le64 flags; + u8 cater_index_factor; } __attribute__ ((__packed__)); struct btrfs_extent_item_v0 { @@ -1246,6 +1248,8 @@ BTRFS_SETGET_FUNCS(extent_flags, struct BTRFS_SETGET_FUNCS(extent_refs_v0, struct btrfs_extent_item_v0, refs, 32); BTRFS_SETGET_FUNCS(tree_block_level, struct btrfs_tree_block_info, level, 8); +BTRFS_SETGET_FUNCS(extent_cater, struct btrfs_extent_item, + cater_index_factor, 8); static inline void btrfs_tree_block_key(struct extent_buffer *eb, struct btrfs_tree_block_info *item, @@ -1511,6 +1515,8 @@ BTRFS_SETGET_HEADER_FUNCS(header_owner, BTRFS_SETGET_HEADER_FUNCS(header_nritems, struct btrfs_header, nritems, 32); BTRFS_SETGET_HEADER_FUNCS(header_flags, struct btrfs_header, flags, 64); BTRFS_SETGET_HEADER_FUNCS(header_level, struct btrfs_header, level, 8); +BTRFS_SETGET_HEADER_FUNCS(header_cater, struct btrfs_header, + cater_index_factor, 8); static inline int btrfs_header_flag(struct extent_buffer *eb, u64 flag) { diff -uprN btrfs-progs-120328-a/extent-tree.c btrfs-progs-120328-b/extent-tree.c --- btrfs-progs-120328-a/extent-tree.c 2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/extent-tree.c 2012-05-28 20:06:06.0 + @@ -2584,6 +2584,7 @@ static int alloc_reserved_tree_block(str btrfs_set_extent_generation(leaf, extent_item, generation); btrfs_set_extent_flags(leaf, extent_item, flags | BTRFS_EXTENT_FLAG_TREE_BLOCK); + btrfs_set_extent_cater(leaf, extent_item, 0); block_info = (struct btrfs_tree_block_info *)(extent_item + 1); btrfs_set_tree_block_key(leaf, block_info, key); diff -uprN btrfs-progs-120328-a/utils.c btrfs-progs-120328-b/utils.c --- btrfs-progs-120328-a/utils.c2012-04-16 08:47:08.0 + +++ btrfs-progs-120328-b/utils.c2012-05-28 23:22:20.0 + @@ -135,6 +135,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_generation(buf, 1); btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV); btrfs_set_header_owner(buf, BTRFS_ROOT_TREE_OBJECTID); + btrfs_set_header_cater(buf, 0); write_extent_buffer(buf, super.fsid, (unsigned long) btrfs_header_fsid(buf), BTRFS_FSID_SIZE); @@ -254,6 +255,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[2]); btrfs_set_header_owner(buf, BTRFS_EXTENT_TREE_OBJECTID); btrfs_set_header_nritems(buf, nritems); + btrfs_set_header_cater(buf, 0); csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); ret = pwrite(fd, buf->data, leafsize, blocks[2]); BUG_ON(ret != leafsize); @@ -338,6 +340,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[3]); btrfs_set_header_owner(buf, 
BTRFS_CHUNK_TREE_OBJECTID); btrfs_set_header_nritems(buf, nritems); + btrfs_set_header_cater(buf, 0); csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); ret = pwrite(fd, buf->data, leafsize, blocks[3]); @@ -373,6 +376,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[4]); btrfs_set_header_owner(buf, BTRFS_DEV_TREE_OBJECTID); btrfs_set_header_nritems(buf, nritems); + btrfs_set_header_cater(buf, 0); csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); ret = pwrite(fd, buf->data, leafsize, blocks[4]); @@ -382,6 +386,7 @@ int make_btrfs(int fd, const char *devic btrfs_set_header_bytenr(buf, blocks[5]); btrfs_set_header_owner(buf, BTRFS_FS_TREE_OBJECTID); btrfs_set_header_nritems(buf, 0); + btrfs_set_header_cater(buf, 0);
[RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2)
This is a version with several bugs fixed since my first patch commit, and it adds the patch for btrfs-progs.

Introduction and brief speculation on the values and penalties:

When a tree block needs to be created, we offer, say, 2 or 3 blocks for it, then pick one from the continuous blocks. If this tree block needs a cow, another free block from these continuous blocks can be grabbed, and the old one is freed for the next cow. In the most ideal condition only 2 continuous blocks are kept for any number of cows of a tree block -- imagine a caterpillar band, from which my method takes its name.

Consider a scene in which there are two file systems, one COW and the other NOCOW, each with 1GB of space for metadata of which the first 100MB has been used, and let me focus only on ops that modify metadata and ignore deleting metadata, for simplicity.

As we can imagine, the NOCOW fs would likely keep the layout of its 100MB of meta at its most neat: changes can be overwritten in the original places and leave the rest of the 900MB untouched. But it lacks the excellent feature of assuring data integrity which only a COW fs has.

However, even when only modifying metadata, the COW fs would make holes in the first 100MB and write COWed meta into the remaining 900MB of space; in the extreme condition, the whole 1GB of space would finally be fragmented and scattered by that 100MB of metadata. I don't think btrfs will easily fall into such a bad state; as I understand it we have extents, chunks, clusters and maybe other methods (tell me please) to slow fragmenting, but it seems that there is no dedicated method (fix me) to help COW get rid of this type of fragmentation, which a NOCOW fs is not inclined to create.

I introduce the caterpillar band method as a compromise. It uses 300MB for meta to avoid such fragments without losing the COW feature; in this instance, that means three continuous blocks are used for a single tree block, and the tree block will be circularly updated (cowed) within its three continuous blocks.

Penalties? Yes, there are, thanks to Arne Jansen and Liu Bo. As Arne Jansen indicated, the biggest disadvantage of the method is that it will basically limit us to 1/3rd of platter speed when writing meta on spinning drives, and to 1/4th if using four continuous blocks per tree block.

About readahead, which will also be down to 1/3rd of the NOCOW fs rate: I would discreetly count it as an advantage rather than a penalty compared to the worse condition which COW would otherwise get -- nearly from the first COW, the newly cowed tree blocks would normally be 50MB away from their original neighbor blocks, and after frequent random modify ops, wouldn't the worst condition be that every dozen of newly cowed tree blocks are more than 50MB away from their original neighbor blocks, with equal probability?

So permit me to think readahead is only useful for the NOCOW fs in this scenario, because it always keeps its original 100MB contiguous, and my way would keep a 1/3 readahead rate vs maybe zero with pure COW in the worst case.

Of course, both penalties and values apply only to metadata and will not affect user data reads/writes; my patch applies only to cowing tree blocks. But if there are large numbers of small files (size<4k), the values and penalties will also affect those small user data R/W.

I have not made tests for my patch by now; it still needs some time to get more checking of both the strategy and the code, and to fix possible bugs before testing. Any comments are welcome.
Thanks signed-off-by WeiFeng Liu 523f28f9b3d9c710cacc31dbba644efb1678cf62 --- diff -uprN a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c --- a/fs/btrfs/ctree.c 2012-05-21 18:42:51.0 + +++ b/fs/btrfs/ctree.c 2012-05-29 23:08:19.0 + @@ -444,9 +444,21 @@ static noinline int __btrfs_cow_block(st } else parent_start = 0; - cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start, -root->root_key.objectid, &disk_key, -level, search_start, empty_size, 1); + if (root->fs_info->cater_factor > 1) { + if (btrfs_cater_factor(btrfs_header_cater(buf)) > 1) + cow = btrfs_grab_cater_block(trans, root, buf, parent_start, + root->root_key.objectid, &disk_key, + level, search_start, empty_size, 1); + else + cow = btrfs_alloc_free_block_cater(trans, root, buf->len, + parent_start, + root->root_key.objectid, &disk_key, + level, search_start, empty_size, 1); + } else { + cow = btrfs_alloc_free_block(trans, ro
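For readers skimming the thread, the allocation policy described above boils down to this: each logical tree block owns a small band of contiguous physical blocks, and every COW rotates to the next slot inside that band instead of allocating far away. A toy model of that rotation is sketched below; all names in it are invented for illustration and none of it is the interface the patch actually adds.

/*
 * Toy model of the "caterpillar band" idea: each logical tree block owns
 * cater_factor contiguous physical slots, and every COW rotates to the
 * next slot within the band.  Names are invented; this is not the patch.
 */
#include <stdio.h>

struct cater_block {
	unsigned long band_start;   /* first physical block of the band */
	int cater_factor;           /* slots reserved for this tree block */
	int cater_index;            /* slot currently holding the live copy */
};

/* COW: the new copy goes to the next slot in the band, the old one is freed */
static unsigned long cater_cow(struct cater_block *b)
{
	b->cater_index = (b->cater_index + 1) % b->cater_factor;
	return b->band_start + b->cater_index;
}

int main(void)
{
	struct cater_block b = { .band_start = 1000, .cater_factor = 3, .cater_index = 0 };

	/* repeated modifications stay inside physical blocks 1000..1002 */
	for (int i = 0; i < 6; i++)
		printf("cow %d -> physical block %lu\n", i, cater_cow(&b));
	return 0;
}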
Re: [RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2)
Hi Liu, On 05/29/2012 06:24 PM, WeiFeng Liu wrote: > This is a several bugs fixed version since my first patch commit, and added > patch of btrfs-prog > > > Introduction and brief speculate of values and penalties: > > When a tree block need to be created, we offer, say, 2 or 3 blocks for > it, > then pick one from the continuous blocks. If this tree block needs a cow, > another free block from these continuous blocks can be grabbed, and the old > one > is freed for next cow. What happens if the block is not COW-ed *and freed* but COW-ed only (think about a snapshot updated) ? I.e, what happens if the user makes 5-6 snapshot and the caterpillar-size is 3 ? > > In the most ideal condition only 2 continuous blocks are kept for any > times > of cowing a tree block -- image a caterpillar band by which my method is > named. Apart my doubt above, I am very interested on the performances. However I have some doubts about the space efficiency. I have the impression that today BTRFS consumes a lot of space for the meta-data. On my linux box: ghigo@venice:~$ sudo btrfs fi df / Data: total=19.01GB, used=14.10GB System, DUP: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=2.00GB, used=933.55MB Metadata: total=8.00MB, used=0.00 So basically the metadata are bout the 6% of the data (0.9GB / 14.1GB ). But with your idea, BTRFS should reserve 3 blocks for every metadata block. This means that the BTRFS ratio metadata/data will increase to 18% (6%*3). Which is not a so negligible value. GB > > Given a scene that there are two file systems: one with COW and the > other > NOCOW, each has 1GB space for metadata with it's first 100MB has been used, > and > let me focus only on ops of modifying metadata and ignore deleting metadata > for > simple reason. > > As we can image that the NOCOW fs would likely keep the layout of its > 100MB > meta to most neat: changes can be overwritten in the original places and leave > the rest 900MB untouched. But it is lack of the excellent feature to assure > data > integrity which owned only by COW fs. > > However, only modifying metadata though, the COW fs would make holes in > the first 100MB and write COWed meta into the rest 900MB space, in the extreme > condition, the whole 1GB space would be fragmented finally and scattered by > that > 100MB metadata. I don't think btrfs will be easily trap into such bad state, > as > I understood we have extent, chunk, cluster and maybe other methods(tell me > please) to slow fragmenting, but it seems that there are no dedicate method > (fix me) to help COW getting rid of this type of fragments which NOCOW fs does > not incline to make. > > I introduce the caterpillar band method as a compromise. It use 300MB > for > meta to avoid such fragments and without lost of COW feature, in the instance, > that means three countinues blocks are used for a single tree block, the tree > block will be circularly updated(cowed) within its three countinues blocks. > > Penalties? Yes there are, thanks to Arne Jansen and Liu Bo. As Arne > Jansen > indicated, the most disadvantage of the method will be that this will > basically > limit us to 1/3th of platter speed to write meta when using spinning drives > and > to 1/4th if using four countinues blocks for a tree block. 
> > About readahead, which will be also down to the 1/3th of NOCOW fs rate, > but > I would discreetly think it as an advantage rather than penalty comparing > worse > condition which COW would get -- nearly since the first COW, the new tree > blocks > cowed would be 50MB far away from their original neighbor blocks normally, and > after frequent random modify ops, would the worst conditition be that every > dozen of tree blocks newly cowed are more than 50MB far away from their > original > neighbor blocks if in equal probability? > > So permit me to think readahead is only usefull for NOCOW fs in this > scenario, because it always keeps its original 100MB continued, and my way > would > keep 1/3 readahead rate vs maybe-zero by pure COW if worstly. > > Of course, both penalties and values are only for metadata and will not > affect user date read/write, my patch is only applied for cow tree blocks. > But if there are large number of small files(size<4k), values and > penalties > will also affect those small user data R/W. > > I have not made tests for my patch by now, it still need some time to > get > more check for both strategy and code in patch and fix possible bugs before > test, any comments are welcome. > > > Thanks > > > signed-off-by WeiFeng Liu > 523f28f9b3d9c710cacc31dbba644efb1678cf62 > > --- > > diff -uprN a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > --- a/fs/btrfs/ctree.c2012-05-21 18:42:51.0 + > +++ b/fs/btrfs/ctree.c2012-05-29 23:08:19.0 + > @@ -444,
[PATCH] Btrfs: fix return code in drop_objectid_items
So dpkg fsync()'s the file and the directory containing the file whenever it
writes to a file, which is really slow in btrfs. This is partly because
fsync()'ing a directory _always_ committed the transaction instead of just
going to the tree log. This is because drop_objectid_items() would return 1,
since it does a btrfs_search_slot() which returns 1. In tree-log jargon this
means that we have to commit the transaction to be safe. So just check if ret
is greater than 0 and set it to 0 if it is. With this patch we now use the
tree-log instead of committing the entire transaction, which is twice as fast
on my box. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/tree-log.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 425014b..2017d0f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2667,6 +2667,8 @@ static int drop_objectid_items(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);
+	if (ret > 0)
+		ret = 0;
 	return ret;
 }
 
-- 
1.7.7.6
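The workload being described is the usual durability pattern of writing a file, fsync()ing it, and then fsync()ing the directory that holds it so the new entry is also on disk. A minimal reproduction of that access pattern (with made-up paths) looks like this; it only shows what dpkg-style updates do, it is not dpkg's actual code:

/*
 * Minimal sketch of the workload described above: write a file, fsync it,
 * then fsync the containing directory.  Paths are hypothetical.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
	const char *dir  = "/var/lib/example";
	const char *path = "/var/lib/example/status";
	const char *data = "some package state\n";
	int fd, dfd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) { perror("open file"); return 1; }
	if (write(fd, data, strlen(data)) < 0 || fsync(fd) < 0) {
		perror("write/fsync file");
		return 1;
	}
	close(fd);

	dfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dfd < 0) { perror("open dir"); return 1; }
	if (fsync(dfd) < 0) {            /* the directory fsync that used to force a commit */
		perror("fsync dir");
		return 1;
	}
	close(dfd);
	return 0;
}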
[PATCH] Btrfs: check to see if the inode is in the log before fsyncing
We have this check down in the actual logging code, but this is after we start
a transaction and all that good stuff. So move the helper inode_in_log() out so
we can call it in fsync() and avoid starting a transaction altogether, and just
exit if we've already fsync()'ed this file recently. You would notice this
issue if you fsync()'ed a file over and over again until the transaction
committed. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/btrfs_inode.h | 13 +++++++++++++
 fs/btrfs/file.c        |  3 ++-
 fs/btrfs/tree-log.c    | 17 +----------------
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ce2c9d6..e616f887 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -199,4 +199,17 @@ static inline bool btrfs_is_free_space_inode(struct btrfs_root *root,
 	return false;
 }
 
+static inline int btrfs_inode_in_log(struct inode *inode, u64 generation)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret = 0;
+
+	mutex_lock(&root->log_mutex);
+	if (BTRFS_I(inode)->logged_trans == generation &&
+	    BTRFS_I(inode)->last_sub_trans <= root->last_log_commit)
+		ret = 1;
+	mutex_unlock(&root->log_mutex);
+	return ret;
+}
+
 #endif
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5a525d0..70dc8ca 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1552,7 +1552,8 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * syncing
 	 */
 	smp_mb();
-	if (BTRFS_I(inode)->last_trans <=
+	if (btrfs_inode_in_log(inode, root->fs_info->generation) ||
+	    BTRFS_I(inode)->last_trans <=
 	    root->fs_info->last_trans_committed) {
 		BTRFS_I(inode)->last_trans = 0;
 		mutex_unlock(&inode->i_mutex);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6f22a4f..425014b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3038,21 +3038,6 @@ out:
 	return ret;
 }
 
-static int inode_in_log(struct btrfs_trans_handle *trans,
-			struct inode *inode)
-{
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	int ret = 0;
-
-	mutex_lock(&root->log_mutex);
-	if (BTRFS_I(inode)->logged_trans == trans->transid &&
-	    BTRFS_I(inode)->last_sub_trans <= root->last_log_commit)
-		ret = 1;
-	mutex_unlock(&root->log_mutex);
-	return ret;
-}
-
-
 /*
  * helper function around btrfs_log_inode to make sure newly created
  * parent directories also end up in the log. A minimal inode and backref
@@ -3093,7 +3078,7 @@ int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto end_no_trans;
 
-	if (inode_in_log(trans, inode)) {
+	if (btrfs_inode_in_log(inode, trans->transid)) {
 		ret = BTRFS_NO_LOG_SYNC;
 		goto end_no_trans;
 	}
-- 
1.7.7.6
Help with recover data
Hi Everyone,

I recently decided to use btrfs. It worked perfectly for a week, even under
heavy load. Yesterday I destroyed the backups, as I cannot afford to keep
~10TB in backups. I decided to switch to Btrfs because it was announced that
it is stable already.

I need to recover ~5TB of data; this data is important and I do not have
backups.

uname -a
Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

sudo mount -o recovery /dev/sdb /tank
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

dmesg:
[ 9612.971149] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb
[ 9613.048476] btrfs: enabling auto recovery
[ 9613.048482] btrfs: disk space caching is enabled
[ 9621.172540] parent transid verify failed on 5468060241920 wanted 9096 found 7621
[ 9621.181369] parent transid verify failed on 5468060241920 wanted 9096 found 7621
[ 9621.182167] btrfs read error corrected: ino 1 off 5468060241920 (dev /dev/sdd sector 2143292648)
[ 9621.182181] Failed to read block groups: -5
[ 9621.193680] btrfs: open_ctree failed

sudo /usr/local/bin/btrfs-find-root /dev/sdb
...
Well block 4455562448896 seems great, but generation doesn't match, have=9092, want=9096
Well block 4455568302080 seems great, but generation doesn't match, have=9091, want=9096
Well block 4848395739136 seems great, but generation doesn't match, have=9093, want=9096
Well block 4923796594688 seems great, but generation doesn't match, have=9094, want=9096
Well block 4923798065152 seems great, but generation doesn't match, have=9095, want=9096
Found tree root at 5532762525696

$ sudo btrfs-restore -v -t 4923798065152 /dev/sdb ./
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
Ignoring transid failure
Root objectid is 5
Restoring ./Irina
Restoring ./Irina/.idmapdir2
Skipping existing file ./Irina/.idmapdir2/4.bucket.lock
If you wish to overwrite use the -o option to overwrite
Skipping existing file ./Irina/.idmapdir2/7.bucket
Skipping existing file ./Irina/.idmapdir2/15.bucket
Skipping existing file ./Irina/.idmapdir2/12.bucket.lock
Skipping existing file ./Irina/.idmapdir2/cap.txt
Skipping existing file ./Irina/.idmapdir2/5.bucket
Restoring ./Irina/.idmapdir2/10.bucket.lock
Restoring ./Irina/.idmapdir2/6.bucket.lock
Restoring ./Irina/.idmapdir2/8.bucket
ret is -3

sudo btrfs-zero-log /dev/sdb
...
parent transid verify failed on 5468231311360 wanted 9096 found 7621
parent transid verify failed on 5468231311360 wanted 9096 found 7621
parent transid verify failed on 5468060102656 wanted 9096 found 7621
Ignoring transid failure
leaf parent key incorrect 59310080
btrfs-zero-log: extent-tree.c:2578: alloc_reserved_tree_block: Assertion `!(ret)' failed.

Help me please.

Max
Re: Help with data recovering
After command: sudo /usr/local/bin/btrfs device scan i got new lines in dmesg: 11329.598535] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb [11329.599885] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 3 transid 9095 /dev/sdd [11329.600840] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 1 transid 9096 /dev/sda [11329.602083] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 4 transid 9096 /dev/sde [11329.603036] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 5 transid 9096 /dev/sdf looks like /dev/sdd lost one transid. Is it possible to roll back on transid 9095? Thanks On 05/29/2012 06:14 PM, Maxim Mikheev wrote: Hi Everyone, I recently decided to use btrfs. It works perfectly for a week even under heavy load. Yesterday I destroyed backups as cannot afford to have ~10TB in backups. I decided to switch on Btrfs because it was announced that it stable already I need to recover ~5TB data, this data is important and I do not have backups uname -a Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux sudo mount -o recovery /dev/sdb /tank mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so dmesg: [ 9612.971149] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb [ 9613.048476] btrfs: enabling auto recovery [ 9613.048482] btrfs: disk space caching is enabled [ 9621.172540] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.181369] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.182167] btrfs read error corrected: ino 1 off 5468060241920 (dev /dev/sdd sector 2143292648) [ 9621.182181] Failed to read block groups: -5 [ 9621.193680] btrfs: open_ctree failed sudo /usr/local/bin/btrfs-find-root /dev/sdb ... Well block 4455562448896 seems great, but generation doesn't match, have=9092, want=9096 Well block 4455568302080 seems great, but generation doesn't match, have=9091, want=9096 Well block 4848395739136 seems great, but generation doesn't match, have=9093, want=9096 Well block 4923796594688 seems great, but generation doesn't match, have=9094, want=9096 Well block 4923798065152 seems great, but generation doesn't match, have=9095, want=9096 Found tree root at 5532762525696 $ sudo btrfs-restore -v -t 4923798065152 /dev/sdb ./ parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 Ignoring transid failure Root objectid is 5 Restoring ./Irina Restoring ./Irina/.idmapdir2 Skipping existing file ./Irina/.idmapdir2/4.bucket.lock If you wish to overwrite use the -o option to overwrite Skipping existing file ./Irina/.idmapdir2/7.bucket Skipping existing file ./Irina/.idmapdir2/15.bucket Skipping existing file ./Irina/.idmapdir2/12.bucket.lock Skipping existing file ./Irina/.idmapdir2/cap.txt Skipping existing file ./Irina/.idmapdir2/5.bucket Restoring ./Irina/.idmapdir2/10.bucket.lock Restoring ./Irina/.idmapdir2/6.bucket.lock Restoring ./Irina/.idmapdir2/8.bucket ret is -3 sudo btrfs-zero-log /dev/sdb ... 
parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468060102656 wanted 9096 found 7621 Ignoring transid failure leaf parent key incorrect 59310080 btrfs-zero-log: extent-tree.c:2578: alloc_reserved_tree_block: Assertion `!(ret)' failed. Help me please. Max -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with data recovering
I can't help much at the moment, but the following will help sort things out: Can you provide as much detail as possible about how things were configured at the time of the failure? Raid levels used, kernel versions at the time of the failure, how the disks are connected, general description of the activity on the disk and the nature of its contents (all large files? rootfs? mail spools?) What you were thinking at the time you decided that you couldn't afford backups? As much detail as possible on what all you've tried since the failure to recover things? It's likely the data is fine (if currently inaccessible), but obviously things are in a fragile state, and the important thing right now is to not make things worse: a recoverable situation may otherwise turn into an irrecoverable one. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with recover data
On 5/30/12 12:14 AM, Maxim Mikheev wrote: Hi Everyone, I recently decided to use btrfs. It works perfectly for a week even under heavy load. Yesterday I destroyed backups as cannot afford to have ~10TB in backups. I decided to switch on Btrfs because it was announced that it stable already I need to recover ~5TB data, this data is important and I do not have backups Just out of curiosity: Who announced that BTRFS is stable already?! The kernel says something different and there is still no 100% working fsck for btrfs. Imho it is far away from being stable :) And btw: Even it would be stable, allways keep backups for important data ffs! I don't understand why there are still technical experienced people who don't do backups :/ Imho if you don't do backups from a portion of data they are considered not to be important. uname -a Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux sudo mount -o recovery /dev/sdb /tank mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so dmesg: [ 9612.971149] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2 transid 9096 /dev/sdb [ 9613.048476] btrfs: enabling auto recovery [ 9613.048482] btrfs: disk space caching is enabled [ 9621.172540] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.181369] parent transid verify failed on 5468060241920 wanted 9096 found 7621 [ 9621.182167] btrfs read error corrected: ino 1 off 5468060241920 (dev /dev/sdd sector 2143292648) [ 9621.182181] Failed to read block groups: -5 [ 9621.193680] btrfs: open_ctree failed sudo /usr/local/bin/btrfs-find-root /dev/sdb ... Well block 4455562448896 seems great, but generation doesn't match, have=9092, want=9096 Well block 4455568302080 seems great, but generation doesn't match, have=9091, want=9096 Well block 4848395739136 seems great, but generation doesn't match, have=9093, want=9096 Well block 4923796594688 seems great, but generation doesn't match, have=9094, want=9096 Well block 4923798065152 seems great, but generation doesn't match, have=9095, want=9096 Found tree root at 5532762525696 $ sudo btrfs-restore -v -t 4923798065152 /dev/sdb ./ parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 parent transid verify failed on 4923798065152 wanted 9096 found 9095 Ignoring transid failure Root objectid is 5 Restoring ./Irina Restoring ./Irina/.idmapdir2 Skipping existing file ./Irina/.idmapdir2/4.bucket.lock If you wish to overwrite use the -o option to overwrite Skipping existing file ./Irina/.idmapdir2/7.bucket Skipping existing file ./Irina/.idmapdir2/15.bucket Skipping existing file ./Irina/.idmapdir2/12.bucket.lock Skipping existing file ./Irina/.idmapdir2/cap.txt Skipping existing file ./Irina/.idmapdir2/5.bucket Restoring ./Irina/.idmapdir2/10.bucket.lock Restoring ./Irina/.idmapdir2/6.bucket.lock Restoring ./Irina/.idmapdir2/8.bucket ret is -3 sudo btrfs-zero-log /dev/sdb ... 
parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468231311360 wanted 9096 found 7621 parent transid verify failed on 5468060102656 wanted 9096 found 7621 Ignoring transid failure leaf parent key incorrect 59310080 btrfs-zero-log: extent-tree.c:2578: alloc_reserved_tree_block: Assertion `!(ret)' failed. Help me please. Max -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with recover data
On Tue, May 29, 2012 at 5:14 PM, Felix Blanke wrote: > > > On 5/30/12 12:14 AM, Maxim Mikheev wrote: >> >> Hi Everyone, >> >> I recently decided to use btrfs. It works perfectly for a week even >> under heavy load. Yesterday I destroyed backups as cannot afford to have >> ~10TB in backups. I decided to switch on Btrfs because it was announced >> that it stable already >> I need to recover ~5TB data, this data is important and I do not have >> backups >> > > Just out of curiosity: Who announced that BTRFS is stable already?! The > kernel says something different and there is still no 100% working fsck for > btrfs. Imho it is far away from being stable :) > > And btw: Even it would be stable, allways keep backups for important data > ffs! I don't understand why there are still technical experienced people who > don't do backups :/ Imho if you don't do backups from a portion of data they > are considered not to be important. Some distros do offer support, but that's usually in the sense of "if you have a support contract (and are on qualified hardware and using it in a supported configuration), we'll help you fix what breaks (and we're confident we can)", rather than a claim that things will never break. I expect (but haven't actually checked recently) that such distros actively backport btrfs fixes to their supported kernels (btrfs in Distro X's 3.2 kernel may have fixes that Distro Y's 3.2 kernel does not, etc), which can lead to unfortunate misunderstandings; we don't have enough information yet to determine whether that's the case here though. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with data recovering
Thank you for your answer.

The system kernel was, and still is:
Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The raid was created by:
mkfs.btrfs /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

Disks are connected through a RocketRaid 2670.

For mounting I used this line in fstab:
UUID=c9776e19-37eb-4f9c-bd6b-04e8dde97682 /tank btrfs defaults,compress=lzo 0 1

On the machine were running several virtual machines. Only one was actively
using the disks.

The VM had several active threads:
1. 2 threads reading big files (50GB each)
2. reading from 50 files and writing one big file
3. The kernel panic happened when I ran another program with 30 threads of
   reading/writing of small files.

The virtual machine accessed the underlying btrfs through the 9p file system,
which actively used xattrs.

After reboot the system was in this state. I hope that btrfsck --repair will
not make it worse; it is now running.

Backups -- you always need them when you don't have them. We urgently needed
extra space and planned to buy new disks soon.

On 05/29/2012 07:11 PM, cwillu wrote:
> I can't help much at the moment, but the following will help sort things out:
>
> Can you provide as much detail as possible about how things were configured
> at the time of the failure? Raid levels used, kernel versions at the time of
> the failure, how the disks are connected, general description of the
> activity on the disk and the nature of its contents (all large files?
> rootfs? mail spools?) What you were thinking at the time you decided that
> you couldn't afford backups?
>
> As much detail as possible on what all you've tried since the failure to
> recover things?
>
> It's likely the data is fine (if currently inaccessible), but obviously
> things are in a fragile state, and the important thing right now is to not
> make things worse: a recoverable situation may otherwise turn into an
> irrecoverable one.
Re: Help with data recovering
I forgot to add. Btrfs-tools was build from: git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git On 05/29/2012 07:24 PM, Maxim Mikheev wrote: Thank you for your answer. The system kernel was and now: Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux the raid was created by: mkfs.btrfs /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf Disk are connected through RocketRaid 2670. for mounting I used line in fstab: UUID=c9776e19-37eb-4f9c-bd6b-04e8dde97682/tankbtrfs defaults,compress=lzo01 On machine was running several Virtual machines. Only one was actively using disks. VM has active several threads: 1. 2 threads reading big files (50GB each) 2. reading from 50 files and writing one big file 3. The kernel panic happens when I run another program with 30 threads of reading/writing of small files. Virtual Machine accessed to underline btrfs through 9-p file system which actively used xattr. After reboot system was in this stage. I hope that btrfsck --repair will not make it worse, It is now running. . Backups, you everytime need them when you don't have. We was urgently need extra space and planed to buy new disks soon. On 05/29/2012 07:11 PM, cwillu wrote: I can't help much at the moment, but the following will help sort things out: Can you provide as much detail as possible about how things were configured at the time of the failure? Raid levels used, kernel versions at the time of the failure, how the disks are connected, general description of the activity on the disk and the nature of its contents (all large files? rootfs? mail spools?) What you were thinking at the time you decided that you couldn't afford backups? As much detail as possible on what all you've tried since the failure to recover things? It's likely the data is fine (if currently inaccessible), but obviously things are in a fragile state, and the important thing right now is to not make things worse: a recoverable situation may otherwise turn into an irrecoverable one. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with data recovering
On Tue, May 29, 2012 at 5:24 PM, Maxim Mikheev wrote: > Thank you for your answer. > > > The system kernel was and now: > > Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 > x86_64 x86_64 x86_64 GNU/Linux > > the raid was created by: > mkfs.btrfs /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf > > Disk are connected through RocketRaid 2670. > > for mounting I used line in fstab: > UUID=c9776e19-37eb-4f9c-bd6b-04e8dde97682 /tank btrfs > defaults,compress=lzo 0 1 > > On machine was running several Virtual machines. Only one was actively using > disks. > > VM has active several threads: > 1. 2 threads reading big files (50GB each) > 2. reading from 50 files and writing one big file > 3. The kernel panic happens when I run another program with 30 threads of > reading/writing of small files. > > Virtual Machine accessed to underline btrfs through 9-p file system which > actively used xattr. > > After reboot system was in this stage. > > I hope that btrfsck --repair will not make it worse, It is now running. **twitch** Well, I also hope it won't make it worse. Do not cancel it now, let it finish (aborting it will make things worse), but I suggest waiting until a few more people have weighed in before attempting anything beyond that. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix return code in drop_objectid_items
On 05/30/2012 04:57 AM, Josef Bacik wrote: > So dpkg fsync()'s the file and the directory containing the file whenever it > writes to a file which is really slow in btrfs. This is partly because > fsync()'ing a directory _always_ committed the transaction instead of just > going to the tree log. This is because drop_objectid_items() would return 1 > since it does a btrfs_search_slot() which returns 1. In tree-log jargon > this means that we have to commit the transaction to be safe. So just check > if ret is greater than 0 and set it to 0 if it does. With this patch we now > use the tree-log instead of committing the entire transaction, which is > twice as fast on my box. Thanks, > Good catch. Reviewed-by: Liu Bo > Signed-off-by: Josef Bacik > --- > fs/btrfs/tree-log.c |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 425014b..2017d0f 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -2667,6 +2667,8 @@ static int drop_objectid_items(struct > btrfs_trans_handle *trans, > btrfs_release_path(path); > } > btrfs_release_path(path); > + if (ret > 0) > + ret = 0; > return ret; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html