Re: Heavy nocow'd VM image fragmentation

2014-10-27 Thread Duncan
Larkin Lowrey posted on Sun, 26 Oct 2014 12:20:45 -0500 as excerpted:

 One unusual property of my setup is I have my fs on top of bcache. More
 specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
 fs mounts it has mount option 'ssd' due to the fact that bcache sets
 /sys/block/bcache0/queue/rotational to 0.
 
 Is there any reason why either the 'ssd' mount option or being backed by
 bcache could be responsible?

Bcache... Some kernel cycles ago btrfs on bcache had known issues but IDR 
the details.  I /think/ that was fixed, but if you don't know what I'm 
referring to, I'd suggest looking back in the btrfs list archives (and, 
assuming there's a bcache list, there too), to see what it was, whether 
it was fixed, and (presumably on the bcache list) current status.

... Actually just did a bcache keyword search in my archive and see you 
on a thread, saying it was working fine for you, so never mind, looks 
like you are aware of that thread, and actually know more about the 
status than I do...

I don't believe the ssd mount option /should/ be triggering 
fragmentation; I use it here on real ssd, but as I said, I don't have 
that sort of large-internal-write-pattern file to worry about and have 
autodefrag set too, plus compress=lzo so filefrag's reports aren't 
trustworthy here anyway.

But what I DO know is that there's a nossd mount option available if the 
detection's going whacky and it's adding the ssd mount option 
inappropriately.  That has been there for a couple kernel cycles now.  
See the btrfs (5) manpage for the mount options.

So you could try the nossd mount option and see if it makes a difference.


Meanwhile, that's quite a stack you have there.  Before I switched to 
btrfs and btrfs raid, I was running mdraid here, and for a period ran lvm 
on top of mdraid.  But as an admin I decided that was simply too complex 
a setup for me to be confident in my own ability to properly handle 
disaster recovery.  And because I could feed the appropriate root on 
mdraid parameters directly to the kernel and didn't need an initr* for 
it, while I did for lvm, I kept mdraid, and actually had a few chances to 
practice disaster recovery on mdraid over time, becoming quite 
comfortable with it.

But not only do you have that, you have bcache thrown in too, and in 
place of the traditional reiserfs I was using (and still use on my second 
backups and media partitions on spinning rust as I've had very good 
results with reiserfs since data=ordered became the default, even thru 
various hardware issues... I'll avoid the stories), you're using btrfs, 
which has its own raid modes, altho I suppose you're not using them.

So that is indeed quite a stack.  If you're comfortable with your ability 
to properly handle disaster recovery at all those levels, wow, you 
definitely have my respect.  Or do you just have it all backed up and 
figure if it blows up and disaster recovery isn't going to be trivial you 
simply rebuild and restore from backup?  I guess with btrfs not yet fully 
stable and mature that's the best idea at its level anyway, and if you 
have it backed up for that, then you have it backed up for the others 
and /can/ simply rebuild your stack and restore from backup, should you 
need to.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Poll: time to switch skinny-metadata on by default?

2014-10-27 Thread Duncan
Zygo Blaxell posted on Mon, 27 Oct 2014 00:39:25 -0400 as excerpted:

 One thing that may be significant is _when_ those 3 hanging filesystems
 are hanging:  when using rsync to update local files.  These machines
 are using the traditional rsync copy-then-rename method rather than
 --inplace updates.  There's no problem copying data into an empty
 directory with rsync, but as soon as I start updating existing data,
 some process (not necessarily rsync) using the filesystem gets
 stuck within 36 hours, and stays stuck for days.  If I don't run rsync
 on the skinny filesystems,
 they'll run for a week or more without incident--and if I then start
 running rsync again, they hang later the same day.

Limited counterpoint here:

My packages partition is btrfs with skinny-metadata (skinny extents in 
dmesg), and the main gentoo tree on it gets regularly rsynced against 
gentoo servers.  In fact, my sync script does that *AND* a git-pull on 
three overlays, in parallel with the rsync so all three git-pulls and the 
rsync are happening at once.

No problems with that here. =:^)

However, I suspect other factors in my setup avoid whatever's triggering 
it for Zygo.

* The filesystem is btrfs raid1 mode data/metadata.

* Only 24 GiB in size (show says 19.78 GiB used, df says 15.84 of 18 GiB 
data used, 969 MiB of 1.75 GiB metadata used).

* Relatively fast SSD, ssd auto-detected and added as a mount option.

* I set the skinny-metadata option (and extref and no-holes) at 
mkfs.btrfs time, while Zygo converted and presumably has both fat and 
skinny metadata.

FWIW I've been spared all the rsync-triggered issues people have reported 
over time.  I'm guessing I don't hit the same race conditions because 
with the small filesystem my overhead is lower, and with the ssd I simply 
don't have the same bottlenecks.  So I wouldn't expect to hit this problem 
here anyway, and the fact that I'm not hitting it doesn't prove much, except 
that with reasonably fast ssds and smaller filesystems, whatever race 
conditions people seem to so commonly trigger with rsync elsewhere simply 
don't seem to happen here.

So as I said, limited counterpoint, but offered FWIW.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Poll: time to switch skinny-metadata on by default?

2014-10-27 Thread Duncan
Marc Joliet posted on Mon, 27 Oct 2014 02:24:15 +0100 as excerpted:

 On Sat, 25 Oct 2014 14:35:33 -0600, Chris Murphy
 li...@colorremedies.com wrote:
 
 
 On Oct 25, 2014, at 2:33 PM, Chris Murphy li...@colorremedies.com
 wrote:
 
 
  On Oct 25, 2014, at 6:24 AM, Marc Joliet mar...@gmx.de wrote:
  
  First of all: does grub2 support booting from a btrfs file system
  with skinny-metadata, or is it irrelevant?
  
  Seems plausible if older kernels don't understand skinny-metadata,
  that GRUB2 won't either. So I just tested it with grub2-2.02-0.8.fc21
  and it works. I'm surprised, actually.
 
 I don't understand the nature of the incompatibility with older
 kernels. Can they not mount a Btrfs volume even as ro? If so then I'd
 expect GRUB to have a problem, so I'm going to guess that maybe a 3.9
 or older kernel could ro mount a Btrfs volume with skinny extents and
 the incompatibility is writing.
 
 That sounds plausible, though I hope for a definitive answer. (FWIW, I
 originally asked because I couldn't find any commits to grub2 related to
 skinny metadata; the updates to the btrfs driver were fairly sparse.)

FWIW I have three /boot partitions, one on each of my main drives.  All 
three are gpt with a reserved BIOS partition that grub2 installs its 
monolithic grub2core into, but have dedicated /boot partitions as well, 
for the grub2 config and additional grub2 modules, kernels, etc.  The 
third one is reiserfs on spinning rust, but the other two are btrfs on 
ssd.

Last time I updated I thought I switched them to skinny-metadata, but 
just checking dmesg while mounting them now, the second one (first 
backup) is skinny-metadata, but my working /boot is still fat-metadata.

I did test the backup (with the skinny-metadata) after I did the mkfs and 
restore and it booted to grub2 and from grub2 to my main system just 
fine, so grub2 with skinny-metadata *CAN* work.

But because it's my backup, I don't update it with new kernels as 
frequently as I do my working /boot, nor do I boot from it that often.  
So while I can be sure grub2 /can/ work with skinny-metadata, I do not 
yet know at this point if it does so /reliably/.

And of course, to the extent that grub2 works differently on MBR and/or 
on GPT when it doesn't have a reserved BIOS partition to put the 
monolithic grub2core in, I haven't tested that.  Tho in theory that 
should install in slack-space if available and the filesystem shouldn't 
affect that at all.  But I know reiserfs used to screw up grub1 very  
occasionally (maybe .5-1% of new kernel installations; it did it I think 
twice in about 7 years, and I run git kernels so update them reasonably 
frequently) on my old MBR setup without much slack-space to spare, and 
I'd have to reinstall grub1.

So that's a qualified skinny-metadata shouldn't affect grub2, as I've 
booted using grub2 on a btrfs with skinny-metadata /boot.  But I've 
simply not tested it enough to know whether it's reliable over time as 
the filesystem updates and changes, or not.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.

2014-10-27 Thread Liu Bo
On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:
 
  Original Message 
 Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
 to reduce ENOSPC caused by unbalanced data/metadata allocation.
 From: Liu Bo bo.li@oracle.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年10月24日 19:06
 On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:
 When btrfs allocates a chunk, it will try to alloc up to 1G for data and
 256M for metadata, or 10% of all the writeable space if there is enough
 10G for data,
  if (type & BTRFS_BLOCK_GROUP_DATA) {
  max_stripe_size = 1024 * 1024 * 1024;
  max_chunk_size = 10 * max_stripe_size;
 Oh, sorry, 10G is right.
 
 Any other comments?
 
 Thanks,
 Qu
 
 
  ...
 
 thanks,
 -liubo
 
 space for the stripe on device.
 
 However, when we run out of space, this allocation may cause unbalanced
 chunk allocation.
 For example, if there is only 1G of unallocated space and a request to
 allocate a DATA chunk is sent, all the space will be allocated as a data
 chunk, leaving a later metadata chunk allocation request impossible to
 satisfy, which will cause ENOSPC.
 This is one of the common complaints from end users about why ENOSPC
 happens while there is still available space.

Okay, I don't think this is the common case; AFAIK, most ENOSPC is caused
by our runtime worst-case metadata reservation problem.

btrfs is inclined to create a fairly large metadata chunk (1G) at its
initial mkfs stage, and a 256M metadata chunk is also a very large one.

As for your example below, yes, we don't have space for metadata
allocation, but do we really need to allocate a new one?

Or am I missing something?

thanks,
-liubo

 
 This patch will try not to alloc a chunk which is more than half of the
 unallocated space, making the last space more balanced, at the small cost
 of more fragmented chunks in the last 1G.
 
 Some easy example:
 Preallocate 17.5G on a 20G empty btrfs fs:
 [Before]
   # btrfs fi show /mnt/test
 Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
 Total devices 1 FS bytes used 17.50GiB
 devid1 size 20.00GiB used 20.00GiB path /dev/sdb
 All space is allocated. No space is left for later metadata allocation.
 
 [After]
   # btrfs fi show /mnt/test
 Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
 Total devices 1 FS bytes used 17.50GiB
 devid1 size 20.00GiB used 19.77GiB path /dev/sdb
 About 230M is still available for later metadata allocation.
 
 Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
 ---
   fs/btrfs/volumes.c | 18 ++
   1 file changed, 18 insertions(+)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index d47289c..fa8de79 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct 
 btrfs_trans_handle *trans,
 int ret;
 u64 max_stripe_size;
 u64 max_chunk_size;
 +   u64 total_avail_space = 0;
 u64 stripe_size;
 u64 num_bytes;
 u64 raid_stripe_len = BTRFS_STRIPE_LEN;
 @@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct 
 btrfs_trans_handle *trans,
 devices_info[ndevs].max_avail = max_avail;
 devices_info[ndevs].total_avail = total_avail;
 devices_info[ndevs].dev = device;
 +   total_avail_space += total_avail;
 ++ndevs;
 }
 /*
 +* Try not to occupy more than half of the unallocated space.
 +* When run short of space and alloc all the space to
 +* data/metadata will cause ENOSPC to be triggered more easily.
 +*
 +* And since the minimum chunk size is 16M, the half-half will cause
 +* 16M allocated from 20M available space and rest 4M will not be
 +* used ever. In that case(16~32M), allocate all directly.
 +*/
 +   if (total_avail_space < 32 * 1024 * 1024 &&
 +   total_avail_space > 16 * 1024 * 1024)
 +   max_chunk_size = total_avail_space;
 +   else
 +   max_chunk_size = min(total_avail_space / 2, max_chunk_size);
 +   max_chunk_size = min(total_avail_space / 2, max_chunk_size);
 +
 +   /*
  * now sort the devices by hole size / available space
  */
 sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 -- 
 2.1.2
 
 


Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.

2014-10-27 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to 
reduce ENOSPC caused by unbalanced data/metadata allocation.

From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014年10月27日 16:14

On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
to reduce ENOSPC caused by unbalanced data/metadata allocation.
From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014年10月24日 19:06

On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:

When btrfs allocates a chunk, it will try to alloc up to 1G for data and
256M for metadata, or 10% of all the writeable space if there is enough

10G for data,
 if (type & BTRFS_BLOCK_GROUP_DATA) {
 max_stripe_size = 1024 * 1024 * 1024;
 max_chunk_size = 10 * max_stripe_size;

Oh, sorry, 10G is right.

Any other comments?

Thanks,
Qu



...

thanks,
-liubo


space for the stripe on device.

However, when we run out of space, this allocation may cause unbalanced
chunk allocation.
For example, if there is only 1G of unallocated space and a request to
allocate a DATA chunk is sent, all the space will be allocated as a data
chunk, leaving a later metadata chunk allocation request impossible to
satisfy, which will cause ENOSPC.
This is one of the common complaints from end users about why ENOSPC
happens while there is still available space.

Okay, I don't think this is the common case; AFAIK, most ENOSPC is caused
by our runtime worst-case metadata reservation problem.

btrfs is inclined to create a fairly large metadata chunk (1G) at its
initial mkfs stage, and a 256M metadata chunk is also a very large one.

As for your example below, yes, we don't have space for metadata
allocation, but do we really need to allocate a new one?

Or am I missing something?

thanks,
-liubo
Yes, it's true that this is not the common cause, but at least this patch 
may make the usage percentage reported by the 'df' command get as close to 
100% as possible before hitting ENOSPC under normal operations.

(If not using balance)

And some cases like the following mail may be improved by the patch:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36097.html

I understand that most of the cases with a lot of free data space and no 
metadata space are caused by creating and then deleting large files, but if 
the last gigabytes can be allocated more carefully, at least the available 
bytes reported by the 'df' command should be reduced before we hit ENOSPC.
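
To make the effect concrete, here is a tiny userspace toy of my own (just a
sketch for illustration; the helper name clamp_chunk and the exact numbers
are made up, this is not the kernel code path) simulating the last 1G of an
almost-full fs:

#include <stdio.h>

static unsigned long long clamp_chunk(unsigned long long want,
                                      unsigned long long unallocated)
{
        const unsigned long long M = 1024ULL * 1024;

        if (unallocated < 32 * M && unallocated > 16 * M)
                return unallocated;         /* 16~32M: allocate all directly */
        if (want > unallocated / 2)
                return unallocated / 2;     /* otherwise take at most half */
        return want;
}

int main(void)
{
        const unsigned long long G = 1024ULL * 1024 * 1024;
        unsigned long long unallocated = 1 * G;          /* last 1G of the fs */
        unsigned long long data = clamp_chunk(1 * G, unallocated);

        unallocated -= data;
        unsigned long long meta = clamp_chunk(256ULL * 1024 * 1024, unallocated);

        printf("data chunk gets %llu MiB, a later metadata chunk can still get %llu MiB\n",
               data / (1024 * 1024), meta / (1024 * 1024));
        return 0;
}

With the clamp, the data chunk only takes 512M of the last 1G, so a 256M
metadata chunk can still be allocated afterwards instead of hitting ENOSPC.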


What do you think about it?

Thanks,
Qu



This patch will try not to alloc a chunk which is more than half of the
unallocated space, making the last space more balanced, at the small cost
of more fragmented chunks in the last 1G.

Some easy example:
Preallocate 17.5G on a 20G empty btrfs fs:
[Before]
  # btrfs fi show /mnt/test
Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
Total devices 1 FS bytes used 17.50GiB
devid1 size 20.00GiB used 20.00GiB path /dev/sdb
All space is allocated. No space is left for later metadata allocation.

[After]
  # btrfs fi show /mnt/test
Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
Total devices 1 FS bytes used 17.50GiB
devid1 size 20.00GiB used 19.77GiB path /dev/sdb
About 230M is still available for later metadata allocation.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
  fs/btrfs/volumes.c | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d47289c..fa8de79 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
int ret;
u64 max_stripe_size;
u64 max_chunk_size;
+   u64 total_avail_space = 0;
u64 stripe_size;
u64 num_bytes;
u64 raid_stripe_len = BTRFS_STRIPE_LEN;
@@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
devices_info[ndevs].max_avail = max_avail;
devices_info[ndevs].total_avail = total_avail;
devices_info[ndevs].dev = device;
+   total_avail_space += total_avail;
++ndevs;
}
/*
+* Try not to occupy more than half of the unallocated space.
+* When run short of space and alloc all the space to
+* data/metadata will cause ENOSPC to be triggered more easily.
+*
+* And since the minimum chunk size is 16M, the half-half will cause
+* 16M allocated from 20M available space and rest 4M will not be
+* used ever. In that case(16~32M), allocate all directly.
+*/
+   if (total_avail_space < 32 * 1024 * 1024 &&
+   total_avail_space > 16 * 1024 * 1024)
+   max_chunk_size = total_avail_space;
+   else
+   max_chunk_size = min(total_avail_space / 2, max_chunk_size);

[PATCH] Btrfs: fix invalid leaf slot access in btrfs_lookup_extent()

2014-10-27 Thread Filipe Manana
If we couldn't find our extent item, we accessed the current slot
(path->slots[0]) to check if it corresponds to an equivalent skinny
metadata item. However this slot could be beyond our last item in the
leaf (i.e. path->slots[0] >= btrfs_header_nritems(leaf)), in which case
we shouldn't process it.

Since btrfs_lookup_extent() is only used to find extent items for data
extents, fix this by removing completely the logic that looks up for an
equivalent skinny metadata item, since it can not exist.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/extent-tree.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0d599ba..9141b2b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -710,7 +710,7 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
rcu_read_unlock();
 }
 
-/* simple helper to search for an existing extent at a given offset */
+/* simple helper to search for an existing data extent at a given offset */
 int btrfs_lookup_extent(struct btrfs_root *root, u64 start, u64 len)
 {
int ret;
@@ -726,12 +726,6 @@ int btrfs_lookup_extent(struct btrfs_root *root, u64 
start, u64 len)
key.type = BTRFS_EXTENT_ITEM_KEY;
ret = btrfs_search_slot(NULL, root->fs_info->extent_root, &key, path,
0, 0);
-   if (ret > 0) {
-   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-   if (key.objectid == start &&
-   key.type == BTRFS_METADATA_ITEM_KEY)
-   ret = 0;
-   }
btrfs_free_path(path);
return ret;
 }
-- 
1.9.1



[PATCH] Btrfs: fix race that makes btrfs_lookup_extent_info miss skinny extent items

2014-10-27 Thread Filipe Manana
We have a race that can lead us to miss skinny extent items in the function
btrfs_lookup_extent_info() when the skinny metadata feature is enabled.
So basically the sequence of steps is:

1) We search in the extent tree for the skinny extent, which returns > 0
   (not found);

2) We check the previous item in the returned leaf for a non-skinny extent,
   and we don't find it;

3) Because we didn't find the non-skinny extent in step 2), we release our
   path to search the extent tree again, but this time for a non-skinny
   extent key;

4) Right after we released our path in step 3), a skinny extent was inserted
   in the extent tree (delayed refs were run) - our second extent tree search
   will miss it, because it's not looking for a skinny extent;

5) After the second search returned (with ret > 0), we look for any delayed
   ref for our extent's bytenr (and we do it while holding a read lock on the
   leaf), but we won't find any, as such delayed ref had just run and completed
   after we released our path in step 3) before doing the second search.

Fix this by removing completely the path release and re-search logic. This is
safe, because if we search for a metadata item and we don't find it, we have the
guarantee that the returned leaf is the one where the item would be inserted,
and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the
non-skinny extent item is if it exists. The only case where path->slots[0] is
zero is when there are no smaller keys in the tree (i.e. no left siblings for
our leaf), in which case the re-search logic isn't needed as well.

This race has been present since the introduction of skinny metadata (change
3173a18f70554fe7880bb2d85c7da566e364eb3c).

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/extent-tree.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9141b2b..2cedd06 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -780,7 +780,6 @@ search_again:
else
key.type = BTRFS_EXTENT_ITEM_KEY;
 
-again:
ret = btrfs_search_slot(trans, root->fs_info->extent_root,
&key, path, 0, 0);
if (ret < 0)
@@ -796,13 +795,6 @@ again:
key.offset == root->nodesize)
ret = 0;
}
-   if (ret) {
-   key.objectid = bytenr;
-   key.type = BTRFS_EXTENT_ITEM_KEY;
-   key.offset = root->nodesize;
-   btrfs_release_path(path);
-   goto again;
-   }
}
 
if (ret == 0) {
-- 
1.9.1



BTRFS balance segfault, where to go from here

2014-10-27 Thread Stephan Alz
Hello Folks,

I used to have an array of 4x4TB drives with BTRFS in raid10.
The kernel version is: 3.13-0.bpo.1-amd64
BTRFS version is: v3.14.1

When it was reaching 80% used space, I added another 4TB drive to the array with:

 btrfs device add /dev/sdf /mnt/backup

And started the balancing to the new drive:

 btrfs filesystem balance /mnt/backup

This ran for 5-6 hours before it segfaulted with a "not enough 
free space" message.
Now my configuration looks like this:

btrfs fi show /mnt/backup
Label: 'backup'  uuid: ...
Total devices 5 FS bytes used 5.93TiB
devid1 size 3.64TiB used 2.82TiB path /dev/sdd
devid2 size 3.64TiB used 2.82TiB path /dev/sdc
devid3 size 3.64TiB used 2.81TiB path /dev/sdb
devid4 size 3.64TiB used 2.82TiB path /dev/sde
devid5 size 3.64TiB used 638.50GiB path /dev/sdf

After this crash happened during the balancing (logs are attached at the end), 
the system remounted my /mnt/backup share as RO.
At this point I started to really worry. I unmounted and remounted it manually. 
At the beginning it ran some self-checks which took about 5 minutes, then, as 
iotop showed, it continued with the balancing, which failed again the same way. 
The next time, after mounting, I immediately put the balancing on pause (which helped). 

My question is where to go from here? What I am going to do right now is copy 
the most important data to another, separate XFS drive.
What I am planning to do is:

1, Upgrade the kernel
2, Upgrade BTRFS
3, Continue the balancing.


Could someone please also explain how exactly the raid10 setup works 
with an ODD number of drives in btrfs? 
Raid10 should be a stripe of mirrors. So is this sdf drive mirrored, or 
striped, or what? 
Could some btrfs gurus tell me whether I should be worried about data loss 
because of this or not?

Would I need even more free space just to add a 5th drive? If so how much more? 

Kernel logs
---


Oct 24 17:25:44 backup kernel: [29396.873750] btrfs: relocating block group 
5162588438528 flags 65
Oct 24 17:26:09 backup kernel: [29421.594524] btrfs: found 13126 extents
Oct 24 17:26:38 backup kernel: [29450.769228] btrfs: found 13126 extents
Oct 24 17:26:39 backup kernel: [29451.345198] btrfs: relocating block group 
5161514696704 flags 68
Oct 24 17:31:33 backup kernel: [29745.776810] BTRFS debug (device sdb): 
run_one_delayed_ref returned -28
Oct 24 17:31:33 backup kernel: [29745.776818] [ cut here 
]
Oct 24 17:31:33 backup kernel: [29745.776847] WARNING: CPU: 1 PID: 1807 at 
/build/linux-t5aGFh/linux-3.13.10/fs/btrfs/super.c:254 
__btrfs_abort_transaction+0x5a/0x140 [btrfs]()
Oct 24 17:31:33 backup kernel: [29745.776849] btrfs: Transaction aborted (error 
-28)
Oct 24 17:31:33 backup kernel: [29745.776851] Modules linked in: xen_gntdev 
xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd 
fscache sunrpc 8021q garp mrp bridge stp llc loop iTCO_wdt iTCO_vendor_support 
lpc_ich radeon mfd_core processor evdev ttm drm_kms_helper drm i2c_algo_bit 
coretemp rng_core serio_raw pcspkr i2c_i801 i2c_core i3000_edac thermal_sys 
button shpchp edac_core ext4 crc16 mbcache jbd2 btrfs xor raid6_pq crc32c 
libcrc32c dm_mod xen_pciback sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common 
ata_generic ahci ata_piix libahci 3w_9xxx libata scsi_mod ehci_pci uhci_hcd 
ehci_hcd e1000e ptp pps_core usbcore usb_common
Oct 24 17:31:33 backup kernel: [29745.776902] CPU: 1 PID: 1807 Comm: 
btrfs-transacti Not tainted 3.13-0.bpo.1-amd64 #1 Debian 3.13.10-1~bpo70+1
Oct 24 17:31:33 backup kernel: [29745.776905] Hardware name: Supermicro 
PDSM4+/PDSM4+, BIOS 6.00 02/05/2007
Oct 24 17:31:33 backup kernel: [29745.776907]   
a0257130 814d16c9 88006a7f3cc8
Oct 24 17:31:33 backup kernel: [29745.776911]  81060967 
ffe4 880004282800 88003b813ec0
Oct 24 17:31:33 backup kernel: [29745.776914]  0aaa 
a0253b60 81060a55 a0257260
Oct 24 17:31:33 backup kernel: [29745.776918] Call Trace:
Oct 24 17:31:33 backup kernel: [29745.776926]  [814d16c9] ? 
dump_stack+0x41/0x51
Oct 24 17:31:33 backup kernel: [29745.776931]  [81060967] ? 
warn_slowpath_common+0x87/0xc0
Oct 24 17:31:33 backup kernel: [29745.776935]  [81060a55] ? 
warn_slowpath_fmt+0x45/0x50
Oct 24 17:31:33 backup kernel: [29745.776946]  [a01b73ca] ? 
__btrfs_abort_transaction+0x5a/0x140 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.776959]  [a01d2e72] ? 
btrfs_run_delayed_refs+0x372/0x530 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.776974]  [a01fa8c3] ? 
btrfs_run_ordered_operations+0x213/0x2b0 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.776988]  [a01e2fea] ? 
btrfs_commit_transaction+0x5a/0x990 [btrfs]
Oct 24 17:31:33 backup kernel: [29745.777001]  [a01e1345] ? 
transaction_kthread+0x1c5/0x240 [btrfs]
Oct 24 17:31:33 backup kernel: 

Re: [PATCH] Btrfs: fix invalid leaf slot access in btrfs_lookup_extent()

2014-10-27 Thread Miao Xie
On Mon, 27 Oct 2014 09:16:55 +, Filipe Manana wrote:
 If we couldn't find our extent item, we accessed the current slot
 (path->slots[0]) to check if it corresponds to an equivalent skinny
 metadata item. However this slot could be beyond our last item in the
 leaf (i.e. path->slots[0] >= btrfs_header_nritems(leaf)), in which case
 we shouldn't process it.
 
 Since btrfs_lookup_extent() is only used to find extent items for data
 extents, fix this by removing completely the logic that looks up for an
 equivalent skinny metadata item, since it can not exist.

I think we also need a better function name, such as btrfs_lookup_data_extent.

Thanks
Miao

 
 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/extent-tree.c | 8 +---
  1 file changed, 1 insertion(+), 7 deletions(-)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 0d599ba..9141b2b 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -710,7 +710,7 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info 
 *info)
   rcu_read_unlock();
  }
  
 -/* simple helper to search for an existing extent at a given offset */
 +/* simple helper to search for an existing data extent at a given offset */
  int btrfs_lookup_extent(struct btrfs_root *root, u64 start, u64 len)
  {
   int ret;
 @@ -726,12 +726,6 @@ int btrfs_lookup_extent(struct btrfs_root *root, u64 
 start, u64 len)
   key.type = BTRFS_EXTENT_ITEM_KEY;
   ret = btrfs_search_slot(NULL, root->fs_info->extent_root, &key, path,
   0, 0);
 - if (ret > 0) {
 - btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
 - if (key.objectid == start &&
 - key.type == BTRFS_METADATA_ITEM_KEY)
 - ret = 0;
 - }
   btrfs_free_path(path);
   return ret;
  }
 



[PATCH v2] Btrfs: fix invalid leaf slot access in btrfs_lookup_extent()

2014-10-27 Thread Filipe Manana
If we couldn't find our extent item, we accessed the current slot
(path->slots[0]) to check if it corresponds to an equivalent skinny
metadata item. However this slot could be beyond our last item in the
leaf (i.e. path->slots[0] >= btrfs_header_nritems(leaf)), in which case
we shouldn't process it.

Since btrfs_lookup_extent() is only used to find extent items for data
extents, fix this by removing completely the logic that looks up for an
equivalent skinny metadata item, since it can not exist.

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: Renamed btrfs_lookup_extent() to btrfs_lookup_data_extent().

 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/extent-tree.c | 10 ++
 fs/btrfs/tree-log.c|  2 +-
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dd8b275..b72b358 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3276,7 +3276,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle 
*trans,
   struct btrfs_root *root, unsigned long count);
 int btrfs_async_run_delayed_refs(struct btrfs_root *root,
 unsigned long count, int wait);
-int btrfs_lookup_extent(struct btrfs_root *root, u64 start, u64 len);
+int btrfs_lookup_data_extent(struct btrfs_root *root, u64 start, u64 len);
 int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
 struct btrfs_root *root, u64 bytenr,
 u64 offset, int metadata, u64 *refs, u64 *flags);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0d599ba..87c0b46f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -710,8 +710,8 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
rcu_read_unlock();
 }
 
-/* simple helper to search for an existing extent at a given offset */
-int btrfs_lookup_extent(struct btrfs_root *root, u64 start, u64 len)
+/* simple helper to search for an existing data extent at a given offset */
+int btrfs_lookup_data_extent(struct btrfs_root *root, u64 start, u64 len)
 {
int ret;
struct btrfs_key key;
@@ -726,12 +726,6 @@ int btrfs_lookup_extent(struct btrfs_root *root, u64 
start, u64 len)
key.type = BTRFS_EXTENT_ITEM_KEY;
ret = btrfs_search_slot(NULL, root->fs_info->extent_root, &key, path,
0, 0);
-   if (ret > 0) {
-   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-   if (key.objectid == start &&
-   key.type == BTRFS_METADATA_ITEM_KEY)
-   ret = 0;
-   }
btrfs_free_path(path);
return ret;
 }
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 2b26dad..6d58d72 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -672,7 +672,7 @@ static noinline int replay_one_extent(struct 
btrfs_trans_handle *trans,
 * is this extent already allocated in the extent
 * allocation tree?  If so, just add a reference
 */
-   ret = btrfs_lookup_extent(root, ins.objectid,
+   ret = btrfs_lookup_data_extent(root, ins.objectid,
ins.offset);
if (ret == 0) {
ret = btrfs_inc_extent_ref(trans, root,
-- 
1.9.1



Re: suspicious number of devices: 72057594037927936

2014-10-27 Thread Filipe David Manana
On Mon, Oct 27, 2014 at 10:34 AM, Christian Kujau li...@nerdbynature.de wrote:
 (somehow this message did not make it to the list)

 Hi,

 After upgrading from linux 3.17.0 to 3.18.0-rc2, I cannot mount my btrfs
 partition any more. It's just one btrfs partition, no raid, no
 compression, no fancy mount options:

 # mount -t btrfs -o ro /dev/sda6 /usr/local/
 mount: wrong fs type, bad option, bad superblock on /dev/sda6,
 [...]
 BTRFS: suspicious number of devices: 72057594037927936
 BTRFS: super offset mismatch 1099511627776 != 65536
 BTRFS: superblock contains fatal errors
 BTRFS: open_ctree failed

 The only thing fancy may be the machine: PowerBook G4 (powerpc 32 bit),
 running Debian/Linux (stable).

 The message comes from the newly added fs/btrfs/disk-io.c:

 if (sb->num_devices > (1UL << 31))
  printk(KERN_WARNING "BTRFS: suspicious number of devices: %llu\n",
   sb->num_devices);

 And 72057594037927936 is 2^56, so maybe there's an endianness problem here?

Sounds like you need to revert this patch:
https://patchwork.kernel.org/patch/5004701/ (which ignored endianness)
or go back to an older kernel (don't use 3.17 or 3.17.1 however, due
to other serious issues; the latest 3.16.x should be safe). There's a v2
of that patch that fixes the endianness issue, but it didn't make it to
3.18-rc1/2 (https://patchwork.kernel.org/patch/5082701/)
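
For what it's worth, here is a tiny standalone demo (my own sketch, not the
btrfs code path) of how a stored value of 1 becomes exactly that number when
the little-endian on-disk field is read without le64_to_cpu() on a big-endian
machine like that PowerBook:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* num_devices == 1 as stored on disk; btrfs superblock fields are
         * little-endian, i.e. least significant byte first. */
        unsigned char on_disk[8] = { 1, 0, 0, 0, 0, 0, 0, 0 };
        uint64_t as_le = 0, as_be = 0;

        for (int i = 0; i < 8; i++) {
                as_le |= (uint64_t)on_disk[i] << (8 * i); /* what le64_to_cpu() yields */
                as_be = (as_be << 8) | on_disk[i];        /* raw native read on big-endian */
        }

        printf("decoded as little-endian: %llu\n", (unsigned long long)as_le);
        printf("misread as big-endian:    %llu\n", (unsigned long long)as_be);
        return 0;
}

The second line prints 72057594037927936 (2^56), matching the warning above.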

regards


 Some details below, please let me know what other details may be needed.
 Going back to 3.17 now...

 Thanks,
 Christian.

 # file -Ls /dev/sda6
 /dev/sda6: sticky BTRFS Filesystem sectorsize 4096, nodesize 4096,
 leafsize 4096)

 # btrfsck /dev/sda6
 checking extents
 checking fs roots
 checking root refs
 found 2035929088 bytes used err is 0
 total csum bytes: 1886920
 total tree bytes: 102936576
 total fs tree bytes: 94441472
 btree space waste bytes: 30875964
 file data blocks allocated: 1932992512
  referenced 1932849152
 Btrfs Btrfs v0.19
 # echo $?
 0


 --
 BOFH excuse #427:

 network down, IP packets delivered via UPS



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: [PATCH] Btrfs: fix race that makes btrfs_lookup_extent_info miss skinny extent items

2014-10-27 Thread Miao Xie
On Mon, 27 Oct 2014 09:19:52 +, Filipe Manana wrote:
 We have a race that can lead us to miss skinny extent items in the function
 btrfs_lookup_extent_info() when the skinny metadata feature is enabled.
 So basically the sequence of steps is:
 
 1) We search in the extent tree for the skinny extent, which returns > 0
(not found);
 
 2) We check the previous item in the returned leaf for a non-skinny extent,
and we don't find it;
 
 3) Because we didn't find the non-skinny extent in step 2), we release our
path to search the extent tree again, but this time for a non-skinny
extent key;
 
 4) Right after we released our path in step 3), a skinny extent was inserted
in the extent tree (delayed refs were run) - our second extent tree search
will miss it, because it's not looking for a skinny extent;
 
 5) After the second search returned (with ret > 0), we look for any delayed
    ref for our extent's bytenr (and we do it while holding a read lock on the
    leaf), but we won't find any, as such delayed ref had just run and completed
    after we released our path in step 3) before doing the second search.
 
 Fix this by removing completely the path release and re-search logic. This is
 safe, because if we search for a metadata item and we don't find it, we have the
 guarantee that the returned leaf is the one where the item would be inserted,
 and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the
 non-skinny extent item is if it exists. The only case where path->slots[0] is

I think this analysis is wrong if there are some independent shared ref
metadata for a tree block, just like:
+------------------------+-------------+-------------+
| tree block extent item | shared ref1 | shared ref2 |
+------------------------+-------------+-------------+

Thanks
Miao

 zero is when there are no smaller keys in the tree (i.e. no left siblings for
 our leaf), in which case the re-search logic isn't needed as well.
 
 This race has been present since the introduction of skinny metadata (change
 3173a18f70554fe7880bb2d85c7da566e364eb3c).
 
 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/extent-tree.c | 8 
  1 file changed, 8 deletions(-)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 9141b2b..2cedd06 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -780,7 +780,6 @@ search_again:
   else
   key.type = BTRFS_EXTENT_ITEM_KEY;
  
 -again:
   ret = btrfs_search_slot(trans, root->fs_info->extent_root,
   &key, path, 0, 0);
   if (ret < 0)
 @@ -796,13 +795,6 @@ again:
   key.offset == root->nodesize)
   ret = 0;
   }
 - if (ret) {
 - key.objectid = bytenr;
 - key.type = BTRFS_EXTENT_ITEM_KEY;
 - key.offset = root->nodesize;
 - btrfs_release_path(path);
 - goto again;
 - }
   }
  
   if (ret == 0) {
 



Re: Heavy nocow'd VM image fragmentation

2014-10-27 Thread Austin S Hemmelgarn

On 2014-10-26 13:20, Larkin Lowrey wrote:

On 10/24/2014 10:28 PM, Duncan wrote:

Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:


On 10/24/2014 04:49 AM, Marc MERLIN wrote:

On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:

I have a 240GB VirtualBox vdi image that is showing heavy
fragmentation (filefrag). The file was created in a dir that was
chattr +C'd, the file was created via fallocate and the contents of
the orignal image were copied into the file via dd. I verified that
the image was +C.

To be honest, I have the same problem, and it's vexing:

If I understand correctly, when you take a snapshot the file goes into
what I call 1COW mode.

Yes, but the OP said he hadn't snapshotted since creating the file, and
MM's a regular that actually wrote much of the wiki documentation on
raid56 modes, so he better know about the snapshotting problem too.

So that can't be it.  There's apparently a bug in some recent code, and
it's not honoring the NOCOW even in normal operation, when it should be.

(FWIW I'm not running any VMs or large DBs here, so don't have nocow set
on anything and can and do use autodefrag on all my btrfs.  So I can't
say one way or the other, personally.)



Correct, there were no snapshots during VM usage when the fragmentation
occurred.

One unusual property of my setup is I have my fs on top of bcache. More
specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
fs mounts it has mount option 'ssd' due to the fact that bcache sets
/sys/block/bcache0/queue/rotational to 0.

Is there any reason why either the 'ssd' mount option or being backed by
bcache could be responsible?



Two things:
First, regarding your question, the ssd mount option shouldn't be 
responsible for this, because it is supposed to spread out allocation 
only at the chunk level, not the block level, but some recent commit may 
have changed that.  Are you using any kind of compression in btrfs?  If 
so, then filefrag won't report the number of fragments correctly (it 
currently reports the number of compressed blocks in the file instead), 
and in fact, if you are using compression in btrfs, I would expect the 
number of compressed blocks to go up as you use more space in the VM 
image, long runs of zero bytes compress well, other stuff (especially 
on-disk structures from encapsulated filesystems) doesn't.  You might 
consider putting the vm images directly on the LVM layer instead, that 
tends to get much better performance in my experience than storing them 
on a filesystem.
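
For reference, this is roughly the question filefrag asks the kernel: a
minimal sketch using the FIEMAP ioctl (real filefrag does quite a bit more),
and on a compressed btrfs file this count ends up being the number of
individual compressed extents, which is why it looks inflated:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct fiemap fm;
        memset(&fm, 0, sizeof(fm));
        fm.fm_start = 0;
        fm.fm_length = ~0ULL;           /* whole file */
        fm.fm_extent_count = 0;         /* just ask how many extents exist */

        if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
                perror("FIEMAP");
                return 1;
        }
        printf("%s: %u extents\n", argv[1], fm.fm_mapped_extents);
        close(fd);
        return 0;
}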


Secondly, I'd recommend switching from using bcache under LVM to using 
dm-cache on top of LVM, as it makes it much easier to recover from the 
various failure modes, and also to deal with a corrupted cache, due to 
the fact that dm-cache doesn't put any metadata on the backing device. 
It takes longer to shutdown when in write-back mode, and isn't SSD 
optimized, but has also been much more reliable in my experience.






Re: [PATCH] Btrfs: fix race that makes btrfs_lookup_extent_info miss skinny extent items

2014-10-27 Thread Filipe David Manana
On Mon, Oct 27, 2014 at 11:08 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 On Mon, 27 Oct 2014 09:19:52 +, Filipe Manana wrote:
 We have a race that can lead us to miss skinny extent items in the function
 btrfs_lookup_extent_info() when the skinny metadata feature is enabled.
 So basically the sequence of steps is:

 1) We search in the extent tree for the skinny extent, which returns > 0
(not found);

 2) We check the previous item in the returned leaf for a non-skinny extent,
and we don't find it;

 3) Because we didn't find the non-skinny extent in step 2), we release our
path to search the extent tree again, but this time for a non-skinny
extent key;

 4) Right after we released our path in step 3), a skinny extent was inserted
in the extent tree (delayed refs were run) - our second extent tree search
will miss it, because it's not looking for a skinny extent;

 5) After the second search returned (with ret > 0), we look for any delayed
    ref for our extent's bytenr (and we do it while holding a read lock on the
    leaf), but we won't find any, as such delayed ref had just run and completed
    after we released our path in step 3) before doing the second search.

 Fix this by removing completely the path release and re-search logic. This is
 safe, because if we search for a metadata item and we don't find it, we have the
 guarantee that the returned leaf is the one where the item would be inserted,
 and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the
 non-skinny extent item is if it exists. The only case where path->slots[0] is

 I think this analysis is wrong if there are some independent shared ref
 metadata for a tree block, just like:
 +------------------------+-------------+-------------+
 | tree block extent item | shared ref1 | shared ref2 |
 +------------------------+-------------+-------------+

Why does that matter? Can you elaborate on why it's not correct?

We're looking only for the extent item in btrfs_lookup_extent_info(),
and running a delayed ref, independently of it being inlined/shared,
implies inserting a new extent item or updating an existing extent
item (updating its ref count).

thanks


 Thanks
 Miao

 zero is when there are no smaller keys in the tree (i.e. no left siblings for
 our leaf), in which case the re-search logic isn't needed as well.

 This race has been present since the introduction of skinny metadata (change
 3173a18f70554fe7880bb2d85c7da566e364eb3c).

 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/extent-tree.c | 8 
  1 file changed, 8 deletions(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 9141b2b..2cedd06 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -780,7 +780,6 @@ search_again:
   else
   key.type = BTRFS_EXTENT_ITEM_KEY;

 -again:
   ret = btrfs_search_slot(trans, root->fs_info->extent_root,
   &key, path, 0, 0);
   if (ret < 0)
 @@ -796,13 +795,6 @@ again:
   key.offset == root->nodesize)
   ret = 0;
   }
 - if (ret) {
 - key.objectid = bytenr;
 - key.type = BTRFS_EXTENT_ITEM_KEY;
 - key.offset = root->nodesize;
 - btrfs_release_path(path);
 - goto again;
 - }
   }

   if (ret == 0) {





-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


[PATCH] btrfs: get the accurate value of used_bytes in btrfs_get_block_group_info().

2014-10-27 Thread Dongsheng Yang
Reproducer:
# mkfs.btrfs -f -b 20G /dev/sdb
# mount /dev/sdb /mnt/test
# fallocate  -l 17G /mnt/test/largefile
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=6.00GiB   <- only 6G, but actually it 
should be 17G.
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
# sync
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=17.00GiB   <- After sync, it is 
expected.
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

The value of 6.00GiB is actually calculated in btrfs_get_block_group_info()
by adding the @block_group->item->used of each block group together. In this
way, it does not consider the bytes in cache.

This patch adds the value of @pinned, @reserved and @bytes_super in
struct btrfs_block_group_cache to make sure we can get the accurate @used_bytes.

Reported-by: Qu Wenruo quwen...@cn.fujitsu.com
Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 33c80f5..bc2aaeb 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3892,6 +3892,10 @@ void btrfs_get_block_group_info(struct list_head 
*groups_list,
space->total_bytes += block_group->key.offset;
space->used_bytes +=
btrfs_block_group_used(&block_group->item);
+   /* Add bytes-info in cache */
+   space->used_bytes += block_group->pinned;
+   space->used_bytes += block_group->reserved;
+   space->used_bytes += block_group->bytes_super;
}
 }
 
-- 
1.8.4.2



nocow and compression

2014-10-27 Thread Marc Dietrich
Hi,

I created a filesystem and mounted it with compress-force=lzo. Then I did:

# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop0  100M  4.1M   96M   5% /mnt

# yes Hello World | dd of=/mnt/test iflag=fullblock bs=1M count=20 
status=none
yes: standard output: Broken pipe
yes: write error

# sync; ls -l ; df -h .
total 20480
-rw-r--r-- 1 root root 20971520 Oct 27 13:48 test
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop0  100M  4.7M   96M   5% /mnt

so far so good ...

# touch test2; chattr +C test2
# dd if=test of=test2 conv=notrunc bs=1M iflag=fullblock oflag=append 
status=none
# sync; ls -l ; df -h .
total 40960
-rw-r--r-- 1 root root 20971520 Oct 27 13:51 test
-rw-r--r-- 1 root root 20971520 Oct 27 13:51 test2
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop0  100M   25M   76M  25% /mnt

oops, no compression.

Is this intended?

Marc



Re: nocow and compression

2014-10-27 Thread Marc Dietrich
On Monday, 27 October 2014, 13:59:24, Swâmi Petaramesh wrote:
 On Monday, 27 October 2014, 13:56:07, Marc Dietrich wrote:
  oops, no compression.
  Is this intended?
 
 « Compression does not work for NOCOW files » is clearly stated in
 
 https://btrfs.wiki.kernel.org/index.php/Compression#How_does_compression_int
 eract_with_direct_IO_or_COW.3F

ah, sorry, I somehow overlooked this.

Thanks

Marc

signature.asc
Description: This is a digitally signed message part.


Re: nocow and compression

2014-10-27 Thread Swâmi Petaramesh
As far as I understand it, NOCOW means that modified parts of files are rewritten 
in place, whereas compression causes compressed blocks of variable sizes to 
be created (depending on their compression ratio). Changing a block in a file 
will most probably change its compressed size, and then you see why it cannot 
be rewritten in place...
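
A toy illustration of that idea (purely my own sketch with a made-up size
model, not btrfs code): a nodatacow extent occupies a fixed on-disk slot, so
an in-place overwrite only works if the new data has the same size or less,
which compression cannot guarantee.

#include <stdio.h>

struct extent {
        unsigned long long disk_offset;
        unsigned int disk_len;
};

/* Pretend compressor: the result depends on the content, so it varies. */
static unsigned int compressed_size(const char *data, unsigned int len)
{
        unsigned int distinct = 0, seen[256] = { 0 };

        for (unsigned int i = 0; i < len; i++)
                if (!seen[(unsigned char)data[i]]++)
                        distinct++;
        return len / 4 + distinct;    /* invented model, just to make sizes differ */
}

int main(void)
{
        char old_block[4096] = { 0 };            /* all zeros: compresses well */
        char new_block[4096];

        for (int i = 0; i < 4096; i++)
                new_block[i] = (char)(i * 31);   /* noisier data */

        struct extent e = { 1048576, compressed_size(old_block, 4096) };
        unsigned int new_len = compressed_size(new_block, 4096);

        printf("old compressed extent: %u bytes at offset %llu\n",
               e.disk_len, e.disk_offset);
        printf("rewrite would need %u bytes -> %s\n", new_len,
               new_len <= e.disk_len ? "fits in place" : "does not fit, must COW elsewhere");
        return 0;
}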

Somebody correct me if I'm wrong ;-)


On Monday, 27 October 2014, 14:06:36, Marc Dietrich wrote:
  
  « Compression does not work for NOCOW files » is clearly stated in
  
  https://btrfs.wiki.kernel.org/index.php/Compression#How_does_compression_i
  nt eract_with_direct_IO_or_COW.3F
 
 ah, sorry, I somehow overlooked this.
 
 Thanks
 
 Marc

-- 
Swâmi Petaramesh sw...@petaramesh.org http://petaramesh.org PGP 9076E32E

Every manifested being is there to give off fragrance, to express the Presence.
-- Jean-Marc Mantel



Re: [PATCH] Btrfs: fix race that makes btrfs_lookup_extent_info miss skinny extent items

2014-10-27 Thread Filipe David Manana
On Mon, Oct 27, 2014 at 12:11 PM, Filipe David Manana
fdman...@gmail.com wrote:
 On Mon, Oct 27, 2014 at 11:08 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 On Mon, 27 Oct 2014 09:19:52 +, Filipe Manana wrote:
 We have a race that can lead us to miss skinny extent items in the function
 btrfs_lookup_extent_info() when the skinny metadata feature is enabled.
 So basically the sequence of steps is:

 1) We search in the extent tree for the skinny extent, which returns > 0
(not found);

 2) We check the previous item in the returned leaf for a non-skinny extent,
and we don't find it;

 3) Because we didn't find the non-skinny extent in step 2), we release our
path to search the extent tree again, but this time for a non-skinny
extent key;

 4) Right after we released our path in step 3), a skinny extent was inserted
in the extent tree (delayed refs were run) - our second extent tree 
 search
will miss it, because it's not looking for a skinny extent;

 5) After the second search returned (with ret > 0), we look for any delayed
    ref for our extent's bytenr (and we do it while holding a read lock on the
    leaf), but we won't find any, as such delayed ref had just run and completed
    after we released our path in step 3) before doing the second search.

 Fix this by removing completely the path release and re-search logic. This is
 safe, because if we search for a metadata item and we don't find it, we have the
 guarantee that the returned leaf is the one where the item would be inserted,
 and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the
 non-skinny extent item is if it exists. The only case where path->slots[0] is

 I think this analysis is wrong if there are some independent shared ref
 metadata for a tree block, just like:
 +------------------------+-------------+-------------+
 | tree block extent item | shared ref1 | shared ref2 |
 +------------------------+-------------+-------------+

Trying to guess what's in your mind.

Is the concern that if after a non-skinny extent item we have
non-inlined references, the assumption that path->slots[0] - 1 points
to the extent item would be wrong when searching for a skinny extent?

That wouldn't be the case because BTRFS_EXTENT_ITEM_KEY == 168 and
BTRFS_METADATA_ITEM_KEY == 169, with BTRFS_SHARED_BLOCK_REF_KEY ==
182. So in the presence of such non-inlined shared tree block
reference items, searching for a skinny extent item leaves us at a
slot that points to the first non-inlined ref (regardless of its type,
since they're all > 169), and therefore path->slots[0] - 1 is the
non-skinny extent item.
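
To make that concrete, here is a small userspace sketch (not kernel code; the
leaf contents and the bytenr are invented, only the key type constants are
the real on-disk values) of where the "not found" search lands relative to
those key types:

#include <stdio.h>
#include <stdint.h>

#define BTRFS_EXTENT_ITEM_KEY      168
#define BTRFS_METADATA_ITEM_KEY    169
#define BTRFS_SHARED_BLOCK_REF_KEY 182

struct key { uint64_t objectid; uint8_t type; uint64_t offset; };

static int cmp(const struct key *a, const struct key *b)
{
        if (a->objectid != b->objectid)
                return a->objectid < b->objectid ? -1 : 1;
        if (a->type != b->type)
                return a->type < b->type ? -1 : 1;
        if (a->offset != b->offset)
                return a->offset < b->offset ? -1 : 1;
        return 0;
}

int main(void)
{
        uint64_t bytenr = 12582912, nodesize = 16384;
        /* A leaf with a non-skinny extent item followed by two shared refs. */
        struct key leaf[] = {
                { bytenr, BTRFS_EXTENT_ITEM_KEY,      nodesize },
                { bytenr, BTRFS_SHARED_BLOCK_REF_KEY, 1000 },
                { bytenr, BTRFS_SHARED_BLOCK_REF_KEY, 2000 },
        };
        struct key search = { bytenr, BTRFS_METADATA_ITEM_KEY, 0 /* level */ };

        /* Emulate the "not found" result of btrfs_search_slot(): the slot of
         * the first key greater than the search key. */
        int slot = 0;
        while (slot < 3 && cmp(&leaf[slot], &search) < 0)
                slot++;

        if (slot == 0) {
                printf("no smaller key in this leaf, nothing to check\n");
                return 0;
        }
        printf("slot = %d, slot - 1 points at key type %u\n",
               slot, (unsigned)leaf[slot - 1].type);
        /* Prints slot = 1 and type 168 (the extent item), no matter how many
         * shared ref items follow it. */
        return 0;
}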

thanks.


 Why does that matter? Can you elaborate on why it's not correct?

 We're looking only for the extent item in btrfs_lookup_extent_info(),
 and running a delayed ref, independently of it being inlined/shared,
 implies inserting a new extent item or updating an existing extent
 item (updating its ref count).

 thanks


 Thanks
 Miao

 zero is when there are no smaller keys in the tree (i.e. no left siblings 
 for
 our leaf), in which case the re-search logic isn't needed as well.

 This race has been present since the introduction of skinny metadata (change
 3173a18f70554fe7880bb2d85c7da566e364eb3c).

 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/extent-tree.c | 8 
  1 file changed, 8 deletions(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 9141b2b..2cedd06 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -780,7 +780,6 @@ search_again:
   else
   key.type = BTRFS_EXTENT_ITEM_KEY;

 -again:
   ret = btrfs_search_slot(trans, root->fs_info->extent_root,
   &key, path, 0, 0);
   if (ret < 0)
 @@ -796,13 +795,6 @@ again:
   key.offset == root->nodesize)
   ret = 0;
   }
 - if (ret) {
 - key.objectid = bytenr;
 - key.type = BTRFS_EXTENT_ITEM_KEY;
 - key.offset = root->nodesize;
 - btrfs_release_path(path);
 - goto again;
 - }
   }

   if (ret == 0) {





 --
 Filipe David Manana,

 Reasonable men adapt themselves to the world.
  Unreasonable men adapt the world to themselves.
  That's why all progress depends on unreasonable men.



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Ansgar Hockmann-Stolle

Hi!

My btrfs system partition went read-only. After reboot it doesn't mount 
anymore. The system was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm 
on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250 
GB SSD to a file on USB and tried some btrfs tools on another copy via a 
loopback device. But everything failed with:


kernel: BTRFS: failed to read tree root on dm-2

See http://pastebin.com/raw.php?i=dPnU6nzg.

Any hints where to go from here?

Ciao
Ansgar
--
Ansgar Hockmann-Stolle, Universität Osnabrück, Rechenzentrum
Albrechtstraße 28, 49076 Osnabrück, Deutschland, Raum 31/E77B
+49 541 969-2749 (fax -2470), http://www.home.uos.de/anshockm


Re: Problem converting data raid0 to raid1: enospc errors during balance

2014-10-27 Thread Chris Murphy
 
  On Oct 26, 2014, at 7:40 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 
 BTW what's the output of 'df' command?

Jasper,
What do you get for the conventional df command when this btrfs volume is 
mounted? Thanks.

Chris Murphy


Re: suspicious number of devices: 72057594037927936

2014-10-27 Thread David Sterba
On Mon, Oct 27, 2014 at 10:57:59AM +, Filipe David Manana wrote:
  The only thing fancy may be the machine: PowerBook G4 (powerpc 32 bit),
  running Debian/Linux (stable).
 
  The message comes from the newly added fs/btrfs/disk-io.c:
 
  if (sb->num_devices > (1UL << 31))
   printk(KERN_WARNING "BTRFS: suspicious number of devices: %llu\n",
sb->num_devices);
 
  And 72057594037927936 is 2^56, so maybe there's an endianess problem here?
 
 Sounds like you need to revert this patch:
 https://patchwork.kernel.org/patch/5004701/ (which ignored endianess)
 or go back to an older kernel (don't use 3.17 or 3.17.1 however, due
 to other serious issues, latest 3.16.x should be safe). There's a v2
 of that patch that fixes the endianess issue, but it didn't make it to
 3.18-rc1/2 (https://patchwork.kernel.org/patch/5082701/)

Yeah sorry, I sent the v2 too late, here's an incremental that applies
on top of current 3.18-rc

https://patchwork.kernel.org/patch/5160651/
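
As a stand-alone illustration of that endianness explanation (a sketch, not the kernel or patch code): the superblock field is little-endian on disk, so interpreting its bytes natively on a big-endian CPU like the PowerBook's G4 turns a device count of 1 into exactly 2^56:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* on-disk little-endian 64-bit value 1 (num_devices of a 1-disk fs) */
    uint8_t disk_bytes[8] = { 1, 0, 0, 0, 0, 0, 0, 0 };
    uint64_t wrong = 0, right = 0;
    int i;

    /* native big-endian read: first byte treated as most significant */
    for (i = 0; i < 8; i++)
        wrong = (wrong << 8) | disk_bytes[i];

    /* little-endian decode, i.e. what le64_to_cpu() does on a BE host */
    for (i = 7; i >= 0; i--)
        right = (right << 8) | disk_bytes[i];

    printf("%llu vs %llu\n", (unsigned long long)wrong,
           (unsigned long long)right);   /* 72057594037927936 vs 1 */
    return 0;
}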


RE: Problem converting data raid0 to raid1: enospc errors during balance

2014-10-27 Thread Jasper Verberk
Hej guys!

Thanks for your input on the issue this far.

To my knowledge, raid1 in btrfs means 2 copies of each piece of data, 
independent of the number of disks used.

So 4 x 2.73TB would result in a total storage of roughly 5.5TB, right?

Shouldn't this be more than enough?
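
A quick back-of-the-envelope check of that (just arithmetic, assuming four equal 2.73TiB devices and btrfs raid1 keeping two copies of every chunk):

#include <stdio.h>

int main(void)
{
    double dev_tib = 2.73;                /* per-device size from btrfs fi show */
    int ndev = 4;
    double raw = dev_tib * ndev;          /* ~10.92 TiB of raw space */
    double usable_raid1 = raw / 2.0;      /* two copies of each chunk: ~5.46 TiB */

    printf("raw %.2f TiB, raid1 usable ~%.2f TiB\n", raw, usable_raid1);
    return 0;
}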

btw, here is the output for df:

http://paste.debian.net/128932/



 Date: Mon, 27 Oct 2014 12:49:15 +0800
 From: quwen...@cn.fujitsu.com
 To: li...@colorremedies.com
 CC: jverb...@hotmail.com; linux-btrfs@vger.kernel.org
 Subject: Re: Problem converting data raid0 to raid1: enospc errors during 
 balance


  Original Message 
 Subject: Re: Problem converting data raid0 to raid1: enospc errors
 during balance
 From: Chris Murphy li...@colorremedies.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014-10-27 12:40
 On Oct 26, 2014, at 7:40 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Hi,

 Although I'm not completely sure, it seems that you really ran out of 
 space.

 [1] Your array won't hold raid1 for 1.97T data
 Your array used up 1.97T of raid0 data; it takes 1.97T as raid0.
 But if converted to raid1, it will occupy 1.97T x 2 = 3.94T.
 Your array is only 2.73T, which is too small to contain the data.
 I'm not understanding. The btrfs fi show output shows 4x 2.73TiB devices, so that 
 seems like it's a 10+TiB array.

 There's 2.04TiB raid0 data chunks, so roughly 500GiB per device, yet 1.94TiB 
 is reported used per device by fi show. Confusing.

 Also it's still very confusing: Data, RAID1: total=2.85TiB, used=790.46GiB 
 whether this means 2.85TiB out of 10TiB is allocated, or if it's twice that 
 due to raid1. I can't ever remember this presentation detail, so again the 
 secret decode ring where the UI doesn't expressly tell us what's going on is 
 going to continue to be a source of confusion for users.


 Chris Murphy
 Oh, I misread the output

  Now that is strange
 BTW what's the output of 'df' command?

 Thanks,
 Qu

  

Re: Problem converting data raid0 to raid1: enospc errors during balance

2014-10-27 Thread Chris Murphy

On Oct 27, 2014, at 9:56 AM, Jasper Verberk jverb...@hotmail.com wrote:

 These are the results to a normal df:
 
 http://paste.debian.net/128932/
 
 The mountpoint is /data.

OK so this is with the new computation in kernel 3.17 (which I think contains a 
bug by counting free space twice); so now it shows available blocks based on 
the loss due to mirroring or parity. So the 5860533168 1K blocks = 5.45TiB. If you 
boot an older kernel my expectation is this shows up as 10.91TiB. In any case, 
df says there's 1.77TiB worth of data, so there should be plenty of space.
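
For reference, converting that df figure (just arithmetic, not btrfs code):

#include <stdio.h>

int main(void)
{
    unsigned long long blocks_1k = 5860533168ULL;          /* df size column, 1K blocks */
    double tib = blocks_1k / (1024.0 * 1024.0 * 1024.0);   /* KiB -> TiB */

    /* ~5.46 TiB as reported by 3.17; ~10.91 TiB if doubled, as older kernels show */
    printf("%.2f TiB (x2 = %.2f TiB)\n", tib, 2.0 * tib);
    return 0;
}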

Somewhere there's a bug. Either the 'btrfs fi df' is insufficiently 
communicating whether the desired operation can be done, or there's actual 
kernel confusion on how much space is available to do the conversion.

I wonder what happens if you go back to kernel 3.16 and try to do the 
conversion?


Chris Murphy


Re: BTRFS balance segfault, where to go from here

2014-10-27 Thread Chris Murphy

On Oct 27, 2014, at 3:26 AM, Stephan Alz stephan...@gmx.com wrote:
 
 My question is where to go from here? What I'm going to do right now is to copy 
 the most important data to another separate XFS drive.
 What I'm planning to do is:
 
 1, Upgrade the kernel
 2, Upgrade BTRFS
 3, Continue the balancing.

Definitely upgrade the kernel and see how that goes, there's been many many 
changes since 3.13. I would upgrade the user space tools also but that's not as 
important.

FYI you can mount with skip_balance mount option to inhibit resuming balance, 
sometimes pausing the balance isn't fast enough when there are balance problems.

 
 
 Could someone please also explain how exactly the raid10 setup works 
 with an ODD number of drives in btrfs? 
 Raid10 should be a stripe of mirrors. So is this sdf drive mirrored or 
 striped or what?

I have no idea honestly. Btrfs is very tolerant of adding odd number and sizes 
of devices, but things get a bit nutty in actual operation sometimes. This 
might be one of them because traditionally raid10 is always even number of 
drives, odd numbers just don't make sense. But Btrfs allows the addition; I 
think the expectation is you'd have added two before doing the balance though.

 Could some btrfs gurus tell me whether I should be worried about data loss 
 because of this or not?

Anything is possible so hopefully you have backups. My expectation is that worst 
case scenario the fs gets confused and you can't mount rw anymore, in which case 
you won't be able to make it an even-drive raid10. But in that case, even as ro, 
you can update your backups, blow away the Btrfs volume and start from scratch 
with an even number of drives, right?

 Would I need even more free space just to add a 5th drive? If so how much 
 more?

Gonna guess you'd need to add a drive that's at least 2.83TiB in size if you 
want to keep it raid10.

Chris Murphy


Re: suspicious number of devices: 72057594037927936

2014-10-27 Thread Christian Kujau
On Mon, 27 Oct 2014 at 16:35, David Sterba wrote:
 Yeah sorry, I sent the v2 too late, here's an incremental that applies
 on top of current 3.18-rc
 
 https://patchwork.kernel.org/patch/5160651/

Yup, that fixes it. Thank you! If it's needed:

  Tested-by: Christian Kujau li...@nerdbynature.de

@Filipe: and thanks for warning me about 3.17 - I used 3.17.0 since it 
came out and compiled kernels on the btrfs partition and haven't had any 
issues. But it wasn't used very often, so whatever the serious issues 
were, I haven't experienced any.

Christian.
-- 
BOFH excuse #98:

The vendor put the bug there.


Re: suspicious number of devices: 72057594037927936

2014-10-27 Thread Hugo Mills
On Mon, Oct 27, 2014 at 11:21:13AM -0700, Christian Kujau wrote:
 On Mon, 27 Oct 2014 at 16:35, David Sterba wrote:
  Yeah sorry, I sent the v2 too late, here's an incremental that applies
  on top of current 3.18-rc
  
  https://patchwork.kernel.org/patch/5160651/
 
 Yup, that fixes it. Thank you! If it's needed:
 
   Tested-by: Christian Kujau li...@nerdbynature.de
 
 @Filipe: and thanks for warning me about 3.17 - I used 3.17.0 since it 
 came out and compiled kernels on the btrfs partition and haven't had any 
 issues. But it wasn't used very often, so whatever the serious issues 
 were, I haven't experienced any.

   If you make read-only snapshots, there's a good chance of metadata
corruption. It's fixed in 3.17.2.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Our so-called leaders speak/with words they try to jail ya/ ---   
They subjugate the meek/but it's the rhetoric of failure.
 


signature.asc
Description: Digital signature


RAID1 fails to recover chunk tree

2014-10-27 Thread Zack Coffey
Revisit of a previous issue. I set up a single 640GB drive with BTRFS and
compression. This was not a system drive, just a place to put random
junk.

Made a RAID1 with another drive of just the metadata. Was in
that state for less than 12 hours-ish, removed the second drive and
now cannot get to any data on the original drive. Data remained single
while only metadata was RAID1.

Single drive btrfs was made on Ubuntu with kernel 3.13.0 and tools
3.12.

$ sudo mount -o degraded /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

$ dmesg | tail
[45353.869448] KBD BUG in
../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line:
304!
[45353.901511] KBD BUG in
../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line:
304!
[45353.901666] KBD BUG in
../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line:
304!
[45354.148488] KBD BUG in
../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line:
304!
[45354.148573] KBD BUG in
../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line:
304!
[46241.155350] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67
devid 1 transid 60944 /dev/sdc1
[46241.155923] btrfs: allowing degraded mounts
[46241.155927] btrfs: disk space caching is enabled
[46241.159436] btrfs: failed to read chunk root on sdc1
[46241.177815] btrfs: open_ctree failed

$ btrfs-show-super /dev/sdc1
superblock: bytenr=65536, device=/dev/sdc1
---------------------------------------------------------
csum                    0x93bcb1b5 [match]
bytenr                  65536
flags                   0x1
magic                   _BHRfS_M [match]
fsid                    bd78815a-802b-43e2-8387-fc6ab4237d67
label
generation              60944
root                    909586694144
sys_array_size          97
chunk_root_generation   60938
root_level              1
chunk_root              911673917440
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1115871535104
bytes_used              321833435136
sectorsize              4096
nodesize                4096
leafsize                4096
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x9
csum_type               0
csum_size               4
cache_generation        60944
uuid_tree_generation    60944
dev_item.uuid           d82b2027-17b6-4513-a86d-9227a42d7ed1
dev_item.fsid           bd78815a-802b-43e2-8387-fc6ab4237d67 [match]
dev_item.type           0
dev_item.total_bytes    615763673088
dev_item.bytes_used     324270030848
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0


$ sudo btrfs device add -f /dev/sdh1 /dev/sdc1
ERROR: error adding the device '/dev/sdh1' - Inappropriate ioctl for device

$ sudo btrfs device delete missing /dev/sdc1
ERROR: error removing the device 'missing' - Inappropriate ioctl for device

$ sudo mount -o degraded,defaults,compress=lzo /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

$ dmesg | tail
[106991.655384] btrfs: device fsid
bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[106991.665066] btrfs: device fsid
bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.954397] btrfs: device fsid
bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.962009] btrfs: device fsid
bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.124927] btrfs: device fsid
bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.126475] btrfs: allowing degraded mounts
[107070.126479] btrfs: use lzo compression
[107070.126480] btrfs: disk space caching is enabled
[107070.127254] btrfs: failed to read chunk root on sdc1
[107070.142983] btrfs: open_ctree failed

$ sudo btrfs rescue super-recover -v /dev/sdc1
All Devices:
Device: id = 1, name = /dev/sdc1

Before Recovering:
[All good supers]:
device name = /dev/sdc1
superblock bytenr = 65536

device name = /dev/sdc1
superblock bytenr = 67108864

device name = /dev/sdc1
superblock bytenr = 274877906944

[All bad supers]:

All supers are valid, no need to recover

$ btrfs rescue chunk-recover -v /dev/sdc1
<snipped>
Chunk: start = 860100755456, len = 1073741824, type = 1, num_stripes = 1
  Stripes list:
  [ 0] Stripe: devid = 1, 

Re: [PATCH] Btrfs: fix snapshot inconsistency after a file write followed by truncate

2014-10-27 Thread Chris Mason
On Tue, Oct 21, 2014 at 6:12 AM, Filipe Manana fdman...@suse.com 
wrote:
If right after starting the snapshot creation ioctl we perform a 
write against a
file followed by a truncate, with both operations increasing the 
file's size, we
can get a snapshot tree that reflects a state of the source 
subvolume's tree where
the file truncation happened but the write operation didn't. This 
leaves a gap
between 2 file extent items of the inode, which makes btrfs' fsck 
complain about it.


For example, if we perform the following file operations:

$ mkfs.btrfs -f /dev/vdd
$ mount /dev/vdd /mnt
$ xfs_io -f \
  -c "pwrite -S 0xaa -b 32K 0 32K" \
  -c "fsync" \
  -c "pwrite -S 0xbb -b 32770 16K 32770" \
  -c "truncate 90123" \
  /mnt/foobar

and the snapshot creation ioctl was just called before the second 
write, we often can get the following inode items in the snapshot's btree:

item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
inode generation 146 transid 7 size 90123 block group 
0 mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0

item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
inode ref index 282 namelen 10 name: foobar
item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
extent data disk byte 1104855040 nr 32768
extent data offset 0 nr 32768 ram 32768
extent compression 0
item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
extent data disk byte 0 nr 0
extent data offset 0 nr 40960 ram 40960
extent compression 0

There's a file range, corresponding to the interval [32K; ALIGN(16K + 
32770, 4096)[
for which there's no file extent item covering it. This is because 
the file write
and file truncate operations happened both right after the snapshot 
creation ioctl
called btrfs_start_delalloc_inodes(), which means we didn't start and 
wait for the
ordered extent that matches the write and, in btrfs_setsize(), we 
were able to call
btrfs_cont_expand() before being able to commit the current 
transaction in the
snapshot creation ioctl. So this made it possible to insert the hole 
file extent
item in the source subvolume (which represents the region added by 
the truncate)
right before the transaction commit from the snapshot creation ioctl.
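
To make that gap concrete (a small sketch using the same numbers and the usual round-up ALIGN; not code from the patch):

#include <stdio.h>

#define ALIGN(x, a)  (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
    unsigned long hole_start = 32 * 1024;                    /* end of the first 32K write */
    unsigned long hole_end = ALIGN(16 * 1024 + 32770, 4096); /* second write, rounded up */

    /* prints: hole [32768, 53248) -- matching item 123's key offset of 53248 */
    printf("hole [%lu, %lu)\n", hole_start, hole_end);
    return 0;
}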

Btrfs' fsck tool complains about such cases with a message like the 
following:


root 331 inode 257 errors 100, file extent discount

From a user perspective, the expectation when a snapshot is created 
while those
file operations are being performed is that the snapshot will have a 
file that either:

1) is empty
2) only the first write was captured
3) only the 2 writes were captured
4) both writes and the truncation were captured

But never capture a state where only the first write and the 
truncation were captured (since the second write was performed before 
the truncation).

A test case for xfstests follows.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/inode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0d41741..c28b78f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4622,6 +4622,9 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)

}

if (newsize > oldsize) {
+   ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+   if (ret)
+   return ret;


Expanding truncates aren't my favorite operation, but we don't want 
them to imply fsync.  I'm holding off on this one while I work out the 
rest of the vacation backlog ;)


-chris





Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Ansgar Hockmann-Stolle

Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle:

Hi!

My btrfs system partition went readonly. After reboot it doesn't mount
anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm
on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250
GB SSD to some USB file and tried some btrfs tools on another copy per
loopback device. But everything failed with:

kernel: BTRFS: failed to read tree root on dm-2

See http://pastebin.com/raw.php?i=dPnU6nzg.

Any hints where to go from here?


After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs 
3.17 and tried some more ...


linux:~/bin # ./btrfs --version
Btrfs v3.17
linux:~/bin # ./btrfs-find-root /dev/sda3
Super think's the tree root is at 1015238656, chunk root 20971520
Well block 239718400 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 239722496 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 320098304 seems great, but generation doesn't match, 
have=662233, want=663595 level 0
Well block 879341568 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879345664 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879382528 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879398912 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879403008 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879423488 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879435776 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 880095232 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 881504256 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 881512448 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 936271872 seems great, but generation doesn't match, 
have=663397, want=663595 level 0
Well block 1004490752 seems great, but generation doesn't match, 
have=663571, want=663595 level 0
Well block 1007804416 seems great, but generation doesn't match, 
have=663572, want=663595 level 0
Well block 1012031488 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012396032 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012633600 seems great, but generation doesn't match, 
have=663586, want=663595 level 0
Well block 1012871168 seems great, but generation doesn't match, 
have=663585, want=663595 level 0
Well block 1015201792 seems great, but generation doesn't match, 
have=663588, want=663595 level 0
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1
Well block 44132536320 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 44178280448 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 87443644416 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 87514079232 seems great, but generation doesn't match, 
have=651051, want=663595 level 0
Well block 87517679616 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 98697822208 seems great, but generation doesn't match, 
have=643548, want=663595 level 0
Well block 103285026816 seems great, but generation doesn't match, 
have=661672, want=663595 level 0
Well block 103309553664 seems great, but generation doesn't match, 
have=661674, want=663595 level 0
Well block 103523430400 seems great, but generation doesn't match, 
have=661767, want=663595 level 0

No more metdata to scan, exiting

This line I found interesting because have is want + 1:
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1


And here the tail of btrfs rescue chunk-recover (full output at 
http://pastebin.com/raw.php?i=1D5VgDxv)


[..]
Total Chunks:   234
  Heathy:   231
  Bad:  3

Orphan Block Groups:

Orphan Device Extents:
Couldn't map the block 1015238656
btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce->start > 
logical || ce->start + ce->size < logical)' failed.

Aborted


Sadly btrfs check --repair keeps refusing to do its job.

linux:~ # btrfs check --repair /dev/sda3
enabling repair mode
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
read block failed check_tree_block
Couldn't read tree root
Checking filesystem on /dev/sda3
UUID: 1af256b5-b1ad-443b-aeee-a6853e70b7e2
Critical roots corrupted, unable to fsck the FS
Segmentation fault

Any more hints?

Ciao
   

Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Duncan
Ansgar Hockmann-Stolle posted on Mon, 27 Oct 2014 14:23:19 +0100 as
excerpted:

 Hi!
 
 My btrfs system partition went readonly. After reboot it doesn't mount
 anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm
 on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250
 GB SSD to some USB file and tried some btrfs tools on another copy per
 loopback device. But everything failed with:
 
 kernel: BTRFS: failed to read tree root on dm-2
 
 See http://pastebin.com/raw.php?i=dPnU6nzg.
 
 Any hints where to go from here?

Good job posting initial problem information.  =:^)  A lot of folks take 
2-3 rounds of request and reply before that much info is available on the 
problem.

While others may be able to assist you in restoring that filesystem to 
working condition, my focus is more on recovering what can be recovered 
from it and doing a fresh mkfs.

System partition, 250 GB, looks to be just under 231 GiB based on the 
total bytes from btrfs-show-super.

How recent is your backup, and/or being a system partition, is it simply 
the distro installation, possibly without too much customization, thus 
easily reinstalled?

IOW, if you were to call that partition a total loss and simply mkfs it, 
would you lose anything real valuable that's not backed up?  (Of course, 
the standard lecture at this point is that if it's not backed up, by 
definition you didn't consider it valuable enough to be worth the hassle, 
so by definition it's not valuable and you can simply blow it away, 
but...)

If you're in good shape in that regard, that's what I'd probably do at 
this point, keeping the dd image you made in case someone's interested in 
tracking the problem down and making btrfs handle that case.

If there's important files on there that you don't have backed up, or if 
you have a backup but it's older than you'd like and you want to try to 
recover current versions of what you can (the situation I was in a few 
months ago), then btrfs restore is what you're interested in.  Restore 
works on an /unmounted/ (and potentially unmountable, as here) 
filesystem, letting you retrieve files from it and copy them to other 
filesystems.  It does NOT write anything to the damaged filesystem 
itself, so no worries about making the problem worse.

There's a page on the wiki describing how to use btrfs restore along with 
btrfs-find-root in some detail, definitely more than is in the manpages 
or that I want to do here.

https://btrfs.wiki.kernel.org/index.php/Restore

Some useful hints that weren't originally clear to me as I used that page 
here:

* Generation and transid are the same thing, a sequentially increasing 
number that updates every time the root tree is written.  The generation 
recorded in your superblocks (from btrfs-show-super) is 663595, so the 
idea would be that generation/transid, falling back one to 663594 if 95 
isn't usable, then 93, then... etc.  The lower the number the further 
back in history you're going, so obviously, you want the closest to 
663595 that you can get, that still gives you access to a (nearly) whole 
filesystem, or at least the parts of it you are interested in.

* That page was written before restore's -D/--dry-run option was 
available.  This option can be quite helpful, and I recommend using it to 
see what will actually be restored at each generation and associated tree 
root (bytenr/byte-number).  Tho (with -v/verbose) the list of files 
restored will normally be too long to go thru in detail, you can either 
scan it or pipe the output to wc -l to get a general idea of how many 
files would be restored.

* Restore's -l/list-tree-roots option isn't listed on the page either.  
btrfs restore -l -t bytenr can be quite useful, giving you a nice list 
of trees available for the generation corresponding to that bytenr (as 
found using btrfs-find-root).  This is where the page's advice to pick 
the latest tree root with all or as many as possible of the filesystem 
trees in it, comes in, since this lets you easily see which trees each 
root has available.

* I don't use snapshots or subvolumes here, while I understand OpenSuSE 
uses them rather heavily (via snapper).  Thus I have no direct experience 
with restore's snapshot-related options.  Presumably you can either 
ignore the snapshots (the apparent default) or restore them either in 
general (using -s) or selectively (using -r, with the appropriate 
snapshot rootid).

* It's worth noting that restore simply lets you retrieve files.  It does 
*NOT* retrieve file ownership or permissions, with the restored files all 
being owned by the user you ran btrfs restore under (presumably root), 
with $UMASK permissions.  You'll have to restore ownership and 
permissions manually.

When I used restore here I had a backup, but the backup was old.  So I 
hacked up a bash scriptlet with a for loop, that went thru all the 
restored files recursively, comparing them against the old backup.  If 
the file existed in 

Re: Does btrfs-restore report missing/corrupt files?

2014-10-27 Thread Robert White

On 10/26/2014 12:59 AM, Christian Tschabuschnig wrote:


Hello,

currently I am trying to recover a btrfs filesystem which had a few subvolumes. 
When running
# btrfs restore -sx /dev/xxx .
one subvolume gets restored.


Important Aside: The one time I had to resort to btrfs restore I didn't 
get the contents of _many_ of the really small files. My _guess_ is that 
those were the files small enough to reside entirely within the 
original filesystem's metadata.
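
A rough sketch of the distinction being guessed at here (not btrfs-progs' actual restore code; it assumes a leaf and slot already positioned on an EXTENT_DATA item):

    struct btrfs_file_extent_item *fi;

    fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
    if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
            /* the file data lives inside this metadata item itself;
             * there is no disk_bytenr to read, so a restore tool has
             * to copy it straight out of the leaf */
    } else {
            u64 bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
            /* regular/prealloc extent: data is read from bytenr on
             * disk (bytenr == 0 means a hole) */
    }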


You should mount the filesystem read-only and recursively copy the 
hierarchy to another file system as well as doing a restore. The two 
results can then be folded together, or at least the former might help 
you find some of what the latter might miss.


I could be totally wrong, or restore could have been improved since 
then, but it was what seemed to be happening.



Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Qu Wenruo


 Original Message 
Subject: Re: btrfs unmountable: read block failed check_tree_block; 
Couldn't read tree root

From: Qu Wenruo quwen...@cn.fujitsu.com
To: Ansgar Hockmann-Stolle ansgar.hockmann-sto...@uni-osnabrueck.de, 
linux-btrfs@vger.kernel.org

Date: 2014-10-28 09:05


 Original Message 
Subject: Re: btrfs unmountable: read block failed check_tree_block; 
Couldn't read tree root

From: Ansgar Hockmann-Stolle ansgar.hockmann-sto...@uni-osnabrueck.de
To: linux-btrfs@vger.kernel.org
Date: 2014-10-28 07:03

Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle:

Hi!

My btrfs system partition went readonly. After reboot it doesn't mount
anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm
on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 
250

GB SSD to some USB file and tried some btrfs tools on another copy per
loopback device. But everything failed with:

kernel: BTRFS: failed to read tree root on dm-2

See http://pastebin.com/raw.php?i=dPnU6nzg.

Any hints where to go from here?


After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs 
3.17 and tried some more ...


linux:~/bin # ./btrfs --version
Btrfs v3.17
linux:~/bin # ./btrfs-find-root /dev/sda3
Super think's the tree root is at 1015238656, chunk root 20971520
Well block 239718400 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 239722496 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 320098304 seems great, but generation doesn't match, 
have=662233, want=663595 level 0
Well block 879341568 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879345664 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879382528 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879398912 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879403008 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879423488 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879435776 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 880095232 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 881504256 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 881512448 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 936271872 seems great, but generation doesn't match, 
have=663397, want=663595 level 0
Well block 1004490752 seems great, but generation doesn't match, 
have=663571, want=663595 level 0
Well block 1007804416 seems great, but generation doesn't match, 
have=663572, want=663595 level 0
Well block 1012031488 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012396032 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012633600 seems great, but generation doesn't match, 
have=663586, want=663595 level 0
Well block 1012871168 seems great, but generation doesn't match, 
have=663585, want=663595 level 0
Well block 1015201792 seems great, but generation doesn't match, 
have=663588, want=663595 level 0
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1
Well block 44132536320 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 44178280448 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 87443644416 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 87514079232 seems great, but generation doesn't match, 
have=651051, want=663595 level 0
Well block 87517679616 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 98697822208 seems great, but generation doesn't match, 
have=643548, want=663595 level 0
Well block 103285026816 seems great, but generation doesn't match, 
have=661672, want=663595 level 0
Well block 103309553664 seems great, but generation doesn't match, 
have=661674, want=663595 level 0
Well block 103523430400 seems great, but generation doesn't match, 
have=661767, want=663595 level 0

No more metdata to scan, exiting

This line I found interesting because have is want + 1:
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1


And here the tail of btrfs rescue chunk-recover (full output at 
http://pastebin.com/raw.php?i=1D5VgDxv)


[..]
Total Chunks:234
  Heathy:231
  Bad:3

Orphan Block Groups:

Orphan Device Extents:
Couldn't map the block 1015238656
btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce->start > 
logical || ce->start + ce->size < logical)' failed.

Aborted

After looking into the 3 bad chunks, it turns 

Re: [PATCH] Btrfs: fix race that makes btrfs_lookup_extent_info miss skinny extent items

2014-10-27 Thread Miao Xie
On Mon, 27 Oct 2014 13:44:22 +, Filipe David Manana wrote:
 On Mon, Oct 27, 2014 at 12:11 PM, Filipe David Manana
 fdman...@gmail.com wrote:
 On Mon, Oct 27, 2014 at 11:08 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 On Mon, 27 Oct 2014 09:19:52 +, Filipe Manana wrote:
 We have a race that can lead us to miss skinny extent items in the function
 btrfs_lookup_extent_info() when the skinny metadata feature is enabled.
 So basically the sequence of steps is:

  1) We search in the extent tree for the skinny extent, which returns > 0
(not found);

 2) We check the previous item in the returned leaf for a non-skinny extent,
and we don't find it;

 3) Because we didn't find the non-skinny extent in step 2), we release our
path to search the extent tree again, but this time for a non-skinny
extent key;

 4) Right after we released our path in step 3), a skinny extent was 
 inserted
in the extent tree (delayed refs were run) - our second extent tree 
 search
will miss it, because it's not looking for a skinny extent;

  5) After the second search returned (with ret > 0), we look for any delayed
ref for our extent's bytenr (and we do it while holding a read lock on 
 the
leaf), but we won't find any, as such delayed ref had just run and 
 completed
after we released out path in step 3) before doing the second search.

 Fix this by removing completely the path release and re-search logic. This 
 is
 safe, because if we search for a metadata item and we don't find it, we 
 have the
 guarantee that the returned leaf is the one where the item would be 
 inserted,
 and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the
 non-skinny extent item is if it exists. The only case where path->slots[0]
 is

 I think this analysis is wrong if there are some independent shared ref 
 metadata for
 a tree block, just like:
  +------------------------+-------------+-------------+
  | tree block extent item | shared ref1 | shared ref2 |
  +------------------------+-------------+-------------+
 
 Trying to guess what's in your mind.
 
 Is the concern that if after a non-skinny extent item we have
  non-inlined references, the assumption that path->slots[0] - 1 points
 to the extent item would be wrong when searching for a skinny extent?
 
 That wouldn't be the case because BTRFS_EXTENT_ITEM_KEY == 168 and
 BTRFS_METADATA_ITEM_KEY == 169, with BTRFS_SHARED_BLOCK_REF_KEY ==
 182. So in the presence of such non-inlined shared tree block
 reference items, searching for a skinny extent item leaves us at a
 slot that points to the first non-inlined ref (regardless of its type,
  since they're all < 169), and therefore path->slots[0] - 1 is the
 non-skinny extent item.
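
For reference, a simplified sketch (not the exact patched function, only the behaviour described above) of the check that makes the removed re-search unnecessary once the skinny search comes back with ret > 0:

    if (ret > 0 && metadata && key.type == BTRFS_METADATA_ITEM_KEY) {
            if (path->slots[0]) {
                    path->slots[0]--;
                    btrfs_item_key_to_cpu(path->nodes[0], &key,
                                          path->slots[0]);
                    if (key.objectid == bytenr &&
                        key.type == BTRFS_EXTENT_ITEM_KEY &&
                        key.offset == root->nodesize)
                            ret = 0;  /* found the non-skinny extent item */
            }
            /* ret still > 0 here: no extent item yet, check delayed refs */
    }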

You are right. I forgot to check the value of the key type. Sorry.

This patch looks good to me.

Reviewed-by: Miao Xie mi...@cn.fujitsu.com

 
 thanks.
 

 Why does that matter? Can you elaborate on why it's not correct?

 We're looking for the extent item only in btrfs_lookup_extent_info(),
 and running a delayed ref, independently of it being inlined/shared,
 implies inserting a new extent item or updating an existing extent
 item (updating its ref count).

 thanks


 Thanks
 Miao

  zero is when there are no smaller keys in the tree (i.e. no left siblings for
  our leaf), in which case the re-search logic isn't needed either.

 This race has been present since the introduction of skinny metadata 
 (change
 3173a18f70554fe7880bb2d85c7da566e364eb3c).

 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
   fs/btrfs/extent-tree.c | 8 --------
  1 file changed, 8 deletions(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 9141b2b..2cedd06 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -780,7 +780,6 @@ search_again:
   else
   key.type = BTRFS_EXTENT_ITEM_KEY;

 -again:
   ret = btrfs_search_slot(trans, root->fs_info->extent_root,
   &key, path, 0, 0);
   if (ret < 0)
 @@ -796,13 +795,6 @@ again:
   key.offset == root->nodesize)
   ret = 0;
   }
 - if (ret) {
 - key.objectid = bytenr;
 - key.type = BTRFS_EXTENT_ITEM_KEY;
  - key.offset = root->nodesize;
 - btrfs_release_path(path);
 - goto again;
 - }
   }

   if (ret == 0) {





 --
 Filipe David Manana,

 Reasonable men adapt themselves to the world.
  Unreasonable men adapt the world to themselves.
  That's why all progress depends on unreasonable men.
 
 
 


Re: Btrfs-progs release 3.17

2014-10-27 Thread Gui Hecheng
On Thu, 2014-10-23 at 15:23 +0200, Petr Janecek wrote:
 Hello Gui,
 
  Oh, it seems that there are btrfs with missing devs that are bringing
  troubles to the @open_ctree_... function.
 
   what do you mean by missing devs? I have no degraded fs.

Ah, sorry, I'm too focused on the problem that Anand's script pointed
out. Ignore the 'missing devs' remark.

   The time btrfs fi sh spends scanning disks of a filesystem seems to
 be proportional to the amount of data stored on them: on a completely
 idle system, of ~20s total time it spends 10s scanning each of /mnt/b
 and /mnt/b0, and almost no time on /mnt/b3 (which is the biggest)
 
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/sdm5.5T  2.4T  2.1T  54% /mnt/b
 /dev/sda5.5T  2.5T  3.1T  45% /mnt/b0
 /dev/sde7.3T   90G  5.4T   2% /mnt/b3

For your original problems:
o error messages:
  The concurrency problem exists as Anand said. As you said, running
balance and cp leads to such messages, so I think there is some
unintentional redundant work over the mounted devices when dealing
with unmounted ones. I'll try to 

o stalling:
  This may be due to the concurrency problem as well. After the first problem
is handled, let's see what happens.

Thanks,
Gui
 
 Thanks,
 
 Petr




Re: Btrfs-progs release 3.17

2014-10-27 Thread Gui Hecheng
On Thu, 2014-10-23 at 21:36 +0800, Anand Jain wrote:
 
    there is no point in re-creating so much of the btrfs kernel's logic in user
    space. it's just unnecessary, when the kernel is already doing it. use
    some interface to get info from the kernel after a device is registered
    (not necessarily mounted), so progs can be as sleek as possible.
    to me it started as just one more bug; now we have fixed so many.
    It all needs one good interface from the kernel which provides
    anything needed from the kernel.
 

Oh, the kernel interface you described is really interesting.
But how do we store the seed/sprout relationships so that we can fetch them
correctly for an unmounted btrfs?

-Gui 

 
 On 10/23/14 16:52, Gui Hecheng wrote:
  On Thu, 2014-10-23 at 16:13 +0800, Anand Jain wrote:
 
  Some of the disks on my system were missing and I was able to hit
  this issue.
 
 
  
  Check tree block failed, want=12582912, have=0
  read block failed check_tree_block
  Couldn't read chunk root
  warning devid 2 not found already
  Check tree block failed, want=143360, have=0
  read block failed check_tree_block
  Couldn't read chunk root
  warning, device 4 is missing
  warning, device 3 is missing
  warning, device 2 is missing
  warning, device 1 is missing
  
 
 
  Did a bisect and it leads to this following patch.
 
  
  commit 915902c5002485fb13d27c4b699a73fb66cc0f09
 
btrfs-progs: fix device missing of btrfs fi show with seed devices
  
 
 Also this patch stalls ~2sec in the cmd btrfs fi show, on my system
 with 48 disks.
 
  Also a simple test case hits some warnings...
 
  
 mkfs.btrfs -draid1 -mraid1 /dev/sdb /dev/sdc
 mount /dev/sdb /btrfs && fillfs /btrfs 100 && umount /btrfs
 wipefs -a /dev/sdb
 modprobe -r btrfs && modprobe btrfs
 mount -o degraded /dev/sdb /btrfs
 btrfs fi show
  Label: none  uuid: 9844cd05-1c8c-473e-a84b-bac95aab7bc9
Total devices 2 FS bytes used 1.59MiB
devid2 size 967.87MiB used 104.75MiB path /dev/sdc
*** Some devices missing
 
  warning, device 1 is missing
  warning, device 1 is missing
  warning devid 1 not found already
  
 
 
  Hi Anand and Petr,
 
   Oh, it seems that there are btrfs filesystems with missing devs that are
   bringing trouble to the @open_ctree_... function.
   This should be a case missed by the patch above, which should only take
   effect when seeding devices are present.
   I will try my best to follow up on this case; suggestions are welcome. Thanks!
 
  -Gui
 
 
 
 
  On 10/23/14 14:57, Petr Janecek wrote:
  Hello,
 
  You have mentioned two issues when balance and fi show are running
 concurrently
 
  my mail was a bit chaotic, but I get the stalls even on idle system.
  Today I got
 
  parent transid verify failed on 1559973888000 wanted 1819 found 1821
  parent transid verify failed on 1559973888000 wanted 1819 found 1821
  parent transid verify failed on 1559973888000 wanted 1819 found 1821
  parent transid verify failed on 1559973888000 wanted 1819 found 1821
  Ignoring transid failure
  leaf parent key incorrect 1559973888000
 
  from 'btrfs fi sh' while I was just copying something, no balance running.
 
  [...]
  [PATCH 1/1] btrfs-progs: code optimize cmd_scan_dev() use
  btrfs_register_one_device()
  [PATCH 1/2] btrfs-progs: introduce btrfs_register_all_device()
  [PATCH 2/2] btrfs-progs: optimize btrfs_scan_lblkid() for multiple calls
 
  If you could, pls..
 Now on 3.17 apply above 3 patches and see if you see any better
 performance for the stalling issue.
 
  no perceptible change: takes ~40 seconds both before and after
  applying. Old version 1 sec.
 
 can you do same steps on 3.16 and report what you observe
 
  So many rejects -- do you have older versions of these patches?
 
 
  Thanks,
 
  Petr
 
 
 
 

