date:20140903

BTRFS critical (device dm-0): invalid dir item name len: 45389

2014-09-03 Thread john terragon

Hi.

When I traverse one of my btrfs filesystems, for example with a simple
"find /", I get the following in kmsg:

BTRFS critical (device dm-0): invalid dir item name len: 45389

The message appears just one time (so I guess it involves just one
file/dir). dm-0 is the first dmcrypt device of a pair on which I have
btrfs in RAID0 (btrfs native raid). Though I can't be 100% sure, this
seems to be a very recent problem (I would have noticed something
"critical" in kmsg if it happened before). Everything else seems to
work fine.

So, should I be worried? Is there a way to fix this? (I assume that a
scrub would not do any good since it seems to be related to btrfs data
structures more than actual file data). Is there at least a way to
know which file/dir is involved? Maybe a verbose debug mode? Or maybe
I should just add some printk in the verify_dir_item function that
seems to generate the message.

Thanks
John
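
[Editor's note] The message quoted above comes from a sanity check on the name length stored in a directory item. Below is a minimal userspace sketch of that kind of check. It is not the kernel's verify_dir_item(): the struct, the function name check_dir_item_name_len() and the constant SKETCH_NAME_LEN_MAX are assumptions made for illustration (255 is the usual btrfs file name limit). It also prints the owning objectid, which is the sort of extra detail an added printk could report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the directory item fields we care about. */
struct sketch_dir_item {
	uint64_t objectid;  /* inode number of the directory owning the item */
	uint16_t name_len;  /* length of the entry name as read from disk */
};

/* Assumed upper bound for a file name; btrfs limits names to 255 bytes. */
#define SKETCH_NAME_LEN_MAX 255

/*
 * Returns 0 if the stored name length is plausible, 1 if it is not.
 * On failure it also reports which directory the item belongs to.
 */
static int check_dir_item_name_len(const struct sketch_dir_item *di)
{
	assert(di != NULL);

	if (di->name_len > SKETCH_NAME_LEN_MAX) {
		fprintf(stderr,
			"invalid dir item name len: %u (dir objectid %llu)\n",
			(unsigned)di->name_len,
			(unsigned long long)di->objectid);
		return 1;
	}
	return 0;
}
```

With an inode number in hand, "btrfs inspect-internal inode-resolve" could then be tried to map it back to a path, assuming the affected tree is still readable.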
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs-progs: remove wrong set_argv0 for restore

2014-09-03 Thread Gui Hecheng

Before this patch, running restore with too few arguments printed:
# :too few arguments

The tool name "btrfs restore" is missing.

The @set_argv0() function was introduced by:
commit a184abc70f7b1468e6036ab576f1587ee0574668
btrfs-progs: move the check_argc_* functions into utils.c
...
Also add a new function "set_argv0" to set the correct tool name:
*btrfs-image*: too few arguments

But @set_argv0() only applies to the independent tools with
the name pattern btrfs-***.
Since restore is now a subcommand under "btrfs",
there is no need to call @set_argv0() before check_argc_* to
fix up the tool name printed before "too few arguments".

Signed-off-by: Gui Hecheng 
---
 cmds-restore.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index f909429..38a131e 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -1229,7 +1229,6 @@ int cmd_restore(int argc, char **argv)
}
}
 
-   set_argv0(argv);
if (!list_roots && check_argc_min(argc - optind, 2))
usage(cmd_restore_usage);
else if (list_roots && check_argc_min(argc - optind, 1))
-- 
1.8.1.4
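
[Editor's note] The effect described in the commit message above can be sketched in isolation: one shared buffer holds the tool name, and a second call that sets it clobbers whatever the dispatcher stored earlier. The names below (tool_name, sketch_set_tool_name(), sketch_check_argc_min()) are hypothetical and only mimic the shape of the helpers named in the patch; this is not the btrfs-progs code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shared buffer holding the name printed in front of errors. */
static char tool_name[64];

/* Remember the tool name; later calls overwrite earlier ones. */
static void sketch_set_tool_name(const char *name)
{
	assert(name != NULL);
	snprintf(tool_name, sizeof(tool_name), "%s", name);
}

/*
 * Returns 1 and writes "<tool>: too few arguments" into msg when
 * nargs < expected; otherwise returns 0 and leaves msg empty.
 */
static int sketch_check_argc_min(int nargs, int expected,
				 char *msg, size_t len)
{
	assert(msg != NULL && len > 0);

	msg[0] = '\0';
	if (nargs < expected) {
		snprintf(msg, len, "%s: too few arguments", tool_name);
		return 1;
	}
	return 0;
}
```

If the dispatcher has already stored "btrfs restore", a subcommand that sets the name again from its own argv replaces it, and the prefix in the message changes accordingly.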


[PATCH] btrfs-progs: remove btrfs_release_path before btrfs_free_path

2014-09-03 Thread Gui Hecheng

btrfs_free_path() already calls btrfs_release_path() internally, so the
explicit btrfs_release_path() call right before it is redundant.

Signed-off-by: Gui Hecheng 
---
 disk-io.c   | 1 -
 file-item.c | 1 -
 inode-map.c | 2 --
 3 files changed, 4 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 9e44f10..0f9f374 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -628,7 +628,6 @@ struct btrfs_root *btrfs_read_fs_root_no_cache(struct 
btrfs_fs_info *fs_info,
memcpy(&root->root_key, location, sizeof(*location));
ret = 0;
 out:
-   btrfs_release_path(path);
btrfs_free_path(path);
if (ret) {
free(root);
diff --git a/file-item.c b/file-item.c
index 6f3708b..b46d7f1 100644
--- a/file-item.c
+++ b/file-item.c
@@ -306,7 +306,6 @@ found:
csum_size);
btrfs_mark_buffer_dirty(path->nodes[0]);
 fail:
-   btrfs_release_path(path);
btrfs_free_path(path);
return ret;
 }
diff --git a/inode-map.c b/inode-map.c
index 3e138b5..1321bfb 100644
--- a/inode-map.c
+++ b/inode-map.c
@@ -90,12 +90,10 @@ int btrfs_find_free_objectid(struct btrfs_trans_handle 
*trans,
// FIXME -ENOSPC
 found:
root->last_inode_alloc = *objectid;
-   btrfs_release_path(path);
btrfs_free_path(path);
BUG_ON(*objectid < search_start);
return 0;
 error:
-   btrfs_release_path(path);
btrfs_free_path(path);
return ret;
 }
-- 
1.8.1.4
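
[Editor's note] The reasoning behind the patch above is that a "free" routine which already performs the "release" step makes a preceding explicit release redundant. A minimal sketch of that pattern follows, using hypothetical names (struct sketch_path, sketch_release_path(), sketch_free_path()) and plain malloc'd buffers as stand-ins for tree block references; it does not reproduce the btrfs-progs implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_LEVEL 8

/* Hypothetical path object: one held reference per tree level. */
struct sketch_path {
	void *nodes[SKETCH_MAX_LEVEL];
};

static struct sketch_path *sketch_alloc_path(void)
{
	return calloc(1, sizeof(struct sketch_path));
}

/* Drop every reference the path holds; safe to call repeatedly. */
static void sketch_release_path(struct sketch_path *p)
{
	int i;

	assert(p != NULL);
	for (i = 0; i < SKETCH_MAX_LEVEL; i++) {
		free(p->nodes[i]);      /* free(NULL) is a no-op */
		p->nodes[i] = NULL;
	}
}

/* Free the path itself; held references are released first. */
static void sketch_free_path(struct sketch_path *p)
{
	if (p == NULL)
		return;
	sketch_release_path(p);
	free(p);
}
```

Because the release step is idempotent here, calling it before the free is harmless but unnecessary, which is why removing it is a pure cleanup.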


Re: fs corruption report

2014-09-03 Thread Gui Hecheng

On Mon, 2014-09-01 at 15:25 +, Zooko Wilcox-OHearn wrote:
> I'm more than happy to try out patches and even focus my own brain on
> diagnosing it, if I can. I'm hoping to regain access to some of my
> files on my btrfs partition, and also I would enjoy helping get this
> improved. :-)
> 
> So if you want me to try an experiment, just email me. Unfortunately I
> can't just give you a copy of the partition, since it has confidential
> information on it.
> 
> Regards,
> 
> Zooko

Hi Zooko, Marc,

Firstly, thanks for your backtrace info, Marc.
Sorry for the late reply; I have been offline these days.
As for the restore problem, I'm sure that the lzo decompress routine lacks
the ability to handle some specific extent patterns.

Here is my test result:
I'm using a specific file for the test:
/usr/lib/modules/$(uname -r)/kernel/net/irda/irda.ko.
You can get it easily on your own box.

# mkfs -t btrfs 
# mount -o compress-force=lzo  
# cp irda.ko 
# umount 
# btrfs restore -v  
report:
# bad compress length
# failed to inflate

btrfs-progs version: v3.16.x

With the same file under no-compress & zlib-compress,
the restore will output a correct copy of irda.ko.

I'm not sure whether the problem above has something to do with your
problem. Hope that the messages above are helpful.

-Gui


[PATCH] btrfs-progs: fix find_mount_root() to handle duplicated mount point correctly

2014-09-03 Thread Qu Wenruo

The original find_mount_root() uses the first matching mount point and
returns it.
That was OK until the following commit, which also checks the fstype:
de22c28ef31d9721606ba059 btrfs-progs: Check fstype in find_mount_root()

With the fstype check, we should check the last match, not only the first
one.
Otherwise the following mounts will not pass find_mount_root():
/dev/sdc on /mnt/test type ext4 (rw,relatime,data=ordered)
/dev/sdb on /mnt/test type btrfs (rw,relatime,space_cache)

This patch uses the last match for the fstype check.

Reported-by: Remco Hosman 
Signed-off-by: Remco Hosman 
Signed-off-by: Qu Wenruo 
---
 utils.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 6c09366..6a16b06 100644
--- a/utils.c
+++ b/utils.c
@@ -2359,8 +2359,8 @@ int find_mount_root(const char *path, char **mount_root)
while ((ent = getmntent(mnttab))) {
len = strlen(ent->mnt_dir);
if (strncmp(ent->mnt_dir, path, len) == 0) {
-   /* match found */
-   if (longest_matchlen < len) {
+   /* match found and use the latest match */
+   if (longest_matchlen <= len) {
free(longest_match);
longest_matchlen = len;
longest_match = strdup(ent->mnt_dir);
-- 
2.1.0
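
[Editor's note] The one-character change in the patch above (from "<" to "<=") makes a later mount table entry win when two entries share the same mount point. Below is a self-contained sketch of that selection rule. It walks a plain array instead of calling getmntent(), and struct sketch_mount and sketch_find_mount() are hypothetical names used only for illustration. Like the patched code, it uses a plain prefix comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified mount table entry (stand-in for struct mntent). */
struct sketch_mount {
	const char *dir;     /* mount point */
	const char *fstype;  /* filesystem type */
};

/*
 * Return the entry whose mount point is the longest prefix of path.
 * With ">=" a later entry of equal length replaces an earlier one, so
 * the most recent mount on a shared mount point wins. NULL if no match.
 */
static const struct sketch_mount *
sketch_find_mount(const struct sketch_mount *tab, size_t n, const char *path)
{
	const struct sketch_mount *best = NULL;
	size_t best_len = 0;
	size_t i;

	assert(tab != NULL && path != NULL);
	for (i = 0; i < n; i++) {
		size_t len = strlen(tab[i].dir);

		if (strncmp(tab[i].dir, path, len) != 0)
			continue;
		if (best == NULL || len >= best_len) {
			best = &tab[i];
			best_len = len;
		}
	}
	return best;
}
```

For a table listing /mnt/test first as ext4 and then as btrfs, a lookup of a path under /mnt/test returns the btrfs entry, so a later fstype check sees the filesystem that is actually visible at that mount point.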


Re: Btrfs stable updates for v3.16

2014-09-03 Thread Chris Mason



On 09/03/2014 07:36 PM, Holger Hoffstätte wrote:
> On Wed, 03 Sep 2014 16:50:47 -0400, Chris Mason wrote:
> 
>> Hi everyone,
>>
>> For 3.16, please pull these into stable, I've cherry picked and tested
>> them here.  For 3.15 and earlier there are a few conflicts, so I'll make
>> a git tree with things to pull.
>>
>> 8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+
> 
> This ("fix filemap_flush call in btrfs_file_release") is the only one 
> that requires some work for 3.14.
> 
> There is one conflict in ordered-data.c - just a slight work queue 
> submission change - and the second in transaction.c where the patch does 
> not delete enough from btrfs_flush_all_pending_stuffs(), since 3.14 still 
> has the old qgroup calls in place. I removed it wholesale and that makes 
> everything fit.
> 
> The followup ("fix filemap_flush call in btrfs_file_release") then also 
> applies.
> 
> Should they also go into the next 3.14.x stable cycle? This rename 
> deadlock sounds like a possible problem with rsync, which seems like a 
> popular use case, and I guess nobody will complain about slightly better 
> performance either.

Right, the btrfs_flush_all_pending_stuffs function can just be deleted.
But Liu Bo's patch isn't required on 3.14 (since the regression he
fixed came with 3.15).

And these changes are big enough that I'd like to test them a little here
before sending out.  I did mark that patch as 3.15+, but really that
deadlock has been there forever.  We only started seeing it with 3.15+
because other waitqueue problems made it stand out.

-chris

Re: Btrfs stable updates for v3.16

2014-09-03 Thread Holger Hoffstätte

On Wed, 03 Sep 2014 16:50:47 -0400, Chris Mason wrote:

> Hi everyone,
> 
> For 3.16, please pull these into stable, I've cherry picked and tested
> them here.  For 3.15 and earlier there are a few conflicts, so I'll make
> a git tree with things to pull.
> 
> 8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+

This ("fix filemap_flush call in btrfs_file_release") is the only one 
that requires some work for 3.14.

There is one conflict in ordered-data.c - just a slight work queue 
submission change - and the second in transaction.c where the patch does 
not delete enough from btrfs_flush_all_pending_stuffs(), since 3.14 still 
has the old qgroup calls in place. I removed it wholesale and that makes 
everything fit.

The followup ("fix filemap_flush call in btrfs_file_release") then also 
applies.

Should they also go into the next 3.14.x stable cycle? This rename 
deadlock sounds like a possible problem with rsync, which seems like a 
popular use case, and I guess nobody will complain about slightly better 
performance either.

Holger


Re: INFO: task btrfs-transacti:2408 blocked for more than 120 seconds.

2014-09-03 Thread Duncan

Martin Steigerwald posted on Thu, 04 Sep 2014 00:02:03 +0200 as excerpted:

> On Wednesday, 3 September 2014, 19:17:17 you wrote:
>> On a 32 bit stable Gentoo Linux I have 2 BTRFS file systems:
>> 
>> $ mount | grep btrfs
>> /var/lib/portage.fs on /usr/portage type btrfs (rw,noatime,compress=lzo)
>> /var/lib/pkg.fs on /var/db/pkg type btrfs (rw,noatime,compress=lzo)
>> 
>> holding a lot of small Gentoo-package-Manager-related files. The first
>> is exported via NFS so that my KVM can access that tree too.
>> 
>> Today I got a hang while upgrading a package on the host and one within
>> the KVM at the same time, syslog tells me:
> 
> Which kernel is this?

From the posted log:

>> Sep  3 19:10:57 n22 kernel: Not tainted 3.16.1 #5

=:^)

(FWIW even tho I don't claim to be a dev or to otherwise make much sense 
of traces like the one posted, I /have/ learned to look for the kernel 
version in a line near the top of the trace.  I can make sense of that, 
at least, and it can sometimes save a bit of confusion when the poster 
claims to be using one version but is obviously a bit confused themselves 
as the trace says it's something else.  =:^)

> If this is anything less than 3.17-rc3, I suggest you try with that one,
> or wait till the hang fix patches get into stable trees. Chris's recent
> pull request may have been about these.

Agreed.  Very likely the following known issue:

Kernel 3.15 switched various critical btrfs tasks from private btrfs 
threads to the generic kworker kernel threads infrastructure, but in the 
process triggered a previously latent kworker lockdep bug where the 
kworker threads weren't behaving according to their documentation.

Developing and testing a proper fix to the root kworker threads behavior 
issue will probably take another kernel cycle or two, but in the 
meantime a btrfs patch working around the problem has been developed and 
tested.  It's in 3.17-rc3 and marked for stable but not yet in a stable 
release.

So the 3.14 stable series wasn't affected by the problem, as btrfs was 
still using private kernel threads.  Previous versions aren't recommended, 
as they had other now known and fixed bugs.  3.15 is AFAIK not a 
long-term-stable series and is unlikely to get the patch unless you apply 
it yourself.  3.16 isn't a long-term-stable either, but it is still 
supported and the patch is queued for the next stable release.  3.17 thru 
rc2 doesn't have the fix, but it's in rc3.

So 3.17-rc3+ is the only non-git mainline kernel with the patch applied 
at this time.  For this bug you therefore have the following choices:

1) Switch to the latest 3.17 series development kernel. (Preferred)
2) Live with it until the next 3.16 stable series release.
3) Grab and apply the patch to a previous 3.15 or 3.16 stable series 
kernel yourself.
4) Revert to 3.14 stable series, which wasn't affected.
5) Turn off the compression mount option and do a rebalance to eliminate 
existing compression, as the bug only triggers when dealing with btrfs 
compression.
6) Live on the /real/ edge and switch to btrfs integration series 
kernels, with patches undergoing testing for the /next/ mainline kernel 
series (3.18 at this point).

FWIW, there's another much harder to trigger (thus only recently found, 
traced and patched) bug that goes back much farther (3.4 at least), with 
a patch in 3.17-rc3 and headed for stable series as well.  However, given 
the rarity of triggering it and the fact that people have lived with it 
until now, while that patch is good to apply to prevent future rare-case 
issues, it's not as urgent as the one above.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: INFO: task btrfs-transacti:2408 blocked for more than 120 seconds.

2014-09-03 Thread Martin Steigerwald

On Wednesday, 3 September 2014, 19:17:17 you wrote:
> On a 32 bit stable Gentoo Linux I have 2 BTRFS file systems:
> 
> $ mount | grep btrfs
> /var/lib/portage.fs on /usr/portage type btrfs (rw,noatime,compress=lzo)
> /var/lib/pkg.fs on /var/db/pkg type btrfs (rw,noatime,compress=lzo)
> 
> holding a lot of small Gentoo-package-Manager-related files. The first is
> exported via NFS so that my KVM can access that tree too.
> 
> Today I got a hang while upgrading a package on the host and one within the
> KVM at the same time, syslog tells me:

Which kernel is this?

If this is anything less than 3.17-rc3, I suggest you try with that one, or 
wait till the hang fix patches get into stable trees. Chris's recent pull 
request may have been about these.

Thanks,
Martin

> Sep  3 19:10:57 n22 kernel: INFO: task btrfs-transacti:2408 blocked for more than 120 seconds.
> Sep  3 19:10:57 n22 kernel: Not tainted 3.16.1 #5
> Sep  3 19:10:57 n22 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep  3 19:10:57 n22 kernel: btrfs-transacti D  0  2408  2 0x
> Sep  3 19:10:57 n22 kernel: edb15a34 0086 c10c7efc  0001 5197f8f9 1fa0 c17bf880
> Sep  3 19:10:57 n22 kernel: c17bf880 f3630880 f10167c0 c1099ed3 0002 0001  c10c7efc
> Sep  3 19:10:57 n22 kernel: 118a2543 0857  0018 c24d94d3 002180b0 0062c84a abf93cc6
> Sep  3 19:10:57 n22 kernel: Call Trace:
> Sep  3 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20
> Sep  3 19:10:57 n22 kernel: [] ? ktime_get_ts+0x83/0x180
> Sep  3 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20
> Sep  3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50
> Sep  3 19:10:57 n22 kernel: [] io_schedule+0x86/0x100
> Sep  3 19:10:57 n22 kernel: [] sleep_on_page+0xd/0x20
> Sep  3 19:10:57 n22 kernel: [] __wait_on_bit+0x51/0x80
> Sep  3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50
> Sep  3 19:10:57 n22 kernel: [] wait_on_page_bit+0x83/0x90
> Sep  3 19:10:57 n22 kernel: [] ? autoremove_wake_function+0x40/0x40
> Sep  3 19:10:57 n22 kernel: [] read_extent_buffer_pages+0x2dc/0x2f0
> Sep  3 19:10:57 n22 kernel: [] btree_read_extent_buffer_pages.constprop.50+0xc8/0x140
> Sep  3 19:10:57 n22 kernel: [] ? free_root_pointers+0x50/0x50
> Sep  3 19:10:57 n22 kernel: [] read_tree_block+0x3c/0x60
> Sep  3 19:10:57 n22 kernel: [] read_block_for_search.isra.30+0x141/0x390
> Sep  3 19:10:57 n22 kernel: [] btrfs_search_slot+0x3a7/0x870
> Sep  3 19:10:57 n22 kernel: [] lookup_inline_extent_backref+0x132/0x6e0
> Sep  3 19:10:57 n22 kernel: [] ? update_curr+0xeb/0x1a0
> Sep  3 19:10:57 n22 kernel: [] ? cpuacct_charge+0x6e/0x90
> Sep  3 19:10:57 n22 kernel: [] __btrfs_free_extent+0x13d/0xd10
> Sep  3 19:10:57 n22 kernel: [] ? _raw_spin_unlock+0x22/0x30
> Sep  3 19:10:57 n22 kernel: [] ? __btrfs_run_delayed_refs+0x117/0x1260
> Sep  3 19:10:57 n22 kernel: [] __btrfs_run_delayed_refs+0x8d7/0x1260
> Sep  3 19:10:57 n22 kernel: [] ? finish_task_switch+0x79/0x100
> Sep  3 19:10:57 n22 kernel: [] ? mutex_unlock+0xd/0x10
> Sep  3 19:10:57 n22 kernel: [] btrfs_run_delayed_refs.part.60+0x58/0x220
> Sep  3 19:10:57 n22 kernel: [] ? btrfs_run_ordered_operations+0x1b7/0x240
> Sep  3 19:10:57 n22 kernel: [] btrfs_run_delayed_refs+0x14/0x30
> Sep  3 19:10:57 n22 kernel: [] btrfs_commit_transaction+0x45/0xc70
> Sep  3 19:10:57 n22 kernel: [] ? start_transaction+0x7e/0x5b0
> Sep  3 19:10:57 n22 kernel: [] transaction_kthread+0x195/0x220
> Sep  3 19:10:57 n22 kernel: [] ? btrfs_cleanup_transaction+0x490/0x490
> Sep  3 19:10:57 n22 kernel: [] kthread+0xa6/0xc0
> Sep  3 19:10:57 n22 kernel: [] ret_from_kernel_thread+0x21/0x30
> Sep  3 19:10:57 n22 kernel: [] ? kthread_create_on_node+0x180/0x180
> Sep  3 19:10:57 n22 kernel: 2 locks held by btrfs-transacti/2408:
> Sep  3 19:10:57 n22 kernel: #0:  (&fs_info->transaction_kthread_mutex){..}, at: [] transaction_kthread+0x107/0x220
> Sep  3 19:10:57 n22 kernel: #1:  (&head_ref->mutex){..}, at: [] btrfs_delayed_ref_lock+0x2f/0x1f0
> 
> 
> Just FWIW

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

Re: Btrfs stable updates for v3.16

2014-09-03 Thread Greg KH

On Wed, Sep 03, 2014 at 04:50:47PM -0400, Chris Mason wrote:
> Hi everyone,
> 
> For 3.16, please pull these into stable, I've cherry picked and tested
> them here.  For 3.15 and earlier there are a few conflicts, so I'll make
> a git tree with things to pull.
> 
> 8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+
> 38c1c2e44bacb37efd68b90b3f70386a8ee370ee v3.11+
> f6dc45c7a93a011dff6eb9b2ffda59c390c7705a v3.15+
> 9e0af23764344f7f1b68e4eefbe7dc865018b63d v3.15+

Now applied to the trees I manage.

thanks,

greg k-h

Re: Btrfs stable updates for 3.16.x (and others)

2014-09-03 Thread Greg KH

On Tue, Aug 19, 2014 at 01:10:45PM +0200, David Sterba wrote:
> Hi stable team,
> 
> please add the following patches to stable trees.
> 
> Patch #3 applies to all currently live stables, a 7-year-old bug. I've
> briefly reviewed all 3 patches against 3.10/12/14/16 (ie. 3.4 skips #1
> and #2).
> 
> Subjects:
> Btrfs: read lock extent buffer while walking backrefs
> Btrfs: fix compressed write corruption on enospc
> Btrfs: fix csum tree corruption, duplicate and outdated checksums
> Commits:
> 6f7ff6d7832c6be13e8c95598884dbc40ad69fb7

This doesn't apply to 3.10-stable :(

> ce62003f690dff38d3164a632ec69efa15c32cbf

Neither did this.

> 27b9a8122ff71a8cadfbffb9c4f0694300464f3b

Was already marked for stable.

thanks,

greg k-h

Btrfs stable updates for v3.16

2014-09-03 Thread Chris Mason

Hi everyone,

For 3.16, please pull these into stable, I've cherry picked and tested
them here.  For 3.15 and earlier there are a few conflicts, so I'll make
a git tree with things to pull.

8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+
38c1c2e44bacb37efd68b90b3f70386a8ee370ee v3.11+
f6dc45c7a93a011dff6eb9b2ffda59c390c7705a v3.15+
9e0af23764344f7f1b68e4eefbe7dc865018b63d v3.15+

-chris

Re: Large files, nodatacow and fragmentation

2014-09-03 Thread Clemens Eisserer

Hi Richard,

> It is interesting that for me the number of extents before and after
> bcache are essentially the same.
>
> The lesson here for me is that the fragmentation of a btrfs
> nodatacow file is not mitigated by bcache. There seems to be nothing I
> can do to prevent that fragmentation, and it may in fact be expected
> behavior.

This is to be expected - bcache behaves like a single, transparent
block device - so for btrfs it doesn't matter whether you run on a
"real" device or a bcache one.
The performance increase is expected, however ;)

Best regards, Clemens

kernel BUG at fs/btrfs/extent-tree.c:7727! with 3.17-rc3

2014-09-03 Thread Tomasz Chmielewski

Got the following with 3.17-rc3 and running balance (had to power cycle 
after that):


[ 1329.952600] ------------[ cut here ]------------
[ 1329.952671] WARNING: CPU: 7 PID: 3106 at fs/btrfs/extent-tree.c:876 
btrfs_lookup_extent_info+0x377/0x3eb [btrfs]()
[ 1329.952726] Modules linked in: ipt_MASQUERADE iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
ip_tables x_tables cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq 
zlib_deflate coretemp hwmon loop parport_pc parport pcspkr i2c_i801 
tpm_infineon tpm_tis tpm i2ccore video battery lpc_ich mfd_core ehci_pci 
ehci_hcd button acpi_cpufreq ext4 crc16 jbd2 mbcache raid1 sg sd_mod 
ahci libahci libata scsi_mod r8169 mii
[ 1329.954740] CPU: 7 PID: 3106 Comm: btrfs-balance Not tainted 
3.17.0-rc3 #1
[ 1329.954789] Hardware name: System manufacturer System Product 
Name/P8H77-M PRO, BIOS 1101 02/04/2013
[ 1329.954841]  0009 880733d4f8d8 813ab092 

[ 1329.955030]   880733d4f918 81039b41 
0007
[ 1329.955219]  a02d8560 8807aa536120  


[ 1329.955407] Call Trace:
[ 1329.955455]  [] dump_stack+0x46/0x58
[ 1329.955503]  [] warn_slowpath_common+0x77/0x91
[ 1329.955610]  [] ? 
btrfs_lookup_extent_info+0x377/0x3eb [btrfs]

[ 1329.955758]  [] warn_slowpath_null+0x15/0x17
[ 1329.955862]  [] 
btrfs_lookup_extent_info+0x377/0x3eb [btrfs]

[ 1329.956018]  [] walk_down_proc+0xc5/0x22b [btrfs]
[ 1329.956128]  [] ? 
join_transaction.isra.30+0x24/0x309 [btrfs]

[ 1329.956285]  [] walk_down_tree+0x45/0xd5 [btrfs]
[ 1329.956391]  [] btrfs_drop_snapshot+0x2f5/0x68f 
[btrfs]
[ 1329.956505]  [] merge_reloc_roots+0x139/0x23f 
[btrfs]
[ 1329.956617]  [] relocate_block_group+0x466/0x4de 
[btrfs]
[ 1329.956728]  [] 
btrfs_relocate_block_group+0x158/0x278 [btrfs]
[ 1329.956890]  [] 
btrfs_relocate_chunk.isra.62+0x58/0x5f7 [btrfs]
[ 1329.957073]  [] ? 
btrfs_set_lock_blocking_rw+0x68/0x95 [btrfs]
[ 1329.957214]  [] ? btrfs_set_path_blocking+0x23/0x54 
[btrfs]
[ 1329.957297]  [] ? btrfs_search_slot+0x7bc/0x816 
[btrfs]
[ 1329.957382]  [] ? free_extent_buffer+0x6f/0x7c 
[btrfs]

[ 1329.957467]  [] btrfs_balance+0xa7b/0xc80 [btrfs]
[ 1329.957547]  [] ? printk+0x48/0x4a
[ 1329.957629]  [] balance_kthread+0x57/0x7c [btrfs]
[ 1329.957724]  [] ? btrfs_balance+0xc80/0xc80 [btrfs]
[ 1329.957807]  [] ? btrfs_balance+0xc80/0xc80 [btrfs]
[ 1329.957887]  [] kthread+0xcd/0xd5
[ 1329.957965]  [] ? 
kthread_freezable_should_stop+0x43/0x43

[ 1329.958045]  [] ret_from_fork+0x7c/0xb0
[ 1329.958122]  [] ? 
kthread_freezable_should_stop+0x43/0x43

[ 1329.958210] ---[ end trace a368b0643f9207e2 ]---
[ 1329.958293] ------------[ cut here ]------------
[ 1329.958378] kernel BUG at fs/btrfs/extent-tree.c:7727!
[ 1329.958455] invalid opcode:  [#1] SMP
[ 1329.958593] Modules linked in: ipt_MASQUERADE iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
ip_tables x_tables cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq 
zlib_deflate coretemp hwmon loop parport_pc parport pcspkr i2c_i801 
tpm_infineon tpm_tis tpm i2ccore video battery lpc_ich mfd_core ehci_pci 
ehci_hcd button acpi_cpufreq ext4 crc16 jbd2 mbcache raid1 sg sd_mod 
ahci libahci libata scsi_mod r8169 mii
[ 1329.960684] CPU: 7 PID: 3106 Comm: btrfs-balance Tainted: GW  
3.17.0-rc3 #1
[ 1329.960803] Hardware name: System manufacturer System Product 
Name/P8H77-M PRO, BIOS 1101 02/04/2013
[ 1329.960924] task: 8807f18c ti: 880733d4c000 task.ti: 
880733d4c000
[ 1329.961043] RIP: 0010:[]  [] 
walk_down_proc+0xdc/0x22b [btrfs]

[ 1329.961200] RSP: 0018:880733d4f9e8  EFLAGS: 00010246
[ 1329.961277] RAX:  RBX: 0002 RCX: 
000f5a50
[ 1329.961356] RDX: 000f5a4f RSI: 88081fbd9650 RDI: 
00019650
[ 1329.961436] RBP: 880733d4fa38 R08: ea001ea94d80 R09: 
09a2
[ 1329.961515] R10: a02cbc38 R11:  R12: 
8807aa536d80
[ 1329.961594] R13: 880733ac5600 R14: 880660ba65c8 R15: 
0002
[ 1329.961674] FS:  () GS:88081fbc() 
knlGS:

[ 1329.961794] CS:  0010 DS:  ES:  CR0: 80050033
[ 1329.961872] CR2: 7f7fa0c3e000 CR3: 01611000 CR4: 
001407e0

[ 1329.961951] Stack:
[ 1329.962024]  880733ac5650 a02ebd20 880732a34820 
8807eb201000
[ 1329.962267]   8807aa536d80 0002 
880732a34820
[ 1329.962510]  8807eb201000 880733ac5600 880733d4fa98 
a02dae92

[ 1329.962754] Call Trace:
[ 1329.962834]  [] ? 
join_transaction.isra.30+0x24/0x309 [btrfs]

[ 1329.962957]  [] walk_down_tree+0x45/0xd5 [btrfs]
[ 1329.963040]  [] btrfs_drop_snapshot+0x2f5/0x68f 
[btrfs]
[ 1329.963126]  [] merge_reloc

INFO: task btrfs-transacti:2408 blocked for more than 120 seconds.

2014-09-03 Thread Toralf Förster

On a 32 bit stable Gentoo Linux I have 2 BTRFS file systems:

$ mount | grep btrfs
/var/lib/portage.fs on /usr/portage type btrfs (rw,noatime,compress=lzo)
/var/lib/pkg.fs on /var/db/pkg type btrfs (rw,noatime,compress=lzo)

holding a lot of small Gentoo-package-Manager-related files. The first is 
exported via NFS so that my KVM can access that tree too.

Today I got a hang while upgrading a package on the host and one within the KVM 
at the same time, syslog tells me:


Sep  3 19:10:57 n22 kernel: INFO: task btrfs-transacti:2408 blocked for more 
than 120 seconds.
Sep  3 19:10:57 n22 kernel: Not tainted 3.16.1 #5
Sep  3 19:10:57 n22 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
Sep  3 19:10:57 n22 kernel: btrfs-transacti D  0  2408  2 
0x
Sep  3 19:10:57 n22 kernel: edb15a34 0086 c10c7efc  0001 
5197f8f9 1fa0 c17bf880
Sep  3 19:10:57 n22 kernel: c17bf880 f3630880 f10167c0 c1099ed3 0002 
0001  c10c7efc
Sep  3 19:10:57 n22 kernel: 118a2543 0857  0018 c24d94d3 
002180b0 0062c84a abf93cc6
Sep  3 19:10:57 n22 kernel: Call Trace:
Sep  3 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20
Sep  3 19:10:57 n22 kernel: [] ? ktime_get_ts+0x83/0x180
Sep  3 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20
Sep  3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50
Sep  3 19:10:57 n22 kernel: [] io_schedule+0x86/0x100
Sep  3 19:10:57 n22 kernel: [] sleep_on_page+0xd/0x20
Sep  3 19:10:57 n22 kernel: [] __wait_on_bit+0x51/0x80
Sep  3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50
Sep  3 19:10:57 n22 kernel: [] wait_on_page_bit+0x83/0x90
Sep  3 19:10:57 n22 kernel: [] ? autoremove_wake_function+0x40/0x40
Sep  3 19:10:57 n22 kernel: [] read_extent_buffer_pages+0x2dc/0x2f0
Sep  3 19:10:57 n22 kernel: [] 
btree_read_extent_buffer_pages.constprop.50+0xc8/0x140
Sep  3 19:10:57 n22 kernel: [] ? free_root_pointers+0x50/0x50
Sep  3 19:10:57 n22 kernel: [] read_tree_block+0x3c/0x60
Sep  3 19:10:57 n22 kernel: [] 
read_block_for_search.isra.30+0x141/0x390
Sep  3 19:10:57 n22 kernel: [] btrfs_search_slot+0x3a7/0x870
Sep  3 19:10:57 n22 kernel: [] 
lookup_inline_extent_backref+0x132/0x6e0
Sep  3 19:10:57 n22 kernel: [] ? update_curr+0xeb/0x1a0
Sep  3 19:10:57 n22 kernel: [] ? cpuacct_charge+0x6e/0x90
Sep  3 19:10:57 n22 kernel: [] __btrfs_free_extent+0x13d/0xd10
Sep  3 19:10:57 n22 kernel: [] ? _raw_spin_unlock+0x22/0x30
Sep  3 19:10:57 n22 kernel: [] ? __btrfs_run_delayed_refs+0x117/0x1260
Sep  3 19:10:57 n22 kernel: [] __btrfs_run_delayed_refs+0x8d7/0x1260
Sep  3 19:10:57 n22 kernel: [] ? finish_task_switch+0x79/0x100
Sep  3 19:10:57 n22 kernel: [] ? mutex_unlock+0xd/0x10
Sep  3 19:10:57 n22 kernel: [] 
btrfs_run_delayed_refs.part.60+0x58/0x220
Sep  3 19:10:57 n22 kernel: [] ? 
btrfs_run_ordered_operations+0x1b7/0x240
Sep  3 19:10:57 n22 kernel: [] btrfs_run_delayed_refs+0x14/0x30
Sep  3 19:10:57 n22 kernel: [] btrfs_commit_transaction+0x45/0xc70
Sep  3 19:10:57 n22 kernel: [] ? start_transaction+0x7e/0x5b0
Sep  3 19:10:57 n22 kernel: [] transaction_kthread+0x195/0x220
Sep  3 19:10:57 n22 kernel: [] ? btrfs_cleanup_transaction+0x490/0x490
Sep  3 19:10:57 n22 kernel: [] kthread+0xa6/0xc0
Sep  3 19:10:57 n22 kernel: [] ret_from_kernel_thread+0x21/0x30
Sep  3 19:10:57 n22 kernel: [] ? kthread_create_on_node+0x180/0x180
Sep  3 19:10:57 n22 kernel: 2 locks held by btrfs-transacti/2408:
Sep  3 19:10:57 n22 kernel: #0:  (&fs_info->transaction_kthread_mutex){..}, 
at: [] transaction_kthread+0x107/0x220
Sep  3 19:10:57 n22 kernel: #1:  (&head_ref->mutex){..}, at: [] 
btrfs_delayed_ref_lock+0x2f/0x1f0


Just FWIW


-- 
Toralf
pgp key: 0076 E94E


Re: Large files, nodatacow and fragmentation

2014-09-03 Thread G. Richard Bellamy

It is interesting that for me the number of extents before and after
bcache are essentially the same.

The lesson here for me is that the fragmentation of a btrfs
nodatacow file is not mitigated by bcache. There seems to be nothing I
can do to prevent that fragmentation, and it may in fact be expected
behavior.

I cannot prove that adding the SSD bcache front-end improved
performance of the guest VM, though subjectively it seems to have had
a positive effect.

There is something systemically pathological with the VM in question,
but that's a different mailing list. :)

-rb

On Tue, Sep 2, 2014 at 11:26 PM, Chris Murphy  wrote:
>
> On Sep 3, 2014, at 12:01 AM, Chris Murphy  wrote:
>
>> I created two pools, one xfs one btrfs, default formatting and mount 
>> options. I then created a qcow2 file on each using virt-manager, also using 
>> default options. And default caching (whatever that is, I think it's 
>> writethrough but don't hold me to it).
>
> On the btrfs qcow2, xattr C was set.
>
>
> Chris Murphy

Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-03 Thread Chris Murphy


On Sep 3, 2014, at 8:11 AM, john terragon  wrote:

> It's a usb2 device but doesn't it seem kind of slow?

Not atypical, I have one that's the same, and another that's ~21MB/s, both are 
USB 2. 

[Certain older Apple Mac firmware boot faster with the slow stick than the fast 
one, and it turns out the block size matters. Block size 512 bytes is insanely 
slow (as in 100KB/s) on the "fast" stick, whereas a block size of even 32k puts 
it to 20+MB/s. So I think the older firmware must be initially asking for 512 
byte blocks, once the kernel takes over the performance is very good.]

Chris Murphy

Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-03 Thread john terragon

I wasn't sure what you meant with  so I dd'd
all the three possible cases:

1) here's the dmcrypt device on which I mkfs.btrfs

   2097152000 bytes (2.1 GB) copied, 487.265 s, 4.3 MB/s

2) here's the partition of the usb stick (which has another partition
containing /boot) on top of which the dmcrypt device is created

  2097152000 bytes (2.1 GB) copied, 449.693 s, 4.7 MB/s

3) here's the whole usb stick device

  2097152000 bytes (2.1 GB) copied, 448.003 s, 4.7 MB/s
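
As a cross-check on the three figures above, here is a minimal C sketch of the arithmetic. The helper names dd_total_bytes and mb_per_second are made up for illustration. It assumes GNU dd conventions: the "M" suffix in bs=20M means MiB, while the reported rate uses decimal megabytes. With bs=20M and count=100 (the dd command quoted further down) the total is 20 MiB x 100 = 2097152000 bytes, which matches the byte count dd printed.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written by "dd bs=<bs_mib>M count=<count>" (M = 1024 * 1024). */
static uint64_t dd_total_bytes(uint64_t bs_mib, uint64_t count)
{
    return bs_mib * 1024u * 1024u * count;
}

/* Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes). */
static double mb_per_second(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}
```

Feeding in 487.265 s, 449.693 s and 448.003 s gives roughly 4.3, 4.7 and 4.7 MB/s, so the dmcrypt layer costs under 10% here and the stick itself is the limiting factor.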

It's a usb2 device but doesn't it seem kind of slow?

Thanks
John


On Wed, Sep 3, 2014 at 2:36 PM, Chris Mason  wrote:
> On 09/02/2014 09:31 PM, john terragon wrote:
>> Rsync finished. FWIW in the end it reported an average speed of about
>>  900K/sec. Without autodefrag there have been no messages about hung
>> kworkers even though rsync seemingly keeps getting hung for several
>> minutes throughout the whole execution.
>
> So let's take a step back and figure out how fast the usb stick actually is.
> This will erase your usb stick, but give us an idea of its performance:
>
> dd if=/dev/zero of=/dev/ bs=20M oflag=direct count=100
>
> Note again, the above command will erase your usb stick ;)  Use whatever
> device name you've been sending to mkfs.btrfs
>
> The kernel will allow a pretty significant amount of ram to be dirtied before
> forcing writeback, which is why you're seeing rsync stall at seemingly strange
> intervals.  In the case of btrfs with compression, we add some worker threads
> between rsync and the device, and these may be turning the writeback into a
> somewhat more bursty operation.
>
> -chris

[PATCH 18/18] Btrfs: modify rw_devices counter under chunk_mutex context

2014-09-03 Thread Miao Xie

The rw_devices counter is often used to tune the profile when doing chunk
allocation, so we should modify it under the chunk_mutex context to avoid
getting the wrong chunk profile.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b7f093d..1aacf5f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1649,8 +1649,8 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
if (device->writeable) {
lock_chunks(root);
list_del_init(&device->dev_alloc_list);
+   device->fs_devices->rw_devices--;
unlock_chunks(root);
-   root->fs_info->fs_devices->rw_devices--;
clear_super = true;
}
 
@@ -1795,8 +1795,8 @@ error_undo:
lock_chunks(root);
list_add(&device->dev_alloc_list,
 &root->fs_info->fs_devices->alloc_list);
+   device->fs_devices->rw_devices++;
unlock_chunks(root);
-   root->fs_info->fs_devices->rw_devices++;
}
goto error_brelse;
 }
-- 
1.9.3
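
The idea behind this patch can be sketched in userspace C. This is only an illustration under stated assumptions: struct fs_devices_sketch, its fields, and both helpers are hypothetical stand-ins, a pthread mutex plays the role of chunk_mutex, and the invariant "list length equals rw_devices" is invented to make the effect visible. The point is that the counter changes inside the same critical section as the allocation list, so a reader holding the lock never sees one updated without the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-filesystem device bookkeeping. */
struct fs_devices_sketch {
    pthread_mutex_t chunk_mutex;
    int alloc_list_len;   /* devices currently on the allocation list */
    int rw_devices;       /* counter consulted when picking a profile */
};

/* Remove a writeable device: list and counter change under one lock. */
static void remove_writeable_device(struct fs_devices_sketch *fs)
{
    pthread_mutex_lock(&fs->chunk_mutex);
    assert(fs->alloc_list_len > 0 && fs->rw_devices > 0);
    fs->alloc_list_len--;
    fs->rw_devices--;     /* moved inside the critical section */
    pthread_mutex_unlock(&fs->chunk_mutex);
}

/* What a chunk allocator holding the lock would observe. */
static bool allocator_view_consistent(struct fs_devices_sketch *fs)
{
    bool ok;

    pthread_mutex_lock(&fs->chunk_mutex);
    ok = (fs->alloc_list_len == fs->rw_devices);
    pthread_mutex_unlock(&fs->chunk_mutex);
    return ok;
}
```

If the decrement happened after the unlock, as in the lines the patch removes, an allocator could take the lock in between and choose a profile from a stale rw_devices.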

[PATCH 13/18] Btrfs: fix unprotected device list access when cloning fs devices

2014-09-03 Thread Miao Xie

We can build a new filesystem based on a seed filesystem, and we need to
clone the fs devices when we open the new filesystem. But someone might
clear the seed flag of the seed filesystem, then mount that filesystem and
remove some device. If we mount the new filesystem at that point, we might
access a device list that is being changed while we clone the fs devices.
Fix it by holding the original's device_list_mutex while cloning.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 357f911..f0173b1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -583,6 +583,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
if (IS_ERR(fs_devices))
return fs_devices;
 
+   mutex_lock(&orig->device_list_mutex);
fs_devices->total_devices = orig->total_devices;
 
/* We have held the volume lock, it is safe to get the devices. */
@@ -611,8 +612,10 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
device->fs_devices = fs_devices;
fs_devices->num_devices++;
}
+   mutex_unlock(&orig->device_list_mutex);
return fs_devices;
 error:
+   mutex_unlock(&orig->device_list_mutex);
free_fs_devices(fs_devices);
return ERR_PTR(-ENOMEM);
 }
-- 
1.9.3
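
The shape of this fix, taking the source's lock before walking it and dropping it on both the success and the error exit, can be sketched in userspace C. Everything here is an assumption made for illustration: struct dev_list_sketch and clone_devids are hypothetical, a plain array stands in for the kernel's linked device list, and a pthread mutex stands in for device_list_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the original fs_devices being cloned. */
struct dev_list_sketch {
    pthread_mutex_t device_list_mutex;
    size_t num_devices;
    int *devids;
};

/*
 * Copy the devid array while holding the source's lock.
 * Returns a heap array the caller must free, or NULL on allocation
 * failure. The lock is released on every exit path.
 */
static int *clone_devids(struct dev_list_sketch *orig, size_t *count_out)
{
    size_t n;
    int *copy;

    assert(count_out != NULL);
    pthread_mutex_lock(&orig->device_list_mutex);
    n = orig->num_devices;
    copy = malloc((n ? n : 1) * sizeof(*copy));
    if (!copy) {
        /* error path: unlock before returning, as in the patch */
        pthread_mutex_unlock(&orig->device_list_mutex);
        return NULL;
    }
    if (n)
        memcpy(copy, orig->devids, n * sizeof(*copy));
    *count_out = n;
    pthread_mutex_unlock(&orig->device_list_mutex);
    return copy;
}
```

Forgetting the unlock on the error path would leave the mutex held forever, which is why the patch adds mutex_unlock() under the error: label as well as before the normal return.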

[PATCH 1/5] block: export disk_class and disk_type for btrfs

2014-09-03 Thread Miao Xie

Btrfs can make a filesystem span several disks/partitions. In order to
load all the disks/partitions which belong to the same filesystem, we
need to scan the system, find all the devices, and then register them
with the kernel. Currently, we do this with a user tool. But if we forget
to do it, we cannot mount the filesystem. So I want btrfs to scan the
system and find all the devices by itself in the kernel. In order to
implement that, we need the disk class (block_class) and disk_type, so
export them.

Signed-off-by: Miao Xie 
---
 block/genhd.c | 7 +--
 include/linux/genhd.h | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 791f419..8371c09 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -34,7 +34,7 @@ struct kobject *block_depr;
 static DEFINE_MUTEX(ext_devt_mutex);
 static DEFINE_IDR(ext_devt_idr);
 
-static struct device_type disk_type;
+struct device_type disk_type;
 
 static void disk_check_events(struct disk_events *ev,
  unsigned int *clearing_ptr);
@@ -1107,9 +1107,11 @@ static void disk_release(struct device *dev)
blk_put_queue(disk->queue);
kfree(disk);
 }
+
 struct class block_class = {
.name   = "block",
 };
+EXPORT_SYMBOL(block_class);
 
 static char *block_devnode(struct device *dev, umode_t *mode,
   kuid_t *uid, kgid_t *gid)
@@ -1121,12 +1123,13 @@ static char *block_devnode(struct device *dev, umode_t *mode,
return NULL;
 }
 
-static struct device_type disk_type = {
+struct device_type disk_type = {
.name   = "disk",
.groups = disk_attr_groups,
.release= disk_release,
.devnode= block_devnode,
 };
+EXPORT_SYMBOL(disk_type);
 
 #ifdef CONFIG_PROC_FS
 /*
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ec274e0..a701ace 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -22,6 +22,7 @@
 #define part_to_dev(part)  (&((part)->__dev))
 
 extern struct device_type part_type;
+extern struct device_type disk_type;
 extern struct kobject *block_depr;
 extern struct class block_class;
 
-- 
1.9.3

[PATCH 15/18] Btrfs: make the logic of source device removing more clear

2014-09-03 Thread Miao Xie

Signed-off-by: Miao Xie 
---
 fs/btrfs/dev-replace.c |  3 +--
 fs/btrfs/volumes.c | 19 +++
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e9cbbdb..6f662b3 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -569,8 +569,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
if (fs_info->fs_devices->latest_bdev == src_device->bdev)
fs_info->fs_devices->latest_bdev = tgt_device->bdev;
list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list);
-   if (src_device->fs_devices->seeding)
-   fs_info->fs_devices->rw_devices++;
+   fs_info->fs_devices->rw_devices++;
 
/* replace the sysfs entry */
btrfs_kobj_rm_device(fs_info, src_device);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 24d7001..fd8141e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1819,23 +1819,18 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
list_del_rcu(&srcdev->dev_list);
list_del_rcu(&srcdev->dev_alloc_list);
fs_devices->num_devices--;
-   if (srcdev->missing) {
+   if (srcdev->missing)
fs_devices->missing_devices--;
-   if (!fs_devices->seeding)
-   fs_devices->rw_devices++;
+
+   if (srcdev->writeable) {
+   fs_devices->rw_devices--;
+   /* zero out the old super if it is writable */
+   btrfs_scratch_superblock(srcdev);
}
 
-   if (srcdev->bdev) {
+   if (srcdev->bdev)
fs_devices->open_devices--;
 
-   /*
-* zero out the old super if it is not writable
-* (e.g. seed device)
-*/
-   if (srcdev->writeable)
-   btrfs_scratch_superblock(srcdev);
-   }
-
call_rcu(&srcdev->rcu, free_device);
 
/*
-- 
1.9.3

[PATCH RFC 0/5] Scan all devices to build fs device list

2014-09-03 Thread Miao Xie

This patchset implements automatic building of the device list. As we
know, currently we need to scan the devices with a user tool to build the
device list before mounting the filesystem, especially when mounting the
filesystem after we re-install the btrfs module. It is not convenient. This
patchset addresses that problem. With this patchset, we scan all the
devices in the system to build the device list if we find the number of
devices is not right when we mount the filesystem. This way, we needn't
scan the devices with the user tool, and we reduce the probability of mount
failure due to an incomplete device list.

---
Miao Xie (5):
  block: export disk_class and disk_type for btrfs
  Btrfs: don't return btrfs_fs_devices if the caller doesn't want it
  Btrfs: restructure btrfs_scan_one_device
  Btrfs: restructure btrfs_get_bdev_and_sb and pick up some code used
later
  Btrfs: scan all the devices and build the fs device list by btrfs's
self

 block/genhd.c |   7 +-
 fs/btrfs/super.c  |   3 +
 fs/btrfs/volumes.c| 227 --
 fs/btrfs/volumes.h|   5 +-
 include/linux/genhd.h |   1 +
 5 files changed, 177 insertions(+), 66 deletions(-)

-- 
1.9.3

[PATCH 11/18] Btrfs: fix unprotected device list access when getting the fs information

2014-09-03 Thread Miao Xie

When we get the fs information, we forgot to acquire the device list mutex,
so we might access a device that has been removed. Fix it by acquiring the
device list mutex.

Signed-off-by: Miao Xie 
---
 fs/btrfs/super.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 089991d..6b98358 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1703,7 +1703,11 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
int ret;
 
-   /* holding chunk_muext to avoid allocating new chunks */
+   /*
+* holding chunk_muext to avoid allocating new chunks, holding
+* device_list_mutex to avoid the device being removed
+*/
+   mutex_lock(&fs_info->fs_devices->device_list_mutex);
mutex_lock(&fs_info->chunk_mutex);
rcu_read_lock();
list_for_each_entry_rcu(found, head, list) {
@@ -1744,11 +1748,13 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
ret = btrfs_calc_avail_data_space(fs_info->tree_root, &total_free_data);
if (ret) {
mutex_unlock(&fs_info->chunk_mutex);
+   mutex_unlock(&fs_info->fs_devices->device_list_mutex);
return ret;
}
buf->f_bavail += div_u64(total_free_data, factor);
buf->f_bavail = buf->f_bavail >> bits;
mutex_unlock(&fs_info->chunk_mutex);
+   mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 
buf->f_type = BTRFS_SUPER_MAGIC;
buf->f_bsize = dentry->d_sb->s_blocksize;
-- 
1.9.3

[PATCH 09/18] Btrfs: fix unprotected device's variants on 32bits machine

2014-09-03 Thread Miao Xie

->total_bytes, ->disk_total_bytes and ->bytes_used are protected by the chunk
lock when we change them, but sometimes we read them without any lock,
and on a 32-bit machine we might get an unexpected value. We fix this
problem the same way as for inode's i_size.

Signed-off-by: Miao Xie 
---
 fs/btrfs/dev-replace.c | 15 +
 fs/btrfs/ioctl.c   |  6 ++--
 fs/btrfs/volumes.c | 48 +
 fs/btrfs/volumes.h | 84 ++
 4 files changed, 124 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 1be03d8..da7ac14 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -418,7 +418,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
 
/* the disk copy procedure reuses the scrub code */
ret = btrfs_scrub_dev(fs_info, src_device->devid, 0,
- src_device->total_bytes,
+ btrfs_device_get_total_bytes(src_device),
  &dev_replace->scrub_progress, 0, 1);
 
ret = btrfs_dev_replace_finishing(root->fs_info, ret);
@@ -555,11 +555,12 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
memcpy(uuid_tmp, tgt_device->uuid, sizeof(uuid_tmp));
memcpy(tgt_device->uuid, src_device->uuid, sizeof(tgt_device->uuid));
memcpy(src_device->uuid, uuid_tmp, sizeof(src_device->uuid));
-   tgt_device->total_bytes = src_device->total_bytes;
-   tgt_device->disk_total_bytes = src_device->disk_total_bytes;
+   btrfs_device_set_total_bytes(tgt_device, src_device->total_bytes);
+   btrfs_device_set_disk_total_bytes(tgt_device,
+ src_device->disk_total_bytes);
+   btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used);
ASSERT(list_empty(&src_device->resized_list));
tgt_device->commit_total_bytes = src_device->commit_total_bytes;
-   tgt_device->bytes_used = src_device->bytes_used;
tgt_device->commit_bytes_used = src_device->bytes_used;
if (fs_info->sb->s_bdev == src_device->bdev)
fs_info->sb->s_bdev = tgt_device->bdev;
@@ -650,6 +651,7 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info,
  struct btrfs_ioctl_dev_replace_args *args)
 {
struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+   struct btrfs_device *srcdev;
 
btrfs_dev_replace_lock(dev_replace);
/* even if !dev_replace_is_valid, the values are good enough for
@@ -672,8 +674,9 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info,
break;
case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
+   srcdev = dev_replace->srcdev;
args->status.progress_1000 = div64_u64(dev_replace->cursor_left,
-   div64_u64(dev_replace->srcdev->total_bytes, 1000));
+   div64_u64(btrfs_device_get_total_bytes(srcdev), 1000));
break;
}
btrfs_dev_replace_unlock(dev_replace);
@@ -832,7 +835,7 @@ static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info)
 
ret = btrfs_scrub_dev(fs_info, dev_replace->srcdev->devid,
  dev_replace->committed_cursor_left,
- dev_replace->srcdev->total_bytes,
+ btrfs_device_get_total_bytes(dev_replace->srcdev),
  &dev_replace->scrub_progress, 0, 1);
ret = btrfs_dev_replace_finishing(fs_info, ret);
WARN_ON(ret);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c692c36..e78d9f9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1550,7 +1550,7 @@ static noinline int btrfs_ioctl_resize(struct file *file,
goto out_free;
}
 
-   old_size = device->total_bytes;
+   old_size = btrfs_device_get_total_bytes(device);
 
if (mod < 0) {
if (new_size > old_size) {
@@ -2732,8 +2732,8 @@ static long btrfs_ioctl_dev_info(struct btrfs_root *root, void __user *arg)
}
 
di_args->devid = dev->devid;
-   di_args->bytes_used = dev->bytes_used;
-   di_args->total_bytes = dev->total_bytes;
+   di_args->bytes_used = btrfs_device_get_bytes_used(dev);
+   di_args->total_bytes = btrfs_device_get_total_bytes(dev);
memcpy(di_args->uuid, dev->uuid, sizeof(di_args->uuid));
if (dev->name) {
struct rcu_string *name;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d8e4a3d..41da102 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1308,7 +1308,7 @@ again:
 
if (device->bytes_used > 0) {
u64 len = btrfs_dev_extent_length(leaf, extent);
-   device->bytes_used -= len;
+   btrfs_device_set_bytes_used(device, device->bytes_used - len);
spin_lock(&roo
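
The volumes.h hunk that defines the btrfs_device_get_*/set_* helpers is cut off in this archive, so the following is not the patch's code. It is a userspace sketch of the general technique the commit message points to: the kernel's i_size helpers guard a 64-bit value with a sequence counter on 32-bit SMP so that a lockless reader never returns a half-updated value. The names struct seq_u64, seq_u64_set and seq_u64_get are hypothetical, and C11 atomics stand in for the kernel's seqcount primitives. The value is stored as two 32-bit halves to mimic a 32-bit machine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit value readable without a lock on a 32-bit machine. */
struct seq_u64 {
    atomic_uint seq;        /* odd while a write is in progress */
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

/* Writers must already be serialized by an outer lock. */
static void seq_u64_set(struct seq_u64 *v, uint64_t val)
{
    unsigned int s = atomic_load_explicit(&v->seq, memory_order_relaxed);

    atomic_store_explicit(&v->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&v->lo, (uint32_t)val, memory_order_relaxed);
    atomic_store_explicit(&v->hi, (uint32_t)(val >> 32),
                          memory_order_relaxed);
    atomic_store_explicit(&v->seq, s + 2, memory_order_release);
}

/* Lockless reader: retry if a writer was active or ran in between. */
static uint64_t seq_u64_get(struct seq_u64 *v)
{
    unsigned int s1, s2;
    uint32_t lo, hi;

    do {
        s1 = atomic_load_explicit(&v->seq, memory_order_acquire);
        lo = atomic_load_explicit(&v->lo, memory_order_relaxed);
        hi = atomic_load_explicit(&v->hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&v->seq, memory_order_relaxed);
    } while (s1 != s2 || (s1 & 1u));

    return ((uint64_t)hi << 32) | lo;
}
```

On a 64-bit machine a single aligned load is already atomic, which is why this kind of wrapper only matters for 32-bit builds.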

[PATCH 07/18] Btrfs: fix unprotected device->bytes_used update

2014-09-03 Thread Miao Xie

We should update device->bytes_used in the lock context of
chunk_mutex, or we would get wrong data.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1524b3f..45e0b5d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4429,6 +4429,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
if (ret)
goto error_del_extent;
 
+   for (i = 0; i < map->num_stripes; i++)
+   map->stripes[i].dev->bytes_used += stripe_size;
+
free_extent_map(em);
check_raid56_incompat_flag(extent_root->fs_info, type);
 
@@ -4500,7 +4503,6 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
device = map->stripes[i].dev;
dev_offset = map->stripes[i].physical;
 
-   device->bytes_used += stripe_size;
ret = btrfs_update_device(trans, device);
if (ret)
goto out;
-- 
1.9.3

[PATCH 05/18] Btrfs: fix wrong device bytes_used in the super block

2014-09-03 Thread Miao Xie

device->bytes_used will be changed when allocating a new chunk, and
disk_total_size will be changed if resizing is successful.
Meanwhile, the on-disk super blocks of the previous transaction
might not be updated. Considering the consistency of the metadata
in the previous transaction, we should use the size in the previous
transaction to check if the super block is beyond the boundary
of the device.

It is not a big problem because we don't use it now, but it is
better to keep it consistent with the common metadata; maybe we
will use it in the future.

Signed-off-by: Miao Xie 
---
 fs/btrfs/dev-replace.c |  3 +++
 fs/btrfs/disk-io.c |  3 ++-
 fs/btrfs/transaction.c |  1 +
 fs/btrfs/volumes.c | 27 +++
 fs/btrfs/volumes.h |  4 
 5 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7877b0f..1be03d8 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -172,6 +172,8 @@ no_valid_dev_replace_entry_found:
dev_replace->srcdev->commit_total_bytes;
dev_replace->tgtdev->bytes_used =
dev_replace->srcdev->bytes_used;
+   dev_replace->tgtdev->commit_bytes_used =
+   dev_replace->srcdev->commit_bytes_used;
}
dev_replace->tgtdev->is_tgtdev_for_dev_replace = 1;
btrfs_init_dev_replace_tgtdev_for_resume(fs_info,
@@ -558,6 +560,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
ASSERT(list_empty(&src_device->resized_list));
tgt_device->commit_total_bytes = src_device->commit_total_bytes;
tgt_device->bytes_used = src_device->bytes_used;
+   tgt_device->commit_bytes_used = src_device->bytes_used;
if (fs_info->sb->s_bdev == src_device->bdev)
fs_info->sb->s_bdev = tgt_device->bdev;
if (fs_info->fs_devices->latest_bdev == src_device->bdev)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0c7ae0e..ff3ee22 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3450,7 +3450,8 @@ static int write_all_supers(struct btrfs_root *root, int max_mirrors)
btrfs_set_stack_device_id(dev_item, dev->devid);
btrfs_set_stack_device_total_bytes(dev_item,
   dev->commit_total_bytes);
-   btrfs_set_stack_device_bytes_used(dev_item, dev->bytes_used);
+   btrfs_set_stack_device_bytes_used(dev_item,
+ dev->commit_bytes_used);
btrfs_set_stack_device_io_align(dev_item, dev->io_align);
btrfs_set_stack_device_io_width(dev_item, dev->io_width);
btrfs_set_stack_device_sector_size(dev_item, dev->sector_size);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 2f7c0be..16d0c1b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1869,6 +1869,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
   sizeof(*root->fs_info->super_copy));
 
btrfs_update_commit_device_size(root->fs_info);
+   btrfs_update_commit_device_bytes_used(root, cur_trans);
 
spin_lock(&root->fs_info->trans_lock);
cur_trans->state = TRANS_STATE_UNBLOCKED;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7b5c042..f8273bb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2370,6 +2370,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
ASSERT(list_empty(&srcdev->resized_list));
device->commit_total_bytes = srcdev->commit_total_bytes;
device->bytes_used = srcdev->bytes_used;
+   device->commit_bytes_used = device->bytes_used;
device->dev_root = fs_info->dev_root;
device->bdev = bdev;
device->in_fs_metadata = 1;
@@ -6009,6 +6010,7 @@ static void fill_device_from_item(struct extent_buffer *leaf,
device->total_bytes = device->disk_total_bytes;
device->commit_total_bytes = device->disk_total_bytes;
device->bytes_used = btrfs_device_bytes_used(leaf, dev_item);
+   device->commit_bytes_used = device->bytes_used;
device->type = btrfs_device_type(leaf, dev_item);
device->io_align = btrfs_device_io_align(leaf, dev_item);
device->io_width = btrfs_device_io_width(leaf, dev_item);
@@ -6558,3 +6560,28 @@ void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info)
unlock_chunks(fs_info->dev_root);
mutex_unlock(&fs_devices->device_list_mutex);
 }
+
+/* Must be invoked during the transaction commit */
+void btrfs_update_commit_device_bytes_used(struct btrfs_root *root,
+   struct btrfs_transaction *transaction)
+{
+   struct extent_ma
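
The body of btrfs_update_commit_device_bytes_used() is cut off above, so here is only a sketch of the idea from the commit message, not the patch's implementation. It assumes a hypothetical struct device_sketch with a live bytes_used and a snapshot commit_bytes_used; the three helper names are invented. The super block writer reads the snapshot, and the snapshot only moves at transaction commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch keeps per device. */
struct device_sketch {
    uint64_t bytes_used;         /* live value, grows on chunk allocation */
    uint64_t commit_bytes_used;  /* value as of the last commit */
};

/* Allocating a chunk changes only the live value. */
static void alloc_chunk(struct device_sketch *d, uint64_t stripe_size)
{
    d->bytes_used += stripe_size;
}

/* Committing the transaction publishes the live value. */
static void commit_transaction(struct device_sketch *d)
{
    d->commit_bytes_used = d->bytes_used;
}

/* What gets written into the on-disk super block. */
static uint64_t super_block_bytes_used(const struct device_sketch *d)
{
    return d->commit_bytes_used;
}
```

A super block written between an allocation and the next commit therefore still carries the value that matches the last committed metadata.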

[PATCH 5/5] Btrfs: scan all the devices and build the fs device list by btrfs's self

2014-09-03 Thread Miao Xie

The original code needs the devices to be scanned and the fs device list to be
built by a user tool, either via udev or by the users themselves. It is
flexible. But if someone re-installs the filesystem module and forgets to scan
the devices, or we plug in some devices with btrfs but the udev thread is
blocked and doesn't register the disk with btrfs in time, the filesystem
reports "can not open some device" when mounting, which is inconvenient. This
patch fixes the problem by scanning all the devices if we find the number of
devices is not right when we mount the filesystem.

Signed-off-by: Miao Xie 
---
 fs/btrfs/super.c   |   3 ++
 fs/btrfs/volumes.c | 107 +++--
 fs/btrfs/volumes.h |   5 ++-
 3 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6b98358..2a8c664 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1264,6 +1264,9 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
if (error)
return ERR_PTR(error);
 
+   if (fs_devices->num_devices != fs_devices->total_devices)
+   btrfs_scan_all_devices(fs_type);
+
/*
 * Setup a dummy root and fs_info for test/set super.  This is because
 * we don't actually fill this stuff out until open_ctree, but we need
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9d52fd8..aa4665e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "ctree.h"
 #include "extent_map.h"
@@ -236,6 +237,29 @@ btrfs_get_bdev_and_sb_by_path(const char *device_path, fmode_t flags,
return 0;
 }
 
+static int
+btrfs_get_bdev_and_sb_by_dev(dev_t dev, fmode_t flags, void *holder, int flush,
+struct block_device **bdev,
+struct buffer_head **bh)
+{
+   int ret;
+
+   *bdev = blkdev_get_by_dev(dev, flags, holder);
+   if (IS_ERR(*bdev)) {
+   printk(KERN_INFO "BTRFS: open device %d:%d failed\n",
+  MAJOR(dev), MINOR(dev));
+   return PTR_ERR(*bdev);
+   }
+
+   ret = __btrfs_get_sb(*bdev, flush, bh);
+   if (ret) {
+   blkdev_put(*bdev, flags);
+   return ret;
+   }
+
+   return 0;
+}
+
 static void requeue_list(struct btrfs_pending_bios *pending_bios,
struct bio *head, struct bio *tail)
 {
@@ -466,8 +490,9 @@ static void pending_bios_fn(struct btrfs_work *work)
  * < 0 - error
  */
 static noinline int device_list_add(const char *path,
-  struct btrfs_super_block *disk_super,
-  u64 devid, struct btrfs_fs_devices **fs_devices_ret)
+   struct btrfs_super_block *disk_super,
+   u64 devid, dev_t devnum,
+   struct btrfs_fs_devices **fs_devices_ret)
 {
struct btrfs_device *device;
struct btrfs_fs_devices *fs_devices;
@@ -493,7 +518,7 @@ static noinline int device_list_add(const char *path,
if (fs_devices->opened)
return -EBUSY;
 
-   device = btrfs_alloc_device(NULL, &devid,
+   device = btrfs_alloc_device(NULL, &devid, devnum,
disk_super->dev_item.uuid);
if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
@@ -561,6 +586,7 @@ static noinline int device_list_add(const char *path,
if (device->missing) {
fs_devices->missing_devices--;
device->missing = 0;
+   device->devnum = devnum;
}
}
 
@@ -597,7 +623,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
struct rcu_string *name;
 
device = btrfs_alloc_device(NULL, &orig_dev->devid,
-   orig_dev->uuid);
+   orig_dev->devnum, orig_dev->uuid);
if (IS_ERR(device))
goto error;
 
@@ -735,7 +761,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
fs_devices->missing_devices--;
 
new_device = btrfs_alloc_device(NULL, &device->devid,
-   device->uuid);
+   device->devnum, device->uuid);
BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
 
/* Safe because we are under uuid_mutex */
@@ -811,7 +837,7 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
continue;
 
/* Just open everything we can; ignore failures here */
-   if (bt

[PATCH 16/18] Btrfs: stop mounting the fs if the non-ENOENT errors happen when opening seed fs

2014-09-03 Thread Miao Xie

When we open a seed filesystem, if the degraded mount option is set, we
continue to mount the fs if we don't find some devices in the seed
filesystem. But we should stop mounting if other errors happen. Fix it.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fd8141e..cc59fcb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6093,7 +6093,7 @@ static int read_one_dev(struct btrfs_root *root,
 
if (memcmp(fs_uuid, root->fs_info->fsid, BTRFS_UUID_SIZE)) {
ret = open_seed_devices(root, fs_uuid);
-   if (ret && !btrfs_test_opt(root, DEGRADED))
+   if (ret && !(ret == -ENOENT && btrfs_test_opt(root, DEGRADED)))
return ret;
}
 
-- 
1.9.3
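
The one-line change is easiest to see as a truth table. Below is a small C sketch that lifts the two conditions out of the diff into standalone predicates; the function names abort_mount_old and abort_mount_new and the plain bool for the degraded option are illustrative assumptions, while the boolean expressions themselves are copied from the removed and added lines.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old check: with degraded set, every error is ignored. */
static bool abort_mount_old(int ret, bool degraded)
{
    return ret && !degraded;
}

/* New check: only -ENOENT is tolerated, and only with degraded set. */
static bool abort_mount_new(int ret, bool degraded)
{
    return ret && !(ret == -ENOENT && degraded);
}
```

The two predicates differ in exactly one situation: a non-ENOENT error such as -EIO on a degraded mount, which the old code let through and the new code stops on.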

[PATCH 14/18] Btrfs: fix use-after-free problem of the device during device replace

2014-09-03 Thread Miao Xie

The problem is:
Task0(device scan task)         Task1(device replace task)
scan_one_device()
mutex_lock(&uuid_mutex)
device = find_device()
                                mutex_lock(&device_list_mutex)
                                lock_chunk()
                                rm_and_free_source_device
                                unlock_chunk()
                                mutex_unlock(&device_list_mutex)
check device

Destroying the target device if device replace fails has the same problem.

We fix this problem by holding uuid_mutex while destroying the source device or
target device, just like the device remove operation.

It is a temporary solution; in the future we can fix this problem and make the
code clearer by using an atomic counter.

Signed-off-by: Miao Xie 
---
 fs/btrfs/dev-replace.c | 3 +++
 fs/btrfs/volumes.c | 4 +++-
 fs/btrfs/volumes.h | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index aa4c828..e9cbbdb 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -509,6 +509,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
ret = btrfs_commit_transaction(trans, root);
WARN_ON(ret);
 
+   mutex_lock(&uuid_mutex);
/* keep away write_all_supers() during the finishing procedure */
mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
mutex_lock(&root->fs_info->chunk_mutex);
@@ -536,6 +537,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
btrfs_dev_replace_unlock(dev_replace);
mutex_unlock(&root->fs_info->chunk_mutex);
mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
+   mutex_unlock(&uuid_mutex);
if (tgt_device)
btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
@@ -591,6 +593,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 */
mutex_unlock(&root->fs_info->chunk_mutex);
mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
+   mutex_unlock(&uuid_mutex);
 
/* write back the superblocks */
trans = btrfs_start_transaction(root, 0);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f0173b1..24d7001 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -50,7 +50,7 @@ static void __btrfs_reset_dev_stats(struct btrfs_device *dev);
 static void btrfs_dev_stat_print_on_error(struct btrfs_device *dev);
 static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
 
-static DEFINE_MUTEX(uuid_mutex);
+DEFINE_MUTEX(uuid_mutex);
 static LIST_HEAD(fs_uuids);
 
 static void lock_chunks(struct btrfs_root *root)
@@ -1867,6 +1867,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 {
struct btrfs_device *next_device;
 
+   mutex_lock(&uuid_mutex);
WARN_ON(!tgtdev);
mutex_lock(&fs_info->fs_devices->device_list_mutex);
if (tgtdev->bdev) {
@@ -1886,6 +1887,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
call_rcu(&tgtdev->rcu, free_device);
 
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+   mutex_unlock(&uuid_mutex);
 }
 
 static int btrfs_find_device_by_path(struct btrfs_root *root, char *device_path,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 76600a3..2b37da3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -24,6 +24,8 @@
 #include 
 #include "async-thread.h"
 
+extern struct mutex uuid_mutex;
+
 #define BTRFS_STRIPE_LEN   (64 * 1024)
 
 struct buffer_head;
-- 
1.9.3

[PATCH 08/18] Btrfs: update free_chunk_space during allocating a new chunk

2014-09-03 Thread Miao Xie

We should update free_chunk_space as soon as we allocate a new chunk,
not when we deal with the pending device update and block group insertion,
because we need the real free_chunk_space value to calculate the reserved
space. If we don't update it in time, we would treat disk space which has
already been allocated as free space, and would use it for overcommit
reservation. Fix it.
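
The accounting rule described above can be sketched in user space as follows.
This is only an illustration, not the kernel code: struct space_info and both
helper names are hypothetical, and a pthread mutex stands in for the
free_chunk_lock spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the fs_info fields named in the patch. */
struct space_info {
    pthread_mutex_t free_chunk_lock;   /* a spinlock in the kernel */
    uint64_t free_chunk_space;
};

/* Charge a new chunk against the free space at allocation time. */
static void account_new_chunk(struct space_info *si, uint64_t stripe_size,
                              int num_stripes)
{
    uint64_t used = stripe_size * (uint64_t)num_stripes;

    pthread_mutex_lock(&si->free_chunk_lock);
    assert(si->free_chunk_space >= used);
    si->free_chunk_space -= used;
    pthread_mutex_unlock(&si->free_chunk_lock);
}

/* What a reservation check sees once the chunk has been charged. */
static uint64_t space_for_overcommit(struct space_info *si)
{
    uint64_t avail;

    pthread_mutex_lock(&si->free_chunk_lock);
    avail = si->free_chunk_space;
    pthread_mutex_unlock(&si->free_chunk_lock);
    return avail;
}
```

If account_new_chunk() were deferred to a later step, space_for_overcommit()
would keep reporting the already allocated stripes as free in the meantime.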

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 45e0b5d..d8e4a3d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4432,6 +4432,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
for (i = 0; i < map->num_stripes; i++)
map->stripes[i].dev->bytes_used += stripe_size;
 
+   spin_lock(&extent_root->fs_info->free_chunk_lock);
+   extent_root->fs_info->free_chunk_space -= (stripe_size *
+  map->num_stripes);
+   spin_unlock(&extent_root->fs_info->free_chunk_lock);
+
free_extent_map(em);
check_raid56_incompat_flag(extent_root->fs_info, type);
 
@@ -4515,11 +4520,6 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
goto out;
}
 
-   spin_lock(&extent_root->fs_info->free_chunk_lock);
-   extent_root->fs_info->free_chunk_space -= (stripe_size *
-  map->num_stripes);
-   spin_unlock(&extent_root->fs_info->free_chunk_lock);
-
stripe = &chunk->stripe;
for (i = 0; i < map->num_stripes; i++) {
device = map->stripes[i].dev;
-- 
1.9.3

[PATCH 2/5] Btrfs: don't return btrfs_fs_devices if the caller doesn't want it

2014-09-03 Thread Miao Xie

We will implement a function that lets the filesystem scan all the devices
in the system and build the device set for btrfs. In that case, we needn't
get btrfs_fs_devices when adding a device to the list. This patch changes
device_list_add so that the caller may pass NULL if it doesn't want it.
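
A minimal sketch of the optional out-parameter pattern this patch introduces.
The names struct dev_set and add_device are hypothetical and only illustrate
the NULL check; they are not btrfs interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct btrfs_fs_devices. */
struct dev_set {
    int num_devices;
};

static struct dev_set global_set;

/*
 * Register a device. Callers that don't need the resulting set
 * pass NULL for set_ret, and the function must not dereference it.
 */
static int add_device(const char *path, struct dev_set **set_ret)
{
    (void)path;
    global_set.num_devices++;

    if (set_ret)
        *set_ret = &global_set;

    return 0;
}
```

A scanning caller would use add_device(path, NULL), while a mount-time caller
keeps passing a pointer and receives the set as before.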

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1aacf5f..740a4f9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -568,7 +568,8 @@ static noinline int device_list_add(const char *path,
if (!fs_devices->opened)
device->generation = found_transid;
 
-   *fs_devices_ret = fs_devices;
+   if (fs_devices_ret)
+   *fs_devices_ret = fs_devices;
 
return ret;
 }
-- 
1.9.3

[PATCH 10/18] Btrfs: fix unprotected system chunk array insertion

2014-09-03 Thread Miao Xie

We didn't protect the system chunk array when we added a new
system chunk to it, so the array could be corrupted if someone
removed or added a system chunk at the same time. Fix it by
taking the chunk lock.
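
The shape of the fix, including the unlock on the early error return, can be
sketched in user space like this. struct sys_array, SYS_ARRAY_SIZE and
sys_array_add are hypothetical names, and the capacity is an arbitrary value
chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define SYS_ARRAY_SIZE 64   /* hypothetical capacity, not the btrfs value */

struct sys_array {
    pthread_mutex_t lock;   /* plays the role of the chunk lock */
    unsigned char buf[SYS_ARRAY_SIZE];
    size_t used;
};

/* Append an item; the size check and the copy happen under one lock. */
static int sys_array_add(struct sys_array *a, const void *item, size_t len)
{
    pthread_mutex_lock(&a->lock);
    if (a->used + len > SYS_ARRAY_SIZE) {
        /* every exit path has to drop the lock */
        pthread_mutex_unlock(&a->lock);
        return -EFBIG;
    }

    memcpy(a->buf + a->used, item, len);
    a->used += len;
    pthread_mutex_unlock(&a->lock);

    return 0;
}
```

Without the lock, two concurrent callers could both read the same used value
and overwrite each other's entry.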

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 41da102..9f22398d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4054,10 +4054,13 @@ static int btrfs_add_system_chunk(struct btrfs_root *root,
u32 array_size;
u8 *ptr;
 
+   lock_chunks(root);
array_size = btrfs_super_sys_array_size(super_copy);
if (array_size + item_size + sizeof(disk_key)
-   > BTRFS_SYSTEM_CHUNK_ARRAY_SIZE)
+   > BTRFS_SYSTEM_CHUNK_ARRAY_SIZE) {
+   unlock_chunks(root);
return -EFBIG;
+   }
 
ptr = super_copy->sys_chunk_array + array_size;
btrfs_cpu_key_to_disk(&disk_key, key);
@@ -4066,6 +4069,8 @@ static int btrfs_add_system_chunk(struct btrfs_root *root,
memcpy(ptr, chunk, item_size);
item_size += sizeof(disk_key);
btrfs_set_super_sys_array_size(super_copy, array_size + item_size);
+   unlock_chunks(root);
+
return 0;
 }
 
-- 
1.9.3

[PATCH 12/18] Btrfs: Fix misuse of chunk mutex

2014-09-03 Thread Miao Xie

There were several problems with chunk mutex usage:
- We locked the chunk mutex when updating metadata. That could cause a
  nested deadlock, because updating metadata might need to allocate new
  chunks, which in turn needs the chunk mutex. We drop the chunk mutex in
  this case, because the b-tree locks and other lock mechanisms protect us.
- An ABBA deadlock occurred between device_list_mutex and chunk_mutex.
  When we update device status, we must acquire device_list_mutex at the
  beginning, and then we might take chunk_mutex during the device status
  update because we need to allocate new chunks for metadata COW. But in
  most places, we acquired chunk_mutex first and then device_list_mutex.
  We need to change the lock order.
- In some places we needn't acquire chunk_mutex at all. For example, we
  needn't take chunk_mutex when we free an empty seed fs_devices structure.
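
The lock order this patch settles on (device_list_mutex first, then
chunk_mutex) can be sketched with pthread mutexes. struct fs_locks and the
helper names are hypothetical. The point is only that every path takes the
two locks in the same order and releases them in reverse.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical container for the two mutexes discussed above. */
struct fs_locks {
    pthread_mutex_t device_list_mutex;
    pthread_mutex_t chunk_mutex;
};

/* Single place that defines the order: device list first, chunks second. */
static void lock_devices_and_chunks(struct fs_locks *l)
{
    pthread_mutex_lock(&l->device_list_mutex);
    pthread_mutex_lock(&l->chunk_mutex);
}

/* Release in the reverse order of acquisition. */
static void unlock_devices_and_chunks(struct fs_locks *l)
{
    pthread_mutex_unlock(&l->chunk_mutex);
    pthread_mutex_unlock(&l->device_list_mutex);
}

/* Example critical section that needs both locks. */
static int update_under_both(struct fs_locks *l, int *counter)
{
    int value;

    lock_devices_and_chunks(l);
    value = ++(*counter);
    unlock_devices_and_chunks(l);

    return value;
}
```

If one thread took chunk_mutex first while another took device_list_mutex
first, each could end up waiting for the lock the other holds, which is the
ABBA case described in the second item.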

Signed-off-by: Miao Xie 
---
 fs/btrfs/dev-replace.c |   6 +--
 fs/btrfs/extent-tree.c |   2 -
 fs/btrfs/volumes.c | 129 -
 3 files changed, 65 insertions(+), 72 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index da7ac14..aa4c828 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -510,8 +510,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
WARN_ON(ret);
 
/* keep away write_all_supers() during the finishing procedure */
-   mutex_lock(&root->fs_info->chunk_mutex);
mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
+   mutex_lock(&root->fs_info->chunk_mutex);
btrfs_dev_replace_lock(dev_replace);
dev_replace->replace_state =
scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED
@@ -534,8 +534,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
  src_device->devid,
  rcu_str_deref(tgt_device->name), scrub_ret);
btrfs_dev_replace_unlock(dev_replace);
-   mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
mutex_unlock(&root->fs_info->chunk_mutex);
+   mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
if (tgt_device)
btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
@@ -589,8 +589,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 * superblock is scratched out so that it is no longer marked to
 * belong to this filesystem.
 */
-   mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
mutex_unlock(&root->fs_info->chunk_mutex);
+   mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 
/* write back the superblocks */
trans = btrfs_start_transaction(root, 0);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e105558..e1ad84e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9404,8 +9404,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
memcpy(&key, &block_group->key, sizeof(key));
 
-   btrfs_clear_space_info_full(root->fs_info);
-
btrfs_put_block_group(block_group);
btrfs_put_block_group(block_group);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9f22398d..357f911 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1264,7 +1264,7 @@ out:
 
 static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
  struct btrfs_device *device,
- u64 start)
+ u64 start, u64 *dev_extent_len)
 {
int ret;
struct btrfs_path *path;
@@ -1306,13 +1306,8 @@ again:
goto out;
}
 
-   if (device->bytes_used > 0) {
-   u64 len = btrfs_dev_extent_length(leaf, extent);
-   btrfs_device_set_bytes_used(device, device->bytes_used - len);
-   spin_lock(&root->fs_info->free_chunk_lock);
-   root->fs_info->free_chunk_space += len;
-   spin_unlock(&root->fs_info->free_chunk_lock);
-   }
+   *dev_extent_len = btrfs_dev_extent_length(leaf, extent);
+
ret = btrfs_del_item(trans, root, path);
if (ret) {
btrfs_error(root->fs_info, ret,
@@ -1521,7 +1516,6 @@ static int btrfs_rm_dev_item(struct btrfs_root *root,
key.objectid = BTRFS_DEV_ITEMS_OBJECTID;
key.type = BTRFS_DEV_ITEM_KEY;
key.offset = device->devid;
-   lock_chunks(root);
 
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
if (ret < 0)
@@ -1537,7 +1531,6 @@ static int btrfs_rm_dev_item(struct btrfs_root *root,
goto out;
 out:
btrfs_free_path(path);
-   unlock_chunks(root);
btrfs_commit_transaction(trans, root);
return ret;
 }
@@ -1726,9 +1719,7 @@ int btrfs_rm_device(struct btrfs_roo

[PATCH 04/18] Btrfs: fix wrong disk size when writing super blocks

2014-09-03 Thread Miao Xie

total_bytes will be changed when resizing a device, and disk_total_bytes
will be changed if resizing is successful. Meanwhile, the on-disk super
blocks of the previous transaction might not be updated. To keep the
metadata of the previous transaction consistent, we should use the size
from the previous transaction to check if the super block is beyond the
boundary of the device. Fix it.
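
A simplified sketch of the idea: keep a separate committed size and use it
for the super block bounds check. struct device_sizes, super_fits and
commit_sizes are hypothetical names, and SUPER_INFO_SIZE is assumed to be
4096 for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_INFO_SIZE 4096ULL   /* assumed value for this sketch */

/* Hypothetical subset of the per-device size fields. */
struct device_sizes {
    uint64_t total_bytes;          /* changes as soon as a resize starts */
    uint64_t commit_total_bytes;   /* size as of the last committed transaction */
};

/* Bounds check for a super block copy, based on the committed size only. */
static int super_fits(const struct device_sizes *d, uint64_t bytenr)
{
    return bytenr + SUPER_INFO_SIZE <= d->commit_total_bytes;
}

/* Called at transaction commit: publish the new size. */
static void commit_sizes(struct device_sizes *d)
{
    d->commit_total_bytes = d->total_bytes;
}
```

While a shrink is in flight, super_fits() keeps answering according to the
old size, so super blocks belonging to the previous transaction are not
skipped early.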

Signed-off-by: Miao Xie 
---
 fs/btrfs/check-integrity.c |  2 +-
 fs/btrfs/dev-replace.c | 18 ++
 fs/btrfs/disk-io.c |  5 +++--
 fs/btrfs/scrub.c   |  3 ++-
 fs/btrfs/transaction.c |  2 ++
 fs/btrfs/volumes.c | 40 +++-
 fs/btrfs/volumes.h | 18 ++
 7 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index e0033c8..cb7f3fe 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -807,7 +807,7 @@ static int btrfsic_process_superblock_dev_mirror(
 
/* super block bytenr is always the unmapped device bytenr */
dev_bytenr = btrfs_sb_offset(superblock_mirror_num);
-   if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->total_bytes)
+   if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->commit_total_bytes)
return -1;
bh = __bread(superblock_bdev, dev_bytenr / 4096,
 BTRFS_SUPER_INFO_SIZE);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 72dc02e..7877b0f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -168,6 +168,8 @@ no_valid_dev_replace_entry_found:
dev_replace->srcdev->total_bytes;
dev_replace->tgtdev->disk_total_bytes =
dev_replace->srcdev->disk_total_bytes;
+   dev_replace->tgtdev->commit_total_bytes =
+   dev_replace->srcdev->commit_total_bytes;
dev_replace->tgtdev->bytes_used =
dev_replace->srcdev->bytes_used;
}
@@ -329,6 +331,20 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
args->start.tgtdev_name[0] == '\0')
return -EINVAL;
 
+   /*
+* Here we commit the transaction to make sure commit_total_bytes
+* of all the devices are updated.
+*/
+   trans = btrfs_attach_transaction(root);
+   if (!IS_ERR(trans)) {
+   ret = btrfs_commit_transaction(trans, root);
+   if (ret)
+   return ret;
+   } else if (PTR_ERR(trans) != -ENOENT) {
+   return PTR_ERR(trans);
+   }
+
+   /* the disk copy procedure reuses the scrub code */
mutex_lock(&fs_info->volume_mutex);
ret = btrfs_dev_replace_find_srcdev(root, args->start.srcdevid,
args->start.srcdev_name,
@@ -539,6 +555,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
memcpy(src_device->uuid, uuid_tmp, sizeof(src_device->uuid));
tgt_device->total_bytes = src_device->total_bytes;
tgt_device->disk_total_bytes = src_device->disk_total_bytes;
+   ASSERT(list_empty(&src_device->resized_list));
+   tgt_device->commit_total_bytes = src_device->commit_total_bytes;
tgt_device->bytes_used = src_device->bytes_used;
if (fs_info->sb->s_bdev == src_device->bdev)
fs_info->sb->s_bdev = tgt_device->bdev;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index df1ae8c..0c7ae0e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3131,7 +3131,8 @@ static int write_dev_supers(struct btrfs_device *device,
 
for (i = 0; i < max_mirrors; i++) {
bytenr = btrfs_sb_offset(i);
-   if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->total_bytes)
+   if (bytenr + BTRFS_SUPER_INFO_SIZE >=
+   device->commit_total_bytes)
break;
 
if (wait) {
@@ -3448,7 +3449,7 @@ static int write_all_supers(struct btrfs_root *root, int max_mirrors)
btrfs_set_stack_device_type(dev_item, dev->type);
btrfs_set_stack_device_id(dev_item, dev->devid);
btrfs_set_stack_device_total_bytes(dev_item,
-  dev->disk_total_bytes);
+  dev->commit_total_bytes);
btrfs_set_stack_device_bytes_used(dev_item, dev->bytes_used);
btrfs_set_stack_device_io_align(dev_item, dev->io_align);
btrfs_set_stack_device_io_width(dev_item, dev->io_width);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f8e1144..cce122b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2861,7 +2861,8 @@ static noinline_for_stack i

[PATCH 17/18] Btrfs: move the missing device to its own fs device list

2014-09-03 Thread Miao Xie

For a missing device, we don't know which fs it belongs to before we read
its fsid from the chunk tree, so we add it to the current fs device list at
first. Once we get its fsid, we should move it to its own fs device list.
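
The diff below also changes open_seed_devices() to return a pointer or an
encoded error instead of an int. A user-space sketch of that convention
follows. ERR_PTR, PTR_ERR and IS_ERR are reimplemented here in simplified
form for illustration, and struct seed_set and find_seed are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a seed fs_devices lookup. */
struct seed_set {
    int id;
};

static struct seed_set known = { 7 };

/*
 * Return the matching set. If nothing matches, fail with -ENOENT unless
 * the caller runs in degraded mode, in which case hand back an empty
 * placeholder set for the missing devices.
 */
static struct seed_set *find_seed(int id, int degraded)
{
    static struct seed_set placeholder;

    if (id == known.id)
        return &known;
    if (!degraded)
        return ERR_PTR(-ENOENT);

    placeholder.id = id;
    return &placeholder;
}
```

The caller then needs only one IS_ERR() check, and on success it already
holds the set that the missing device should be added to.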

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 78 --
 1 file changed, 52 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cc59fcb..b7f093d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5846,10 +5846,10 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 }
 
 static struct btrfs_device *add_missing_dev(struct btrfs_root *root,
+   struct btrfs_fs_devices *fs_devices,
u64 devid, u8 *dev_uuid)
 {
struct btrfs_device *device;
-   struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
 
device = btrfs_alloc_device(NULL, &devid, dev_uuid);
if (IS_ERR(device))
@@ -5986,7 +5986,8 @@ static int read_one_chunk(struct btrfs_root *root, struct btrfs_key *key,
}
if (!map->stripes[i].dev) {
map->stripes[i].dev =
-   add_missing_dev(root, devid, uuid);
+   add_missing_dev(root, root->fs_info->fs_devices,
+   devid, uuid);
if (!map->stripes[i].dev) {
free_extent_map(em);
return -EIO;
@@ -6027,7 +6028,8 @@ static void fill_device_from_item(struct extent_buffer *leaf,
read_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
 }
 
-static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
+static struct btrfs_fs_devices *open_seed_devices(struct btrfs_root *root,
+ u8 *fsid)
 {
struct btrfs_fs_devices *fs_devices;
int ret;
@@ -6036,49 +6038,56 @@ static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
 
fs_devices = root->fs_info->fs_devices->seed;
while (fs_devices) {
-   if (!memcmp(fs_devices->fsid, fsid, BTRFS_UUID_SIZE)) {
-   ret = 0;
-   goto out;
-   }
+   if (!memcmp(fs_devices->fsid, fsid, BTRFS_UUID_SIZE))
+   return fs_devices;
+
fs_devices = fs_devices->seed;
}
 
fs_devices = find_fsid(fsid);
if (!fs_devices) {
-   ret = -ENOENT;
-   goto out;
+   if (!btrfs_test_opt(root, DEGRADED))
+   return ERR_PTR(-ENOENT);
+
+   fs_devices = alloc_fs_devices(fsid);
+   if (IS_ERR(fs_devices))
+   return fs_devices;
+
+   fs_devices->seeding = 1;
+   fs_devices->opened = 1;
+   return fs_devices;
}
 
fs_devices = clone_fs_devices(fs_devices);
-   if (IS_ERR(fs_devices)) {
-   ret = PTR_ERR(fs_devices);
-   goto out;
-   }
+   if (IS_ERR(fs_devices))
+   return fs_devices;
 
ret = __btrfs_open_devices(fs_devices, FMODE_READ,
   root->fs_info->bdev_holder);
if (ret) {
free_fs_devices(fs_devices);
+   fs_devices = ERR_PTR(ret);
goto out;
}
 
if (!fs_devices->seeding) {
__btrfs_close_devices(fs_devices);
free_fs_devices(fs_devices);
-   ret = -EINVAL;
+   fs_devices = ERR_PTR(-EINVAL);
goto out;
}
 
fs_devices->seed = root->fs_info->fs_devices->seed;
root->fs_info->fs_devices->seed = fs_devices;
 out:
-   return ret;
+   return fs_devices;
 }
 
 static int read_one_dev(struct btrfs_root *root,
struct extent_buffer *leaf,
struct btrfs_dev_item *dev_item)
 {
+   struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
struct btrfs_device *device;
u64 devid;
int ret;
@@ -6092,31 +6101,48 @@ static int read_one_dev(struct btrfs_root *root,
   BTRFS_UUID_SIZE);
 
if (memcmp(fs_uuid, root->fs_info->fsid, BTRFS_UUID_SIZE)) {
-   ret = open_seed_devices(root, fs_uuid);
-   if (ret && !(ret == -ENOENT && btrfs_test_opt(root, DEGRADED)))
-   return ret;
+   fs_devices = open_seed_devices(root, fs_uuid);
+   if (IS_ERR(fs_devices))
+   return PTR_ERR(fs_devices);
}
 
device = btrfs_find_device(root->fs_info, devid, dev_uuid, fs_uuid);
-   if (!device || !device->bdev) {
+   if (!device) {
if (!btrfs_test_opt(root, DEGRADED)

[PATCH 4/5] Btrfs: restructure btrfs_get_bdev_and_sb and pick up some code used later

2014-09-03 Thread Miao Xie

Some code in btrfs_get_bdev_and_sb will be reused by another function later,
so restructure btrfs_get_bdev_and_sb and split that code out into a new
function.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 66 +-
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bcb19d5..9d52fd8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -193,42 +193,47 @@ static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid)
return NULL;
 }
 
+static int __btrfs_get_sb(struct block_device *bdev, int flush,
+ struct buffer_head **bh)
+{
+   int ret;
+
+   if (flush)
+   filemap_write_and_wait(bdev->bd_inode->i_mapping);
+
+   ret = set_blocksize(bdev, 4096);
+   if (ret)
+   return ret;
+
+   invalidate_bdev(bdev);
+   *bh = btrfs_read_dev_super(bdev);
+   if (!*bh)
+   return -EINVAL;
+
+   return 0;
+}
+
 static int
-btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
- int flush, struct block_device **bdev,
- struct buffer_head **bh)
+btrfs_get_bdev_and_sb_by_path(const char *device_path, fmode_t flags,
+ void *holder, int flush,
+ struct block_device **bdev,
+ struct buffer_head **bh)
 {
int ret;
 
*bdev = blkdev_get_by_path(device_path, flags, holder);
-
if (IS_ERR(*bdev)) {
-   ret = PTR_ERR(*bdev);
printk(KERN_INFO "BTRFS: open %s failed\n", device_path);
-   goto error;
+   return PTR_ERR(*bdev);
}
 
-   if (flush)
-   filemap_write_and_wait((*bdev)->bd_inode->i_mapping);
-   ret = set_blocksize(*bdev, 4096);
+   ret = __btrfs_get_sb(*bdev, flush, bh);
if (ret) {
blkdev_put(*bdev, flags);
-   goto error;
-   }
-   invalidate_bdev(*bdev);
-   *bh = btrfs_read_dev_super(*bdev);
-   if (!*bh) {
-   ret = -EINVAL;
-   blkdev_put(*bdev, flags);
-   goto error;
+   return ret;
}
 
return 0;
-
-error:
-   *bdev = NULL;
-   *bh = NULL;
-   return ret;
 }
 
 static void requeue_list(struct btrfs_pending_bios *pending_bios,
@@ -806,8 +811,8 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
continue;
 
/* Just open everything we can; ignore failures here */
-   if (btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
-   &bdev, &bh))
+   if (btrfs_get_bdev_and_sb_by_path(device->name->str, flags,
+ holder, 1, &bdev, &bh))
continue;
 
disk_super = (struct btrfs_super_block *)bh->b_data;
@@ -1629,10 +1634,10 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
goto out;
}
} else {
-   ret = btrfs_get_bdev_and_sb(device_path,
-   FMODE_WRITE | FMODE_EXCL,
-   root->fs_info->bdev_holder, 0,
-   &bdev, &bh);
+   ret = btrfs_get_bdev_and_sb_by_path(device_path,
+   FMODE_WRITE | FMODE_EXCL,
+   root->fs_info->bdev_holder,
+   0, &bdev, &bh);
if (ret)
goto out;
disk_super = (struct btrfs_super_block *)bh->b_data;
@@ -1906,8 +1911,9 @@ static int btrfs_find_device_by_path(struct btrfs_root *root, char *device_path,
struct buffer_head *bh;
 
*device = NULL;
-   ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
-   root->fs_info->bdev_holder, 0, &bdev, &bh);
+   ret = btrfs_get_bdev_and_sb_by_path(device_path, FMODE_READ,
+   root->fs_info->bdev_holder, 0,
+   &bdev, &bh);
if (ret)
return ret;
disk_super = (struct btrfs_super_block *)bh->b_data;
-- 
1.9.3

[PATCH 03/18] Btrfs: fix unprotected assignment of the target device

2014-09-03 Thread Miao Xie

We didn't protect the assignment of the target device's size fields, so the
super block update could be skipped because we might see a wrong size for
the target device during the assignment. Fix it by moving the assignment
statements into the initialization function of the target device. A further
benefit is that we can check earlier whether the target device is suitable.

Signed-off-by: Miao Xie 
---
 fs/btrfs/dev-replace.c | 32 
 fs/btrfs/volumes.c | 23 +++
 fs/btrfs/volumes.h |  1 +
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 10dfb41..72dc02e 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -330,29 +330,19 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
return -EINVAL;
 
mutex_lock(&fs_info->volume_mutex);
-   ret = btrfs_init_dev_replace_tgtdev(root, args->start.tgtdev_name,
-   &tgt_device);
-   if (ret) {
-   btrfs_err(fs_info, "target device %s is invalid!",
-  args->start.tgtdev_name);
-   mutex_unlock(&fs_info->volume_mutex);
-   return -EINVAL;
-   }
-
ret = btrfs_dev_replace_find_srcdev(root, args->start.srcdevid,
args->start.srcdev_name,
&src_device);
-   mutex_unlock(&fs_info->volume_mutex);
if (ret) {
-   ret = -EINVAL;
-   goto leave_no_lock;
+   mutex_unlock(&fs_info->volume_mutex);
+   return ret;
}
 
-   if (tgt_device->total_bytes < src_device->total_bytes) {
-   btrfs_err(fs_info, "target device is smaller than source device!");
-   ret = -EINVAL;
-   goto leave_no_lock;
-   }
+   ret = btrfs_init_dev_replace_tgtdev(root, args->start.tgtdev_name,
+   src_device, &tgt_device);
+   mutex_unlock(&fs_info->volume_mutex);
+   if (ret)
+   return ret;
 
btrfs_dev_replace_lock(dev_replace);
switch (dev_replace->replace_state) {
@@ -380,10 +370,6 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
  src_device->devid,
  rcu_str_deref(tgt_device->name));
 
-   tgt_device->total_bytes = src_device->total_bytes;
-   tgt_device->disk_total_bytes = src_device->disk_total_bytes;
-   tgt_device->bytes_used = src_device->bytes_used;
-
/*
 * from now on, the writes to the srcdev are all duplicated to
 * go to the tgtdev as well (refer to btrfs_map_block()).
@@ -426,9 +412,7 @@ leave:
dev_replace->srcdev = NULL;
dev_replace->tgtdev = NULL;
btrfs_dev_replace_unlock(dev_replace);
-leave_no_lock:
-   if (tgt_device)
-   btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
+   btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 483fc6d..1646659 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2295,6 +2295,7 @@ error:
 }
 
 int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
+ struct btrfs_device *srcdev,
  struct btrfs_device **device_out)
 {
struct request_queue *q;
@@ -2307,24 +2308,37 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
int ret = 0;
 
*device_out = NULL;
-   if (fs_info->fs_devices->seeding)
+   if (fs_info->fs_devices->seeding) {
+   btrfs_err(fs_info, "the filesystem is a seed filesystem!");
return -EINVAL;
+   }
 
bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
  fs_info->bdev_holder);
-   if (IS_ERR(bdev))
+   if (IS_ERR(bdev)) {
+   btrfs_err(fs_info, "target device %s is invalid!", device_path);
return PTR_ERR(bdev);
+   }
 
filemap_write_and_wait(bdev->bd_inode->i_mapping);
 
devices = &fs_info->fs_devices->devices;
list_for_each_entry(device, devices, dev_list) {
if (device->bdev == bdev) {
+   btrfs_err(fs_info, "target device is in the filesystem!");
ret = -EEXIST;
goto error;
}
}
 
+
+   if (i_size_read(bdev->bd_inode) < srcdev->total_bytes) {
+   btrfs_err(fs_info, "target device is smaller than source device!");
+   ret = -EINVAL;
+   goto error;
+   }
+
+
device = btrfs_alloc_device(NULL, &devid, NULL);
if (IS_ERR(device)) {
ret = PTR_ERR(device);
@@ -2348,8 +2362,9 @@ int btrfs_init_dev_re

[PATCH 3/5] Btrfs: restructure btrfs_scan_one_device

2014-09-03 Thread Miao Xie

Some code in btrfs_scan_one_device will be reused by another function later,
so restructure btrfs_scan_one_device and split that code out into a new
function.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 57 +++---
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 740a4f9..bcb19d5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -885,24 +885,18 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
return ret;
 }
 
-/*
- * Look for a btrfs signature on a device. This may be called out of the mount path
- * and we are not allowed to call set_blocksize during the scan. The superblock
- * is read via pagecache
- */
-int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
- struct btrfs_fs_devices **fs_devices_ret)
+static int __scan_device(struct block_device *bdev, const char *path,
+struct btrfs_fs_devices **fs_devices_ret)
 {
struct btrfs_super_block *disk_super;
-   struct block_device *bdev;
struct page *page;
void *p;
-   int ret = -EINVAL;
u64 devid;
u64 transid;
u64 total_devices;
u64 bytenr;
pgoff_t index;
+   int ret;
 
/*
 * we would like to check all the supers, but that would make
@@ -911,38 +905,30 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 */
bytenr = btrfs_sb_offset(0);
-   flags |= FMODE_EXCL;
-   mutex_lock(&uuid_mutex);
-
-   bdev = blkdev_get_by_path(path, flags, holder);
-
-   if (IS_ERR(bdev)) {
-   ret = PTR_ERR(bdev);
-   goto error;
-   }
 
/* make sure our super fits in the device */
if (bytenr + PAGE_CACHE_SIZE >= i_size_read(bdev->bd_inode))
-   goto error_bdev_put;
+   return -EINVAL;
 
/* make sure our super fits in the page */
if (sizeof(*disk_super) > PAGE_CACHE_SIZE)
-   goto error_bdev_put;
+   return -EINVAL;
 
/* make sure our super doesn't straddle pages on disk */
index = bytenr >> PAGE_CACHE_SHIFT;
if ((bytenr + sizeof(*disk_super) - 1) >> PAGE_CACHE_SHIFT != index)
-   goto error_bdev_put;
+   return -EINVAL;
 
/* pull in the page with our super */
page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
   index, GFP_NOFS);
 
if (IS_ERR_OR_NULL(page))
-   goto error_bdev_put;
+   return -ENOMEM;
 
-   p = kmap(page);
+   ret = -EINVAL;
 
+   p = kmap(page);
/* align our pointer to the offset of the super block */
disk_super = p + (bytenr & ~PAGE_CACHE_MASK);
 
@@ -974,7 +960,30 @@ error_unmap:
kunmap(page);
page_cache_release(page);
 
-error_bdev_put:
+   return ret;
+}
+
+/*
+ * Look for a btrfs signature on a device. This may be called out of the mount path
+ * and we are not allowed to call set_blocksize during the scan. The superblock
+ * is read via pagecache
+ */
+int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
+ struct btrfs_fs_devices **fs_devices_ret)
+{
+   struct block_device *bdev;
+   int ret;
+
+   flags |= FMODE_EXCL;
+
+   mutex_lock(&uuid_mutex);
+   bdev = blkdev_get_by_path(path, flags, holder);
+   if (IS_ERR(bdev)) {
+   ret = PTR_ERR(bdev);
+   goto error;
+   }
+
+   ret = __scan_device(bdev, path, fs_devices_ret);
blkdev_put(bdev, flags);
 error:
mutex_unlock(&uuid_mutex);
-- 
1.9.3

[PATCH 01/18] Btrfs: cleanup unused num_can_discard in fs_devices

2014-09-03 Thread Miao Xie

The member variable num_can_discard of the fs_devices structure
is set, but nothing ever reads it, so remove it.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 16 ++--
 fs/btrfs/volumes.h |  1 -
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e9676a4..483fc6d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -720,8 +720,6 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
fs_devices->rw_devices--;
}
 
-   if (device->can_discard)
-   fs_devices->num_can_discard--;
if (device->missing)
fs_devices->missing_devices--;
 
@@ -828,10 +826,8 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
}
 
q = bdev_get_queue(bdev);
-   if (blk_queue_discard(q)) {
+   if (blk_queue_discard(q))
device->can_discard = 1;
-   fs_devices->num_can_discard++;
-   }
 
device->bdev = bdev;
device->in_fs_metadata = 0;
@@ -1835,8 +1831,7 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
if (!fs_devices->seeding)
fs_devices->rw_devices++;
}
-   if (srcdev->can_discard)
-   fs_devices->num_can_discard--;
+
if (srcdev->bdev) {
fs_devices->open_devices--;
 
@@ -1886,8 +1881,6 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
fs_info->fs_devices->open_devices--;
}
fs_info->fs_devices->num_devices--;
-   if (tgtdev->can_discard)
-   fs_info->fs_devices->num_can_discard++;
 
next_device = list_entry(fs_info->fs_devices->devices.next,
 struct btrfs_device, dev_list);
@@ -2008,7 +2001,6 @@ static int btrfs_prepare_sprout(struct btrfs_root *root)
fs_devices->num_devices = 0;
fs_devices->open_devices = 0;
fs_devices->missing_devices = 0;
-   fs_devices->num_can_discard = 0;
fs_devices->rotating = 0;
fs_devices->seed = seed_devices;
 
@@ -2200,8 +2192,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
root->fs_info->fs_devices->open_devices++;
root->fs_info->fs_devices->rw_devices++;
root->fs_info->fs_devices->total_devices++;
-   if (device->can_discard)
-   root->fs_info->fs_devices->num_can_discard++;
root->fs_info->fs_devices->total_rw_bytes += device->total_bytes;
 
spin_lock(&root->fs_info->free_chunk_lock);
@@ -2371,8 +2361,6 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
list_add(&device->dev_list, &fs_info->fs_devices->devices);
fs_info->fs_devices->num_devices++;
fs_info->fs_devices->open_devices++;
-   if (device->can_discard)
-   fs_info->fs_devices->num_can_discard++;
mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 
*device_out = device;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e894ac6..37f8bff 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -124,7 +124,6 @@ struct btrfs_fs_devices {
u64 rw_devices;
u64 missing_devices;
u64 total_rw_bytes;
-   u64 num_can_discard;
u64 total_devices;
struct block_device *latest_bdev;
 
-- 
1.9.3

[PATCH 02/18] Btrfs: cleanup double assignment of device->bytes_used when device replace finishes

2014-09-03 Thread Miao Xie

Signed-off-by: Miao Xie 
---
 fs/btrfs/dev-replace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index a85b5f5..10dfb41 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -550,7 +550,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
tgt_device->is_tgtdev_for_dev_replace = 0;
tgt_device->devid = src_device->devid;
src_device->devid = BTRFS_DEV_REPLACE_DEVID;
-   tgt_device->bytes_used = src_device->bytes_used;
memcpy(uuid_tmp, tgt_device->uuid, sizeof(uuid_tmp));
memcpy(tgt_device->uuid, src_device->uuid, sizeof(tgt_device->uuid));
memcpy(src_device->uuid, uuid_tmp, sizeof(src_device->uuid));
-- 
1.9.3

[PATCH 06/18] Btrfs: Fix wrong free_chunk_space assignment during removing a device

2014-09-03 Thread Miao Xie

When removing a device, we have already modified free_chunk_space while
shrinking the device, so we needn't assign a new value to it after
the device shrink. Fix it.

Signed-off-by: Miao Xie 
---
 fs/btrfs/volumes.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f8273bb..1524b3f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1671,11 +1671,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
if (ret)
goto error_undo;
 
-   spin_lock(&root->fs_info->free_chunk_lock);
-   root->fs_info->free_chunk_space = device->total_bytes -
-   device->bytes_used;
-   spin_unlock(&root->fs_info->free_chunk_lock);
-
device->in_fs_metadata = 0;
btrfs_scrub_cancel_dev(root->fs_info, device);
 
-- 
1.9.3
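
A minimal user-space sketch of why the removed assignment was a problem, assuming
free_chunk_space is a filesystem-wide total of unallocated bytes across all
devices. The struct and function names below are hypothetical; only
free_chunk_space, total_bytes and bytes_used come from the patch. The sketch
also assumes the device's chunks were already relocated (bytes_used is 0)
before the shrink.

```c
#include <stdint.h>

/* Hypothetical stand-ins for btrfs_fs_info and btrfs_device. */
struct fs_model {
	uint64_t free_chunk_space;	/* unallocated bytes, all devices */
};

struct device_model {
	uint64_t total_bytes;
	uint64_t bytes_used;
};

/* Shrinking already accounts for the space that goes away. */
void shrink_device(struct fs_model *fs, struct device_model *dev,
		   uint64_t new_size)
{
	uint64_t diff = dev->total_bytes - new_size;

	dev->total_bytes = new_size;
	fs->free_chunk_space -= diff;
}

/* Old behaviour: shrink, then overwrite the fs-wide counter. */
void rm_device_buggy(struct fs_model *fs, struct device_model *dev)
{
	shrink_device(fs, dev, 0);
	/* Clobbers the total with a value derived from one device only. */
	fs->free_chunk_space = dev->total_bytes - dev->bytes_used;
}

/* Patched behaviour: the shrink is the only update. */
void rm_device_fixed(struct fs_model *fs, struct device_model *dev)
{
	shrink_device(fs, dev, 0);
}
```

With 160 bytes free in total and a 100-byte empty device being removed, the
fixed path leaves 60 bytes accounted for the remaining devices, while the old
path resets the counter to 0 in this model.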


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-03 Thread Chris Mason

On 09/02/2014 09:31 PM, john terragon wrote:
> Rsync finished. FWIW in the end it reported an average speed of about
>  900K/sec. Without autodefrag there have been no messages about hung
> kworkers even though rsync seemingly keeps getting hung for several
> minutes throughout the whole execution.

So let's take a step back and figure out how fast the usb stick actually is.
This will erase your usb stick, but give us an idea of its performance:

dd if=/dev/zero of=/dev/ bs=20M oflag=direct count=100

Note again, the above command will erase your usb stick ;)  Use whatever device
name you've been sending to mkfs.btrfs

The kernel will allow a pretty significant amount of ram to be dirtied before
forcing writeback, which is why you're seeing rsync stall at seemingly strange
intervals.  In the case of btrfs with compression, we add some worker threads
between rsync and the device, and these may be turning the writeback into a
somewhat more bursty operation.
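
To put rough numbers on the stalls, here is a small sketch of the arithmetic.
The function names are hypothetical, and the 200 MiB of dirty page cache used
in the note below is an assumed figure, not a measurement from this thread;
the 900K/sec rate is the average rsync reported.

```c
#include <stdint.h>

/* Total bytes written by "dd bs=<bs_mib>M count=<count>". */
uint64_t dd_total_bytes(uint64_t bs_mib, uint64_t count)
{
	return bs_mib * 1024 * 1024 * count;
}

/*
 * Seconds needed to flush dirty_bytes at bytes_per_sec, rounded up.
 * Returns 0 when the throughput is unknown (0) to avoid dividing by zero.
 */
uint64_t writeback_seconds(uint64_t dirty_bytes, uint64_t bytes_per_sec)
{
	if (bytes_per_sec == 0)
		return 0;
	return (dirty_bytes + bytes_per_sec - 1) / bytes_per_sec;
}
```

Under these assumptions, writeback_seconds(200 MiB, 900 KiB/s) comes to 228
seconds, which is in the same range as the multi-minute stalls reported. The
dd run above writes dd_total_bytes(20, 100) = 2097152000 bytes, so dividing
that by the elapsed time gives the stick's raw write rate.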

-chris

Problem with applying incremental btrfs-send

2014-09-03 Thread Harald Hoyer

Hi,

maybe someone can enlighten me. I am doing btrfs send & receive with full
snapshots and incremental updates.

It basically looks like this:

vol-0 and vol-1 are full subvolume image sends.
inc-1 and inc-2 are incremental images with:
# btrfs send -f inc-1 -p vol vol'
# btrfs send -f inc-2 -p vol' vol''

Case A:

vol---send> vol-0 --receive--> avol  --send&rec--> bvol
|-send> inc-1 --receive-->  |   | <--receive-- inc-1
vol' -send> vol-1  avol'   bvol'
|-send> inc-2 --receive-->  |   | <--receive-- inc-2
vol''  avol''  bvol''

which works for bvol, and which even works if bvol is removed before inc-2 is
applied to bvol'.

Case B:

vol---send> vol-0 --receive--> avol
|-send> inc-1 --receive-->  |
vol' -send> vol-1  avol' --send&rec--> bvol'
|-send> inc-2 --receive-->  |   | <--receive-- inc-2
vol''  avol''  ERROR


trying to apply inc-2 to bvol' fails with:

ERROR: could not find parent subvolume

What's the problem here?

kernel BUG at fs/btrfs/relocation.c:1065 in 3.14.16 to 3.17-rc3

2014-09-03 Thread Olivier Bonvalet

Hi,

I have a btrfs partition which throws a kernel BUG, even with linux
3.17-rc3 (I tried the 3.14.16, 3.16.1 and 3.17-rc3 kernels):

[   45.058466] [ cut here ]
[   45.058539] kernel BUG at fs/btrfs/relocation.c:1065!
[   45.058578] invalid opcode:  [#1] SMP 
[   45.058655] Modules linked in: nf_conntrack iTCO_wdt iTCO_vendor_support 
i2c_i801 lpc_ich ehci_pci i2ccore ehci_hcd mfd_core evdev battery ie31200_edac 
edac_core video button btrfs xor raid6_pq dm_mod raid1 md_mod sg sd_mod 
crc_t10dif crct10dif_common thermal ahci libahci libata scsi_mod xhci_hcd 
e1000e fan ptp pps_core
[   45.059500] CPU: 2 PID: 1740 Comm: btrfs-balance Not tainted 
3.17-rc3-dae-intel #1
[   45.059550] Hardware name: Digicube sas DediCube/DQ77MK, BIOS 
MKQ7710H.86A.0058.2013.0226.1541 02/26/2013
[   45.059602] task: 8802151c17e0 ti: 8802105ec000 task.ti: 
8802105ec000
[   45.059652] RIP: 0010:[]  [] 
build_backref_tree+0xa3d/0xcf6 [btrfs]
[   45.059739] RSP: 0018:8802105efaf0  EFLAGS: 00010246
[   45.059776] RAX: 8802105efb00 RBX: 880213b83800 RCX: 880210565d10
[   45.059816] RDX: 8802105efb68 RSI: 8802105efb68 RDI: 880210565d10
[   45.059857] RBP: 880210565d10 R08: 88021313fc40 R09: 1000
[   45.059896] R10: 1600 R11: 6db6db6db6db6db7 R12: 8800d114d310
[   45.059937] R13: 8802105efb78 R14: 8800d114d2c0 R15: 88021313fc40
[   45.059977] FS:  () GS:88021e28() 
knlGS:
[   45.060028] CS:  0010 DS:  ES:  CR0: 80050033
[   45.060066] CR2: 7f2fc9c649b8 CR3: 01611000 CR4: 001407e0
[   45.060105] Stack:
[   45.060138]  88021d5fc890 88021417d890  
8802105efb68
[   45.060264]  00010005 880213b83920 8802105efb78 
00ffa015ecd1
[   45.060392]  8800d114d400 8800d114d240 880210464220 
880210565d40
[   45.060516] Call Trace:
[   45.060556]  [] ? relocate_tree_blocks+0x15f/0x430 [btrfs]
[   45.060607]  [] ? tree_insert+0x44/0x47 [btrfs]
[   45.060656]  [] ? add_tree_block+0x112/0x13c [btrfs]
[   45.060702]  [] ? relocate_block_group+0x26d/0x4a6 [btrfs]
[   45.060753]  [] ? btrfs_wait_ordered_roots+0x18f/0x1ab 
[btrfs]
[   45.060812]  [] ? btrfs_relocate_block_group+0x154/0x265 
[btrfs]
[   45.060872]  [] ? btrfs_relocate_chunk.isra.29+0x52/0x55d 
[btrfs]
[   45.060932]  [] ? btrfs_set_lock_blocking_rw+0xa8/0xaa 
[btrfs]
[   45.060988]  [] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs]
[   45.061039]  [] ? btrfs_get_token_64+0x75/0xcf [btrfs]
[   45.061088]  [] ? release_extent_buffer+0x26/0x96 [btrfs]
[   45.061170]  [] ? btrfs_balance+0x9e3/0xb78 [btrfs]
[   45.061263]  [] ? btrfs_balance+0xb78/0xb78 [btrfs]
[   45.061314]  [] ? balance_kthread+0x4f/0x6d [btrfs]
[   45.061360]  [] ? kthread+0xa7/0xaf
[   45.061420]  [] ? SyS_old_getrlimit+0x21/0xcb
[   45.061460]  [] ? __kthread_parkme+0x5b/0x5b
[   45.061501]  [] ? ret_from_fork+0x7c/0xb0
[   45.061541]  [] ? __kthread_parkme+0x5b/0x5b
[   45.061579] Code: 26 a8 02 74 0d 4c 89 e7 e8 3c e1 ff ff 41 80 66 71 fd 49 
8b 46 58 49 89 6e 58 4c 89 65 00 48 89 45 08 48 89 28 eb c0 a8 10 75 02 <0f> 0b 
83 e0 01 39 44 24 10 0f 84 20 ff ff ff 0f 0b 49 8b 46 58 
[   45.063148] RIP  [] build_backref_tree+0xa3d/0xcf6 [btrfs]
[   45.063219]  RSP 
[   45.063260] ---[ end trace c396e96e4d1a5697 ]---

I have dumped the FS with btrfs-image, but I don't know where to upload it,
so you can download it at: https://daevel.fr/img/btrfs-image.out
(nearly 6GB, md5sum ee5559ab31368aba60c259ce3b5b9504)


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-03 Thread Liu Bo

On Wed, Sep 03, 2014 at 08:03:51AM +0200, john terragon wrote:
> I tried the same routine on 32GB usb sticks. Same exact problems. 32GB
> seems a bit much for a --mixed btrfs.
> I haven't tried ssd_spread, maybe it's beneficial. However, as I wrote
> above, disabling autodefrag gets rid completely of the "INFO: hung
> task" messages but even though the kernel doesn't complain about
> blocked kworkers, the rsync process still  blocks for several minutes
> throughout the whole copy.

It's very nice to know that you can reproduce it with autodefrag.

I did some analysis of the provided blocked stacks. The key question is what
prevents the writes of the free space cache's pages from finishing: the task
sits in wait_on_page_bit(), which is waiting on the WRITEBACK bit.

Could you please paste the output of sysrq-w and sysrq-t when you get that hang?

thanks,
-liubo

> 
> 
> On Wed, Sep 3, 2014 at 4:44 AM, Chris Murphy  wrote:
> >
> > On Sep 2, 2014, at 12:40 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> >>
> >> Mkfs.btrfs used to default to 4 KiB node/leaf sizes; now days it defaults
> >> to 16 KiB as that's far better for most usage.  I wonder if USB sticks
> >> are an exception...
> >
> > USB sticks > 1 GB get 16KB nodesize also. At <= 1 GB, mixed-bg is default 
> > as is 4KB nodesize. Probably because queue/rotational is 1 for USB sticks, 
> > they mount without ssd or ssd_spread which may be unfortunate (I haven't 
> > benchmarked it but I suspect ssd_spread would work well for USB sticks).
> >
> > It was suggested a while ago that maybe mixed-bg should apply to larger 
> > volumes, maybe up to 8GB or 16GB?
> >
> >
> > Chris Murphy
> >

Re: [PATCH v2 11/12] Btrfs: implement repair function when direct read fails

2014-09-03 Thread Miao Xie

On Tue, 2 Sep 2014 09:05:15 -0400, Chris Mason wrote:
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 08e65e9..56b1546 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -698,7 +719,12 @@ static void end_workqueue_bio(struct bio *bio, int err)
>  
>   fs_info = end_io_wq->info;
>   end_io_wq->error = err;
> - btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL);
> +
> + if (likely(end_io_wq->metadata != BTRFS_WQ_ENDIO_DIO_REPAIR))
> + btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL,
> + NULL);
> + else
> + INIT_WORK(&end_io_wq->work.normal_work, dio_end_workqueue_fn);

>>>> It's not clear why this one is using INIT_WORK instead of
>>>> btrfs_init_work, or why we're calling directly into queue_work instead
>>>> of btrfs_queue_work.  What am I missing?
>>>
>>> I'm sorry that I forgot to write the explanation in this patch's changelog;
>>> I wrote it in Patch 0.
>>>
>>> "2.When the io on the mirror ends, we will insert the endio work into the
>>>system workqueue, not btrfs own endio workqueue, because the original
>>>endio work is still blocked in the btrfs endio workqueue, if we insert
>>>the endio work of the io on the mirror into that workqueue, deadlock
>>>would happen."
>>
>> Can you elaborate the deadlock?
>>
>> Now that buffer read can insert a subsequent read-mirror bio into btrfs endio
>> workqueue without problems, what's the difference?
> 
> We do have problems if we're inserting dependent items in the same
> workqueue.
> 
> Miao, please make a repair workqueue.  I'll also have a use for it in
> the raid56 parity work I think.

OK, I'll update the patch soon.

Thanks
Miao
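
The deadlock discussed above can be illustrated with a toy model. It assumes a
workqueue with a fixed number of workers, where each original endio work item
holds its worker until its repair work has completed. The function name and
parameters are hypothetical and do not correspond to the kernel workqueue API.

```c
#include <stdbool.h>

/*
 * Returns true if queued repair work can still run.
 *
 * max_workers:           workers available to the endio workqueue
 * blocked_endio_items:   original endio items waiting on a repair
 * separate_repair_queue: repair work goes to a different workqueue
 */
bool repair_can_make_progress(int max_workers, int blocked_endio_items,
			      bool separate_repair_queue)
{
	/* A dedicated queue has its own workers, so it never starves here. */
	if (separate_repair_queue)
		return true;

	/*
	 * Same queue: repair work needs a free worker, but every blocked
	 * original item occupies one until its repair is done.
	 */
	return blocked_endio_items < max_workers;
}
```

In this model, four blocked items on a four-worker queue leave no worker for
the repair work they are waiting on, which is the dependency cycle a separate
repair workqueue is meant to avoid.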


Re: Btrfs-progs-3.16: fs metadata is both single and dup?

2014-09-03 Thread Duncan

Hugo Mills posted on Wed, 03 Sep 2014 08:33:05 +0100 as excerpted:

> On Wed, Sep 03, 2014 at 04:53:39AM +, Duncan wrote:
>> Hugo Mills posted on Tue, 02 Sep 2014 13:13:49 +0100 as excerpted:
>> 
>> 
>> [A] btrfs fi df on a new filesystem always seems to have those extra
>> unused single profile lines.
>> 
>> I got so the first thing I'd do on first mount was a balance -- before
>> there was anything actually on the filesystem so it was real fast -- to
>> get rid of those null entries.
> 
>Interesting. Last time I tried that (balance without any contents),
> the balance removed *all* the chunks, and then the FS forgot about what
> configuration it should have and reverted to RAID-1/single. I usually
> recommend writing at least one 4k+ file to the FS first, if it's
> bothering someone so much that they can't let it go.

Interesting indeed.  From memory, even before I've put anything on the 
filesystem it always seems to have a bit of the first chunk of both data 
and metadata used -- not much but enough that it's obvious in the df 
which mode chunks are the null-chunks, and apparently obvious to the 
balance as well, as it has always left me with at least a first chunk of 
each.

I wonder what the difference might be.  Perhaps it's just the versions of 
kernel and/or userspace I've happened to do all my mkfs.btrfs with?  Or 
maybe it's one of the features (like thin-metadata or noholes) I enable 
by default, or the fact that I use labels for partition ID and tracking, 
so I always fill that in.  Whatever it is, it seems to put a bit of 
something in the filesystem, possibly at first mount, so the actually 
used chunks, one each of data and metadata, aren't entirely empty.

Or maybe I'm remembering wrong and I've just been lucky. 

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: Btrfs-progs-3.16: fs metadata is both single and dup?

2014-09-03 Thread Hugo Mills

On Wed, Sep 03, 2014 at 04:53:39AM +, Duncan wrote:
> Hugo Mills posted on Tue, 02 Sep 2014 13:13:49 +0100 as excerpted:
> 
> > On Tue, Sep 02, 2014 at 12:05:33PM +, Holger Hoffstätte wrote:
> >> So where does the confusing initial display come from? [I] don't
> >> remember ever seeing this with btrfs-progs-3.14.2.
> > 
> >Your memory is faulty, I'm afraid. It's always done that -- at
> > least since I started using btrfs, several years ago.
> > 
> >I believe it comes from mkfs creating a trivial basic filesystem
> > (with the single profiles), and then setting enough flags on it that the
> > kernel can bootstrap it with the desired chunks in it -- but I may be
> > wrong about that.
> 
> Agreed.  It's an artifact of the mkfs.btrfs process and a btrfs fi df on 
> a new filesystem always seems to have those extra unused single profile 
> lines.
> 
> I got so the first thing I'd do on first mount was a balance -- before 
> there was anything actually on the filesystem so it was real fast -- to 
> get rid of those null entries.

   Interesting. Last time I tried that (balance without any contents),
the balance removed *all* the chunks, and then the FS forgot about
what configuration it should have and reverted to RAID-1/single. I
usually recommend writing at least one 4k+ file to the FS first, if
it's bothering someone so much that they can't let it go.

   Hugo.

> Actually, I had already created a little mkfs.btrfs helper script that 
> sets options I normally want, etc, and after doing the mkfs and balance 
> drill a few times, I setup the script such that if at the appropriate 
> prompt I give it a mountpoint to point balance at, it'll mount the 
> filesystem and immediately run a balance, thus automating things and 
> making the balance part of the same scripted process that does the 
> mkfs.btrfs in the first place.
> 
> IOW, those null-entry lines bother me too... enough that even tho I know 
> what they are I arranged things so they're automatically and immediately 
> eliminated and I don't have to see 'em! =:^)
> 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Never underestimate the bandwidth of a Volvo filled ---   
   with backup tapes.



[PATCH v2 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared

2014-09-03 Thread Miao Xie

An I/O error might happen while the device stats are being written out, and
the device stats information and the dirty flag would be updated at that time.
The current code did not consider this case and simply cleared the dirty
flag, which caused us to forget to write out the new device stats
information. Fix it.

Signed-off-by: Miao Xie 
---
Changelog v1 -> v2:
- Change the variable name and make some cleanups per David's comments
---
 fs/btrfs/volumes.c |  8 ++--
 fs/btrfs/volumes.h | 16 
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19188df..4ea73c8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -159,6 +159,7 @@ static struct btrfs_device *__alloc_device(void)
 
spin_lock_init(&dev->reada_lock);
atomic_set(&dev->reada_in_flight, 0);
+   atomic_set(&dev->dev_stats_dirty, 0);
INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
 
@@ -6398,16 +6399,19 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle *trans,
struct btrfs_root *dev_root = fs_info->dev_root;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_device *device;
+   int dirtied;
int ret = 0;
 
mutex_lock(&fs_devices->device_list_mutex);
list_for_each_entry(device, &fs_devices->devices, dev_list) {
-   if (!device->dev_stats_valid || !device->dev_stats_dirty)
+   dirtied = atomic_read(&device->dev_stats_dirty);
+
+   if (!device->dev_stats_valid || !dirtied)
continue;
 
ret = update_dev_stat_item(trans, dev_root, device);
if (!ret)
-   device->dev_stats_dirty = 0;
+   atomic_sub(dirtied, &device->dev_stats_dirty);
}
mutex_unlock(&fs_devices->device_list_mutex);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6fcc8ea..9a1eff3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -110,7 +110,8 @@ struct btrfs_device {
/* disk I/O failure stats. For detailed description refer to
 * enum btrfs_dev_stat_values in ioctl.h */
int dev_stats_valid;
-   int dev_stats_dirty; /* counters need to be written to disk */
+
+   atomic_t dev_stats_dirty; /* counters need to be written to disk */
atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
 };
 
@@ -359,11 +360,18 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root *root,
 int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
struct btrfs_root *extent_root,
u64 chunk_offset, u64 chunk_size);
+
+static inline void btrfs_dev_dirty_stat(struct btrfs_device *dev)
+{
+   smp_mb__before_atomic();
+   atomic_inc(&dev->dev_stats_dirty);
+}
+
 static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
  int index)
 {
atomic_inc(dev->dev_stat_values + index);
-   dev->dev_stats_dirty = 1;
+   btrfs_dev_dirty_stat(dev);
 }
 
 static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
@@ -378,7 +386,7 @@ static inline int btrfs_dev_stat_read_and_reset(struct btrfs_device *dev,
int ret;
 
ret = atomic_xchg(dev->dev_stat_values + index, 0);
-   dev->dev_stats_dirty = 1;
+   btrfs_dev_dirty_stat(dev);
return ret;
 }
 
@@ -386,7 +394,7 @@ static inline void btrfs_dev_stat_set(struct btrfs_device *dev,
  int index, unsigned long val)
 {
atomic_set(dev->dev_stat_values + index, val);
-   dev->dev_stats_dirty = 1;
+   btrfs_dev_dirty_stat(dev);
 }
 
 static inline void btrfs_dev_stat_reset(struct btrfs_device *dev,
-- 
1.9.3
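
A user-space sketch of the counting scheme in this patch, using C11
<stdatomic.h> in place of the kernel's atomic_t. The struct, the function
names and the racing_updates parameter are hypothetical; the parameter only
simulates errors that arrive while the write-out is in progress, since the
sketch is single threaded.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct btrfs_device. */
struct dev_model {
	atomic_int stat_value;	/* one error counter */
	atomic_int stats_dirty;	/* updates not yet written out */
	int on_disk_value;	/* what the last flush stored */
};

/* Record an error: bump the counter, then count it as dirty. */
void dev_stat_inc(struct dev_model *dev)
{
	atomic_fetch_add(&dev->stat_value, 1);
	atomic_fetch_add(&dev->stats_dirty, 1);
}

bool dev_stats_is_dirty(struct dev_model *dev)
{
	return atomic_load(&dev->stats_dirty) != 0;
}

/* Returns true if a write-out was performed. */
bool dev_stats_flush(struct dev_model *dev, int racing_updates)
{
	int dirtied = atomic_load(&dev->stats_dirty);
	int i;

	if (!dirtied)
		return false;

	/* "Write out" the stats as they are right now. */
	dev->on_disk_value = atomic_load(&dev->stat_value);

	/* Simulated errors that hit after the snapshot was taken. */
	for (i = 0; i < racing_updates; i++)
		dev_stat_inc(dev);

	/*
	 * Subtract only what was seen before the write-out. Setting the
	 * flag to 0 here would lose the racing updates.
	 */
	atomic_fetch_sub(&dev->stats_dirty, dirtied);
	return true;
}
```

If two errors arrive during a flush, the device stays dirty afterwards and the
next flush writes the newer value, which is the case a plain "dirty = 0" would
miss.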

