BTRFS critical (device dm-0): invalid dir item name len: 45389
Hi.

When I traverse one of my btrfs filesystems, for example with a simple "find /", I get the following in kmsg:

BTRFS critical (device dm-0): invalid dir item name len: 45389

The message appears just once (so I guess it involves just one file/dir). dm-0 is the first dmcrypt device of a pair on which I have btrfs in RAID0 (btrfs native raid). Though I can't be 100% sure, this seems to be a very recent problem (I would have noticed something "critical" in kmsg if it had happened before). Everything else seems to work fine.

So, should I be worried? Is there a way to fix this? (I assume that a scrub would not do any good, since the problem seems to be related to btrfs data structures more than actual file data.) Is there at least a way to know which file/dir is involved? Maybe a verbose debug mode? Or maybe I should just add some printk in the verify_dir_item function that seems to generate the message.

Thanks
John

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
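For readers following along, the kind of bounds check that produces a message like the one above can be sketched as follows. This is a toy illustration, not the kernel's verify_dir_item: the function name and the TOY_NAME_LEN_MAX constant are hypothetical, and the 255-byte limit for ordinary directory entries is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NAME_LEN_MAX 255u	/* assumed limit for ordinary entries */

/*
 * Return 1 if the name length stored in a directory item exceeds the
 * given limit, 0 otherwise. A length such as 45389 is far beyond any
 * plausible file name, which points at damaged metadata.
 */
static int toy_dir_name_len_invalid(uint16_t name_len, uint16_t limit)
{
	assert(limit > 0);
	return name_len > limit;
}
```

A debugging printk of the kind John mentions would go next to such a check, printing the key of the item that failed it.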
[PATCH] btrfs-progs: remove wrong set_argv0 for restore
Before this patch, you could see the following after exec restore
# :too few arguments
The tool name "btrfs restore" is missing.

The @set_argv0() function was introduced by:

commit a184abc70f7b1468e6036ab576f1587ee0574668
btrfs-progs: move the check_argc_* functions into utils.c
...
Also add a new function "set_argv0" to set the correct tool name:
*btrfs-image*: too few arguments

But @set_argv0() only applies to the independent tools with the name pattern btrfs-***. Since restore is now a subcommand under "btrfs", there is no need to call @set_argv0() before check_argc_* to fix up the tool name printed before "too few arguments".

Signed-off-by: Gui Hecheng
---
 cmds-restore.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index f909429..38a131e 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -1229,7 +1229,6 @@ int cmd_restore(int argc, char **argv)
 		}
 	}
 
-	set_argv0(argv);
 	if (!list_roots && check_argc_min(argc - optind, 2))
 		usage(cmd_restore_usage);
 	else if (list_roots && check_argc_min(argc - optind, 1))
--
1.8.1.4
[PATCH] btrfs-progs: remove btrfs_release_path before btrfs_free_path
The btrfs_free_path calls btrfs_release_path internally.

Signed-off-by: Gui Hecheng
---
 disk-io.c   | 1 -
 file-item.c | 1 -
 inode-map.c | 2 --
 3 files changed, 4 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 9e44f10..0f9f374 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -628,7 +628,6 @@ struct btrfs_root *btrfs_read_fs_root_no_cache(struct btrfs_fs_info *fs_info,
 	memcpy(&root->root_key, location, sizeof(*location));
 	ret = 0;
 out:
-	btrfs_release_path(path);
 	btrfs_free_path(path);
 	if (ret) {
 		free(root);
diff --git a/file-item.c b/file-item.c
index 6f3708b..b46d7f1 100644
--- a/file-item.c
+++ b/file-item.c
@@ -306,7 +306,6 @@ found:
 			    csum_size);
 	btrfs_mark_buffer_dirty(path->nodes[0]);
 fail:
-	btrfs_release_path(path);
 	btrfs_free_path(path);
 	return ret;
 }
diff --git a/inode-map.c b/inode-map.c
index 3e138b5..1321bfb 100644
--- a/inode-map.c
+++ b/inode-map.c
@@ -90,12 +90,10 @@ int btrfs_find_free_objectid(struct btrfs_trans_handle *trans,
 	// FIXME -ENOSPC
 found:
 	root->last_inode_alloc = *objectid;
-	btrfs_release_path(path);
 	btrfs_free_path(path);
 	BUG_ON(*objectid < search_start);
 	return 0;
 error:
-	btrfs_release_path(path);
 	btrfs_free_path(path);
 	return ret;
 }
--
1.8.1.4
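To illustrate why the explicit release call is redundant, here is a minimal sketch with toy types. The names toy_path, toy_alloc_path, toy_release_path and toy_free_path are hypothetical stand-ins, not the btrfs-progs API; only the calling pattern mirrors what the patch description states.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a path object that holds references. */
struct toy_path {
	int held_refs;		/* references currently held by the path */
};

static int toy_release_calls;	/* counts releases, for illustration only */

/* Drop everything the path holds; harmless on an already empty path. */
static void toy_release_path(struct toy_path *p)
{
	assert(p != NULL);
	p->held_refs = 0;
	toy_release_calls++;
}

/* Free the path; it releases internally, so callers need not do so first. */
static void toy_free_path(struct toy_path *p)
{
	if (!p)
		return;
	toy_release_path(p);
	free(p);
}

/* Allocate a zeroed path; returns NULL on allocation failure. */
static struct toy_path *toy_alloc_path(void)
{
	return calloc(1, sizeof(struct toy_path));
}
```

With this shape, an error path that only calls toy_free_path(path) still drops every held reference exactly once, which is the property the patch relies on.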
Re: fs corruption report
On Mon, 2014-09-01 at 15:25 +, Zooko Wilcox-OHearn wrote:
> I'm more than happy to try out patches and even focus my own brain on
> diagnosing it, if I can. I'm hoping to regain access to some of my
> files on my btrfs partition, and also I would enjoy helping get this
> improved. :-)
>
> So if you want me to try an experiment, just email me. Unfortunately I
> can't just give you a copy of the partition, since it has confidential
> information on it.
>
> Regards,
>
> Zooko

Hi Zooko, Marc,

Firstly, thanks for your backtrace info, Marc. Sorry for the late reply; I have been offline these days.

For the restore problem, I'm sure that the lzo decompress routine lacks the ability to handle some specific extent pattern.

Here is my test result. I'm using a specific file for the test, /usr/lib/modules/$(uname -r)/kernel/net/irda/irda.ko. You can get it easily on your own box.

# mkfs -t btrfs
# mount -o compress-force=lzo
# cp irda.ko
# umount
# btrfs restore -v
report:
# bad compress length
# failed to inflate

btrfs-progs version: v3.16.x

With the same file under no-compress & zlib-compress, the restore will output a correct copy of irda.ko.

I'm not sure whether the problem above has something to do with your problem. Hope that the messages above are helpful.

-Gui
[PATCH] btrfs-progs: fix find_mount_root() to handle duplicated mount point correctly
The original find_mount_root() uses the first matching mount point and returns it. That was OK until the following commit, which also checks the fstype:

de22c28ef31d9721606ba059 btrfs-progs: Check fstype in find_mount_root()

With the fstype check, we should check the last match, not only the first one. Otherwise the following mounts will not pass find_mount_root():

/dev/sdc on /mnt/test type ext4 (rw,relatime,data=ordered)
/dev/sdb on /mnt/test type btrfs (rw,relatime,space_cache)

This patch uses the last match to do the fstype check.

Reported-by: Remco Hosman
Signed-off-by: Remco Hosman
Signed-off-by: Qu Wenruo
---
 utils.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 6c09366..6a16b06 100644
--- a/utils.c
+++ b/utils.c
@@ -2359,8 +2359,8 @@ int find_mount_root(const char *path, char **mount_root)
 	while ((ent = getmntent(mnttab))) {
 		len = strlen(ent->mnt_dir);
 		if (strncmp(ent->mnt_dir, path, len) == 0) {
-			/* match found */
-			if (longest_matchlen < len) {
+			/* match found and use the latest match */
+			if (longest_matchlen <= len) {
 				free(longest_match);
 				longest_matchlen = len;
 				longest_match = strdup(ent->mnt_dir);
--
2.1.0
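The effect of changing the comparison from < to <= can be shown with a small standalone sketch. The struct toy_mount type and the toy_find_mount function are hypothetical simplifications (a fixed array instead of getmntent()), written only to show how a tie between equally long mount points is resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified mount table entry (not struct mntent). */
struct toy_mount {
	const char *dir;	/* mount point */
	const char *fstype;	/* filesystem type */
};

/*
 * Return the index of the entry whose mount point is the longest prefix
 * of path, or -1 if nothing matches. With prefer_last set, a tie goes to
 * the later entry (the "<=" behaviour); otherwise the earlier entry
 * wins (the "<" behaviour).
 */
static int toy_find_mount(const struct toy_mount *tab, size_t n,
			  const char *path, int prefer_last)
{
	size_t best_len = 0;
	int best = -1;

	assert(path != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(tab[i].dir);

		if (strncmp(tab[i].dir, path, len) != 0)
			continue;
		if (best < 0 || best_len < len ||
		    (prefer_last && best_len == len)) {
			best_len = len;
			best = (int)i;
		}
	}
	return best;
}
```

For a table holding /mnt/test first as ext4 and then as btrfs, the tie-breaking flag decides whether a later fstype check sees the ext4 entry or the btrfs one that was mounted on top of it.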
Re: Btrfs stable updates for v3.16
On 09/03/2014 07:36 PM, Holger Hoffstätte wrote:
> On Wed, 03 Sep 2014 16:50:47 -0400, Chris Mason wrote:
>
>> Hi everyone,
>>
>> For 3.16, please pull these into stable, I've cherry picked and tested
>> them here. For 3.15 and earlier there are a few conflicts, so I'll make
>> a git tree with things to pull.
>>
>> 8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+
>
> This ("fix filemap_flush call in btrfs_file_release") is the only one
> that requires some work for 3.14.
>
> There is one conflict in ordered-data.c - just a slight work queue
> submission change - and the second in transaction.c where the patch does
> not delete enough from btrfs_flush_all_pending_stuffs(), since 3.14 still
> has the old qgroup calls in place. I removed it wholesale and that makes
> everything fit.
>
> The followup ("fix filemap_flush call in btrfs_file_release") then also
> applies.
>
> Should they also go into the next 3.14.x stable cycle? This rename
> deadlock sounds like a possible problem with rsync, which seems like a
> popular use case, and I guess nobody will complain about slightly better
> performance either.

Right, the btrfs_flush_all_pending_stuffs function can just be deleted. But Liu Bo's patch isn't required on 3.14 (since the regression he fixed came with 3.15). And these changes are big enough that I like to test them a little here before sending them out.

I did mark that patch as 3.15+, but really that deadlock has been there forever. We only started seeing it with 3.15+ because other waitqueue problems made it stand out.

-chris
Re: Btrfs stable updates for v3.16
On Wed, 03 Sep 2014 16:50:47 -0400, Chris Mason wrote:

> Hi everyone,
>
> For 3.16, please pull these into stable, I've cherry picked and tested
> them here. For 3.15 and earlier there are a few conflicts, so I'll make
> a git tree with things to pull.
>
> 8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+

This ("fix filemap_flush call in btrfs_file_release") is the only one that requires some work for 3.14.

There is one conflict in ordered-data.c - just a slight work queue submission change - and the second in transaction.c where the patch does not delete enough from btrfs_flush_all_pending_stuffs(), since 3.14 still has the old qgroup calls in place. I removed it wholesale and that makes everything fit.

The followup ("fix filemap_flush call in btrfs_file_release") then also applies.

Should they also go into the next 3.14.x stable cycle? This rename deadlock sounds like a possible problem with rsync, which seems like a popular use case, and I guess nobody will complain about slightly better performance either.

Holger
Re: INFO: task btrfs-transacti:2408 blocked for more than 120 seconds.
Martin Steigerwald posted on Thu, 04 Sep 2014 00:02:03 +0200 as excerpted:

> On Wednesday, 3 September 2014, 19:17:17 you wrote:
>> At a 32 bit stable Gentoo Linux I do have 2 BTRFS file systems :
>>
>> $ mount | grep btrfs
>> /var/lib/portage.fs on /usr/portage type btrfs (rw,noatime,compress=lzo)
>> /var/lib/pkg.fs on /var/db/pkg type btrfs (rw,noatime,compress=lzo)
>>
>> holding a lot of small Gentoo-package-Manager-related files. The first
>> is exported via NMFS so that my KVM can access that tree too.
>>
>> Today I got a hang while upgrading a package at the host and one within
>> the KVM at the same time, syslog tells me:
>
> Which kernel is this?

From the posted log:

>> Sep 3 19:10:57 n22 kernel: Not tainted 3.16.1 #5

=:^)

(FWIW even tho I don't claim to be a dev or to otherwise make much sense of traces like the one posted, I /have/ learned to look for the kernel version in a line near the top of the trace. I can make sense of that, at least, and it can sometimes save a bit of confusion when the poster claims to be using one version but is obviously a bit confused themselves as the trace says it's something else. =:^)

> If this is anything less than 3.17-rc3, I suggest you try with that one,
> or wait till the hang fix patches got into stable trees. Chris´s recent
> pull request may have been about these.

Agreed. Very likely the following known issue:

Kernel 3.15 switched various critical btrfs tasks from private btrfs threads to the generic kworker kernel threads infrastructure, but in the process triggered a previously latent kworker lockdep bug where the kworker threads weren't behaving according to their documentation. Developing and testing a proper fix to the root kworker threads behavior issue will probably take another kernel cycle or two, but in the mean time a btrfs patch working around the problem has been developed and tested. It's in 3.17-rc3 and marked for stable but not yet in a stable release.
So 3.14 stable series wasn't affected by the problem as btrfs was still using private kernel threads, previous versions aren't recommended as they had other now known and fixed bugs, 3.15 is AFAIK not a long-term-stable series and is unlikely to get the patch unless you apply it yourself, 3.16 isn't a long-term-stable either but is still supported and the patch is queued for the next stable release, and 3.17 thru rc2 doesn't have the fix but it's in rc3. So 3.17-rc3+ is the only non-git mainline kernel with the patch applied at this time.

For this bug you therefore have the following choices:

1) Switch to the latest 3.17 series development kernel. (Preferred)

2) Live with it until the next 3.16 stable series release.

3) Grab and apply the patch to a previous 3.15 or 3.16 stable series kernel yourself.

4) Revert to 3.14 stable series, which wasn't affected.

5) Turn off the compression mount option and do a rebalance to eliminate existing compression, as the bug only triggers when dealing with btrfs compression.

6) Live on the /real/ edge and switch to btrfs integration series kernels, with patches undergoing testing for the /next/ mainline kernel series (3.18 at this point).

FWIW, there's another much harder to trigger (thus only recently found, traced and patched) bug that goes back much farther (3.4 at least), with a patch in 3.17-rc3 and headed for stable series as well. However, given the rarity of triggering it and the fact that people have lived with it until now, while that patch is good to apply to prevent future rare-case issues, it's not as urgent as the one above.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: INFO: task btrfs-transacti:2408 blocked for more than 120 seconds.
Am Mittwoch, 3. September 2014, 19:17:17 schrieben Sie: > At a 32 bit stable Gentoo Linux I do have 2 BTRFS file systems : > > $ mount | grep btrfs > /var/lib/portage.fs on /usr/portage type btrfs (rw,noatime,compress=lzo) > /var/lib/pkg.fs on /var/db/pkg type btrfs (rw,noatime,compress=lzo) > > holding a lot of small Gentoo-package-Manager-related files. The first is > exported via NMFS so that my KVM can access that tree too. > > Today I got a hang while upgrading a package at the host and one within the > KVM at the same time, syslog tells me: Which kernel is this? If this is anything less than 3.17-rc3, I suggest you try with that one, or wait till the hang fix patches got into stable trees. Chris´s recent pull request may have been about these. Thanks, Martin > Sep 3 19:10:57 n22 kernel: INFO: task btrfs-transacti:2408 blocked for more > than 120 seconds. Sep 3 19:10:57 n22 kernel: Not tainted 3.16.1 #5 > Sep 3 19:10:57 n22 kernel: "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 3 > 19:10:57 n22 kernel: btrfs-transacti D 0 2408 2 > 0x Sep 3 19:10:57 n22 kernel: edb15a34 0086 c10c7efc > 0001 5197f8f9 1fa0 c17bf880 Sep 3 19:10:57 n22 kernel: c17bf880 > f3630880 f10167c0 c1099ed3 0002 0001 c10c7efc Sep 3 > 19:10:57 n22 kernel: 118a2543 0857 0018 c24d94d3 002180b0 > 0062c84a abf93cc6 Sep 3 19:10:57 n22 kernel: Call Trace: > Sep 3 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20 > Sep 3 19:10:57 n22 kernel: [] ? ktime_get_ts+0x83/0x180 Sep 3 > 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20 Sep > 3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50 Sep 3 > 19:10:57 n22 kernel: [] io_schedule+0x86/0x100 > Sep 3 19:10:57 n22 kernel: [] sleep_on_page+0xd/0x20 > Sep 3 19:10:57 n22 kernel: [] __wait_on_bit+0x51/0x80 > Sep 3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50 > Sep 3 19:10:57 n22 kernel: [] wait_on_page_bit+0x83/0x90 > Sep 3 19:10:57 n22 kernel: [] ? 
> autoremove_wake_function+0x40/0x40 Sep 3 19:10:57 n22 kernel: [] > read_extent_buffer_pages+0x2dc/0x2f0 Sep 3 19:10:57 n22 kernel: > [] btree_read_extent_buffer_pages.constprop.50+0xc8/0x140 Sep 3 > 19:10:57 n22 kernel: [] ? free_root_pointers+0x50/0x50 Sep 3 > 19:10:57 n22 kernel: [] read_tree_block+0x3c/0x60 Sep 3 19:10:57 > n22 kernel: [] read_block_for_search.isra.30+0x141/0x390 Sep 3 > 19:10:57 n22 kernel: [] btrfs_search_slot+0x3a7/0x870 Sep 3 > 19:10:57 n22 kernel: [] lookup_inline_extent_backref+0x132/0x6e0 > Sep 3 19:10:57 n22 kernel: [] ? update_curr+0xeb/0x1a0 > Sep 3 19:10:57 n22 kernel: [] ? cpuacct_charge+0x6e/0x90 > Sep 3 19:10:57 n22 kernel: [] __btrfs_free_extent+0x13d/0xd10 > Sep 3 19:10:57 n22 kernel: [] ? _raw_spin_unlock+0x22/0x30 > Sep 3 19:10:57 n22 kernel: [] ? > __btrfs_run_delayed_refs+0x117/0x1260 Sep 3 19:10:57 n22 kernel: > [] __btrfs_run_delayed_refs+0x8d7/0x1260 Sep 3 19:10:57 n22 > kernel: [] ? finish_task_switch+0x79/0x100 Sep 3 19:10:57 n22 > kernel: [] ? mutex_unlock+0xd/0x10 > Sep 3 19:10:57 n22 kernel: [] > btrfs_run_delayed_refs.part.60+0x58/0x220 Sep 3 19:10:57 n22 kernel: > [] ? btrfs_run_ordered_operations+0x1b7/0x240 Sep 3 19:10:57 n22 > kernel: [] btrfs_run_delayed_refs+0x14/0x30 Sep 3 19:10:57 n22 > kernel: [] btrfs_commit_transaction+0x45/0xc70 Sep 3 19:10:57 > n22 kernel: [] ? start_transaction+0x7e/0x5b0 Sep 3 19:10:57 n22 > kernel: [] transaction_kthread+0x195/0x220 Sep 3 19:10:57 n22 > kernel: [] ? btrfs_cleanup_transaction+0x490/0x490 Sep 3 > 19:10:57 n22 kernel: [] kthread+0xa6/0xc0 > Sep 3 19:10:57 n22 kernel: [] ret_from_kernel_thread+0x21/0x30 > Sep 3 19:10:57 n22 kernel: [] ? 
> kthread_create_on_node+0x180/0x180 Sep 3 19:10:57 n22 kernel: 2 locks held > by btrfs-transacti/2408: > Sep 3 19:10:57 n22 kernel: #0: > (&fs_info->transaction_kthread_mutex){..}, at: [] > transaction_kthread+0x107/0x220 Sep 3 19:10:57 n22 kernel: #1: > (&head_ref->mutex){..}, at: [] > btrfs_delayed_ref_lock+0x2f/0x1f0 > > > Just FWIW -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs stable updates for v3.16
On Wed, Sep 03, 2014 at 04:50:47PM -0400, Chris Mason wrote:
> Hi everyone,
>
> For 3.16, please pull these into stable, I've cherry picked and tested
> them here. For 3.15 and earlier there are a few conflicts, so I'll make
> a git tree with things to pull.
>
> 8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+
> 38c1c2e44bacb37efd68b90b3f70386a8ee370ee v3.11+
> f6dc45c7a93a011dff6eb9b2ffda59c390c7705a v3.15+
> 9e0af23764344f7f1b68e4eefbe7dc865018b63d v3.15+

Now applied to the trees I manage.

thanks,

greg k-h
Re: Btrfs stable updates for 3.16.x (and others)
On Tue, Aug 19, 2014 at 01:10:45PM +0200, David Sterba wrote:
> Hi stable team,
>
> please add the following patches to stable trees.
>
> Patch #3 applies to all currently live stables, a 7 years old bug. I've
> briefly reviewed all 3 patches against 3.10/12/14/16 (ie. 3.4 skips #1
> and #2).
>
> Subjects:
> Btrfs: read lock extent buffer while walking backrefs
> Btrfs: fix compressed write corruption on enospc
> Btrfs: fix csum tree corruption, duplicate and outdated checksums
> Commits:
> 6f7ff6d7832c6be13e8c95598884dbc40ad69fb7

This doesn't apply to 3.10-stable :(

> ce62003f690dff38d3164a632ec69efa15c32cbf

Neither did this.

> 27b9a8122ff71a8cadfbffb9c4f0694300464f3b

Was already marked for stable.

thanks,

greg k-h
Btrfs stable updates for v3.16
Hi everyone,

For 3.16, please pull these into stable, I've cherry picked and tested them here. For 3.15 and earlier there are a few conflicts, so I'll make a git tree with things to pull.

8d875f95da43c6a8f18f77869f2ef26e9594fecc v3.15+
38c1c2e44bacb37efd68b90b3f70386a8ee370ee v3.11+
f6dc45c7a93a011dff6eb9b2ffda59c390c7705a v3.15+
9e0af23764344f7f1b68e4eefbe7dc865018b63d v3.15+

-chris
Re: Large files, nodatacow and fragmentation
Hi Richard,

> It is interesting that for me the number of extents before and after
> bcache are essentially the same.
>
> The lesson here for me is that the fragmentation of a btrfs
> nodatacow file is not mitigated by bcache. There seems to be nothing I
> can do to prevent that fragmentation, and may in fact be expected
> behavior.

This is to be expected - bcache behaves like a single, transparent block device - so for btrfs it doesn't matter whether you run on a "real" device or a bcache one. The performance increase is expected, however ;)

Best regards, Clemens
kernel BUG at fs/btrfs/extent-tree.c:7727! with 3.17-rc3
Got the following with 3.17-rc3 and running balance (had to power cycle after that): [ 1329.952600] [ cut here ] [ 1329.952671] WARNING: CPU: 7 PID: 3106 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x377/0x3eb [btrfs]() [ 1329.952726] Modules linked in: ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables cpufreq_ondemand cpufreq_conservative cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq zlib_deflate coretemp hwmon loop parport_pc parport pcspkr i2c_i801 tpm_infineon tpm_tis tpm i2ccore video battery lpc_ich mfd_core ehci_pci ehci_hcd button acpi_cpufreq ext4 crc16 jbd2 mbcache raid1 sg sd_mod ahci libahci libata scsi_mod r8169 mii [ 1329.954740] CPU: 7 PID: 3106 Comm: btrfs-balance Not tainted 3.17.0-rc3 #1 [ 1329.954789] Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 1101 02/04/2013 [ 1329.954841] 0009 880733d4f8d8 813ab092 [ 1329.955030] 880733d4f918 81039b41 0007 [ 1329.955219] a02d8560 8807aa536120 [ 1329.955407] Call Trace: [ 1329.955455] [] dump_stack+0x46/0x58 [ 1329.955503] [] warn_slowpath_common+0x77/0x91 [ 1329.955610] [] ? btrfs_lookup_extent_info+0x377/0x3eb [btrfs] [ 1329.955758] [] warn_slowpath_null+0x15/0x17 [ 1329.955862] [] btrfs_lookup_extent_info+0x377/0x3eb [btrfs] [ 1329.956018] [] walk_down_proc+0xc5/0x22b [btrfs] [ 1329.956128] [] ? join_transaction.isra.30+0x24/0x309 [btrfs] [ 1329.956285] [] walk_down_tree+0x45/0xd5 [btrfs] [ 1329.956391] [] btrfs_drop_snapshot+0x2f5/0x68f [btrfs] [ 1329.956505] [] merge_reloc_roots+0x139/0x23f [btrfs] [ 1329.956617] [] relocate_block_group+0x466/0x4de [btrfs] [ 1329.956728] [] btrfs_relocate_block_group+0x158/0x278 [btrfs] [ 1329.956890] [] btrfs_relocate_chunk.isra.62+0x58/0x5f7 [btrfs] [ 1329.957073] [] ? btrfs_set_lock_blocking_rw+0x68/0x95 [btrfs] [ 1329.957214] [] ? btrfs_set_path_blocking+0x23/0x54 [btrfs] [ 1329.957297] [] ? 
btrfs_search_slot+0x7bc/0x816 [btrfs] [ 1329.957382] [] ? free_extent_buffer+0x6f/0x7c [btrfs] [ 1329.957467] [] btrfs_balance+0xa7b/0xc80 [btrfs] [ 1329.957547] [] ? printk+0x48/0x4a [ 1329.957629] [] balance_kthread+0x57/0x7c [btrfs] [ 1329.957724] [] ? btrfs_balance+0xc80/0xc80 [btrfs] [ 1329.957807] [] ? btrfs_balance+0xc80/0xc80 [btrfs] [ 1329.957887] [] kthread+0xcd/0xd5 [ 1329.957965] [] ? kthread_freezable_should_stop+0x43/0x43 [ 1329.958045] [] ret_from_fork+0x7c/0xb0 [ 1329.958122] [] ? kthread_freezable_should_stop+0x43/0x43 [ 1329.958210] ---[ end trace a368b0643f9207e2 ]--- [ 1329.958293] [ cut here ] [ 1329.958378] kernel BUG at fs/btrfs/extent-tree.c:7727! [ 1329.958455] invalid opcode: [#1] SMP [ 1329.958593] Modules linked in: ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables cpufreq_ondemand cpufreq_conservative cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq zlib_deflate coretemp hwmon loop parport_pc parport pcspkr i2c_i801 tpm_infineon tpm_tis tpm i2ccore video battery lpc_ich mfd_core ehci_pci ehci_hcd button acpi_cpufreq ext4 crc16 jbd2 mbcache raid1 sg sd_mod ahci libahci libata scsi_mod r8169 mii [ 1329.960684] CPU: 7 PID: 3106 Comm: btrfs-balance Tainted: GW 3.17.0-rc3 #1 [ 1329.960803] Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 1101 02/04/2013 [ 1329.960924] task: 8807f18c ti: 880733d4c000 task.ti: 880733d4c000 [ 1329.961043] RIP: 0010:[] [] walk_down_proc+0xdc/0x22b [btrfs] [ 1329.961200] RSP: 0018:880733d4f9e8 EFLAGS: 00010246 [ 1329.961277] RAX: RBX: 0002 RCX: 000f5a50 [ 1329.961356] RDX: 000f5a4f RSI: 88081fbd9650 RDI: 00019650 [ 1329.961436] RBP: 880733d4fa38 R08: ea001ea94d80 R09: 09a2 [ 1329.961515] R10: a02cbc38 R11: R12: 8807aa536d80 [ 1329.961594] R13: 880733ac5600 R14: 880660ba65c8 R15: 0002 [ 1329.961674] FS: () GS:88081fbc() knlGS: [ 1329.961794] CS: 0010 DS: ES: CR0: 80050033 [ 1329.961872] CR2: 7f7fa0c3e000 CR3: 
01611000 CR4: 001407e0 [ 1329.961951] Stack: [ 1329.962024] 880733ac5650 a02ebd20 880732a34820 8807eb201000 [ 1329.962267] 8807aa536d80 0002 880732a34820 [ 1329.962510] 8807eb201000 880733ac5600 880733d4fa98 a02dae92 [ 1329.962754] Call Trace: [ 1329.962834] [] ? join_transaction.isra.30+0x24/0x309 [btrfs] [ 1329.962957] [] walk_down_tree+0x45/0xd5 [btrfs] [ 1329.963040] [] btrfs_drop_snapshot+0x2f5/0x68f [btrfs] [ 1329.963126] [] merge_reloc
INFO: task btrfs-transacti:2408 blocked for more than 120 seconds.
At a 32 bit stable Gentoo Linux I do have 2 BTRFS file systems : $ mount | grep btrfs /var/lib/portage.fs on /usr/portage type btrfs (rw,noatime,compress=lzo) /var/lib/pkg.fs on /var/db/pkg type btrfs (rw,noatime,compress=lzo) holding a lot of small Gentoo-package-Manager-related files. The first is exported via NMFS so that my KVM can access that tree too. Today I got a hang while upgrading a package at the host and one within the KVM at the same time, syslog tells me: Sep 3 19:10:57 n22 kernel: INFO: task btrfs-transacti:2408 blocked for more than 120 seconds. Sep 3 19:10:57 n22 kernel: Not tainted 3.16.1 #5 Sep 3 19:10:57 n22 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 3 19:10:57 n22 kernel: btrfs-transacti D 0 2408 2 0x Sep 3 19:10:57 n22 kernel: edb15a34 0086 c10c7efc 0001 5197f8f9 1fa0 c17bf880 Sep 3 19:10:57 n22 kernel: c17bf880 f3630880 f10167c0 c1099ed3 0002 0001 c10c7efc Sep 3 19:10:57 n22 kernel: 118a2543 0857 0018 c24d94d3 002180b0 0062c84a abf93cc6 Sep 3 19:10:57 n22 kernel: Call Trace: Sep 3 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20 Sep 3 19:10:57 n22 kernel: [] ? ktime_get_ts+0x83/0x180 Sep 3 19:10:57 n22 kernel: [] ? __delayacct_blkio_start+0x1c/0x20 Sep 3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50 Sep 3 19:10:57 n22 kernel: [] io_schedule+0x86/0x100 Sep 3 19:10:57 n22 kernel: [] sleep_on_page+0xd/0x20 Sep 3 19:10:57 n22 kernel: [] __wait_on_bit+0x51/0x80 Sep 3 19:10:57 n22 kernel: [] ? wait_on_page_read+0x50/0x50 Sep 3 19:10:57 n22 kernel: [] wait_on_page_bit+0x83/0x90 Sep 3 19:10:57 n22 kernel: [] ? autoremove_wake_function+0x40/0x40 Sep 3 19:10:57 n22 kernel: [] read_extent_buffer_pages+0x2dc/0x2f0 Sep 3 19:10:57 n22 kernel: [] btree_read_extent_buffer_pages.constprop.50+0xc8/0x140 Sep 3 19:10:57 n22 kernel: [] ? 
free_root_pointers+0x50/0x50 Sep 3 19:10:57 n22 kernel: [] read_tree_block+0x3c/0x60 Sep 3 19:10:57 n22 kernel: [] read_block_for_search.isra.30+0x141/0x390 Sep 3 19:10:57 n22 kernel: [] btrfs_search_slot+0x3a7/0x870 Sep 3 19:10:57 n22 kernel: [] lookup_inline_extent_backref+0x132/0x6e0 Sep 3 19:10:57 n22 kernel: [] ? update_curr+0xeb/0x1a0 Sep 3 19:10:57 n22 kernel: [] ? cpuacct_charge+0x6e/0x90 Sep 3 19:10:57 n22 kernel: [] __btrfs_free_extent+0x13d/0xd10 Sep 3 19:10:57 n22 kernel: [] ? _raw_spin_unlock+0x22/0x30 Sep 3 19:10:57 n22 kernel: [] ? __btrfs_run_delayed_refs+0x117/0x1260 Sep 3 19:10:57 n22 kernel: [] __btrfs_run_delayed_refs+0x8d7/0x1260 Sep 3 19:10:57 n22 kernel: [] ? finish_task_switch+0x79/0x100 Sep 3 19:10:57 n22 kernel: [] ? mutex_unlock+0xd/0x10 Sep 3 19:10:57 n22 kernel: [] btrfs_run_delayed_refs.part.60+0x58/0x220 Sep 3 19:10:57 n22 kernel: [] ? btrfs_run_ordered_operations+0x1b7/0x240 Sep 3 19:10:57 n22 kernel: [] btrfs_run_delayed_refs+0x14/0x30 Sep 3 19:10:57 n22 kernel: [] btrfs_commit_transaction+0x45/0xc70 Sep 3 19:10:57 n22 kernel: [] ? start_transaction+0x7e/0x5b0 Sep 3 19:10:57 n22 kernel: [] transaction_kthread+0x195/0x220 Sep 3 19:10:57 n22 kernel: [] ? btrfs_cleanup_transaction+0x490/0x490 Sep 3 19:10:57 n22 kernel: [] kthread+0xa6/0xc0 Sep 3 19:10:57 n22 kernel: [] ret_from_kernel_thread+0x21/0x30 Sep 3 19:10:57 n22 kernel: [] ? kthread_create_on_node+0x180/0x180 Sep 3 19:10:57 n22 kernel: 2 locks held by btrfs-transacti/2408: Sep 3 19:10:57 n22 kernel: #0: (&fs_info->transaction_kthread_mutex){..}, at: [] transaction_kthread+0x107/0x220 Sep 3 19:10:57 n22 kernel: #1: (&head_ref->mutex){..}, at: [] btrfs_delayed_ref_lock+0x2f/0x1f0 Just FWIW -- Toralf pgp key: 0076 E94E -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Large files, nodatacow and fragmentation
It is interesting that for me the number of extents before and after bcache are essentially the same.

The lesson here for me is that the fragmentation of a btrfs nodatacow file is not mitigated by bcache. There seems to be nothing I can do to prevent that fragmentation, and it may in fact be expected behavior.

I cannot prove that adding the SSD bcache front-end improved performance of the guest VM, though subjectively it seems to have had a positive effect. There is something systemically pathological with the VM in question, but that's a different mailing list. :)

-rb

On Tue, Sep 2, 2014 at 11:26 PM, Chris Murphy wrote:
>
> On Sep 3, 2014, at 12:01 AM, Chris Murphy wrote:
>
>> I created two pools, one xfs one btrfs, default formatting and mount
>> options. I then created a qcow2 file on each using virt-manager, also using
>> default options. And default caching (whatever that is, I think it's
>> writethrough but don't hold me to it).
>
> On the btrfs qcow2, xattr C was set.
>
>
> Chris Murphy
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
On Sep 3, 2014, at 8:11 AM, john terragon wrote:

> It's a usb2 device but doesn't it seem kind of slow?

Not atypical, I have one that's the same, and another that's ~21MB/s, both are USB 2.

[Certain older Apple Mac firmware boots faster with the slow stick than the fast one, and it turns out the block size matters. Block size 512 bytes is insanely slow (as in 100KB/s) on the "fast" stick, whereas a block size of even 32k puts it to 20+MB/s. So I think the older firmware must be initially asking for 512 byte blocks, once the kernel takes over the performance is very good.]

Chris Murphy
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
I wasn't sure which device you meant, so I ran dd against all three possible cases:

1) here's the dmcrypt device on which I mkfs.btrfs
2097152000 bytes (2.1 GB) copied, 487.265 s, 4.3 MB/s

2) here's the partition of the usb stick (which has another partition containing /boot) on top of which the dmcrypt device is created
2097152000 bytes (2.1 GB) copied, 449.693 s, 4.7 MB/s

3) here's the whole usb stick device
2097152000 bytes (2.1 GB) copied, 448.003 s, 4.7 MB/s

It's a usb2 device but doesn't it seem kind of slow?

Thanks
John

On Wed, Sep 3, 2014 at 2:36 PM, Chris Mason wrote:
> On 09/02/2014 09:31 PM, john terragon wrote:
>> Rsync finished. FWIW in the end it reported an average speed of about 900K/sec. Without autodefrag there have been no messages about hung kworkers even though rsync seemingly keeps getting hung for several minutes throughout the whole execution.
>
> So let's take a step back and figure out how fast the usb stick actually is. This will erase your usb stick, but give us an idea of its performance:
>
> dd if=/dev/zero of=/dev/ bs=20M oflag=direct count=100
>
> Note again, the above command will erase your usb stick ;) Use whatever device name you've been sending to mkfs.btrfs
>
> The kernel will allow a pretty significant amount of ram to be dirtied before forcing writeback, which is why you're seeing rsync stall at seemingly strange intervals. In the case of btrfs with compression, we add some worker threads between rsync and the device, and these may be turning the writeback into a somewhat more bursty operation.
>
> -chris
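For reference, the MB/s figures in the dd summaries above are just bytes copied divided by elapsed seconds, in decimal megabytes. A minimal sketch of that arithmetic, assuming nothing about dd internals (the helper name throughput_mb_per_s is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes),
 * the unit shown in the dd summary lines quoted above.
 * Hypothetical helper for illustration; not part of dd.
 */
static double throughput_mb_per_s(uint64_t bytes, double seconds)
{
	assert(seconds > 0.0);
	return (double)bytes / seconds / 1e6;
}
```

With bs=20M and count=100 the transfer is 100 * 20 * 1048576 = 2097152000 bytes, so the dmcrypt run (487.265 s) works out to roughly 4.3 MB/s and the raw-device run (448.003 s) to roughly 4.7 MB/s, matching what dd printed.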
[PATCH 18/18] Btrfs: modify rw_devices counter under chunk_mutex context
The rw_devices counter is often used to tune the profile during chunk allocation, so we should modify it while holding chunk_mutex; otherwise the allocator may pick the wrong chunk profile.

Signed-off-by: Miao Xie
--- fs/btrfs/volumes.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index b7f093d..1aacf5f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1649,8 +1649,8 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) if (device->writeable) { lock_chunks(root); list_del_init(&device->dev_alloc_list); + device->fs_devices->rw_devices--; unlock_chunks(root); - root->fs_info->fs_devices->rw_devices--; clear_super = true; } @@ -1795,8 +1795,8 @@ error_undo: lock_chunks(root); list_add(&device->dev_alloc_list, &root->fs_info->fs_devices->alloc_list); + device->fs_devices->rw_devices++; unlock_chunks(root); - root->fs_info->fs_devices->rw_devices++; } goto error_brelse; } -- 1.9.3
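To illustrate the idea in userspace terms: the counter and the list it describes should change inside the same critical section, so that a reader holding the lock never sees them disagree. This is only a sketch with a pthread mutex standing in for chunk_mutex; struct dev_set and its helpers are hypothetical names, not btrfs code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the fs_devices bookkeeping. */
struct dev_set {
	pthread_mutex_t chunk_mutex;
	int rw_devices;      /* number of writable devices */
	int alloc_list_len;  /* stand-in for the length of the alloc list */
};

static void dev_set_init(struct dev_set *s, int ndev)
{
	pthread_mutex_init(&s->chunk_mutex, NULL);
	s->rw_devices = ndev;
	s->alloc_list_len = ndev;
}

/* Remove one writable device: list and counter change under one lock. */
static void dev_set_remove_writable(struct dev_set *s)
{
	pthread_mutex_lock(&s->chunk_mutex);
	s->alloc_list_len--;
	s->rw_devices--;
	pthread_mutex_unlock(&s->chunk_mutex);
}

/* An allocator deciding whether a two-device profile is possible. */
static int dev_set_can_mirror(struct dev_set *s)
{
	int ok;

	pthread_mutex_lock(&s->chunk_mutex);
	/* Holds only because every writer updates both fields under the lock. */
	assert(s->rw_devices == s->alloc_list_len);
	ok = s->rw_devices >= 2;
	pthread_mutex_unlock(&s->chunk_mutex);
	return ok;
}
```

If the decrement happened after the unlock, as in the code before the patch, a concurrent dev_set_can_mirror() could run in the gap and base its decision on a stale count.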
[PATCH 13/18] Btrfs: fix unprotected device list access when cloning fs devices
We can build a new filesystem based on a seed filesystem, and we need to clone the fs devices when we open the new filesystem. But someone might clear the seed flag of the seed filesystem, then mount that filesystem and remove some devices. If we mount the new filesystem at the same time, we might access a device list that is being changed while we clone the fs devices. Fix it by holding the original fs_devices' device_list_mutex during the clone.

Signed-off-by: Miao Xie
--- fs/btrfs/volumes.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 357f911..f0173b1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -583,6 +583,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) if (IS_ERR(fs_devices)) return fs_devices; + mutex_lock(&orig->device_list_mutex); fs_devices->total_devices = orig->total_devices; /* We have held the volume lock, it is safe to get the devices. */ @@ -611,8 +612,10 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) device->fs_devices = fs_devices; fs_devices->num_devices++; } + mutex_unlock(&orig->device_list_mutex); return fs_devices; error: + mutex_unlock(&orig->device_list_mutex); free_fs_devices(fs_devices); return ERR_PTR(-ENOMEM); } -- 1.9.3
[PATCH 1/5] block: export disk_class and disk_type for btrfs
Btrfs can make a filesystem span several disks/partitions. In order to load all the disks/partitions that belong to the same filesystem, we need to scan the system, find all the devices, and register them with the kernel. Currently, we do this with a user tool, but if we forget to run it, we cannot mount the filesystem. So I want btrfs to scan the system and find all the devices by itself in the kernel. To implement that, we need disk_class and disk_type, so export them.

Signed-off-by: Miao Xie
--- block/genhd.c | 7 +-- include/linux/genhd.h | 1 + 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 791f419..8371c09 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -34,7 +34,7 @@ struct kobject *block_depr; static DEFINE_MUTEX(ext_devt_mutex); static DEFINE_IDR(ext_devt_idr); -static struct device_type disk_type; +struct device_type disk_type; static void disk_check_events(struct disk_events *ev, unsigned int *clearing_ptr); @@ -1107,9 +1107,11 @@ static void disk_release(struct device *dev) blk_put_queue(disk->queue); kfree(disk); } + struct class block_class = { .name = "block", }; +EXPORT_SYMBOL(block_class); static char *block_devnode(struct device *dev, umode_t *mode, kuid_t *uid, kgid_t *gid) @@ -1121,12 +1123,13 @@ static char *block_devnode(struct device *dev, umode_t *mode, return NULL; } -static struct device_type disk_type = { +struct device_type disk_type = { .name = "disk", .groups = disk_attr_groups, .release= disk_release, .devnode= block_devnode, }; +EXPORT_SYMBOL(disk_type); #ifdef CONFIG_PROC_FS /* diff --git a/include/linux/genhd.h b/include/linux/genhd.h index ec274e0..a701ace 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -22,6 +22,7 @@ #define part_to_dev(part) (&((part)->__dev)) extern struct device_type part_type; +extern struct device_type disk_type; extern struct kobject *block_depr; extern struct class block_class; -- 1.9.3
[PATCH 15/18] Btrfs: make the logic of source device removing more clear
Signed-off-by: Miao Xie --- fs/btrfs/dev-replace.c | 3 +-- fs/btrfs/volumes.c | 19 +++ 2 files changed, 8 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index e9cbbdb..6f662b3 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -569,8 +569,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, if (fs_info->fs_devices->latest_bdev == src_device->bdev) fs_info->fs_devices->latest_bdev = tgt_device->bdev; list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list); - if (src_device->fs_devices->seeding) - fs_info->fs_devices->rw_devices++; + fs_info->fs_devices->rw_devices++; /* replace the sysfs entry */ btrfs_kobj_rm_device(fs_info, src_device); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 24d7001..fd8141e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1819,23 +1819,18 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info, list_del_rcu(&srcdev->dev_list); list_del_rcu(&srcdev->dev_alloc_list); fs_devices->num_devices--; - if (srcdev->missing) { + if (srcdev->missing) fs_devices->missing_devices--; - if (!fs_devices->seeding) - fs_devices->rw_devices++; + + if (srcdev->writeable) { + fs_devices->rw_devices--; + /* zero out the old super if it is writable */ + btrfs_scratch_superblock(srcdev); } - if (srcdev->bdev) { + if (srcdev->bdev) fs_devices->open_devices--; - /* -* zero out the old super if it is not writable -* (e.g. seed device) -*/ - if (srcdev->writeable) - btrfs_scratch_superblock(srcdev); - } - call_rcu(&srcdev->rcu, free_device); /* -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 0/5] Scan all devices to build fs device list
This patchset implements automatic building of the device list.

Currently, a user tool must scan the devices to build the device list before the filesystem can be mounted, in particular when mounting after the btrfs module has been re-installed. That is inconvenient, and this patchset improves the situation.

With this patchset, we scan all the devices in the system to build the device list if we find that the number of devices is not right when we mount the filesystem. This way, we no longer need to scan the devices with the user tool, and we reduce the probability of mount failures caused by an incomplete device list.

---
Miao Xie (5):
  block: export disk_class and disk_type for btrfs
  Btrfs: don't return btrfs_fs_devices if the caller doesn't want it
  Btrfs: restructure btrfs_scan_one_device
  Btrfs: restructure btrfs_get_bdev_and_sb and pick up some code used later
  Btrfs: scan all the devices and build the fs device list by btrfs's self

 block/genhd.c         |   7 +-
 fs/btrfs/super.c      |   3 +
 fs/btrfs/volumes.c    | 227 --
 fs/btrfs/volumes.h    |   5 +-
 include/linux/genhd.h |   1 +
 5 files changed, 177 insertions(+), 66 deletions(-)

-- 1.9.3
[PATCH 11/18] Btrfs: fix unprotected device list access when getting the fs information
When we get the fs information, we forgot to acquire the device list mutex, so we might access a device that has been removed. Fix it by acquiring the device list mutex.

Signed-off-by: Miao Xie
--- fs/btrfs/super.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 089991d..6b98358 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1703,7 +1703,11 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf) struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv; int ret; - /* holding chunk_muext to avoid allocating new chunks */ + /* +* holding chunk_muext to avoid allocating new chunks, holding +* device_list_mutex to avoid the device being removed +*/ + mutex_lock(&fs_info->fs_devices->device_list_mutex); mutex_lock(&fs_info->chunk_mutex); rcu_read_lock(); list_for_each_entry_rcu(found, head, list) { @@ -1744,11 +1748,13 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf) ret = btrfs_calc_avail_data_space(fs_info->tree_root, &total_free_data); if (ret) { mutex_unlock(&fs_info->chunk_mutex); + mutex_unlock(&fs_info->fs_devices->device_list_mutex); return ret; } buf->f_bavail += div_u64(total_free_data, factor); buf->f_bavail = buf->f_bavail >> bits; mutex_unlock(&fs_info->chunk_mutex); + mutex_unlock(&fs_info->fs_devices->device_list_mutex); buf->f_type = BTRFS_SUPER_MAGIC; buf->f_bsize = dentry->d_sb->s_blocksize; -- 1.9.3
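The shape of this fix, an outer lock taken before the inner one and both released in reverse order on every exit path, can be sketched in userspace roughly as follows. This assumes pthread mutexes in place of the kernel mutexes; struct fs_locks, stat_devices() and the simulate_error flag are hypothetical and only mirror the structure of the diff.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the two locks used in the statfs path. */
struct fs_locks {
	pthread_mutex_t device_list_mutex;  /* outer: keeps devices from going away */
	pthread_mutex_t chunk_mutex;        /* inner: keeps new chunks from appearing */
	int num_devices;
};

/*
 * Read the device count with both locks held. A single exit label
 * guarantees the error path releases exactly what the success path does.
 */
static int stat_devices(struct fs_locks *fs, int simulate_error, int *count)
{
	int ret = 0;

	assert(fs && count);
	pthread_mutex_lock(&fs->device_list_mutex);
	pthread_mutex_lock(&fs->chunk_mutex);

	if (simulate_error) {
		ret = -EIO;
		goto out;
	}
	*count = fs->num_devices;
out:
	pthread_mutex_unlock(&fs->chunk_mutex);
	pthread_mutex_unlock(&fs->device_list_mutex);
	return ret;
}
```

The patch itself repeats the unlock pair on the early return instead of using a label; either way, the point is that no path leaves with device_list_mutex still held.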
[PATCH 09/18] Btrfs: fix unprotected device's variants on 32bits machine
->total_bytes,->disk_total_bytes,->bytes_used is protected by chunk lock when we change them, but sometimes we read them without any lock, and we might get unexpected value. We fix this problem like inode's i_size. Signed-off-by: Miao Xie --- fs/btrfs/dev-replace.c | 15 + fs/btrfs/ioctl.c | 6 ++-- fs/btrfs/volumes.c | 48 + fs/btrfs/volumes.h | 84 ++ 4 files changed, 124 insertions(+), 29 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 1be03d8..da7ac14 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -418,7 +418,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root, /* the disk copy procedure reuses the scrub code */ ret = btrfs_scrub_dev(fs_info, src_device->devid, 0, - src_device->total_bytes, + btrfs_device_get_total_bytes(src_device), &dev_replace->scrub_progress, 0, 1); ret = btrfs_dev_replace_finishing(root->fs_info, ret); @@ -555,11 +555,12 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, memcpy(uuid_tmp, tgt_device->uuid, sizeof(uuid_tmp)); memcpy(tgt_device->uuid, src_device->uuid, sizeof(tgt_device->uuid)); memcpy(src_device->uuid, uuid_tmp, sizeof(src_device->uuid)); - tgt_device->total_bytes = src_device->total_bytes; - tgt_device->disk_total_bytes = src_device->disk_total_bytes; + btrfs_device_set_total_bytes(tgt_device, src_device->total_bytes); + btrfs_device_set_disk_total_bytes(tgt_device, + src_device->disk_total_bytes); + btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used); ASSERT(list_empty(&src_device->resized_list)); tgt_device->commit_total_bytes = src_device->commit_total_bytes; - tgt_device->bytes_used = src_device->bytes_used; tgt_device->commit_bytes_used = src_device->bytes_used; if (fs_info->sb->s_bdev == src_device->bdev) fs_info->sb->s_bdev = tgt_device->bdev; @@ -650,6 +651,7 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_dev_replace_args *args) { struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; + 
struct btrfs_device *srcdev; btrfs_dev_replace_lock(dev_replace); /* even if !dev_replace_is_valid, the values are good enough for @@ -672,8 +674,9 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info, break; case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED: case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED: + srcdev = dev_replace->srcdev; args->status.progress_1000 = div64_u64(dev_replace->cursor_left, - div64_u64(dev_replace->srcdev->total_bytes, 1000)); + div64_u64(btrfs_device_get_total_bytes(srcdev), 1000)); break; } btrfs_dev_replace_unlock(dev_replace); @@ -832,7 +835,7 @@ static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info) ret = btrfs_scrub_dev(fs_info, dev_replace->srcdev->devid, dev_replace->committed_cursor_left, - dev_replace->srcdev->total_bytes, + btrfs_device_get_total_bytes(dev_replace->srcdev), &dev_replace->scrub_progress, 0, 1); ret = btrfs_dev_replace_finishing(fs_info, ret); WARN_ON(ret); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index c692c36..e78d9f9 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1550,7 +1550,7 @@ static noinline int btrfs_ioctl_resize(struct file *file, goto out_free; } - old_size = device->total_bytes; + old_size = btrfs_device_get_total_bytes(device); if (mod < 0) { if (new_size > old_size) { @@ -2732,8 +2732,8 @@ static long btrfs_ioctl_dev_info(struct btrfs_root *root, void __user *arg) } di_args->devid = dev->devid; - di_args->bytes_used = dev->bytes_used; - di_args->total_bytes = dev->total_bytes; + di_args->bytes_used = btrfs_device_get_bytes_used(dev); + di_args->total_bytes = btrfs_device_get_total_bytes(dev); memcpy(di_args->uuid, dev->uuid, sizeof(di_args->uuid)); if (dev->name) { struct rcu_string *name; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index d8e4a3d..41da102 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1308,7 +1308,7 @@ again: if (device->bytes_used > 0) { u64 len = btrfs_dev_extent_length(leaf, extent); - device->bytes_used -= len; + 
btrfs_device_set_bytes_used(device, device->bytes_used - len); spin_lock(&roo
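As background for "like inode's i_size": on 32-bit SMP kernels a 64-bit field cannot be read in one access, so i_size is guarded by a sequence counter that lets lock-free readers detect a torn read and retry. The sketch below shows that general technique in portable C11; it is an assumption-laden illustration, not the patch's implementation, and struct seq_u64 and its helpers are made-up names. Every atomic operation uses the default sequentially consistent ordering to keep the reasoning simple.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit value stored as two 32-bit halves, as a 32-bit CPU sees it. */
struct seq_u64 {
	atomic_uint seq;       /* odd while a write is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Writers must already be serialized (in the patch: by the chunk lock). */
static void seq_u64_set(struct seq_u64 *v, uint64_t val)
{
	assert(v);
	atomic_fetch_add(&v->seq, 1);                 /* now odd */
	atomic_store(&v->lo, (uint32_t)val);
	atomic_store(&v->hi, (uint32_t)(val >> 32));
	atomic_fetch_add(&v->seq, 1);                 /* even again */
}

/* Lock-free reader: retry if a write was running or completed meanwhile. */
static uint64_t seq_u64_get(struct seq_u64 *v)
{
	unsigned int start;
	uint32_t lo, hi;

	do {
		start = atomic_load(&v->seq);
		lo = atomic_load(&v->lo);
		hi = atomic_load(&v->hi);
	} while ((start & 1u) || start != atomic_load(&v->seq));

	return ((uint64_t)hi << 32) | lo;
}
```

A reader therefore never returns a value whose halves come from two different writes, which is the "unexpected value" the commit message is worried about.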
[PATCH 07/18] Btrfs: fix unprotected device->bytes_used update
We should update device->bytes_used while holding chunk_mutex, or we would get wrong data.

Signed-off-by: Miao Xie
--- fs/btrfs/volumes.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1524b3f..45e0b5d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4429,6 +4429,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, if (ret) goto error_del_extent; + for (i = 0; i < map->num_stripes; i++) + map->stripes[i].dev->bytes_used += stripe_size; + free_extent_map(em); check_raid56_incompat_flag(extent_root->fs_info, type); @@ -4500,7 +4503,6 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans, device = map->stripes[i].dev; dev_offset = map->stripes[i].physical; - device->bytes_used += stripe_size; ret = btrfs_update_device(trans, device); if (ret) goto out; -- 1.9.3
[PATCH 05/18] Btrfs: fix wrong device bytes_used in the super block
device->bytes_used will be changed when allocating a new chunk, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device. Though it is not big problem because we don't use it now, but anyway it is better that we make it be consistent with the common metadata, maybe we will use it in the future. Signed-off-by: Miao Xie --- fs/btrfs/dev-replace.c | 3 +++ fs/btrfs/disk-io.c | 3 ++- fs/btrfs/transaction.c | 1 + fs/btrfs/volumes.c | 27 +++ fs/btrfs/volumes.h | 4 5 files changed, 37 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 7877b0f..1be03d8 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -172,6 +172,8 @@ no_valid_dev_replace_entry_found: dev_replace->srcdev->commit_total_bytes; dev_replace->tgtdev->bytes_used = dev_replace->srcdev->bytes_used; + dev_replace->tgtdev->commit_bytes_used = + dev_replace->srcdev->commit_bytes_used; } dev_replace->tgtdev->is_tgtdev_for_dev_replace = 1; btrfs_init_dev_replace_tgtdev_for_resume(fs_info, @@ -558,6 +560,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, ASSERT(list_empty(&src_device->resized_list)); tgt_device->commit_total_bytes = src_device->commit_total_bytes; tgt_device->bytes_used = src_device->bytes_used; + tgt_device->commit_bytes_used = src_device->bytes_used; if (fs_info->sb->s_bdev == src_device->bdev) fs_info->sb->s_bdev = tgt_device->bdev; if (fs_info->fs_devices->latest_bdev == src_device->bdev) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 0c7ae0e..ff3ee22 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3450,7 +3450,8 @@ static int write_all_supers(struct btrfs_root *root, int max_mirrors) 
btrfs_set_stack_device_id(dev_item, dev->devid); btrfs_set_stack_device_total_bytes(dev_item, dev->commit_total_bytes); - btrfs_set_stack_device_bytes_used(dev_item, dev->bytes_used); + btrfs_set_stack_device_bytes_used(dev_item, + dev->commit_bytes_used); btrfs_set_stack_device_io_align(dev_item, dev->io_align); btrfs_set_stack_device_io_width(dev_item, dev->io_width); btrfs_set_stack_device_sector_size(dev_item, dev->sector_size); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 2f7c0be..16d0c1b 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1869,6 +1869,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, sizeof(*root->fs_info->super_copy)); btrfs_update_commit_device_size(root->fs_info); + btrfs_update_commit_device_bytes_used(root, cur_trans); spin_lock(&root->fs_info->trans_lock); cur_trans->state = TRANS_STATE_UNBLOCKED; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7b5c042..f8273bb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2370,6 +2370,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path, ASSERT(list_empty(&srcdev->resized_list)); device->commit_total_bytes = srcdev->commit_total_bytes; device->bytes_used = srcdev->bytes_used; + device->commit_bytes_used = device->bytes_used; device->dev_root = fs_info->dev_root; device->bdev = bdev; device->in_fs_metadata = 1; @@ -6009,6 +6010,7 @@ static void fill_device_from_item(struct extent_buffer *leaf, device->total_bytes = device->disk_total_bytes; device->commit_total_bytes = device->disk_total_bytes; device->bytes_used = btrfs_device_bytes_used(leaf, dev_item); + device->commit_bytes_used = device->bytes_used; device->type = btrfs_device_type(leaf, dev_item); device->io_align = btrfs_device_io_align(leaf, dev_item); device->io_width = btrfs_device_io_width(leaf, dev_item); @@ -6558,3 +6560,28 @@ void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info) unlock_chunks(fs_info->dev_root); 
mutex_unlock(&fs_devices->device_list_mutex); } + +/* Must be invoked during the transaction commit */ +void btrfs_update_commit_device_bytes_used(struct btrfs_root *root, + struct btrfs_transaction *transaction) +{ + struct extent_ma
[PATCH 5/5] Btrfs: scan all the devices and build the fs device list by btrfs's self
The original code need scan the devices and build the fs device list by the user tool by udev or users' selves. It is flexible. But if someone re-install the filesystem module, and forget to scan the devices by himself, or we plug some devices with btrfs, but udev thread is blocked and doesn't register the disk into btrfs in time, the filesystem would report that "can not open some device" when mounting the filesystem, it was uncomfortable, this patch fixes this problem by scanning all the devices if we find the number of devices is not right when we mount the filesystem. Signed-off-by: Miao Xie --- fs/btrfs/super.c | 3 ++ fs/btrfs/volumes.c | 107 +++-- fs/btrfs/volumes.h | 5 ++- 3 files changed, 103 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 6b98358..2a8c664 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1264,6 +1264,9 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags, if (error) return ERR_PTR(error); + if (fs_devices->num_devices != fs_devices->total_devices) + btrfs_scan_all_devices(fs_type); + /* * Setup a dummy root and fs_info for test/set super. 
This is because * we don't actually fill this stuff out until open_ctree, but we need diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9d52fd8..aa4665e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include "ctree.h" #include "extent_map.h" @@ -236,6 +237,29 @@ btrfs_get_bdev_and_sb_by_path(const char *device_path, fmode_t flags, return 0; } +static int +btrfs_get_bdev_and_sb_by_dev(dev_t dev, fmode_t flags, void *holder, int flush, +struct block_device **bdev, +struct buffer_head **bh) +{ + int ret; + + *bdev = blkdev_get_by_dev(dev, flags, holder); + if (IS_ERR(*bdev)) { + printk(KERN_INFO "BTRFS: open device %d:%d failed\n", + MAJOR(dev), MINOR(dev)); + return PTR_ERR(*bdev); + } + + ret = __btrfs_get_sb(*bdev, flush, bh); + if (ret) { + blkdev_put(*bdev, flags); + return ret; + } + + return 0; +} + static void requeue_list(struct btrfs_pending_bios *pending_bios, struct bio *head, struct bio *tail) { @@ -466,8 +490,9 @@ static void pending_bios_fn(struct btrfs_work *work) * < 0 - error */ static noinline int device_list_add(const char *path, - struct btrfs_super_block *disk_super, - u64 devid, struct btrfs_fs_devices **fs_devices_ret) + struct btrfs_super_block *disk_super, + u64 devid, dev_t devnum, + struct btrfs_fs_devices **fs_devices_ret) { struct btrfs_device *device; struct btrfs_fs_devices *fs_devices; @@ -493,7 +518,7 @@ static noinline int device_list_add(const char *path, if (fs_devices->opened) return -EBUSY; - device = btrfs_alloc_device(NULL, &devid, + device = btrfs_alloc_device(NULL, &devid, devnum, disk_super->dev_item.uuid); if (IS_ERR(device)) { /* we can safely leave the fs_devices entry around */ @@ -561,6 +586,7 @@ static noinline int device_list_add(const char *path, if (device->missing) { fs_devices->missing_devices--; device->missing = 0; + device->devnum = devnum; } } @@ -597,7 +623,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig) struct rcu_string *name; device = btrfs_alloc_device(NULL, &orig_dev->devid, - orig_dev->uuid); + orig_dev->devnum, orig_dev->uuid); if (IS_ERR(device)) goto error; @@ -735,7 +761,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices) fs_devices->missing_devices--; new_device = btrfs_alloc_device(NULL, &device->devid, - device->uuid); + device->devnum, device->uuid); BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ /* Safe because we are under uuid_mutex */ @@ -811,7 +837,7 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices, continue; /* Just open everything we can; ignore failures here */ - if (bt
[PATCH 16/18] Btrfs: stop mounting the fs if the non-ENOENT errors happen when opening seed fs
When we open a seed filesystem with the degraded mount option set, we continue mounting even if some devices of the seed filesystem cannot be found. But we should stop mounting if any other error happens. Fix it.

Signed-off-by: Miao Xie
--- fs/btrfs/volumes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index fd8141e..cc59fcb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6093,7 +6093,7 @@ static int read_one_dev(struct btrfs_root *root, if (memcmp(fs_uuid, root->fs_info->fsid, BTRFS_UUID_SIZE)) { ret = open_seed_devices(root, fs_uuid); - if (ret && !btrfs_test_opt(root, DEGRADED)) + if (ret && !(ret == -ENOENT && btrfs_test_opt(root, DEGRADED))) return ret; } -- 1.9.3
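The new condition is easier to check when pulled out into a predicate. The function below is a standalone restatement of the expression in the diff, with a plain bool standing in for btrfs_test_opt(root, DEGRADED); the name seed_open_error_is_fatal is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Returns true when the mount must be aborted.
 * ret follows the kernel convention: 0 on success, negative errno on failure.
 */
static bool seed_open_error_is_fatal(int ret, bool degraded)
{
	assert(ret <= 0);
	if (ret == 0)
		return false;
	/* Only a missing device is tolerated, and only on a degraded mount. */
	return !(ret == -ENOENT && degraded);
}
```

Under the old test (`ret && !degraded`), a degraded mount would also have continued after errors such as -EIO or -ENOMEM; with the new test those are fatal regardless of the mount option.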
[PATCH 14/18] Btrfs: fix use-after-free problem of the device during device replace
The problem is:

  Task0 (device scan task)          Task1 (device replace task)
  scan_one_device()
  mutex_lock(&uuid_mutex)
  device = find_device()
                                    mutex_lock(&device_list_mutex)
                                    lock_chunk()
                                    rm_and_free_source_device
                                    unlock_chunk()
                                    mutex_unlock(&device_list_mutex)
  check device

Destroying the target device if device replace fails also has the same problem.

We fix this problem by locking uuid_mutex during destroying source device or target device, just like the device remove operation.

It is a temporary solution, we can fix this problem and make the code more clear by atomic counter in the future.

Signed-off-by: Miao Xie
--- fs/btrfs/dev-replace.c | 3 +++ fs/btrfs/volumes.c | 4 +++- fs/btrfs/volumes.h | 2 ++ 3 files changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index aa4c828..e9cbbdb 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -509,6 +509,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, ret = btrfs_commit_transaction(trans, root); WARN_ON(ret); + mutex_lock(&uuid_mutex); /* keep away write_all_supers() during the finishing procedure */ mutex_lock(&root->fs_info->fs_devices->device_list_mutex); mutex_lock(&root->fs_info->chunk_mutex); @@ -536,6 +537,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, btrfs_dev_replace_unlock(dev_replace); mutex_unlock(&root->fs_info->chunk_mutex); mutex_unlock(&root->fs_info->fs_devices->device_list_mutex); + mutex_unlock(&uuid_mutex); if (tgt_device) btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device); mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); @@ -591,6 +593,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, */ mutex_unlock(&root->fs_info->chunk_mutex); mutex_unlock(&root->fs_info->fs_devices->device_list_mutex); + mutex_unlock(&uuid_mutex); /* write back the superblocks */ trans = btrfs_start_transaction(root, 0); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index
f0173b1..24d7001 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -50,7 +50,7 @@ static void __btrfs_reset_dev_stats(struct btrfs_device *dev); static void btrfs_dev_stat_print_on_error(struct btrfs_device *dev); static void btrfs_dev_stat_print_on_load(struct btrfs_device *device); -static DEFINE_MUTEX(uuid_mutex); +DEFINE_MUTEX(uuid_mutex); static LIST_HEAD(fs_uuids); static void lock_chunks(struct btrfs_root *root) @@ -1867,6 +1867,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info, { struct btrfs_device *next_device; + mutex_lock(&uuid_mutex); WARN_ON(!tgtdev); mutex_lock(&fs_info->fs_devices->device_list_mutex); if (tgtdev->bdev) { @@ -1886,6 +1887,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info, call_rcu(&tgtdev->rcu, free_device); mutex_unlock(&fs_info->fs_devices->device_list_mutex); + mutex_unlock(&uuid_mutex); } static int btrfs_find_device_by_path(struct btrfs_root *root, char *device_path, diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 76600a3..2b37da3 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -24,6 +24,8 @@ #include #include "async-thread.h" +extern struct mutex uuid_mutex; + #define BTRFS_STRIPE_LEN (64 * 1024) struct buffer_head; -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/18] Btrfs: update free_chunk_space during allocating a new chunk
We should update free_chunk_space at the time we allocate a new chunk, not when we deal with the pending device update and block group insertion, because we need the real free_chunk_space value to calculate the reserved space. If we don't update it in time, we would treat disk space that has already been allocated as free space and use it for overcommit reservation. Fix it.

Signed-off-by: Miao Xie
--- fs/btrfs/volumes.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 45e0b5d..d8e4a3d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4432,6 +4432,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, for (i = 0; i < map->num_stripes; i++) map->stripes[i].dev->bytes_used += stripe_size; + spin_lock(&extent_root->fs_info->free_chunk_lock); + extent_root->fs_info->free_chunk_space -= (stripe_size * + map->num_stripes); + spin_unlock(&extent_root->fs_info->free_chunk_lock); + free_extent_map(em); check_raid56_incompat_flag(extent_root->fs_info, type); @@ -4515,11 +4520,6 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans, goto out; } - spin_lock(&extent_root->fs_info->free_chunk_lock); - extent_root->fs_info->free_chunk_space -= (stripe_size * - map->num_stripes); - spin_unlock(&extent_root->fs_info->free_chunk_lock); - stripe = &chunk->stripe; for (i = 0; i < map->num_stripes; i++) { device = map->stripes[i].dev; -- 1.9.3
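A rough userspace picture of the accounting, assuming a pthread mutex in place of the free_chunk_lock spinlock: the space is subtracted at allocation time, so any later reservation check already sees the reduced figure. struct space_sketch, account_new_chunk() and fits_in_free_space() are hypothetical names, and the check is far simpler than the real overcommit logic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the free_chunk_space bookkeeping. */
struct space_sketch {
	pthread_mutex_t free_chunk_lock;
	uint64_t free_chunk_space;  /* bytes not yet assigned to any chunk */
};

/* Called when the chunk is allocated, not when it is finalized later. */
static void account_new_chunk(struct space_sketch *s, uint64_t stripe_size,
			      unsigned int num_stripes)
{
	uint64_t used = stripe_size * num_stripes;

	pthread_mutex_lock(&s->free_chunk_lock);
	assert(used <= s->free_chunk_space);
	s->free_chunk_space -= used;
	pthread_mutex_unlock(&s->free_chunk_lock);
}

/* Simplified reservation check that relies on the counter being current. */
static int fits_in_free_space(struct space_sketch *s, uint64_t bytes)
{
	int ok;

	pthread_mutex_lock(&s->free_chunk_lock);
	ok = bytes <= s->free_chunk_space;
	pthread_mutex_unlock(&s->free_chunk_lock);
	return ok;
}
```

If the subtraction were deferred, a check made between allocation and finalization would count the new chunk's stripes as still free, which is the overcommit problem the commit message describes.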
[PATCH 2/5] Btrfs: don't return btrfs_fs_devices if the caller doesn't want it
We are going to implement a feature where the filesystem scans all the devices in the system and builds the device set for btrfs. In that case, we needn't get btrfs_fs_devices when adding a device to the list. This patch changes device_list_add so that the caller may pass NULL for fs_devices_ret.

Signed-off-by: Miao Xie
--- fs/btrfs/volumes.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1aacf5f..740a4f9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -568,7 +568,8 @@ static noinline int device_list_add(const char *path, if (!fs_devices->opened) device->generation = found_transid; - *fs_devices_ret = fs_devices; + if (fs_devices_ret) + *fs_devices_ret = fs_devices; return ret; } -- 1.9.3
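The change is the usual optional out-parameter idiom in C: write through the pointer only when the caller supplied one. A tiny standalone sketch, in which struct fs_devices_sketch, the registry variable and device_list_add_sketch() are all hypothetical stand-ins rather than the real btrfs definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct btrfs_fs_devices. */
struct fs_devices_sketch {
	int num_devices;
};

/* Hypothetical single global device set, for illustration only. */
static struct fs_devices_sketch registry;

/*
 * Register a device. Callers that do not need the owning device set
 * (for example a bulk scan) may pass NULL for fs_devices_ret.
 */
static int device_list_add_sketch(int devid,
				  struct fs_devices_sketch **fs_devices_ret)
{
	assert(devid >= 0);
	registry.num_devices++;

	if (fs_devices_ret)
		*fs_devices_ret = &registry;
	return 0;
}
```

Without the NULL check, a scan-only caller would have to declare a dummy pointer just to satisfy the interface.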
[PATCH 10/18] Btrfs: fix unprotected system chunk array insertion
We didn't protect the system chunk array when adding a new system chunk to it. The array could be corrupted if someone else removed a system chunk from it or added one to it at the same time. Fix it by taking the chunk lock.

Signed-off-by: Miao Xie
---
 fs/btrfs/volumes.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 41da102..9f22398d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4054,10 +4054,13 @@ static int btrfs_add_system_chunk(struct btrfs_root *root,
 	u32 array_size;
 	u8 *ptr;
 
+	lock_chunks(root);
 	array_size = btrfs_super_sys_array_size(super_copy);
 	if (array_size + item_size + sizeof(disk_key)
-			> BTRFS_SYSTEM_CHUNK_ARRAY_SIZE)
+			> BTRFS_SYSTEM_CHUNK_ARRAY_SIZE) {
+		unlock_chunks(root);
 		return -EFBIG;
+	}
 
 	ptr = super_copy->sys_chunk_array + array_size;
 	btrfs_cpu_key_to_disk(&disk_key, key);
@@ -4066,6 +4069,8 @@ static int btrfs_add_system_chunk(struct btrfs_root *root,
 	memcpy(ptr, chunk, item_size);
 	item_size += sizeof(disk_key);
 	btrfs_set_super_sys_array_size(super_copy, array_size + item_size);
+	unlock_chunks(root);
+
 	return 0;
 }
-- 
1.9.3
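As a rough userspace illustration of the fix: the size check and the append must sit inside one critical section, and the early-return path must drop the lock too. The toy lock below is only a flag that asserts correct pairing; struct sys_array, add_item() and ARRAY_SIZE are hypothetical stand-ins, not the btrfs definitions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define ARRAY_SIZE 16	/* hypothetical; stands in for the real array size */

struct sys_array {
	int locked;			/* toy lock: 1 while held */
	unsigned int used;		/* bytes currently stored in buf */
	unsigned char buf[ARRAY_SIZE];
};

static void lock_array(struct sys_array *a)
{
	assert(!a->locked);
	a->locked = 1;
}

static void unlock_array(struct sys_array *a)
{
	assert(a->locked);
	a->locked = 0;
}

/* Append an item; the size check and the copy happen under one lock. */
static int add_item(struct sys_array *a, const void *item, unsigned int len)
{
	lock_array(a);
	if (a->used + len > ARRAY_SIZE) {
		unlock_array(a);	/* do not leak the lock on the error path */
		return -EFBIG;
	}
	memcpy(a->buf + a->used, item, len);
	a->used += len;
	unlock_array(a);
	return 0;
}
```

The point of the sketch is the shape of the function: one lock call at the top, and an unlock on every exit path, including the -EFBIG one.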
[PATCH 12/18] Btrfs: Fix misuse of chunk mutex
There were several problems with the chunk mutex usage:
- The chunk mutex was held while updating metadata. This could cause a nested deadlock, because updating metadata might need to allocate new chunks, which in turn requires the chunk mutex. We drop the chunk mutex in this case, because the b-tree locks and the other locking mechanisms already protect us.
- An ABBA deadlock could occur between device_list_mutex and chunk_mutex. When we update the device status, we must acquire device_list_mutex first, and then we might take chunk_mutex during the update because we need to allocate new chunks for metadata COW. But in most places we acquired chunk_mutex first and then device_list_mutex. We need to change the lock order.
- In some places we don't need to acquire chunk_mutex at all. For example, we don't need it when we free an empty seed fs_devices structure.

Signed-off-by: Miao Xie
---
 fs/btrfs/dev-replace.c | 6 +--
 fs/btrfs/extent-tree.c | 2 -
 fs/btrfs/volumes.c | 129 -
 3 files changed, 65 insertions(+), 72 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index da7ac14..aa4c828 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -510,8 +510,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	WARN_ON(ret);
 
 	/* keep away write_all_supers() during the finishing procedure */
-	mutex_lock(&root->fs_info->chunk_mutex);
 	mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
+	mutex_lock(&root->fs_info->chunk_mutex);
 	btrfs_dev_replace_lock(dev_replace);
 	dev_replace->replace_state =
 		scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED
@@ -534,8 +534,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 			      src_device->devid,
 			      rcu_str_deref(tgt_device->name), scrub_ret);
 		btrfs_dev_replace_unlock(dev_replace);
-		mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 		mutex_unlock(&root->fs_info->chunk_mutex);
+		mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 		if (tgt_device)
 			btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
 		mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
@@ -589,8 +589,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	 * superblock is scratched out so that it is no longer marked to
 	 * belong to this filesystem.
 	 */
-	mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 	mutex_unlock(&root->fs_info->chunk_mutex);
+	mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 
 	/* write back the superblocks */
 	trans = btrfs_start_transaction(root, 0);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e105558..e1ad84e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9404,8 +9404,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	memcpy(&key, &block_group->key, sizeof(key));
 
-	btrfs_clear_space_info_full(root->fs_info);
-
 	btrfs_put_block_group(block_group);
 	btrfs_put_block_group(block_group);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9f22398d..357f911 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1264,7 +1264,7 @@ out:
 
 static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
 				  struct btrfs_device *device,
-				  u64 start)
+				  u64 start, u64 *dev_extent_len)
 {
 	int ret;
 	struct btrfs_path *path;
@@ -1306,13 +1306,8 @@ again:
 		goto out;
 	}
 
-	if (device->bytes_used > 0) {
-		u64 len = btrfs_dev_extent_length(leaf, extent);
-		btrfs_device_set_bytes_used(device, device->bytes_used - len);
-		spin_lock(&root->fs_info->free_chunk_lock);
-		root->fs_info->free_chunk_space += len;
-		spin_unlock(&root->fs_info->free_chunk_lock);
-	}
+	*dev_extent_len = btrfs_dev_extent_length(leaf, extent);
+
 	ret = btrfs_del_item(trans, root, path);
 	if (ret) {
 		btrfs_error(root->fs_info, ret,
@@ -1521,7 +1516,6 @@ static int btrfs_rm_dev_item(struct btrfs_root *root,
 	key.objectid = BTRFS_DEV_ITEMS_OBJECTID;
 	key.type = BTRFS_DEV_ITEM_KEY;
 	key.offset = device->devid;
-	lock_chunks(root);
 
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret < 0)
@@ -1537,7 +1531,6 @@ static int btrfs_rm_dev_item(struct btrfs_root *root,
 		goto out;
 out:
 	btrfs_free_path(path);
-	unlock_chunks(root);
 	btrfs_commit_transaction(trans, root);
 	return ret;
 }
@@ -1726,9 +1719,7 @@ int btrfs_rm_device(struct btrfs_roo
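The ABBA deadlock described in this patch goes away once every path takes the two mutexes in the same order: device_list_mutex first, then chunk_mutex, as in the dev-replace.c hunks. A minimal sketch of checking such an order with lock ranks follows. It assumes hypothetical names throughout and does no real locking; it only models which acquisitions are allowed.

```c
#include <assert.h>

/* Hypothetical ranks: a lock may only be taken while holding lower ranks. */
enum lock_rank {
	DEVICE_LIST_MUTEX = 1,
	CHUNK_MUTEX = 2
};

struct lock_state {
	int highest_held;	/* 0 when nothing is held */
};

/* Returns 0 on success, -1 if taking `rank` now would invert the order. */
static int take(struct lock_state *s, enum lock_rank rank)
{
	if ((int)rank <= s->highest_held)
		return -1;
	s->highest_held = (int)rank;
	return 0;
}

/* Drop everything (enough for this illustration). */
static void release_all(struct lock_state *s)
{
	s->highest_held = 0;
}
```

Taking DEVICE_LIST_MUTEX and then CHUNK_MUTEX succeeds, while the reverse sequence is rejected at the second step, which is the inversion that two concurrent tasks could otherwise turn into a deadlock.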
[PATCH 04/18] Btrfs: fix wrong disk size when writing super blocks
total_bytes changes while a device is being resized, and disk_total_bytes changes once the resize has succeeded. Meanwhile, the on-disk super blocks of the previous transaction might not have been updated yet. To keep the metadata of the previous transaction consistent, we should use the size recorded in the previous transaction (commit_total_bytes) to check whether a super block lies beyond the end of the device. Fix it.

Signed-off-by: Miao Xie
---
 fs/btrfs/check-integrity.c | 2 +-
 fs/btrfs/dev-replace.c | 18 ++
 fs/btrfs/disk-io.c | 5 +++--
 fs/btrfs/scrub.c | 3 ++-
 fs/btrfs/transaction.c | 2 ++
 fs/btrfs/volumes.c | 40 +++-
 fs/btrfs/volumes.h | 18 ++
 7 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index e0033c8..cb7f3fe 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -807,7 +807,7 @@ static int btrfsic_process_superblock_dev_mirror(
 
 	/* super block bytenr is always the unmapped device bytenr */
 	dev_bytenr = btrfs_sb_offset(superblock_mirror_num);
-	if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->total_bytes)
+	if (dev_bytenr + BTRFS_SUPER_INFO_SIZE > device->commit_total_bytes)
 		return -1;
 	bh = __bread(superblock_bdev, dev_bytenr / 4096,
 		     BTRFS_SUPER_INFO_SIZE);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 72dc02e..7877b0f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -168,6 +168,8 @@ no_valid_dev_replace_entry_found:
 				dev_replace->srcdev->total_bytes;
 			dev_replace->tgtdev->disk_total_bytes =
 				dev_replace->srcdev->disk_total_bytes;
+			dev_replace->tgtdev->commit_total_bytes =
+				dev_replace->srcdev->commit_total_bytes;
 			dev_replace->tgtdev->bytes_used =
 				dev_replace->srcdev->bytes_used;
 		}
@@ -329,6 +331,20 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
 	    args->start.tgtdev_name[0] == '\0')
 		return -EINVAL;
 
+	/*
+	 * Here we commit the transaction to make sure commit_total_bytes
+	 * of all the devices are updated.
+	 */
+	trans = btrfs_attach_transaction(root);
+	if (!IS_ERR(trans)) {
+		ret = btrfs_commit_transaction(trans, root);
+		if (ret)
+			return ret;
+	} else if (PTR_ERR(trans) != -ENOENT) {
+		return PTR_ERR(trans);
+	}
+
+	/* the disk copy procedure reuses the scrub code */
 	mutex_lock(&fs_info->volume_mutex);
 	ret = btrfs_dev_replace_find_srcdev(root, args->start.srcdevid,
 					    args->start.srcdev_name,
@@ -539,6 +555,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	memcpy(src_device->uuid, uuid_tmp, sizeof(src_device->uuid));
 	tgt_device->total_bytes = src_device->total_bytes;
 	tgt_device->disk_total_bytes = src_device->disk_total_bytes;
+	ASSERT(list_empty(&src_device->resized_list));
+	tgt_device->commit_total_bytes = src_device->commit_total_bytes;
 	tgt_device->bytes_used = src_device->bytes_used;
 	if (fs_info->sb->s_bdev == src_device->bdev)
 		fs_info->sb->s_bdev = tgt_device->bdev;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index df1ae8c..0c7ae0e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3131,7 +3131,8 @@ static int write_dev_supers(struct btrfs_device *device,
 
 	for (i = 0; i < max_mirrors; i++) {
 		bytenr = btrfs_sb_offset(i);
-		if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->total_bytes)
+		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
+		    device->commit_total_bytes)
 			break;
 
 		if (wait) {
@@ -3448,7 +3449,7 @@ static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 		btrfs_set_stack_device_type(dev_item, dev->type);
 		btrfs_set_stack_device_id(dev_item, dev->devid);
 		btrfs_set_stack_device_total_bytes(dev_item,
-						   dev->disk_total_bytes);
+						   dev->commit_total_bytes);
 		btrfs_set_stack_device_bytes_used(dev_item, dev->bytes_used);
 		btrfs_set_stack_device_io_align(dev_item, dev->io_align);
 		btrfs_set_stack_device_io_width(dev_item, dev->io_width);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f8e1144..cce122b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2861,7 +2861,8 @@ static noinline_for_stack i
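A small sketch of the idea, assuming simplified hypothetical types: bounds checks for super block copies use a size that only changes at transaction commit, so an in-flight resize cannot affect them. The comparison mirrors the one in the write_dev_supers() hunk of this patch; struct device, commit_size(), super_fits() and SUPER_INFO_SIZE are illustrative names and values only.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_INFO_SIZE 4096ULL	/* assumed size of one super block copy */

struct device {
	uint64_t total_bytes;		/* may change during an in-flight resize */
	uint64_t commit_total_bytes;	/* size as of the last committed transaction */
};

/* Called at transaction commit: publish the new size. */
static void commit_size(struct device *dev)
{
	dev->commit_total_bytes = dev->total_bytes;
}

/* A super block copy at `bytenr` is written only if it fits the committed size. */
static int super_fits(const struct device *dev, uint64_t bytenr)
{
	return bytenr + SUPER_INFO_SIZE < dev->commit_total_bytes;
}
```

If a device is shrunk in the running transaction, super_fits() keeps answering according to the old, committed size until commit_size() runs, which is the consistency property the patch description asks for.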
[PATCH 17/18] Btrfs: move the missing device to its own fs device list
For a missing device, we don't know which fs it belongs to until we read its fsid from the chunk tree, so at first we add it to the current fs device list. Once we get its fsid, we should move it to its own fs device list.

Signed-off-by: Miao Xie
---
 fs/btrfs/volumes.c | 78 --
 1 file changed, 52 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cc59fcb..b7f093d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5846,10 +5846,10 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 }
 
 static struct btrfs_device *add_missing_dev(struct btrfs_root *root,
+					    struct btrfs_fs_devices *fs_devices,
 					    u64 devid, u8 *dev_uuid)
 {
 	struct btrfs_device *device;
-	struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
 
 	device = btrfs_alloc_device(NULL, &devid, dev_uuid);
 	if (IS_ERR(device))
@@ -5986,7 +5986,8 @@ static int read_one_chunk(struct btrfs_root *root, struct btrfs_key *key,
 		}
 		if (!map->stripes[i].dev) {
 			map->stripes[i].dev =
-				add_missing_dev(root, devid, uuid);
+				add_missing_dev(root, root->fs_info->fs_devices,
+						devid, uuid);
 			if (!map->stripes[i].dev) {
 				free_extent_map(em);
 				return -EIO;
@@ -6027,7 +6028,8 @@ static void fill_device_from_item(struct extent_buffer *leaf,
 	read_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
 }
 
-static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
+static struct btrfs_fs_devices *open_seed_devices(struct btrfs_root *root,
+						  u8 *fsid)
 {
 	struct btrfs_fs_devices *fs_devices;
 	int ret;
@@ -6036,49 +6038,56 @@ static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
 
 	fs_devices = root->fs_info->fs_devices->seed;
 	while (fs_devices) {
-		if (!memcmp(fs_devices->fsid, fsid, BTRFS_UUID_SIZE)) {
-			ret = 0;
-			goto out;
-		}
+		if (!memcmp(fs_devices->fsid, fsid, BTRFS_UUID_SIZE))
+			return fs_devices;
+
 		fs_devices = fs_devices->seed;
 	}
 
 	fs_devices = find_fsid(fsid);
 	if (!fs_devices) {
-		ret = -ENOENT;
-		goto out;
+		if (!btrfs_test_opt(root, DEGRADED))
+			return ERR_PTR(-ENOENT);
+
+		fs_devices = alloc_fs_devices(fsid);
+		if (IS_ERR(fs_devices))
+			return fs_devices;
+
+		fs_devices->seeding = 1;
+		fs_devices->opened = 1;
+		return fs_devices;
 	}
 
 	fs_devices = clone_fs_devices(fs_devices);
-	if (IS_ERR(fs_devices)) {
-		ret = PTR_ERR(fs_devices);
-		goto out;
-	}
+	if (IS_ERR(fs_devices))
+		return fs_devices;
 
 	ret = __btrfs_open_devices(fs_devices, FMODE_READ,
 				   root->fs_info->bdev_holder);
 	if (ret) {
 		free_fs_devices(fs_devices);
+		fs_devices = ERR_PTR(ret);
 		goto out;
 	}
 
 	if (!fs_devices->seeding) {
 		__btrfs_close_devices(fs_devices);
 		free_fs_devices(fs_devices);
-		ret = -EINVAL;
+		fs_devices = ERR_PTR(-EINVAL);
 		goto out;
 	}
 
 	fs_devices->seed = root->fs_info->fs_devices->seed;
 	root->fs_info->fs_devices->seed = fs_devices;
 out:
-	return ret;
+	return fs_devices;
 }
 
 static int read_one_dev(struct btrfs_root *root,
 			struct extent_buffer *leaf,
 			struct btrfs_dev_item *dev_item)
 {
+	struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
 	struct btrfs_device *device;
 	u64 devid;
 	int ret;
@@ -6092,31 +6101,48 @@ static int read_one_dev(struct btrfs_root *root,
 			   BTRFS_UUID_SIZE);
 
 	if (memcmp(fs_uuid, root->fs_info->fsid, BTRFS_UUID_SIZE)) {
-		ret = open_seed_devices(root, fs_uuid);
-		if (ret && !(ret == -ENOENT && btrfs_test_opt(root, DEGRADED)))
-			return ret;
+		fs_devices = open_seed_devices(root, fs_uuid);
+		if (IS_ERR(fs_devices))
+			return PTR_ERR(fs_devices);
 	}
 
 	device = btrfs_find_device(root->fs_info, devid, dev_uuid, fs_uuid);
-	if (!device || !device->bdev) {
+	if (!device) {
 		if (!btrfs_test_opt(root, DEGRADED)
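The rewritten open_seed_devices() returns either a valid pointer or an error code encoded in the pointer. For readers unfamiliar with that kernel convention, here is a userspace approximation of ERR_PTR(), PTR_ERR() and IS_ERR(). It is a sketch, not the kernel headers; the helper open_seed_sketch() and the simplified struct are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value again. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fs_devices {
	int seeding;
};

static struct fs_devices seed = { 1 };

/* Look up a seed device set: a valid pointer, or an errno in the pointer. */
static struct fs_devices *open_seed_sketch(int found)
{
	if (!found)
		return ERR_PTR(-ENOENT);
	return &seed;
}
```

A caller checks IS_ERR() on the result and converts it with PTR_ERR() only on failure, which is the shape read_one_dev() takes after this patch.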
[PATCH 4/5] Btrfs: restructure btrfs_get_bdev_and_sb and pick up some code used later
Some code in btrfs_get_bdev_and_sb will be reused by another function later, so restructure btrfs_get_bdev_and_sb and move that code into a new function.

Signed-off-by: Miao Xie
---
 fs/btrfs/volumes.c | 66 +-
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bcb19d5..9d52fd8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -193,42 +193,47 @@ static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid)
 	return NULL;
 }
 
+static int __btrfs_get_sb(struct block_device *bdev, int flush,
+			  struct buffer_head **bh)
+{
+	int ret;
+
+	if (flush)
+		filemap_write_and_wait(bdev->bd_inode->i_mapping);
+
+	ret = set_blocksize(bdev, 4096);
+	if (ret)
+		return ret;
+
+	invalidate_bdev(bdev);
+	*bh = btrfs_read_dev_super(bdev);
+	if (!*bh)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int
-btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
-		      int flush, struct block_device **bdev,
-		      struct buffer_head **bh)
+btrfs_get_bdev_and_sb_by_path(const char *device_path, fmode_t flags,
+			      void *holder, int flush,
+			      struct block_device **bdev,
+			      struct buffer_head **bh)
 {
 	int ret;
 
 	*bdev = blkdev_get_by_path(device_path, flags, holder);
-
 	if (IS_ERR(*bdev)) {
-		ret = PTR_ERR(*bdev);
 		printk(KERN_INFO "BTRFS: open %s failed\n", device_path);
-		goto error;
+		return PTR_ERR(*bdev);
 	}
 
-	if (flush)
-		filemap_write_and_wait((*bdev)->bd_inode->i_mapping);
-	ret = set_blocksize(*bdev, 4096);
+	ret = __btrfs_get_sb(*bdev, flush, bh);
 	if (ret) {
 		blkdev_put(*bdev, flags);
-		goto error;
-	}
-	invalidate_bdev(*bdev);
-	*bh = btrfs_read_dev_super(*bdev);
-	if (!*bh) {
-		ret = -EINVAL;
-		blkdev_put(*bdev, flags);
-		goto error;
+		return ret;
 	}
 
 	return 0;
-
-error:
-	*bdev = NULL;
-	*bh = NULL;
-	return ret;
 }
 
 static void requeue_list(struct btrfs_pending_bios *pending_bios,
@@ -806,8 +811,8 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 			continue;
 
 		/* Just open everything we can; ignore failures here */
-		if (btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
-					    &bdev, &bh))
+		if (btrfs_get_bdev_and_sb_by_path(device->name->str, flags,
+						  holder, 1, &bdev, &bh))
 			continue;
 
 		disk_super = (struct btrfs_super_block *)bh->b_data;
@@ -1629,10 +1634,10 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 			goto out;
 		}
 	} else {
-		ret = btrfs_get_bdev_and_sb(device_path,
-					    FMODE_WRITE | FMODE_EXCL,
-					    root->fs_info->bdev_holder, 0,
-					    &bdev, &bh);
+		ret = btrfs_get_bdev_and_sb_by_path(device_path,
+						FMODE_WRITE | FMODE_EXCL,
+						root->fs_info->bdev_holder,
+						0, &bdev, &bh);
 		if (ret)
 			goto out;
 		disk_super = (struct btrfs_super_block *)bh->b_data;
@@ -1906,8 +1911,9 @@ static int btrfs_find_device_by_path(struct btrfs_root *root, char *device_path,
 	struct buffer_head *bh;
 
 	*device = NULL;
-	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
-				    root->fs_info->bdev_holder, 0, &bdev, &bh);
+	ret = btrfs_get_bdev_and_sb_by_path(device_path, FMODE_READ,
+					    root->fs_info->bdev_holder, 0,
+					    &bdev, &bh);
 	if (ret)
 		return ret;
 	disk_super = (struct btrfs_super_block *)bh->b_data;
-- 
1.9.3
[PATCH 03/18] Btrfs: fix unprotected assignment of the target device
We didn't protect the assignments to the target device's fields. As a result, the super block update might be skipped, because we might see a wrong size for the target device while the assignment was in progress. Fix it by moving the assignment statements into the initialization function of the target device. As a further benefit, we can check earlier whether the target device is suitable.

Signed-off-by: Miao Xie
---
 fs/btrfs/dev-replace.c | 32
 fs/btrfs/volumes.c | 23 +++
 fs/btrfs/volumes.h | 1 +
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 10dfb41..72dc02e 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -330,29 +330,19 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
 		return -EINVAL;
 
 	mutex_lock(&fs_info->volume_mutex);
-	ret = btrfs_init_dev_replace_tgtdev(root, args->start.tgtdev_name,
-					    &tgt_device);
-	if (ret) {
-		btrfs_err(fs_info, "target device %s is invalid!",
-		       args->start.tgtdev_name);
-		mutex_unlock(&fs_info->volume_mutex);
-		return -EINVAL;
-	}
-
 	ret = btrfs_dev_replace_find_srcdev(root, args->start.srcdevid,
 					    args->start.srcdev_name,
 					    &src_device);
-	mutex_unlock(&fs_info->volume_mutex);
 	if (ret) {
-		ret = -EINVAL;
-		goto leave_no_lock;
+		mutex_unlock(&fs_info->volume_mutex);
+		return ret;
 	}
 
-	if (tgt_device->total_bytes < src_device->total_bytes) {
-		btrfs_err(fs_info, "target device is smaller than source device!");
-		ret = -EINVAL;
-		goto leave_no_lock;
-	}
+	ret = btrfs_init_dev_replace_tgtdev(root, args->start.tgtdev_name,
+					    src_device, &tgt_device);
+	mutex_unlock(&fs_info->volume_mutex);
+	if (ret)
+		return ret;
 
 	btrfs_dev_replace_lock(dev_replace);
 	switch (dev_replace->replace_state) {
@@ -380,10 +370,6 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
 		      src_device->devid,
 		      rcu_str_deref(tgt_device->name));
 
-	tgt_device->total_bytes = src_device->total_bytes;
-	tgt_device->disk_total_bytes = src_device->disk_total_bytes;
-	tgt_device->bytes_used = src_device->bytes_used;
-
 	/*
 	 * from now on, the writes to the srcdev are all duplicated to
 	 * go to the tgtdev as well (refer to btrfs_map_block()).
@@ -426,9 +412,7 @@ leave:
 	dev_replace->srcdev = NULL;
 	dev_replace->tgtdev = NULL;
 	btrfs_dev_replace_unlock(dev_replace);
-leave_no_lock:
-	if (tgt_device)
-		btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
+	btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 483fc6d..1646659 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2295,6 +2295,7 @@ error:
 }
 
 int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
+				  struct btrfs_device *srcdev,
 				  struct btrfs_device **device_out)
 {
 	struct request_queue *q;
@@ -2307,24 +2308,37 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
 	int ret = 0;
 
 	*device_out = NULL;
-	if (fs_info->fs_devices->seeding)
+	if (fs_info->fs_devices->seeding) {
+		btrfs_err(fs_info, "the filesystem is a seed filesystem!");
 		return -EINVAL;
+	}
 
 	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
 				  fs_info->bdev_holder);
-	if (IS_ERR(bdev))
+	if (IS_ERR(bdev)) {
+		btrfs_err(fs_info, "target device %s is invalid!", device_path);
 		return PTR_ERR(bdev);
+	}
 
 	filemap_write_and_wait(bdev->bd_inode->i_mapping);
 
 	devices = &fs_info->fs_devices->devices;
 	list_for_each_entry(device, devices, dev_list) {
 		if (device->bdev == bdev) {
+			btrfs_err(fs_info, "target device is in the filesystem!");
 			ret = -EEXIST;
 			goto error;
 		}
 	}
 
+
+	if (i_size_read(bdev->bd_inode) < srcdev->total_bytes) {
+		btrfs_err(fs_info, "target device is smaller than source device!");
+		ret = -EINVAL;
+		goto error;
+	}
+
+
 	device = btrfs_alloc_device(NULL, &devid, NULL);
 	if (IS_ERR(device)) {
 		ret = PTR_ERR(device);
@@ -2348,8 +2362,9 @@ int btrfs_init_dev_re
[PATCH 3/5] Btrfs: restructure btrfs_scan_one_device
Some code in btrfs_scan_one_device will be reused by another function later, so restructure btrfs_scan_one_device and move that code into a new function.

Signed-off-by: Miao Xie
---
 fs/btrfs/volumes.c | 57 +++--
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 740a4f9..bcb19d5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -885,24 +885,18 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 	return ret;
 }
 
-/*
- * Look for a btrfs signature on a device. This may be called out of the mount path
- * and we are not allowed to call set_blocksize during the scan. The superblock
- * is read via pagecache
- */
-int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
-			  struct btrfs_fs_devices **fs_devices_ret)
+static int __scan_device(struct block_device *bdev, const char *path,
+			 struct btrfs_fs_devices **fs_devices_ret)
 {
 	struct btrfs_super_block *disk_super;
-	struct block_device *bdev;
 	struct page *page;
 	void *p;
-	int ret = -EINVAL;
 	u64 devid;
 	u64 transid;
 	u64 total_devices;
 	u64 bytenr;
 	pgoff_t index;
+	int ret;
 
 	/*
 	 * we would like to check all the supers, but that would make
@@ -911,38 +905,30 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 	 */
 	bytenr = btrfs_sb_offset(0);
-	flags |= FMODE_EXCL;
-	mutex_lock(&uuid_mutex);
-
-	bdev = blkdev_get_by_path(path, flags, holder);
-
-	if (IS_ERR(bdev)) {
-		ret = PTR_ERR(bdev);
-		goto error;
-	}
 
 	/* make sure our super fits in the device */
 	if (bytenr + PAGE_CACHE_SIZE >= i_size_read(bdev->bd_inode))
-		goto error_bdev_put;
+		return -EINVAL;
 
 	/* make sure our super fits in the page */
 	if (sizeof(*disk_super) > PAGE_CACHE_SIZE)
-		goto error_bdev_put;
+		return -EINVAL;
 
 	/* make sure our super doesn't straddle pages on disk */
 	index = bytenr >> PAGE_CACHE_SHIFT;
 	if ((bytenr + sizeof(*disk_super) - 1) >> PAGE_CACHE_SHIFT != index)
-		goto error_bdev_put;
+		return -EINVAL;
 
 	/* pull in the page with our super */
 	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
 				   index, GFP_NOFS);
 
 	if (IS_ERR_OR_NULL(page))
-		goto error_bdev_put;
+		return -ENOMEM;
 
-	p = kmap(page);
+	ret = -EINVAL;
+	p = kmap(page);
 
 	/* align our pointer to the offset of the super block */
 	disk_super = p + (bytenr & ~PAGE_CACHE_MASK);
 
@@ -974,7 +960,30 @@ error_unmap:
 	kunmap(page);
 	page_cache_release(page);
 
-error_bdev_put:
+	return ret;
+}
+
+/*
+ * Look for a btrfs signature on a device. This may be called out of the mount path
+ * and we are not allowed to call set_blocksize during the scan. The superblock
+ * is read via pagecache
+ */
+int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
+			  struct btrfs_fs_devices **fs_devices_ret)
+{
+	struct block_device *bdev;
+	int ret;
+
+	flags |= FMODE_EXCL;
+
+	mutex_lock(&uuid_mutex);
+	bdev = blkdev_get_by_path(path, flags, holder);
+	if (IS_ERR(bdev)) {
+		ret = PTR_ERR(bdev);
+		goto error;
+	}
+
+	ret = __scan_device(bdev, path, fs_devices_ret);
 	blkdev_put(bdev, flags);
 error:
 	mutex_unlock(&uuid_mutex);
-- 
1.9.3
[PATCH 01/18] Btrfs: cleanup unused num_can_discard in fs_devices
The member variable num_can_discard of the fs_devices structure is set, but nothing ever uses it. So remove it.

Signed-off-by: Miao Xie
---
 fs/btrfs/volumes.c | 16 ++-
 fs/btrfs/volumes.h | 1 -
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e9676a4..483fc6d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -720,8 +720,6 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 			fs_devices->rw_devices--;
 		}
 
-		if (device->can_discard)
-			fs_devices->num_can_discard--;
 		if (device->missing)
 			fs_devices->missing_devices--;
 
@@ -828,10 +826,8 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		}
 
 		q = bdev_get_queue(bdev);
-		if (blk_queue_discard(q)) {
+		if (blk_queue_discard(q))
 			device->can_discard = 1;
-			fs_devices->num_can_discard++;
-		}
 
 		device->bdev = bdev;
 		device->in_fs_metadata = 0;
@@ -1835,8 +1831,7 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 		if (!fs_devices->seeding)
 			fs_devices->rw_devices++;
 	}
-	if (srcdev->can_discard)
-		fs_devices->num_can_discard--;
+
 	if (srcdev->bdev) {
 		fs_devices->open_devices--;
 
@@ -1886,8 +1881,6 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 		fs_info->fs_devices->open_devices--;
 	}
 	fs_info->fs_devices->num_devices--;
-	if (tgtdev->can_discard)
-		fs_info->fs_devices->num_can_discard++;
 
 	next_device = list_entry(fs_info->fs_devices->devices.next,
 				 struct btrfs_device, dev_list);
@@ -2008,7 +2001,6 @@ static int btrfs_prepare_sprout(struct btrfs_root *root)
 	fs_devices->num_devices = 0;
 	fs_devices->open_devices = 0;
 	fs_devices->missing_devices = 0;
-	fs_devices->num_can_discard = 0;
 	fs_devices->rotating = 0;
 	fs_devices->seed = seed_devices;
 
@@ -2200,8 +2192,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 	root->fs_info->fs_devices->open_devices++;
 	root->fs_info->fs_devices->rw_devices++;
 	root->fs_info->fs_devices->total_devices++;
-	if (device->can_discard)
-		root->fs_info->fs_devices->num_can_discard++;
 	root->fs_info->fs_devices->total_rw_bytes += device->total_bytes;
 
 	spin_lock(&root->fs_info->free_chunk_lock);
@@ -2371,8 +2361,6 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
 	list_add(&device->dev_list, &fs_info->fs_devices->devices);
 	fs_info->fs_devices->num_devices++;
 	fs_info->fs_devices->open_devices++;
-	if (device->can_discard)
-		fs_info->fs_devices->num_can_discard++;
 	mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 
 	*device_out = device;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index e894ac6..37f8bff 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -124,7 +124,6 @@ struct btrfs_fs_devices {
 	u64 rw_devices;
 	u64 missing_devices;
 	u64 total_rw_bytes;
-	u64 num_can_discard;
 	u64 total_devices;
 	struct block_device *latest_bdev;
 
-- 
1.9.3
[PATCH 02/18] Btrfs: cleanup double assignment of device->bytes_used when device replace finishes
Signed-off-by: Miao Xie
---
 fs/btrfs/dev-replace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index a85b5f5..10dfb41 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -550,7 +550,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	tgt_device->is_tgtdev_for_dev_replace = 0;
 	tgt_device->devid = src_device->devid;
 	src_device->devid = BTRFS_DEV_REPLACE_DEVID;
-	tgt_device->bytes_used = src_device->bytes_used;
 	memcpy(uuid_tmp, tgt_device->uuid, sizeof(uuid_tmp));
 	memcpy(tgt_device->uuid, src_device->uuid, sizeof(tgt_device->uuid));
 	memcpy(src_device->uuid, uuid_tmp, sizeof(src_device->uuid));
-- 
1.9.3
[PATCH 06/18] Btrfs: Fix wrong free_chunk_space assignment during removing a device
When removing a device, we have already adjusted free_chunk_space while shrinking the device, so we don't need to assign a new value to it after the shrink. Fix it.

Signed-off-by: Miao Xie
---
 fs/btrfs/volumes.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f8273bb..1524b3f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1671,11 +1671,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	if (ret)
 		goto error_undo;
 
-	spin_lock(&root->fs_info->free_chunk_lock);
-	root->fs_info->free_chunk_space = device->total_bytes -
-		device->bytes_used;
-	spin_unlock(&root->fs_info->free_chunk_lock);
-
 	device->in_fs_metadata = 0;
 	btrfs_scrub_cancel_dev(root->fs_info, device);
 
-- 
1.9.3
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
On 09/02/2014 09:31 PM, john terragon wrote:
> Rsync finished. FWIW in the end it reported an average speed of about
> 900K/sec. Without autodefrag there have been no messages about hung
> kworkers even though rsync seemingly keeps getting hung for several
> minutes throughout the whole execution.

So let's take a step back and figure out how fast the usb stick actually is. This will erase your usb stick, but give us an idea of its performance:

dd if=/dev/zero of=/dev/ bs=20M oflag=direct count=100

Note again, the above command will erase your usb stick ;) Use whatever device name you've been sending to mkfs.btrfs.

The kernel will allow a pretty significant amount of ram to be dirtied before forcing writeback, which is why you're seeing rsync stall at seemingly strange intervals. In the case of btrfs with compression, we add some worker threads between rsync and the device, and these may be turning the writeback into a somewhat more bursty operation.

-chris
Problem with applying incremental btrfs-send
Hi, maybe someone can enlighten me. I am doing btrfs send & receive with full snapshots and incremental updates. It basically looks like this:

vol-0 and vol-1 are full subvolume image sends. inc-1 and inc-2 are incremental images created with:

# btrfs send -f inc-1 -p vol vol'
# btrfs send -f inc-2 -p vol' vol''

Case A:

vol---send> vol-0 --receive--> avol --send&rec--> bvol
 |-send> inc-1 --receive-->     |                   | <--receive-- inc-1
vol' -send> vol-1              avol'               bvol'
 |-send> inc-2 --receive-->     |                   | <--receive-- inc-2
vol''                          avol''              bvol''

which works for bvol, and which even works if bvol is removed before inc-2 is applied to bvol'.

Case B:

vol---send> vol-0 --receive--> avol
 |-send> inc-1 --receive-->     |
vol' -send> vol-1              avol' --send&rec--> bvol'
 |-send> inc-2 --receive-->     |                   | <--receive-- inc-2
vol''                          avol''              ERROR

Trying to apply inc-2 to bvol' fails with:

ERROR: could not find parent subvolume

What's the problem here?
kernel BUG at fs/btrfs/relocation.c:1065 in 3.14.16 to 3.17-rc3
Hi, I have a btrfs partition which triggers a kernel BUG, even with Linux 3.17-rc3 (I tried the 3.14.16, 3.16.1 and 3.17-rc3 kernels):

[ 45.058466] ------------[ cut here ]------------
[ 45.058539] kernel BUG at fs/btrfs/relocation.c:1065!
[ 45.058578] invalid opcode: [#1] SMP
[ 45.058655] Modules linked in: nf_conntrack iTCO_wdt iTCO_vendor_support i2c_i801 lpc_ich ehci_pci i2ccore ehci_hcd mfd_core evdev battery ie31200_edac edac_core video button btrfs xor raid6_pq dm_mod raid1 md_mod sg sd_mod crc_t10dif crct10dif_common thermal ahci libahci libata scsi_mod xhci_hcd e1000e fan ptp pps_core
[ 45.059500] CPU: 2 PID: 1740 Comm: btrfs-balance Not tainted 3.17-rc3-dae-intel #1
[ 45.059550] Hardware name: Digicube sas DediCube/DQ77MK, BIOS MKQ7710H.86A.0058.2013.0226.1541 02/26/2013
[ 45.059602] task: 8802151c17e0 ti: 8802105ec000 task.ti: 8802105ec000
[ 45.059652] RIP: 0010:[] [] build_backref_tree+0xa3d/0xcf6 [btrfs]
[ 45.059739] RSP: 0018:8802105efaf0 EFLAGS: 00010246
[ 45.059776] RAX: 8802105efb00 RBX: 880213b83800 RCX: 880210565d10
[ 45.059816] RDX: 8802105efb68 RSI: 8802105efb68 RDI: 880210565d10
[ 45.059857] RBP: 880210565d10 R08: 88021313fc40 R09: 1000
[ 45.059896] R10: 1600 R11: 6db6db6db6db6db7 R12: 8800d114d310
[ 45.059937] R13: 8802105efb78 R14: 8800d114d2c0 R15: 88021313fc40
[ 45.059977] FS: () GS:88021e28() knlGS:
[ 45.060028] CS: 0010 DS: ES: CR0: 80050033
[ 45.060066] CR2: 7f2fc9c649b8 CR3: 01611000 CR4: 001407e0
[ 45.060105] Stack:
[ 45.060138] 88021d5fc890 88021417d890 8802105efb68
[ 45.060264] 00010005 880213b83920 8802105efb78 00ffa015ecd1
[ 45.060392] 8800d114d400 8800d114d240 880210464220 880210565d40
[ 45.060516] Call Trace:
[ 45.060556] [] ? relocate_tree_blocks+0x15f/0x430 [btrfs]
[ 45.060607] [] ? tree_insert+0x44/0x47 [btrfs]
[ 45.060656] [] ? add_tree_block+0x112/0x13c [btrfs]
[ 45.060702] [] ? relocate_block_group+0x26d/0x4a6 [btrfs]
[ 45.060753] [] ? btrfs_wait_ordered_roots+0x18f/0x1ab [btrfs]
[ 45.060812] [] ? btrfs_relocate_block_group+0x154/0x265 [btrfs]
[ 45.060872] [] ? btrfs_relocate_chunk.isra.29+0x52/0x55d [btrfs]
[ 45.060932] [] ? btrfs_set_lock_blocking_rw+0xa8/0xaa [btrfs]
[ 45.060988] [] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs]
[ 45.061039] [] ? btrfs_get_token_64+0x75/0xcf [btrfs]
[ 45.061088] [] ? release_extent_buffer+0x26/0x96 [btrfs]
[ 45.061170] [] ? btrfs_balance+0x9e3/0xb78 [btrfs]
[ 45.061263] [] ? btrfs_balance+0xb78/0xb78 [btrfs]
[ 45.061314] [] ? balance_kthread+0x4f/0x6d [btrfs]
[ 45.061360] [] ? kthread+0xa7/0xaf
[ 45.061420] [] ? SyS_old_getrlimit+0x21/0xcb
[ 45.061460] [] ? __kthread_parkme+0x5b/0x5b
[ 45.061501] [] ? ret_from_fork+0x7c/0xb0
[ 45.061541] [] ? __kthread_parkme+0x5b/0x5b
[ 45.061579] Code: 26 a8 02 74 0d 4c 89 e7 e8 3c e1 ff ff 41 80 66 71 fd 49 8b 46 58 49 89 6e 58 4c 89 65 00 48 89 45 08 48 89 28 eb c0 a8 10 75 02 <0f> 0b 83 e0 01 39 44 24 10 0f 84 20 ff ff ff 0f 0b 49 8b 46 58
[ 45.063148] RIP [] build_backref_tree+0xa3d/0xcf6 [btrfs]
[ 45.063219] RSP
[ 45.063260] ---[ end trace c396e96e4d1a5697 ]---

I have dumped the filesystem with btrfs-image, but I don't know where to upload it, so you can download it at: https://daevel.fr/img/btrfs-image.out (nearly 6GB, md5sum ee5559ab31368aba60c259ce3b5b9504)
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
On Wed, Sep 03, 2014 at 08:03:51AM +0200, john terragon wrote:
> I tried the same routine on 32GB usb sticks. Same exact problems. 32GB
> seems a bit much for a --mixed btrfs.
> I haven't tried ssd_spread, maybe it's beneficial. However, as I wrote
> above, disabling autodefrag gets rid completely of the "INFO: hung
> task" messages but even though the kernel doesn't complain about
> blocked kworkers, the rsync process still blocks for several minutes
> throughout the whole copy.

It's good to know that you can reproduce it with autodefrag.

I did some analysis of the blocked stacks you provided. The key question is what prevents the writeback of the free space cache's pages from finishing: the task sits in wait_on_page_bit(), waiting on the WRITEBACK bit.

Could you please paste the output of sysrq-w and sysrq-t when you get that hang?

thanks,
-liubo

> On Wed, Sep 3, 2014 at 4:44 AM, Chris Murphy wrote:
> >
> > On Sep 2, 2014, at 12:40 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> >>
> >> Mkfs.btrfs used to default to 4 KiB node/leaf sizes; now days it defaults
> >> to 16 KiB as that's far better for most usage. I wonder if USB sticks
> >> are an exception...
> >
> > USB sticks > 1 GB get 16KB nodesize also. At <= 1 GB, mixed-bg is default
> > as is 4KB nodesize. Probably because queue/rotational is 1 for USB sticks,
> > they mount without ssd or ssd_spread which may be unfortunate (I haven't
> > benchmarked it but I suspect ssd_spread would work well for USB sticks).
> >
> > It was suggested a while ago that maybe mixed-bg should apply to larger
> > volumes, maybe up to 8GB or 16GB?
> >
> > Chris Murphy
Re: [PATCH v2 11/12] Btrfs: implement repair function when direct read fails
On Tue, 2 Sep 2014 09:05:15 -0400, Chris Mason wrote:
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index 08e65e9..56b1546 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -698,7 +719,12 @@ static void end_workqueue_bio(struct bio *bio, int err)
>>>>>
>>>>>  	fs_info = end_io_wq->info;
>>>>>  	end_io_wq->error = err;
>>>>> -	btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL);
>>>>> +
>>>>> +	if (likely(end_io_wq->metadata != BTRFS_WQ_ENDIO_DIO_REPAIR))
>>>>> +		btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL,
>>>>> +				NULL);
>>>>> +	else
>>>>> +		INIT_WORK(&end_io_wq->work.normal_work, dio_end_workqueue_fn);
>>>>
>>>> It's not clear why this one is using INIT_WORK instead of
>>>> btrfs_init_work, or why we're calling directly into queue_work instead
>>>> of btrfs_queue_work. What am I missing?
>>>
>>> I'm sorry that I forgot writing the explanation in this patch's changelog,
>>> I wrote it in Patch 0.
>>>
>>> "2.When the io on the mirror ends, we will insert the endio work into the
>>>    system workqueue, not btrfs own endio workqueue, because the original
>>>    endio work is still blocked in the btrfs endio workqueue, if we insert
>>>    the endio work of the io on the mirror into that workqueue, deadlock
>>>    would happen."
>>
>> Can you elaborate the deadlock?
>>
>> Now that buffer read can insert a subsequent read-mirror bio into btrfs endio
>> workqueue without problems, what's the difference?
>
> We do have problems if we're inserting dependent items in the same
> workqueue.
>
> Miao, please make a repair workqueue. I'll also have a use for it in
> the raid56 parity work I think.

OK, I'll update the patch soon.

Thanks
Miao
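The deadlock discussed in this thread can be illustrated with a toy model. What follows is only a sketch under stated assumptions, not the btrfs or kernel workqueue implementation: every name in it (`sim_queue`, `run_children`, `simulate`) is hypothetical, each "parent" work item is assumed to occupy one worker until its dependent "child" item has run, and parents are assumed to be picked up before any child. When the children go to the same queue and every worker is held by a waiting parent, nothing can make progress; with a separate queue for the dependent items, all parents complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a workqueue with a fixed number of workers. */
struct sim_queue {
	int workers;        /* worker threads serving this queue */
	int busy;           /* workers blocked inside a waiting parent item */
	int pending_child;  /* dependent items queued here, not yet run */
};

/* Run as many queued children as this queue has idle workers. */
static int run_children(struct sim_queue *q)
{
	assert(q->busy <= q->workers);
	int idle = q->workers - q->busy;
	int n = idle < q->pending_child ? idle : q->pending_child;

	q->pending_child -= n;
	return n;
}

/*
 * Simulate 'parents' work items that each queue one dependent item and
 * then block until it completes. Returns how many parents finish; a
 * result smaller than 'parents' means the model is stuck (deadlock).
 */
static int simulate(int workers, int parents, bool separate_repair_queue)
{
	struct sim_queue endio = { workers, 0, 0 };
	struct sim_queue repair = { workers, 0, 0 };
	struct sim_queue *child_q = separate_repair_queue ? &repair : &endio;
	int queued = parents;
	int finished = 0;

	for (;;) {
		bool progress = false;

		/* Idle endio workers pick up parents; each queues a child and blocks. */
		while (queued > 0 && endio.busy < endio.workers) {
			queued--;
			endio.busy++;
			child_q->pending_child++;
			progress = true;
		}

		/* Each child that runs unblocks exactly one waiting parent. */
		int done = run_children(child_q);
		if (done > 0) {
			endio.busy -= done;
			finished += done;
			progress = true;
		}

		if (!progress)
			return finished;
	}
}
```

In this model, `simulate(2, 4, false)` gets stuck with no parent finished, while `simulate(2, 4, true)` completes all four, which is the motivation for a dedicated repair workqueue.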
Re: Btrfs-progs-3.16: fs metadata is both single and dup?
Hugo Mills posted on Wed, 03 Sep 2014 08:33:05 +0100 as excerpted:

> On Wed, Sep 03, 2014 at 04:53:39AM +, Duncan wrote:
>> Hugo Mills posted on Tue, 02 Sep 2014 13:13:49 +0100 as excerpted:
>>
>> [A] btrfs fi df on a new filesystem always seems to have those extra
>> unused single profile lines.
>>
>> I got so the first thing I'd do on first mount was a balance -- before
>> there was anything actually on the filesystem so it was real fast -- to
>> get rid of those null entries.
>
> Interesting. Last time I tried that (balance without any contents),
> the balance removed *all* the chunks, and then the FS forgot about what
> configuration it should have and reverted to RAID-1/single. I usually
> recommend writing at least one 4k+ file to the FS first, if it's
> bothering someone so much that they can't let it go.

Interesting indeed. From memory, even before I've put anything on the filesystem it always seems to have a bit of the first chunk of both data and metadata used -- not much but enough that it's obvious in the df which mode chunks are the null-chunks, and apparently obvious to the balance as well, as it has always left me with at least a first chunk of each.

I wonder what the difference might be. Perhaps it's just the versions of kernel and/or userspace I've happened to do all my mkfs.btrfs with? Or maybe it's one of the features (like thin-metadata or noholes) I enable by default, or the fact that I use labels for partition ID and tracking, so I always fill that in. Whatever it is, it seems to put a bit of something in the filesystem, possibly at first mount, so the actually used chunks, one each of data and metadata, aren't entirely empty.

Or maybe I'm remembering wrong and I've just been lucky.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
Richard Stallman
Re: Btrfs-progs-3.16: fs metadata is both single and dup?
On Wed, Sep 03, 2014 at 04:53:39AM +, Duncan wrote:
> Hugo Mills posted on Tue, 02 Sep 2014 13:13:49 +0100 as excerpted:
>
> > On Tue, Sep 02, 2014 at 12:05:33PM +, Holger Hoffstätte wrote:
> >> So where does the confusing initial display come from? [I] don't
> >> remember ever seeing this with btrfs-progs-3.14.2.
> >
> > Your memory is faulty, I'm afraid. It's always done that -- at
> > least since I started using btrfs, several years ago.
> >
> > I believe it comes from mkfs creating a trivial basic filesystem
> > (with the single profiles), and then setting enough flags on it that the
> > kernel can bootstrap it with the desired chunks in it -- but I may be
> > wrong about that.
>
> Agreed. It's an artifact of the mkfs.btrfs process and a btrfs fi df on
> a new filesystem always seems to have those extra unused single profile
> lines.
>
> I got so the first thing I'd do on first mount was a balance -- before
> there was anything actually on the filesystem so it was real fast -- to
> get rid of those null entries.

Interesting. Last time I tried that (balance without any contents), the balance removed *all* the chunks, and then the FS forgot about what configuration it should have and reverted to RAID-1/single. I usually recommend writing at least one 4k+ file to the FS first, if it's bothering someone so much that they can't let it go.

Hugo.

> Actually, I had already created a little mkfs.btrfs helper script that
> sets options I normally want, etc, and after doing the mkfs and balance
> drill a few times, I setup the script such that if at the appropriate
> prompt I give it a mountpoint to point balance at, it'll mount the
> filesystem and immediately run a balance, thus automating things and
> making the balance part of the same scripted process that does the
> mkfs.btrfs in the first place.
>
> IOW, those null-entry lines bother me too... enough that even tho I know
> what they are I arranged things so they're automatically and immediately
> eliminated and I don't have to see 'em! =:^)
> --

=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Never underestimate the bandwidth of a Volvo filled ---
--- with backup tapes. ---
[PATCH v2 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared
An I/O error may happen while the device stats are being written out, in which case the device stats information and the dirty flag are updated at that time. The current code did not consider this case and simply cleared the dirty flag, so the new device stats information would never be written out. Fix it.

Signed-off-by: Miao Xie
---
Changelog v1 -> v2:
- Change the variable name and make some cleanup by David's comment
---
 fs/btrfs/volumes.c | 8 ++--
 fs/btrfs/volumes.h | 16
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19188df..4ea73c8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -159,6 +159,7 @@ static struct btrfs_device *__alloc_device(void)
 
 	spin_lock_init(&dev->reada_lock);
 	atomic_set(&dev->reada_in_flight, 0);
+	atomic_set(&dev->dev_stats_dirty, 0);
 	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
 	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
 
@@ -6398,16 +6399,19 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle *trans,
 	struct btrfs_root *dev_root = fs_info->dev_root;
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_device *device;
+	int dirtied;
 	int ret = 0;
 
 	mutex_lock(&fs_devices->device_list_mutex);
 	list_for_each_entry(device, &fs_devices->devices, dev_list) {
-		if (!device->dev_stats_valid || !device->dev_stats_dirty)
+		dirtied = atomic_read(&device->dev_stats_dirty);
+
+		if (!device->dev_stats_valid || !dirtied)
 			continue;
 
 		ret = update_dev_stat_item(trans, dev_root, device);
 		if (!ret)
-			device->dev_stats_dirty = 0;
+			atomic_sub(dirtied, &device->dev_stats_dirty);
 	}
 
 	mutex_unlock(&fs_devices->device_list_mutex);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6fcc8ea..9a1eff3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -110,7 +110,8 @@ struct btrfs_device {
 	/* disk I/O failure stats. For detailed description refer to
 	 * enum btrfs_dev_stat_values in ioctl.h */
 	int dev_stats_valid;
-	int dev_stats_dirty; /* counters need to be written to disk */
+
+	atomic_t dev_stats_dirty; /* counters need to be written to disk */
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
 };
 
@@ -359,11 +360,18 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root *root,
 int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
 				struct btrfs_root *extent_root,
 				u64 chunk_offset, u64 chunk_size);
+
+static inline void btrfs_dev_dirty_stat(struct btrfs_device *dev)
+{
+	smp_mb__before_atomic();
+	atomic_inc(&dev->dev_stats_dirty);
+}
+
 static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 				      int index)
 {
 	atomic_inc(dev->dev_stat_values + index);
-	dev->dev_stats_dirty = 1;
+	btrfs_dev_dirty_stat(dev);
 }
 
 static inline int btrfs_dev_stat_read(struct btrfs_device *dev,
@@ -378,7 +386,7 @@ static inline int btrfs_dev_stat_read_and_reset(struct btrfs_device *dev,
 	int ret;
 
 	ret = atomic_xchg(dev->dev_stat_values + index, 0);
-	dev->dev_stats_dirty = 1;
+	btrfs_dev_dirty_stat(dev);
 	return ret;
 }
 
@@ -386,7 +394,7 @@ static inline void btrfs_dev_stat_set(struct btrfs_device *dev,
 					int index, unsigned long val)
 {
 	atomic_set(dev->dev_stat_values + index, val);
-	dev->dev_stats_dirty = 1;
+	btrfs_dev_dirty_stat(dev);
 }
 
 static inline void btrfs_dev_stat_reset(struct btrfs_device *dev,
-- 
1.9.3
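The idea behind the patch above (sample the dirty counter, write the stats out, then subtract only the sampled amount) can be tried in user space. The following is a minimal sketch using C11 atomics rather than the kernel's atomic_t API; `dev_sketch`, `dev_init`, `mark_dirty` and `flush_stats` are hypothetical names, and the `racing_updates` parameter is an assumption added purely to simulate updates that arrive while the write-out is in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct btrfs_device. */
struct dev_sketch {
	atomic_int stat_value;   /* plays the role of one dev_stat_values[] entry */
	atomic_int stats_dirty;  /* number of updates not yet written out */
	int on_disk_value;       /* what the last write-out stored */
};

static void dev_init(struct dev_sketch *dev)
{
	atomic_init(&dev->stat_value, 0);
	atomic_init(&dev->stats_dirty, 0);
	dev->on_disk_value = 0;
}

/* Update the counter and record that a write-out is pending. */
static void mark_dirty(struct dev_sketch *dev)
{
	atomic_fetch_add(&dev->stat_value, 1);
	atomic_fetch_add(&dev->stats_dirty, 1);
}

/*
 * Write the stats out. 'racing_updates' simulates updates that happen
 * during the write (for example an I/O error caused by the write itself).
 * Returns false if there was nothing to write.
 */
static bool flush_stats(struct dev_sketch *dev, int racing_updates)
{
	int dirtied = atomic_load(&dev->stats_dirty);

	if (!dirtied)
		return false;

	dev->on_disk_value = atomic_load(&dev->stat_value);  /* the "write" */

	for (int i = 0; i < racing_updates; i++)
		mark_dirty(dev);

	/* Subtract only what was sampled; racing updates keep the device dirty. */
	int before = atomic_fetch_sub(&dev->stats_dirty, dirtied);
	assert(before >= dirtied);
	return true;
}
```

With a plain flag that is reset to 0 after the write, the racing update in this sketch would be forgotten; with the counter, `stats_dirty` stays non-zero and the next flush picks up the newer value.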