Re: Unable to fixup (regular) error in RAID1 fs
On 2014-10-29 04:02, Duncan wrote:
> Juan Orti posted on Tue, 28 Oct 2014 16:54:19 +0100 as excerpted:
>
>> [ 3713.086292] BTRFS: unable to fixup (regular) error at logical 483011874816 on dev /dev/sdb2
>> [ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev /dev/sdb2, sector 628793528, root 2500, inode 1436631, offset 4059963392, length 4096, links 1 (path: juan/.local/share/gnome-boxes/images/boxes-unknown)
>> [ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 38, gen 0
>> [ 3713.093035] BTRFS: unable to fixup (regular) error at logical 483011948544 on dev /dev/sdb2
>>
>> Why can't it fix the errors? A bad device? smartctl says the disk is OK.
>> I'm currently running a full scrub to see if it finds more errors.
>> What should I do?
>
> Btrfs raid1, and I see you have it for both data and metadata.
>
> During normal operation, when btrfs comes across a block that doesn't match its checksum, it will look to see if there's another copy of that block (which there is with raid1, which has exactly two copies) and will try to use it instead if so. If the second copy matches the checksum, all is fine and btrfs will in fact attempt to rewrite the bad copy using the good copy, as well as returning the good copy to whatever was reading it.
>
> Those corruption errors seem to indicate that it can't find a good copy to update the bad copy with -- both copies ended up bad. Either that, or it found the good copy and returned it to whatever was reading, but couldn't rewrite the bad copy for some reason.
>
> I'm not sure which of those interpretations is correct, but given that you didn't see anything else bad happening (no apps returning errors due to read errors, etc.), I'd guess the second, because otherwise whatever was doing the read should have returned an error.

When this error happened, I was editing some text files with vi, and it was painfully slow: it took 30 seconds to open a 20-line file, so something weird was going on. Anyway, no visible user-space error could be seen.
> Doing a scrub, as you already did, is the first thing I'd try here, since normal operation won't catch all the errors.
>
> BUT, you report that the scrub found no errors, which is weird. You have the log saying there are corruption errors, but scrub saying there are not.
>
> The easiest explanation for something like that is that the errors were temporary. If it happens again or regularly, consider running memcheck or the like, as it could be bad memory. Do you have ECC RAM?

I don't have ECC RAM, it's a regular desktop PC. Some RAM checks in the past have shown no errors; I'll check it again.

> Another question. Do you have skinny metadata on that btrfs? If you do, btrfs should mention skinny extents when mounting the filesystem.

No skinny metadata. I made the fs with the standard options, just with raid1 for data and metadata.

> The reason I'm asking this is that, if I'm reading the patch descriptions correctly, a recently posted patch deals with a specific skinny-metadata bug where wrong results would occasionally be returned, resulting in errors. Not being a dev, I don't have the technical ability to know for sure whether this could be connected to that or not, but it sounds like the sort of thing I might expect from a bug that intermittently returned bad data -- odd apparent corruption errors in normal use that scrub can't see, even though it's designed to catch, and fix if possible, exactly that sort of corruption error.
>
> Anyway, if scrub says no corruption, for a potential corruption error I'd be inclined to trust scrub, so I think the filesystem is fine. But if so, I'm worried about what might be triggering these intermittent errors. Certainly watch for more of them, and if you're running skinny-metadata, consider finding and applying that patch. If not, or in general, also be on the lookout for more possible hints of failing memory and/or run a good memory checker for a few hours and see if it reports all is well.
> But as they say about some kinds of potential cancer reports at times, sometimes watchful waiting is the best you can do, hoping no further symptoms show up, but being alert in case they do, to try something more drastic that isn't warranted /unless/ they do.

That's what I'll do, I'll wait and see. Thank you for your explanation.

-- 
Juan Orti
https://miceliux.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
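As a rough illustration of the read path Duncan describes in this thread (verify the checksum, fall back to the second raid1 copy, rewrite the bad copy from the good one), here is a minimal user-space sketch in C. It is not btrfs code: `toy_csum`, `read_with_repair`, and the in-memory `mirrors` array are hypothetical names made up for this example, and the checksum is a toy stand-in for the crc32c that btrfs uses by default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define NUM_MIRRORS 2   /* raid1 as discussed here keeps exactly two copies */

static_assert(NUM_MIRRORS == 2, "this sketch models two-copy raid1 only");

/* Toy checksum, an assumption for illustration; NOT the crc32c btrfs uses. */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + buf[i];
    return sum;
}

enum read_result {
    READ_OK,        /* every copy matched the expected checksum */
    READ_REPAIRED,  /* a bad copy was rewritten from a good one */
    READ_UNFIXABLE  /* no copy matched: nothing to repair from */
};

/*
 * Return a verified block in 'out' if any mirror matches 'expected',
 * and overwrite mismatching mirrors with the good copy.
 */
static enum read_result read_with_repair(uint8_t mirrors[NUM_MIRRORS][BLOCK_SIZE],
                                         uint32_t expected, uint8_t *out)
{
    int good = -1;

    for (int m = 0; m < NUM_MIRRORS; m++) {
        if (toy_csum(mirrors[m], BLOCK_SIZE) == expected) {
            good = m;
            break;
        }
    }
    if (good < 0)
        return READ_UNFIXABLE;  /* both copies bad */

    memcpy(out, mirrors[good], BLOCK_SIZE);

    enum read_result res = READ_OK;
    for (int m = 0; m < NUM_MIRRORS; m++) {
        if (m != good && toy_csum(mirrors[m], BLOCK_SIZE) != expected) {
            memcpy(mirrors[m], mirrors[good], BLOCK_SIZE);
            res = READ_REPAIRED;
        }
    }
    return res;
}
```

In this model the reader only sees an error in the `READ_UNFIXABLE` case, which matches the reasoning in the thread that an application-visible read error would be expected if both copies had really been bad.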
[PATCH v3] Btrfs: fix snapshot inconsistency after a file write followed by truncate
If right after starting the snapshot creation ioctl we perform a write against a file followed by a truncate, with both operations increasing the file's size, we can get a snapshot tree that reflects a state of the source subvolume's tree where the file truncation happened but the write operation didn't. This leaves a gap between 2 file extent items of the inode, which makes btrfs' fsck complain about it.

For example, if we perform the following file operations:

    $ mkfs.btrfs -f /dev/vdd
    $ mount /dev/vdd /mnt
    $ xfs_io -f \
          -c "pwrite -S 0xaa -b 32K 0 32K" \
          -c "fsync" \
          -c "pwrite -S 0xbb -b 32770 16K 32770" \
          -c "truncate 90123" \
          /mnt/foobar

and the snapshot creation ioctl was just called before the second write, we often can get the following inode items in the snapshot's btree:

    item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
        inode generation 146 transid 7 size 90123 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0
    item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
        inode ref index 282 namelen 10 name: foobar
    item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
        extent data disk byte 1104855040 nr 32768
        extent data offset 0 nr 32768 ram 32768
        extent compression 0
    item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
        extent data disk byte 0 nr 0
        extent data offset 0 nr 40960 ram 40960
        extent compression 0

There's a file range, corresponding to the interval [32K; ALIGN(16K + 32770, 4096)[ for which there's no file extent item covering it. This is because the file write and file truncate operations happened both right after the snapshot creation ioctl called btrfs_start_delalloc_inodes(), which means we didn't start and wait for the ordered extent that matches the write and, in btrfs_setsize(), we were able to call btrfs_cont_expand() before being able to commit the current transaction in the snapshot creation ioctl.
So this made it possible to insert the hole file extent item in the source subvolume (which represents the region added by the truncate) right before the transaction commit from the snapshot creation ioctl.

Btrfs' fsck tool complains about such cases with a message like the following:

    root 331 inode 257 errors 100, file extent discount

From a user perspective, the expectation when a snapshot is created while those file operations are being performed is that the snapshot will have a file that either:

1) is empty
2) only the first write was captured
3) only the 2 writes were captured
4) both writes and the truncation were captured

But never capture a state where only the first write and the truncation were captured (since the second write was performed before the truncation).

A test case for xfstests follows.

Signed-off-by: Filipe Manana <fdman...@suse.com>
---

V2: Use different approach to solve the problem. Don't start and wait for all delalloc to finish after every expanding truncate, instead add an additional flush at transaction commit time if we're doing a transaction commit that creates snapshots.

V3: Removed useless test condition in wait_pending_snapshot_roots_delalloc().
 fs/btrfs/transaction.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 396ae8b..5e7f004 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1714,12 +1714,65 @@ static inline void btrfs_wait_delalloc_flush(struct btrfs_fs_info *fs_info)
                 btrfs_wait_ordered_roots(fs_info, -1);
 }
 
+static int
+start_pending_snapshot_roots_delalloc(struct btrfs_trans_handle *trans,
+                                      struct list_head *splice)
+{
+        struct btrfs_pending_snapshot *pending_snapshot;
+        int ret = 0;
+
+        if (btrfs_test_opt(trans->root, FLUSHONCOMMIT))
+                return 0;
+
+        spin_lock(&trans->root->fs_info->trans_lock);
+        list_splice_init(&trans->transaction->pending_snapshots, splice);
+        spin_unlock(&trans->root->fs_info->trans_lock);
+
+        /*
+         * Start again delalloc for the roots our pending snapshots are made
+         * from. We did it before starting/joining a transaction and we do it
+         * here again because new inode operations might have happened since
+         * then and we want to make sure the snapshot captures a fully
+         * consistent state of the source root tree. For example, if after the
+         * first delalloc flush a write is made against an inode followed by
+         * an expanding truncate, we want to make sure the snapshot captured
+         * both the write and the truncation, and not just the truncation.
+         * Here we shouldn't have much delalloc work to do, as the bulk of it
+         * was done before and outside the
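The numbers in the commit message can be checked by hand. The sketch below, in plain user-space C, recomputes the uncovered interval [32K; ALIGN(16K + 32770, 4096)[ and the length of the hole extent from the xfs_io arguments. The helper names (`align_up`, `uncovered_gap`, `hole_len`) and the 4096-byte sector size constant are assumptions made for this example, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 4096ULL   /* assumed sector size, as in ALIGN(..., 4096) */

/* Round x up to the next multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

struct range {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
};

/*
 * File range left without an extent item when the second write
 * (write_off, write_len) is missing from the snapshot but the hole
 * created by the truncate is present.
 */
static struct range uncovered_gap(uint64_t first_extent_end,
                                  uint64_t write_off, uint64_t write_len)
{
    struct range r = {
        .start = first_extent_end,
        .end = align_up(write_off + write_len, SECTOR_SIZE),
    };
    assert(r.start <= r.end);
    return r;
}

/* Length of the hole extent inserted by the expanding truncate. */
static uint64_t hole_len(uint64_t hole_start, uint64_t new_size)
{
    return align_up(new_size, SECTOR_SIZE) - hole_start;
}
```

With the values from the example, `uncovered_gap(32768, 16384, 32770)` gives the range 32768 to 53248, which matches the key offset of item 123 (EXTENT_DATA 53248), and `hole_len(53248, 90123)` gives 40960, which matches the `nr 40960` of the hole extent.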
Re: [PATCH] Btrfs: don't do async reclaim during log replay V2
Ping..

On Thu, 23 Oct 2014 16:44:54 +0800, Miao Xie wrote:
> On Thu, 18 Sep 2014 11:27:17 -0400, Josef Bacik wrote:
>> Trying to reproduce a log enospc bug I hit a panic in the async reclaim code during log replay. This is because we use fs_info->fs_root as our root for shrinking and such. Technically we can use whatever root we want, but let's just not allow async reclaim while we're doing log replay. Thanks,
>
> Why not move the fs_root initialization code in front of the log replay? I think that is better than the fix in this patch, because the async reclaimer can help us do some work.
>
> Thanks
> Miao
>
>> Signed-off-by: Josef Bacik <jba...@fb.com>
>> ---
>> V1->V2: use fs_info->log_root_recovering instead, didn't notice this existed before.
>>
>>  fs/btrfs/extent-tree.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 28a27d5..44d0497 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4513,7 +4513,13 @@ again:
>>                  space_info->flush = 1;
>>          } else if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
>>                  used += orig_bytes;
>> -                if (need_do_async_reclaim(space_info, root->fs_info, used) &&
>> +                /*
>> +                 * We will do the space reservation dance during log replay,
>> +                 * which means we won't have fs_info->fs_root set, so don't do
>> +                 * the async reclaim as we will panic.
>> +                 */
>> +                if (!root->fs_info->log_root_recovering &&
>> +                    need_do_async_reclaim(space_info, root->fs_info, used) &&
>>                      !work_busy(&root->fs_info->async_reclaim_work))
>>                          queue_work(system_unbound_wq,
>>                                     &root->fs_info->async_reclaim_work);
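The guard added by the patch boils down to one extra condition in front of the existing check. A stand-alone model of that decision, assuming a hypothetical `struct fs_state` with two flags in place of the real `fs_info` fields and `work_busy()` call, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bits of btrfs_fs_info. */
struct fs_state {
    bool log_root_recovering;   /* true while the log tree is being replayed */
    bool reclaim_work_busy;     /* models work_busy() on the reclaim work item */
};

/*
 * Decide whether the async reclaim worker may be queued.
 * 'need_reclaim' models the result of need_do_async_reclaim().
 */
static bool should_queue_async_reclaim(const struct fs_state *fs,
                                       bool need_reclaim)
{
    assert(fs != NULL);

    /* During log replay fs_root is not set up yet, so never queue. */
    if (fs->log_root_recovering)
        return false;

    return need_reclaim && !fs->reclaim_work_busy;
}
```

Miao's alternative (initializing fs_root before log replay) would make the first check unnecessary; this sketch only models the approach taken in the posted patch.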
Re: [PATCH v3] Btrfs: fix snapshot inconsistency after a file write followed by truncate
On Wed, 29 Oct 2014 08:21:12, Filipe Manana wrote:
> If right after starting the snapshot creation ioctl we perform a write against a file followed by a truncate, with both operations increasing the file's size, we can get a snapshot tree that reflects a state of the source subvolume's tree where the file truncation happened but the write operation didn't.
>
> [...]
>
> V2: Use different approach to solve the problem. Don't start and wait for all delalloc to finish after every expanding truncate, instead add an additional flush at transaction commit time if we're doing a transaction commit that creates snapshots.

This method will make the transaction commit take more time. Why not use i_disk_size to expand the file size in btrfs_setsize()? Or we might rename btrfs_{start, end}_nocow_write() and use them in btrfs_setsize()?

Thanks
Miao

> V3: Removed useless test condition in wait_pending_snapshot_roots_delalloc().
>
> [...]
Re: [PATCH v2] btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots
There will be a compatibility issue with this patch when running an older kernel; sorry, I missed some combinations. As I see this is already in, I am sending a patch to back out these changes, if that helps.

Thanks.

On 09/04/14 20:02, Anand Jain wrote:
> On 09/04/2014 05:58 PM, David Sterba wrote:
>> On Mon, Aug 18, 2014 at 04:38:18PM +0800, Anand Jain wrote:
>>> ioctl BTRFS_IOC_FS_INFO returns num_devices, which does _not_ include the seed device, but the following ioctl BTRFS_IOC_DEV_INFO counts and gets the seed disk when probed. So in userland we hit a count-slot mismatch bug:
>>>
>>>     get_fs_info()
>>>     ::
>>>     BUG_ON(ndevs >= fi_args->num_devices);
>>>
>>> which hits this bug when we have mounted a seed device.
>>>
>>> So to fix this problem, in this patch ioctl BTRFS_IOC_FS_INFO will provide total_devices instead of num_devices.
>>
>> It is very unclear what the 'num_device' in the ioctl actually means.
>
> Right. That's also true in the kernel. Very messy, very confusing. The tool btrfs-devlist would help understand what's going on.
>
>     $ egrep num_device *.c | egrep total_device
>     ioctl.c:    fi_args->num_devices = fs_devices->total_devices;
>     super.c:    ret = !(fs_devices->num_devices == fs_devices->total_devices);
>     volumes.c:  total_devices = btrfs_super_num_devices(disk_super);
>
> By the way, the BTRFS_IOC_DEVICES_READY ioctl above has long been broken with seed/replace; I am just waiting to get these patches integrated first so I can fix it later.
>
>>> This would fix the problem partly. Partly because earlier num_devices included the replacing device, but now total_devices does not include the replacing device. Getting a count which includes a transient device is rather inefficient/wrong indeed, because there can be a race condition: in the time between ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO the replace operation might have completed. So to fix this problem it's better that userland btrfs-progs probes the replacing device (at devid 0) separately.
>>>
>>> v2: Agree with Wang's comment.
>>> It's better to show seed disks under the sprout fs, so that the user can establish the mapping of seed to sprout devices. So here I am making BTRFS_IOC_FS_INFO return total_devices, which counts the seed devices (but not the replacing device).
>>
>> This is even more confusing. I think we need to add another member to the ioctl struct to reflect the number of regular devices (num_devices) and the true total number of devices including seeding and replaced devices.
>
> That will be a better way. Thanks.
>
>> The difference should be accompanied by a flag that would say if there's a seeding or replace in progress.
>>
>> There are some backward compatibility concerns. Setting num_devices to total_devices changes semantics of the ioctl, so I think it should stay as is for now,
>
> As far as I have tested there is no backward compatibility issue. But from a semantics perspective .. agreed.
>
>> but the BUG_ON can be removed and replaced by code that reallocates the buffer or allocates a few more items in advance.
>
> We don't know how many seed devices there are for a sprout FS. So that's not possible.
>
> Will review and resubmit. Thanks for commenting.
>
> Anand
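David Sterba's suggestion in this thread (drop the BUG_ON and grow the buffer when more devices turn up than the reported count) can be sketched in user space as follows. This is not btrfs-progs code: `struct dev_info`, `probe_fn`, `demo_probe`, and `collect_devices` are hypothetical names for this example, and the probe callback merely stands in for a per-devid BTRFS_IOC_DEV_INFO call. The reported device count is treated as a sizing hint only, so an undercount (for instance, seed devices not being included) does not overflow the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, minimal per-device record. */
struct dev_info {
    uint64_t devid;
};

/* Hypothetical probe: returns 0 and fills 'out' if 'devid' exists. */
typedef int (*probe_fn)(uint64_t devid, struct dev_info *out);

/* Demo probe for this example: pretends devids 1, 2 and 3 exist. */
static int demo_probe(uint64_t devid, struct dev_info *out)
{
    if (devid < 1 || devid > 3)
        return -1;
    out->devid = devid;
    return 0;
}

/*
 * Probe devids 0..max_id and collect every device found. 'hint' is the
 * device count reported up front; the array grows if it was too small.
 * On success returns 0 and the caller must free(*out).
 */
static int collect_devices(probe_fn probe, uint64_t max_id, size_t hint,
                           struct dev_info **out, size_t *count)
{
    assert(probe != NULL && out != NULL && count != NULL);

    size_t cap = hint ? hint : 1;
    size_t n = 0;
    struct dev_info *devs = malloc(cap * sizeof(*devs));
    if (!devs)
        return -1;

    for (uint64_t id = 0; id <= max_id; id++) {
        struct dev_info di;

        if (probe(id, &di) != 0)
            continue;   /* no device with this devid */

        if (n == cap) {
            /* More devices than the hint promised: grow, don't abort. */
            struct dev_info *tmp = realloc(devs, 2 * cap * sizeof(*devs));
            if (!tmp) {
                free(devs);
                return -1;
            }
            devs = tmp;
            cap *= 2;
        }
        devs[n++] = di;
    }

    *out = devs;
    *count = n;
    return 0;
}
```

For example, `collect_devices(demo_probe, 5, 2, &devs, &n)` starts with room for two entries and still returns all three demo devices. Whether probing by devid up to max_id is enough to find every seed device is exactly the point Anand questions above, so this only illustrates the buffer handling, not a complete fix.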
[PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device
There is a compatibility issue with older kernels caused by the progs commit below:

    05cd2907557ba627cfb86e60b214ea6228613a84

So for now I am writing to revert the above commit. The upcoming sysfs interface would help to fix the remaining issue, which is that a seed device fails to show up in the 'btrfs fi show' output of a sprout device.

Signed-off-by: Anand Jain <anand.j...@oracle.com>
---
 utils.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/utils.c b/utils.c
index a8691fe..1d1cc77 100644
--- a/utils.c
+++ b/utils.c
@@ -1869,29 +1869,12 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
         if (!fi_args->num_devices)
                 goto out;
 
-        /*
-         * with kernel patch
-         * btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots
-         * the kernel now returns total_devices which does not include
-         * replacing device if running.
-         * As we need to get dev info of the replace device if it is running,
-         * so just add one to fi_args->num_devices.
-         */
-
-        di_args = *di_ret = malloc((fi_args->num_devices + 1) * sizeof(*di_args));
+        di_args = *di_ret = malloc((fi_args->num_devices) * sizeof(*di_args));
         if (!di_args) {
                 ret = -errno;
                 goto out;
         }
 
-        /* get the replace target device if it is there */
-        ret = get_device_info(fd, i, &di_args[ndevs]);
-        if (!ret) {
-                ndevs++;
-                fi_args->num_devices++;
-        }
-        i++;
-
         for (; i <= fi_args->max_id; ++i) {
                 BUG_ON(ndevs >= fi_args->num_devices);
                 ret = get_device_info(fd, i, &di_args[ndevs]);
-- 
2.0.0.153.g79d
Re: [PATCH] btrfs-progs: fix dev stats error output related to replace handle
Hi Gui,

We don't need this patch. To get this right you should either back out this patch:

    [PATCH] btrfs-progs: do a separate probe for _transient_ replacing device

OR apply this one:

    [PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device

Try it out and let us know.

Thanks

On 10/23/14 09:56, Gui Hecheng wrote:
> Steps to reproduce:
>
>     # mkfs.btrfs -f /dev/sdb7
>     # mount /dev/sdb7 /mnt
>     # btrfs dev stats /dev/sdb7
>
> output:
>
>     [/dev/sdb7].write_io_errs   0
>     [/dev/sdb7].read_io_errs    0
>     [/dev/sdb7].flush_io_errs   0
>     [/dev/sdb7].corruption_errs 0
>     [/dev/sdb7].generation_errs 0
>     * ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on failed: No such device
>
> while the following cmd:
>
>     # btrfs dev stats /mnt
>
> yields the right thing:
>
>     [/dev/sdb7].write_io_errs   0
>     [/dev/sdb7].read_io_errs    0
>     [/dev/sdb7].flush_io_errs   0
>     [/dev/sdb7].corruption_errs 0
>     [/dev/sdb7].generation_errs 0
>
> This is caused by commit:
>
>     commit d0588bfa479409b2a0f6243f894338a01a56221a
>     btrfs-progs: do a separate probe for transient replacing device
>
> The above commit tries to handle the fi show problem with a device under replacing, but it changes the @get_fs_info() logic in a way that breaks dev stats.
>
> For @get_fs_info():
>
> o If the passed in @path is a mount point, then the @get_device_info() call that probes the replacing device receives the device index var @i with its initial value 0, and the following i++ correctly sets @i to 1 as the start of all devices in btrfs.
>
> o If @path is a block device, then the problem comes... The device index @i is set to the devid of the block device passed in, so @get_device_info() is called with that devid instead of 0. The following i++ then skips the desired block device, and an empty slot is handled next, which causes the ERROR above.
> To fix this problem, let's just pass 0 to @get_device_info() explicitly, and set the index @i to 1 if a mount point is passed in.
>
> In my own tests, this does not affect the original fix of the fi show problem with a device under replacing.
>
> Signed-off-by: Gui Hecheng <guihc.f...@cn.fujitsu.com>
> ---
>  utils.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/utils.c b/utils.c
> index f10c178..0ba2b26 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -1881,12 +1881,15 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
>          }
>  
>          /* get the replace target device if it is there */
> -        ret = get_device_info(fd, i, &di_args[ndevs]);
> +        ret = get_device_info(fd, 0, &di_args[ndevs]);
>          if (!ret) {
>                  ndevs++;
>                  fi_args->num_devices++;
>          }
> -        i++;
> +
> +        /* if a mount point is passed in, start from devid 1 */
> +        if (fi_args->num_devices != 1)
> +                i = 1;
>  
>          for (; i <= fi_args->max_id; ++i) {
>                  BUG_ON(ndevs >= fi_args->num_devices);
Re: [PATCH] btrfs-progs: fix dev stats error output related to replace handle
On Wed, 2014-10-29 at 18:56 +0800, Anand Jain wrote:
> Hi Gui,
>
> We don't need this patch. To get this right you should either back out this patch:
>
>     [PATCH] btrfs-progs: do a separate probe for _transient_ replacing device
>
> OR apply this one:
>
>     [PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device
>
> Try it out and let us know.
>
> Thanks

Oh, yes, I've tried your revert patch and I acknowledge that it fixes the problem. So please *ignore* my patch.

David, sorry for the noise.

-Gui

> On 10/23/14 09:56, Gui Hecheng wrote:
>> Steps to reproduce:
>>
>>     # mkfs.btrfs -f /dev/sdb7
>>     # mount /dev/sdb7 /mnt
>>     # btrfs dev stats /dev/sdb7
>>
>> [...]
Re: [PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device
On Wed, 2014-10-29 at 18:51 +0800, Anand Jain wrote:
> There is a compatibility issue with older kernel with the progs commit id as below.
>
>     05cd2907557ba627cfb86e60b214ea6228613a84

Which tree does this commit id belong to? I can't find it anywhere.

> So as of now writing to revert the above commit id. The brewing sysfs interface would help to fix the impending issue, which is seed device would fail show in 'btrfs fi show' output of a sprout device.
>
> Signed-off-by: Anand Jain <anand.j...@oracle.com>
> ---
>  utils.c | 19 +--
>  1 file changed, 1 insertion(+), 18 deletions(-)
>
> [...]
[PATCH v4] Btrfs: fix snapshot inconsistency after a file write followed by truncate
If right after starting the snapshot creation ioctl we perform a write against a file followed by a truncate, with both operations increasing the file's size, we can get a snapshot tree that reflects a state of the source subvolume's tree where the file truncation happened but the write operation didn't. This leaves a gap between 2 file extent items of the inode, which makes btrfs' fsck complain about it.

For example, if we perform the following file operations:

    $ mkfs.btrfs -f /dev/vdd
    $ mount /dev/vdd /mnt
    $ xfs_io -f \
          -c "pwrite -S 0xaa -b 32K 0 32K" \
          -c "fsync" \
          -c "pwrite -S 0xbb -b 32770 16K 32770" \
          -c "truncate 90123" \
          /mnt/foobar

and the snapshot creation ioctl was just called before the second write, we often can get the following inode items in the snapshot's btree:

    item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
        inode generation 146 transid 7 size 90123 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0
    item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
        inode ref index 282 namelen 10 name: foobar
    item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
        extent data disk byte 1104855040 nr 32768
        extent data offset 0 nr 32768 ram 32768
        extent compression 0
    item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
        extent data disk byte 0 nr 0
        extent data offset 0 nr 40960 ram 40960
        extent compression 0

There's a file range, corresponding to the interval [32K; ALIGN(16K + 32770, 4096)[ for which there's no file extent item covering it. This is because the file write and file truncate operations happened both right after the snapshot creation ioctl called btrfs_start_delalloc_inodes(), which means we didn't start and wait for the ordered extent that matches the write and, in btrfs_setsize(), we were able to call btrfs_cont_expand() before being able to commit the current transaction in the snapshot creation ioctl.
So this made it possible to insert the hole file extent item in the source subvolume (which represents the region added by the truncate) right before the transaction commit from the snapshot creation ioctl.

Btrfs' fsck tool complains about such cases with a message like the following:

    root 331 inode 257 errors 100, file extent discount

From a user perspective, the expectation when a snapshot is created while those file operations are being performed is that the snapshot will have a file that either:

1) is empty
2) only the first write was captured
3) only the 2 writes were captured
4) both writes and the truncation were captured

But never capture a state where only the first write and the truncation were captured (since the second write was performed before the truncation).

A test case for xfstests follows.

Signed-off-by: Filipe Manana <fdman...@suse.com>
---

V2: Use different approach to solve the problem. Don't start and wait for all delalloc to finish after every expanding truncate, instead add an additional flush at transaction commit time if we're doing a transaction commit that creates snapshots.

V3: Removed useless test condition in wait_pending_snapshot_roots_delalloc().

V4: Use another approach that doesn't imply starting delalloc work and waiting for it to finish at transaction commit time.
 fs/btrfs/ctree.h       |  4 ++--
 fs/btrfs/extent-tree.c | 16 +---
 fs/btrfs/file.c        | 10 +-
 fs/btrfs/inode.c       | 47 ---
 fs/btrfs/ioctl.c       |  7 ---
 5 files changed, 60 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b72b358..36f82ba 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3427,8 +3427,8 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
 					 struct btrfs_fs_info *fs_info);
 int __get_raid_index(u64 flags);
-int btrfs_start_nocow_write(struct btrfs_root *root);
-void btrfs_end_nocow_write(struct btrfs_root *root);
+int btrfs_start_write_no_snapshoting(struct btrfs_root *root);
+void btrfs_end_write_no_snapshoting(struct btrfs_root *root);
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 		     int level, int *slot);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a84e00d..9ba886c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9657,12 +9657,14 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 }
 
 /*
- * btrfs_{start,end}_write() is similar to mnt_{want, drop}_write(),
- * they are used to prevent the some tasks writing data into the page cache
- * by nocow before the subvolume is snapshoted, but flush the data into
- * the disk
Re: read block failed check_tree_block / Couldn't read chunk tree
I can't find that commit in the official repos. git show gives:

    fatal: bad object 915902c5002485fb13d27c4b699a73fb66cc0f09

I did find this commit:

    commit 2513077f2f830b4bc83d528bfb6979eb461918bd
    btrfs-progs: fix device missing of btrfs fi show with seed devices

Thanks
René

2014-10-29 4:45 GMT+01:00 Anand Jain anand.j...@oracle.com:

> this is (most likely) due to patch below,
>
>     commit 915902c5002485fb13d27c4b699a73fb66cc0f09
>     btrfs-progs: fix device missing of btrfs fi show with seed devices
>
> Could you try to back out the patch from progs and give it a shot ?
> and pls report what you see.
>
> Thanks.
>
> On 10/25/14 00:43, Rene Thomas wrote:
>> # btrfs --version
>> Btrfs v3.17
>>
>> # btrfs fi show
>> Label: 'mythstorage' uuid: 9b454272-6800-4b3c-b196-9e180407a6cb
>>     Total devices 1 FS bytes used 2.36MiB
>>     devid 1 size 931.51GiB used 10.04GiB path /dev/sdd1
>>
>> Check tree block failed, want=5845480062976, have=0
>> Check tree block failed, want=5845480062976, have=0
>> Check tree block failed, want=5845480062976, have=65536
>> Check tree block failed, want=5845480062976, have=0
>> Check tree block failed, want=5845480062976, have=0
>> read block failed check_tree_block
>> Couldn't read chunk tree

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Btrfs: fix snapshot inconsistency after a file write followed by truncate
On Wed, Oct 29, 2014 at 7:57 AM, Filipe Manana fdman...@suse.com wrote:

> V4: Use another approach that doesn't imply starting delalloc work and
> wait for it to finish at transaction commit time.

I like this one better ;) Taking it for a spin here.

-chris
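
The gap described in the patch message earlier in this thread can be checked with a little arithmetic. The following is only a worked sketch in shell: the helper name align_up is made up for illustration, and the numbers are the ones from the xfs_io example and the btree dump (4096-byte alignment, first write ending at 32K, second write at 16K with length 32770, truncate to 90123).

```shell
#!/bin/sh
# align_up VALUE ALIGNMENT: round VALUE up to the next multiple of ALIGNMENT.
# (Hypothetical helper; it mirrors what ALIGN() means in the patch text.)
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

sector_size=4096
first_write_end=$(( 32 * 1024 ))      # first pwrite covers [0, 32K)
second_write_off=$(( 16 * 1024 ))     # second pwrite starts at 16K
second_write_len=32770
truncate_size=90123

# The range with no file extent item: [32K, ALIGN(16K + 32770, 4096))
gap_start=$first_write_end
gap_end=$(align_up $(( second_write_off + second_write_len )) "$sector_size")
echo "missing file extent range: [$gap_start, $gap_end)"

# The hole extent (item 123) starts at the gap end and has length 40960,
# so it should end at the truncate size rounded up to the sector size.
hole_end=$(( gap_end + 40960 ))
echo "hole extent ends at $hole_end, ALIGN(truncate size) is $(align_up "$truncate_size" "$sector_size")"
```

The computed gap end, 53248, matches the key offset of item 123 in the dump, which is why fsck sees nothing covering the bytes between 32768 and 53248.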
Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.
On Mon, Oct 27, 2014 at 04:36:26PM +0800, Qu Wenruo wrote:

Original Message
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.
From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-27 16:14

On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:

Original Message
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.
From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-24 19:06

On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:

When btrfs allocates a chunk, it will try to alloc up to 1G for data and 256M for metadata, or 10% of all the writeable space if there is enough

10G for data,

    if (type & BTRFS_BLOCK_GROUP_DATA) {
            max_stripe_size = 1024 * 1024 * 1024;
            max_chunk_size = 10 * max_stripe_size;

Oh, sorry, 10G is right.

Any other comments?

Thanks,
Qu

...

thanks,
-liubo

space for the stripe on device.

However, when we run out of space, this allocation may cause unbalanced chunk allocation. For example, there is only 1G of unallocated space, and a request to allocate a DATA chunk is sent, and all the space will be allocated as a data chunk, making a later metadata chunk alloc request impossible to handle, which will cause ENOSPC. This is one of the common complaints from end users about why ENOSPC happens while there is still available space.

Okay, I don't think this is the common case. AFAIK, most ENOSPC is caused by our runtime worst case metadata reservation problem. btrfs has been inclined to create a fairly large metadata chunk (1G) in its initial mkfs stage, and a 256M metadata chunk is also a very large one. As for your example below, yes, we don't have space for metadata allocation, but do we really need to allocate a new one? Or am I missing something?

thanks,
-liubo

Yes, that's true, this is not the common cause, but at least this patch may make the percentage shown by the 'df' command reach as close to 100% as possible before hitting ENOSPC under normal operations (if not using balance). And some cases like the following mail may be improved by the patch:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36097.html

I understand that most of the cases with a lot of free data space and no metadata space are caused by creating and then deleting large files, but if the last gigabytes can be allocated more carefully, at least the available bytes of the 'df' command should be reduced before hitting ENOSPC. What do you think about it?

Sorry for the late reply. I just noticed that a recent commit has fixed this problem.

    commit 47ab2a6c689913db23ccae38349714edf8365e0a
    Author: Josef Bacik jba...@fb.com
    Date:   Thu Sep 18 11:20:02 2014 -0400

        Btrfs: remove empty block groups automatically

thanks,
-liubo

Thanks,
Qu

This patch will try not to alloc a chunk which is more than half of the unallocated space, making the last space more balanced at a small cost of more fragmented chunks in the last 1G.

Some easy example: preallocate 17.5G on a 20G empty btrfs fs:

[Before]
    # btrfs fi show /mnt/test
    Label: none uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
        Total devices 1 FS bytes used 17.50GiB
        devid 1 size 20.00GiB used 20.00GiB path /dev/sdb

All space is allocated. No space for later metadata allocation.

[After]
    # btrfs fi show /mnt/test
    Label: none uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
        Total devices 1 FS bytes used 17.50GiB
        devid 1 size 20.00GiB used 19.77GiB path /dev/sdb

About 230M is still available for later metadata allocation.
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/volumes.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d47289c..fa8de79 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int ret;
 	u64 max_stripe_size;
 	u64 max_chunk_size;
+	u64 total_avail_space = 0;
 	u64 stripe_size;
 	u64 num_bytes;
 	u64 raid_stripe_len = BTRFS_STRIPE_LEN;
@@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		devices_info[ndevs].max_avail = max_avail;
 		devices_info[ndevs].total_avail = total_avail;
 		devices_info[ndevs].dev = device;
+		total_avail_space += total_avail;
 		++ndevs;
 	}
 
 	/*
+	 * Try not to occupy more than half of the unallocated space.
+	 * When run short of space and alloc all the space to
+	 * data/metadata will cause ENOSPC to be triggered more easily.
+	 *
+	 * And since the
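
The patch text is cut off above, so the exact rule it implements is not visible here. As a rough model of the idea stated in the description ("try not to alloc a chunk which is more than half of the unallocated space"), the arithmetic could look like the sketch below. The function name clamp_chunk_size is hypothetical and this is not the kernel code; it only illustrates the intended effect on the "1G left, data chunk requested" example from the discussion.

```shell
#!/bin/sh
# clamp_chunk_size REQUESTED UNALLOCATED   (both in bytes)
# Hypothetical model: never hand out more than half of what is still
# unallocated, so a later metadata request can still be satisfied.
clamp_chunk_size() {
    requested=$1
    half=$(( $2 / 2 ))
    if [ "$requested" -gt "$half" ]; then
        echo "$half"
    else
        echo "$requested"
    fi
}

GiB=$(( 1024 * 1024 * 1024 ))

# Only 1 GiB unallocated and a 1 GiB data chunk is requested:
# the model grants 512 MiB and leaves 512 MiB for a later request.
clamp_chunk_size "$GiB" "$GiB"

# Plenty of room (20 GiB unallocated): the request is granted unchanged.
clamp_chunk_size "$GiB" $(( 20 * GiB ))
```

With plenty of free space the clamp never triggers, which is consistent with the claim that the cost is limited to more fragmented chunks in the last gigabyte or so.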
Re: Unable to fixup (regular) error in RAID1 fs
On Oct 29, 2014, at 2:08 AM, Juan Orti juan.o...@miceliux.com wrote:

> On 2014-10-29 04:02, Duncan wrote:
>> Juan Orti posted on Tue, 28 Oct 2014 16:54:19 +0100 as excerpted:
>>
>>> [ 3713.086292] BTRFS: unable to fixup (regular) error at logical 483011874816 on dev /dev/sdb2
>>> [ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev /dev/sdb2, sector 628793528, root 2500, inode 1436631, offset 4059963392, length 4096, links 1 (path: juan/.local/share/gnome-boxes/images/boxes-unknown)
>>> [ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 38, gen 0
>>> [ 3713.093035] BTRFS: unable to fixup (regular) error at logical 483011948544 on dev /dev/sdb2
>>>
>>> Why can't it fix the errors? a bad device? smartctl says the disk is ok. I'm currently running a full scrub to see if it finds more errors. What should I do?
>>
>> Btrfs raid1, and I see you have it for both data and metadata.
>>
>> During normal operation, when btrfs comes across a block that doesn't match its checksum, it will look to see if there's another copy (which there is with raid1, which has exactly two copies) of that block and will try to use it instead if so. If the second copy matches the checksum, all is fine and btrfs will in fact attempt to rewrite the bad copy using the good copy, as well as returning the good copy to whatever was reading it.
>>
>> Those corruption errors seem to indicate that it can't find a good copy to update the bad copy with -- both copies ended up bad. Either that or it found the good copy and returned it to whatever was reading, but couldn't rewrite the bad copy, for some reason. I'm not sure which of those interpretations is correct, but given that you didn't see anything else bad happening, no apps returning errors due to read error, etc, I'd guess the second. Because otherwise whatever was doing the read should have returned an error.
>
> When this error happened, I was editing some text files with vi, and it was painfully slow, it took 30 seconds to open a 20-line file, so something weird was going on. Anyway, no visible user space error could be seen.

Anything in dmesg prior to the previously reported errors? Either with syslog messages or journalctl, filter by btrfs and see what you get for the past couple of days. And then also find out what ata port the two drives are on and filter by those; usually in the form ataX.00. You could also search for "exception Emask" and see if anything comes up. This would account for either controller or drive hardware error messages.

Chris Murphy
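
The log filtering suggested above could be sketched roughly as follows. This is an illustration, not a command from the thread: the function name filter_storage_errors is made up, and the pattern simply combines the three things mentioned (btrfs, ATA ports of the form ataX.00, and "exception Emask").

```shell
#!/bin/sh
# filter_storage_errors: read log lines on stdin and keep only those that
# mention btrfs, an ATA port such as ata3.00, or an "exception Emask" report.
# (Hypothetical helper name; the pattern is an assumption based on the advice.)
filter_storage_errors() {
    grep -iE 'btrfs|ata[0-9]+\.[0-9]+|exception Emask'
}

# Typical use, which needs permission to read the kernel log:
#   dmesg | filter_storage_errors
#   journalctl -k | filter_storage_errors
# Note that "journalctl -k" only covers the current boot; older messages
# would have to come from the persistent journal or syslog files.
```

Once the ATA port numbers of the two drives are known, the pattern can be narrowed to just those ports to cut down on noise.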
v3.18-rc2 at a 32 bit KVM gives :INFO: trying to register non-static key.the code is fine but needs lockdep annotation.
This looks new to me, or is it?

Oct 29 17:53:04 n22kvmclone kernel: INFO: trying to register non-static key.
Oct 29 17:53:04 n22kvmclone kernel: the code is fine but needs lockdep annotation.
Oct 29 17:53:04 n22kvmclone kernel: turning off the locking correctness validator.
Oct 29 17:53:04 n22kvmclone kernel: CPU: 0 PID: 2525 Comm: trinity-c0 Not tainted 3.18.0-rc2 #1
Oct 29 17:53:04 n22kvmclone kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Oct 29 17:53:04 n22kvmclone kernel: f55e5b70 c2a5d3ba f55e5bc4 c2684a7b c2ba5888
Oct 29 17:53:04 n22kvmclone kernel: c2a64822 f55a f55e5bb4 0001 f890b458
Oct 29 17:53:04 n22kvmclone kernel: f55a0df4 f55a0e04 f40c3000 0100 c2d68578 f5719c94 f62b6ea0
Oct 29 17:53:04 n22kvmclone kernel: Call Trace:
Oct 29 17:53:04 n22kvmclone kernel: [c2a5d3ba] dump_stack+0x41/0x52
Oct 29 17:53:04 n22kvmclone kernel: [c2684a7b] __lock_acquire.isra.31+0x89b/0x9a0
Oct 29 17:53:04 n22kvmclone kernel: [c2a64822] ? _raw_spin_unlock+0x22/0x30
Oct 29 17:53:04 n22kvmclone kernel: [f890b458] ? btrfs_make_block_group+0x1d8/0x290 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [c263f360] ? native_wbinvd+0x10/0x10
Oct 29 17:53:04 n22kvmclone kernel: [c26850ff] lock_acquire+0x8f/0x110
Oct 29 17:53:04 n22kvmclone kernel: [f894d001] ? btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f8949304] __btrfs_alloc_chunk+0x684/0xb10 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f894d001] ? btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f894d001] btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f8901f8d] do_chunk_alloc+0x1dd/0x410 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f88fc196] ? get_alloc_profile+0x166/0x2d0 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f8903144] btrfs_check_data_free_space+0x144/0x320 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f8930e8b] __btrfs_buffered_write+0x10b/0x550 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [f89316c0] btrfs_file_write_iter+0x3f0/0x6c0 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [c2755eba] ? do_iter_readv_writev+0x6a/0xa0
Oct 29 17:53:04 n22kvmclone kernel: [c2755eba] do_iter_readv_writev+0x6a/0xa0
Oct 29 17:53:04 n22kvmclone kernel: [f89312d0] ? __btrfs_buffered_write+0x550/0x550 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [c2757210] do_readv_writev+0xa0/0x270
Oct 29 17:53:04 n22kvmclone kernel: [f89312d0] ? __btrfs_buffered_write+0x550/0x550 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel: [c2755f80] ? do_sync_readv_writev+0x90/0x90
Oct 29 17:53:04 n22kvmclone kernel: [c27720c0] ? __fdget_pos+0x30/0x40
Oct 29 17:53:04 n22kvmclone kernel: [c269d6c1] ? do_setitimer+0x121/0x200
Oct 29 17:53:04 n22kvmclone kernel: [c2a64992] ? _raw_spin_unlock_irq+0x22/0x40
Oct 29 17:53:04 n22kvmclone kernel: [c269d6c1] ? do_setitimer+0x121/0x200
Oct 29 17:53:04 n22kvmclone kernel: [c2757414] vfs_writev+0x34/0x60
Oct 29 17:53:04 n22kvmclone kernel: [c27575d6] SyS_writev+0x56/0xe0
Oct 29 17:53:04 n22kvmclone kernel: [c2a6522b] sysenter_do_call+0x12/0x12

--
Toralf
pgp key: 0076 E94E
Re: RAID1 fails to recover chunk tree
$ sudo mount -o degraded,ro /dev/sdd1 /asdf
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.

$ dmesg | tail
[524718.760792] BTRFS info (device sdd1): allowing degraded mounts
[524718.760800] BTRFS info (device sdd1): disk space caching is enabled
[524718.762087] BTRFS: failed to read chunk root on sdd1
[524718.776524] BTRFS: open_ctree failed

$ uname -a
Linux mach 3.17.1-52.g5c4d099-desktop #1 SMP PREEMPT Sat Oct 18 23:36:23 UTC 2014 (5c4d099) x86_64 x86_64 x86_64 GNU/Linux

$ btrfs --version
Btrfs v3.16.2+20141003

On 10/28/2014 11:55 PM, Anand Jain wrote:

> 'mount degraded,ro' see if there is any non-zero non-raid1 group profile.
>
> On 10/29/14 04:32, Zack Coffey wrote:
>> Revisit of a previous issue. Setup a single 640GB drive with BTRFS and compression. This was not a system drive, just a place to put random junk.
>>
>> Made a RAID1 with another drive of just the metadata. Was in that state for less than 12 hours-ish, removed the second drive and now cannot get to any data on the original drive. Data remained single while only metadata was RAID1.
>>
>> Single drive btrfs was made on Ubuntu with kernel 3.13.0 and tools 3.12.
>>
>> $ sudo mount -o degraded /dev/sdc1 /media/Data/
>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>        missing codepage or helper program, or other error
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so
>>
>> $ dmesg | tail
>> [45353.869448] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
>> [45353.901511] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
>> [45353.901666] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
>> [45354.148488] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
>> [45354.148573] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
>> [46241.155350] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
>> [46241.155923] btrfs: allowing degraded mounts
>> [46241.155927] btrfs: disk space caching is enabled
>> [46241.159436] btrfs: failed to read chunk root on sdc1
>> [46241.177815] btrfs: open_ctree failed
>>
>> $ btrfs-show-super /dev/sdc1
>> superblock: bytenr=65536, device=/dev/sdc1
>> ---------------------------------------------------------
>> csum                    0x93bcb1b5 [match]
>> bytenr                  65536
>> flags                   0x1
>> magic                   _BHRfS_M [match]
>> fsid                    bd78815a-802b-43e2-8387-fc6ab4237d67
>> label
>> generation              60944
>> root                    909586694144
>> sys_array_size          97
>> chunk_root_generation   60938
>> root_level              1
>> chunk_root              911673917440
>> chunk_root_level        1
>> log_root                0
>> log_root_transid        0
>> log_root_level          0
>> total_bytes             1115871535104
>> bytes_used              321833435136
>> sectorsize              4096
>> nodesize                4096
>> leafsize                4096
>> stripesize              4096
>> root_dir                6
>> num_devices             2
>> compat_flags            0x0
>> compat_ro_flags         0x0
>> incompat_flags          0x9
>> csum_type               0
>> csum_size               4
>> cache_generation        60944
>> uuid_tree_generation    60944
>> dev_item.uuid           d82b2027-17b6-4513-a86d-9227a42d7ed1
>> dev_item.fsid           bd78815a-802b-43e2-8387-fc6ab4237d67 [match]
>> dev_item.type           0
>> dev_item.total_bytes    615763673088
>> dev_item.bytes_used     324270030848
>> dev_item.io_align       4096
>> dev_item.io_width       4096
>> dev_item.sector_size    4096
>> dev_item.devid          1
>> dev_item.dev_group      0
>> dev_item.seek_speed     0
>> dev_item.bandwidth      0
>> dev_item.generation     0
>>
>> $ sudo btrfs device add -f /dev/sdh1 /dev/sdc1
>> ERROR: error adding the device '/dev/sdh1' - Inappropriate ioctl for device
>>
>> $ sudo btrfs device delete missing /dev/sdc1
>> ERROR: error removing the device 'missing' - Inappropriate ioctl for device
>>
>> $ sudo mount -o degraded,defaults,compress=lzo /dev/sdc1 /media/Data/
>> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>        missing codepage or helper program, or other error
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so
>>
>> $ dmesg | tail
>> [106991.655384] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
>> [106991.665066] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
>> [107019.954397] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
>> [107019.962009] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
>> [107070.124927] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
>> [107070.126475] btrfs: allowing
Re: Btrfs raid1 array has issues with rtorrent usage pattern.
I'm in the middle of debugging the exact same thing. 3.17.0 - rtorrent dies with SIGBUS.

I've done some debugging, the sequence is something like this:

open a new file
fallocate() to the final size
mmap() all (or a portion) of the file
write to the region
run SHA1 on that mmap'd region to validate the chunk
crash, eventually.

Generally not at the same point. Reading that file (cat file > /dev/null) returns -EIO.

Looking up the process maps, the SIGBUS appears to be happening in the middle of a mapped region of a pre-allocated file - i.e. it shouldn't be. I'm not completely ruling out an rtorrent bug but it appears sane to me.

Weirder: old files, that have been around a while, work just fine for seeding. I've re-hashed my entire collection without an error.

Seeing this on both inherit-COW and no-inherit-COW files, and the filesystem is not using compression. The interesting part is that when going back and attempting to read the files later, they sometimes don't throw an IO error.

Absolutely nothing in dmesg. Working on a testcase that triggers it reliably but no luck so far.

I thought I had bad RAM, but two people upgrading to 3.17 and seeing the same bug at around the same time can't be a coincidence. I rebooted to 3.17 on the 25th, the first new download was on the 28th and that failed.

Working on a testcase for it that's more reproducible than "go grab torrent files with rtorrent".

On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:

> Hi, it seems that when using rtorrent to download into a btrfs system, it leads to the creation of files that fail to read properly.
>
> For instance, I get rtorrent to crash, but if I try to rsync the file it was writing into someplace else, rsync also fails with the message "can't map file $file: Input/Output error (5)". If I give it time, eventually the file gets into a good state and I can rsync it somewhere else (as long as rtorrent doesn't keep writing into it).
>
> This doesn't happen using ext4 on the same system. No btrfs errors, or any other errors, show up in any log. Scrubbing or balancing don't turn up any issues.
>
> I've tried using a subvolume mounted with nodatacow and/or flushoncommit, which didn't help. I'm not using quotas and at some point had a single snapshot that I deleted. The filesystem was originally created recently (on a 3.16.4+ kernel).
>
> Here's what the array looks like:
>
> Label: 'data' uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
>     Total devices 4 FS bytes used 3.14TiB
>     devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
>     devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
>     devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
>     devid 7 size 1.82TiB used 1.45TiB path /dev/sda1
>
> Btrfs v3.17
>
> Data, RAID1: total=3.34TiB, used=3.13TiB
> System, RAID1: total=32.00MiB, used=512.00KiB
> Metadata, RAID1: total=10.00GiB, used=7.31GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> On linux 3.17.1:
> Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3 AuthenticAMD GNU/Linux
>
> I'm utterly puzzled and clueless at how to dig into this issue.
[bug] allows umount before transactions complete
Filed bug here with more details and complete dmesg attached:
https://bugzilla.kernel.org/show_bug.cgi?id=87131

kernel-3.18.0-0.rc2.git1.1.fc22.x86_64

SUMMARY: After umount returned to the prompt, and I physically disconnected 2x devices (btrfs raid1 on raw devices), I get a backtrace with some scary messages. Here are some snippets:

[10570.371285] BTRFS: error (device sdc) in btrfs_commit_transaction:1917: errno=-5 IO failure (Error while writing out transaction)
[10570.372426] BTRFS info (device sdc): forced readonly
[10570.372432] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[10570.372456] BTRFS: Transaction aborted (error -5)
[10570.372807] BTRFS: error (device sdc) in cleanup_transaction:1599: errno=-5 IO failure
[10570.373960] BTRFS info (device sdc): delayed_refs has NO entry

After reboot, the kernel shows both devids have the same generation. btrfs check comes up clean. mount also has no complaints, and mounts rw.

Chris Murphy
Re: RAID1 fails to recover chunk tree
On 10/28/2014 01:32 PM, Zack Coffey wrote:

> Made a RAID1 with another drive of just the metadata. Was in that state for less than 12 hours-ish, removed the second drive and now cannot get to any data on the original drive. Data remained single while only metadata was RAID1.

I don't know all the details but I would _never_ suspect the action you described to _not_ hose up the file system.

The single mode is not restricted to one drive; it's concatenation, as in: treat the entire space as if it were a single drive.

In that twelve hour window data migrated. I _think_ directories may count as data in this sense. If a key element (say the root directory) migrated onto the disk you eventually removed then there is no root directory to read. And if not root, then any secondary directory you choose.

So sure, your checksum trees and your extent maps were all duplicated in the mirror, but your actual data -- you know, all those files that were copied on write -- may well be only on that second drive you pulled out.

RAID1 metadata, and non-RAID1 data, would not safely allow for failure (or removal) of one drive. I'm not sure what you expected to happen but what you did is full of fail.

You need to put the second drive back in and then coerce all the data back to the first drive. btrfs device delete is what you want. You _may_ need to switch the metadata back to single before the delete.

--Rob.
Re: Btrfs raid1 array has issues with rtorrent usage pattern.
The following code reliably throws a SIGBUS in the memset, and cat testfile /dev/null returns an IO error. I've sometimes gotten as high as iteration 900 before a SIGBUS, so don't assume a single clear is OK. linux 3.17.0, SATA - MD(raid5) - bcache (ssd) - btrfs Working on eliminating more variables. #include fcntl.h #include unistd.h #include sys/mman.h #include stdint.h #include stdlib.h #include stdio.h #include string.h #define MB (1024ull * 1024) #define GB (1024ull * MB) #define TEST_SIZE (4096) int main() { int fd; srandom(1024); fd=open(testfile, O_RDWR|O_CREAT, 0600); posix_fallocate(fd, 0, TEST_SIZE * MB); uint8_t * map = 0; int i; for(i=0;i1000;i++) { size_t location=(random() % (TEST_SIZE-1)) * MB; map = (uint8_t *) mmap(map, MB, PROT_READ|PROT_WRITE, MAP_SHARED, fd, location); printf(%d: writing at %04zd mb\n, i, location); memset(map, 0x5a, 1 * MB); msync(map, 1*MB, MS_ASYNC); munmap(map, MB); } } On Wed, Oct 29, 2014 at 5:50 PM, Dan Merillat dan.meril...@gmail.com wrote: I'm in the middle of debugging the exact same thing. 3.17.0 - rtorrent dies with SIGBUS. I've done some debugging, the sequence is something like this: open a new file fallocate() to the final size mmap() all (or a portion) of the file write to the region run SHA1 on that mmap'd region to validate the chink crash, eventually. Generally not at the same point. Reading that file (cat /dev/null) returns -EIO. Looking up the process maps, the SIGBUS appears to be happening in the middle of a mapped region of a pre-allocated file - I.E. it shouldn't be. I'm not completely ruling out a rtorrent bug but it appears sane to me. Weirder: old files, that have been around a while, work just fine for seeding. I've re-hashed my entire collection without an error. Seeing this on both inherit-COW and no-inherit-COW files, and the filesystem is not using compression. The interesting part is going back and attempting to read the files later they sometimes don't throw an IO error. 
Absolutely nothing in dmesg. Working on a testcase that triggers it reliably but no luck so far. I thought I had bad RAM but two people upgrading to 3.17 and seeing the same bug at around the same time can't be a coincidence. I rebooted to 3.17 on the 25th, the first new download was on the 28th and that failed. Working on a testcase for it that's more reproducable than go grab torrent files with rtorrent. On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote: Hi, it seems that when using rtorrent to download into a btrfs system, it leads to the creation of files that fail to read properly. For instance, I get rtorrent to crash, but if I try to rsync the file he was writting into someplace else, rsync also fails with the message can't map file $file: Input/Output error (5). If I give it time, eventually the file gets into a good state and I can rsync it somewhere else (as long as rtorrent doesn't keep writting into it). This doesn't happen using ext4 on the same system. No btrfs errors, or any other errors, show up in any log. Scrubbing or balancing don't turn up any issues. I've tried using a subvolume mounted with nodatacow and/or flushoncommit, which didn't help. I'm not using quotas and at some point had a single snapshot that I deleted. The filesystem was originally created recently (on a 3.16.4+ kernel). 
Here's what the array looks like:

Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
	Total devices 4 FS bytes used 3.14TiB
	devid    4 size 2.73TiB used 2.36TiB path /dev/sdd1
	devid    5 size 1.82TiB used 1.45TiB path /dev/sdc1
	devid    6 size 1.82TiB used 1.45TiB path /dev/sdb1
	devid    7 size 1.82TiB used 1.45TiB path /dev/sda1

Btrfs v3.17

Data, RAID1: total=3.34TiB, used=3.13TiB
System, RAID1: total=32.00MiB, used=512.00KiB
Metadata, RAID1: total=10.00GiB, used=7.31GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

On linux 3.17.1:

Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3 AuthenticAMD GNU/Linux

I'm utterly puzzled and clueless at how to dig into this issue.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID1 fails to recover chunk tree
On 10/29/2014 03:26 PM, Robert White wrote:

On 10/28/2014 01:32 PM, Zack Coffey wrote:

Made a RAID1 with another drive of just the metadata. Was in that state for less than 12 hours-ish, removed the second drive and now cannot get to any data on the original drive. Data remained single while only metadata was RAID1.

I don't know all the details but I would _never_ suspect the action you described to _not_ hose up the file system. You need to put the second drive back in and then coerce all the data back to the first drive.

"btrfs device delete" is what you want. You _may_ need to switch the metadata back to single before the delete.

--Rob.

P.S. I am/was assuming you said "removed the second drive" in the normal sense of disconnecting and removing, as opposed to the semantic action of deleting the device element.

If you did do the btrfs delete, you might have needed to do a btrfs filesystem sync to make sure that all the transactions involved in the delete were finished and flushed to disk.

Either way, physically reattaching the second drive is your first step; presuming again that you haven't destroyed the partition or re-used the drive etc.

If the partition will mount once the second drive is in place, do the delete operation (if you didn't) and then the sync (to make sure that everything has finished migrating etc). Then you should be able to re-remove the physical drive.

If you already did the delete and sync as part of what you meant by "remove" then sorry for the interruption of your misery. 8-)
Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.
Original Message Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation. From: Liu Bo bo.li@oracle.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年10月29日 22:29 On Mon, Oct 27, 2014 at 04:36:26PM +0800, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation. From: Liu Bo bo.li@oracle.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年10月27日 16:14 On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation. From: Liu Bo bo.li@oracle.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年10月24日 19:06 On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote: When btrfs allocate a chunk, it will try to alloc up to 1G for data and 256M for metadata, or 10% of all the writeable space if there is enough 10G for data, if (type BTRFS_BLOCK_GROUP_DATA) { max_stripe_size = 1024 * 1024 * 1024; max_chunk_size = 10 * max_stripe_size; Oh, sorry, 10G is right. Any other comments? Thanks, Qu ... thanks, -liubo space for the stripe on device. However, when we run out of space, this allocation may cause unbalanced chunk allocation. For example, there are only 1G unallocated space, and request for allocate DATA chunk is sent, and all the space will be allocated as data chunk, making later metadata chunk alloc request unable to handle, which will cause ENOSPC. This is the one of the common complains from end users about why ENOSPC happens but there is still available space. Okay, I don't think this is the common case, AFAIK, the most ENOSPC is caused by our runtime worst case metadata reservation problem. btrfs has been inclined to create a fairly large metadata chunk (1G) in its initial mkfs stage and 256M metadata chunk is also a very large one. 
As of your below example, yes, we don't have space for metadata allocation, but do we really need to allocate a new one? Or am I missing something? thanks, -liubo Yes that's true this is not the common cause, but at least this patch may make the percentage of 'df' command reach as close to 100% as possible before hitting ENOSPC under normal operations. (If not using balance) And some case like the following mail may be improved by the patch: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36097.html I understand that most of the cases that a lot of free data space and no metadata space is caused by create and then delete large files, but if the last giga bytes can be allocated more carefully, at least the available bytes of 'df' command should be reduced before hit ENOSPC. How do you think about it? Sorry for the late reply. I just notice that a recent commit has fixed this problem. commit 47ab2a6c689913db23ccae38349714edf8365e0a Author: Josef Bacik jba...@fb.com Date: Thu Sep 18 11:20:02 2014 -0400 Btrfs: remove empty block groups automatically thanks, -liubo Oh, that's much better than my patch. So please ignore my patch. Thanks, Qu Thanks, Qu This patch will try not to alloc chunk which is more than half of the unallocated space, making the last space more balanced at a small cost of more fragmented chunk at the last 1G. Some easy example: Preallocate 17.5G on a 20G empty btrfs fs: [Before] # btrfs fi show /mnt/test Label: none uuid: da8741b1-5d47-4245-9e94-bfccea34e91e Total devices 1 FS bytes used 17.50GiB devid1 size 20.00GiB used 20.00GiB path /dev/sdb All space is allocated. No space later metadata space. [After] # btrfs fi show /mnt/test Label: none uuid: e6935aeb-a232-4140-84f9-80aab1f23d56 Total devices 1 FS bytes used 17.50GiB devid1 size 20.00GiB used 19.77GiB path /dev/sdb About 230M is still available for later metadata allocation. 
Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d47289c..fa8de79 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int ret;
 	u64 max_stripe_size;
 	u64 max_chunk_size;
+	u64 total_avail_space = 0;
 	u64 stripe_size;
 	u64 num_bytes;
 	u64 raid_stripe_len = BTRFS_STRIPE_LEN;
@@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		devices_info[ndevs].max_avail = max_avail;
 		devices_info[ndevs].total_avail = total_avail;
 		devices_info[ndevs].dev = device;
+		total_avail_space += total_avail;
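
The policy described in the patch text (do not allocate a chunk larger than half of the unallocated space) can be sketched in isolation as below. This is only an illustration, not the code from the patch: cap_chunk_size() and unallocated_after() are hypothetical helper names, and the real __btrfs_alloc_chunk() also deals with stripes, devices and minimum sizes that are ignored here.

```c
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024ULL)

/*
 * Hypothetical helper: cap a single chunk request at half of the
 * space that is still unallocated.
 */
static uint64_t cap_chunk_size(uint64_t wanted, uint64_t unallocated)
{
	uint64_t half = unallocated / 2;

	return wanted < half ? wanted : half;
}

/*
 * Hypothetical helper: space left unallocated after `requests`
 * consecutive chunk requests of `wanted` bytes each.
 */
static uint64_t unallocated_after(uint64_t unallocated, uint64_t wanted,
				  unsigned int requests)
{
	while (requests--)
		unallocated -= cap_chunk_size(wanted, unallocated);
	return unallocated;
}
```

With 1024 MiB unallocated and two 1024 MiB data requests, this sketch hands out 512 MiB and then 256 MiB, so 256 MiB stays free for a later metadata chunk instead of 0.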
[PATCH 1/2] btrfs-progs: make the search target device routine more clear for fi show
Extract the procedure of searching for a target device for fi show from the @map_seed_devices() function to make it more clear.

Signed-off-by: Gui Hecheng <guihc.f...@cn.fujitsu.com>
---
 cmds-filesystem.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index bb5881e..6437e57 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -742,14 +742,10 @@ static int find_and_copy_seed(struct btrfs_fs_devices *seed,
 	return 1;
 }
 
-static int map_seed_devices(struct list_head *all_uuids,
-			    char *search, int *found)
+static int search_umounted_fs_uuids(struct list_head *all_uuids,
+				    char *search)
 {
-	struct btrfs_fs_devices *cur_fs, *cur_seed;
-	struct btrfs_fs_devices *fs_copy, *seed_copy;
-	struct btrfs_fs_devices *opened_fs;
-	struct btrfs_device *device;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_devices *cur_fs, *fs_copy;
 	struct list_head *fs_uuids;
 	int ret = 0;
 
@@ -764,7 +760,7 @@ static int map_seed_devices(struct list_head *all_uuids,
 		if (search) {
 			if (uuid_search(cur_fs, search) == 0)
 				continue;
-			*found = 1;
+			ret = 1;
 		}
 
 		fs_copy = malloc(sizeof(*fs_copy));
@@ -782,6 +778,22 @@ static int map_seed_devices(struct list_head *all_uuids,
 		list_add(&fs_copy->list, all_uuids);
 	}
 
+out:
+	return ret;
+}
+
+static int map_seed_devices(struct list_head *all_uuids)
+{
+	struct btrfs_fs_devices *cur_fs, *cur_seed;
+	struct btrfs_fs_devices *seed_copy;
+	struct btrfs_fs_devices *opened_fs;
+	struct btrfs_device *device;
+	struct btrfs_fs_info *fs_info;
+	struct list_head *fs_uuids;
+	int ret = 0;
+
+	fs_uuids = btrfs_scanned_uuids();
+
 	list_for_each_entry(cur_fs, all_uuids, list) {
 		device = list_first_entry(&cur_fs->devices,
 						struct btrfs_device, dev_list);
@@ -943,11 +955,18 @@ devs_only:
 		return 1;
 	}
 
+	found = search_umounted_fs_uuids(&all_uuids, search);
+	if (found < 0) {
+		fprintf(stderr,
+			"ERROR: %d while searching target device\n", ret);
+		return 1;
+	}
+
 	/*
 	 * scan_for_btrfs() don't build seed/sprout mapping,
 	 * do mapping build for each scanned fs here
 	 */
-	ret = map_seed_devices(&all_uuids, search, &found);
+	ret = map_seed_devices(&all_uuids);
 	if (ret) {
 		fprintf(stderr,
 			"ERROR: %d while mapping seed devices\n", ret);
-- 
1.8.1.4
[PATCH 2/2] btrfs-progs: skip mounted fs when deal with umounted ones for fi show
Stalling problems may happen when exec balance & fi show cmds concurrently.

With the following commit:
	commit 915902c500
	btrfs-progs: fix device missing of btrfs fi show with seed devices
The fi show cmd will bother the mounted fs when only umounted fs should be handled after @btrfs_scan_kernel() has finished showing all mounted ones.

We could skip the mounted fs after @btrfs_scan_kernel() is done, then tasks keep going on mounted fs while fi show continues on umounted ones separately.

Reported-by: Petr Janecek <jane...@ucw.cz>
Signed-off-by: Gui Hecheng <guihc.f...@cn.fujitsu.com>
---
 cmds-filesystem.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 6437e57..67fe52b 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -53,6 +53,15 @@ struct seen_fsid {
 
 static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
 
+static int is_seen_fsid(u8 *fsid)
+{
+	u8 hash = fsid[0];
+	int slot = hash % SEEN_FSID_HASH_SIZE;
+	struct seen_fsid *seen = seen_fsid_hash[slot];
+
+	return seen ? 1 : 0;
+}
+
 static int add_seen_fsid(u8 *fsid)
 {
 	u8 hash = fsid[0];
@@ -763,6 +772,10 @@ static int search_umounted_fs_uuids(struct list_head *all_uuids,
 			ret = 1;
 		}
 
+		/* skip all fs already shown as mounted fs */
+		if (is_seen_fsid(cur_fs->fsid))
+			continue;
+
 		fs_copy = malloc(sizeof(*fs_copy));
 		if (!fs_copy) {
 			ret = -ENOMEM;
-- 
1.8.1.4
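
As posted, is_seen_fsid() appears to test only whether the hash slot is non-empty, so two filesystems whose fsids share the same first byte would look identical to it. A lookup that walks the slot's chain and compares the whole fsid could look roughly like the sketch below. It is self-contained and uses made-up names (seen_entry, seen_table, seen_insert, seen_lookup, SKETCH_FSID_SIZE, SKETCH_HASH_SIZE); the layout of the real struct seen_fsid is assumed here, not taken from the patch.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define SKETCH_FSID_SIZE 16	/* assumed fsid length in bytes */
#define SKETCH_HASH_SIZE 256	/* assumed number of hash slots */

/* Hypothetical stand-in for struct seen_fsid: one chained entry per fsid. */
struct seen_entry {
	uint8_t fsid[SKETCH_FSID_SIZE];
	struct seen_entry *next;
};

static struct seen_entry *seen_table[SKETCH_HASH_SIZE];

/* Push a caller-owned entry onto the chain of its slot. */
static void seen_insert(struct seen_entry *e)
{
	unsigned int slot = e->fsid[0] % SKETCH_HASH_SIZE;

	e->next = seen_table[slot];
	seen_table[slot] = e;
}

/* Return 1 only if an entry with exactly this fsid is in the table. */
static int seen_lookup(const uint8_t *fsid)
{
	const struct seen_entry *e = seen_table[fsid[0] % SKETCH_HASH_SIZE];

	for (; e; e = e->next)
		if (memcmp(e->fsid, fsid, SKETCH_FSID_SIZE) == 0)
			return 1;
	return 0;
}
```

The first byte still selects the slot, as in the patch; the memcmp() is what keeps an unmounted filesystem from being skipped just because a mounted one hashes to the same slot.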
[PATCH] btrfs-progs: rebuild missing block group during chunk recovery if possible
Before the patch, a chunk will be considered bad if the corresponding block group is missing, even if the only uncertain data is the 'used' member of the block group.

This patch will try to recalculate the 'used' value of the block group and rebuild it. So even if only the chunk item and dev extent item are found, the chunk can be recovered. However, if the extent tree is damaged and a needed extent item can't be read, the block group's 'used' value will be the block group length, to prevent any later write/block reserve from damaging the block group. In that case, we will prompt the user and recommend using '--init-extent-tree' to rebuild the extent tree if possible.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 btrfsck.h       |   3 +-
 chunk-recover.c | 242 +---
 cmds-check.c    |  29 ---
 3 files changed, 234 insertions(+), 40 deletions(-)

diff --git a/btrfsck.h b/btrfsck.h
index 356c767..7a50648 100644
--- a/btrfsck.h
+++ b/btrfsck.h
@@ -179,5 +179,6 @@ btrfs_new_device_extent_record(struct extent_buffer *leaf,
 int check_chunks(struct cache_tree *chunk_cache,
 		 struct block_group_tree *block_group_cache,
 		 struct device_extent_tree *dev_extent_cache,
-		 struct list_head *good, struct list_head *bad, int silent);
+		 struct list_head *good, struct list_head *bad,
+		 struct list_head *rebuild, int silent);
 #endif
diff --git a/chunk-recover.c b/chunk-recover.c
index 6f43066..dbf98b5 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -61,6 +61,7 @@ struct recover_control {
 
 	struct list_head good_chunks;
 	struct list_head bad_chunks;
+	struct list_head rebuild_chunks;
 	struct list_head unrepaired_chunks;
 	pthread_mutex_t rc_lock;
 };
@@ -203,6 +204,7 @@ static void init_recover_control(struct recover_control *rc, int verbose,
 
 	INIT_LIST_HEAD(&rc->good_chunks);
 	INIT_LIST_HEAD(&rc->bad_chunks);
+	INIT_LIST_HEAD(&rc->rebuild_chunks);
 	INIT_LIST_HEAD(&rc->unrepaired_chunks);
 
 	rc->verbose = verbose;
@@ -529,22 +531,32 @@ static void print_check_result(struct recover_control *rc)
 		return;
 
 	printf("CHECK RESULT:\n");
-	printf("Healthy Chunks:\n");
+	printf("Recoverable Chunks:\n");
 	list_for_each_entry(chunk, &rc->good_chunks, list) {
 		print_chunk_info(chunk, "  ");
 		good++;
 		total++;
 	}
-	printf("Bad Chunks:\n");
+	list_for_each_entry(chunk, &rc->rebuild_chunks, list) {
+		print_chunk_info(chunk, "  ");
+		good++;
+		total++;
+	}
+	list_for_each_entry(chunk, &rc->unrepaired_chunks, list) {
+		print_chunk_info(chunk, "  ");
+		good++;
+		total++;
+	}
+	printf("Unrecoverable Chunks:\n");
 	list_for_each_entry(chunk, &rc->bad_chunks, list) {
 		print_chunk_info(chunk, "  ");
 		bad++;
 		total++;
 	}
 	printf("\n");
-	printf("Total Chunks:\t%d\n", total);
-	printf("  Heathy:\t%d\n", good);
-	printf("  Bad:\t%d\n", bad);
+	printf("Total Chunks:\t\t%d\n", total);
+	printf("  Recoverable:\t\t%d\n", good);
+	printf("  Unrecoverable:\t%d\n", bad);
 
 	printf("\n");
 	printf("Orphan Block Groups:\n");
@@ -555,6 +567,7 @@ static void print_check_result(struct recover_control *rc)
 	printf("Orphan Device Extents:\n");
 	list_for_each_entry(devext, &rc->devext.no_chunk_orphans, chunk_list)
 		print_device_extent_info(devext, "  ");
+	printf("\n");
 }
 
 static int check_chunk_by_metadata(struct recover_control *rc,
@@ -938,6 +951,11 @@ static int build_device_maps_by_chunk_records(struct recover_control *rc,
 		if (ret)
 			return ret;
 	}
+	list_for_each_entry(chunk, &rc->rebuild_chunks, list) {
+		ret = build_device_map_by_chunk_record(root, chunk);
+		if (ret)
+			return ret;
+	}
 	return ret;
 }
 
@@ -1168,12 +1186,31 @@ static int __rebuild_device_items(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static int __insert_chunk_item(struct btrfs_trans_handle *trans,
+				struct chunk_record *chunk_rec,
+				struct btrfs_root *chunk_root)
+{
+	struct btrfs_key key;
+	struct btrfs_chunk *chunk = NULL;
+	int ret = 0;
+
+	chunk = create_chunk_item(chunk_rec);
+	if (!chunk)
+		return -ENOMEM;
+	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+	key.offset = chunk_rec->offset;
+
+	ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
+				btrfs_chunk_item_size(chunk->num_stripes));
+	free(chunk);
+	return ret;
+}
+
 static int __rebuild_chunk_items(struct
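
The recalculation described in the commit message (sum up what the extent tree says is used inside the block group, or fall back to "completely full" when the extent tree cannot be read) can be illustrated with the sketch below. It is not the code from the patch: extent_span and recalc_block_group_used() are hypothetical, the extents are passed in as a plain array rather than read from a btree, and it assumes that no extent crosses a block group boundary.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical flat view of one extent item: start offset and length. */
struct extent_span {
	uint64_t start;
	uint64_t len;
};

/*
 * Recompute the 'used' value for the block group [bg_start, bg_start + bg_len).
 * If the extent records could not be read, report the group as full so that
 * nothing new gets allocated from it.
 */
static uint64_t recalc_block_group_used(uint64_t bg_start, uint64_t bg_len,
					const struct extent_span *extents,
					size_t nr_extents, int extents_readable)
{
	uint64_t used = 0;
	size_t i;

	if (!extents_readable)
		return bg_len;

	for (i = 0; i < nr_extents; i++) {
		/* assumption: an extent never spans two block groups */
		if (extents[i].start >= bg_start &&
		    extents[i].start < bg_start + bg_len)
			used += extents[i].len;
	}

	/* never report more than the block group can hold */
	return used > bg_len ? bg_len : used;
}
```

Returning the full length in the unreadable case mirrors the conservative choice in the commit message: an over-estimate only wastes space, while an under-estimate could let later writes overwrite live data.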
Re: Btrfs raid1 array has issues with rtorrent usage pattern.
It's specifically BTRFS related, I was able to reproduce it on a bare drive (no lvm, no md, no bcache). It's not bad RAM, I was able to reproduce it on multiple machines running either 3.17 or late RCs. I've tested 3.18-rc2 for about 2 hours now, can't get any failures, so that's good. If anyone else can reproduce this it'll probably need to be sent to 3.17-stable. On Wed, Oct 29, 2014 at 7:24 PM, Alec Blayne a...@tevsa.net wrote: Really nice to know it's already getting handled :) I'm already downgrading to 3.16.6 now that I know I won't have that issue. I was already planning to because of the read-only snapshots issue. Thank you and good luck debugging! On 29-10-2014 21:50, Dan Merillat wrote: I'm in the middle of debugging the exact same thing. 3.17.0 - rtorrent dies with SIGBUS. I've done some debugging, the sequence is something like this: open a new file fallocate() to the final size mmap() all (or a portion) of the file write to the region run SHA1 on that mmap'd region to validate the chink crash, eventually. Generally not at the same point. Reading that file (cat /dev/null) returns -EIO. Looking up the process maps, the SIGBUS appears to be happening in the middle of a mapped region of a pre-allocated file - I.E. it shouldn't be. I'm not completely ruling out a rtorrent bug but it appears sane to me. Weirder: old files, that have been around a while, work just fine for seeding. I've re-hashed my entire collection without an error. Seeing this on both inherit-COW and no-inherit-COW files, and the filesystem is not using compression. The interesting part is going back and attempting to read the files later they sometimes don't throw an IO error. Absolutely nothing in dmesg. Working on a testcase that triggers it reliably but no luck so far. I thought I had bad RAM but two people upgrading to 3.17 and seeing the same bug at around the same time can't be a coincidence. I rebooted to 3.17 on the 25th, the first new download was on the 28th and that failed. 
Working on a testcase for it that's more reproducable than go grab torrent files with rtorrent. On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote: Hi, it seems that when using rtorrent to download into a btrfs system, it leads to the creation of files that fail to read properly. For instance, I get rtorrent to crash, but if I try to rsync the file he was writting into someplace else, rsync also fails with the message can't map file $file: Input/Output error (5). If I give it time, eventually the file gets into a good state and I can rsync it somewhere else (as long as rtorrent doesn't keep writting into it). This doesn't happen using ext4 on the same system. No btrfs errors, or any other errors, show up in any log. Scrubbing or balancing don't turn up any issues. I've tried using a subvolume mounted with nodatacow and/or flushoncommit, which didn't help. I'm not using quotas and at some point had a single snapshot that I deleted. The filesystem was originally created recently (on a 3.16.4+ kernel). Here's what the array looks like: Label: 'data' uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811 Total devices 4 FS bytes used 3.14TiB devid4 size 2.73TiB used 2.36TiB path /dev/sdd1 devid5 size 1.82TiB used 1.45TiB path /dev/sdc1 devid6 size 1.82TiB used 1.45TiB path /dev/sdb1 devid7 size 1.82TiB used 1.45TiB path /dev/sda1 Btrfs v3.17 Data, RAID1: total=3.34TiB, used=3.13TiB System, RAID1: total=32.00MiB, used=512.00KiB Metadata, RAID1: total=10.00GiB, used=7.31GiB GlobalReserve, single: total=512.00MiB, used=0.00B On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3 AuthenticAMD GNU/Linux I'm utterly puzzled and clueless at how to dig into this issue. 
Re: RAID1 fails to recover chunk tree
just notice your case is different from others seen/working on. in your the layout has issue. its not about the raid. sorry. try: mount -o recovery,ro On 10/30/2014 03:32 AM, Zack Coffey wrote: $ sudo mount -o degraded,ro /dev/sdd1 /asdf mount: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. $ dmesg | tail [524718.760792] BTRFS info (device sdd1): allowing degraded mounts [524718.760800] BTRFS info (device sdd1): disk space caching is enabled [524718.762087] BTRFS: failed to read chunk root on sdd1 [524718.776524] BTRFS: open_ctree failed $ uname -a Linux mach 3.17.1-52.g5c4d099-desktop #1 SMP PREEMPT Sat Oct 18 23:36:23 UTC 2014 (5c4d099) x86_64 x86_64 x86_64 GNU/Linux $ btrfs --version Btrfs v3.16.2+20141003 On 10/28/2014 11:55 PM, Anand Jain wrote: 'mount degraded,ro' see if there is any non-zero non-raid1 group profile. On 10/29/14 04:32, Zack Coffey wrote: Revisit of a previous issue. Setup a single 640GB drive with BTRFS and compression. This was not a system drive, just a place to put random junk. Made a RAID1 with another drive of just the metadata. Was in that state for less than 12 hours-ish, removed the second drive and now cannot get to any data on the original drive. Data remained single while only metadata was RAID1. Single drive btrfs was made on Ubuntu with kernel 3.13.0 and tools 3.12. $ sudo mount -o degraded /dev/sdc1 /media/Data/ mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so $ dmesg | tail [45353.869448] KBD BUG in ../../../../../../../../ drivers/2d/lnx/fgl/drm/kernel/ gal.c at line: 304! [45353.901511] KBD BUG in ../../../../../../../../ drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304! 
[45353.901666] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304! [45354.148488] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304! [45354.148573] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304! [46241.155350] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1 [46241.155923] btrfs: allowing degraded mounts [46241.155927] btrfs: disk space caching is enabled [46241.159436] btrfs: failed to read chunk root on sdc1 [46241.177815] btrfs: open_ctree failed $ btrfs-show-super /dev/sdc1 superblock: bytenr=65536, device=/dev/sdc1 -- --- csum0x93bcb1b5 [match] bytenr 65536 flags 0x1 magic _BHRfS_M [match] fsidbd78815a-802b-43e2-8387-fc6ab4237d67 label generation 60944 root909586694144 sys_array_size 97 chunk_root_generation 60938 root_level 1 chunk_root 911673917440 chunk_root_level1 log_root0 log_root_transid0 log_root_level 0 total_bytes 1115871535104 bytes_used 321833435136 sectorsize 4096 nodesize4096 leafsize4096 stripesize 4096 root_dir6 num_devices 2 compat_flags0x0 compat_ro_flags 0x0 incompat_flags 0x9 csum_type 0 csum_size 4 cache_generation60944 uuid_tree_generation60944 dev_item.uuid d82b2027-17b6-4513-a86d-9227a42d7ed1 dev_item.fsid bd78815a-802b-43e2-8387-fc6ab4237d67 [match] dev_item.type 0 dev_item.total_bytes615763673088 dev_item.bytes_used 324270030848 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 $ sudo btrfs device add -f /dev/sdh1 /dev/sdc1 ERROR: error adding the device '/dev/sdh1' - Inappropriate ioctl for device $ sudo btrfs device delete missing /dev/sdc1 ERROR: error removing the device 'missing' - Inappropriate ioctl for device $ sudo mount -o degraded,defaults,compress=lzo /dev/sdc1 /media/Data/ mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or 
helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so $ dmesg | tail [106991.655384] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1 [106991.665066] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1 [107019.954397] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1 [107019.962009] btrfs: device fsid
Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.
Original Message Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation. From: Qu Wenruo quwen...@cn.fujitsu.com To: bo.li@oracle.com Date: 2014年10月30日 08:58 Original Message Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation. From: Liu Bo bo.li@oracle.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年10月29日 22:29 On Mon, Oct 27, 2014 at 04:36:26PM +0800, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation. From: Liu Bo bo.li@oracle.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年10月27日 16:14 On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation. From: Liu Bo bo.li@oracle.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年10月24日 19:06 On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote: When btrfs allocate a chunk, it will try to alloc up to 1G for data and 256M for metadata, or 10% of all the writeable space if there is enough 10G for data, if (type BTRFS_BLOCK_GROUP_DATA) { max_stripe_size = 1024 * 1024 * 1024; max_chunk_size = 10 * max_stripe_size; Oh, sorry, 10G is right. Any other comments? Thanks, Qu ... thanks, -liubo space for the stripe on device. However, when we run out of space, this allocation may cause unbalanced chunk allocation. For example, there are only 1G unallocated space, and request for allocate DATA chunk is sent, and all the space will be allocated as data chunk, making later metadata chunk alloc request unable to handle, which will cause ENOSPC. This is the one of the common complains from end users about why ENOSPC happens but there is still available space. 
Okay, I don't think this is the common case, AFAIK, the most ENOSPC is caused by our runtime worst case metadata reservation problem. btrfs has been inclined to create a fairly large metadata chunk (1G) in its initial mkfs stage and 256M metadata chunk is also a very large one. As of your below example, yes, we don't have space for metadata allocation, but do we really need to allocate a new one? Or am I missing something? thanks, -liubo Yes that's true this is not the common cause, but at least this patch may make the percentage of 'df' command reach as close to 100% as possible before hitting ENOSPC under normal operations. (If not using balance) And some case like the following mail may be improved by the patch: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36097.html I understand that most of the cases that a lot of free data space and no metadata space is caused by create and then delete large files, but if the last giga bytes can be allocated more carefully, at least the available bytes of 'df' command should be reduced before hit ENOSPC. How do you think about it? Sorry for the late reply. I just notice that a recent commit has fixed this problem. commit 47ab2a6c689913db23ccae38349714edf8365e0a Author: Josef Bacik jba...@fb.com Date: Thu Sep 18 11:20:02 2014 -0400 Btrfs: remove empty block groups automatically thanks, -liubo Oh, that's much better than my patch. So please ignore my patch. Thanks, Qu Wait a second, that's true block group auto-reclaim can deal with some cases, but it will not improve the vanilla 'df' used percentage before hit ENOSPC. The old 10%/10G will still hit the ENOSPC below 90% used space if using 100G disk. This patch should improve it to above 95% or even above 99%. The old behavior may leave a bad image on normal users that btrfs can't use space effectively. So I still consider the patch has positive effect on btrfs. 
Thanks, Qu Thanks, Qu This patch will try not to alloc chunk which is more than half of the unallocated space, making the last space more balanced at a small cost of more fragmented chunk at the last 1G. Some easy example: Preallocate 17.5G on a 20G empty btrfs fs: [Before] # btrfs fi show /mnt/test Label: none uuid: da8741b1-5d47-4245-9e94-bfccea34e91e Total devices 1 FS bytes used 17.50GiB devid1 size 20.00GiB used 20.00GiB path /dev/sdb All space is allocated. No space later metadata space. [After] # btrfs fi show /mnt/test Label: none uuid: e6935aeb-a232-4140-84f9-80aab1f23d56 Total devices 1 FS bytes used 17.50GiB devid1 size 20.00GiB used 19.77GiB path /dev/sdb About 230M is still available for later metadata allocation. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/volumes.c | 18 ++ 1 file changed, 18 insertions(+) diff --git
Re: read block failed check_tree_block / Couldn't read chunk tree
yes that's the one.

  btrfs-progs: fix device missing of btrfs fi show with seed devices

Thanks

On 10/29/2014 08:15 PM, Rene Thomas wrote:

Can't find commit in official repos
Get "fatal: bad object 915902c5002485fb13d27c4b699a73fb66cc0f09" from git show

Found commit 2513077f2f830b4bc83d528bfb6979eb461918bd
  btrfs-progs: fix device missing of btrfs fi show with seed devices

Thanks
René

2014-10-29 4:45 GMT+01:00 Anand Jain anand.j...@oracle.com:

this is (most likely) due to patch below,

commit 915902c5002485fb13d27c4b699a73fb66cc0f09
  btrfs-progs: fix device missing of btrfs fi show with seed devices

Could you try to back out the patch from progs and give it a shot? And please report what you see.

Thanks.

On 10/25/14 00:43, Rene Thomas wrote:

# btrfs --version
Btrfs v3.17

# btrfs fi show
Label: 'mythstorage'  uuid: 9b454272-6800-4b3c-b196-9e180407a6cb
	Total devices 1 FS bytes used 2.36MiB
	devid    1 size 931.51GiB used 10.04GiB path /dev/sdd1

Check tree block failed, want=5845480062976, have=0
Check tree block failed, want=5845480062976, have=0
Check tree block failed, want=5845480062976, have=65536
Check tree block failed, want=5845480062976, have=0
Check tree block failed, want=5845480062976, have=0
read block failed check_tree_block
Couldn't read chunk tree
[PATCH v2] revert btrfs-progs: do a separate probe for _transient_ replacing device
There is a compatibility issue with older kernels with the progs commit id below:

d0588bfa479409b2a0f6243f894338a01a56221a
btrfs-progs: do a separate probe for _transient_ replacing device

So as of now writing to revert the above commit id.

The brewing sysfs interface would help to fix the impending issue, which is that a seed device would fail to show in the 'btrfs fi show' output of a sprout device.

Signed-off-by: Anand Jain <anand.j...@oracle.com>
---
v2: update commit with correct commit which this patch will revert

 utils.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/utils.c b/utils.c
index a8691fe..1d1cc77 100644
--- a/utils.c
+++ b/utils.c
@@ -1869,29 +1869,12 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
 	if (!fi_args->num_devices)
 		goto out;
 
-	/*
-	 * with kernel patch
-	 * btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots
-	 * the kernel now returns total_devices which does not include
-	 * replacing device if running.
-	 * As we need to get dev info of the replace device if it is running,
-	 * so just add one to fi_args->num_devices.
-	 */
-
-	di_args = *di_ret = malloc((fi_args->num_devices + 1) * sizeof(*di_args));
+	di_args = *di_ret = malloc((fi_args->num_devices) * sizeof(*di_args));
 	if (!di_args) {
 		ret = -errno;
 		goto out;
 	}
 
-	/* get the replace target device if it is there */
-	ret = get_device_info(fd, i, &di_args[ndevs]);
-	if (!ret) {
-		ndevs++;
-		fi_args->num_devices++;
-	}
-	i++;
-
 	for (; i <= fi_args->max_id; ++i) {
 		BUG_ON(ndevs >= fi_args->num_devices);
 		ret = get_device_info(fd, i, &di_args[ndevs]);
-- 
2.0.0.153.g79d
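
The loop that remains after the revert walks device ids up to max_id and fills an array sized for num_devices entries. The pattern can be sketched outside of btrfs-progs as below. Everything here is illustrative: dev_info, probe_fn, demo_probe() and collect_devices() are hypothetical names, the probe callback stands in for the device info ioctl, and the sketch returns an error where the real code uses BUG_ON().

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical, reduced device record. */
struct dev_info {
	unsigned long devid;
};

/* Hypothetical probe: 0 on success, -ENODEV if no device has this id. */
typedef int (*probe_fn)(unsigned long devid, struct dev_info *out);

/* Demo probe for illustration only: pretends ids 1, 2 and 4 exist. */
static int demo_probe(unsigned long devid, struct dev_info *out)
{
	if (devid != 1 && devid != 2 && devid != 4)
		return -ENODEV;
	out->devid = devid;
	return 0;
}

/*
 * Walk ids 0..max_id, skip the holes, and store at most num_devices
 * records in out[]. Returns 0 and sets *found, or a negative errno.
 */
static int collect_devices(unsigned long max_id, size_t num_devices,
			   struct dev_info *out, probe_fn probe, size_t *found)
{
	size_t ndevs = 0;
	unsigned long id;

	for (id = 0; id <= max_id; id++) {
		struct dev_info tmp;
		int ret = probe(id, &tmp);

		if (ret == -ENODEV)
			continue;		/* gap in the id space */
		if (ret)
			return ret;
		if (ndevs >= num_devices)
			return -ERANGE;		/* more devices than announced */
		out[ndevs++] = tmp;
	}
	*found = ndevs;
	return 0;
}
```

The -ERANGE branch is the situation the reverted "+ 1" allocation was meant to cover: a device exists that the announced count does not include.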
Re: [PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device
The commit ids in my workspace have changed, maybe when I was trying
to nail down an issue some time back. Thanks. V2 is out.

On 10/29/2014 07:41 PM, Gui Hecheng wrote:
> On Wed, 2014-10-29 at 18:51 +0800, Anand Jain wrote:
>> There is a compatibility issue with older kernels caused by the progs
>> commit below.
>>
>>   05cd2907557ba627cfb86e60b214ea6228613a84
>
> Which tree does this commit id belong to? I can't find it anywhere.
>
>> So for now, revert the above commit. The upcoming sysfs interface should
>> help fix the issue this leaves open, which is that a seed device fails to
>> show in the 'btrfs fi show' output of a sprout device.
>>
>> Signed-off-by: Anand Jain anand.j...@oracle.com
>> ---
>>  utils.c | 19 +------------------
>>  1 file changed, 1 insertion(+), 18 deletions(-)
>>
>> diff --git a/utils.c b/utils.c
>> index a8691fe..1d1cc77 100644
>> --- a/utils.c
>> +++ b/utils.c
>> @@ -1869,29 +1869,12 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
>>  	if (!fi_args->num_devices)
>>  		goto out;
>>  
>> -	/*
>> -	 * with kernel patch
>> -	 * btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots
>> -	 * the kernel now returns total_devices which does not include
>> -	 * replacing device if running.
>> -	 * As we need to get dev info of the replace device if it is running,
>> -	 * so just add one to fi_args->num_devices.
>> -	 */
>> -
>> -	di_args = *di_ret = malloc((fi_args->num_devices + 1) * sizeof(*di_args));
>> +	di_args = *di_ret = malloc((fi_args->num_devices) * sizeof(*di_args));
>>  	if (!di_args) {
>>  		ret = -errno;
>>  		goto out;
>>  	}
>>  
>> -	/* get the replace target device if it is there */
>> -	ret = get_device_info(fd, i, &di_args[ndevs]);
>> -	if (!ret) {
>> -		ndevs++;
>> -		fi_args->num_devices++;
>> -	}
>> -	i++;
>> -
>>  	for (; i <= fi_args->max_id; ++i) {
>>  		BUG_ON(ndevs >= fi_args->num_devices);
>>  		ret = get_device_info(fd, i, &di_args[ndevs]);