[PATCH v2] btrfs-progs: Fix a buffer overflow causing segfault in fstests/btrfs/069
The newly introduced search_chunk_tree_for_fs_info() does not count devid 0
in fi_args->num_devices. This causes a buffer overflow, because
get_device_info() later fills di_args with the devid 0 entry as well.

This can be triggered by fstests/btrfs/069, and by any operation that needs
to iterate over all the devices, such as 'fi show' or 'dev stat', while a
replace is running.

The fix is to do an extra probe specifically for devid 0 after
search_chunk_tree_for_fs_info() and to adjust num_devices if needed.

Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
---
 utils.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/utils.c b/utils.c
index af0a8fe..6581568 100644
--- a/utils.c
+++ b/utils.c
@@ -1934,8 +1934,10 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
 	int ret = 0;
 	int ndevs = 0;
 	int i = 0;
+	int replacing = 0;
 	struct btrfs_fs_devices *fs_devices_mnt = NULL;
 	struct btrfs_ioctl_dev_info_args *di_args;
+	struct btrfs_ioctl_dev_info_args tmp;
 	char mp[BTRFS_PATH_NAME_MAX + 1];
 	DIR *dirstream = NULL;
 
@@ -2003,6 +2005,19 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
 		ret = search_chunk_tree_for_fs_info(fd, fi_args);
 		if (ret)
 			goto out;
+
+		/*
+		 * search_chunk_tree_for_fs_info() will lack devid 0,
+		 * so manually probe for it here.
+		 */
+		ret = get_device_info(fd, 0, &tmp);
+		if (!ret) {
+			fi_args->num_devices++;
+			ndevs++;
+			replacing = 1;
+			if (i == 0)
+				i++;
+		}
 	}
 
 	if (!fi_args->num_devices)
@@ -2014,6 +2029,8 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
 		goto out;
 	}
 
+	if (replacing)
+		memcpy(di_args, &tmp, sizeof(tmp));
 	for (; i <= fi_args->max_id; ++i) {
 		ret = get_device_info(fd, i, &di_args[ndevs]);
 		if (ret == -ENODEV)
-- 
2.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
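The sizing problem described in the patch above can be sketched in isolation. This is a minimal, hypothetical illustration and not the btrfs-progs code: devid_present() and slots_needed() are names invented here, and the array of devids merely stands in for whatever the kernel would report per device. It shows why a buffer sized from the chunk-tree count alone ends up one entry short while devid 0 exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: is `devid` among the devids the kernel reports? */
static bool devid_present(const uint64_t *devids, size_t n, uint64_t devid)
{
	assert(devids != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (devids[i] == devid)
			return true;
	return false;
}

/*
 * Hypothetical helper: number of per-device slots the caller must allocate.
 * `chunk_tree_count` models the count taken from the chunk tree, which
 * (per the patch description) does not include devid 0. If devid 0 can be
 * probed successfully, one extra slot is required for it.
 */
static size_t slots_needed(size_t chunk_tree_count,
			   const uint64_t *devids, size_t n)
{
	size_t slots = chunk_tree_count;

	if (devid_present(devids, n, 0))
		slots++;
	return slots;
}
```

With two regular devices and a replace target at devid 0, slots_needed() returns 3 rather than 2, which matches the num_devices++ adjustment in the patch.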
Re: RFE: per-subvolume timestamp that is updated on every change to a subvolume
-------- Original Message --------
Subject: Re: RFE: per-subvolume timestamp that is updated on every change to a subvolume
From: Qu Wenruo quwen...@cn.fujitsu.com
To: Lennart Poettering lenn...@poettering.net, linux-btrfs@vger.kernel.org
Date: 2015-01-06 14:02

> -------- Original Message --------
> Subject: RFE: per-subvolume timestamp that is updated on every change to a subvolume
> From: Lennart Poettering lenn...@poettering.net
> To: linux-btrfs@vger.kernel.org
> Date: 2015-01-06 01:27
>
>> Heya!
>>
>> I am looking for a nice way to query the overall last modification
>> timestamp of a subvolume. i.e. the most recent mtime of *any* file or
>> directory within a subvolume.
>>
>> Ideally, I think, there was a btrfs_timespec field for this in struct
>> btrfs_root_item, alas there isn't afaics. Any chance this can be added?
>
> In fact, btrfs_root_item contains one btrfs_inode_item, which contains
> the a/c/m/otime.
>
> But not sure if it contains the time you need.
> I'd better add acmotime output for inode_item in btrfs-debug-tree and
> try myself.
>
> Thanks,
> Qu

The value in acmotime of the inode_item in root_item is not used,
so it seems anyone can use it to record the acmotime for your purpose.

Thanks,
Qu

>> Or is there another workable way to query this value? Maybe determine
>> it from the current generation of a subvolume or so? Is that tracked?
>> Ideas?
>>
>> Lennart
Re: [PATCH] btrfs: reada: Remove unused function
On Mon, 5 Jan 2015, David Sterba wrote:

>> Remove the function btrfs_reada_detach() that is not used anywhere.
>>
>> This was partially found by using a static code analysis program
>> called cppcheck.
>>
>> Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
>
> No please, this function is part of public readahead API and similar
> patch has been NACKed several times.

BTW how is this any kind of API for anybody, given it's not exported to
modules?

-- 
Jiri Kosina
SUSE Labs
Re: [PATCH] btrfs: reada: Remove unused function
On Tue, Jan 06, 2015 at 11:42:07AM +0100, Jiri Kosina wrote:
> On Mon, 5 Jan 2015, David Sterba wrote:
>
>>> Remove the function btrfs_reada_detach() that is not used anywhere.
>>>
>>> This was partially found by using a static code analysis program
>>> called cppcheck.
>>>
>>> Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
>>
>> No please, this function is part of public readahead API and similar
>> patch has been NACKed several times.
>
> BTW how is this any kind of API for anybody, given it's not exported to
> modules?

Scratch 'public' from the sentence, that was misleading. The API is
internal to btrfs. The readahead can work in synchronous and asynchronous
modes; this function is the API for the async mode.

Documented at reada.c:

/*
 * This is the implementation for the generic read ahead framework.
 *
 * To trigger a readahead, btrfs_reada_add must be called. It will start
 * a read ahead for the given range [start, end) on tree root. The returned
 * handle can either be used to wait on the readahead to finish
 * (btrfs_reada_wait), or to send it to the background (btrfs_reada_detach).
 *
 ...

I've experimented with it for readdir speedups, but I haven't finished
that due to other problems.
Re: btrfs_inode_item's otime?
On Tue, 6 Jan 2015 10:47:00 PM Chris Samuel wrote:
> On Mon, 5 Jan 2015 06:21:52 PM Lennart Poettering wrote:
>
>> It should be easy to initialize it to the mtime when the inode is
>> first created...
>
> This I agree with, well worth doing anyway. I'll see if I can knock
> up a patch.

Sadly it appears that the btrfs code sets mtime/ctime/atime at inode
creation via the normal filesystem inode structure, not through its own,
and as that doesn't include otime I'm afraid it's out of my league.

Worth a shot though!

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: btrfs_inode_item's otime?
On Mon, 5 Jan 2015 06:21:52 PM Lennart Poettering wrote:

> Is this on purpose, or simply an oversight?

The only hint I can see that it's deliberate is the comment in
fs/btrfs/send.c that says:

	/* TODO Add otime support when the otime patches get into upstream */

However...

> It should be easy to initialize it to the mtime when the inode is
> first created...

This I agree with, well worth doing anyway. I'll see if I can knock up a
patch.

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
[PATCH] btrfs: qgroup: move WARN_ON() to the correct location.
In function qgroup_excl_accounting(), we need to WARN when qg->excl is
less than what we want to free, for the child as well as for its parents.
But currently, for the parent qgroup, the WARN_ON() is located after the
freeing of qg->excl, so it warns even when we free the space normally.

This patch moves the WARN_ON() before the freeing of qg->excl.

Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
---
 fs/btrfs/qgroup.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48b60db..97159a8 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1431,9 +1431,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 		qgroup = u64_to_ptr(unode->aux);
 		qgroup->rfer += sign * oper->num_bytes;
 		qgroup->rfer_cmpr += sign * oper->num_bytes;
+		WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
 		qgroup->excl += sign * oper->num_bytes;
-		if (sign < 0)
-			WARN_ON(qgroup->excl < oper->num_bytes);
 		qgroup->excl_cmpr += sign * oper->num_bytes;
 		qgroup_dirty(fs_info, qgroup);
 
-- 
1.8.4.2
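The ordering issue in the patch above can be illustrated outside the kernel. The following is a small userspace sketch under stated assumptions: struct demo_counter, apply_check_before() and apply_check_after() are hypothetical names, and a returned bool stands in for WARN_ON() firing. It contrasts checking the counter before the subtraction (the patched order) with checking it afterwards (the old order).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the exclusive byte counter of a qgroup. */
struct demo_counter {
	uint64_t excl;
};

/* Add or subtract num_bytes depending on sign (+1 or -1). */
static void demo_update(struct demo_counter *c, int sign, uint64_t num_bytes)
{
	assert(sign == 1 || sign == -1);
	if (sign < 0)
		c->excl -= num_bytes;	/* wraps on underflow, like u64 in the kernel */
	else
		c->excl += num_bytes;
}

/* Patched order: test against the value BEFORE it is modified. */
static bool apply_check_before(struct demo_counter *c, int sign,
			       uint64_t num_bytes)
{
	bool would_warn = sign < 0 && c->excl < num_bytes;

	demo_update(c, sign, num_bytes);
	return would_warn;
}

/* Old order: test against the value AFTER it was already reduced. */
static bool apply_check_after(struct demo_counter *c, int sign,
			      uint64_t num_bytes)
{
	demo_update(c, sign, num_bytes);
	return sign < 0 && c->excl < num_bytes;
}
```

Freeing 60 bytes from a counter holding 100 is a perfectly normal operation, yet the old order reports a warning because the remaining 40 is smaller than 60. The patched order only warns when the counter really holds less than what is being freed.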
Re: [PATCH] btrfs: suppress a build warning on building 32bit kernel
On Mon, Jan 05, 2015 at 05:03:29PM +0900, Satoru Takeuchi wrote:
>>> -	failrec = (struct io_failure_record *)state->private;
>>> +	failrec = (struct io_failure_record *)(unsigned long)state->private;
>>
>> We're always using the 'private' data to store a pointer to 'struct
>> io_failure_record *', please change the definition in 'struct
>> extent_state' instead of the typecasting.
>
> Current definition is as follows.
>
> ===
> struct extent_state {
> 	...
> 	/* for use by the FS */
> 	u64 private;
> };
> ===
>
> Is it OK to change "u64 private" to "struct io_failure_record *failrec"
> and change {set,get}_state_private() to {set,get}_state_failrec()?
>
> Or is it better to keep the name "private" as is and just change
> its type to "unsigned long" or "(void *)"?

I've looked at the implied changes that set/get functions renaming would
need, also to keep the code sane. It does not seem to be small enough to
fold in this patch so please go on with adding the typecasts. The code
could use some cleanups but bugfixes first.
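For readers wondering why the intermediate cast matters: on a 32-bit build a pointer is narrower than a u64, so converting the 64-bit value straight to a pointer makes the compiler warn about a cast from an integer of different size. A minimal userspace sketch follows. It assumes uintptr_t as the stand-in for the kernel's unsigned long, and struct demo_record, store_private() and load_private() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record type standing in for struct io_failure_record. */
struct demo_record {
	int value;
};

/* Store a pointer in a 64-bit "private" field via a pointer-sized integer. */
static uint64_t store_private(struct demo_record *rec)
{
	/* A pointer-sized integer always fits into 64 bits on the targets assumed here. */
	assert(sizeof(uintptr_t) <= sizeof(uint64_t));
	return (uint64_t)(uintptr_t)rec;
}

/*
 * Load it back: narrow to the pointer-sized integer first, then convert to
 * the pointer type. Skipping the middle step is what triggers the warning
 * on 32-bit builds.
 */
static struct demo_record *load_private(uint64_t priv)
{
	return (struct demo_record *)(uintptr_t)priv;
}
```

The round trip is lossless as long as the stored value originated from a pointer, which is the case described in the thread.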
Re: BTRFS: Transaction aborted (error -5)
Hi,

> Try to mount with -o recovery with either kernel (newer is pretty much
> always better). If that doesn't work then you should try upgrading
> btrfs-progs to 3.18 (released dozens of hours ago) and run 'btrfs check'
> on the volume and report the results. I don't recommend using --repair
> option just yet.

Mounting with -o recovery yields the same errors in dmesg output as
without the -o recovery. This is even after upgrading the kernel to 3.18.1
and btrfs-progs to 3.18.

BTRFS check yields this:

# btrfs check /dev/sdd1
Checking filesystem on /dev/sdd1
UUID: adad9bea-fc42-4411-bfda-345111934fda
checking extents
checksum verify failed on 588447744 found 6E0D3115 wanted F90C810B
checksum verify failed on 588447744 found 6E0D3115 wanted F90C810B
checksum verify failed on 588447744 found 24492BB3 wanted F90C810B
checksum verify failed on 588447744 found 6E0D3115 wanted F90C810B
Csum didn't match
owner ref check failed [588447744 16384]
ref mismatch on [151193784320 32768] extent item 0, found 1
Backref 151193784320 root 277 owner 36161 offset 3833856 num_refs 0 not found in extent tree
Incorrect local backref count on 151193784320 root 277 owner 36161 offset 3833856 found 1 wanted 0 back 0xad8be18
backpointer mismatch on [151193784320 32768]
ref mismatch on [151193817088 32768] extent item 0, found 1
Backref 151193817088 root 277 owner 36161 offset 3915776 num_refs 0 not found in extent tree
Incorrect local backref count on 151193817088 root 277 owner 36161 offset 3915776 found 1 wanted 0 back 0xad8bf00
backpointer mismatch on [151193817088 32768]
ref mismatch on [151193849856 180224] extent item 0, found 1
Backref 151193849856 root 277 owner 36355 offset 3112960 num_refs 0 not found in extent tree
Incorrect local backref count on 151193849856 root 277 owner 36355 offset 3112960 found 1 wanted 0 back 0xab333f0
backpointer mismatch on [151193849856 180224]
ref mismatch on [151194030080 3145728] extent item 0, found 7
Backref 151194030080 root 277 owner 36187 offset 1048576 num_refs 0 not found in extent tree
Incorrect local backref count on 151194030080 root 277 owner 36187 offset 1048576 found 7 wanted 0 back 0x9b82580
backpointer mismatch on [151194030080 3145728]
ref mismatch on [151197175808 32768] extent item 0, found 1
Backref 151197175808 root 277 owner 36361 offset 2523136 num_refs 0 not found in extent tree
Incorrect local backref count on 151197175808 root 277 owner 36361 offset 2523136 found 1 wanted 0 back 0xa0f5568
backpointer mismatch on [151197175808 32768]
ref mismatch on [151197208576 32768] extent item 0, found 1
Backref 151197208576 root 277 owner 36361 offset 2572288 num_refs 0 not found in extent tree
Incorrect local backref count on 151197208576 root 277 owner 36361 offset 2572288 found 1 wanted 0 back 0xa783490
backpointer mismatch on [151197208576 32768]
ref mismatch on [151197241344 32768] extent item 0, found 1
Backref 151197241344 root 277 owner 36361 offset 2621440 num_refs 0 not found in extent tree
Incorrect local backref count on 151197241344 root 277 owner 36361 offset 2621440 found 1 wanted 0 back 0xa4d67e8
backpointer mismatch on [151197241344 32768]
ref mismatch on [151197274112 32768] extent item 0, found 1
Backref 151197274112 root 277 owner 36361 offset 2703360 num_refs 0 not found in extent tree
Incorrect local backref count on 151197274112 root 277 owner 36361 offset 2703360 found 1 wanted 0 back 0x925de30
backpointer mismatch on [151197274112 32768]
ref mismatch on [151197306880 16384] extent item 0, found 1
Backref 151197306880 root 277 owner 36361 offset 3637248 num_refs 0 not found in extent tree
Incorrect local backref count on 151197306880 root 277 owner 36361 offset 3637248 found 1 wanted 0 back 0x916a658
backpointer mismatch on [151197306880 16384]
ref mismatch on [151197323264 983040] extent item 0, found 3
Backref 151197323264 root 277 owner 36208 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 151197323264 root 277 owner 36208 offset 0 found 3 wanted 0 back 0xb18a1e0
backpointer mismatch on [151197323264 983040]
ref mismatch on [151198306304 32768] extent item 0, found 1
Backref 151198306304 root 277 owner 36086 offset 3780608 num_refs 0 not found in extent tree
Incorrect local backref count on 151198306304 root 277 owner 36086 offset 3780608 found 1 wanted 0 back 0xb30f878
backpointer mismatch on [151198306304 32768]
ref mismatch on [151198339072 98304] extent item 0, found 1
Backref 151198339072 root 277 owner 36396 offset 901120 num_refs 0 not found in extent tree
Incorrect local backref count on 151198339072 root 277 owner 36396 offset 901120 found 1 wanted 0 back 0x99bfc58
backpointer mismatch on [151198339072 98304]
ref mismatch on [151198437376 16384] extent item 0, found 1
Backref 151198437376 root 277 owner 36396 offset 1015808 num_refs 0 not found in extent tree
Incorrect local backref count on 151198437376 root 277 owner 36396 offset 1015808 found 1 wanted 0 back 0x99bfd40
Re: BTRFS: Transaction aborted (error -5)
Hi,

>> [32079.815291] BTRFS info (device sdd1): disk space caching is enabled
>> [32082.419524] BTRFS: sdd1 checksum verify failed on 588447744 wanted F90C810B found 6E0D3115 level 0
>> [32114.418433] BTRFS: sdd1 checksum verify failed on 588447744 wanted F90C810B found 6E0D3115 level 0
>> [32125.951446] BTRFS: sdd1 checksum verify failed on 588447744 wanted F90C810B found 6E0D3115 level 0
>> [32125.959497] BTRFS: sdd1 checksum verify failed on 588447744 wanted F90C810B found 24492BB3 level 0
>
> Well I'm no expert, but it seems suspicious to me it doesn't find what
> it wants on a particular block twice, but then on the 3rd attempt it
> finds something different on the same block which also isn't what it
> wants. So that sounds like a device problem to me. Is this an SSD? What
> are your mount options (are you using discard)? And what's the metadata
> profile, is it single or DUP? I'm gonna guess it's an SSD with single
> copy of metadata which is why this isn't self-correcting.

So I finished testing the drive using 'badblocks -n -s -v' (the
non-destructive read-write mode). It came back clean, no bad blocks found.
I did this with the entire drive unmounted. Yet the file system still
reports the errors shortly after mounting. (See below.)

This drive is an older spinning-type drive. This is the drive as reported
by 'lsscsi':

[3:0:0:0]    disk    ATA      WDC WD1001FALS-0 1D05  /dev/sdd

Newegg lists it as a 'Western Digital WD Black WD1001FALS 1TB 7200 RPM
32MB Cache SATA 3.0Gb/s 3.5" Internal Hard Drive Bare Drive'.

The disk is attached to the system via this, as reported by 'lspci':

01:09.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)

(Not sure why it lists it as a RAID controller or a PCI-X controller, as I
use it as a simple SATA controller and it plugs into a regular 32-bit PCI
slot.)

The motherboard is a Micro-Star MS-6570, with an AMD Athlon XP 3000+
(2171 MHz) processor and 2GB of RAM.

Mount options are only: noatime

BTRFS Profile is:

# btrfs fi df /var/lib/ceph/osd/ceph-1/
Data, single: total=185.01GiB, used=183.39GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=367.19MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=128.00MiB, used=0.00B

[162288.768747] BTRFS info (device sdc1): disk space caching is enabled
[162290.463003] BTRFS info (device sdd1): disk space caching is enabled
[162335.594094] BTRFS: sdd1 checksum verify failed on 588447744 wanted F90C810B found 6E0D3115 level 0
[162335.595476] BTRFS: sdd1 checksum verify failed on 588447744 wanted F90C810B found 6E0D3115 level 0
[162335.602066] BTRFS: sdd1 checksum verify failed on 588447744 wanted F90C810B found 24492BB3 level 0
[162335.602075] ------------[ cut here ]------------
[162335.602085] WARNING: CPU: 0 PID: 31841 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x43/0x110()
[162335.602086] BTRFS: Transaction aborted (error -5)
[162335.602087] Modules linked in: iscsi_trgt(O)
[162335.602094] CPU: 0 PID: 31841 Comm: btrfs-cleaner Tainted: G O 3.18.1-gentoo-20150104-0921 #1
[162335.602096] Hardware name:/MS-6570, BIOS 6.00 PG 11/07/2003
[162335.602097] e68a5e68 e68a5e68 e68a5e28 c14e48a4 e68a5e58 c10345a0 c15cbefc e68a5e84
[162335.602101] 7c61 c15d895b 0104 c11cff13 c11cff13 fffb f4d23800 c150d330
[162335.602104] e68a5e70 c10345ee 0009 e68a5e68 c15cbefc e68a5e84 e68a5e9c c11cff13
[162335.602108] Call Trace:
[162335.602117] [c14e48a4] dump_stack+0x16/0x18
[162335.602122] [c10345a0] warn_slowpath_common+0x70/0x90
[162335.602125] [c11cff13] ? __btrfs_abort_transaction+0x43/0x110
[162335.602127] [c11cff13] ? __btrfs_abort_transaction+0x43/0x110
[162335.602130] [c10345ee] warn_slowpath_fmt+0x2e/0x30
[162335.602133] [c11cff13] __btrfs_abort_transaction+0x43/0x110
[162335.602138] [c11ea884] btrfs_run_delayed_refs.part.73+0xd4/0x1d0
[162335.602140] [c11ea98f] btrfs_run_delayed_refs+0xf/0x20
[162335.602143] [c11f96f4] btrfs_should_end_transaction+0x34/0x50
[162335.602146] [c11e8ef9] btrfs_drop_snapshot+0x1c9/0x740
[162335.602149] [c11fb152] btrfs_clean_one_deleted_snapshot+0x62/0x90
[162335.602152] [c11f2a49] cleaner_kthread+0xd9/0x110
[162335.602155] [c11f2970] ? btrfs_destroy_pinned_extent+0x120/0x120
[162335.602160] [c1047415] kthread+0x95/0xb0
[162335.602164] [c14e9100] ret_from_kernel_thread+0x20/0x30
[162335.602166] [c1047380] ? kthread_worker_fn+0xb0/0xb0
[162335.602168] ---[ end trace ba640116f371d2ff ]---
[162335.602171] BTRFS: error (device sdd1) in btrfs_run_delayed_refs:2792: errno=-5 IO failure
[162335.602173] BTRFS info (device sdd1): forced readonly
Re: BTRFS: Transaction aborted (error -5)
Hi,

BTRFS check on /dev/sdc1 reveals everything looks ok:

# btrfs check /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: 26ed1033-429a-444f-97cc-ce8103db4c39
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 195515710524 bytes used err is 0
total csum bytes: 205915200
total tree bytes: 407355392
total fs tree bytes: 94830592
total extent tree bytes: 31588352
btree space waste bytes: 100867438
file data blocks allocated: 537492316160
 referenced 195656101888
Btrfs v3.18

(/dev/sdd1 and /dev/sdc1 are the only two btrfs file systems in this
machine.)

Oddly, when the problem with /dev/sdd1 started, problems with /dev/sdc1
were also reported, but /dev/sdc1 managed to fix itself. Below is the
complete dmesg output from when problems first started until /dev/sdd1
went readonly with errors.

The strangest part of all of this is that the dmesg output shows no
errors about the drive being physically bad. (I ran badblocks -nsv on both
/dev/sdd and /dev/sdc, and it confirmed 0 bad blocks for both drives.)

[25581.099684] BTRFS: sdd1 checksum verify failed on 521797632 wanted 8F2F5FEC found 3E879EFE level 0
[25581.105441] BTRFS: read error corrected: ino 1 off 521797632 (dev /dev/sdd1 sector 1035520)
[25581.105612] BTRFS: read error corrected: ino 1 off 521801728 (dev /dev/sdd1 sector 1035528)
[25581.105784] BTRFS: read error corrected: ino 1 off 521805824 (dev /dev/sdd1 sector 1035536)
[25581.105956] BTRFS: read error corrected: ino 1 off 521809920 (dev /dev/sdd1 sector 1035544)
[2.799514] BTRFS: sdd1 checksum verify failed on 680296448 wanted AB0E191F found 192D4134 level 0
[2.856199] BTRFS: read error corrected: ino 1 off 680296448 (dev /dev/sdd1 sector 1345088)
[2.860571] BTRFS: read error corrected: ino 1 off 680300544 (dev /dev/sdd1 sector 1345096)
[2.909634] BTRFS: read error corrected: ino 1 off 680304640 (dev /dev/sdd1 sector 1345104)
[2.909876] BTRFS: read error corrected: ino 1 off 680308736 (dev /dev/sdd1 sector 1345112)
[29292.777237] BTRFS: sdc1 checksum verify failed on 937738240 wanted F4196CDA found AF30B394 level 0
[29292.778022] BTRFS: sdc1 checksum verify failed on 937738240 wanted F4196CDA found AF30B394 level 0
[29292.781889] BTRFS: read error corrected: ino 1 off 937738240 (dev /dev/sdc1 sector 1847904)
[29292.782054] BTRFS: read error corrected: ino 1 off 937742336 (dev /dev/sdc1 sector 1847912)
[29292.782224] BTRFS: read error corrected: ino 1 off 937746432 (dev /dev/sdc1 sector 1847920)
[29292.782399] BTRFS: read error corrected: ino 1 off 937750528 (dev /dev/sdc1 sector 1847928)
[29691.731107] BTRFS: sdd1 checksum verify failed on 610877440 wanted 5A8006E7 found 1CFE4A20 level 0
[29691.791550] BTRFS: read error corrected: ino 1 off 610877440 (dev /dev/sdd1 sector 1209504)
[29691.793252] BTRFS: read error corrected: ino 1 off 610881536 (dev /dev/sdd1 sector 1209512)
[29691.793608] BTRFS: read error corrected: ino 1 off 610885632 (dev /dev/sdd1 sector 1209520)
[29691.793797] BTRFS: read error corrected: ino 1 off 610889728 (dev /dev/sdd1 sector 1209528)
[34626.017914] BTRFS: sdd1 checksum verify failed on 737181696 wanted 15D7099D found B6A2A7A9 level 0
[34626.022656] BTRFS: read error corrected: ino 1 off 737181696 (dev /dev/sdd1 sector 1456192)
[34626.022867] BTRFS: read error corrected: ino 1 off 737185792 (dev /dev/sdd1 sector 1456200)
[34626.023107] BTRFS: read error corrected: ino 1 off 737189888 (dev /dev/sdd1 sector 1456208)
[34626.023314] BTRFS: read error corrected: ino 1 off 737193984 (dev /dev/sdd1 sector 1456216)
[37057.349996] BTRFS: sdc1 checksum verify failed on 701792256 wanted A7BD5067 found 87EF0602 level 0
[37057.424920] BTRFS: read error corrected: ino 1 off 701792256 (dev /dev/sdc1 sector 1387072)
[37057.425178] BTRFS: read error corrected: ino 1 off 701796352 (dev /dev/sdc1 sector 1387080)
[37057.450174] BTRFS: read error corrected: ino 1 off 701800448 (dev /dev/sdc1 sector 1387088)
[37057.453476] BTRFS: read error corrected: ino 1 off 701804544 (dev /dev/sdc1 sector 1387096)
[38283.714855] BTRFS: sdd1 checksum verify failed on 190169088 wanted 27D1E032 found 585B1651 level 0
[38283.715349] BTRFS: sdd1 checksum verify failed on 190169088 wanted 27D1E032 found 585B1651 level 0
[38283.724140] BTRFS: read error corrected: ino 1 off 190169088 (dev /dev/sdd1 sector 387808)
[38283.724313] BTRFS: read error corrected: ino 1 off 190173184 (dev /dev/sdd1 sector 387816)
[38283.724485] BTRFS: read error corrected: ino 1 off 190177280 (dev /dev/sdd1 sector 387824)
[38283.724648] BTRFS: read error corrected: ino 1 off 190181376 (dev /dev/sdd1 sector 387832)
[38385.874438] BTRFS: sdd1 checksum verify failed on 472825856 wanted 937078F5 found 7FCB4F87 level 0
[38385.897113] BTRFS: read error corrected: ino 1 off 472825856 (dev /dev/sdd1 sector 939872)
[38385.897336] BTRFS: read error corrected: ino 1 off 472829952 (dev /dev/sdd1 sector 939880)
[PATCH] fstests: add generic test for fsync after unlink
This test is motivated by an fsync issue discovered in btrfs.
The issue was that after fsyncing an inode that got its link count
decremented, and the new link count is greater than zero, after the
fsync log replay the inode's parent directory metadata became
inconsistent - it had a wrong i_size which prevented the directory
from ever being removed (rmdir always failed with -ENOTEMPTY, even
if the directory had no more child inodes).

The btrfs issue was fixed by the following linux kernel patch:

    Btrfs: fix directory inconsistency after fsync log replay

Signed-off-by: Filipe Manana fdman...@suse.com
---
 tests/generic/039     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/039.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 105 insertions(+)
 create mode 100755 tests/generic/039
 create mode 100644 tests/generic/039.out

diff --git a/tests/generic/039 b/tests/generic/039
new file mode 100755
index 0000000..85646f9
--- /dev/null
+++ b/tests/generic/039
@@ -0,0 +1,102 @@
+#! /bin/bash
+# FS QA Test No. 039
+#
+# This test is motivated by an fsync issue discovered in btrfs.
+# The issue was that after fsyncing an inode that got its link count
+# decremented, and the new link count is greater than zero, after the
+# fsync log replay the inode's parent directory metadata became
+# inconsistent - it had a wrong i_size which prevented the directory
+# from ever being removed (rmdir always failed with -ENOTEMPTY, even
+# if the directory had no more child inodes).
+#
+# The btrfs issue was fixed by the following linux kernel patch:
+#
+#    Btrfs: fix directory inconsistency after fsync log replay
+#
+#-----------------------------------------------------------------------
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_flakey
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_need_to_be_root
+_require_scratch
+_require_dm_flakey
+
+rm -f $seqres.full
+
+_scratch_mkfs >> $seqres.full 2>&1
+
+_init_flakey
+_mount_flakey
+
+# Create a test file with 2 hard links.
+mkdir -p $SCRATCH_MNT/a/b
+echo "hello world" > $SCRATCH_MNT/a/b/foo
+ln $SCRATCH_MNT/a/b/foo $SCRATCH_MNT/a/b/bar
+
+# Make sure all metadata and data are durably persisted.
+sync
+
+# Now remove one of the hard links and fsync the inode.
+rm -f $SCRATCH_MNT/a/b/bar
+$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a/b/foo
+
+# Simulate a crash/power loss. This makes sure the next mount
+# will see an fsync log and will replay that log.
+
+_load_flakey_table $FLAKEY_DROP_WRITES
+_unmount_flakey
+
+_load_flakey_table $FLAKEY_ALLOW_WRITES
+_mount_flakey
+
+# Remove the last hard link of the file and attempt to remove its parent
+# directory - this failed in btrfs because the fsync log and replay code
+# didn't decrement the parent directory's i_size - this made the btrfs
+# rmdir implementation always fail with -ENOTEMPTY.
+#
+# The parent directory's metadata inconsistency was also detected by btrfs'
+# fsck tool, which is run automatically by the fstests framework when the
+# test finishes.
+rm -f $SCRATCH_MNT/a/b/foo
+rmdir $SCRATCH_MNT/a/b
+rmdir $SCRATCH_MNT/a
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/039.out b/tests/generic/039.out
new file mode 100644
index 0000000..d4e7ef6
--- /dev/null
+++ b/tests/generic/039.out
@@ -0,0 +1,2 @@
+QA output created by 039
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 1e89848..6af5a1a 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -41,6 +41,7 @@
 036 auto aio rw stress
 037 metadata auto quick
 038 auto stress
+039 metadata auto quick
 053 acl repair auto quick
 062 attr udf auto quick
 068 other auto freeze dangerous stress
-- 
2.1.3
[PATCH] Btrfs: fix directory inconsistency after fsync log replay
If we have an inode (file) with a link count greater than 1, remove one
of its hard links, fsync the inode, power fail/crash and then replay the
fsync log on the next mount, we end up with the parent directory's
metadata being inconsistent - its i_size still reflects the deleted hard
link. This prevents the directory from ever being deletable, as its i_size
can never decrease to BTRFS_EMPTY_DIR_SIZE even if all of its child
inodes are deleted.

This is easy to reproduce with the following excerpt from a test case for
xfstests that I just made (and it passes with xfs and ext4):

  mkdir $SCRATCH_MNT/testdir
  echo "hello world" > $SCRATCH_MNT/testdir/foo
  ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar

  # Make sure all metadata and data are durably persisted.
  sync

  # Now remove one of the hard links and fsync the inode.
  rm -f $SCRATCH_MNT/testdir/bar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foo

  # Simulate a crash/power loss. This makes sure the next mount
  # will see an fsync log and will replay that log.

  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Remove the last hard link of the file and attempt to remove its parent
  # directory - this failed in btrfs because the fsync log and replay code
  # didn't decrement the parent directory's i_size - this made the btrfs
  # rmdir implementation always fail with -ENOTEMPTY.
  #
  # The parent directory's metadata inconsistency was also detected by btrfs'
  # fsck tool, which is run automatically by the fstests framework when the
  # test finishes.
  rm -f $SCRATCH_MNT/testdir/foo
  rmdir $SCRATCH_MNT/testdir

To fix this just make sure that on unlink, if the inode's link count is
greater than 1 and its parent inode is not yet in the fsync log, we end
up logging the parent inode.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/tree-log.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9a02da1..1d65a46 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4272,6 +4272,9 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
 	struct dentry *old_parent = NULL;
 	int ret = 0;
 	u64 last_committed = root->fs_info->last_trans_committed;
+	const struct dentry * const first_parent = parent;
+	const bool did_unlink = (BTRFS_I(inode)->last_unlink_trans >
+				 last_committed);
 
 	sb = inode->i_sb;
 
@@ -4327,7 +4330,6 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
 		goto end_trans;
 	}
 
-	inode_only = LOG_INODE_EXISTS;
 	while (1) {
 		if (!parent || !parent->d_inode || sb != parent->d_inode->i_sb)
 			break;
@@ -4336,8 +4338,22 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
 		if (root != BTRFS_I(inode)->root)
 			break;
 
+		/*
+		 * On unlink we must make sure our immediate parent directory
+		 * inode is fully logged. This is to prevent leaving dangling
+		 * directory index entries and a wrong directory inode's i_size.
+		 * Not doing so can result in a directory being impossible to
+		 * delete after log replay (rmdir will always fail with error
+		 * -ENOTEMPTY).
+		 */
+		if (did_unlink && parent == first_parent)
+			inode_only = LOG_INODE_ALL;
+		else
+			inode_only = LOG_INODE_EXISTS;
+
 		if (BTRFS_I(inode)->generation >
-		    root->fs_info->last_trans_committed) {
+		    root->fs_info->last_trans_committed ||
+		    inode_only == LOG_INODE_ALL) {
 			ret = btrfs_log_inode(trans, root, inode, inode_only,
 					      0, LLONG_MAX, ctx);
 			if (ret)
-- 
2.1.3
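The decision the patch adds to the parent walk can be restated as two small pure functions. This is only a sketch for reasoning about the logic, not kernel code: the enum values carry a DEMO_ prefix and the function names parent_log_mode() and should_log() are hypothetical, with plain integers standing in for transaction generations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two logging modes used in the patch. */
enum demo_log_mode {
	DEMO_LOG_INODE_ALL,
	DEMO_LOG_INODE_EXISTS,
};

/*
 * Pick the logging mode for a directory visited during the parent walk:
 * only the immediate parent of an inode that saw an unlink in the current
 * transaction is logged in full.
 */
static enum demo_log_mode parent_log_mode(bool did_unlink, bool is_first_parent)
{
	if (did_unlink && is_first_parent)
		return DEMO_LOG_INODE_ALL;
	return DEMO_LOG_INODE_EXISTS;
}

/*
 * Decide whether the directory gets logged at all: either it was created
 * after the last committed transaction, or a full log was requested.
 */
static bool should_log(uint64_t generation, uint64_t last_committed,
		       enum demo_log_mode mode)
{
	assert(mode == DEMO_LOG_INODE_ALL || mode == DEMO_LOG_INODE_EXISTS);
	return generation > last_committed || mode == DEMO_LOG_INODE_ALL;
}
```

The interesting case is an old directory (generation not newer than the last commit) that just lost a hard link: before the patch it would be skipped, while here should_log() returns true because the mode is DEMO_LOG_INODE_ALL.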
Re: [PATCH v3 0/3] Btrfs: Enhancment for qgroup.
Hi Yang,

On 2015/01/05 15:16, Dongsheng Yang wrote:
> Hi Josef and others,
>
> This patch set is about enhancing qgroup.
>
> [1/3]: fixes a qgroup leak when we exceed the quota limit. It is reviewed by Josef.
> [2/3]: introduces a new accounter in qgroup to close a window where a user can exceed the qgroup limit. It looks good to Josef.
> [3/3]: a new patch to fix a bug reported by Satoru.

I tested your patchset v3. Although it's far better than patchset v2, there is still one problem in this patchset. When I wrote 1.5GiB to a subvolume with a 1.0GiB limit, only 1.0GiB - 139 blocks (in this case, 1KiB/block) were written. I consider that the user should be able to write exactly 1.0GiB in this case.

* Test result

===
+ mkfs.btrfs -f /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
	nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
+ mount /dev/vdb /root/btrfs-auto-test/
+ ret=0
+ btrfs quota enable /root/btrfs-auto-test/
+ btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
+ btrfs qgroup limit 1G /root/btrfs-auto-test//sub
+ dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=150
dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
1048438+0 records in    # Tried to write 1GiB - 138 KiB
1048437+0 records out   # Succeeded to write 1GiB - 139 KiB
1073599488 bytes (1.1 GB) copied, 19.0247 s, 56.4 MB/s
===

* Note

I ran the reproducer five times and the result was slightly different each time.

=====================
#    Written
---------------------
1    1GiB - 139 KiB
2    1GiB - 139 KiB
3    1GiB - 145 KiB
4    1GiB - 135 KiB
5    1GiB - 135 KiB
=====================

So I consider this a timing-related problem.

If I changed the block size from 1KiB to 1MiB, the difference in bytes got larger.

=====================
#    Written
---------------------
1    1GiB - 1 MiB
2    1GiB - 1 MiB
3    1GiB - 1 MiB
4    1GiB - 1 MiB
5    1GiB - 1 MiB
=====================

Thanks,
Satoru

> BTW, I have some other plans for qgroup in my TODO list:
>
> Kernel:
> a). Adjust the accounters in the parent qgroup when we move the child qgroup. Currently, when we move a qgroup, the parent qgroup is not updated at the same time. This causes some wrong numbers in qgroup.
> b). Add an ioctl to show the qgroup info. The command btrfs qgroup show shows the qgroup info read from the qgroup tree, but some information in memory is not yet synced to the device, so it can show outdated numbers.
> c). Limit and account size in 3 modes: data, metadata and both. qgroup accounts the size of data and metadata together, but to a user the data size is the most useful.
> d). Remove a subvolume-related qgroup when the subvolume is deleted and there is no other reference to it.
>
> user-tool:
> a). Add the units B/K/M/G to btrfs qgroup show.
> b). Get the information via ioctl rather than reading it from the btree. Will keep the old way as a fallback for compatibility.
>
> Any comment and suggestion is welcome. :)
>
> Yang
>
> Dongsheng Yang (3):
>   Btrfs: qgroup: free reserved in exceeding quota.
>   Btrfs: qgroup: Introduce a may_use to account space_info-bytes_may_use.
>   Btrfs: qgroup, Account data space in more proper timings.
>
>  fs/btrfs/extent-tree.c | 41 +++---
>  fs/btrfs/file.c        |  9 ---
>  fs/btrfs/inode.c       | 18 -
>  fs/btrfs/qgroup.c      | 68 +++---
>  fs/btrfs/qgroup.h      |  4 +++
>  5 files changed, 117 insertions(+), 23 deletions(-)
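
The shortfall quoted in the dd output above can be cross-checked with a few lines of C. This is only an arithmetic sketch; shortfall_bytes() is a hypothetical helper name, and the inputs are the record count and block size printed by dd in the test result.

```c
#include <stdint.h>

#define KIB 1024ULL
#define GIB (1024ULL * 1024ULL * 1024ULL)

/* Bytes by which a write stopped short of the qgroup limit. */
static uint64_t shortfall_bytes(uint64_t limit, uint64_t records_out,
				uint64_t block_size)
{
	return limit - records_out * block_size;
}
```

For the run shown, shortfall_bytes(GIB, 1048437, KIB) is 142336 bytes, that is 139 KiB, which agrees with the 1073599488 bytes reported by dd.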
Re: [PATCH] btrfs: clear bio reference after submit_one_bio()
Hi Naota,

On 2015/01/06 1:01, Naohiro Aota wrote:
> After submit_one_bio(), `bio' can go away. However submit_extent_page() leaves `bio' referable if submit_one_bio() failed (e.g. -ENOMEM on OOM). It will cause an invalid paging request when submit_extent_page() is called next time.
>
> I reproduced the ENOMEM case with the following script (needs CONFIG_FAIL_PAGE_ALLOC and CONFIG_FAULT_INJECTION_DEBUG_FS).

I confirmed that this problem reproduces with 3.19-rc3 and does not reproduce with 3.19-rc3 plus your patch.

Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Thank you for reporting this problem with the reproducer and fixing it too.

NOTE: I used v3.19-rc3's tools/testing/fault-injection/failcmd.sh for the following ./failcmd.sh.

> ./failcmd.sh -p $percent -t $times -i $interval \
>     --ignore-gfp-highmem=N --ignore-gfp-wait=N --min-order=0 \
>     -- \
>     cat $directory/file > /dev/null

* 3.19-rc1 + your patch

===
# ./run
512+0 records in
512+0 records out
#
===

* 3.19-rc3

===
# ./run
512+0 records in
512+0 records out
[ 188.433726] run (776): drop_caches: 1
[ 188.682372] FAULT_INJECTION: forcing a failure. name fail_page_alloc, interval 100, probability 111000, space 0, times 3
[ 188.689986] CPU: 0 PID: 954 Comm: cat Not tainted 3.19.0-rc3-ktest #1
[ 188.693834] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 188.698466] 0064 88007b343618 816e5563 88007fc0fc78
[ 188.702730] 81c655c0 88007b343638 813851b5 0010
[ 188.707043] 0002 88007b343768 81188126 88007b3435a8
[ 188.711283] Call Trace:
[ 188.712620] [816e5563] dump_stack+0x45/0x57
[ 188.715330] [813851b5] should_fail+0x135/0x140
[ 188.718218] [81188126] __alloc_pages_nodemask+0xd6/0xb30
[ 188.721567] [81339075] ? blk_rq_map_sg+0x35/0x170
[ 188.724558] [a0010705] ? virtio_queue_rq+0x145/0x2b0 [virtio_blk]
[ 188.728191] [a01bd00f] ? btrfs_submit_compressed_read+0xcf/0x4d0 [btrfs]
[ 188.732079] [811d99fb] ? kmem_cache_alloc+0x1cb/0x230
[ 188.735153] [81181265] ? mempool_alloc_slab+0x15/0x20
[ 188.738188] [811cee1a] alloc_pages_current+0x9a/0x120
[ 188.741153] [a01bd0e9] btrfs_submit_compressed_read+0x1a9/0x4d0 [btrfs]
[ 188.744835] [a0178621] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
[ 188.748225] [a018b7b3] ? lookup_extent_mapping+0x13/0x20 [btrfs]
[ 188.751547] [a0179c08] ? btrfs_get_extent+0x98/0xad0 [btrfs]
[ 188.754656] [a01901d7] submit_one_bio+0x67/0xa0 [btrfs]
[ 188.757554] [a0193f27] submit_extent_page.isra.35+0xd7/0x1c0 [btrfs]
[ 188.760981] [a019509d] __do_readpage+0x31d/0x7b0 [btrfs]
[ 188.763920] [a0195f10] ? btrfs_create_repair_bio+0x110/0x110 [btrfs]
[ 188.767382] [a0179b70] ? btrfs_submit_direct+0x7b0/0x7b0 [btrfs]
[ 188.770671] [a018f88d] ? btrfs_lookup_ordered_range+0x13d/0x180 [btrfs]
[ 188.774366] [a01958ca] __extent_readpages.constprop.42+0x2ba/0x2d0 [btrfs]
[ 188.778031] [a0179b70] ? btrfs_submit_direct+0x7b0/0x7b0 [btrfs]
[ 188.781241] [a01969b9] extent_readpages+0x169/0x1b0 [btrfs]
[ 188.784322] [a0179b70] ? btrfs_submit_direct+0x7b0/0x7b0 [btrfs]
[ 188.789014] [a0176b0f] btrfs_readpages+0x1f/0x30 [btrfs]
[ 188.792028] [8118bf5c] __do_page_cache_readahead+0x18c/0x1f0
[ 188.795078] [8118c09f] ondemand_readahead+0xdf/0x260
[ 188.797702] [a016c5df] ? btrfs_congested_fn+0x5f/0xa0 [btrfs]
[ 188.800718] [8118c291] page_cache_async_readahead+0x71/0xa0
[ 188.803650] [8118017f] generic_file_read_iter+0x40f/0x5e0
[ 188.806480] [811f43be] new_sync_read+0x7e/0xb0
[ 188.808832] [811f55d8] __vfs_read+0x18/0x50
[ 188.811068] [811f569a] vfs_read+0x8a/0x140
[ 188.813298] [811f5796] SyS_read+0x46/0xb0
[ 188.815486] [81125806] ? __audit_syscall_exit+0x1f6/0x2a0
[ 188.818293] [816eb8e9] system_call_fastpath+0x12/0x17
[ 188.821005] BUG: unable to handle kernel paging request at 0001000c
[ 188.821984] IP: [a01901b3] submit_one_bio+0x43/0xa0 [btrfs]
[ 188.821984] PGD 7bad3067 PUD 0
[ 188.821984] Oops: [#1] SMP
[ 188.821984] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables bnep bluetooth rfkill btrfs xor raid6_pq microcode 8139too serio_raw virtio_balloon 8139cp mii nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_blk ata_generic pata_acpi
[ 188.821984] CPU: 1 PID: 954 Comm: cat Not tainted 3.19.0-rc3-ktest #1
[
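
The failure mode described in the quoted patch description, a caller keeping a reference to a bio that has already been handed off, can be modelled in userspace. The patch itself is not shown in this message, so the following is only a sketch of the idea under stated assumptions: struct fake_bio, submit_one() and flush_pending() are hypothetical stand-ins, not the kernel's struct bio, submit_one_bio() or submit_extent_page().

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct bio; not the real structure. */
struct fake_bio {
	int sectors;
};

/*
 * Hypothetical stand-in for submit_one_bio(): it consumes the bio whether
 * or not submission succeeds, so the caller must not touch it afterwards.
 */
static int submit_one(struct fake_bio *bio, int fail)
{
	free(bio);
	return fail ? -ENOMEM : 0;
}

/*
 * Submit the pending bio, if any, and always clear the caller's reference,
 * even on failure, so that a later call cannot reuse a stale pointer.
 */
static int flush_pending(struct fake_bio **bio_ret, int fail)
{
	int ret = 0;

	if (*bio_ret) {
		ret = submit_one(*bio_ret, fail);
		*bio_ret = NULL;
	}
	return ret;
}
```

In this model a second flush_pending() call after a failed submission finds a NULL reference and does nothing, instead of dereferencing freed memory as in the oops above.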
[PATCH] Btrfs: lookup for block group only if needed when freeing a tree block
Very often our extent buffer's header generation doesn't match the current transaction's id or it is also referenced by other trees (snapshots), so we don't need the corresponding block group cache object. Therefore only search for it if we are going to use it, so we avoid an unnecessary search in the block groups rbtree (and acquiring and releasing its spinlock).

Freeing a tree block is performed when COWing or deleting a node/leaf, which implies we are holding the node/leaf's parent node lock, therefore reducing the amount of time spent when freeing a tree block helps reduce the amount of time we are holding the parent node's lock.

For example, for a run of xfstests/generic/083, the block group cache object was needed only 682 times for a total of 226691 calls to free a tree block.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a80b971..5a45253 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6205,7 +6205,6 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 			   struct extent_buffer *buf,
 			   u64 parent, int last_ref)
 {
-	struct btrfs_block_group_cache *cache = NULL;
 	int pin = 1;
 	int ret;
 
@@ -6221,17 +6220,20 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 	if (!last_ref)
 		return;
 
-	cache = btrfs_lookup_block_group(root->fs_info, buf->start);
-
 	if (btrfs_header_generation(buf) == trans->transid) {
+		struct btrfs_block_group_cache *cache;
+
 		if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 			ret = check_ref_cleanup(trans, root, buf->start);
 			if (!ret)
 				goto out;
 		}
 
+		cache = btrfs_lookup_block_group(root->fs_info, buf->start);
+
 		if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
 			pin_down_extent(root, cache, buf->start, buf->len, 1);
+			btrfs_put_block_group(cache);
 			goto out;
 		}
 
@@ -6239,6 +6241,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 
 		btrfs_add_free_space(cache, buf->start, buf->len);
 		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
+		btrfs_put_block_group(cache);
 		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
 		pin = 0;
 	}
@@ -6253,7 +6256,6 @@ out:
 	 * anymore.
 	 */
 	clear_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags);
-	btrfs_put_block_group(cache);
 }
 
 /* Can return -ENOMEM */
-- 
2.1.3
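
The effect of moving the lookup can be illustrated with a small userspace model that counts how often the lookup runs. This is a simplified sketch, not the kernel code: struct cache, lookup_block_group(), put_block_group() and free_tree_block() are hypothetical stand-ins, and the model leaves out the check_ref_cleanup() early exit and the pin/free-space handling.

```c
#include <stdbool.h>

/* Stand-in for the block group cache object; only a reference count. */
struct cache {
	int refs;
};

static struct cache the_cache;
static unsigned long lookups;	/* how often the "rbtree search" ran */

/* Hypothetical stand-in for the lookup: takes a reference. */
static struct cache *lookup_block_group(void)
{
	lookups++;
	the_cache.refs++;
	return &the_cache;
}

/* Hypothetical stand-in for dropping the reference again. */
static void put_block_group(struct cache *c)
{
	c->refs--;
}

/* Look the cache up only on the path that actually uses it. */
static void free_tree_block(bool last_ref, bool same_transaction)
{
	if (!last_ref)
		return;

	if (same_transaction) {
		struct cache *c = lookup_block_group();

		/* ... pin the extent or return it to free space ... */
		put_block_group(c);
	}
}
```

Calls that return early, or whose buffer belongs to an older transaction, leave the lookups counter untouched, which is the saving the commit message quantifies as 682 lookups for 226691 calls.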
Re: [PATCH] btrfs: qgroup: move WARN_ON() to the correct location.
On 2015/01/06 21:54, Dongsheng Yang wrote:
> In function qgroup_excl_accounting(), we need to WARN when qg->excl is less than what we want to free, for both child and parent qgroups. But currently, for the parent qgroup, the WARN_ON() is located after freeing qg->excl, so it will WARN even when we free it normally.
>
> This patch moves this WARN_ON() before freeing qg->excl.
>
> Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

> ---
>  fs/btrfs/qgroup.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 48b60db..97159a8 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1431,9 +1431,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
>  		qgroup = u64_to_ptr(unode->aux);
>  		qgroup->rfer += sign * oper->num_bytes;
>  		qgroup->rfer_cmpr += sign * oper->num_bytes;
> +		WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>  		qgroup->excl += sign * oper->num_bytes;
> -		if (sign < 0)
> -			WARN_ON(qgroup->excl < oper->num_bytes);
>  		qgroup->excl_cmpr += sign * oper->num_bytes;
>  		qgroup_dirty(fs_info, qgroup);
>  
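
Why the position of the check matters can be shown with plain unsigned arithmetic. The sketch below is a userspace model, not the kernel function: warns_after_update() and warns_before_update() are hypothetical helpers that return the condition WARN_ON() would evaluate, with the check placed after and before the update respectively.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check placed after the update, as in the code before the patch. */
static bool warns_after_update(uint64_t *excl, int sign, uint64_t num_bytes)
{
	*excl += sign * num_bytes;	/* wraps modulo 2^64 when sign is -1 */
	return sign < 0 && *excl < num_bytes;
}

/* Check placed before the update, as in the code after the patch. */
static bool warns_before_update(uint64_t *excl, int sign, uint64_t num_bytes)
{
	bool warn = sign < 0 && *excl < num_bytes;

	*excl += sign * num_bytes;
	return warn;
}
```

Freeing 60 bytes from an excl of 100 is legitimate, yet the post-update check compares the remaining 40 against 60 and warns; the pre-update check compares 100 against 60 and stays quiet. In this model the post-update check also misses a genuine underflow (freeing 60 from 40), because the wrapped result is a very large number; that second observation follows from the arithmetic and is not claimed in the patch description.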
Re: Data recovery after RBD I/O error
On Mon, Jan 5, 2015 at 6:59 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
> Secondly, I would highly recommend not using ANY non-cluster-aware FS on top of a clustered block device like RBD

For my use-case, this is just a single server using the RBD device. No clustering is involved on the BTRFS side of things. However, it was really useful to take snapshots (just like LVM) before modifying the filesystem in any way.
Re: kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123!
Hi Tomasz,

On 2014/12/20 8:28, Tomasz Chmielewski wrote:
> Get this BUG with 3.18.1 (pasted at the bottom of the email).
>
> Below all actions from creating the fs to BUG. I did not attempt to reproduce.

I tried to reproduce this problem and have some questions.

> # mkfs.btrfs /dev/vdb
> Btrfs v3.17.3
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
> fs created label (null) on /dev/vdb
> 	nodesize 16384 leafsize 16384 sectorsize 4096 size 256.00GiB
> # mount -o noatime /dev/vdb /mnt/test/
> # cd /mnt/test
> # btrfs sub cre subvolume
> Create subvolume './subvolume'
> # dd if=/dev/urandom of=bigfile.img bs=64k

Did you really run this command? I think it would fill up the whole /dev/vdb. And is the target really bigfile.img, not subvolume/bigfile.img?

> ^C91758+0 records in
> 91757+0 records out
> 6013386752 bytes (6.0 GB) copied, 374.777 s, 16.0 MB/s
> # btrfs sub list /mnt/test/
> ID 257 gen 16 top level 5 path subvolume
> # btrfs quota enable /mnt/test
> # btrfs qgroup show /mnt/test
> qgroupid rfer       excl
> 0/5      16384      16384
> 0/257    6013403136 6013403136
> # dd if=/dev/urandom of=bigfile2.img bs=64k
> ^C47721+0 records in
> 47720+0 records out
> 3127377920 bytes (3.1 GB) copied, 194.641 s, 16.1 MB/s

If bigfile.img is just under /mnt/test, I can't understand why this command succeeded in writing 3 GiB more.

> # btrfs qgroup show /mnt/test
> qgroupid rfer       excl
> 0/5      16384      16384
> 0/257    8704049152 8704049152
> root@srv2:/mnt/test/subvolume# sync
> root@srv2:/mnt/test/subvolume# btrfs qgroup show /mnt/test
> qgroupid rfer       excl
> 0/5      16384      16384
> 0/257    9140781056 9140781056
> # dd if=/dev/urandom of=bigfile3.img bs=64k
> ^C3617580+0 records in
> 3617579+0 records out
> 237081657344 bytes (237 GB) copied, 14796 s, 16.0 MB/s

The same applies here.

Thanks,
Satoru

> # df -h
> Filesystem Size Used Avail Use% Mounted on
> (...)
> /dev/vdb   256G 230G  25G  91% /mnt/test
> # btrfs qgroup show /mnt/test
> qgroupid rfer         excl
> 0/5      16384        16384
> 0/257    245960245248 245960245248
> # ls -l
> total 240451584
> -rw-r--r-- 1 root root 3127377920 Dec 19 20:06 bigfile2.img
> -rw-r--r-- 1 root root 237081657344 Dec 20 00:15 bigfile3.img
> -rw-r--r-- 1 root root 6013386752 Dec 19 20:02 bigfile.img
> # rm bigfile3.img
> # sync
> # dmesg
> (...)
> [ 95.055420] BTRFS: device fsid 97f98279-21e7-4822-89be-3aed9dc05f2c devid 1 transid 3 /dev/vdb
> [ 118.446509] BTRFS info (device vdb): disk space caching is enabled
> [ 118.446518] BTRFS: flagging fs with big metadata feature
> [ 118.452176] BTRFS: creating UUID tree
> [ 575.189412] BTRFS info (device vdb): qgroup scan completed
> [15948.234826] ------------[ cut here ]------------
> [15948.234883] kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123!
> [15948.234906] invalid opcode: [#1] SMP
> [15948.234925] Modules linked in: nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables dm_crypt btrfs xor crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ppdev aes_x86_64 lrw raid6_pq gf128mul glue_helper ablk_helper cryptd serio_raw mac_hid pvpanic 8250_fintek parport_pc i2c_piix4 lp parport psmouse qxl ttm floppy drm_kms_helper drm
> [15948.235172] CPU: 0 PID: 3274 Comm: btrfs-cleaner Not tainted 3.18.1-031801-generic #201412170637
> [15948.235193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [15948.235222] task: 880036708a00 ti: 88007b97c000 task.ti: 88007b97c000
> [15948.235240] RIP: 0010:[c0458ec9] [c0458ec9] btrfs_orphan_add+0x1a9/0x1c0 [btrfs]
> [15948.235305] RSP: 0018:88007b97fc98 EFLAGS: 00010286
> [15948.235318] RAX: ffe4 RBX: 88007b80a800 RCX:
> [15948.235333] RDX: 219e RSI: 0004 RDI: 880079418138
> [15948.235349] RBP: 88007b97fcd8 R08: 88007fc1cae0 R09: 88007ad272d0
> [15948.235366] R10: R11: 0010 R12: 88007a2d9500
> [15948.235381] R13: 8800027d60e0 R14: 88007b80ac58 R15: 0001
> [15948.235401] FS: () GS:88007fc0() knlGS:
> [15948.235418] CS: 0010 DS: ES: CR0: 80050033
> [15948.235432] CR2: 7f0489ff CR3: 7a5e CR4: 001407f0
> [15948.235464] Stack:
> [15948.235473] 88007b97fcd8 c0497acf 88007b809800 88003c207400
> [15948.235498] 88007b809800 88007ad272d0 88007a2d9500 0001
> [15948.235521] 88007b97fd58 c04412e0 880079418000 0004c0427fea
> [15948.235551] Call Trace:
> [15948.235601] [c0497acf] ?
Re: btrfs_inode_item's otime?
On Mon, Jan 05, 2015 at 06:21:52PM +0100, Lennart Poettering wrote:
> btrfs' btrfs_inode_item structure contains a field for the birth time of a file, .otime. This field could be quite useful, and I'd like to make use of it. I can query it with the BTRFS_IOC_TREE_SEARCH ioctl from userspace, alas it appears that the entry is never actually initialized to anything other than 0? Is this on purpose, or simply an oversight? It should be easy to initialize it to the mtime when the inode is first created...

It's probably just a lack of implementation due to the lack of an interface to userspace, but we should set it.

> I am aware of the discussions about introducing the birth time as something queryable with a future xstat() call. Even if that high-level API doesn't exist yet, and even if it might be messy to use BTRFS_IOC_TREE_SEARCH to query the otime currently, I think it would be good to properly initialize the field, so that pre-existing file systems would report useful data when xstat() is added one day...

Agreed.

> (Of course, even without xstat(), I think it would be good to have an unprivileged ioctl to query the otime in btrfs... the TREE_SEARCH ioctl after all requires privileges...)

Adding this interface is a different question. I do not like to add ioctls that do overly specialized things that would normally fit into a generic interface like the xstat example. We could use the object properties instead (i.e. export the otime as an extended attribute), but the work on that has stalled and it's not ready for simply adding the otime in advance.
Re: btrfs_inode_item's otime?
On Tue, Jan 06, 2015 at 11:43:22PM +1100, Chris Samuel wrote:
> On Tue, 6 Jan 2015 10:47:00 PM Chris Samuel wrote:
> > On Mon, 5 Jan 2015 06:21:52 PM Lennart Poettering wrote:
> > > It should be easy to initialize it to the mtime when the inode is first created...
> >
> > This I agree with, well worth doing anyway. I'll see if I can knock up a patch.
>
> Sadly it appears that the btrfs code sets mtime/ctime/atime at inode creation via the normal filesystem inode structure, not through its own, and as that doesn't include otime I'm afraid it's out of my league.
>
> Worth a shot though!

Set the otime in btrfs_new_inode after the call to fill_inode_item.