Re: Scrub causes oom after removal of failed disk (linux 3.10)
On 07/08/2013 11:36 PM, David Sterba wrote: On Wed, Jul 03, 2013 at 08:35:48PM +0200, Torbjørn wrote: Hi btrfs devs, I have a btrfs raid10 array consisting of 2TB drives. I added a new drive to the array, then balanced. The balance failed after ~50GB was moved to the new drive. The balance fixed lots of errors according to dmesg. Server rebooted The newly added drive were no longer detected as a btrfs disk. The array was then mounted -o recovery I ran btrfs dev del missing, and everything seemed to be fine. After this I ran a scrub on the array. The scrub was soon stopped by the oom-killer. After another reboot I started a new scrub. About 3TB into the scrub over 10 GB of memory was being consumed. The scrub had then fixed roughly 3,000,000 errors. Canceling the scrub and resuming it frees the 10 GB of memory. Thanks for the report. This looks like the same problem that was fixed by https://patchwork.kernel.org/patch/2697501/ Btrfs: free csums when we're done scrubbing an extent but I don't see it included in the current for-linus branch. We want this in the 3.10.x stable series and according to stable tree policy it has to be merged into Linus' tree first. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Ok, thanks -- Torbjørn -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Cleanup for using BTRFS_SETGET_STACK instead of raw convert
于 2013年07月08日 21:24, David Sterba 写道: On Fri, Jul 05, 2013 at 10:01:30AM +0800, Qu Wenruo wrote: --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2807,6 +2846,14 @@ static inline unsigned long btrfs_leaf_data(struct extent_buffer *l) /* struct btrfs_file_extent_item */ BTRFS_SETGET_FUNCS(file_extent_type, struct btrfs_file_extent_item, type, 8); +BTRFS_SETGET_STACK_FUNCS(stack_file_extent_bytenr, +struct btrfs_file_extent_item, disk_bytenr, 64); Please use the struct member name in the macro name, ie. stack_file_extent_disk_bytenr Thanks for pointing out the problem. I'll change the macro name and send a new patch. otherwise ok, Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- - Qu Wenruo Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China TEL: +86+25-86630566-8526 COINS: 7998-8526 FAX: +86+25-83317685 MAIL: quwen...@cn.fujitsu.com - -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Thu, Jul 04, 2013 at 01:51:38PM +0400, Andrew Vagin wrote: > We are not first who suffer from this problem: > https://bugzilla.redhat.com/show_bug.cgi?id=711881 > http://marc.info/?l=linux-btrfs&m=130074451403261 > https://bugzilla.openvz.org/show_bug.cgi?id=2653 > And about 2 years ago Mark Fasheh tried to fix this problem: > http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps > > Eric Biederman sugested to not create a new method and use vfs_getattr, > but here is a few problems: > * fanotify doesn't have dentry, but its fdinfo contains device. > * vfs_getattr can fail and which device should be shown in this case? > * vfs_getattr gets much more parameters, so here is a question about > performance degradation. > > So I have a question: Can two inodes from different subvolumes have > equal inode numbers? Yes, subvolumes are separate inode number spaces. > If someone have any suggestions how to fix this problem or any > explanation why this is not a problem at all, please write here. The xstat syscall instead of the potentially heavyweight vfs_getattr could fix that, but it's not merged. For suse kernels we've taken the hackish approach of patching fs/proc/task_mmu.c:show_map_vma() (and the nommu variant) and use vfs_getattr only for btrfs. http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/btrfs-use-correct-device-for-maps.patch?id=2434fa6ee93a83b117461eb13f24272606677fec Only a temporary and not upstreamable solution, but without it the core packaging tool zypper would not work correctly. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scrub causes oom after removal of failed disk (linux 3.10)
On Wed, Jul 03, 2013 at 08:35:48PM +0200, Torbjørn wrote: > Hi btrfs devs, > > I have a btrfs raid10 array consisting of 2TB drives. > > I added a new drive to the array, then balanced. > The balance failed after ~50GB was moved to the new drive. > The balance fixed lots of errors according to dmesg. > > Server rebooted > > The newly added drive were no longer detected as a btrfs disk. > The array was then mounted -o recovery > I ran btrfs dev del missing, and everything seemed to be fine. > > After this I ran a scrub on the array. > The scrub was soon stopped by the oom-killer. > > After another reboot I started a new scrub. > About 3TB into the scrub over 10 GB of memory was being consumed. > The scrub had then fixed roughly 3,000,000 errors. > > Canceling the scrub and resuming it frees the 10 GB of memory. Thanks for the report. This looks like the same problem that was fixed by https://patchwork.kernel.org/patch/2697501/ Btrfs: free csums when we're done scrubbing an extent but I don't see it included in the current for-linus branch. We want this in the 3.10.x stable series and according to stable tree policy it has to be merged into Linus' tree first. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Does balance implicitly defrag?
On 8 July 2013 22:12, David Sterba wrote: > On Tue, Jul 02, 2013 at 02:28:25PM +0100, Paul Richards wrote: >> When I run "btrfs filesystem balance", does this implicitly >> defragment the filesystem? (Assuming there is plenty free space) > > In some sense it defragments the filesystem. The blockgroups are moved > around based on the balance filter. Here a blockgroup consists of > various and unrelated file extents. Extents representing a single file > are not made contiguous, this is done by 'btrfs file defrag'. > > I think that the final layout of the 1G-chunks is not necessarily > contiguous, ie. it depends on the chunk-allocator, the time of the > allocation request and there's nothing like "preallocate 30G for whole > balance and put all data together because the space is available". > Okay, I think I understand. It sounds like balance will fix external fragmentation (within the block groups, up to the 1G-chunk granularity), and will not fix any internal fragmentation. Thanks. :) -- Paul Richards @pauldoo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Does balance implicitly defrag?
On Tue, Jul 02, 2013 at 02:28:25PM +0100, Paul Richards wrote: > When I run "btrfs filesystem balance", does this implicitly > defragment the filesystem? (Assuming there is plenty free space) In some sense it defragments the filesystem. The blockgroups are moved around based on the balance filter. Here a blockgroup consists of various and unrelated file extents. Extents representing a single file are not made contiguous, this is done by 'btrfs file defrag'. I think that the final layout of the 1G-chunks is not necessarily contiguous, ie. it depends on the chunk-allocator, the time of the allocation request and there's nothing like "preallocate 30G for whole balance and put all data together because the space is available". david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qemu disk images on BTRFS suffer checksum errors
Am 08.07.2013 15:20, schrieb Josef Bacik: > On Mon, Jul 08, 2013 at 10:08:46AM +0200, Thomas Kuther wrote: >> Hello, >> >> I'm about to migrate from VirtualBox to Qemu+VGA-Passthrough. All my virtual >> disk images are stored in a BTRFS subvolume on-top of a MDRAID 1. >> The host runs kernel 3.10, and Qemu 1.5.1. The Testing-VM is a Windows 7 >> 64bit, using a RAW virtio disk with cache=none, same happens for qcow2, >> though. >> >> Using VirtualBox and in the past Vmware workstation I never had issues with >> corrupted diskimages, but now with Qemu all tries ended up with lots of >> errors like: >> >> [ 4871.863009] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 3817758510 private 402306600 >> [ 4872.481013] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 3817758510 private 402306600 >> [ 4904.055514] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 4060166193 private 402306600 >> [ 4904.748130] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 4060166193 private 402306600 >> [ 4904.987540] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 3817758510 private 402306600 >> [ 4905.024700] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 3817758510 private 402306600 >> [ 4932.497793] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 4060166193 private 402306600 >> [ 4932.533634] BTRFS info (device md10): csum failed ino 687 off 46213922816 >> csum 4060166193 private 402306600 >> >> Trying to copy the disk image elsewhere causes I/O errors at some point. >> >> I found a thread about the issue >> (http://comments.gmane.org/gmane.comp.file-systems.btrfs/20538) and also a >> bug report against Qemu from Josef Bacik describing the exact same problem: >> https://bugzilla.redhat.com/show_bug.cgi?id=693530 - Josef states it should >> be fixed since quite a while. >> >> Is this a regression in BTRFS, a problem with my setup (md raid1 layer below >> btrfs), or (still) a bug in Qemu? >> Would cache=writethrough or writeback be an option with BTRFS? >> > > So there were two aspects to that bug, one is the thing I describe where we > get > the same buffer for two parts of an iovec on reads. That part has been fixed. > The second part is where the application will modify the page while it's in > flight, and that hasn't been fixed. We have a few options here > > 1) Always double buffer direct io. Kind of defeats the purpose of direct io. > > 2) Check the buffer after we've written it to see if it matches the csum we > put > down, if not double buffer it and send it down again. This makes you checksum > the page twice and punishes O_DIRECT users that behave. > > I opted for #3 and let this sort of thing happen. So you can get around it by > doing nodatacow for that particular image which will disable checksumming for > just that file, or you can use cache=writethrough/writeback and that will use > buffered io. FYI this doesn't happen on _all_ qemu, just on guest OS'es that > don't provide stable pages, so Windows or like old RHEL versions that are on > ext3. Thanks, > > Josef > Thanks very much for the explanation, Josef. I opted for 3), too. Used chattr +C on the directory that is meant for holding the qemu image(s), and re-created the RAW image in there (so it has nodatacow flag set now) So far, no issues. Perfect. Thanks again. ~Tom -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: Fix automatic prerequisite generation
On Mon, Jul 08, 2013 at 06:46:07PM +0200, David Sterba wrote: > On Sat, Jul 06, 2013 at 05:22:31PM +0900, Kusanagi Kouichi wrote: > > Some files don't compile because of insufficient prerequisite. > > > > $ make btrfs > > ... > > [CC] btrfs.o > > btrfs.c:24:21: fatal error: version.h: No such file or directory > > #include "version.h" > > ^ > > compilation terminated. > > make: *** [btrfs.o] Error 1 > > Works great, thanks. I've noticed that DEPFLAGS is now unused and > removed it. Two more small things: the files were previously named .something.o.d and now they're something.d . * .gitignore needs to be updated * I'd prefer to keep the suffix .o.d, the leading . has go away due to the macro tricks I'll fold the following patch if you don't mind --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,6 @@ *.o *.static.o -.*.o.d +*.o.d version.h version man/*.gz diff --git a/Makefile b/Makefile index 70880bc..849e3fe 100644 --- a/Makefile +++ b/Makefile @@ -76,8 +76,8 @@ else check = true endif -%.d: %.c - $(Q)$(CC) -MM -MG -MF $@ -MT $(@:.d=.o) -MT $(@:.d=.static.o) -MT $@ $(AM_CFLAGS) $(CFLAGS) $< +%.o.d: %.c + $(Q)$(CC) -MM -MG -MF $@ -MT $(@:.o.d=.o) -MT $(@:.o.d=.static.o) -MT $@ $(AM_CFLAGS) $(CFLAGS) $< .c.o: $(Q)$(check) $< @@ -186,7 +186,7 @@ install-man: clean : @echo "Cleaning" - $(Q)rm -f $(progs) cscope.out *.o *.d btrfs-convert btrfs-image btrfs-select-super \ + $(Q)rm -f $(progs) cscope.out *.o *.o.d btrfs-convert btrfs-image btrfs-select-super \ btrfs-zero-log btrfstune dir-test ioctl-test quick-test send-test btrfsck \ btrfs.static mkfs.btrfs.static btrfs-calc-size \ version.h \ @@ -205,5 +205,5 @@ install: $(libs) $(progs) install-man $(INSTALL) -m644 $(headers) $(DESTDIR)$(incdir) ifneq ($(MAKECMDGOALS),clean) --include $(objects:.o=.d) $(cmd-objects:.o=.d) $(subst .btrfs,, $(filter-out btrfsck.d, $(progs:=.d))) +-include $(objects:.o=.o.d) $(cmd-objects:.o=.o.d) $(subst .btrfs,, $(filter-out btrfsck.o.d, $(progs:=.o.d))) endif -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: Fix automatic prerequisite generation
On Sat, Jul 06, 2013 at 05:22:31PM +0900, Kusanagi Kouichi wrote: > Some files don't compile because of insufficient prerequisite. > > $ make btrfs > ... > [CC] btrfs.o > btrfs.c:24:21: fatal error: version.h: No such file or directory > #include "version.h" > ^ > compilation terminated. > make: *** [btrfs.o] Error 1 Works great, thanks. I've noticed that DEPFLAGS is now unused and removed it. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs-progs: remove incorrect slot decrement
Ok, sorry. I'm not used to git send-email and related best practices. Thanks for letting me know. On Mon, Jul 8, 2013 at 5:14 PM, David Sterba wrote: > On Fri, Jul 05, 2013 at 09:43:34PM +0100, Filipe David Borba Manana wrote: >> In btrfs_set_block_flags() we want to check if the slot >> in the leaf points to the first item in the leaf - if it >> doesn't check if the previous item in the leaf is an extent >> item. By removing this extra slot decrement we are indeed >> checking the item right before the slot, and not the second >> item before. >> >> V2: Added Josef Bacik's review mention. > > Please don't do that. It's ok to increase version if you update code or > changelog/subject, but not just the tags. > > thanks, > david -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs-progs: remove incorrect slot decrement
On Fri, Jul 05, 2013 at 09:43:34PM +0100, Filipe David Borba Manana wrote: > In btrfs_set_block_flags() we want to check if the slot > in the leaf points to the first item in the leaf - if it > doesn't check if the previous item in the leaf is an extent > item. By removing this extra slot decrement we are indeed > checking the item right before the slot, and not the second > item before. > > V2: Added Josef Bacik's review mention. Please don't do that. It's ok to increase version if you update code or changelog/subject, but not just the tags. thanks, david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfsck: cmds-check.c:2063: check_owner_ref: Assertion `!(rec->is_root)' failed.
Here is a wrap-up of what is going on on #btrfs. I am trying to boot my debian/squeeze system with btrfs filesystem. System was setup with an initial kernel 2.6.32. The system now refuses to boot. Booting using a live knopix 7.2.0 (kernel 3.9.6), I can see: $ sudo vgchange -ay /dev/voxbox $ mkdir /tmp/sda1 $ sudo mount -v -t btrfs -o recovery /dev/mapper/voxbox-root /tmp/sda1 $ dmesg | tail [...] [13429.460451] device fsid af7e6809-af9d-474f-a332-295cdba1c09f devid 1 transid 2207957 /dev/mapper/voxbox-root [13429.461584] btrfs: enabling auto recovery [13429.529165] parent transid verify failed on 201236885504 wanted 1822936 found 2125997 [13429.532014] parent transid verify failed on 201236885504 wanted 1822936 found 2125997 [13429.587204] btrfs: open_ctree failed If I now use the cmd line tools from git: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git $ cd btrfs-progs/ $ make $ ./btrfsck /dev/mapper/voxbox-root parent transid verify failed on 201236885504 wanted 1822936 found 2125997 parent transid verify failed on 201236885504 wanted 1822936 found 2125997 parent transid verify failed on 201236885504 wanted 1822936 found 2125997 parent transid verify failed on 201236885504 wanted 1822936 found 2125997 Ignoring transid failure Checking filesystem on /dev/mapper/voxbox-root UUID: af7e6809-af9d-474f-a332-295cdba1c09f checking extents btrfsck: cmds-check.c:2063: check_owner_ref: Assertion `!(rec->is_root)' failed. Aborted For reference: $ sudo vgdisplay --- Volume group --- VG Name voxbox System ID Formatlvm2 Metadata Areas2 Metadata Sequence No 6 VG Access read/write VG Status resizable MAX LV0 Cur LV3 Open LV 0 Max PV0 Cur PV2 Act PV2 VG Size 2.73 TiB PE Size 4.00 MiB Total PE 715398 Alloc PE / Size 715398 / 2.73 TiB Free PE / Size 0 / 0 VG UUID mxbe5Y-Wjh1-fC2C-WS27-Fy0X-2JB6-PvrA0Q Regards, -- Mathieu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Cleanup for using BTRFS_SETGET_STACK instead of raw convert
On Fri, Jul 05, 2013 at 10:01:30AM +0800, Qu Wenruo wrote: > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -2807,6 +2846,14 @@ static inline unsigned long btrfs_leaf_data(struct > extent_buffer *l) > > /* struct btrfs_file_extent_item */ > BTRFS_SETGET_FUNCS(file_extent_type, struct btrfs_file_extent_item, type, 8); > +BTRFS_SETGET_STACK_FUNCS(stack_file_extent_bytenr, > + struct btrfs_file_extent_item, disk_bytenr, 64); Please use the struct member name in the macro name, ie. stack_file_extent_disk_bytenr otherwise ok, Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs crashes
On Mon, Jul 08, 2013 at 08:46:17AM +0200, Franziska Näpelt wrote: > Hi everybody, > > we are using a btrfs RAID 1 with four 2TB hard drives on a Debian 7.1 > (Kernel 3.9.6). > > After about one year of working, there was an error in messages log and the > filesystem was mounted read-only. > > After that I restarted the system but that doesn't fix the bug. The > btrfs-filesystem couldn't be mounted. > > I attach four logs: > > -> when the error occured > -> after the error occured > -> when rebooting the system > -> when I tried ti mount manually > > Before I rebuilt the filesystem (formatting everything and creating a new > btrfs-pool) I made a btrfs-image. I can provide it to you. > Can you file a bugzilla at bugzilla.kernel.org (make sure the component is set to btrfs) with all of this information and a link to the image, and the output of btrfsck? Please use the most recent version of btrfs-progs git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 5/5] Btrfs-progs: don't ignore errors in extent-tree.c
Several function return values were being completely ignored. V2: Added more error checking all over extent-tree.c Signed-off-by: Filipe David Borba Manana --- extent-tree.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/extent-tree.c b/extent-tree.c index 6490d8a..6458ce4 100644 --- a/extent-tree.c +++ b/extent-tree.c @@ -2110,6 +2110,7 @@ static int finish_current_insert(struct btrfs_trans_handle *trans, extent_op->flags, &extent_op->key, extent_op->level, &key); + BUG_ON(ret); } else { BUG_ON(1); } @@ -2756,7 +2757,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans, ret = update_block_group(trans, root, ins->objectid, root->leafsize, 1, 0); - return 0; + return ret; } static int alloc_tree_block(struct btrfs_trans_handle *trans, @@ -3345,12 +3346,14 @@ btrfs_add_block_group(struct btrfs_fs_info *fs_info, u64 bytes_used, u64 type, BUG_ON(ret); bit = block_group_state_bits(type); - set_extent_bits(block_group_cache, chunk_offset, - chunk_offset + size - 1, - bit | EXTENT_LOCKED, GFP_NOFS); + ret = set_extent_bits(block_group_cache, chunk_offset, + chunk_offset + size - 1, + bit | EXTENT_LOCKED, GFP_NOFS); + BUG_ON(ret); - set_state_private(block_group_cache, chunk_offset, - (unsigned long)cache); + ret = set_state_private(block_group_cache, chunk_offset, + (unsigned long)cache); + BUG_ON(ret); set_avail_alloc_bits(fs_info, type); return cache; @@ -3372,8 +3375,11 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, sizeof(cache->item)); BUG_ON(ret); - finish_current_insert(trans, extent_root); + ret = finish_current_insert(trans, extent_root); + BUG_ON(ret); ret = del_pending_extents(trans, extent_root); + BUG_ON(ret); + return 0; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Qemu disk images on BTRFS suffer checksum errors
On Mon, Jul 08, 2013 at 10:08:46AM +0200, Thomas Kuther wrote: > Hello, > > I'm about to migrate from VirtualBox to Qemu+VGA-Passthrough. All my virtual > disk images are stored in a BTRFS subvolume on-top of a MDRAID 1. > The host runs kernel 3.10, and Qemu 1.5.1. The Testing-VM is a Windows 7 > 64bit, using a RAW virtio disk with cache=none, same happens for qcow2, > though. > > Using VirtualBox and in the past Vmware workstation I never had issues with > corrupted diskimages, but now with Qemu all tries ended up with lots of > errors like: > > [ 4871.863009] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 3817758510 private 402306600 > [ 4872.481013] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 3817758510 private 402306600 > [ 4904.055514] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 4060166193 private 402306600 > [ 4904.748130] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 4060166193 private 402306600 > [ 4904.987540] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 3817758510 private 402306600 > [ 4905.024700] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 3817758510 private 402306600 > [ 4932.497793] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 4060166193 private 402306600 > [ 4932.533634] BTRFS info (device md10): csum failed ino 687 off 46213922816 > csum 4060166193 private 402306600 > > Trying to copy the disk image elsewhere causes I/O errors at some point. > > I found a thread about the issue > (http://comments.gmane.org/gmane.comp.file-systems.btrfs/20538) and also a > bug report against Qemu from Josef Bacik describing the exact same problem: > https://bugzilla.redhat.com/show_bug.cgi?id=693530 - Josef states it should > be fixed since quite a while. > > Is this a regression in BTRFS, a problem with my setup (md raid1 layer below > btrfs), or (still) a bug in Qemu? > Would cache=writethrough or writeback be an option with BTRFS? > So there were two aspects to that bug, one is the thing I describe where we get the same buffer for two parts of an iovec on reads. That part has been fixed. The second part is where the application will modify the page while it's in flight, and that hasn't been fixed. We have a few options here 1) Always double buffer direct io. Kind of defeats the purpose of direct io. 2) Check the buffer after we've written it to see if it matches the csum we put down, if not double buffer it and send it down again. This makes you checksum the page twice and punishes O_DIRECT users that behave. I opted for #3 and let this sort of thing happen. So you can get around it by doing nodatacow for that particular image which will disable checksumming for just that file, or you can use cache=writethrough/writeback and that will use buffered io. FYI this doesn't happen on _all_ qemu, just on guest OS'es that don't provide stable pages, so Windows or like old RHEL versions that are on ext3. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Qemu disk images on BTRFS suffer checksum errors
Hello, I'm about to migrate from VirtualBox to Qemu+VGA-Passthrough. All my virtual disk images are stored in a BTRFS subvolume on-top of a MDRAID 1. The host runs kernel 3.10, and Qemu 1.5.1. The Testing-VM is a Windows 7 64bit, using a RAW virtio disk with cache=none, same happens for qcow2, though. Using VirtualBox and in the past Vmware workstation I never had issues with corrupted diskimages, but now with Qemu all tries ended up with lots of errors like: [ 4871.863009] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 3817758510 private 402306600 [ 4872.481013] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 3817758510 private 402306600 [ 4904.055514] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 4060166193 private 402306600 [ 4904.748130] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 4060166193 private 402306600 [ 4904.987540] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 3817758510 private 402306600 [ 4905.024700] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 3817758510 private 402306600 [ 4932.497793] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 4060166193 private 402306600 [ 4932.533634] BTRFS info (device md10): csum failed ino 687 off 46213922816 csum 4060166193 private 402306600 Trying to copy the disk image elsewhere causes I/O errors at some point. I found a thread about the issue (http://comments.gmane.org/gmane.comp.file-systems.btrfs/20538) and also a bug report against Qemu from Josef Bacik describing the exact same problem: https://bugzilla.redhat.com/show_bug.cgi?id=693530 - Josef states it should be fixed since quite a while. Is this a regression in BTRFS, a problem with my setup (md raid1 layer below btrfs), or (still) a bug in Qemu? Would cache=writethrough or writeback be an option with BTRFS? Thanks in advance for any input. Best regards, Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/13] btrfs-progs: fix memory leaks of device_list_add()
memory allocated by device_list_add has to be freed, the function introduced here device_list_remove() would just go that. however the challenging part is about where we would call this function. there are two ways its handled the threads calling open_ctree_broken(), open_ctree() and open_ctree_fd() (which leads call to device_list_add) would anyway call close_ctree() so we put device_list_remove() there which will take care of freeing memory. now for threads calling device_list_add() outside of open_ctree(), has to call device_list_remove() separately which can be called as a last function in the thread this patch just does that. device_list_remove accepts() NULL (deletes entire list) or fsid (which would delete only the fsid matched in the device_fs list). As of now though all calling functions use NULL, I see potential that we would use fsid when we have to create a single device list using both the device-tree and from the btrfs-kernel. further, mkfs.c thread should call device_list_remove() as well, however mkfs.c uses a lot of in-flight exits() which makes it very difficult to bring in this fix into mkfs.c. I shall be doing it in a separate patch. Signed-off-by: Anand Jain --- cmds-device.c |4 +++ cmds-filesystem.c | 11 - cmds-replace.c|2 + cmds-scrub.c |3 ++ disk-io.c |1 + mkfs.c|1 + utils.h |1 + volumes.c | 56 - volumes.h |3 ++ 9 files changed, 79 insertions(+), 3 deletions(-) diff --git a/cmds-device.c b/cmds-device.c index 9525fcf..e4a1f1b 100644 --- a/cmds-device.c +++ b/cmds-device.c @@ -29,6 +29,7 @@ #include "utils.h" #include "commands.h" +#include "volumes.h" /* FIXME - imported cruft, fix sparse errors and warnings */ #ifdef __CHECKER__ @@ -128,6 +129,7 @@ static int cmd_add_dev(int argc, char **argv) } close(fdmnt); + device_list_remove(NULL); if (ret) return ret+20; else @@ -213,6 +215,7 @@ static int cmd_scan_dev(int argc, char **argv) int ret; printf("Scanning for Btrfs filesystems\n"); ret = scan_for_btrfs(where, 1); + device_list_remove(NULL); if (ret){ fprintf(stderr, "ERROR: error %d while scanning\n", ret); return 18; @@ -397,6 +400,7 @@ static int cmd_dev_stats(int argc, char **argv) out: free(di_args); close(fdmnt); + device_list_remove(NULL); return err; } diff --git a/cmds-filesystem.c b/cmds-filesystem.c index a028e1d..1c26476 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -455,6 +455,7 @@ static int cmd_show(int argc, char **argv) print_one_uuid(fs_devices); } printf("%s\n", BTRFS_BUILD_VERSION); + device_list_remove(NULL); return 0; } @@ -683,13 +684,19 @@ static const char * const cmd_label_usage[] = { static int cmd_label(int argc, char **argv) { + int ret; if (check_argc_min(argc, 2) || check_argc_max(argc, 3)) usage(cmd_label_usage); if (argc > 2) - return set_label(argv[1], argv[2]); + ret = set_label(argv[1], argv[2]); else - return get_label(argv[1]); + ret = get_label(argv[1]); + + if (is_existing_blk_or_reg_file(argv[1])) + device_list_remove(NULL); + + return ret; } const struct cmd_group filesystem_cmd_group = { diff --git a/cmds-replace.c b/cmds-replace.c index c68986a..99a5abc 100644 --- a/cmds-replace.c +++ b/cmds-replace.c @@ -340,6 +340,7 @@ static int cmd_start_replace(int argc, char **argv) } } close(fdmnt); + device_list_remove(NULL); return 0; leave_with_error: @@ -349,6 +350,7 @@ leave_with_error: close(fdsrcdev); if (fddstdev != -1) close(fddstdev); + device_list_remove(NULL); return -1; } diff --git a/cmds-scrub.c b/cmds-scrub.c index 95dfee3..08aab54 100644 --- a/cmds-scrub.c +++ b/cmds-scrub.c @@ -1496,6 +1496,7 @@ out: unlink(sock_path); } close(fdmnt); + device_list_remove(NULL); if (err) return 1; @@ -1564,6 +1565,7 @@ static int cmd_scrub_cancel(int argc, char **argv) out: if (fdmnt != -1) close(fdmnt); + device_list_remove(NULL); return ret; } @@ -1722,6 +1724,7 @@ out: free(di_args); if (fdres > -1) close(fdres); + device_list_remove(NULL); return err; } diff --git a/disk-io.c b/disk-io.c index 3937e3f..9b72576 100644 --- a/disk-io.c +++ b/disk-io.c @@ -1338,6 +1338,7 @@ int close_ctree(struct btrfs_root *root) } close_all_devices(fs_info); + device_list_remove(NULL);