Re: Scrub causes oom after removal of failed disk (linux 3.10)

2013-07-08 Thread Torbjørn Skagestad

On 07/08/2013 11:36 PM, David Sterba wrote:

On Wed, Jul 03, 2013 at 08:35:48PM +0200, Torbjørn wrote:

Hi btrfs devs,

I have a btrfs raid10 array consisting of 2TB drives.

I added a new drive to the array, then balanced.
The balance failed after ~50GB was moved to the new drive.
The balance fixed lots of errors according to dmesg.

Server rebooted

The newly added drive were no longer detected as a btrfs disk.
The array was then mounted -o recovery
I ran btrfs dev del missing, and everything seemed to be fine.

After this I ran a scrub on the array.
The scrub was soon stopped by the oom-killer.

After another reboot I started a new scrub.
About 3TB into the scrub over 10 GB of memory was being consumed.
The scrub had then fixed roughly 3,000,000 errors.

Canceling the scrub and resuming it frees the 10 GB of memory.

Thanks for the report.

This looks like the same problem that was fixed by
https://patchwork.kernel.org/patch/2697501/
Btrfs: free csums when we're done scrubbing an extent

but I don't see it included in the current for-linus branch. We want
this in the 3.10.x stable series and according to stable tree policy it
has to be merged into Linus' tree first.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ok, thanks

--
Torbjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Cleanup for using BTRFS_SETGET_STACK instead of raw convert

2013-07-08 Thread Qu Wenruo

于 2013年07月08日 21:24, David Sterba 写道:

On Fri, Jul 05, 2013 at 10:01:30AM +0800, Qu Wenruo wrote:

--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2807,6 +2846,14 @@ static inline unsigned long btrfs_leaf_data(struct 
extent_buffer *l)
  
  /* struct btrfs_file_extent_item */

  BTRFS_SETGET_FUNCS(file_extent_type, struct btrfs_file_extent_item, type, 8);
+BTRFS_SETGET_STACK_FUNCS(stack_file_extent_bytenr,
+struct btrfs_file_extent_item, disk_bytenr, 64);

Please use the struct member name in the macro name, ie.
stack_file_extent_disk_bytenr

Thanks for pointing out the problem.
I'll change the macro name and send a new patch.



otherwise ok,
Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
-
Qu Wenruo
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
TEL: +86+25-86630566-8526
COINS: 7998-8526
FAX: +86+25-83317685
MAIL: quwen...@cn.fujitsu.com
-

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs: stat(2) and /proc/pid/maps returns different devices

2013-07-08 Thread David Sterba
On Thu, Jul 04, 2013 at 01:51:38PM +0400, Andrew Vagin wrote:
> We are not first who suffer from this problem:
> https://bugzilla.redhat.com/show_bug.cgi?id=711881
> http://marc.info/?l=linux-btrfs&m=130074451403261
> https://bugzilla.openvz.org/show_bug.cgi?id=2653

> And about 2 years ago Mark Fasheh tried to fix this problem:
> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
> 
> Eric Biederman sugested to not create a new method and use vfs_getattr,
> but here is a few problems:
> * fanotify doesn't have dentry, but its fdinfo contains device.
> * vfs_getattr can fail and which device should be shown in this case?
> * vfs_getattr gets much more parameters, so here is a question about
>   performance degradation.
> 
> So I have a question: Can two inodes from different subvolumes have
> equal inode numbers?

Yes, subvolumes are separate inode number spaces.

> If someone have any suggestions how to fix this problem or any
> explanation why this is not a problem at all, please write here.

The xstat syscall instead of the potentially heavyweight vfs_getattr
could fix that, but it's not merged. For suse kernels we've taken the
hackish approach of patching fs/proc/task_mmu.c:show_map_vma() (and the
nommu variant) and use vfs_getattr only for btrfs.

http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/btrfs-use-correct-device-for-maps.patch?id=2434fa6ee93a83b117461eb13f24272606677fec

Only a temporary and not upstreamable solution, but without it the core
packaging tool zypper would not work correctly.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scrub causes oom after removal of failed disk (linux 3.10)

2013-07-08 Thread David Sterba
On Wed, Jul 03, 2013 at 08:35:48PM +0200, Torbjørn wrote:
> Hi btrfs devs,
> 
> I have a btrfs raid10 array consisting of 2TB drives.
> 
> I added a new drive to the array, then balanced.
> The balance failed after ~50GB was moved to the new drive.
> The balance fixed lots of errors according to dmesg.
> 
> Server rebooted
> 
> The newly added drive were no longer detected as a btrfs disk.
> The array was then mounted -o recovery
> I ran btrfs dev del missing, and everything seemed to be fine.
> 
> After this I ran a scrub on the array.
> The scrub was soon stopped by the oom-killer.
> 
> After another reboot I started a new scrub.
> About 3TB into the scrub over 10 GB of memory was being consumed.
> The scrub had then fixed roughly 3,000,000 errors.
> 
> Canceling the scrub and resuming it frees the 10 GB of memory.

Thanks for the report.

This looks like the same problem that was fixed by
https://patchwork.kernel.org/patch/2697501/
Btrfs: free csums when we're done scrubbing an extent

but I don't see it included in the current for-linus branch. We want
this in the 3.10.x stable series and according to stable tree policy it
has to be merged into Linus' tree first.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Does balance implicitly defrag?

2013-07-08 Thread Paul Richards
On 8 July 2013 22:12, David Sterba  wrote:
> On Tue, Jul 02, 2013 at 02:28:25PM +0100, Paul Richards wrote:
>> When I run "btrfs filesystem balance", does this implicitly
>> defragment the filesystem?  (Assuming there is plenty free space)
>
> In some sense it defragments the filesystem. The blockgroups are moved
> around based on the balance filter. Here a blockgroup consists of
> various and unrelated file extents. Extents representing a single file
> are not made contiguous, this is done by 'btrfs file defrag'.
>
> I think that the final layout of the 1G-chunks is not necessarily
> contiguous, ie. it depends on the chunk-allocator, the time of the
> allocation request and there's nothing like "preallocate 30G for whole
> balance and put all data together because the space is available".
>

Okay, I think I understand.  It sounds like balance will fix external
fragmentation (within the block groups, up to the 1G-chunk
granularity), and will not fix any internal fragmentation.

Thanks. :)




--
Paul Richards
@pauldoo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Does balance implicitly defrag?

2013-07-08 Thread David Sterba
On Tue, Jul 02, 2013 at 02:28:25PM +0100, Paul Richards wrote:
> When I run "btrfs filesystem balance", does this implicitly
> defragment the filesystem?  (Assuming there is plenty free space)

In some sense it defragments the filesystem. The blockgroups are moved
around based on the balance filter. Here a blockgroup consists of
various and unrelated file extents. Extents representing a single file
are not made contiguous, this is done by 'btrfs file defrag'.

I think that the final layout of the 1G-chunks is not necessarily
contiguous, ie. it depends on the chunk-allocator, the time of the
allocation request and there's nothing like "preallocate 30G for whole
balance and put all data together because the space is available".

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Qemu disk images on BTRFS suffer checksum errors

2013-07-08 Thread Thomas Kuther
Am 08.07.2013 15:20, schrieb Josef Bacik:
> On Mon, Jul 08, 2013 at 10:08:46AM +0200, Thomas Kuther wrote:
>> Hello,
>>
>> I'm about to migrate from VirtualBox to Qemu+VGA-Passthrough. All my virtual
>> disk images are stored in a BTRFS subvolume on-top of a MDRAID 1.
>> The host runs kernel 3.10, and Qemu 1.5.1. The Testing-VM is a Windows 7
>> 64bit, using a RAW virtio disk with cache=none, same happens for qcow2,
>> though.
>>
>> Using VirtualBox and in the past Vmware workstation I never had issues with
>> corrupted diskimages, but now with Qemu all tries ended up with lots of
>> errors like:
>>
>> [ 4871.863009] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 3817758510 private 402306600
>> [ 4872.481013] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 3817758510 private 402306600
>> [ 4904.055514] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 4060166193 private 402306600
>> [ 4904.748130] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 4060166193 private 402306600
>> [ 4904.987540] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 3817758510 private 402306600
>> [ 4905.024700] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 3817758510 private 402306600
>> [ 4932.497793] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 4060166193 private 402306600
>> [ 4932.533634] BTRFS info (device md10): csum failed ino 687 off 46213922816
>> csum 4060166193 private 402306600
>>
>> Trying to copy the disk image elsewhere causes I/O errors at some point.
>>
>> I found a thread about the issue
>> (http://comments.gmane.org/gmane.comp.file-systems.btrfs/20538) and also a
>> bug report against Qemu from Josef Bacik describing the exact same problem:
>> https://bugzilla.redhat.com/show_bug.cgi?id=693530 - Josef states it should
>> be fixed since quite a while.
>>
>> Is this a regression in BTRFS, a problem with my setup (md raid1 layer below
>> btrfs), or (still) a bug in Qemu?
>> Would cache=writethrough or writeback be an option with BTRFS?
>>
> 
> So there were two aspects to that bug, one is the thing I describe where we 
> get
> the same buffer for two parts of an iovec on reads.  That part has been fixed.
> The second part is where the application will modify the page while it's in
> flight, and that hasn't been fixed.  We have a few options here
> 
> 1) Always double buffer direct io.  Kind of defeats the purpose of direct io.
> 
> 2) Check the buffer after we've written it to see if it matches the csum we 
> put
> down, if not double buffer it and send it down again.  This makes you checksum
> the page twice and punishes O_DIRECT users that behave.
> 
> I opted for #3 and let this sort of thing happen.  So you can get around it by
> doing nodatacow for that particular image which will disable checksumming for
> just that file, or you can use cache=writethrough/writeback and that will use
> buffered io.  FYI this doesn't happen on _all_ qemu, just on guest OS'es that
> don't provide stable pages, so Windows or like old RHEL versions that are on
> ext3.  Thanks,
> 
> Josef
> 

Thanks very much for the explanation, Josef.

I opted for 3), too. Used chattr +C on the directory that is meant for
holding the qemu image(s), and re-created the RAW image in there (so it
has nodatacow flag set now)

So far, no issues. Perfect.

Thanks again.

~Tom


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Fix automatic prerequisite generation

2013-07-08 Thread David Sterba
On Mon, Jul 08, 2013 at 06:46:07PM +0200, David Sterba wrote:
> On Sat, Jul 06, 2013 at 05:22:31PM +0900, Kusanagi Kouichi wrote:
> > Some files don't compile because of insufficient prerequisite.
> > 
> > $ make btrfs
> > ...
> > [CC] btrfs.o
> > btrfs.c:24:21: fatal error: version.h: No such file or directory
> >  #include "version.h"
> >  ^
> > compilation terminated.
> > make: *** [btrfs.o] Error 1
> 
> Works great, thanks. I've noticed that DEPFLAGS is now unused and
> removed it.

Two more small things: the files were previously named .something.o.d and
now they're something.d .

* .gitignore needs to be updated
* I'd prefer to keep the suffix .o.d, the leading . has go away due to
  the macro tricks

I'll fold the following patch if you don't mind

--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,6 @@
 *.o
 *.static.o
-.*.o.d
+*.o.d
 version.h
 version
 man/*.gz
diff --git a/Makefile b/Makefile
index 70880bc..849e3fe 100644
--- a/Makefile
+++ b/Makefile
@@ -76,8 +76,8 @@ else
check = true
 endif

-%.d: %.c
-   $(Q)$(CC) -MM -MG -MF $@ -MT $(@:.d=.o) -MT $(@:.d=.static.o) -MT $@ 
$(AM_CFLAGS) $(CFLAGS) $<
+%.o.d: %.c
+   $(Q)$(CC) -MM -MG -MF $@ -MT $(@:.o.d=.o) -MT $(@:.o.d=.static.o) -MT 
$@ $(AM_CFLAGS) $(CFLAGS) $<

 .c.o:
$(Q)$(check) $<
@@ -186,7 +186,7 @@ install-man:

 clean :
@echo "Cleaning"
-   $(Q)rm -f $(progs) cscope.out *.o *.d btrfs-convert btrfs-image 
btrfs-select-super \
+   $(Q)rm -f $(progs) cscope.out *.o *.o.d btrfs-convert btrfs-image 
btrfs-select-super \
  btrfs-zero-log btrfstune dir-test ioctl-test quick-test send-test 
btrfsck \
  btrfs.static mkfs.btrfs.static btrfs-calc-size \
  version.h \
@@ -205,5 +205,5 @@ install: $(libs) $(progs) install-man
$(INSTALL) -m644 $(headers) $(DESTDIR)$(incdir)

 ifneq ($(MAKECMDGOALS),clean)
--include $(objects:.o=.d) $(cmd-objects:.o=.d) $(subst .btrfs,, $(filter-out 
btrfsck.d, $(progs:=.d)))
+-include $(objects:.o=.o.d) $(cmd-objects:.o=.o.d) $(subst .btrfs,, 
$(filter-out btrfsck.o.d, $(progs:=.o.d)))
 endif

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Fix automatic prerequisite generation

2013-07-08 Thread David Sterba
On Sat, Jul 06, 2013 at 05:22:31PM +0900, Kusanagi Kouichi wrote:
> Some files don't compile because of insufficient prerequisite.
> 
> $ make btrfs
> ...
> [CC] btrfs.o
> btrfs.c:24:21: fatal error: version.h: No such file or directory
>  #include "version.h"
>  ^
> compilation terminated.
> make: *** [btrfs.o] Error 1

Works great, thanks. I've noticed that DEPFLAGS is now unused and
removed it.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs-progs: remove incorrect slot decrement

2013-07-08 Thread Filipe David Manana
Ok, sorry.
I'm not used to git send-email and related best practices.

Thanks for letting me know.

On Mon, Jul 8, 2013 at 5:14 PM, David Sterba  wrote:
> On Fri, Jul 05, 2013 at 09:43:34PM +0100, Filipe David Borba Manana wrote:
>> In btrfs_set_block_flags() we want to check if the slot
>> in the leaf points to the first item in the leaf - if it
>> doesn't check if the previous item in the leaf is an extent
>> item. By removing this extra slot decrement we are indeed
>> checking the item right before the slot, and not the second
>> item before.
>>
>> V2: Added Josef Bacik's review mention.
>
> Please don't do that. It's ok to increase version if you update code or
> changelog/subject, but not just the tags.
>
> thanks,
> david



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs-progs: remove incorrect slot decrement

2013-07-08 Thread David Sterba
On Fri, Jul 05, 2013 at 09:43:34PM +0100, Filipe David Borba Manana wrote:
> In btrfs_set_block_flags() we want to check if the slot
> in the leaf points to the first item in the leaf - if it
> doesn't check if the previous item in the leaf is an extent
> item. By removing this extra slot decrement we are indeed
> checking the item right before the slot, and not the second
> item before.
> 
> V2: Added Josef Bacik's review mention.

Please don't do that. It's ok to increase version if you update code or
changelog/subject, but not just the tags.

thanks,
david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfsck: cmds-check.c:2063: check_owner_ref: Assertion `!(rec->is_root)' failed.

2013-07-08 Thread Mathieu Malaterre
Here is a wrap-up of what is going on on #btrfs. I am trying to boot
my debian/squeeze system with btrfs filesystem. System was setup with
an initial kernel 2.6.32.
The system now refuses to boot.

Booting using a live knopix 7.2.0 (kernel 3.9.6), I can see:

$ sudo vgchange -ay /dev/voxbox
$ mkdir /tmp/sda1
$ sudo mount -v -t btrfs -o recovery /dev/mapper/voxbox-root /tmp/sda1
$ dmesg | tail
[...]
[13429.460451] device fsid af7e6809-af9d-474f-a332-295cdba1c09f devid
1 transid 2207957 /dev/mapper/voxbox-root
[13429.461584] btrfs: enabling auto recovery
[13429.529165] parent transid verify failed on 201236885504 wanted
1822936 found 2125997
[13429.532014] parent transid verify failed on 201236885504 wanted
1822936 found 2125997
[13429.587204] btrfs: open_ctree failed


If I now use the cmd line tools from git:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
$ cd btrfs-progs/
$ make
$ ./btrfsck /dev/mapper/voxbox-root
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
parent transid verify failed on 201236885504 wanted 1822936 found 2125997
Ignoring transid failure
Checking filesystem on /dev/mapper/voxbox-root
UUID: af7e6809-af9d-474f-a332-295cdba1c09f
checking extents
btrfsck: cmds-check.c:2063: check_owner_ref: Assertion `!(rec->is_root)' failed.
Aborted


For reference:

$ sudo vgdisplay
  --- Volume group ---
  VG Name   voxbox
  System ID
  Formatlvm2
  Metadata Areas2
  Metadata Sequence No  6
  VG Access read/write
  VG Status resizable
  MAX LV0
  Cur LV3
  Open LV   0
  Max PV0
  Cur PV2
  Act PV2
  VG Size   2.73 TiB
  PE Size   4.00 MiB
  Total PE  715398
  Alloc PE / Size   715398 / 2.73 TiB
  Free  PE / Size   0 / 0
  VG UUID   mxbe5Y-Wjh1-fC2C-WS27-Fy0X-2JB6-PvrA0Q

Regards,
--
Mathieu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Cleanup for using BTRFS_SETGET_STACK instead of raw convert

2013-07-08 Thread David Sterba
On Fri, Jul 05, 2013 at 10:01:30AM +0800, Qu Wenruo wrote:
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2807,6 +2846,14 @@ static inline unsigned long btrfs_leaf_data(struct 
> extent_buffer *l)
>  
>  /* struct btrfs_file_extent_item */
>  BTRFS_SETGET_FUNCS(file_extent_type, struct btrfs_file_extent_item, type, 8);
> +BTRFS_SETGET_STACK_FUNCS(stack_file_extent_bytenr,
> +  struct btrfs_file_extent_item, disk_bytenr, 64);

Please use the struct member name in the macro name, ie.
stack_file_extent_disk_bytenr

otherwise ok,
Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs crashes

2013-07-08 Thread Josef Bacik
On Mon, Jul 08, 2013 at 08:46:17AM +0200, Franziska Näpelt wrote:
> Hi everybody,
> 
> we are using a btrfs RAID 1 with four 2TB hard drives on a Debian 7.1
> (Kernel 3.9.6).
> 
> After about one year of working, there was an error in messages log and the
> filesystem was mounted read-only.
> 
> After that I restarted the system but that doesn't fix the bug. The
> btrfs-filesystem couldn't be mounted.
> 
> I attach four logs:
> 
> -> when the error occured
> -> after the error occured
> -> when rebooting the system
> -> when I tried ti mount manually
> 
> Before I rebuilt the filesystem (formatting everything and creating a new
> btrfs-pool) I made a btrfs-image. I can provide it to you.
> 

Can you file a bugzilla at bugzilla.kernel.org (make sure the component is set
to btrfs) with all of this information and a link to the image, and the output
of btrfsck?  Please use the most recent version of btrfs-progs

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/5] Btrfs-progs: don't ignore errors in extent-tree.c

2013-07-08 Thread Filipe David Borba Manana
Several function return values were being completely
ignored.

V2: Added more error checking all over extent-tree.c

Signed-off-by: Filipe David Borba Manana 
---
 extent-tree.c |   20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 6490d8a..6458ce4 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2110,6 +2110,7 @@ static int finish_current_insert(struct 
btrfs_trans_handle *trans,
extent_op->flags,
&extent_op->key,
extent_op->level, &key);
+   BUG_ON(ret);
} else {
BUG_ON(1);
}
@@ -2756,7 +2757,7 @@ static int alloc_reserved_tree_block(struct 
btrfs_trans_handle *trans,
 
ret = update_block_group(trans, root, ins->objectid, root->leafsize,
 1, 0);
-   return 0;
+   return ret;
 }
 
 static int alloc_tree_block(struct btrfs_trans_handle *trans,
@@ -3345,12 +3346,14 @@ btrfs_add_block_group(struct btrfs_fs_info *fs_info, 
u64 bytes_used, u64 type,
BUG_ON(ret);
 
bit = block_group_state_bits(type);
-   set_extent_bits(block_group_cache, chunk_offset,
-   chunk_offset + size - 1,
-   bit | EXTENT_LOCKED, GFP_NOFS);
+   ret = set_extent_bits(block_group_cache, chunk_offset,
+ chunk_offset + size - 1,
+ bit | EXTENT_LOCKED, GFP_NOFS);
+   BUG_ON(ret);
 
-   set_state_private(block_group_cache, chunk_offset,
- (unsigned long)cache);
+   ret = set_state_private(block_group_cache, chunk_offset,
+   (unsigned long)cache);
+   BUG_ON(ret);
set_avail_alloc_bits(fs_info, type);
 
return cache;
@@ -3372,8 +3375,11 @@ int btrfs_make_block_group(struct btrfs_trans_handle 
*trans,
sizeof(cache->item));
BUG_ON(ret);
 
-   finish_current_insert(trans, extent_root);
+   ret = finish_current_insert(trans, extent_root);
+   BUG_ON(ret);
ret = del_pending_extents(trans, extent_root);
+   BUG_ON(ret);
+
return 0;
 }
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Qemu disk images on BTRFS suffer checksum errors

2013-07-08 Thread Josef Bacik
On Mon, Jul 08, 2013 at 10:08:46AM +0200, Thomas Kuther wrote:
> Hello,
> 
> I'm about to migrate from VirtualBox to Qemu+VGA-Passthrough. All my virtual
> disk images are stored in a BTRFS subvolume on-top of a MDRAID 1.
> The host runs kernel 3.10, and Qemu 1.5.1. The Testing-VM is a Windows 7
> 64bit, using a RAW virtio disk with cache=none, same happens for qcow2,
> though.
> 
> Using VirtualBox and in the past Vmware workstation I never had issues with
> corrupted diskimages, but now with Qemu all tries ended up with lots of
> errors like:
> 
> [ 4871.863009] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 3817758510 private 402306600
> [ 4872.481013] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 3817758510 private 402306600
> [ 4904.055514] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 4060166193 private 402306600
> [ 4904.748130] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 4060166193 private 402306600
> [ 4904.987540] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 3817758510 private 402306600
> [ 4905.024700] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 3817758510 private 402306600
> [ 4932.497793] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 4060166193 private 402306600
> [ 4932.533634] BTRFS info (device md10): csum failed ino 687 off 46213922816
> csum 4060166193 private 402306600
> 
> Trying to copy the disk image elsewhere causes I/O errors at some point.
> 
> I found a thread about the issue
> (http://comments.gmane.org/gmane.comp.file-systems.btrfs/20538) and also a
> bug report against Qemu from Josef Bacik describing the exact same problem:
> https://bugzilla.redhat.com/show_bug.cgi?id=693530 - Josef states it should
> be fixed since quite a while.
> 
> Is this a regression in BTRFS, a problem with my setup (md raid1 layer below
> btrfs), or (still) a bug in Qemu?
> Would cache=writethrough or writeback be an option with BTRFS?
> 

So there were two aspects to that bug, one is the thing I describe where we get
the same buffer for two parts of an iovec on reads.  That part has been fixed.
The second part is where the application will modify the page while it's in
flight, and that hasn't been fixed.  We have a few options here

1) Always double buffer direct io.  Kind of defeats the purpose of direct io.

2) Check the buffer after we've written it to see if it matches the csum we put
down, if not double buffer it and send it down again.  This makes you checksum
the page twice and punishes O_DIRECT users that behave.

I opted for #3 and let this sort of thing happen.  So you can get around it by
doing nodatacow for that particular image which will disable checksumming for
just that file, or you can use cache=writethrough/writeback and that will use
buffered io.  FYI this doesn't happen on _all_ qemu, just on guest OS'es that
don't provide stable pages, so Windows or like old RHEL versions that are on
ext3.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Qemu disk images on BTRFS suffer checksum errors

2013-07-08 Thread Thomas Kuther

Hello,

I'm about to migrate from VirtualBox to Qemu+VGA-Passthrough. All my 
virtual disk images are stored in a BTRFS subvolume on-top of a MDRAID 
1.
The host runs kernel 3.10, and Qemu 1.5.1. The Testing-VM is a Windows 7 
64bit, using a RAW virtio disk with cache=none, same happens for qcow2, 
though.


Using VirtualBox and in the past Vmware workstation I never had issues 
with corrupted diskimages, but now with Qemu all tries ended up with 
lots of errors like:


[ 4871.863009] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 3817758510 private 402306600
[ 4872.481013] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 3817758510 private 402306600
[ 4904.055514] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 4060166193 private 402306600
[ 4904.748130] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 4060166193 private 402306600
[ 4904.987540] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 3817758510 private 402306600
[ 4905.024700] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 3817758510 private 402306600
[ 4932.497793] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 4060166193 private 402306600
[ 4932.533634] BTRFS info (device md10): csum failed ino 687 off 
46213922816 csum 4060166193 private 402306600


Trying to copy the disk image elsewhere causes I/O errors at some point.

I found a thread about the issue 
(http://comments.gmane.org/gmane.comp.file-systems.btrfs/20538) and also 
a bug report against Qemu from Josef Bacik describing the exact same 
problem: https://bugzilla.redhat.com/show_bug.cgi?id=693530 - Josef 
states it should be fixed since quite a while.


Is this a regression in BTRFS, a problem with my setup (md raid1 layer 
below btrfs), or (still) a bug in Qemu?

Would cache=writethrough or writeback be an option with BTRFS?

Thanks in advance for any input.

Best regards,
Thomas

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/13] btrfs-progs: fix memory leaks of device_list_add()

2013-07-08 Thread Anand Jain
memory allocated by device_list_add has to be freed, the
function introduced here device_list_remove() would just
go that.
however the challenging part is about where we would
call this function.

there are two ways its handled
 the threads calling open_ctree_broken(), open_ctree() and
 open_ctree_fd() (which leads call to device_list_add)
 would anyway call close_ctree() so we put device_list_remove()
 there which will take care of freeing memory.

 now for threads calling device_list_add() outside of
 open_ctree(), has to call device_list_remove() separately
 which can be called as a last function in the thread
 this patch just does that.

device_list_remove accepts() NULL (deletes entire list)
or fsid (which would delete only the fsid matched in the
device_fs list). As of now though all calling functions
use NULL, I see potential that we would use fsid when we
have to create a single device list using both the
device-tree and from the btrfs-kernel.

further, mkfs.c thread should call device_list_remove()
as well, however mkfs.c uses a lot of in-flight exits()
which makes it very difficult to bring in this fix into
mkfs.c.  I shall be doing it in a separate patch.

Signed-off-by: Anand Jain 
---
 cmds-device.c |4 +++
 cmds-filesystem.c |   11 -
 cmds-replace.c|2 +
 cmds-scrub.c  |3 ++
 disk-io.c |1 +
 mkfs.c|1 +
 utils.h   |1 +
 volumes.c |   56 -
 volumes.h |3 ++
 9 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index 9525fcf..e4a1f1b 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -29,6 +29,7 @@
 #include "utils.h"
 
 #include "commands.h"
+#include "volumes.h"
 
 /* FIXME - imported cruft, fix sparse errors and warnings */
 #ifdef __CHECKER__
@@ -128,6 +129,7 @@ static int cmd_add_dev(int argc, char **argv)
}
 
close(fdmnt);
+   device_list_remove(NULL);
if (ret)
return ret+20;
else
@@ -213,6 +215,7 @@ static int cmd_scan_dev(int argc, char **argv)
int ret;
printf("Scanning for Btrfs filesystems\n");
ret = scan_for_btrfs(where, 1);
+   device_list_remove(NULL);
if (ret){
fprintf(stderr, "ERROR: error %d while scanning\n", 
ret);
return 18;
@@ -397,6 +400,7 @@ static int cmd_dev_stats(int argc, char **argv)
 out:
free(di_args);
close(fdmnt);
+   device_list_remove(NULL);
 
return err;
 }
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index a028e1d..1c26476 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -455,6 +455,7 @@ static int cmd_show(int argc, char **argv)
print_one_uuid(fs_devices);
}
printf("%s\n", BTRFS_BUILD_VERSION);
+   device_list_remove(NULL);
return 0;
 }
 
@@ -683,13 +684,19 @@ static const char * const cmd_label_usage[] = {
 
 static int cmd_label(int argc, char **argv)
 {
+   int ret;
if (check_argc_min(argc, 2) || check_argc_max(argc, 3))
usage(cmd_label_usage);
 
if (argc > 2)
-   return set_label(argv[1], argv[2]);
+   ret = set_label(argv[1], argv[2]);
else
-   return get_label(argv[1]);
+   ret = get_label(argv[1]);
+
+   if (is_existing_blk_or_reg_file(argv[1]))
+   device_list_remove(NULL);
+
+   return ret;
 }
 
 const struct cmd_group filesystem_cmd_group = {
diff --git a/cmds-replace.c b/cmds-replace.c
index c68986a..99a5abc 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -340,6 +340,7 @@ static int cmd_start_replace(int argc, char **argv)
}
}
close(fdmnt);
+   device_list_remove(NULL);
return 0;
 
 leave_with_error:
@@ -349,6 +350,7 @@ leave_with_error:
close(fdsrcdev);
if (fddstdev != -1)
close(fddstdev);
+   device_list_remove(NULL);
return -1;
 }
 
diff --git a/cmds-scrub.c b/cmds-scrub.c
index 95dfee3..08aab54 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1496,6 +1496,7 @@ out:
unlink(sock_path);
}
close(fdmnt);
+   device_list_remove(NULL);
 
if (err)
return 1;
@@ -1564,6 +1565,7 @@ static int cmd_scrub_cancel(int argc, char **argv)
 out:
if (fdmnt != -1)
close(fdmnt);
+   device_list_remove(NULL);
return ret;
 }
 
@@ -1722,6 +1724,7 @@ out:
free(di_args);
if (fdres > -1)
close(fdres);
+   device_list_remove(NULL);
 
return err;
 }
diff --git a/disk-io.c b/disk-io.c
index 3937e3f..9b72576 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1338,6 +1338,7 @@ int close_ctree(struct btrfs_root *root)
}
 
close_all_devices(fs_info);
+   device_list_remove(NULL);