[PATCH v2] Btrfs: fix a dio write regression

2012-08-22 Thread bo . li . liu
From: Liu Bo 

This bug is introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9
(Btrfs: lock extents as we map them in DIO).

In a dio write, if we fall back to a buffered write, we should unlock the
section that we didn't do IO on.  But unlocking the section is not enough: we
also need to clean up the space reserved for it.

This bug was found while running xfstests 133; with this patch, 133 no longer
complains.

Signed-off-by: Liu Bo 
---
v1->v2: apply style comments from David Sterba.

 fs/btrfs/inode.c |   24 
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7131fac..ea6a4ee 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5993,11 +5993,27 @@ unlock:
 	 * in the case of read we need to unlock only the end area that we
 	 * aren't using if there is any left over space.
 	 */
-	if (lockstart < lockend)
-		clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend,
-				 unlock_bits, 1, 0, &cached_state, GFP_NOFS);
-	else
+	if (lockstart < lockend) {
+		if (create && len < lockend - lockstart) {
+			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
+					 lockstart + len - 1, unlock_bits, 1, 0,
+					 &cached_state, GFP_NOFS);
+			/*
+			 * Beside unlock, we also need to cleanup reserved space
+			 * for the left range by attaching EXTENT_DO_ACCOUNTING.
+			 */
+			clear_extent_bit(&BTRFS_I(inode)->io_tree,
+					 lockstart + len, lockend,
+					 unlock_bits | EXTENT_DO_ACCOUNTING,
+					 1, 0, NULL, GFP_NOFS);
+		} else {
+			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
+					 lockend, unlock_bits, 1, 0,
+					 &cached_state, GFP_NOFS);
+		}
+	} else {
 		free_extent_state(cached_state);
+	}
 
 	free_extent_map(em);
 
-- 
1.7.7.6
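
For readers following the hunk above, the range arithmetic can be sketched
in userspace C. This is only an illustrative model, not kernel code:
struct range_op and plan_dio_unlock() are hypothetical names, each planned
entry stands in for one clear_extent_bit() call, and do_accounting marks
the call where the patch adds EXTENT_DO_ACCOUNTING.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record standing in for one clear_extent_bit() call. */
struct range_op {
	uint64_t start;     /* first byte of the range (inclusive) */
	uint64_t end;       /* last byte of the range (inclusive) */
	int do_accounting;  /* 1 if EXTENT_DO_ACCOUNTING would be added */
};

/*
 * Model of the branch structure in the patch. Returns how many ranges
 * would be cleared (0, 1 or 2) and fills ops[] accordingly.
 */
static int plan_dio_unlock(uint64_t lockstart, uint64_t lockend,
			   uint64_t len, int create, struct range_op ops[2])
{
	assert(ops != NULL);

	/* Nothing left locked: the patch only frees the cached state. */
	if (lockstart >= lockend)
		return 0;

	if (create && len < lockend - lockstart) {
		/* The part that IO was mapped for: plain unlock. */
		ops[0].start = lockstart;
		ops[0].end = lockstart + len - 1;
		ops[0].do_accounting = 0;
		/* The remainder: unlock and release the reserved space. */
		ops[1].start = lockstart + len;
		ops[1].end = lockend;
		ops[1].do_accounting = 1;
		return 2;
	}

	/* Read, or a write covering the whole range: one plain unlock. */
	ops[0].start = lockstart;
	ops[0].end = lockend;
	ops[0].do_accounting = 0;
	return 1;
}
```

With lockstart = 0, lockend = 8191, len = 4096 and create set, the model
plans a plain unlock of [0, 4095] and an unlock-plus-accounting of
[4096, 8191], which is the split the patch introduces.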

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfsprogs: cases of snapshot failures

2012-08-22 Thread Jan Engelhardt

Since btrfs does not do recursive atomic snapshots (which I am ok with),
I am doing this myself. A handful of suggestions/problems came up.

1. Maybe btrfsprogs could gain an option to do recursive non-atomic
snapshots at the userspace level, simply invoking low-level atomic
snapshots one by one?


For the following, the kernel is 3.4.4 with the too-overloaded "0.19"
version of btrfsprogs.

2. Subvolume directories are somewhat special, as you may know.
Only `btrfs sub create/snap` creates them, and they cannot be rmdird.

# btrfs sub list .
ID 256 top level 5 path HEAD
[...]
ID 450 top level 5 path HEAD/woven
ID 451 top level 5 path HEAD/leet

Attempting to snapshot a directory with further subvolumes in it
has the strange effect that directories get created, and do so
with the wrong inode info:

# ls -l HEAD
total 4
drwxr-xr-x 1 root root  18 Aug 18 01:04 .
dr-xr-xr-x 1 root root 218 Aug 23 00:25 ..
drwxr-xr-x 1 root root  66 Aug 17 23:04 leet
drwxrwx--- 1 root root 100 Aug 18 00:53 woven

# btrfs sub snap HEAD today
Create a snapshot of 'HEAD' in './today'
# ls -l today
total 4
drwxr-xr-x 1 root root  18 Aug 18 01:04 .
dr-xr-xr-x 1 root root 228 Aug 23 00:25 ..
drwxr-xr-x 1 root root   0 Aug 23 00:25 leet
drwxr-xr-x 1 root root   0 Aug 23 00:25 woven


3. The creation of these non-special directories in today/
is undesired, because now I need to rmdir them first before
creating the subsnapshots.

# btrfs sub snap HEAD today
Create a snapshot of 'HEAD' in './today'
# btrfs sub snap HEAD/leet today/
Create a snapshot of 'HEAD/leet' in 'today//leet'
ERROR: cannot snapshot 'HEAD/leet' - File exists


4. Because today/leet already exists as a non-subvolume root,
btrfsprogs defaults to creating another directory inside it
(unexpected, but ok). However, while doing so, it runs into
some unexplainable ENOTTY:

# btrfs sub snap HEAD today
Create a snapshot of 'HEAD' in './today'
# btrfs sub snap HEAD/leet today/leet
Create a snapshot of 'HEAD/leet' in 'today/leet/leet'
ERROR: cannot snapshot 'HEAD/leet' - Inappropriate ioctl for device
(^failure where success would have been expected)

Successes:
# rmdir today/leet/
# mkdir today/leet
# btrfs sub snap HEAD/leet today/leet
Create a snapshot of 'HEAD/leet' in 'today/leet/leet'

So today/leet as created by `sub snap HEAD` is also some weird
object.
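
As a rough illustration of point 1, the bookkeeping for a userspace
recursive snapshot could look like the sketch below. It assumes the
workflow from the transcripts above: snapshot the top-level subvolume,
then for each nested subvolume reported by `btrfs sub list`, rmdir the
placeholder directory and snapshot into the mapped path, parents first.
map_snapshot_dest() and path_depth() are hypothetical helpers, not part
of btrfsprogs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Map a nested subvolume path under src_root to the matching path under
 * dst_root, e.g. ("HEAD", "today", "HEAD/leet") -> "today/leet".
 * Returns 0 on success, -1 if path is not strictly below src_root or the
 * result does not fit in out.
 */
static int map_snapshot_dest(const char *src_root, const char *dst_root,
			     const char *path, char *out, size_t outlen)
{
	size_t rootlen = strlen(src_root);
	int n;

	assert(out != NULL && outlen > 0);

	/* Require "src_root/" followed by at least one more character. */
	if (strncmp(path, src_root, rootlen) != 0 ||
	    path[rootlen] != '/' || path[rootlen + 1] == '\0')
		return -1;

	n = snprintf(out, outlen, "%s%s", dst_root, path + rootlen);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Count '/' separators; sorting by this visits parents before children. */
static int path_depth(const char *path)
{
	int depth = 0;

	for (; *path; path++)
		if (*path == '/')
			depth++;
	return depth;
}
```

For the listing above, HEAD/leet maps to today/leet and HEAD/woven to
today/woven; a wrapper would rmdir each mapped path and then run
`btrfs sub snap` on it.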


Re: Computer crash, btrfs partition errors

2012-08-22 Thread Not Zippy
Hi

The full output of btrfs-debug-tree is 190MB compressed; do you still
want it?

As for the conditions: I was running a repo sync, which I had suspended
with CTRL-Z. I then got distracted and mistakenly started the sync again
(not sure if you are familiar with the repo command; it spawns git
processes to check out projects). The crash did not occur immediately,
but it was probably within 2 minutes of me starting this second repo
sync. The lockup was bad enough that BUSIER did not reboot the PC; I
had to power down.

After the patch to btrfsck, I got this error:

# ./work/builds/btrfs-progs/btrfsck --repair /dev/sda2
enabling repair mode
checking extents
leaf parent key incorrect 46329503744
bad block 46329503744
owner ref check failed [46329503744 4096]
repair deleting extent record: key 46329503744 168 4096
adding new tree backref on start 46329503744 len 4096 parent 256 root 256
repaired damaged extent references
*** glibc detected *** ./work/builds/btrfs-progs/btrfsck: corrupted
double-linked list: 0x1202b220 ***
=== Backtrace: =
/lib64/libc.so.6(+0x77896)[0x7f0008c59896]
/lib64/libc.so.6(+0x77cfb)[0x7f0008c59cfb]
/lib64/libc.so.6(+0x784a8)[0x7f0008c5a4a8]
/lib64/libc.so.6(cfree+0x6c)[0x7f0008c5d84c]
./work/builds/btrfs-progs/btrfsck[0x415db8]
./work/builds/btrfs-progs/btrfsck[0x415e15]
./work/builds/btrfs-progs/btrfsck[0x40aa9e]
./work/builds/btrfs-progs/btrfsck[0x4046c7]
/lib64/libc.so.6(__libc_start_main+0xed)[0x7f0008c0636d]
./work/builds/btrfs-progs/btrfsck[0x4017f9]
=== Memory map: 
0040-00427000 r-xp  00:22 1375631
  /mnt/DevSystem/Work/builds/btrfs-progs/btrfsck
00626000-00627000 r--p 00026000 00:22 1375631
  /mnt/DevSystem/Work/builds/btrfs-progs/btrfsck
00627000-00628000 rw-p 00027000 00:22 1375631
  /mnt/DevSystem/Work/builds/btrfs-progs/btrfsck
01f27000-2576c000 rw-p  00:00 0  [heap]
7f000400-7f0004021000 rw-p  00:00 0
7f0004021000-7f000800 ---p  00:00 0
7f00089cb000-7f00089e r-xp  08:22 298053
  /lib64/libgcc_s.so.1
7f00089e-7f0008be ---p 00015000 08:22 298053
  /lib64/libgcc_s.so.1
7f0008be-7f0008be1000 r--p 00015000 08:22 298053
  /lib64/libgcc_s.so.1
7f0008be1000-7f0008be2000 rw-p 00016000 08:22 298053
  /lib64/libgcc_s.so.1
7f0008be2000-7f0008d64000 r-xp  08:22 2883622
  /lib64/libc-2.14.1.so
7f0008d64000-7f0008f64000 ---p 00182000 08:22 2883622
  /lib64/libc-2.14.1.so
7f0008f64000-7f0008f68000 r--p 00182000 08:22 2883622
  /lib64/libc-2.14.1.so
7f0008f68000-7f0008f69000 rw-p 00186000 08:22 2883622
  /lib64/libc-2.14.1.so
7f0008f69000-7f0008f6e000 rw-p  00:00 0
7f0008f6e000-7f0008fef000 r-xp  08:22 2883678
  /lib64/libm-2.14.1.so
7f0008fef000-7f00091ee000 ---p 00081000 08:22 2883678
  /lib64/libm-2.14.1.so
7f00091ee000-7f00091ef000 r--p 0008 08:22 2883678
  /lib64/libm-2.14.1.so
7f00091ef000-7f00091f rw-p 00081000 08:22 2883678
  /lib64/libm-2.14.1.so
7f00091f-7f00091f4000 r-xp  08:22 394806
  /lib64/libuuid.so.1.3.0
7f00091f4000-7f00093f3000 ---p 4000 08:22 394806
  /lib64/libuuid.so.1.3.0
7f00093f3000-7f00093f4000 r--p 3000 08:22 394806
  /lib64/libuuid.so.1.3.0
7f00093f4000-7f00093f5000 rw-p 4000 08:22 394806
  /lib64/libuuid.so.1.3.0
7f00093f5000-7f0009415000 r-xp  08:22 2883603
  /lib64/ld-2.14.1.so
7f00095d1000-7f00095d4000 rw-p  00:00 0
7f0009612000-7f0009615000 rw-p  00:00 0
7f0009615000-7f0009616000 r--p 0002 08:22 2883603
  /lib64/ld-2.14.1.so
7f0009616000-7f0009617000 rw-p 00021000 08:22 2883603
  /lib64/ld-2.14.1.so
7f0009617000-7f0009618000 rw-p  00:00 0
7fff70f01000-7fff70f23000 rw-p  00:00 0  [stack]
7fff70fff000-7fff7100 r-xp  00:00 0  [vdso]
ff60-ff601000 r-xp  00:00 0
  [vsyscall]
Aborted


Re: [PATCH] Btrfs: fix a dio write regression

2012-08-22 Thread David Sterba
Hi,

a few minor style comments,

On Wed, Aug 22, 2012 at 06:11:14PM +0800, bo.li@oracle.com wrote:
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5993,10 +5993,24 @@ unlock:
>* in the case of read we need to unlock only the end area that we
>* aren't using if there is any left over space.
>*/
> - if (lockstart < lockend)
> - clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend,
> -  unlock_bits, 1, 0, &cached_state, GFP_NOFS);
> - else
> + if (lockstart < lockend) {
> + if (create && len < lockend - lockstart) {
> + clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
> +  lockstart + len - 1, unlock_bits, 1, 0,
> +  &cached_state, GFP_NOFS);
> + /*
> +  * Beside unlock, we also need to cleanup reserved space
> +  * for the left range by attaching EXTENT_DO_ACCOUNTING.
> +  */
> + clear_extent_bit(&BTRFS_I(inode)->io_tree,
> +  lockstart + len, lockend, unlock_bits |
> +  EXTENT_DO_ACCOUNTING, 1, 0, NULL,

I'd prefer to see unlock_bits and the new value on one line

> +  GFP_NOFS);
> + } else

add { ... } around this

> + clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
> +  lockend, unlock_bits, 1, 0,
> +  &cached_state, GFP_NOFS);
> + } else

here too

>   free_extent_state(cached_state);
>  
>   free_extent_map(em);


Re: raw partition or LV for btrfs?

2012-08-22 Thread David Sterba
On Tue, Aug 14, 2012 at 07:23:48AM -0400, Calvin Walton wrote:
> A patch to add support for `btrfs fi defrag -c none ` or so would
> make this easier, and shouldn't be too hard to do :)

This one is on my 'nice to have' list. The ioctl needs to be extended so
that 'none' actually means no compression during the defrag; currently it
means 'whatever compression the file has set'.
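
The difference in meaning can be modelled in a few lines of C. This is a
sketch of the semantics only: the enum values and both function names are
made up for illustration and are not the kernel's ioctl interface.

```c
#include <assert.h>

/* Hypothetical compression selectors for this sketch (not kernel constants). */
enum comp { COMP_NONE = 0, COMP_ZLIB = 1, COMP_LZO = 2 };

/* Behaviour as described above: "none" falls back to the file's own setting. */
static enum comp defrag_comp_current(enum comp requested, enum comp file_setting)
{
	return requested == COMP_NONE ? file_setting : requested;
}

/*
 * One possible extension: a separate, hypothetical "explicit" flag lets the
 * caller say that COMP_NONE really means "write uncompressed".
 */
static enum comp defrag_comp_extended(enum comp requested, int explicit_request,
				      enum comp file_setting)
{
	if (explicit_request)
		return requested;
	return defrag_comp_current(requested, file_setting);
}
```

The point of the extra flag is that the value 0 alone cannot carry both
meanings, so old callers keep the fallback and new callers opt in.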


david


Re: linux 3.5.0: BTRFS error in compress_file_range:581 (failed to join transaction)

2012-08-22 Thread David Sterba
On Tue, Aug 14, 2012 at 09:00:53PM -0700, Marc MERLIN wrote:
> > What does 'ret' show?  Is it -ENOSPC?
> 
> I got nothing else in my logs.

Unless it was a second error from a filesystem that had already gone RO,
there should be more than the "Failed to join transaction" message, and the
first occurrence of a transaction abort would print a stack trace as
well.

As you wrote in the following paragraphs, it was probably a cable
disconnection, so my bet is on EIO, and the transaction abort did the right
thing, so

> I powered the laptop back on and it came up like nothing ever happened.

s/I/you/
.

(besides the few uncommitted changes)


david


Re: Hung I/O, Kernel BUG with corrupt leaf (bad key order)

2012-08-22 Thread David Sterba
On Tue, Aug 14, 2012 at 01:20:36PM -0500, Peter Marheine wrote:
> Hi all,
> 
> I'm running btrfs in a 3-disk RAID1 configuration. After a hard
> power-off, I'm seeing a lot of hung I/O tasks on this volume,
> apparently due to a corrupt leaf. I first noticed the problem on
> kernel 3.4.7, and it's persisted with 3.4.8. Relevant parts of the
> kernel log follow.

What was the filesystem activity when the power-off happened?

> 
> [   85.179621] block group 38684065792 has an wrong amount of free space
> [   85.179667] btrfs: failed to load free space cache for block group
> 38684065792
> [  136.969477] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [  136.998953] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [  137.000492] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [  137.000708] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [  153.912922] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [  153.913020] [ cut here ]
> [  153.913055] kernel BUG at fs/btrfs/inode.c:828!

 809 static noinline int cow_file_range(struct inode *inode,
 810struct page *locked_page,
 811u64 start, u64 end, int *page_started,
 812unsigned long *nr_written,
 813int unlock)
 814 {
[...]
 828 BUG_ON(btrfs_is_free_space_inode(root, inode));

plus the 'block group' warning above, this seems to be the bug that Liu Bo
fixed with the patches

Btrfs: fix a bug of writting free space cache with nodatacow option
Btrfs: fix a bug of writting free space cache during balance
Btrfs: fix btrfs_is_free_space_inode to recognize btree inode

that should appear in 3.6.

You can try mounting with 'nospace_cache', or with 'clear_cache' to rebuild
the space cache from scratch, and see whether that makes a difference, but I'm
afraid the bad keys will remain and will have to be removed via offline fsck.
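
For context, a "bad key order" report means that two neighbouring items in
a leaf are not in strictly ascending key order, where keys compare by
objectid, then type, then offset. A minimal userspace sketch of such a
check follows; struct key, key_cmp() and first_bad_slot() are simplified
stand-ins, and the slot returned here (the later of the two offending
items) may not match the slot numbering the kernel prints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified CPU-order key: compared by objectid, then type, then offset. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Three-way comparison: negative, zero or positive like memcmp(). */
static int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Return the first slot whose key is not strictly greater than the key in
 * the previous slot, or -1 if the whole leaf is ordered.
 */
static int first_bad_slot(const struct key *keys, size_t nritems)
{
	size_t i;

	for (i = 1; i < nritems; i++)
		if (key_cmp(&keys[i - 1], &keys[i]) >= 0)
			return (int)i;
	return -1;
}
```

Duplicate keys count as misordered in this sketch, since the comparison
requires each key to be strictly greater than its predecessor.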


david


Re: How to get Btrfs on 2nd partition of USB HDD to automount as read/write

2012-08-22 Thread David Sterba
On Tue, Aug 21, 2012 at 03:28:22PM -0400, dg1727 wrote:
> Thanks a lot for these answers.  As an exercise, how would I track 
> that patch so I can tell when it has been released?  Pointing me to 
> a webpage that covers this would be fine.  

You can easily check that the patch appears in the main progs repo at

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=summary

or you can clone the repo and check the git log directly.


david


Re: Computer crash, btrfs partition errors

2012-08-22 Thread David Sterba
On Tue, Aug 21, 2012 at 09:50:58AM -0700, Not Zippy wrote:
> Thanks for the analysis, unfortunately I get the same assert error
> when I attempt to run the repair from the compiled source.
> 
> # ./work/builds/btrfs-progs/btrfsck
> usage: btrfsck dev
> Btrfs Btrfs v0.19

'git describe' would be more helpful, but as the following command
succeeded, I assume that you're on the right version.

> # ./work/builds/btrfs-progs/btrfsck --repair /dev/sda2
> enabling repair mode
> checking extents
> leaf parent key incorrect 46329503744
> bad block 46329503744
> owner ref check failed [46329503744 4096]
> repair deleting extent record: key 46329503744 168 4096
> adding new tree backref on start 46329503744 len 4096 parent 256 root 256
> repaired damaged extent references

so it's able to fix that error, but due to the failure in the next phase the
change is not permanent -- a quick hack here would be to commit
immediately after this phase

--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3572,6 +3572,8 @@ int main(int ac, char **av)
 	if (ret)
 		fprintf(stderr, "Errors found in extent allocation tree\n");
 
+	goto out;
+
 	fprintf(stderr, "checking fs roots\n");
 	ret = check_fs_roots(root, &root_cache);
 	if (ret)
---

> checking fs roots
> btrfsck: btrfsck.c:397: process_inode_item: Assertion `!(rec->ino !=
> key->objectid || rec->refs > 1)' failed.
> Aborted

Analyzing this further would need more information, such as the full output
of btrfs-debug-tree and the actual values from the assertion, to match
them against the output. Additionally, I'm still interested in details
about the conditions that led to this corruption; it's less painful to
reproduce it locally than to play remote-debugging ping-pong (though
sometimes nothing else is available).
I think it'd be interesting to dig deeper, but you were able to get to
your data, so it's probably not that urgent for either of us.


david


Re: [PATCH] Btrfs-progs: seg fault in get_label_unmounted

2012-08-22 Thread David Sterba
On Wed, Aug 15, 2012 at 04:29:53PM +0800, Anand jain wrote:
> From: Anand Jain 
> 
> btrfs f l /
> No valid Btrfs found on /
> Segmentation fault (core dumped)

Patches fixing this have been sent about 4 times; the last one, from
Alexander's 'btrfs prop', modified it a bit more (to return the
label instead of printing it).

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/18287

while Danny's patch fixes more instances of an unhandled error code from open_ctree

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/15305

JFYI,
david


interaction with hardware RAID?

2012-08-22 Thread Daniel Pocock



It is well documented that btrfs data recovery (after silent corruption)
is dependent on the use of btrfs's own RAID1.

However, I'm curious about whether any hardware RAID vendors are
contemplating ways to integrate more closely with btrfs, for example,
such that when btrfs detects a bad checksum, it would be able to ask the
hardware RAID controller to return all alternate copies of the block.

Is this technically possible within any hardware RAID device today, even
though not implemented in btrfs?

Has there been any suggestion that vendors would support this in future,
presumably for the benefit of btrfs, ZFS and other checksumming filesystems?
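
To make the question concrete, here is a sketch of what the filesystem
side could do if a controller were able to hand back every alternate copy
of a block. It is purely hypothetical: pick_good_copy() and the copies[]
interface are invented for illustration, no existing controller API is
implied, and the plain CRC32C used here only stands in for the real
on-disk checksum handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical interface: copies[] holds every alternate copy the
 * controller could return for one block. Returns the index of the first
 * copy whose checksum matches, or -1 if none does.
 */
static int pick_good_copy(const uint8_t *const copies[], size_t ncopies,
			  size_t blocksize, uint32_t expected)
{
	size_t i;

	assert(copies != NULL || ncopies == 0);

	for (i = 0; i < ncopies; i++)
		if (crc32c(copies[i], blocksize) == expected)
			return (int)i;
	return -1;
}
```

The filesystem already knows the expected checksum, so the only missing
piece is a way to ask the device for the other copies.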


Re: [GIT PULL v2] Update LZO compression

2012-08-22 Thread Johannes Stezenbach
On Tue, Aug 21, 2012 at 05:21:50PM +0200, Markus F.X.J. Oberhumer wrote:
> as suggested on the mailing list I have converted the updated LZO
> code into git, so please pull my "lzo-update" branch from
...
> [ Changes in v2: Optimize code for CPUs with inefficient unaligned
>   access => significant speed increase on ARM ]

I can confirm that this new code runs at the same speed
as the current lzo code in the Linux kernel on
my ARM926EJ-S based platform.  I only tested decompression,
using the attached hacky userspace code.

   # time ./lzo-bench/old/unlzop /dev/null
   real    0m 0.29s
   # time ./lzo-bench/new/unlzop /dev/null
   real    0m 0.29s

   (where lzoimage is a Linux Image compressed with lzop)

So, from my side there are no more objections.
Thanks for doing this work, Markus.


Johannes


lzo-bench.tar.gz
Description: Binary data


[PATCH] Btrfs: fix a dio write regression

2012-08-22 Thread bo . li . liu
From: Liu Bo 

This bug is introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9
(Btrfs: lock extents as we map them in DIO).

In a dio write, if we fall back to a buffered write, we should unlock the
section that we didn't do IO on.  But unlocking the section is not enough: we
also need to clean up the space reserved for it.

This bug was found while running xfstests 133; with this patch, 133 no longer
complains.

Signed-off-by: Liu Bo 
---
 fs/btrfs/inode.c |   22 ++
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7131fac..e4ab92b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5993,10 +5993,24 @@ unlock:
 	 * in the case of read we need to unlock only the end area that we
 	 * aren't using if there is any left over space.
 	 */
-	if (lockstart < lockend)
-		clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend,
-				 unlock_bits, 1, 0, &cached_state, GFP_NOFS);
-	else
+	if (lockstart < lockend) {
+		if (create && len < lockend - lockstart) {
+			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
+					 lockstart + len - 1, unlock_bits, 1, 0,
+					 &cached_state, GFP_NOFS);
+			/*
+			 * Beside unlock, we also need to cleanup reserved space
+			 * for the left range by attaching EXTENT_DO_ACCOUNTING.
+			 */
+			clear_extent_bit(&BTRFS_I(inode)->io_tree,
+					 lockstart + len, lockend, unlock_bits |
+					 EXTENT_DO_ACCOUNTING, 1, 0, NULL,
+					 GFP_NOFS);
+		} else
+			clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
+					 lockend, unlock_bits, 1, 0,
+					 &cached_state, GFP_NOFS);
+	} else
 		free_extent_state(cached_state);
 
 	free_extent_map(em);
-- 
1.7.7.6
