Re: btrfs-progs: initial reference count of extent buffer is correct?

2014-08-25 Thread Liu Bo
On Mon, Aug 25, 2014 at 02:26:49PM +0900, Naohiro Aota wrote:
 Hi, list
 
 I've been having trouble with my btrfs FS recently and ran btrfs check to
 try to fix it. Unfortunately, it aborted with:
 
 btrfsck: root-tree.c:81: btrfs_update_root: Assertion `!(ret != 0)' failed.
 
 It means that the extent tree root is not found in the tree root tree! So
 I added btrfs_print_leaf() there to see what was happening. There were
 (... METADATA_ITEM 0) keys listed. It turned out that the tree root
 tree's root extent buffer had somehow been replaced by an extent buffer
 from the extent tree.
 
 Reading the code, it seems that free_some_buffers() reclaims extent
 buffers allocated to root trees because they are not
 extent_buffer_get()ed (i.e. @refs == 1).
 
 To reproduce this problem, try running this code. The program first
 prints the root tree node's bytenr, then scans some trees. If your FS is
 large enough to trigger free_some_buffers(), the tree root node's bytenr
 after the scan will be different.
 
 #include <stdio.h>
 #include "ctree.h"
 #include "disk-io.h"
 
 /* Recursively read every child node below @eb, populating the eb cache. */
 void scan_tree(struct btrfs_root *root, struct extent_buffer *eb)
 {
   u32 i;
   u32 nr = btrfs_header_nritems(eb);
 
   if (btrfs_is_leaf(eb))
     return;
   u32 size = btrfs_level_size(root, btrfs_header_level(eb) - 1);
   for (i = 0; i < nr; i++) {
     u64 bytenr = btrfs_node_blockptr(eb, i);
     struct extent_buffer *next = read_tree_block(root, bytenr, size,
                                    btrfs_node_ptr_generation(eb, i));
     if (!next)
       continue;
     scan_tree(root, next);
   }
 }
 
 int main(int ac, char **av)
 {
   struct btrfs_fs_info *info;
   struct btrfs_root *root;
 
   if (ac < 2) {
     fprintf(stderr, "usage: %s <device>\n", av[0]);
     return 1;
   }
   info = open_ctree_fs_info(av[1], 0, 0, OPEN_CTREE_PARTIAL);
   if (!info)
     return 1;
   root = info->fs_root;
   printf("tree root %llu\n", (unsigned long long)info->tree_root->node->start);
   scan_tree(info->fs_root, info->extent_root->node);
   scan_tree(info->fs_root, info->csum_root->node);
   scan_tree(info->fs_root, info->fs_root->node);
   /* Differs from the first value if the root's eb was reclaimed. */
   printf("tree root %llu\n", (unsigned long long)info->tree_root->node->start);
   return close_ctree(root);
 }
 
 In my environment, the above code prints the following. The tree root
 tree variable ends up pointing to another extent!
 
 $ ./btrfs-reproduce /dev/sda3
 tree root 91393835008
 tree root 49102848
 
 I found that commit 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407 changed the
 initial @refs to 1, stating that we don't call free_extent_buffer() enough
 times to reduce the eb's references to zero so that the eb can finally be
 freed, but I don't think this is correct. Even if the initial @refs == 2,
 one free_extent_buffer() would bring @refs down to 1 and so let
 free_some_buffers() reclaim it, so it does not seem to be a problem to
 me...
 
 I think there is a conflict over how extent buffers are meant to be used:
 should __alloc_extent_buffer() set @refs = 2 for the caller, or should the
 code that allocates an eb call extent_buffer_get() itself, before any
 other eb allocation, so that the first eb is not reclaimed? How should
 this be fixed? Is reverting 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407 the
 correct way, or should the missing extent_buffer_get() calls be added
 everywhere an allocation is done?
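The failure mode described above can be reduced to a toy model. This is a minimal sketch under stated assumptions, not btrfs-progs code: `toy_pool`, `toy_read_block()` and `toy_get()` are hypothetical names, a two-slot pool stands in for the cache pressure that triggers free_some_buffers(), and reclaim follows only the rule quoted in the message ("refs == 1 means reclaimable"). It shows that a caller holding a refs == 1 buffer can see it silently recycled for another bytenr, and that taking an extra reference first prevents this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; names and pool size are illustrative only. */
#define TOY_POOL_SIZE 2

struct toy_eb {
	uint64_t start; /* disk bytenr the buffer currently holds */
	int refs;       /* 0 means the slot is free */
};

static struct toy_eb toy_pool[TOY_POOL_SIZE];

static void toy_reset(void)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++)
		toy_pool[i] = (struct toy_eb){ 0, 0 };
}

/* Take an extra reference, in the spirit of extent_buffer_get(). */
static void toy_get(struct toy_eb *eb)
{
	assert(eb->refs > 0);
	eb->refs++;
}

/* Hand out a buffer for @bytenr with refs == 1; when the pool is full,
 * recycle a slot whose refs == 1 as if nobody were using it. */
static struct toy_eb *toy_read_block(uint64_t bytenr)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		if (toy_pool[i].refs == 0) {
			toy_pool[i].start = bytenr;
			toy_pool[i].refs = 1;
			return &toy_pool[i];
		}
	}
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		if (toy_pool[i].refs == 1) {
			/* Any pointer already handed out for this slot now
			 * silently refers to @bytenr instead. */
			toy_pool[i].start = bytenr;
			return &toy_pool[i];
		}
	}
	return NULL; /* every slot is pinned */
}

/* Read a "root" node, optionally pin it, read two more blocks, and
 * report which bytenr the root pointer refers to afterwards. */
static uint64_t toy_scenario(int pin_root)
{
	toy_reset();
	struct toy_eb *root = toy_read_block(1000);
	if (pin_root)
		toy_get(root);
	toy_read_block(2000);
	toy_read_block(3000);
	return root->start;
}
```

Without pinning, `toy_scenario(0)` returns 3000 (the root pointer was recycled); with pinning, `toy_scenario(1)` still returns 1000.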

You may want to think about it again: commit 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407
fixes a bug where a free block was assigned to two different extent buffers, i.e.
two different extent buffers shared the same eb->start, so it's not just about
bumping a reference count.

Right now we want to be consistent with the kernel side, where decreasing
eb->refs to 0 means the eb is freed, so dropping free_some_buffers() can be a
good choice.

And for caching extent buffers, we've increased eb->refs by 1 to keep them in
the cache rbtree.
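The reference rule described here (the cache holds its own reference, and a buffer is freed only when refs reaches 0) can be sketched as follows. This is a hypothetical toy, not btrfs-progs code: `toy_alloc_cached()` and `toy_put()` are illustrative names, and the `freed` flag stands in for actually releasing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy: one reference for the caller, one for the cache
 * rbtree; the buffer is released only when refs drops to 0. */
struct toy_eb {
	uint64_t start;
	int refs;
	bool freed;
};

/* Allocate and insert into the cache: the caller's reference plus the
 * cache's own reference. */
static void toy_alloc_cached(struct toy_eb *eb, uint64_t start)
{
	eb->start = start;
	eb->refs = 2;
	eb->freed = false;
}

/* Drop one reference, in the spirit of free_extent_buffer(). */
static void toy_put(struct toy_eb *eb)
{
	assert(eb->refs > 0 && !eb->freed);
	if (--eb->refs == 0)
		eb->freed = true; /* a real implementation would free here */
}
```

Under this rule, the caller dropping its reference leaves the buffer cached (refs == 1); only the cache dropping its own reference frees it, so no separate eviction pass has to guess which buffers are still in use.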

thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: improve free space cache management and space allocation

2014-08-25 Thread Filipe Manana
While under random IO, a block group's free space cache eventually reaches
a state where it has a mix of extent entries and bitmap entries representing
free space regions.

As free space regions are later returned to the cache, some of them are merged
with existing extent entries if they are contiguous with them. Others are not
merged, even though adjacent free space regions exist in the cache, because
those existing regions are represented by bitmap entries. Even when new free
space regions are merged with existing extent entries (enlarging the free space
range they represent), the enlarged region may end up contiguous with some
other region represented in a bitmap entry.

Both clustered and non-clustered space allocation work by iterating over our
extent and bitmap entries and skipping any that represent a region smaller
than the allocation request (giving preference to extent entries over bitmap
entries). When a contiguous free space region is represented by 2 (or more)
entries (a mix of extent and bitmap entries), we fail to satisfy an allocation
request whose size is larger than any single entry but no larger than the sum
of their sizes. This makes the caller assume we're in an ENOSPC condition, or
forces it to allocate multiple smaller space regions (as we do for file data
writes), which adds extra overhead and increases the chance of fragmentation,
since the smaller regions may be spread apart from each other (more likely
under concurrency).

For example, if we have the following in the cache:

* extent entry representing free space range: [128Mb - 256Kb, 128Mb[

* bitmap entry covering the range [128Mb, 256Mb[, but only with the bits
  representing the range [128Mb, 128Mb + 768Kb[ set - that is, only that
  space in this 128Mb area is marked as free

An allocation request for 1Mb, starting at an offset not greater than
128Mb - 256Kb, would fail before, despite the existence of such a contiguous
free space area in the cache. The caller could only allocate up to 768Kb of
space at once and later another 256Kb (or vice versa). In between these
smaller allocation requests, another task working on a different file/inode
might come in and take that space, preventing the former task from getting a
contiguous 1Mb region of free space.
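The example above can be checked with a small model. This is a sketch under stated assumptions, not the kernel's free space cache: `struct toy_extent`, `struct toy_bitmap` and the helpers are hypothetical, Kb/Mb are read as KiB/MiB, and a 4 KiB bitmap unit is assumed. It shows that no single entry can satisfy a 1Mb request, while the extent entry plus the contiguous run of set bits that follows it can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)
#define UNIT (4 * KIB)     /* assumed bitmap granularity */
#define BITMAP_BITS 32768  /* 128 MiB / 4 KiB */

/* Hypothetical toy types, not the kernel's btrfs_free_space. */
struct toy_extent { uint64_t offset, bytes; };

struct toy_bitmap {
	uint64_t offset;         /* start of the range the bitmap covers */
	bool bit[BITMAP_BITS];   /* true = this UNIT is free */
};

/* Length in bytes of the free run starting exactly at @start. */
static uint64_t bitmap_run_at(const struct toy_bitmap *bm, uint64_t start)
{
	if (start < bm->offset)
		return 0;
	uint64_t i = (start - bm->offset) / UNIT;
	uint64_t n = 0;
	while (i + n < BITMAP_BITS && bm->bit[i + n])
		n++;
	return n * UNIT;
}

/* Longest free run anywhere in the bitmap, in bytes. */
static uint64_t bitmap_largest_run(const struct toy_bitmap *bm)
{
	uint64_t best = 0, cur = 0;
	for (uint64_t i = 0; i < BITMAP_BITS; i++) {
		cur = bm->bit[i] ? cur + 1 : 0;
		if (cur > best)
			best = cur;
	}
	return best * UNIT;
}

/* Old behaviour: each entry is judged on its own. */
static bool fits_in_single_entry(const struct toy_extent *ext,
				 const struct toy_bitmap *bm, uint64_t request)
{
	return ext->bytes >= request || bitmap_largest_run(bm) >= request;
}

/* With merging: the extent plus the free run right after its end. */
static bool fits_when_merged(const struct toy_extent *ext,
			     const struct toy_bitmap *bm, uint64_t request)
{
	return ext->bytes + bitmap_run_at(bm, ext->offset + ext->bytes) >= request;
}

/* The layout from the example: [128M - 256K, 128M[ as an extent entry,
 * and only [128M, 128M + 768K[ marked free in the bitmap. */
static void toy_setup(struct toy_extent *ext, struct toy_bitmap *bm)
{
	ext->offset = 128 * MIB - 256 * KIB;
	ext->bytes = 256 * KIB;
	bm->offset = 128 * MIB;
	for (uint64_t i = 0; i < BITMAP_BITS; i++)
		bm->bit[i] = i < (768 * KIB) / UNIT;
}
```

With this setup, a 1Mb request fails entry by entry (256Kb and 768Kb) but fits once the two are treated as one 1Mb region.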

Therefore this change implements the ability to move free space from bitmap
entries into existing and new free space regions represented with extent
entries. This is done when a space region is added to the cache.
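Moving free space out of a bitmap into an extent entry can be sketched on a 64-unit toy bitmap. This is a simplified illustration loosely modelled on the idea behind steal_from_bitmap_to_end() in the patch below, not its actual code: `struct toy_bitmap`, `steal_to_end()` and the single-word bitmap are hypothetical. The run of set bits beginning exactly at the extent's end is cleared from the bitmap and added to the extent's length.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BITS 64

/* Hypothetical toy types. */
struct toy_extent { uint64_t offset, bytes; };
struct toy_bitmap {
	uint64_t offset; /* start of the covered range */
	uint64_t unit;   /* bytes per bit */
	uint64_t bytes;  /* free bytes currently marked in the bitmap */
	uint64_t bits;   /* bit k set = unit k is free */
};

/* Move the free run that starts at the extent's end from the bitmap
 * into the extent entry; returns the number of bytes moved. */
static uint64_t steal_to_end(struct toy_extent *info, struct toy_bitmap *bm)
{
	const uint64_t end = info->offset + info->bytes;

	if (end < bm->offset || end >= bm->offset + TOY_BITS * bm->unit)
		return 0;
	unsigned i = (unsigned)((end - bm->offset) / bm->unit);
	unsigned j = i;
	while (j < TOY_BITS && ((bm->bits >> j) & 1))
		j++;
	if (j == i)
		return 0; /* the unit right after the extent is not free */

	uint64_t moved = (uint64_t)(j - i) * bm->unit;
	uint64_t mask = (j - i == TOY_BITS) ? ~0ULL
					    : (((1ULL << (j - i)) - 1) << i);
	bm->bits &= ~mask; /* clear bits [i, j) */
	bm->bytes -= moved;
	info->bytes += moved;
	return moved;
}
```

For an extent [0, 8192) and a bitmap at 8192 with units 0..2 free, the call moves 12288 bytes into the extent; a second call finds no free unit at the new end and moves nothing.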

A test was added to the sanity tests that also explains the issue in detail.

Some performance test results with compilebench on a 4-core machine with
32Gb of RAM and an HDD follow.

Test: compilebench -D /mnt -i 30 -r 1000 --makej

Before this change:

   intial create total runs 30 avg 69.02 MB/s (user 0.28s sys 0.57s)
   compile total runs 30 avg 314.96 MB/s (user 0.12s sys 0.25s)
   read compiled tree total runs 3 avg 27.14 MB/s (user 1.52s sys 0.90s)
   delete compiled tree total runs 30 avg 3.14 seconds (user 0.15s sys 0.66s)

After this change:

   intial create total runs 30 avg 68.37 MB/s (user 0.29s sys 0.55s)
   compile total runs 30 avg 382.83 MB/s (user 0.12s sys 0.24s)
   read compiled tree total runs 3 avg 27.82 MB/s (user 1.45s sys 0.97s)
   delete compiled tree total runs 30 avg 3.18 seconds (user 0.17s sys 0.65s)

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/free-space-cache.c   | 149 ++-
 fs/btrfs/tests/free-space-tests.c | 514 ++
 2 files changed, 662 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 2f0fe10..23632ba 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1951,6 +1951,137 @@ out:
return ret;
 }
 
+static void steal_from_bitmap_to_end(struct btrfs_free_space_ctl *ctl,
+struct btrfs_free_space *info,
+bool update_stat)
+{
+   struct btrfs_free_space *bitmap;
+   u64 bitmap_offset = info->offset;
+   unsigned long i;
+   unsigned long j;
+   const u64 end = info->offset + info->bytes;
+   u64 bytes;
+
+again:
+   bitmap = tree_search_offset(ctl, offset_to_bitmap(ctl, bitmap_offset),
+   1, 0);
+   if (!bitmap)
+   goto out;
+
+   if (end < bitmap->offset || (bitmap->offset + bitmap->bytes < end))
+   return;
+
+   i = offset_to_bit(bitmap->offset, ctl->unit, end);
+   j = find_next_zero_bit(bitmap->bitmap, BITS_PER_BITMAP, i);
+   if (j == i)
+   return;
+   bytes = (j - i) * ctl->unit;
+   info->bytes += bytes;
+
+   if (update_stat)
+   bitmap_clear_bits(ctl, bitmap, end, bytes);
+   else
+   __bitmap_clear_bits(ctl, bitmap, end, bytes);
+
+   

[PATCH] Btrfs: fix corruption after write/fsync failure + fsync + log recovery

2014-08-25 Thread Filipe Manana
While writing to a file, in inode.c:cow_file_range() (and same applies to
submit_compressed_extents()), after reserving an extent for the file data,
we create a new extent map for the written range and insert it into the
extent map cache. After that, we create an ordered operation, but if it
fails (due to a transient/temporary -ENOMEM), we return without dropping
that extent map, which points to a reserved extent that is freed when we
return. A subsequent incremental fsync (when the btrfs inode doesn't have
the flag BTRFS_INODE_NEEDS_FULL_SYNC) considers this extent map valid and
logs a file extent item based on that extent map, which points to a disk
extent that doesn't contain valid data - it was freed by us earlier, at this
point it might contain any random/garbage data.

Therefore, if we reach an error condition when cowing a file range after
we added the new extent map to the cache, drop it from the cache before
returning.
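The error-path rule described above (once the extent map is in the cache, a later failure must remove it before returning) can be sketched with a toy cache. This is a hypothetical illustration, not the kernel's extent_map API: `struct toy_map_cache`, `cache_add()`, `cache_drop_range()` and `toy_cow_range()` are invented names, and the `ordered_fails` flag stands in for btrfs_add_ordered_extent() returning -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types, not the kernel's extent_map API. */
#define MAX_MAPS 8
struct toy_map { uint64_t file_off, disk_bytenr, len; };
struct toy_map_cache { struct toy_map maps[MAX_MAPS]; size_t n; };

static int cache_add(struct toy_map_cache *c, struct toy_map m)
{
	if (c->n == MAX_MAPS)
		return -ENOMEM;
	c->maps[c->n++] = m;
	return 0;
}

/* Remove every map overlapping [off, off + len). */
static void cache_drop_range(struct toy_map_cache *c, uint64_t off, uint64_t len)
{
	size_t w = 0;
	for (size_t r = 0; r < c->n; r++) {
		const struct toy_map *m = &c->maps[r];
		bool overlaps = m->file_off < off + len && off < m->file_off + m->len;
		if (!overlaps)
			c->maps[w++] = *m;
	}
	c->n = w;
}

static const struct toy_map *cache_lookup(const struct toy_map_cache *c,
					  uint64_t off)
{
	for (size_t i = 0; i < c->n; i++)
		if (c->maps[i].file_off <= off &&
		    off < c->maps[i].file_off + c->maps[i].len)
			return &c->maps[i];
	return NULL;
}

/* Insert the map, then try the next step; on failure, drop the map so a
 * later fsync cannot log it against an extent that is about to be freed. */
static int toy_cow_range(struct toy_map_cache *c, uint64_t off, uint64_t len,
			 uint64_t reserved_bytenr, bool ordered_fails)
{
	int ret = cache_add(c, (struct toy_map){ off, reserved_bytenr, len });
	if (ret)
		return ret;
	if (ordered_fails) {
		cache_drop_range(c, off, len);
		return -ENOMEM;
	}
	return 0;
}
```

After a failed call nothing stale remains in the cache for that range, which is the property the incremental fsync relies on.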

A sequence of steps that leads to this:

$ mkfs.btrfs -f /dev/sdd
$ mount -o commit= /dev/sdd /mnt
$ cd /mnt

$ xfs_io -f -c "pwrite -S 0x01 -b 4096 0 4096" -c "fsync" foo
$ xfs_io -c "pwrite -S 0x02 -b 4096 4096 4096" foo
$ sync

$ od -t x1 foo
0000000 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
*
0010000 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
*
0020000

$ xfs_io -c "pwrite -S 0xa1 -b 4096 0 4096" foo

# Now this write + fsync fails with -ENOMEM, which was returned by
# btrfs_add_ordered_extent() in inode.c:cow_file_range().
$ xfs_io -c "pwrite -S 0xff -b 4096 4096 4096" foo
$ xfs_io -c "fsync" foo
fsync: Cannot allocate memory

# Now do a new write + fsync, which will succeed. Our previous
# -ENOMEM was a transient/temporary error.
$ xfs_io -c "pwrite -S 0xee -b 4096 16384 4096" foo
$ xfs_io -c "fsync" foo

# Our file content (in page cache) is now:
$ od -t x1 foo
0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1
*
0010000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0020000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0040000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
*
0050000

# Now reboot the machine, and mount the fs, so that fsync log replay
# takes place.

# The file content is now weird, in particular the first 8Kb, which
# match neither our data before nor after the sync command above.
$ od -t x1 foo
0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
*
0010000 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
*
0020000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0040000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
*
0050000

# In fact these first 4Kb are a duplicate of the last 4kb block.
# The last write got an extent map/file extent item that points to
# the same disk extent that we got in the write+fsync that failed
# with the -ENOMEM error. btrfs-debug-tree and btrfsck allow us to
# verify that:

$ btrfs-debug-tree /dev/sdd
(...)
item 6 key (257 EXTENT_DATA 0) itemoff 15819 itemsize 53
extent data disk byte 12582912 nr 8192
extent data offset 0 nr 8192 ram 8192
item 7 key (257 EXTENT_DATA 8192) itemoff 15766 itemsize 53
extent data disk byte 0 nr 0
extent data offset 0 nr 8192 ram 8192
item 8 key (257 EXTENT_DATA 16384) itemoff 15713 itemsize 53
extent data disk byte 12582912 nr 4096
extent data offset 0 nr 4096 ram 4096

$ umount /dev/sdd
$ btrfsck /dev/sdd
Checking filesystem on /dev/sdd
UUID: db5e60e1-050d-41e6-8c7f-3d742dea5d8f
checking extents
extent item 12582912 has multiple extent items
ref mismatch on [12582912 4096] extent item 1, found 2
Backref bytes do not match extent backref, bytenr=12582912, ref bytes=4096, backref bytes=8192
backpointer mismatch on [12582912 4096]
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
root 5 inode 257 errors 1000, some csum missing
found 131074 bytes used err is 1
total csum bytes: 4
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 123404
file data blocks allocated: 274432
 referenced 274432
Btrfs v3.14.1-96-gcc7fd5a-dirty
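The btrfs-debug-tree output above shows items 6 and 8 both pointing at disk byte 12582912, one with nr 8192 and one with nr 4096. A simplified check for that kind of inconsistency can be sketched as follows; it is a hypothetical stand-in that only compares the fields printed above, not btrfsck's actual backref logic, and `struct toy_file_extent` and `count_length_conflicts()` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One EXTENT_DATA item, reduced to fields shown by btrfs-debug-tree. */
struct toy_file_extent {
	uint64_t file_offset;    /* key offset, e.g. (257 EXTENT_DATA 16384) */
	uint64_t disk_bytenr;    /* "disk byte"; 0 means a hole */
	uint64_t disk_num_bytes; /* "nr" after "disk byte" */
};

/* Count pairs of items that reference the same disk extent start but
 * disagree on its length. Holes (disk_bytenr == 0) are ignored. */
static size_t count_length_conflicts(const struct toy_file_extent *items,
				     size_t n)
{
	size_t conflicts = 0;
	for (size_t a = 0; a < n; a++) {
		if (items[a].disk_bytenr == 0)
			continue;
		for (size_t b = a + 1; b < n; b++) {
			if (items[b].disk_bytenr == items[a].disk_bytenr &&
			    items[b].disk_num_bytes != items[a].disk_num_bytes)
				conflicts++;
		}
	}
	return conflicts;
}
```

Fed the three items listed above (offsets 0, 8192 and 16384), it reports one conflict, matching the mismatch btrfsck complains about on [12582912 4096].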

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/inode.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c678dea..16e8146 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -792,8 +792,12 @@ retry:
ins.offset,
BTRFS_ORDERED_COMPRESSED,
async_extent->compress_type);
-   if (ret)
+   if (ret) {

Re: btrfs restore memory corruption (bug: 82701)

2014-08-25 Thread Gui Hecheng
On Mon, 2014-08-25 at 10:58 +0200, Marc Dietrich wrote:
 Am Freitag 22 August 2014, 10:42:18 schrieb Marc Dietrich:
  Am Freitag, 22. August 2014, 14:43:45 schrieb Gui Hecheng:
   On Thu, 2014-08-21 at 16:19 +0200, Marc Dietrich wrote:
Am Donnerstag, 21. August 2014, 17:52:16 schrieb Gui Hecheng:
 On Mon, 2014-08-18 at 11:25 +0200, Marc Dietrich wrote:
  Hi,
  
  I did a checkout of the latest btrfs-progs to repair my damaged
  filesystem. Running btrfs restore gives me several "failed to inflate: -6"
  errors and then crashes with some memory corruption. I ran it again with
  valgrind and got:
  
  valgrind --log-file=x2 -v --leak-check=yes btrfs restore /dev/sda9 /mnt/backup
  
  ==8528== Memcheck, a memory error detector
  ==8528== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
  ==8528== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
  ==8528== Command: btrfs restore /dev/sda9 /mnt/backup
  ==8528== Parent PID: 8453
  ==8528==
  ==8528== Syscall param pwrite64(buf) points to uninitialised byte(s)
  ==8528==    at 0x59BE3C3: __pwrite_nocancel (in /lib64/libpthread-2.18.so)
  ==8528==    by 0x41F22F: search_dir (cmds-restore.c:392)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x4204B8: cmd_restore (cmds-restore.c:1284)
  ==8528==    by 0x4043FE: main (btrfs.c:286)
  ==8528==  Address 0x66956a0 is 7,056 bytes inside a block of size 8,192 alloc'd
  ==8528==    at 0x4C277AB: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
  ==8528==    by 0x41EEAD: search_dir (cmds-restore.c:316)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x41F8D0: search_dir (cmds-restore.c:895)
  ==8528==    by 0x4204B8: cmd_restore (cmds-restore.c:1284)
  ==8528==    by 0x4043FE: main (btrfs.c:286)
 
 ---[snip]-
  leaks ...
 --
   
   For the leak below...
   I've no idea why @decompress_lzo() is not satisfied with an @inbuf
   of exactly the size of the disk bytes.
   Or maybe the compressed data had just suffered damage...
   
   BTW, when you wrote your data, did that kernel have the following commit
   for btrfs?
   
 commit: 59516f6017c589e7316418fda6128ba8f829a77f
  
  mmh, I used the master branch which is still on 3.14.2 (from k.org).
  
  Ah, there is a development branch on another repo (repo.or.cz). Why oh why?
 
 Gui,
 
 sorry to quote an earlier mail, I forgot to add you as CC on your latest post
 and I'm not subscribed to the list.
 
  There is a development branch for btrfs-progs from david:
  http://github.com/kdave/btrfs-progs.git if you would like to try.
 
 ok, thanks will try.
 
  But here, what I mean is your *kernel* version when you wrote your data.
 
 I'm using btrfs since 3.14 or so (and maybe also some random distro kernel 
 based on 3.11). The partition contained a lot of larger git trees and virtual 
 machines - yes, not ideal for btrfs but a nice testcase ...
 
  There is a change for btrfs-restore which depends on a kernel commit.
  If you wrote your data with an older kernel and use the 3.14.2
  btrfs-progs to restore, then strange things may happen.
 
 wow. That should never happen, I think. Userspace should always be able to
 fix corruptions made by earlier kernels (except disk layout changes maybe).
 
  Now, I am just suspecting such a scenario.
 
 Possible. So how to proceed? If I check out the latest btrfs-progs from the
 repo above and restore again, are you still interested in the results?

Ah, I think you could clone the progs from the repo and apply the two
small pieces that I mentioned before.
Yes, I am still trying to follow the issues with restore. It seems
btrfs-restore needs more effort from btrfs developers since it doesn't
survive tough scenarios.

 It seems there are lots of people reporting corruptions on the list and also
 lots of fixes posted. Maybe it's better to start afresh (format the
 partition) and report problems that happen after that. What do you think?

Oh, I think you've just found a really good case for btrfs-restore.
Maybe you could keep an image of it, just like Zooko did here:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36701.html

Thanks,
-Gui

 Marc
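The valgrind report quoted in this thread (pwrite64 passed a buffer containing uninitialised bytes from a malloc'd block) is the classic symptom of writing out more of a buffer than was actually filled. The thread does not establish the cause in cmds-restore.c, so the following is only a hypothetical sketch of that general pattern: `toy_decompress()` and `write_decompressed()` are invented names, and the remedy shown (write only the bytes produced) is one common approach, not the btrfs-progs fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for a decompressor that fills only @produced
 * bytes of a @cap-byte output buffer. */
static size_t toy_decompress(unsigned char *out, size_t cap, size_t produced)
{
	assert(produced <= cap);
	memset(out, 0xab, produced);
	return produced;
}

/* Write only the bytes the decompressor produced; returns the pwrite()
 * result, or -1 if allocation fails. Passing @cap instead of @n would
 * hand uninitialised bytes to pwrite(), which is what Memcheck flags. */
static ssize_t write_decompressed(int fd, off_t pos, size_t cap, size_t produced)
{
	unsigned char *buf = malloc(cap); /* contents start uninitialised */
	if (!buf)
		return -1;
	size_t n = toy_decompress(buf, cap, produced);
	ssize_t ret = pwrite(fd, buf, n, pos);
	free(buf);
	return ret;
}
```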



Re: superblock checksum mismatch after crash, cannot mount

2014-08-25 Thread Austin S Hemmelgarn
On 2014-08-24 15:48, Chris Murphy wrote:
 
 On Aug 24, 2014, at 10:59 AM, Flash ROM flashromg...@yandex.com wrote:
 While it sounds dumb, this strange thing is done to put the partition table
 in a separate erase block, so it is never read-modify-written when FAT entries
 are updated. Should something go wrong, FAT can recover from the backup copy.
 But an erased partition table just sucks. Then, the FAT tables are aligned to
 fit well around erase block boundaries.
 
 I think you seriously overestimate camera manufacturers' knowledge of the
 details of flash storage, their ability to discover them, and the flash
 manufacturers' willingness to reveal such underlying details. The whole point
 of these cards is to completely abstract the reality of the underlying
 hardware from the application layer, in this case the camera or mobile
 device using it.
 
If you really know what you are doing, it is possible to determine the erase
block size by looking at device performance timings, with surprisingly
high accuracy (assuming you aren't trying to have software do it for
you).  I've actually done this before on several occasions, with nearly
100% success.
 Also, with SDXC, exFAT is now specified. And it has only one FAT; there isn't
 a backup FAT. So they're even more difficult to recover data from should
 things go awry filesystem-wise.
 
It's too bad that TFAT didn't catch on, as it would have been great for
SD cards if it could be configured to put each FAT on a different erase
block.
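Putting each FAT on its own erase block is a round-up calculation. This is a minimal sketch assuming the erase block size is already known (for example from the timing measurements mentioned above); `align_up()` and `place_fats()` are hypothetical helpers, and the 4 MiB erase block and 1 MiB FAT size in the usage below are illustrative values, not figures from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Round @off up to the next multiple of @erase. */
static uint64_t align_up(uint64_t off, uint64_t erase)
{
	assert(erase != 0);
	return (off + erase - 1) / erase * erase;
}

/* Place @nfats copies of a @fat_bytes FAT, each starting on a fresh
 * erase block at or after @first. Fills @starts and returns the offset
 * just past the last copy. */
static uint64_t place_fats(uint64_t first, uint64_t fat_bytes, uint64_t erase,
			   unsigned nfats, uint64_t *starts)
{
	uint64_t pos = first;
	for (unsigned i = 0; i < nfats; i++) {
		starts[i] = align_up(pos, erase);
		pos = starts[i] + fat_bytes;
	}
	return pos;
}
```

With a 4 MiB erase block, two 1 MiB FATs placed after offset 1 MiB land at 4 MiB and 8 MiB, so an interrupted update to one copy cannot touch the erase block holding the other.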
 
 That said, you can *try* to reformat, BUT no standard OS or firmware
 formatter will help you with default settings. They can't know the geometry
 of the underlying NAND and the controller properties. There is no standard,
 widely accepted way to get such information from the card. No matter if you
 use an OS formatter, a camera formatter or whatever, YOU WILL RUIN the factory
 format (which is crafted in the best possible way) and replace it with
 another, very likely suboptimal one.
 
 It's recommended by the card manufacturers to reformat it in each camera it's
 inserted into. It's the only recommended way to erase the SD card for
 re-use; they don't recommend selectively deleting images. And it's known that
 one camera's partition table and formatting can irritate another camera
 make/model if the card isn't reformatted by that camera.
 
It's not just cameras that have this issue, a lot of other hardware
makes stupid assumptions about the format of media.  The first firmware
release for the Nintendo Wii, for example, choked if you tried to use an
SD card with more than one partition on it, and old desktop versions of
Windows won't ever show you anything other than the first partition on
an SD card (or most USB storage devices for that matter).






Most recent stable enough btrfs-tools?

2014-08-25 Thread Martin Steigerwald
Hello!

I am a bit confused about btrfs-progs git repo URLs and branches.

What is the latest stuff that's still supposed to work okay?

My /home BTRFS RAID 1 filesystem on two SSDs has an error in btrfs check that
btrfs-tools 3.14.1 cannot repair.

The repo I found on git.kernel.org

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

seems to be stuck at 3.14.2.

But well, now I see there is an integration branch, but it's also just at:

commit 7b050795a01acb2bec0db84991b4bc9c8680e275
Author: Chris Mason c...@fb.com
Date:   Wed May 28 17:01:39 2014 -0400

scrub: fix uninit return variable in scrub_progress_cycle

Signed-off-by: Chris Mason c...@fb.com


Before posting details on this I would like to make sure I've tried the most
recent stuff.

What version do you recommend trying? Kernel-wise I am on 3.16.1 plus two BTRFS
hang / corruption fix patches from this mailing list. But I intend to switch to
3.17-rc2 once it is out.

Thanks, 
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-25 Thread Chris Mason
On 08/15/2014 11:36 AM, Liu Bo wrote:
 This has been reported and discussed for a long time, and this hang occurs in
 both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the
weekend.  It's in my for-linus branch now, along with everything else I
plan on sending for rc3.

Please double check my merge, I had to undo your rebase onto Miao's patches.

-chris


Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write

2014-08-25 Thread Liu Bo
On Mon, Aug 25, 2014 at 10:58:13AM -0400, Chris Mason wrote:
 On 08/15/2014 11:36 AM, Liu Bo wrote:
  This has been reported and discussed for a long time, and this hang occurs in
  both 3.15 and 3.16.
 
 [ great description ]
 
 I ran this through tests last week, and an overnight test over the
 weekend.  It's in my for-linus branch now, along with everything else I
 plan on sending for rc3.
 
 Please double check my merge, I had to undo your rebase onto Miao's patches.

Just checked, looks good ;)

thanks,
-liubo


Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.

2014-08-25 Thread David Sterba
On Wed, Aug 20, 2014 at 10:34:53AM +0800, Qu Wenruo wrote:
 Although, as mentioned in the reply to David,
 the main problem is that I found two disk images with crazy values in the
 superblock and a wrong csum,
 but the generation is still 4, and ignoring the csum error caused a kernel BUG.

Can you please share the dump of the broken superblock
(btrfs-show-super)?  Thanks.
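The policy change in the patch title can be sketched as follows. This is a hedged model, not the kernel's disk-io.c code: it assumes the superblock checksum is a standard CRC-32C over bytes [32, 4096) stored little-endian in the first bytes of the block, and the helper names (`crc32c()`, `super_set_csum()`, `super_csum_ok()`, `may_mount()`) are hypothetical. `may_mount()` contrasts the tolerant behaviour (ignore a mismatch when generation < 10) with the strict one the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SB_CSUM_SIZE 32   /* checksum area at the start of the block */
#define SB_INFO_SIZE 4096 /* checksum covers [SB_CSUM_SIZE, SB_INFO_SIZE) */

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static void super_set_csum(uint8_t *sb)
{
	uint32_t c = crc32c(sb + SB_CSUM_SIZE, SB_INFO_SIZE - SB_CSUM_SIZE);
	for (int k = 0; k < 4; k++)
		sb[k] = (uint8_t)(c >> (8 * k)); /* little-endian */
}

static bool super_csum_ok(const uint8_t *sb)
{
	uint32_t want = crc32c(sb + SB_CSUM_SIZE, SB_INFO_SIZE - SB_CSUM_SIZE);
	uint32_t have = (uint32_t)sb[0] | (uint32_t)sb[1] << 8 |
			(uint32_t)sb[2] << 16 | (uint32_t)sb[3] << 24;
	return want == have;
}

/* Tolerant policy: a mismatch is ignored when generation < 10.
 * Strict policy (what the patch title asks for): any mismatch fails. */
static bool may_mount(bool csum_ok, uint64_t generation, bool strict)
{
	if (csum_ok)
		return true;
	return !strict && generation < 10;
}
```

Under the tolerant policy, a corrupted superblock with generation 4 (as in the images described above) would be accepted; under the strict one it is rejected before any of its values are trusted.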


Re: Most recent stable enough btrfs-tools?

2014-08-25 Thread Chris Murphy

On Aug 25, 2014, at 6:00 AM, Martin Steigerwald mar...@lichtvoll.de wrote:

 What is the latest stuff that's still supposed to work okay?

I'm new to git so take this with a grain of salt, but this returns no differences:

git diff mason/master sterba/v3.16.x

So I'd say we're about to see a btrfs-progs release.

mason=git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
sterba=git://repo.or.cz/btrfs-progs-unstable/devel.git


Chris Murphy


typo in btrfs-progs master/v3.16.x

2014-08-25 Thread Chris Murphy
git diff mason/master sterba/integration-20140729

diff --git a/cmds-scrub.c b/cmds-scrub.c
index 731c5c9..0bf06ee 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -1527,16 +1527,16 @@ out:
 
 static const char * const cmd_scrub_start_usage[] = {
    "btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] <path>|<device>",
-   "Start a new scrub. If a scrub is already running, the new one fails.",
+   "Start a new scrub",
    "",
    "-B do not background",
    "-d stats per device (-B only)",
    "-q be quiet",
    "-r read only mode",
-   "-R raw print mode, print full data instead of summary",
+   "-R raw print mode, print full data instead of summary"


Looks like a missing "," at the end of this line. All other lines end in ",".
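Why a missing comma in such an array matters: C concatenates adjacent string literals at compile time, so dropping the comma silently merges two usage lines into one array element instead of causing a compile error. A minimal sketch; the `"-f force"` entry is a hypothetical next line added for illustration (the real array continues beyond the quoted hunk), and the arrays are NULL-terminated like typical usage tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *const with_commas[] = {
	"-r read only mode",
	"-R raw print mode, print full data instead of summary",
	"-f force", /* hypothetical next entry */
	NULL,
};

static const char *const missing_comma[] = {
	"-r read only mode",
	"-R raw print mode, print full data instead of summary" /* no comma */
	"-f force", /* glued onto the previous literal */
	NULL,
};

/* Count entries up to the NULL terminator. */
static size_t count_entries(const char *const *v)
{
	size_t n = 0;
	while (v[n])
		n++;
	return n;
}
```

The first array has three entries; the second has two, the last being the concatenation of both strings.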



Chris Murphy



Re: typo in btrfs-progs master/v3.16.x

2014-08-25 Thread Chris Murphy

On Aug 25, 2014, at 4:32 PM, David Sterba dste...@suse.cz wrote:

 On Mon, Aug 25, 2014 at 04:09:16PM -0600, Chris Murphy wrote:
 static const char * const cmd_scrub_start_usage[] = {
     "btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] <path>|<device>",
 -   "Start a new scrub. If a scrub is already running, the new one fails.",
 +   "Start a new scrub",
     "",
     "-B do not background",
     "-d stats per device (-B only)",
     "-q be quiet",
     "-r read only mode",
 -   "-R raw print mode, print full data instead of summary",
 +   "-R raw print mode, print full data instead of summary"
 
 
 Looks like a missing "," at the end of this line. All other lines end in ",".
 
 Thanks for checking, the v3.16.x version is correct.

Right. I had the diffs reversed from what I thought they were, so I came to the wrong conclusion.

Chris Murphy


Re: [PATCH] Btrfs: fix corruption after write/fsync failure + fsync + log recovery

2014-08-25 Thread Liu Bo
On Mon, Aug 25, 2014 at 10:43:00AM +0100, Filipe Manana wrote: