Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-07-11 Thread Zach Brown
On Mon, Jul 01, 2013 at 08:54:35AM -0400, Josef Bacik wrote:
> On Tue, Jun 04, 2013 at 06:17:54PM -0400, Zach Brown wrote:
> > 
> > I finally sat down to fix that readdir hang that has been in the back
> > of my mind for a while.  I *hope* that the fix is pretty simple: just
> > don't manufacture a fake f_pos; I *think* we can abuse f_version as an
> > indicator that we shouldn't return entries.  Does this look reasonable?
>
> One of these patches is making new entries not show up in readdir.  This was
> discovered while running stress.sh overnight, it complained about files not
> matching but when they were checked the files matched.  Dropping the entire
> series made stress.sh run fine.  So I'm dropping these for the next merge
> window, but I'll dig into it and try and figure out what was causing the problem.

OK, how about this.

First, just drop the series.  Most of it was opportunistic cleanups that
I saw as I was reading the code.  But it certainly isn't a comprehensive
cleanup.  We'd still want to go back later and fix how inefficient the
delayed item data structures are for readdir.  So from some perspective
it's just risky churn with little upside.

And I think I found a much simpler way to stop the current readdir from
looping instead of mucking around with f_version.  Just use LLONG_MAX if
entries have already overflowed INT_MAX.

We'd still want real freed offset reuse some day, but that's a bunch of
work that'll have to be done very carefully.  This will at least stop
64bit apps from failing with offsets past 32bits and is very low risk.

So just add this patch and forget about the rest of the series?

It'll still technically conflict with the s/filp->f_pos/ctx->pos/ in the
readdir interface change in -next but that fixup is trivial.

- z

From e295deb0f56f846738ee3dec63fe0350f3952503 Mon Sep 17 00:00:00 2001
From: Zach Brown 
Date: Wed, 10 Jul 2013 19:48:51 -0400
Subject: [PATCH] btrfs: don't loop on large offsets in readdir

When btrfs readdir() hits the last entry it sets the readdir offset to a
huge value to stop buggy apps from breaking when the same name is
returned by readdir() with concurrent rename()s.

But unconditionally setting the offset to INT_MAX causes readdir() to
loop returning any entries with offsets past INT_MAX.  It only takes a
few hours of constant file creation and removal to create entries past
INT_MAX.

So let's set the huge offset to LLONG_MAX if the last entry has already
overflowed 32bit loff_t.  Without large offsets, behaviour is identical.
With large offsets 64bit apps will work and 32bit apps will be no more
broken than they currently are if they see large offsets.
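
To make the looping concrete, here is a small userspace model. It is a sketch, not the btrfs code: end_pos_old(), end_pos_new() and nonempty_calls() are made-up names. Each simulated readdir() call returns every entry at or after the current position, then applies the end-of-directory clamp, once the old way (always INT_MAX) and once the way this patch does it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Old behaviour: always park the directory position at INT_MAX. */
static long long end_pos_old(long long pos_after_last)
{
	(void)pos_after_last;
	return INT_MAX;
}

/* Patched behaviour: jump to LLONG_MAX only once offsets passed INT_MAX. */
static long long end_pos_new(long long pos_after_last)
{
	return pos_after_last >= INT_MAX ? LLONG_MAX : INT_MAX;
}

/*
 * Model repeated readdir() calls over a directory whose index offsets
 * are in offs[].  Each call returns every entry at or after pos, then
 * sets pos = end_pos(last + 1), like "f_pos++" followed by the clamp.
 * Returns how many calls returned entries, stopping at max_calls
 * (reaching the cap means the reader never sees end-of-directory).
 */
static int nonempty_calls(const long long *offs, size_t n,
			  long long (*end_pos)(long long), int max_calls)
{
	long long pos = 0;
	int calls = 0;

	assert(max_calls > 0);
	while (calls < max_calls) {
		long long last = -1;

		for (size_t i = 0; i < n; i++)
			if (offs[i] >= pos && offs[i] > last)
				last = offs[i];
		if (last < 0)
			break;	/* nothing left: end of directory */
		calls++;
		pos = end_pos(last + 1);
	}
	return calls;
}
```

With an entry past INT_MAX, the old clamp keeps handing that entry back until the cap is hit, while the new clamp ends the directory after one pass; with only small offsets both behave the same.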

Signed-off-by: Zach Brown 
---
 fs/btrfs/inode.c | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c157482..8583e8d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5186,14 +5186,31 @@ next:
 	}
 
 	/* Reached end of directory/root. Bump pos past the last item. */
-	if (key_type == BTRFS_DIR_INDEX_KEY)
-		/*
-		 * 32-bit glibc will use getdents64, but then strtol -
-		 * so the last number we can serve is this.
-		 */
-		filp->f_pos = 0x7fffffff;
-	else
-		filp->f_pos++;
+	filp->f_pos++;
+
+	/*
+	 * Stop new entries from being returned after we return the last
+	 * entry.
+	 *
+	 * New directory entries are assigned a strictly increasing
+	 * offset.  This means that new entries created during readdir
+	 * are *guaranteed* to be seen in the future by that readdir.
+	 * This has broken buggy programs which operate on names as
+	 * they're returned by readdir.  Until we re-use freed offsets
+	 * we have this hack to stop new entries from being returned
+	 * under the assumption that they'll never reach this huge
+	 * offset.
+	 *
+	 * This is being careful not to overflow 32bit loff_t unless the
+	 * last entry requires it because doing so has broken 32bit apps
+	 * in the past.
+	 */
+	if (key_type == BTRFS_DIR_INDEX_KEY) {
+		if (filp->f_pos >= INT_MAX)
+			filp->f_pos = LLONG_MAX;
+		else
+			filp->f_pos = INT_MAX;
+	}
 nopos:
 	ret = 0;
 err:
-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs: stat(2) and /proc/pid/maps returns different devices

2013-07-11 Thread Andrew Vagin
On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> On Wed, Jul 10, 2013 at 09:31:05AM -0700, Mark Fasheh wrote:
> > As far as I can tell we'll be carrying this patch until a better
> > solution is possible.
> > 
> > When that will happen, I don't know.
> > --Mark
> 
> Well, what do I get when I pretend I don't care any more? The little voice
> in my head says "keep plugging away". Here's another attempt at fixing this
> problem in a sane manner. Basically, this time we're adding a flag to
> s_flags which btrfs sets. Proc will see the flag and call ->getattr().
> 
> This compiles, but it needs testing (which I will get to soon). It still has
> a bunch of problems in my honest opinion but maybe if we get something
> acceptable upstream we can work from there.
> 
> Also, as Andrew pointed out, there's more than one place which returns a
> different device than stat(2), so I probably need to update more sites
> to deal with this.

Yes, we need to fix unix_diag, fanotify fdinfo, ...

> 
> Does anyone see a problem with this approach?

Looks good to me. Thanks.
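
A minimal userspace sketch of the mismatch being discussed, assuming the standard /proc/<pid>/maps line format (device printed as hex major:minor). parse_maps_dev() and maps_dev_matches_stat() are hypothetical helpers, not part of any existing tool. On a kernel with the behaviour described in this thread, a file mapped from a btrfs subvolume would be expected to make maps_dev_matches_stat() return 0.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Parse the "major:minor" field (printed in hex) from one line of
 * /proc/<pid>/maps: "address perms offset dev inode [path]".
 */
static int parse_maps_dev(const char *line, unsigned *maj, unsigned *min)
{
	assert(line != NULL && maj != NULL && min != NULL);
	return sscanf(line, "%*s %*s %*s %x:%x", maj, min) == 2 ? 0 : -1;
}

/* 1 if the maps device equals stat(2)'s st_dev for path, 0 if not, -1 on error. */
static int maps_dev_matches_stat(const char *maps_line, const char *path)
{
	struct stat st;
	unsigned maj, min;

	if (parse_maps_dev(maps_line, &maj, &min) != 0 || stat(path, &st) != 0)
		return -1;
	return major(st.st_dev) == maj && minor(st.st_dev) == min;
}
```

A caller would read the relevant line from /proc/self/maps after mmap()ing the file and pass it together with the file's path.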

>   --Mark
> 
> --
> Mark Fasheh
> 


[PATCH] btrfs-progs: mkfs.btrfs documentation: clarify current restrictions of sectorsize, nodesize and leafsize

2013-07-11 Thread Koen De Wit
Commit 8d082fb727ac11930ea20bf1612e334ea7c2b697 (Btrfs: do not mount when
we have a sectorsize unequal to PAGE_SIZE) requires the sectorsize to be
equal to the pagesize for the filesystem to be mountable.

The nodesize and leafsize should be equal, and not larger than 65536.

Add this information to the man page and the usage text of mkfs.btrfs.
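
The rules documented by this patch can be summarised as a check function. This is a hypothetical sketch (check_sizes() and the enum names do not exist in mkfs.c), and it encodes only what the man page text states: sectorsize equal to the page size for mountability, nodesize and leafsize multiples of the sectorsize, no larger than 65536, and equal to each other.

```c
#include <assert.h>
#include <stdint.h>

enum size_check {
	SIZES_OK = 0,
	SECTOR_NOT_PAGESIZE,	/* fs would not be mountable by current kernels */
	NOT_SECTOR_MULTIPLE,	/* nodesize/leafsize not a multiple of sectorsize */
	NODE_TOO_LARGE,		/* nodesize/leafsize above 65536 */
	NODE_LEAF_MISMATCH,	/* nodesize and leafsize differ */
};

static enum size_check check_sizes(uint32_t sectorsize, uint32_t nodesize,
				   uint32_t leafsize, uint32_t pagesize)
{
	assert(sectorsize != 0);
	if (sectorsize != pagesize)
		return SECTOR_NOT_PAGESIZE;
	if (nodesize % sectorsize || leafsize % sectorsize)
		return NOT_SECTOR_MULTIPLE;
	if (nodesize > 65536 || leafsize > 65536)
		return NODE_TOO_LARGE;
	if (nodesize != leafsize)
		return NODE_LEAF_MISMATCH;
	return SIZES_OK;
}
```

In a real tool the page size would come from sysconf(_SC_PAGESIZE) rather than being passed in by hand.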
---
 man/mkfs.btrfs.8.in |   12 +---
 mkfs.c  |2 +-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in
index a3f1503..b2a4e73 100644
--- a/man/mkfs.btrfs.8.in
+++ b/man/mkfs.btrfs.8.in
@@ -46,7 +46,8 @@ there is a filesystem or partition table on the device already.
 .TP
 \fB\-l\fR, \fB\-\-leafsize \fIsize\fR
 Specify the leaf size, the least data item in which btrfs stores data. The
-default value is the page size.
+default value is the page size. Must be a multiple of the sectorsize, but
+not larger than 65536. Should be equal to the nodesize.
 .TP
 \fB\-L\fR, \fB\-\-label \fIname\fR
 Specify a label for the filesystem.
@@ -66,10 +67,15 @@ larger filesystems.  It is recommended for use with filesystems
 of 1 GiB or smaller.
 .TP
 \fB\-n\fR, \fB\-\-nodesize \fIsize\fR
-Specify the nodesize. By default the value is set to the pagesize.
+Specify the nodesize. By default the value is set to the pagesize. Must be a
+multiple of the sectorsize, but not larger than 65536. Should be equal to the
+leafsize.
 .TP
 \fB\-s\fR, \fB\-\-sectorsize \fIsize\fR
-Specify the sectorsize, the minimum block allocation.
+Specify the sectorsize, the minimum block allocation. The default value is
+the pagesize. If the sectorsize differs from the pagesize, the created
+filesystem cannot be mounted by the current kernel. Therefore it is not
+recommended to use this option.
 .TP
 \fB\-r\fR, \fB\-\-rootdir \fIrootdir\fR
 Specify a directory to copy into the newly created fs.
diff --git a/mkfs.c b/mkfs.c
index b412b7e..9f75c58 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -319,7 +319,7 @@ static void print_usage(void)
 	fprintf(stderr, "\t -m --metadata metadata profile, values like data profile\n");
 	fprintf(stderr, "\t -M --mixed mix metadata and data together\n");
 	fprintf(stderr, "\t -n --nodesize size of btree nodes\n");
-	fprintf(stderr, "\t -s --sectorsize min block allocation\n");
+	fprintf(stderr, "\t -s --sectorsize min block allocation (not mountable by current kernel)\n");
 	fprintf(stderr, "\t -r --rootdir the source directory\n");
 	fprintf(stderr, "\t -K --nodiscard do not perform whole device TRIM\n");
 	fprintf(stderr, "\t -V --version print the mkfs.btrfs version and exit\n");
-- 
1.7.2.5



Re: [PATCH 4/5] Btrfs: batch the extent state operation in the end io handle of the read page

2013-07-11 Thread Josef Bacik
On Thu, Jul 11, 2013 at 01:25:39PM +0800, Miao Xie wrote:
> It is unnecessary to unlock the extent one page at a time; we can do it
> in batches, which makes random reads ~6% faster.
> 
> Signed-off-by: Miao Xie 
> ---
>  fs/btrfs/extent_io.c | 70 ++--
>  1 file changed, 40 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 9f4dedf..8f95418 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -762,15 +762,6 @@ static void cache_state(struct extent_state *state,
>   }
>  }
>  
> -static void uncache_state(struct extent_state **cached_ptr)
> -{
> - if (cached_ptr && (*cached_ptr)) {
> - struct extent_state *state = *cached_ptr;
> - *cached_ptr = NULL;
> - free_extent_state(state);
> - }
> -}
> -
>  /*
>   * set some bits on a range in the tree.  This may require allocations or
>   * sleeping, so the gfp mask is used to indicate what is allowed.
> @@ -2395,6 +2386,18 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
>   bio_put(bio);
>  }
>  
> +static void
> +endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
> +   int uptodate)
> +{
> + struct extent_state *cached = NULL;
> + u64 end = start + len - 1;
> +
> + if (uptodate && tree->track_uptodate)
> + set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
> + unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
> +}
> +
>  /*
>   * after a readpage IO is done, we need to:
>   * clear the uptodate bits on error
> @@ -2417,6 +2420,8 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>   u64 start;
>   u64 end;
>   u64 len;
> + u64 extent_start = 0;
> + u64 extent_len = 0;
>   int mirror;
>   int ret;
>  
> @@ -2425,8 +2430,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>  
>   do {
>   struct page *page = bvec->bv_page;
> - struct extent_state *cached = NULL;
> - struct extent_state *state;
>   struct inode *inode = page->mapping->host;
>  
>   pr_debug("end_bio_extent_readpage: bi_sector=%llu, err=%d, "
> @@ -2452,17 +2455,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>   if (++bvec <= bvec_end)
>   prefetchw(&bvec->bv_page->flags);
>  
> - spin_lock(&tree->lock);
> - state = find_first_extent_bit_state(tree, start, EXTENT_LOCKED);
> - if (likely(state && state->start == start)) {
> - /*
> -  * take a reference on the state, unlock will drop
> -  * the ref
> -  */
> - cache_state(state, &cached);
> - }
> - spin_unlock(&tree->lock);
> -
>   mirror = io_bio->mirror_num;
>   if (likely(uptodate && tree->ops &&
>  tree->ops->readpage_end_io_hook)) {
> @@ -2501,18 +2493,11 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>   test_bit(BIO_UPTODATE, &bio->bi_flags);
>   if (err)
>   uptodate = 0;
> - uncache_state(&cached);
>   continue;
>   }
>   }
>  readpage_ok:
> - if (uptodate && tree->track_uptodate) {
> - set_extent_uptodate(tree, start, end, &cached,
> - GFP_ATOMIC);
> - }
> - unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
> -
> - if (uptodate) {
> + if (likely(uptodate)) {
>   loff_t i_size = i_size_read(inode);
>   pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
>   unsigned offset;
> @@ -2528,8 +2513,33 @@ readpage_ok:
>   }
>   unlock_page(page);
>   offset += len;
> +
> + if (unlikely(!uptodate)) {
> + if (extent_len) {
> + endio_readpage_release_extent(tree,
> +   extent_start,
> +   extent_len, 1);
> + extent_start = 0;
> + extent_len = 0;
> + }
> + endio_readpage_release_extent(tree, start,
> +   end - start + 1, 0);
> + } else if (!extent_len) {
> + extent_start = start;
> + extent_len = end + 1 - start;
> + } else if (extent_start + extent_len == start) {
> +
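
The quoted patch is cut off above, so here is a standalone sketch of the batching idea it describes: contiguous up-to-date ranges are merged and released with one call, and a failed page first flushes the pending batch and is then released on its own. release() is a hypothetical stand-in for endio_readpage_release_extent() that only logs its arguments. The handling of a gap between ranges and the final flush after the loop are assumptions, since those parts of the diff are not visible here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RELEASES 16

struct page_range { uint64_t start, len; };

/* Records each call of the hypothetical release() helper below. */
struct release_log {
	struct page_range r[MAX_RELEASES];
	int uptodate[MAX_RELEASES];
	size_t n;
};

/* Stand-in for endio_readpage_release_extent(): just logs the range. */
static void release(struct release_log *rl, uint64_t start, uint64_t len,
		    int uptodate)
{
	assert(rl->n < MAX_RELEASES);
	rl->r[rl->n].start = start;
	rl->r[rl->n].len = len;
	rl->uptodate[rl->n] = uptodate;
	rl->n++;
}

static void end_io_batched(const struct page_range *pages, const int *ok,
			   size_t n, struct release_log *rl)
{
	uint64_t ext_start = 0, ext_len = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = pages[i].start, len = pages[i].len;

		if (!ok[i]) {
			/* flush the pending batch, then release the bad page alone */
			if (ext_len) {
				release(rl, ext_start, ext_len, 1);
				ext_len = 0;
			}
			release(rl, start, len, 0);
		} else if (!ext_len) {
			ext_start = start;	/* open a new batch */
			ext_len = len;
		} else if (ext_start + ext_len == start) {
			ext_len += len;		/* contiguous: extend the batch */
		} else {
			/* assumed: a gap flushes the batch and opens a new one */
			release(rl, ext_start, ext_len, 1);
			ext_start = start;
			ext_len = len;
		}
	}
	if (ext_len)	/* assumed: flush whatever is left at the end */
		release(rl, ext_start, ext_len, 1);
}
```

Three contiguous good pages become a single release call instead of three, which is where the saved tree-lock round trips come from.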

[PATCH v5] Btrfs-progs: fix restore command leaving corrupted files

2013-07-11 Thread Filipe David Borba Manana
When there are files that have parts shared with snapshots, the
restore command was incorrectly restoring them, as it was not
taking into account the offset and number of bytes fields from
the file extent item. Besides leaving the recovered file corrupt,
it was also inefficient as it read and wrote more data than needed
(with each extent copy overwriting portions of the one previously
written).
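
A simplified model of the difference, assuming uncompressed extents only: each file extent item points into an on-disk extent, and only num_bytes bytes starting at offset belong to the file. The struct and function names are hypothetical. restore_old() roughly mimics copying the whole extent; restore_fixed() copies just the referenced slice, as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, uncompressed file extent item pointing into an on-disk extent. */
struct file_extent {
	const char *disk;	/* contents of the whole on-disk extent */
	uint64_t ram_bytes;	/* size of the whole extent */
	uint64_t file_pos;	/* where this item starts in the file */
	uint64_t offset;	/* like btrfs_file_extent_offset() */
	uint64_t num_bytes;	/* like btrfs_file_extent_num_bytes() */
};

/* Roughly the behaviour before the fix: copy the whole extent. */
static void restore_old(char *out, const struct file_extent *fe, size_t n)
{
	assert(out != NULL);
	for (size_t i = 0; i < n; i++)
		memcpy(out + fe[i].file_pos, fe[i].disk, fe[i].ram_bytes);
}

/* Behaviour after the fix: copy num_bytes starting at disk + offset. */
static void restore_fixed(char *out, const struct file_extent *fe, size_t n)
{
	assert(out != NULL);
	for (size_t i = 0; i < n; i++)
		memcpy(out + fe[i].file_pos, fe[i].disk + fe[i].offset,
		       fe[i].num_bytes);
}
```

With a shared extent that was partially overwritten (one byte replaced in the middle), restore_old() lets each extent copy spill over the following file range, which is the same kind of corruption the reproduction steps in this message show.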

The following steps show how to reproduce this corruption issue:

$ mkfs.btrfs -f  /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = "\x41" . ("\x00" x (1024*1024+349)); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);'
$ du -b /mnt/btrfs/foobar
1048926 /mnt/btrfs/foobar
$ md5sum /mnt/btrfs/foobar
f9f778f3a7410c40e4ed104a3a63c3c4  /mnt/btrfs/foobar

$ btrfs subvolume snapshot /mnt/btrfs /mnt/btrfs/my_snap
$ perl -e 'open($f, "+<", "/mnt/btrfs/foobar"); seek($f, 4096, 0); print $f "\xff"; close($f);'
$ md5sum /mnt/btrfs/foobar
b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar
$ umount /mnt/btrfs

$ btrfs restore /dev/sdb3 /tmp/copy
$ du -b /tmp/copy/foobar
1048926 /tmp/copy/foobar
$ md5sum /tmp/copy/foobar
88db338cbc1c44dfabae083f1ce642d5  /tmp/copy/foobar
$ od -t x1 -j 8192 -N 4 /tmp/copy/foobar
0020000 41 00 00 00
0020004
$ mount /dev/sdb3 /mnt/btrfs
$ od -t x1 -j 8192 -N 4 /mnt/btrfs/foobar
0020000 00 00 00 00
0020004
$ md5sum /mnt/btrfs/foobar
b983fcefd4622a03a78936484c40272b  /mnt/btrfs/foobar

Tested this change with zlib and lzo compression and file sizes larger
than 1GiB, and found no regression or other corruption issues (so far
at least).

Signed-off-by: Filipe David Borba Manana 
---

V2: updated commit message to include the C preprocessor macros
in the C program.
V3: updated commit message again to reflect the file size used in
the example in the C program macro.
V4: fixed wrong path in commit message in the perl command line.
V5: updated commit message with simpler steps to reproduce the issue.

 cmds-restore.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e48df40..9688599 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -272,6 +272,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	u64 bytenr;
 	u64 ram_size;
 	u64 disk_size;
+	u64 num_bytes;
 	u64 length;
 	u64 size_left;
 	u64 dev_bytenr;
@@ -288,7 +289,9 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	disk_size = btrfs_file_extent_disk_num_bytes(leaf, fi);
 	ram_size = btrfs_file_extent_ram_bytes(leaf, fi);
 	offset = btrfs_file_extent_offset(leaf, fi);
-	size_left = disk_size;
+	num_bytes = btrfs_file_extent_num_bytes(leaf, fi);
+	size_left = num_bytes;
+	bytenr += offset;
 
 	if (offset)
 		printf("offset is %Lu\n", offset);
@@ -296,7 +299,7 @@ static int copy_one_extent(struct btrfs_root *root, int fd,
 	if (disk_size == 0)
 		return 0;
 
-	inbuf = malloc(disk_size);
+	inbuf = malloc(size_left);
 	if (!inbuf) {
 		fprintf(stderr, "No memory\n");
 		return -1;
@@ -351,8 +354,8 @@ again:
 		goto again;
 
 	if (compress == BTRFS_COMPRESS_NONE) {
-		while (total < ram_size) {
-			done = pwrite(fd, inbuf+total, ram_size-total,
+		while (total < num_bytes) {
+			done = pwrite(fd, inbuf+total, num_bytes-total,
 				      pos+total);
 			if (done < 0) {
 				ret = -1;
@@ -365,7 +368,7 @@ again:
 		goto out;
 	}
 
-	ret = decompress(inbuf, outbuf, disk_size, &ram_size, compress);
+	ret = decompress(inbuf, outbuf, num_bytes, &ram_size, compress);
 	if (ret) {
 		num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
 					      bytenr, length);
-- 
1.7.9.5



Re: lz4 status?

2013-07-11 Thread Chris Mason
Quoting David Sterba (2013-07-10 18:54:54)
> On Sun, Jun 30, 2013 at 12:35:09PM -0500, Mitch Harder wrote:
> > There's been a parallel effort to incorporate a general set of lz4
> > patches in the kernel.
> > 
> > I see these patches are currently queued up in the linux-next tree, so
> > we may see them in the 3.11 kernel.
> 
> The patches are now merged into 3.11.
> 
> > It looks like lz4 and lz4hc will be provided.
> 
> Regarding HC mode, there are some core compression code changes needed
> in order to fully utilize its potential, namely a larger chunk size
> that's compressed at a time. There was some tiny yet measurable gain of
> HC against ordinary mode compared on current 4k-at-a-time
> implementation, but the space savings did not justify the speed drop of
> HC mode.
> 
> I can't say if the patchset will be ready for 3.12 though.

The current limits on the amount of data compressed at a time and the
amount of delayed allocation sent down at a time were pulled out of the
air.  

Changes to those limits are definitely ok if they are helping specific
workloads.

-chris



[PATCH] Btrfs: allow splitting of hole em's when dropping extent cache

2013-07-11 Thread Josef Bacik
I noticed while running multi-threaded fsync tests that sometimes fsck would
complain about an improper gap.  This happens because we fail to add a hole
extent to the file: when we split a hole EM, btrfs_drop_extent_cache was just
discarding the whole em instead of splitting it.  This patch fixes that by
allowing us to split a hole em properly, which means that added holes actually
get logged and we no longer see this fsck error.  Thankfully we're tolerant of
this sort of problem, so a user would not see any adverse effects of this bug
other than fsck complaining.  Thanks,
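
A rough model of the splitting rule, heavily simplified (no compression, no orig_start or orig_block_len). The struct and helpers are hypothetical, and MAP_LAST_BYTE and MAP_HOLE are assumed to mirror the kernel's EXTENT_MAP_LAST_BYTE and EXTENT_MAP_HOLE. The point is that a hole em is now split as well: each piece keeps the hole marker as block_start, gets block_len 0 and uses its own length as ram_bytes, instead of the whole em being thrown away. The hole case of the back piece is an assumption, since that part of the diff is truncated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's EXTENT_MAP_LAST_BYTE / EXTENT_MAP_HOLE. */
#define MAP_LAST_BYTE ((uint64_t)-4)
#define MAP_HOLE ((uint64_t)-3)

/* Heavily simplified extent map (no compression, no orig_* fields). */
struct em {
	uint64_t start, len;
	uint64_t block_start, block_len, ram_bytes;
};

static bool is_real_extent(const struct em *em)
{
	return em->block_start < MAP_LAST_BYTE;
}

/* Piece of em in front of the dropped range starting at drop_start. */
static bool split_front(const struct em *em, uint64_t drop_start, struct em *out)
{
	if (em->start >= drop_start)
		return false;
	out->start = em->start;
	out->len = drop_start - em->start;
	out->block_start = em->block_start;	/* hole marker kept for holes */
	out->block_len = is_real_extent(em) ? out->len : 0;
	out->ram_bytes = is_real_extent(em) ? em->ram_bytes : out->len;
	return true;
}

/* Piece of em behind the dropped range ending at drop_end (> em->start). */
static bool split_back(const struct em *em, uint64_t drop_end, struct em *out)
{
	uint64_t em_end = em->start + em->len;

	assert(drop_end > em->start);
	if (em_end <= drop_end)
		return false;
	out->start = drop_end;
	out->len = em_end - drop_end;
	if (is_real_extent(em)) {
		out->block_start = em->block_start + (drop_end - em->start);
		out->block_len = out->len;
		out->ram_bytes = em->ram_bytes;
	} else {
		out->block_start = em->block_start;
		out->block_len = 0;
		out->ram_bytes = out->len;
	}
	return true;
}
```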

Signed-off-by: Josef Bacik 
---
 fs/btrfs/file.c |   62 +++---
 1 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2d70849..dda2efb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -595,20 +595,29 @@ void btrfs_drop_extent_cache(struct inode *inode, u64 start, u64 end,
if (no_splits)
goto next;
 
-   if (em->block_start < EXTENT_MAP_LAST_BYTE &&
-   em->start < start) {
+   if (em->start < start) {
split->start = em->start;
split->len = start - em->start;
-   split->orig_start = em->orig_start;
-   split->block_start = em->block_start;
 
-   if (compressed)
-   split->block_len = em->block_len;
-   else
-   split->block_len = split->len;
-   split->ram_bytes = em->ram_bytes;
-   split->orig_block_len = max(split->block_len,
-   em->orig_block_len);
+   if (em->block_start < EXTENT_MAP_LAST_BYTE) {
+   split->orig_start = em->orig_start;
+   split->block_start = em->block_start;
+
+   if (compressed)
+   split->block_len = em->block_len;
+   else
+   split->block_len = split->len;
+   split->orig_block_len = max(split->block_len,
+   em->orig_block_len);
+   split->ram_bytes = em->ram_bytes;
+   } else {
+   split->orig_start = split->start;
+   split->block_len = 0;
+   split->block_start = em->block_start;
+   split->orig_block_len = 0;
+   split->ram_bytes = split->len;
+   }
+
split->generation = gen;
split->bdev = em->bdev;
split->flags = flags;
@@ -619,8 +628,7 @@ void btrfs_drop_extent_cache(struct inode *inode, u64 start, u64 end,
split = split2;
split2 = NULL;
}
-   if (em->block_start < EXTENT_MAP_LAST_BYTE &&
-   testend && em->start + em->len > start + len) {
+   if (testend && em->start + em->len > start + len) {
u64 diff = start + len - em->start;
 
split->start = start + len;
@@ -629,18 +637,28 @@ void btrfs_drop_extent_cache(struct inode *inode, u64 start, u64 end,
split->flags = flags;
split->compress_type = em->compress_type;
split->generation = gen;
-   split->orig_block_len = max(em->block_len,
+
+   if (em->block_start < EXTENT_MAP_LAST_BYTE) {
+   split->orig_block_len = max(em->block_len,
em->orig_block_len);
-   split->ram_bytes = em->ram_bytes;
 
-   if (compressed) {
-   split->block_len = em->block_len;
-   split->block_start = em->block_start;
-   split->orig_start = em->orig_start;
+   split->ram_bytes = em->ram_bytes;
+   if (compressed) {
+   split->block_len = em->block_len;
+   split->block_start = em->block_start;
+   split->orig_start = em->orig_start;
+   } else {
+   split->block_len = split->len;
+   split->block_start = em->block_start
+   + diff;
+   split->orig

Re: [PATCH 2/5] Btrfs: add branch prediction hints in the read page end IO function

2013-07-11 Thread Chris Mason
Do you have benchmark numbers for how much these help?  I hesitate to
bring in the likely/unlikely unless we can see it on the benchmarks.
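
For reference, likely() and unlikely() are only hints built on the GCC/Clang __builtin_expect builtin: they can change block layout and static branch prediction but never the result, which is why measurements are the only way to justify them. The definitions below follow the usual kernel ones; count_uptodate() is a made-up example loop.

```c
#include <assert.h>
#include <stddef.h>

/* The usual kernel definitions, built on the GCC/Clang builtin. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical end-io style loop in which read errors are expected to be rare. */
static size_t count_uptodate(const int *uptodate, size_t n)
{
	size_t ok = 0;

	assert(uptodate != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (unlikely(!uptodate[i]))
			continue;	/* hinted as the cold path */
		ok++;
	}
	return ok;
}
```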

(The patch does look fine though)

-chris


Re: [PATCH 3/5] Btrfs: don't cache the csum value into the extent state tree

2013-07-11 Thread Chris Mason
Quoting Miao Xie (2013-07-11 01:25:38)
> Before applying this patch, we cached the csum value in the extent state
> tree when reading data from the disk; this increased lock contention on
> the state tree.
> 
> Now, we just store the csum value in the bio structure or another unshared
> structure, which reduces the lock contention.

Perfect, this is a great way to use the extra bio struct.
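
A sketch of what keeping checksums in an unshared per-bio structure can look like: each bio owns its array, so reading or writing the checksums needs no tree lock. struct bio_csums, INLINE_CSUMS and the helpers are hypothetical names chosen for illustration. The inline buffer with a heap fallback is one common way to avoid an allocation for small bios, not necessarily what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define INLINE_CSUMS 16	/* hypothetical inline capacity, in sectors */

/* Hypothetical per-bio checksum storage, owned by a single bio. */
struct bio_csums {
	uint32_t *csum;			/* points at inline_csum or a heap array */
	uint32_t inline_csum[INLINE_CSUMS];
	size_t nr;
};

static int bio_csums_init(struct bio_csums *b, size_t nr_sectors)
{
	assert(b != NULL);
	b->nr = nr_sectors;
	if (nr_sectors <= INLINE_CSUMS) {
		b->csum = b->inline_csum;	/* small bio: no allocation */
		return 0;
	}
	b->csum = calloc(nr_sectors, sizeof(*b->csum));
	return b->csum ? 0 : -1;
}

static void bio_csums_free(struct bio_csums *b)
{
	if (b->csum != b->inline_csum)
		free(b->csum);
	b->csum = NULL;
}
```

Because csum may point into the structure itself, such a structure must not be copied by value after bio_csums_init().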

-chris


[PATCH] btrfs: make errors in btrfs_num_copies less noisy

2013-07-11 Thread David Sterba
The 'critical' log level is verbose enough; 'emergency' beeps on all
terminals.

Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b2d1eac..1fd7b5d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4194,13 +4194,13 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	 * and exit, so return 1 so the callers don't try to use other copies.
 	 */
 	if (!em) {
-		btrfs_emerg(fs_info, "No mapping for %Lu-%Lu\n", logical,
+		btrfs_crit(fs_info, "No mapping for %Lu-%Lu\n", logical,
 			    logical+len);
 		return 1;
 	}
 
 	if (em->start > logical || em->start + em->len < logical) {
-		btrfs_emerg(fs_info, "Invalid mapping for %Lu-%Lu, got "
+		btrfs_crit(fs_info, "Invalid mapping for %Lu-%Lu, got "
 			    "%Lu-%Lu\n", logical, logical+len, em->start,
 			    em->start + em->len);
 		return 1;
-- 
1.7.9



[PATCH] Btrfs: make free space caching faster with many non-inline extent references

2013-07-11 Thread Liu Bo
So to cache free space, we iterate over every extent item to gather free space
info.

When we have, say, 10,000 non-inline extent refs (such as BTRFS_EXTENT_DATA_REF),
this takes quite a long time.  Since inline and non-inline extent refs have the
same objectid in their keys, we can just re-search the tree with the next
address to skip the non-inline references.

(This was found with the dedup feature, because dedup extents can end up with
many non-inline extent refs.)
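
A toy model of the skip, assuming keys sorted by (objectid, type, offset) as in a btrfs tree and a lower-bound binary search standing in for btrfs_search_slot(). The type values are arbitrary, picked only so that the extent item sorts before its data refs; cache_used() and the other names are hypothetical. With skipping enabled, the first ref whose objectid is below last triggers one re-search from (last, EXTENT_ITEM, 0) instead of a slot-by-slot walk over every ref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary type values: only the ordering EXTENT_ITEM < DATA_REF matters. */
enum { EXTENT_ITEM = 1, DATA_REF = 2 };

/* Simplified btree key; for an extent item, offset is the extent length. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

static int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Lower-bound binary search: first slot whose key is >= *k. */
static size_t search_slot(const struct key *keys, size_t n, const struct key *k)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (key_cmp(&keys[mid], k) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Sum extent lengths, counting how many slots are visited. */
static uint64_t cache_used(const struct key *keys, size_t n, bool skip,
			   size_t *visits)
{
	uint64_t last = 0, used = 0;
	size_t slot = 0;

	assert(visits != NULL);
	*visits = 0;
	while (slot < n) {
		const struct key *k = &keys[slot];

		(*visits)++;
		if (skip && k->objectid < last) {
			/* re-search from the first possible key at 'last' */
			struct key next = { last, EXTENT_ITEM, 0 };

			slot = search_slot(keys, n, &next);
			continue;
		}
		if (k->type == EXTENT_ITEM) {
			used += k->offset;
			last = k->objectid + k->offset;
		}
		slot++;
	}
	return used;
}
```

With 1000 refs behind one extent item followed by a second extent item, the plain walk visits 1002 slots while the skipping walk visits 3, and both report the same used bytes.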

Signed-off-by: Liu Bo 
---
 fs/btrfs/extent-tree.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0236de7..2796622 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -420,6 +420,7 @@ again:
 	/* need to make sure the commit_root doesn't disappear */
 	down_read(&fs_info->extent_commit_sem);
 
+next:
 	ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
 	if (ret < 0)
 		goto err;
@@ -459,6 +460,16 @@ again:
 			continue;
 		}
 
+		if (key.objectid < last) {
+			key.objectid = last;
+			key.offset = 0;
+			key.type = BTRFS_EXTENT_ITEM_KEY;
+
+			caching_ctl->progress = last;
+			btrfs_release_path(path);
+			goto next;
+		}
+
 		if (key.objectid < block_group->key.objectid) {
 			path->slots[0]++;
 			continue;
-- 
1.7.7



Re: [PATCH 1/5] Btrfs: remove unnecessary argument of bio_readpage_error()

2013-07-11 Thread Miao Xie
There are only 4 patches in this patchset, not 5.
Sorry for my mistake.

Miao

On Thu, 11 Jul 2013 13:25:36 +0800, Miao Xie wrote:
> Signed-off-by: Miao Xie 
> ---
>  fs/btrfs/extent_io.c | 25 +++--
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f8586a9..4bfbcc5 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2202,8 +2202,7 @@ out:
>   */
>  
>  static int bio_readpage_error(struct bio *failed_bio, struct page *page,
> - u64 start, u64 end, int failed_mirror,
> - struct extent_state *state)
> + u64 start, u64 end, int failed_mirror)
>  {
>   struct io_failure_record *failrec = NULL;
>   u64 private;
> @@ -2212,6 +2211,7 @@ static int bio_readpage_error(struct bio *failed_bio, struct page *page,
>   struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
>   struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
>   struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
> + struct extent_state *state;
>   struct bio *bio;
>   int num_copies;
>   int ret;
> @@ -2297,21 +2297,18 @@ static int bio_readpage_error(struct bio *failed_bio, struct page *page,
>* matter what the error is, it is very likely to persist.
>*/
>   pr_debug("bio_readpage_error: cannot repair, num_copies == 1. "
> -  "state=%p, num_copies=%d, next_mirror %d, "
> -  "failed_mirror %d\n", state, num_copies,
> -  failrec->this_mirror, failed_mirror);
> +  "num_copies=%d, next_mirror %d, failed_mirror %d\n", 
> +  num_copies, failrec->this_mirror, failed_mirror);
>   free_io_failure(inode, failrec, 0);
>   return -EIO;
>   }
>  
> - if (!state) {
> - spin_lock(&tree->lock);
> - state = find_first_extent_bit_state(tree, failrec->start,
> - EXTENT_LOCKED);
> - if (state && state->start != failrec->start)
> - state = NULL;
> - spin_unlock(&tree->lock);
> - }
> + spin_lock(&tree->lock);
> + state = find_first_extent_bit_state(tree, failrec->start,
> + EXTENT_LOCKED);
> + if (state && state->start != failrec->start)
> + state = NULL;
> + spin_unlock(&tree->lock);
>  
>   /*
>* there are two premises:
> @@ -2541,7 +2538,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>* can't handle the error it will return -EIO and we
>* remain responsible for that page.
>*/
> - ret = bio_readpage_error(bio, page, start, end, mirror, NULL);
> + ret = bio_readpage_error(bio, page, start, end, mirror);
>   if (ret == 0) {
>   uptodate =
>   test_bit(BIO_UPTODATE, &bio->bi_flags);
> 
