[PATCH V20 14/19] Btrfs: subpage-blocksize: Enable dedupe ioctl

2016-07-03 Thread Chandan Rajendra
The function implementing the dedupe ioctl,
i.e. btrfs_ioctl_file_extent_same(), returns an error in the
subpage-blocksize scenario. The check was added because Btrfs did not
have code to deal with block size < page size. This commit removes the
restriction since we now support "block size < page size".

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fb92566..5d9062e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3325,21 +3325,11 @@ ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
 {
struct inode *src = file_inode(src_file);
struct inode *dst = file_inode(dst_file);
-   u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
ssize_t res;
 
if (olen > BTRFS_MAX_DEDUPE_LEN)
olen = BTRFS_MAX_DEDUPE_LEN;
 
-   if (WARN_ON_ONCE(bs < PAGE_SIZE)) {
-   /*
-* Btrfs does not support blocksize < page_size. As a
-* result, btrfs_cmp_data() won't correctly handle
-* this situation without an update.
-*/
-   return -EINVAL;
-   }
-
res = btrfs_extent_same(src, loff, olen, dst, dst_loff);
if (res)
return res;
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V20 19/19] Btrfs: subpage-blocksize: Rate limit scrub error message

2016-07-03 Thread Chandan Rajendra
btrfs/073 invokes the scrub ioctl in a tight loop. In the
subpage-blocksize scenario this results in a large number of "scrub:
size assumption sectorsize != PAGE_SIZE" messages being printed on the
console. To reduce the number of such messages, this commit uses
btrfs_err_rl() instead of btrfs_err().

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/scrub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 86270c6..68c8a09 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3856,7 +3856,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
if (fs_info->chunk_root->sectorsize != PAGE_SIZE) {
/* not supported for data w/o checksums */
-   btrfs_err(fs_info,
+   btrfs_err_rl(fs_info,
   "scrub: size assumption sectorsize != PAGE_SIZE "
   "(%d != %lu) fails",
   fs_info->chunk_root->sectorsize, PAGE_SIZE);
-- 
2.5.5



[PATCH V20 09/19] Btrfs: subpage-blocksize: Explicitly track I/O status of blocks of an ordered extent.

2016-07-03 Thread Chandan Rajendra
In the subpage-blocksize scenario a page can contain more than one
block. So, in addition to the PagePrivate2 flag, we have to track the
I/O status of each block of a page to reliably mark the ordered extent
as complete.
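
The per-block tracking can be pictured as a bitmap with one bit per
block of the ordered extent. The sketch below is a simplified userspace
model under stated assumptions: struct ordered_sketch, its fields and
mark_blocks_done() are hypothetical, the extent is capped at 64 blocks,
and no locking is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an ordered extent of at most 64 blocks. */
struct ordered_sketch {
	uint64_t blocks_done;	/* bit i is set once block i finished I/O */
	unsigned int nr_blocks;	/* total number of blocks in the extent */
	unsigned int pending;	/* blocks whose I/O has not completed yet */
};

/*
 * Mark blocks [blk, blk + nr) as complete. A block that was already
 * marked is skipped, so a repeated completion is not counted twice.
 * Returns 1 once every block of the extent has completed.
 */
static int mark_blocks_done(struct ordered_sketch *oe, unsigned int blk,
			    unsigned int nr)
{
	assert(oe->nr_blocks <= 64);
	assert(blk + nr <= oe->nr_blocks);

	while (nr--) {
		uint64_t bit = UINT64_C(1) << blk;

		if (!(oe->blocks_done & bit)) {
			oe->blocks_done |= bit;
			oe->pending--;
		}
		blk++;
	}

	return oe->pending == 0;
}
```

With a single per-page flag, a page holding four blocks could not tell
a partially written page from a fully written one; the bitmap makes
that distinction explicit.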

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c|  19 +--
 fs/btrfs/extent_io.h|   5 +-
 fs/btrfs/inode.c| 365 ++--
 fs/btrfs/ordered-data.c |  19 +++
 fs/btrfs/ordered-data.h |   4 +
 5 files changed, 296 insertions(+), 116 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 303b49e..694d2dc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4661,11 +4661,10 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  * to drop the page.
  */
 static int try_release_extent_state(struct extent_map_tree *map,
-   struct extent_io_tree *tree,
-   struct page *page, gfp_t mask)
+   struct extent_io_tree *tree,
+   struct page *page, u64 start, u64 end,
+   gfp_t mask)
 {
-   u64 start = page_offset(page);
-   u64 end = start + PAGE_SIZE - 1;
int ret = 1;
 
if (test_range_bit(tree, start, end,
@@ -4699,12 +4698,12 @@ static int try_release_extent_state(struct extent_map_tree *map,
  * map records are removed
  */
 int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask)
+   struct extent_io_tree *tree, struct page *page,
+   u64 start, u64 end, gfp_t mask)
 {
struct extent_map *em;
-   u64 start = page_offset(page);
-   u64 end = start + PAGE_SIZE - 1;
+   u64 orig_start = start;
+   u64 orig_end = end;
 
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
@@ -4738,7 +4737,9 @@ int try_release_extent_mapping(struct extent_map_tree *map,
free_extent_map(em);
}
}
-   return try_release_extent_state(map, tree, page, mask);
+   return try_release_extent_state(map, tree, page,
+   orig_start, orig_end,
+   mask);
 }
 
 /*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 91e7a75..2ea8451 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -279,8 +279,9 @@ typedef struct extent_map *(get_extent_t)(struct inode *inode,
 void extent_io_tree_init(struct extent_io_tree *tree,
 struct address_space *mapping);
 int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask);
+   struct extent_io_tree *tree, struct page *page,
+   u64 start, u64 end,
+   gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 struct extent_state **cached);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e9f9bb1..4ae5c25 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3072,56 +3072,119 @@ static void finish_ordered_fn(struct btrfs_work *work)
btrfs_finish_ordered_io(ordered_extent);
 }
 
-static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
-   struct extent_state *state, int uptodate)
+static void mark_blks_io_complete(struct btrfs_ordered_extent *ordered,
+   u64 blk, u64 nr_blks, int uptodate)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = ordered->inode;
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_ordered_extent *ordered_extent = NULL;
struct btrfs_workqueue *wq;
btrfs_work_func_t func;
-   u64 ordered_start, ordered_end;
int done;
 
-   trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
+   while (nr_blks--) {
+   if (test_and_set_bit(blk, ordered->blocks_done)) {
+   blk++;
+   continue;
+   }
 
-   ClearPagePrivate2(page);
-loop:
-   ordered_extent = btrfs_lookup_ordered_range(inode, start,
-   end - start + 1);
-   if (!ordered_extent)
-   goto out;
+   done = btrfs_dec_test_ordered_pending(inode, &ordered,
+   ordered->file_offset
+   + (blk << inode->i_blkbits),
+   root->sectorsize,
+   uptodate);
+  

[PATCH V20 10/19] Btrfs: subpage-blocksize: btrfs_punch_hole: Fix uptodate blocks check

2016-07-03 Thread Chandan Rajendra
In case of subpage-blocksize, the file blocks to be punched may map only
part of a page. For file blocks inside such pages, we need to check for
the presence of BLK_STATE_UPTODATE flag.
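
Which pages of a punched range are only partially covered can be
worked out from the range boundaries alone. The sketch below mirrors
the index arithmetic visible in the diff; the 64K page size,
struct hole_pages and classify_hole() are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative page geometry: 64K pages (an assumption). */
#define SK_PAGE_SHIFT	16u
#define SK_PAGE_SIZE	(UINT64_C(1) << SK_PAGE_SHIFT)

struct hole_pages {
	uint64_t start_index;	/* page index holding lockstart */
	uint64_t end_index;	/* page index holding lockend */
	int same_page;		/* both ends fall in one page */
	int start_partial;	/* range starts in the middle of a page */
	int end_partial;	/* range ends in the middle of another page */
};

/* Classify the inclusive byte range [lockstart, lockend]. */
static struct hole_pages classify_hole(uint64_t lockstart, uint64_t lockend)
{
	struct hole_pages hp;

	assert(lockstart <= lockend);

	hp.start_index = lockstart >> SK_PAGE_SHIFT;
	hp.end_index = lockend >> SK_PAGE_SHIFT;
	hp.same_page = hp.start_index == hp.end_index;
	hp.start_partial = (lockstart & (SK_PAGE_SIZE - 1)) != 0;
	hp.end_partial = !hp.same_page &&
			 ((lockend + 1) & (SK_PAGE_SIZE - 1)) != 0;
	return hp;
}
```

Only the pages flagged as partial need a per-block uptodate check; the
pages fully inside the range are covered by the page-level state.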

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c | 89 -
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 38f5e8e..89ded7b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2335,6 +2335,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
struct btrfs_path *path;
struct btrfs_block_rsv *rsv;
struct btrfs_trans_handle *trans;
+   struct address_space *mapping = inode->i_mapping;
+   pgoff_t start_index, end_index;
u64 lockstart;
u64 lockend;
u64 tail_start;
@@ -2347,6 +2349,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
int err = 0;
unsigned int rsv_count;
bool same_block;
+   bool same_page;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
bool truncated_block = false;
@@ -2443,11 +2446,45 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
goto out_only_mutex;
}
 
+   start_index = lockstart >> PAGE_SHIFT;
+   end_index = lockend >> PAGE_SHIFT;
+
+   same_page = lockstart >> PAGE_SHIFT
+   == lockend >> PAGE_SHIFT;
+
while (1) {
struct btrfs_ordered_extent *ordered;
+   struct page *start_page = NULL;
+   struct page *end_page = NULL;
+   u64 nr_pages;
+   int start_page_blks_uptodate;
+   int end_page_blks_uptodate;
 
truncate_pagecache_range(inode, lockstart, lockend);
 
+   if (lockstart & (PAGE_SIZE - 1)) {
+   start_page = find_or_create_page(mapping, start_index,
+   GFP_NOFS);
+   if (!start_page) {
+   inode_unlock(inode);
+   return -ENOMEM;
+   }
+   }
+
+   if (!same_page && ((lockend + 1) & (PAGE_SIZE - 1))) {
+   end_page = find_or_create_page(mapping, end_index,
+   GFP_NOFS);
+   if (!end_page) {
+   if (start_page) {
+   unlock_page(start_page);
+   put_page(start_page);
+   }
+   inode_unlock(inode);
+   return -ENOMEM;
+   }
+   }
+
+
lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 &cached_state);
ordered = btrfs_lookup_first_ordered_extent(inode, lockend);
@@ -2457,18 +2494,68 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 * and nobody raced in and read a page in this range, if we did
 * we need to try again.
 */
+   nr_pages = round_up(lockend, PAGE_SIZE)
+   - round_down(lockstart, PAGE_SIZE);
+   nr_pages >>= PAGE_SHIFT;
+
+   start_page_blks_uptodate = 0;
+   end_page_blks_uptodate = 0;
+   if (root->sectorsize < PAGE_SIZE) {
+   u64 page_end;
+
+   page_end = round_down(lockstart, PAGE_SIZE)
+   + PAGE_SIZE - 1;
+   page_end = min(page_end, lockend);
+   if (start_page
+   && PagePrivate(start_page)
+   && test_page_blks_state(start_page, 1 << BLK_STATE_UPTODATE,
+   lockstart, page_end, 0))
+   start_page_blks_uptodate = 1;
+   if (end_page
+   && PagePrivate(end_page)
+   && test_page_blks_state(end_page, 1 << BLK_STATE_UPTODATE,
+   page_offset(end_page), lockend, 0))
+   end_page_blks_uptodate = 1;
+   } else {
+   if (start_page && PagePrivate(start_page)
+   && PageUptodate(start_page))
+   start_page_blks_uptodate = 1;
+   if (end_page && PagePrivate(end_page)
+   && PageUptodate(end_page))
+   end_page_blks_uptodate = 1;
+   }
+
if ((!ordered ||

[PATCH V20 12/19] Revert "btrfs: fix lockups from btrfs_clear_path_blocking"

2016-07-03 Thread Chandan Rajendra
The patch "Btrfs: subpage-blocksize: Prevent writes to an extent buffer
when PG_writeback flag is set" requires btrfs_try_tree_write_lock() to
be a true try lock w.r.t. both spinning and blocking locks. During the
2015 Vault Conference Btrfs meetup, Chris Mason suggested that he would
write up a suitable locking function to be used when writing dirty
pages that map metadata blocks. Until such a locking function is
available, this patch temporarily reverts commit
f82c458a2c3ffb94b431fc6ad791a79df1b3713e.
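
A "true try lock" never spins or sleeps: it either takes the lock
immediately or reports failure. The sketch below models that behaviour
in userspace with C11 atomics. It is an illustration only, not the
Btrfs locking code; struct sk_lock, sk_try_write_lock() and
sk_write_unlock() are hypothetical names, and the compare-and-swap
stands in for the role write_trylock() plays in the diff.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock model: one spinning lock plus blocking counters. */
struct sk_lock {
	atomic_int held;		/* 1 while the spinning lock is held */
	atomic_int blocking_writers;	/* holders that went to sleep */
	atomic_int blocking_readers;
};

/* Return 1 if the lock was taken, 0 otherwise; never waits. */
static int sk_try_write_lock(struct sk_lock *l)
{
	int expected = 0;

	if (atomic_load(&l->blocking_writers) ||
	    atomic_load(&l->blocking_readers))
		return 0;

	/* A single attempt; failure is reported instead of spinning. */
	if (!atomic_compare_exchange_strong(&l->held, &expected, 1))
		return 0;

	/* Re-check: a blocking holder may have appeared in the meantime. */
	if (atomic_load(&l->blocking_writers) ||
	    atomic_load(&l->blocking_readers)) {
		atomic_store(&l->held, 0);
		return 0;
	}
	return 1;
}

static void sk_write_unlock(struct sk_lock *l)
{
	atomic_store(&l->held, 0);
}
```

The point of the design is that a caller which already holds other
locks can back off and retry rather than wait while holding them.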
---
 fs/btrfs/ctree.c   | 14 --
 fs/btrfs/locking.c | 24 +++-
 fs/btrfs/locking.h |  2 --
 3 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 0a56d1b..394ad8e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -81,6 +81,13 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
int i;
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+   /* lockdep really cares that we take all of these spinlocks
+* in the right order.  If any of the locks in the path are not
+* currently blocking, it is going to complain.  So, make really
+* really sure by forcing the path to blocking before we clear
+* the path blocking.
+*/
if (held) {
btrfs_set_lock_blocking_rw(held, held_rw);
if (held_rw == BTRFS_WRITE_LOCK)
@@ -89,6 +96,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
held_rw = BTRFS_READ_LOCK_BLOCKING;
}
btrfs_set_path_blocking(p);
+#endif
 
for (i = BTRFS_MAX_LEVEL - 1; i >= 0; i--) {
if (p->nodes[i] && p->locks[i]) {
@@ -100,8 +108,10 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
}
}
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
if (held)
btrfs_clear_lock_blocking_rw(held, held_rw);
+#endif
 }
 
 /* this also releases the path */
@@ -2922,7 +2932,7 @@ cow_done:
}
p->locks[level] = BTRFS_WRITE_LOCK;
} else {
-   err = btrfs_tree_read_lock_atomic(b);
+   err = btrfs_try_tree_read_lock(b);
if (!err) {
btrfs_set_path_blocking(p);
btrfs_tree_read_lock(b);
@@ -3054,7 +3064,7 @@ again:
}
 
level = btrfs_header_level(b);
-   err = btrfs_tree_read_lock_atomic(b);
+   err = btrfs_try_tree_read_lock(b);
if (!err) {
btrfs_set_path_blocking(p);
btrfs_tree_read_lock(b);
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index d13128c..8b50e60 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -132,26 +132,6 @@ again:
 }
 
 /*
- * take a spinning read lock.
- * returns 1 if we get the read lock and 0 if we don't
- * this won't wait for blocking writers
- */
-int btrfs_tree_read_lock_atomic(struct extent_buffer *eb)
-{
-   if (atomic_read(&eb->blocking_writers))
-   return 0;
-
-   read_lock(&eb->lock);
-   if (atomic_read(&eb->blocking_writers)) {
-   read_unlock(&eb->lock);
-   return 0;
-   }
-   atomic_inc(&eb->read_locks);
-   atomic_inc(&eb->spinning_readers);
-   return 1;
-}
-
-/*
  * returns 1 if we get the read lock and 0 if we don't
  * this won't wait for blocking writers
  */
@@ -182,7 +162,9 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb)
atomic_read(&eb->blocking_readers))
return 0;
 
-   write_lock(&eb->lock);
+   if (!write_trylock(&eb->lock))
+   return 0;
+
if (atomic_read(&eb->blocking_writers) ||
atomic_read(&eb->blocking_readers)) {
write_unlock(&eb->lock);
diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index c44a9d5..b81e0e9 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -35,8 +35,6 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw);
 void btrfs_assert_tree_locked(struct extent_buffer *eb);
 int btrfs_try_tree_read_lock(struct extent_buffer *eb);
 int btrfs_try_tree_write_lock(struct extent_buffer *eb);
-int btrfs_tree_read_lock_atomic(struct extent_buffer *eb);
-
 
 static inline void btrfs_tree_unlock_rw(struct extent_buffer *eb, int rw)
 {
-- 
2.5.5



[PATCH V20 13/19] Btrfs: subpage-blocksize: Fix file defragmentation code

2016-07-03 Thread Chandan Rajendra
This commit gets file defragmentation code to work in subpage-blocksize
scenario. It does this by keeping track of page offsets that mark block
boundaries and passing them as arguments to the functions that implement
the defragmentation logic.
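
The conversion from a (page index, page offset) pair to block counts
can be sketched as follows. The arithmetic mirrors what the diff does
in cluster_pages_for_defrag(); the 64K page size, struct defrag_span
and compute_defrag_span() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative page geometry: 64K pages (an assumption). */
#define SK_PAGE_SHIFT	16u
#define SK_PAGE_SIZE	(UINT64_C(1) << SK_PAGE_SHIFT)

struct defrag_span {
	uint64_t start_blk;	/* first block to defragment */
	uint64_t blk_cnt;	/* number of blocks, clamped to EOF */
	uint64_t page_cnt;	/* pages touched by those blocks */
};

/*
 * isize is the file size in bytes, pg_offset the byte offset of the
 * first block within page start_index, blkbits log2 of the block size.
 * All fields are zero when there is nothing to defragment.
 */
static struct defrag_span compute_defrag_span(uint64_t isize,
					      uint64_t start_index,
					      uint64_t pg_offset,
					      uint64_t num_blks,
					      unsigned int blkbits)
{
	struct defrag_span s = { 0, 0, 0 };
	uint64_t start_blk, end_blk, left;

	assert(blkbits <= SK_PAGE_SHIFT && pg_offset < SK_PAGE_SIZE);

	if (!isize)
		return s;

	start_blk = ((start_index << SK_PAGE_SHIFT) + pg_offset) >> blkbits;
	end_blk = (isize - 1) >> blkbits;
	if (start_blk > end_blk)
		return s;

	left = end_blk - start_blk + 1;
	s.start_blk = start_blk;
	s.blk_cnt = num_blks < left ? num_blks : left;
	/* Round up: the last page may be only partially covered. */
	s.page_cnt = (pg_offset + (s.blk_cnt << blkbits) + SK_PAGE_SIZE - 1)
		     >> SK_PAGE_SHIFT;
	return s;
}
```

For example, with 4K blocks, twenty blocks starting 8K into page 1
begin at block 18 and span two pages.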

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 198 ++-
 1 file changed, 136 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 001c111..fb92566 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -904,12 +904,13 @@ out_unlock:
 static int check_defrag_in_cache(struct inode *inode, u64 offset, u32 thresh)
 {
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
struct extent_map *em = NULL;
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
u64 end;
 
read_lock(&em_tree->lock);
-   em = lookup_extent_mapping(em_tree, offset, PAGE_SIZE);
+   em = lookup_extent_mapping(em_tree, offset, root->sectorsize);
read_unlock(&em_tree->lock);
 
if (em) {
@@ -999,7 +1000,7 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
struct extent_map *em;
-   u64 len = PAGE_SIZE;
+   u64 len = BTRFS_I(inode)->root->sectorsize;
 
/*
 * hopefully we have this extent in the tree already, try without
@@ -1118,37 +1119,47 @@ out:
  * before calling this.
  */
 static int cluster_pages_for_defrag(struct inode *inode,
-   struct page **pages,
-   unsigned long start_index,
-   unsigned long num_pages)
+   struct page **pages,
+   unsigned long start_index,
+   size_t pg_offset,
+   unsigned long num_blks)
 {
-   unsigned long file_end;
u64 isize = i_size_read(inode);
+   u64 start_blk;
+   u64 end_blk;
u64 page_start;
u64 page_end;
u64 page_cnt;
+   u64 blk_cnt;
int ret;
int i;
int i_done;
struct btrfs_ordered_extent *ordered;
struct extent_state *cached_state = NULL;
struct extent_io_tree *tree;
+   struct btrfs_root *root;
gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
-   file_end = (isize - 1) >> PAGE_SHIFT;
-   if (!isize || start_index > file_end)
+   root = BTRFS_I(inode)->root;
+   start_blk = (start_index << PAGE_SHIFT) + pg_offset;
+   start_blk >>= inode->i_blkbits;
+   end_blk = (isize - 1) >> inode->i_blkbits;
+   if (!isize || start_blk > end_blk)
return 0;
 
-   page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
+   blk_cnt = min_t(u64, (u64)num_blks, (u64)end_blk - start_blk + 1);
 
ret = btrfs_delalloc_reserve_space(inode,
-   start_index << PAGE_SHIFT,
-   page_cnt << PAGE_SHIFT);
+   start_blk << inode->i_blkbits,
+   blk_cnt << inode->i_blkbits);
if (ret)
return ret;
i_done = 0;
tree = &BTRFS_I(inode)->io_tree;
 
+   page_cnt = DIV_ROUND_UP(pg_offset + (blk_cnt << inode->i_blkbits),
+   PAGE_SIZE);
+
/* step one, lock all the pages */
for (i = 0; i < page_cnt; i++) {
struct page *page;
@@ -1159,12 +1170,22 @@ again:
break;
 
page_start = page_offset(page);
-   page_end = page_start + PAGE_SIZE - 1;
+
+   if (i == 0)
+   page_start += pg_offset;
+
+   if (i == page_cnt - 1) {
+   page_end = (start_index << PAGE_SHIFT) + pg_offset;
+   page_end += (blk_cnt << inode->i_blkbits) - 1;
+   } else {
+   page_end = page_offset(page) + PAGE_SIZE - 1;
+   }
+
while (1) {
lock_extent_bits(tree, page_start, page_end,
 &cached_state);
-   ordered = btrfs_lookup_ordered_extent(inode,
- page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start,
+   page_end - page_start + 1);
unlock_extent_cached(tree, page_start, page_end,
 &cached_state, GFP_NOFS);
if (!ordered)
@@ -1203,7 +1224,7 @@ again:
}
 
pages[i] = page;
- 

[PATCH V20 17/19] Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset when moving to a new bio_vec

2016-07-03 Thread Chandan Rajendra
In __btrfs_lookup_bio_sums() we set the file offset value at the
beginning of every iteration of the while loop. This is incorrect since
the blocks mapped by the current bvec->bv_page might not yet have been
completely processed.

This commit fixes the issue by setting the file offset value when we
move to the next bvec of the bio.
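
The corrected iteration can be modelled in userspace as below: the file
offset is derived from the segment once, when the segment is entered,
and is then advanced block by block. struct seg and block_offsets() are
hypothetical names; a real bio_vec carries a page pointer rather than a
precomputed page offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio_vec. */
struct seg {
	uint64_t page_off;	/* file offset of the page */
	uint32_t off;		/* offset of the data within the page */
	uint32_t len;		/* length of the data in bytes */
};

/*
 * Write the file offset of every block covered by the segments into
 * out[] (at most cap entries) and return the number of entries.
 */
static size_t block_offsets(const struct seg *segs, size_t nsegs,
			    uint32_t blocksize, uint64_t *out, size_t cap)
{
	size_t n = 0;

	assert(blocksize > 0);

	for (size_t i = 0; i < nsegs; i++) {
		/* Set only when moving to a new segment. */
		uint64_t offset = segs[i].page_off + segs[i].off;
		uint32_t left = segs[i].len;

		while (left >= blocksize && n < cap) {
			out[n++] = offset;
			offset += blocksize;	/* next block, same segment */
			left -= blocksize;
		}
	}
	return n;
}
```

Resetting the offset on every loop iteration would report the first
block of a segment again for each later block in the same segment.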

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file-item.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 62a81ee..fb6a7e8 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -222,11 +222,11 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
+   else
+   offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
-   if (!dio)
-   offset = page_offset(bvec->bv_page) + bvec->bv_offset;
count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
   (u32 *)csum, nblocks);
if (count)
@@ -301,6 +301,9 @@ found:
goto done;
}
bvec++;
+   if (!dio)
+   offset = page_offset(bvec->bv_page)
+   + bvec->bv_offset;
page_bytes_left = bvec->bv_len;
}
 
-- 
2.5.5



[PATCH V20 15/19] Btrfs: subpage-blocksize: btrfs_clone: Flush dirty blocks of a page that do not map the clone range

2016-07-03 Thread Chandan Rajendra
After cloning the required extents, we truncate all the pages that map
the file range being cloned. In subpage-blocksize scenario, we could
have dirty blocks before and/or after the clone range in the
leading/trailing pages. Truncating these pages would lead to data
loss. Hence this commit forces such dirty blocks to be flushed to disk
before performing the clone operation.
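
The byte ranges that need flushing can be derived from the destination
offset and length. The sketch below is an illustration under stated
assumptions: 64K pages, hypothetical names (struct flush_range,
leading_flush(), trailing_flush()), and inclusive range ends. Its
trailing range ends at the last byte of the page, which is a choice
made for the sketch and differs slightly from the
round_up(dest_end, PAGE_SIZE) value passed in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative page geometry: 64K pages (an assumption). */
#define SK_PAGE_SIZE	(UINT64_C(1) << 16)

struct flush_range {
	int needed;	/* 0 if nothing has to be flushed */
	uint64_t start;	/* first byte to flush */
	uint64_t end;	/* last byte to flush (inclusive) */
};

/* Blocks in the leading page that precede the clone range. */
static struct flush_range leading_flush(uint64_t destoff, uint64_t isize)
{
	struct flush_range r = { 0, 0, 0 };
	uint64_t page_start = destoff & ~(SK_PAGE_SIZE - 1);

	if (page_start < isize && destoff != page_start) {
		r.needed = 1;
		r.start = page_start;
		r.end = destoff - 1;
	}
	return r;
}

/* Blocks in the trailing page that follow the clone range. */
static struct flush_range trailing_flush(uint64_t destoff, uint64_t len,
					 uint64_t isize)
{
	struct flush_range r = { 0, 0, 0 };
	uint64_t dest_end;

	assert(len > 0);
	dest_end = destoff + len - 1;

	if (dest_end < isize && ((dest_end + 1) & (SK_PAGE_SIZE - 1))) {
		r.needed = 1;
		r.start = dest_end + 1;
		/* Last byte of the page that holds dest_end. */
		r.end = dest_end | (SK_PAGE_SIZE - 1);
	}
	return r;
}
```

When both ends of the clone range are page aligned, neither range is
needed and no extra flush takes place.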

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5d9062e..0ef3c32 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3921,6 +3921,7 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
int ret;
u64 len = olen;
u64 bs = root->fs_info->sb->s_blocksize;
+   u64 dest_end;
int same_inode = src == inode;
 
/*
@@ -3981,6 +3982,21 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
goto out_unlock;
}
 
+   if ((round_down(destoff, PAGE_SIZE) < inode->i_size) &&
+   !IS_ALIGNED(destoff, PAGE_SIZE)) {
+   ret = filemap_write_and_wait_range(inode->i_mapping,
+   round_down(destoff, PAGE_SIZE),
+   destoff - 1);
+   }
+
+   dest_end = destoff + len - 1;
+   if ((dest_end < inode->i_size) &&
+   !IS_ALIGNED(dest_end + 1, PAGE_SIZE)) {
+   ret = filemap_write_and_wait_range(inode->i_mapping,
+   dest_end + 1,
+   round_up(dest_end, PAGE_SIZE));
+   }
+
if (destoff > inode->i_size) {
ret = btrfs_cont_expand(inode, inode->i_size, destoff);
if (ret)
-- 
2.5.5



[PATCH V20 16/19] Btrfs: subpage-blocksize: Make file extent relocate code subpage blocksize aware

2016-07-03 Thread Chandan Rajendra
The file extent relocation code currently assumes that the block size
is the same as PAGE_SIZE. This commit adds code to support the subpage
blocksize scenario.
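
Once a cluster no longer has to start or end on a page boundary, the
range handled for each page has to be clamped to the cluster. The
sketch below shows one way to split a cluster into per-page ranges. It
assumes 64K pages and that every page after the first starts on a page
boundary; the part of the diff that advances page_start is not visible
here, so that step is a guess. struct blk_range and
cluster_page_ranges() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative page geometry: 64K pages (an assumption). */
#define SK_PAGE_SIZE	(UINT64_C(1) << 16)

struct blk_range {
	uint64_t start;	/* first byte handled in this page */
	uint64_t end;	/* last byte handled in this page (inclusive) */
};

/*
 * Split the inclusive byte range [start, end] into per-page ranges.
 * Returns the number of ranges written to out[] (at most cap).
 */
static size_t cluster_page_ranges(uint64_t start, uint64_t end,
				  struct blk_range *out, size_t cap)
{
	uint64_t page_start = start;
	size_t n = 0;

	assert(start <= end);

	while (n < cap) {
		/* Last byte of the page holding page_start, clamped. */
		uint64_t page_end = page_start | (SK_PAGE_SIZE - 1);

		if (page_end > end)
			page_end = end;

		out[n].start = page_start;
		out[n].end = page_end;
		n++;

		if (page_end == end)
			break;
		page_start = page_end + 1;
	}
	return n;
}
```

The space reserved for a page then follows as end - start + 1 of its
range, in place of a fixed PAGE_SIZE.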

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/relocation.c | 89 ---
 1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 05b88f8..fc0ac5d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3106,14 +3106,19 @@ static int relocate_file_extent_cluster(struct inode *inode,
 {
u64 page_start;
u64 page_end;
+   u64 block_start;
u64 offset = BTRFS_I(inode)->index_cnt;
+   u64 blocksize = BTRFS_I(inode)->root->sectorsize;
+   u64 reserved_space;
unsigned long index;
unsigned long last_index;
struct page *page;
struct file_ra_state *ra;
gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
+   int nr_blocks;
int nr = 0;
int ret = 0;
+   int i;
 
if (!cluster->nr)
return 0;
@@ -3133,13 +3138,19 @@ static int relocate_file_extent_cluster(struct inode *inode,
if (ret)
goto out;
 
+   page_start = cluster->start - offset;
+   page_end = min_t(u64, round_down(page_start, PAGE_SIZE) + PAGE_SIZE - 1,
+   cluster->end - offset);
+
index = (cluster->start - offset) >> PAGE_SHIFT;
last_index = (cluster->end - offset) >> PAGE_SHIFT;
while (index <= last_index) {
-   ret = btrfs_delalloc_reserve_metadata(inode, PAGE_SIZE);
+   reserved_space = page_end - page_start + 1;
+
+   ret = btrfs_delalloc_reserve_metadata(inode, reserved_space);
if (ret)
goto out;
-
+again:
page = find_lock_page(inode->i_mapping, index);
if (!page) {
page_cache_sync_readahead(inode->i_mapping,
@@ -3149,7 +3160,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
   mask);
if (!page) {
btrfs_delalloc_release_metadata(inode,
-   PAGE_SIZE);
+   reserved_space);
ret = -ENOMEM;
goto out;
}
@@ -3161,6 +3172,37 @@ static int relocate_file_extent_cluster(struct inode *inode,
   last_index + 1 - index);
}
 
+   if (PageDirty(page)) {
+   u64 pg_offset = page_offset(page);
+
+   unlock_page(page);
+   put_page(page);
+   ret = btrfs_fdatawrite_range(inode, pg_offset,
+   page_start - 1);
+   if (ret) {
+   btrfs_delalloc_release_metadata(inode,
+   reserved_space);
+   goto out;
+   }
+
+   ret = filemap_fdatawait_range(inode->i_mapping,
+   pg_offset, page_start - 1);
+   if (ret) {
+   btrfs_delalloc_release_metadata(inode,
+   reserved_space);
+   goto out;
+   }
+
+   goto again;
+   }
+
+   if (BTRFS_I(inode)->root->sectorsize < PAGE_SIZE) {
+   ClearPageUptodate(page);
+   if (page->private)
+   clear_page_blks_state(page, 1 << BLK_STATE_UPTODATE,
+   page_start, page_end);
+   }
+
if (!PageUptodate(page)) {
btrfs_readpage(NULL, page);
lock_page(page);
@@ -3168,41 +3210,50 @@ static int relocate_file_extent_cluster(struct inode *inode,
unlock_page(page);
put_page(page);
btrfs_delalloc_release_metadata(inode,
-   PAGE_SIZE);
+   reserved_space);
ret = -EIO;
goto out;
}
}
 
-   page_start = page_offset(page);
-   page_end = page_start + PAGE_SIZE - 1;
-
lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end);
 
set_page_extent_mapped(page);
 
- 

[PATCH V20 11/19] Btrfs: subpage-blocksize: Prevent writes to an extent buffer when PG_writeback flag is set

2016-07-03 Thread Chandan Rajendra
In non-subpage-blocksize scenario, BTRFS_HEADER_FLAG_WRITTEN flag
prevents Btrfs code from writing into an extent buffer whose pages are
under writeback. This facility isn't sufficient for achieving the same
in subpage-blocksize scenario, since we have more than one extent buffer
mapped to a page.

Hence this patch adds a new flag (i.e. EXTENT_BUFFER_HEAD_WRITEBACK) and
corresponding code to track the writeback status of the page and to
prevent writes to any of the extent buffers mapped to the page while
writeback is going on.
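
The idea of one writeback flag shared by all extent buffers of a page
can be modelled as below. This is a single-threaded userspace sketch
with hypothetical names (struct sk_eb_head, struct sk_eb,
SK_HEAD_WRITEBACK); the kernel code uses atomic bit operations and
waits on the bit instead of merely testing it.

```c
#include <assert.h>
#include <stdint.h>

#define SK_HEAD_WRITEBACK	(1u << 0)	/* illustrative flag bit */

/* One head per page, shared by every extent buffer in that page. */
struct sk_eb_head {
	unsigned int flags;
};

struct sk_eb {
	struct sk_eb_head *head;	/* page-level state */
	uint64_t start;			/* logical start of this buffer */
};

static void sk_begin_writeback(struct sk_eb_head *h)
{
	h->flags |= SK_HEAD_WRITEBACK;
}

static void sk_end_writeback(struct sk_eb_head *h)
{
	h->flags &= ~SK_HEAD_WRITEBACK;
}

/* A buffer may be modified only while its page is not under writeback. */
static int sk_eb_writable(const struct sk_eb *eb)
{
	return !(eb->head->flags & SK_HEAD_WRITEBACK);
}
```

Because the flag lives in the shared head, starting writeback for the
page blocks writers of every buffer in it, not just the buffer that
triggered the writeback.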

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.c   |  18 ++
 fs/btrfs/extent-tree.c |  10 
 fs/btrfs/extent_io.c   | 150 -
 fs/btrfs/extent_io.h   |   1 +
 4 files changed, 152 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4e35a21..0a56d1b 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1541,6 +1541,7 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
struct extent_buffer *parent, int parent_slot,
struct extent_buffer **cow_ret)
 {
+   struct extent_buffer_head *ebh = eb_head(buf);
u64 search_start;
int ret;
 
@@ -1555,6 +1556,14 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 
if (!should_cow_block(trans, root, buf)) {
trans->dirty = true;
+   if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK, &ebh->bflags)) {
+   if (parent)
+   btrfs_set_lock_blocking(parent);
+   btrfs_set_lock_blocking(buf);
+   wait_on_bit_io(&ebh->bflags,
+   EXTENT_BUFFER_HEAD_WRITEBACK,
+   TASK_UNINTERRUPTIBLE);
+   }
*cow_ret = buf;
return 0;
}
@@ -2686,6 +2695,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_key *key, struct btrfs_path *p, int
  ins_len, int cow)
 {
+   struct extent_buffer_head *ebh;
struct extent_buffer *b;
int slot;
int ret;
@@ -2790,6 +2800,14 @@ again:
 */
if (!should_cow_block(trans, root, b)) {
trans->dirty = true;
+   ebh = eb_head(b);
+   if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+   &ebh->bflags)) {
+   btrfs_set_path_blocking(p);
+   wait_on_bit_io(&ebh->bflags,
+   EXTENT_BUFFER_HEAD_WRITEBACK,
+   TASK_UNINTERRUPTIBLE);
+   }
goto cow_done;
}
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 590d0e7..4ead0ff 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8224,15 +8224,25 @@ static struct extent_buffer *
 btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
  u64 bytenr, int level)
 {
+   struct extent_buffer_head *ebh;
struct extent_buffer *buf;
 
buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return buf;
 
+   ebh = eb_head(buf);
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
btrfs_tree_lock(buf);
+
+   if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+   &ebh->bflags)) {
+   btrfs_set_lock_blocking(buf);
+   wait_on_bit_io(&ebh->bflags, EXTENT_BUFFER_HEAD_WRITEBACK,
+   TASK_UNINTERRUPTIBLE);
+   }
+
clean_tree_block(trans, root->fs_info, buf);
clear_bit(EXTENT_BUFFER_STALE, &buf->ebflags);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 694d2dc..0bdb27d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3725,6 +3725,52 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
TASK_UNINTERRUPTIBLE);
 }
 
+static void lock_extent_buffers(struct extent_buffer_head *ebh,
+   struct extent_page_data *epd)
+{
+   struct extent_buffer *locked_eb = NULL;
+   struct extent_buffer *eb;
+again:
+   eb = &ebh->eb;
+   do {
+   if (eb == locked_eb)
+   continue;
+
+   if (!btrfs_try_tree_write_lock(eb))
+   goto backoff;
+
+   } while ((eb = eb->eb_next) != NULL);
+
+   return;
+
+backoff:
+   if (locked_eb && (locked_eb->start > eb->start))
+   btrfs_tree_unlock(locked_eb);
+
+   locked_eb = 

[PATCH V20 18/19] Btrfs: subpage-blocksize: Disable compression

2016-07-03 Thread Chandan Rajendra
The subpage-blocksize patchset does not yet support compression. Hence,
the kernel might crash when executing compression code in the
subpage-blocksize scenario. This commit prevents compression from being
enabled during 'mount' and also when the user invokes the
'chattr +c ' command.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c |  8 +++-
 fs/btrfs/super.c | 20 
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ef3c32..d7159db 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -322,6 +322,11 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
} else if (flags & FS_COMPR_FL) {
const char *comp;
 
+   if (root->sectorsize < PAGE_SIZE) {
+   ret = -EINVAL;
+   goto out_drop;
+   }
+
ip->flags |= BTRFS_INODE_COMPRESS;
ip->flags &= ~BTRFS_INODE_NOCOMPRESS;
 
@@ -1344,7 +1349,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
return -EINVAL;
 
if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS) {
-   if (range->compress_type > BTRFS_COMPRESS_TYPES)
+   if ((range->compress_type > BTRFS_COMPRESS_TYPES)
+   || (root->sectorsize < PAGE_SIZE))
return -EINVAL;
if (range->compress_type)
compress_type = range->compress_type;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index cba92e6..ddd4658 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -368,6 +368,17 @@ static const match_table_t tokens = {
{Opt_err, NULL},
 };
 
+static int can_enable_compression(struct btrfs_fs_info *fs_info)
+{
+   if (btrfs_super_sectorsize(fs_info->super_copy) < PAGE_SIZE) {
+   btrfs_err(fs_info,
+   "Compression is not supported for subpage-blocksize");
+   return 0;
+   }
+
+   return 1;
+}
+
 /*
  * Regular mount options parser.  Everything that is needed only when
  * reading in a new superblock is parsed here.
@@ -477,6 +488,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
if (token == Opt_compress ||
token == Opt_compress_force ||
strcmp(args[0].from, "zlib") == 0) {
+   if (!can_enable_compression(info)) {
+   ret = -EINVAL;
+   goto out;
+   }
compress_type = "zlib";
info->compress_type = BTRFS_COMPRESS_ZLIB;
btrfs_set_opt(info->mount_opt, COMPRESS);
@@ -484,6 +499,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
btrfs_clear_opt(info->mount_opt, NODATASUM);
no_compress = 0;
} else if (strcmp(args[0].from, "lzo") == 0) {
+   if (!can_enable_compression(info)) {
+   ret = -EINVAL;
+   goto out;
+   }
compress_type = "lzo";
info->compress_type = BTRFS_COMPRESS_LZO;
btrfs_set_opt(info->mount_opt, COMPRESS);
@@ -806,6 +825,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
break;
}
}
+
 check:
/*
 * Extra check for current option against current flag
-- 
2.5.5



[PATCH V20 03/19] Btrfs: subpage-blocksize: Make sure delalloc range intersects with the locked page's range

2016-07-03 Thread Chandan Rajendra
find_delalloc_range indirectly depends on EXTENT_UPTODATE to make sure that
the delalloc range returned intersects with the file range mapped by the
page. Since we now track "uptodate" state in a per-page
bitmap (i.e. in btrfs_page_private->bstate), this commit makes an explicit
check to make sure that the delalloc range starts from within the file range
mapped by the page.
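
The explicit check can be illustrated in isolation. The following is a minimal userspace sketch, not the kernel code: the helper name `delalloc_starts_within_page` and the fixed `SKETCH_PAGE_SIZE` are hypothetical, introduced only to mirror the `page_end < state->start` comparison that the patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 65536ULL

/*
 * Hypothetical helper: returns true if a delalloc range beginning at
 * state_start starts at or before the last byte of the page whose first
 * byte is at file offset page_start. The lower bound is not checked here
 * because the search itself begins inside the page.
 */
static bool delalloc_starts_within_page(uint64_t page_start,
					uint64_t state_start)
{
	uint64_t page_end = page_start + SKETCH_PAGE_SIZE - 1;

	/* The patch stops searching when page_end < state->start. */
	return state_start <= page_end;
}
```

With a 64k page at offset 0, a range starting at byte 65535 is accepted and one starting at byte 65536 is rejected.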

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0adbff5..f7d035b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1581,6 +1581,7 @@ out:
  * 1 is returned if we find something, 0 if nothing was in the tree
  */
 static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
+   struct page *locked_page,
u64 *start, u64 *end, u64 max_bytes,
struct extent_state **cached_state)
 {
@@ -1589,6 +1590,9 @@ static noinline u64 find_delalloc_range(struct 
extent_io_tree *tree,
u64 cur_start = *start;
u64 found = 0;
u64 total_bytes = 0;
+   u64 page_end;
+
+   page_end = page_offset(locked_page) + PAGE_SIZE - 1;
 
spin_lock(&tree->lock);
 
@@ -1609,7 +1613,8 @@ static noinline u64 find_delalloc_range(struct 
extent_io_tree *tree,
  (state->state & EXTENT_BOUNDARY))) {
goto out;
}
-   if (!(state->state & EXTENT_DELALLOC)) {
+   if (!(state->state & EXTENT_DELALLOC)
+   || (page_end < state->start)) {
if (!found)
*end = state->end;
goto out;
@@ -1747,8 +1752,9 @@ again:
/* step one, find a bunch of delalloc bytes starting at start */
delalloc_start = *start;
delalloc_end = 0;
-   found = find_delalloc_range(tree, &delalloc_start, &delalloc_end,
-   max_bytes, &cached_state);
+   found = find_delalloc_range(tree, locked_page,
+   &delalloc_start, &delalloc_end,
+   max_bytes, &cached_state);
if (!found || delalloc_end <= *start) {
*start = delalloc_start;
*end = delalloc_end;
-- 
2.5.5



[PATCH V20 07/19] Btrfs: subpage-blocksize: Allow mounting filesystems where sectorsize < PAGE_SIZE

2016-07-03 Thread Chandan Rajendra
This commit allows mounting filesystem instances with sectorsize smaller
than the PAGE_SIZE.

Since the code assumes that the super block is either equal to or larger
than sectorsize, this commit brings back the nodesize argument for
btrfs_find_create_tree_block() function. This change allows us to be
able to mount and use filesystems with 2048 bytes as the sectorsize.
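
The relaxed size validation can be sketched as a standalone check. This is an illustration under stated assumptions rather than the kernel function: the `sketch_`-style names are hypothetical, and the 65536 upper bound is an assumed value for BTRFS_MAX_METADATA_BLOCKSIZE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lower bound introduced by this patch (previously 4096). */
#define SKETCH_MIN_SECTORSIZE 2048U
/* Assumed value of BTRFS_MAX_METADATA_BLOCKSIZE. */
#define SKETCH_MAX_METADATA_BLOCKSIZE 65536U

static bool sketch_is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Sector size: power of two within [2048, max metadata block size]. */
static bool sketch_sectorsize_valid(uint32_t sectorsize)
{
	return sketch_is_power_of_2(sectorsize) &&
	       sectorsize >= SKETCH_MIN_SECTORSIZE &&
	       sectorsize <= SKETCH_MAX_METADATA_BLOCKSIZE;
}

/* Node size: power of two, at least one sector, at most the maximum. */
static bool sketch_nodesize_valid(uint32_t nodesize, uint32_t sectorsize)
{
	return sketch_is_power_of_2(nodesize) &&
	       nodesize >= sectorsize &&
	       nodesize <= SKETCH_MAX_METADATA_BLOCKSIZE;
}
```

Note that the sketch no longer compares the sector size against the page size, which is the restriction this patch removes.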

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c | 21 -
 fs/btrfs/disk-io.h |  2 +-
 fs/btrfs/extent-tree.c |  4 ++--
 fs/btrfs/extent_io.c   |  3 +--
 fs/btrfs/extent_io.h   |  4 ++--
 fs/btrfs/tree-log.c|  2 +-
 fs/btrfs/volumes.c | 10 +++---
 7 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2d9e86b..0727c1c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1099,7 +1099,7 @@ void readahead_tree_block(struct btrfs_root *root, u64 
bytenr)
struct extent_buffer *buf = NULL;
struct inode *btree_inode = root->fs_info->btree_inode;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return;
read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree,
@@ -1115,7 +1115,7 @@ int reada_tree_block_flagged(struct btrfs_root *root, u64 
bytenr,
struct extent_io_tree *io_tree = &BTRFS_I(btree_inode)->io_tree;
int ret;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return 0;
 
@@ -1146,12 +1146,12 @@ struct extent_buffer *btrfs_find_tree_block(struct 
btrfs_fs_info *fs_info,
 }
 
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
-u64 bytenr)
+u64 bytenr, u32 blocksize)
 {
if (btrfs_test_is_dummy_root(root))
return alloc_test_extent_buffer(root->fs_info, bytenr,
-   root->nodesize);
-   return alloc_extent_buffer(root->fs_info, bytenr);
+   blocksize);
+   return alloc_extent_buffer(root->fs_info, bytenr, blocksize);
 }
 
 
@@ -1175,7 +1175,7 @@ struct extent_buffer *read_tree_block(struct btrfs_root 
*root, u64 bytenr,
struct extent_buffer *buf = NULL;
int ret;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return buf;
 
@@ -4093,17 +4093,12 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info,
 * Check sectorsize and nodesize first, other check will need it.
 * Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
 */
-   if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
+   if (!is_power_of_2(sectorsize) || sectorsize < 2048 ||
sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
printk(KERN_ERR "BTRFS: invalid sectorsize %llu\n", sectorsize);
ret = -EINVAL;
}
-   /* Only PAGE SIZE is supported yet */
-   if (sectorsize != PAGE_SIZE) {
-   printk(KERN_ERR "BTRFS: sectorsize %llu not supported yet, only 
support %lu\n",
-   sectorsize, PAGE_SIZE);
-   ret = -EINVAL;
-   }
+
if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
printk(KERN_ERR "BTRFS: invalid nodesize %llu\n", nodesize);
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index a81ff8d..aa3fb08 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -50,7 +50,7 @@ void readahead_tree_block(struct btrfs_root *root, u64 
bytenr);
 int reada_tree_block_flagged(struct btrfs_root *root, u64 bytenr,
 int mirror_num, struct extent_buffer **eb);
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
-  u64 bytenr);
+  u64 bytenr, u32 blocksize);
 void clean_tree_block(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, struct extent_buffer *buf);
 int open_ctree(struct super_block *sb,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51e514c..590d0e7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8226,7 +8226,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
 {
struct extent_buffer *buf;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return buf;
 
@@ -8871,7 +8871,7 @@ static 

[PATCH V20 08/19] Btrfs: subpage-blocksize: Deal with partial ordered extent allocations.

2016-07-03 Thread Chandan Rajendra
In the subpage-blocksize scenario, extent allocation can succeed for only
some of the dirty blocks of a page while failing for the rest of the
blocks. This patch allows I/O against such pages to be submitted.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 27 ++-
 fs/btrfs/inode.c | 18 +++---
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0ec3b1e..303b49e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1864,17 +1864,23 @@ void extent_clear_unlock_delalloc(struct inode *inode, 
u64 start, u64 end,
if (page_ops & PAGE_SET_PRIVATE2)
SetPagePrivate2(pages[i]);
 
+   if (page_ops & PAGE_SET_ERROR)
+   SetPageError(pages[i]);
+
if (pages[i] == locked_page) {
put_page(pages[i]);
continue;
}
-   if (page_ops & PAGE_CLEAR_DIRTY)
+
+   if ((page_ops & PAGE_CLEAR_DIRTY)
+   && !PagePrivate2(pages[i]))
clear_page_dirty_for_io(pages[i]);
-   if (page_ops & PAGE_SET_WRITEBACK)
+   if ((page_ops & PAGE_SET_WRITEBACK)
+   && !PagePrivate2(pages[i]))
set_page_writeback(pages[i]);
-   if (page_ops & PAGE_SET_ERROR)
-   SetPageError(pages[i]);
-   if (page_ops & PAGE_END_WRITEBACK)
+
+   if ((page_ops & PAGE_END_WRITEBACK)
+   && !PagePrivate2(pages[i]))
end_page_writeback(pages[i]);
 
if (page_ops & PAGE_UNLOCK) {
@@ -2572,7 +2578,7 @@ void end_extent_writepage(struct page *page, int err, u64 
start, u64 end)
uptodate = 0;
}
 
-   if (!uptodate) {
+   if (!uptodate || PageError(page)) {
ClearPageUptodate(page);
SetPageError(page);
ret = ret < 0 ? ret : -EIO;
@@ -3427,7 +3433,6 @@ static noinline_for_stack int writepage_delalloc(struct 
inode *inode,
   nr_written);
/* File system has been set read-only */
if (ret) {
-   SetPageError(page);
/* fill_delalloc should be return < 0 for error
 * but just in case, we use > 0 here meaning the
 * IO is started, so we don't want to return > 0
@@ -3648,7 +3653,6 @@ static int __extent_writepage(struct page *page, struct 
writeback_control *wbc,
struct inode *inode = page->mapping->host;
struct extent_page_data *epd = data;
u64 start = page_offset(page);
-   u64 page_end = start + PAGE_SIZE - 1;
int ret;
int nr = 0;
size_t pg_offset = 0;
@@ -3693,7 +3697,7 @@ static int __extent_writepage(struct page *page, struct 
writeback_control *wbc,
ret = writepage_delalloc(inode, page, wbc, epd, start, &nr_written);
if (ret == 1)
goto done_unlocked;
-   if (ret)
+   if (ret && !PagePrivate2(page))
goto done;
 
ret = __extent_writepage_io(inode, page, wbc, epd,
@@ -3707,10 +3711,7 @@ done:
set_page_writeback(page);
end_page_writeback(page);
}
-   if (PageError(page)) {
-   ret = ret < 0 ? ret : -EIO;
-   end_extent_writepage(page, ret, start, page_end);
-   }
+
unlock_page(page);
return ret;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e8a0005..e9f9bb1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -950,6 +950,7 @@ static noinline int cow_file_range(struct inode *inode,
struct btrfs_key ins;
struct extent_map *em;
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+   struct btrfs_ordered_extent *ordered;
unsigned long page_ops, extent_ops;
int ret = 0;
 
@@ -1048,7 +1049,7 @@ static noinline int cow_file_range(struct inode *inode,
ret = btrfs_reloc_clone_csums(inode, start,
  cur_alloc_size);
if (ret)
-   goto out_drop_extent_cache;
+   goto out_remove_ordered_extent;
}
 
btrfs_dec_block_group_reservations(root->fs_info, ins.objectid);
@@ -1077,11 +1078,22 @@ static noinline int cow_file_range(struct inode *inode,
 out:
return ret;
 
+out_remove_ordered_extent:
+   ordered = btrfs_lookup_ordered_extent(inode, start);
+   

[PATCH V20 02/19] Btrfs: subpage-blocksize: Fix whole page write

2016-07-03 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles writing data to files.

Also, when setting EXTENT_DELALLOC, we no longer set the EXTENT_UPTODATE bit on
the extent_io_tree since uptodate status is being tracked by the bitmap
pointed to by page->private.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c  | 150 --
 fs/btrfs/file.c   |  17 ++
 fs/btrfs/inode.c  |  75 +
 fs/btrfs/relocation.c |   3 +
 4 files changed, 155 insertions(+), 90 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a349f99..0adbff5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1494,24 +1494,6 @@ void extent_range_redirty_for_io(struct inode *inode, 
u64 start, u64 end)
}
 }
 
-/*
- * helper function to set both pages and extents in the tree writeback
- */
-static void set_range_writeback(struct extent_io_tree *tree, u64 start, u64 
end)
-{
-   unsigned long index = start >> PAGE_SHIFT;
-   unsigned long end_index = end >> PAGE_SHIFT;
-   struct page *page;
-
-   while (index <= end_index) {
-   page = find_get_page(tree->mapping, index);
-   BUG_ON(!page); /* Pages should be in the extent_io_tree */
-   set_page_writeback(page);
-   put_page(page);
-   index++;
-   }
-}
-
 /* find the first state struct with 'bits' set after 'start', and
  * return it.  tree->lock must be held.  NULL will returned if
  * nothing was found after 'start'
@@ -2585,36 +2567,41 @@ void end_extent_writepage(struct page *page, int err, 
u64 start, u64 end)
  */
 static void end_bio_extent_writepage(struct bio *bio)
 {
+   struct btrfs_page_private *pg_private;
struct bio_vec *bvec;
+   unsigned long flags;
u64 start;
u64 end;
+   int clear_writeback;
int i;
 
bio_for_each_segment_all(bvec, bio, i) {
struct page *page = bvec->bv_page;
+   struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
 
-   /* We always issue full-page reads, but if some block
-* in a page fails to read, blk_update_request() will
-* advance bv_offset and adjust bv_len to compensate.
-* Print a warning for nonzero offsets, and an error
-* if they don't add up to a full page.  */
-   if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
-   if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
-   
btrfs_err(BTRFS_I(page->mapping->host)->root->fs_info,
-  "partial page write in btrfs with offset %u 
and length %u",
-   bvec->bv_offset, bvec->bv_len);
-   else
-   
btrfs_info(BTRFS_I(page->mapping->host)->root->fs_info,
-  "incomplete page write in btrfs with offset 
%u and "
-  "length %u",
-   bvec->bv_offset, bvec->bv_len);
-   }
+   pg_private = NULL;
+   flags = 0;
+   clear_writeback = 1;
 
-   start = page_offset(page);
-   end = start + bvec->bv_offset + bvec->bv_len - 1;
+   start = page_offset(page) + bvec->bv_offset;
+   end = start + bvec->bv_len - 1;
+
+   if (root->sectorsize < PAGE_SIZE) {
+   pg_private = (struct btrfs_page_private *)page->private;
+   spin_lock_irqsave(&pg_private->io_lock, flags);
+   }
 
end_extent_writepage(page, bio->bi_error, start, end);
-   end_page_writeback(page);
+
+   if (root->sectorsize < PAGE_SIZE) {
+   clear_page_blks_state(page, 1 << BLK_STATE_IO, start,
+   end);
+   clear_writeback = page_io_complete(page);
+   spin_unlock_irqrestore(&pg_private->io_lock, flags);
+   }
+
+   if (clear_writeback)
+   end_page_writeback(page);
}
 
bio_put(bio);
@@ -3486,7 +3473,6 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
u64 block_start;
u64 iosize;
sector_t sector;
-   struct extent_state *cached_state = NULL;
struct extent_map *em;
struct block_device *bdev;
size_t pg_offset = 0;
@@ -3538,20 +3524,29 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
 page_end, NULL, 1);
break;
}
-   em = epd->get_extent(inode, page, pg_offset, cur,
-  

[PATCH V20 00/19] Allow I/O on blocks whose size is less than page size

2016-07-03 Thread Chandan Rajendra
Btrfs assumes block size to be the same as the machine's page
size. This would mean that a Btrfs instance created on a 4k page size
machine (e.g. x86) will not be mountable on machines with larger page
sizes (e.g. PPC64/AARCH64). This patchset aims to resolve this
incompatibility.

This patchset continues with the work posted previously at
http://thread.gmane.org/gmane.comp.file-systems.btrfs/57282

I have reverted the upstream commit "btrfs: fix lockups from
btrfs_clear_path_blocking" (f82c458a2c3ffb94b431fc6ad791a79df1b3713e)
since this led to soft-lockups when the patch "Btrfs:
subpagesize-blocksize: Prevent writes to an extent buffer when
PG_writeback flag is set" is applied. During 2015's Vault Conference
Btrfs meetup, Chris Mason suggested that he would write up a
suitable locking function to be used when writing dirty pages that map
metadata blocks. Until we have a suitable locking function available,
this patchset temporarily disables the commit
f82c458a2c3ffb94b431fc6ad791a79df1b3713e.

The commits for the Btrfs kernel module can be found at
https://github.com/chandanr/linux/tree/btrfs/subpagesize-blocksize.

To create a filesystem with block size < page size, a patched version
of the Btrfs-progs package is required. The corresponding fixes for
Btrfs-progs can be found at
https://github.com/chandanr/btrfs-progs/tree/btrfs/subpagesize-blocksize.

The patchset is based off kdave/for-next branch. I had cherry picked the
following fixes from Chris Mason's git tree,
1. Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes

Fstests run status:
1. x86_64
   - With 4k sectorsize, all the tests that succeed with the for-next
 branch at git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
 also do so with the patches applied.
   - With 2k sectorsize, generic/027 never seems to complete. In my
 case, the test did not complete even after 45 mins of run time.
2. ppc64
   - With 4k sectorsize, 16k nodesize and with "nospace_cache" mount
 option, except for scrub and compression tests, all the tests
 that succeed with the for-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
 also do so with the patches applied.
   - With 64k sectorsize & nodesize, all the tests that succeed with
 the for-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
 also do so with the patches applied.

TODO:
1. On ppc64, btrfsck segfaults when checking a filesystem instance
   having 2k sectorsize.
2. I am planning to fix scrub & compression via a separate patchset.

Changes from V19:
1. The patchset has been rebased on top of kdave/for-next branch.
2. The patch "Btrfs: subpage-blocksize: extent_clear_unlock_delalloc:
   Prevent page from being unlocked more than once" changes the
   signatures of the functions "cow_file_range" &
   "extent_clear_unlock_delalloc". This patch has now been moved to be
   the first patch in the patchset.
3. A new patch "Btrfs: subpage-blocksize: Rate limit scrub error
   message" has been added. btrfs/073 invokes the scrub ioctl in a
   tight loop. In subpage-blocksize scenario this results in a lot of
   "scrub: size assumption sectorsize != PAGE_SIZE" messages being
   printed on the console. Hence this patch rate limits such error
   messages.

Changes from V18:
1. The per-page bitmap used to track the block status is now allocated
   from a slab cache.
2. The per-page bitmap is allocated and used only in cases where
   sectorsize < PAGE_SIZE.
3. The new patch "Btrfs: subpage-blocksize: Disable compression"
   disables compression in subpage-blocksize scenario.

Changes from V17:
1. Due to mistakes made during git rebase operations, fixes ended up
   in incorrect patches. This patchset gets the fixes in the right
   patches.

Changes from V16:
1. The V15 patchset consisted of patches obtained from an incorrect
   git branch. Apologies for the mistake. All the entries listed under
   "Changes from V15" hold good for V16.

Changes from V15:
1. The invocation of cleancache_get_page() in __do_readpage() assumed
   blocksize to be same as PAGE_SIZE. We now invoke cleancache_get_page()
   only if blocksize is same as PAGE_SIZE. Thanks to David Sterba for
   pointing this out.
2. In __extent_writepage_io() we used to accumulate all the contiguous
   dirty blocks within the page before submitting the file offset range
   for I/O. In some cases this caused the bio to span across more than
   a stripe. For example, with 4k block size, 64K stripe size
   and 64K page size, assume
   - All the blocks mapped by the page are contiguous on the logical
 address space.
   - The first block of the page is mapped to the second block of the
 stripe.
   In such a scenario, we would add all the blocks of the page to
   bio. This would mean that we would overflow the stripe by one 4K
   block. Hence this patchset removes the optimization and invokes
   submit_extent_page() for 
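
The stripe overflow described in the V15 example above can be checked with a little arithmetic. This is a hedged sketch: `stripe_overflow` is a hypothetical helper written for this illustration and does not exist in Btrfs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes by which an I/O of io_len bytes,
 * starting offset_in_stripe bytes into a stripe of stripe_len bytes,
 * runs past the end of that stripe. Returns 0 if the I/O fits.
 */
static uint64_t stripe_overflow(uint64_t offset_in_stripe, uint64_t io_len,
				uint64_t stripe_len)
{
	uint64_t io_end = offset_in_stripe + io_len;

	return io_end > stripe_len ? io_end - stripe_len : 0;
}
```

For the case in the text (64K page, 64K stripe, first block of the page mapped to the second 4K block of the stripe), `stripe_overflow(4096, 65536, 65536)` yields 4096, i.e. one 4K block past the stripe boundary.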

[PATCH V20 04/19] Btrfs: subpage-blocksize: Define extent_buffer_head

2016-07-03 Thread Chandan Rajendra
In order to handle multiple extent buffers per page, first we need to create a
way to handle all the extent buffers that are attached to a page.

This patch creates a new data structure 'struct extent_buffer_head', and moves
fields that are common to all extent buffers from 'struct extent_buffer' to
'struct extent_buffer_head'.

Also, this patch moves EXTENT_BUFFER_TREE_REF, EXTENT_BUFFER_DUMMY and
EXTENT_BUFFER_IN_TREE flags from extent_buffer->ebflags  to
extent_buffer_head->bflags.

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.c   |   4 +-
 fs/btrfs/ctree.h   |   6 +-
 fs/btrfs/disk-io.c |  72 ++--
 fs/btrfs/extent-tree.c |   6 +-
 fs/btrfs/extent_io.c   | 602 ++---
 fs/btrfs/extent_io.h   |  63 ++--
 fs/btrfs/root-tree.c   |   2 +-
 fs/btrfs/super.c   |   9 +-
 fs/btrfs/tests/btrfs-tests.c   |  12 +-
 fs/btrfs/tests/extent-io-tests.c   |   5 +-
 fs/btrfs/tests/free-space-tree-tests.c |  79 +++--
 fs/btrfs/volumes.c |   2 +-
 include/trace/events/btrfs.h   |   2 +-
 13 files changed, 557 insertions(+), 307 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e8a3ac6..4e35a21 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -160,7 +160,7 @@ struct extent_buffer *btrfs_root_node(struct btrfs_root 
*root)
 * the inc_not_zero dance and if it doesn't work then
 * synchronize_rcu and try again.
 */
-   if (atomic_inc_not_zero(&eb->refs)) {
+   if (atomic_inc_not_zero(&eb_head(eb)->refs)) {
rcu_read_unlock();
break;
}
@@ -1772,7 +1772,7 @@ static noinline int generic_bin_search(struct 
extent_buffer *eb,
int err;
 
if (low > high) {
-   btrfs_err(eb->fs_info,
+   btrfs_err(eb_head(eb)->fs_info,
 "%s: low (%d) < high (%d) eb %llu owner %llu level %d",
  __func__, low, high, eb->start,
  btrfs_header_owner(eb), btrfs_header_level(eb));
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index cc65e9b..893bedb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1475,14 +1475,16 @@ static inline void btrfs_set_token_##name(struct 
extent_buffer *eb, \
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)\
 static inline u##bits btrfs_##name(struct extent_buffer *eb)   \
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = page_address(eb_head(eb)->pages[0]) + \
+   (eb->start & (PAGE_SIZE -1));   \
u##bits res = le##bits##_to_cpu(p->member); \
return res; \
 }  \
 static inline void btrfs_set_##name(struct extent_buffer *eb,  \
u##bits val)\
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = page_address(eb_head(eb)->pages[0]) + \
+   (eb->start & (PAGE_SIZE -1));   \
p->member = cpu_to_le##bits(val);   \
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 685c81a..299f353 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -375,10 +375,9 @@ static int verify_parent_transid(struct extent_io_tree 
*io_tree,
ret = 0;
goto out;
}
-   btrfs_err_rl(eb->fs_info,
+   btrfs_err_rl(eb_head(eb)->fs_info,
"parent transid verify failed on %llu wanted %llu found %llu",
-   eb->start,
-   parent_transid, btrfs_header_generation(eb));
+   eb->start, parent_transid, btrfs_header_generation(eb));
ret = 1;
 
/*
@@ -452,7 +451,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root 
*root,
int mirror_num = 0;
int failed_mirror = 0;
 
-   clear_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
+   clear_bit(EXTENT_BUFFER_CORRUPT, &eb->ebflags);
io_tree = &BTRFS_I(root->fs_info->btree_inode)->io_tree;
while (1) {
ret = read_extent_buffer_pages(io_tree, eb, start,
@@ -471,7 +470,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root 
*root,
 * there is no reason to read the other copies, they won't be
 * any less wrong.
 */
-   if 

[PATCH V20 05/19] Btrfs: subpage-blocksize: Read tree blocks whose size is < PAGE_SIZE

2016-07-03 Thread Chandan Rajendra
In the case of subpage-blocksize, this patch makes it possible to read
only a single metadata block from the disk instead of all the metadata
blocks that map into a page.
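
Reading a single block requires locating the extent buffer that covers a given offset among the buffers attached to a page. The following userspace sketch shows one way such a lookup could work; `struct sketch_eb` and `sketch_find_eb` are hypothetical stand-ins, and only the `start`, `len` and `eb_next` field names are taken from the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an extent buffer chained to its neighbours. */
struct sketch_eb {
	uint64_t start;			/* logical start of the block */
	uint64_t len;			/* block length in bytes */
	struct sketch_eb *eb_next;	/* next buffer in the same page */
};

/*
 * Walk the chain and return the buffer whose [start, start + len) range
 * contains offset, or NULL if no buffer covers it.
 */
static struct sketch_eb *sketch_find_eb(struct sketch_eb *first,
					uint64_t offset)
{
	struct sketch_eb *eb;

	for (eb = first; eb != NULL; eb = eb->eb_next) {
		if (eb->start <= offset && offset < eb->start + eb->len)
			return eb;
	}
	return NULL;
}
```

The sketch returns NULL on a miss, whereas the patch asserts that a matching buffer is always found.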

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c   |  52 -
 fs/btrfs/disk-io.h   |   3 ++
 fs/btrfs/extent_io.c | 128 +++
 3 files changed, 142 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 299f353..b09d3e3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -612,29 +612,36 @@ static noinline int check_leaf(struct btrfs_root *root,
return 0;
 }
 
-static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
- u64 phy_offset, struct page *page,
- u64 start, u64 end, int mirror)
+int verify_extent_buffer_read(struct btrfs_io_bio *io_bio,
+   struct page *page,
+   u64 start, u64 end, int mirror)
 {
-   u64 found_start;
-   int found_level;
+   struct address_space *mapping = 
(io_bio->bio).bi_io_vec->bv_page->mapping;
+   struct extent_buffer_head *ebh;
struct extent_buffer *eb;
-   struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
+   struct btrfs_root *root = BTRFS_I(mapping->host)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
-   int ret = 0;
+   u64 found_start;
+   int found_level;
int reads_done;
-
-   if (!page->private)
-   goto out;
+   int ret = 0;
 
eb = (struct extent_buffer *)page->private;
+   do {
+   if ((eb->start <= start) && (eb->start + eb->len - 1 > start))
+   break;
+   } while ((eb = eb->eb_next) != NULL);
+
+   ASSERT(eb);
+
+   ebh = eb_head(eb);
 
/* the pending IO might have been the only thing that kept this buffer
 * in memory.  Make sure we have a ref for all this other checks
 */
extent_buffer_get(eb);
 
-   reads_done = atomic_dec_and_test(&eb_head(eb)->io_bvecs);
+   reads_done = atomic_dec_and_test(&ebh->io_bvecs);
if (!reads_done)
goto err;
 
@@ -690,30 +697,13 @@ err:
btree_readahead_hook(fs_info, eb, eb->start, ret);
 
if (ret) {
-   /*
-* our io error hook is going to dec the io pages
-* again, we have to make sure it has something
-* to decrement
-*/
atomic_inc(&eb_head(eb)->io_bvecs);
clear_extent_buffer_uptodate(eb);
}
-   free_extent_buffer(eb);
-out:
-   return ret;
-}
 
-static int btree_io_failed_hook(struct page *page, int failed_mirror)
-{
-   struct extent_buffer *eb;
+   free_extent_buffer(eb);
 
-   eb = (struct extent_buffer *)page->private;
-   set_bit(EXTENT_BUFFER_READ_ERR, &eb->ebflags);
-   eb->read_mirror = failed_mirror;
-   atomic_dec(&eb_head(eb)->io_bvecs);
-   if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->ebflags))
-   btree_readahead_hook(eb_head(eb)->fs_info, eb, eb->start, -EIO);
-   return -EIO;/* we fixed nothing */
+   return ret;
 }
 
 static void end_workqueue_bio(struct bio *bio)
@@ -4534,8 +4524,6 @@ static int btrfs_cleanup_transaction(struct btrfs_root 
*root)
 }
 
 static const struct extent_io_ops btree_extent_io_ops = {
-   .readpage_end_io_hook = btree_readpage_end_io_hook,
-   .readpage_io_failed_hook = btree_io_failed_hook,
.submit_bio_hook = btree_submit_bio_hook,
/* note we're sharing with inode.c for the merge bio hook */
.merge_bio_hook = btrfs_merge_bio_hook,
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index acba821..a81ff8d 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -113,6 +113,9 @@ static inline void btrfs_put_fs_root(struct btrfs_root 
*root)
kfree(root);
 }
 
+int verify_extent_buffer_read(struct btrfs_io_bio *io_bio,
+   struct page *page,
+   u64 start, u64 end, int mirror);
 void btrfs_mark_buffer_dirty(struct extent_buffer *buf);
 int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
  int atomic);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 080baf7..a425f90 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -14,6 +14,7 @@
 #include "extent_io.h"
 #include "extent_map.h"
 #include "ctree.h"
+#include "disk-io.h"
 #include "btrfs_inode.h"
 #include "volumes.h"
 #include "check-integrity.h"
@@ -2207,7 +2208,7 @@ int repair_eb_io_failure(struct btrfs_root *root, struct 
extent_buffer *eb,
struct page *p = eb_head(eb)->pages[i];
 
ret = repair_io_failure(root->fs_info->btree_inode, start,
-   PAGE_SIZE, start, p,
+  

[PATCH V20 01/19] Btrfs: subpage-blocksize: Fix whole page read.

2016-07-03 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles reading data from files.

To track the status of individual blocks of a page, this patch makes use
of a bitmap pointed to by the newly introduced per-page 'struct
btrfs_page_private'.

The per-page btrfs_page_private->io_lock plays the same role as
BH_Uptodate_Lock (see end_buffer_async_read()) i.e. without the io_lock
we may end up in the following situation,

NOTE: Assume 64k page size and 4k block size. Also assume that the first
12 blocks of the page are contiguous while the next 4 blocks are
contiguous. When reading the page we end up submitting two "logical
address space" bios. So end_bio_extent_readpage function is invoked
twice, once for each bio.

|-------------------------+-------------------------+-------------|
| Task A                  | Task B                  | Task C      |
|-------------------------+-------------------------+-------------|
| end_bio_extent_readpage |                         |             |
| process block 0         |                         |             |
| - clear BLK_STATE_IO    |                         |             |
| - page_read_complete    |                         |             |
| process block 1         |                         |             |
|                         |                         |             |
|                         |                         |             |
|                         | end_bio_extent_readpage |             |
|                         | process block 0         |             |
|                         | - clear BLK_STATE_IO    |             |
|                         | - page_read_complete    |             |
|                         | process block 1         |             |
|                         |                         |             |
| process block 11        | process block 3         |             |
| - clear BLK_STATE_IO    | - clear BLK_STATE_IO    |             |
| - page_read_complete    | - page_read_complete    |             |
|   - returns true        |   - returns true        |             |
|   - unlock_page()       |                         |             |
|                         |                         | lock_page() |
|                         |   - unlock_page()       |             |
|-------------------------+-------------------------+-------------|

We end up incorrectly unlocking the page twice and "Task C" ends up
working on an unlocked page. So private->io_lock makes sure that only
one of the tasks gets "true" as the return value when page_io_complete()
is invoked. As an optimization the patch gets the io_lock only when the
last block of the bio_vec is being processed.
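
The role of the lock can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the patch's implementation: it uses a pthread mutex in place of the kernel spinlock, a 16-bit mask in place of the per-page bitmap, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCKS_PER_PAGE 16	/* 64k page / 4k block */

/* Simplified stand-in for the per-page private data. */
struct sketch_page_private {
	pthread_mutex_t io_lock;
	uint32_t io_pending;	/* bit n set: block n still has I/O in flight */
};

static void sketch_page_init(struct sketch_page_private *pp)
{
	pthread_mutex_init(&pp->io_lock, NULL);
	/* Mark every block of the page as having I/O in flight. */
	pp->io_pending = (1U << SKETCH_BLOCKS_PER_PAGE) - 1;
}

/*
 * Clear the in-flight bits for blocks first..last and report whether this
 * call finished the page. Clearing and checking happen under one lock, so
 * when callers cover disjoint blocks exactly one of them observes the
 * transition to "no I/O pending" and may unlock the page.
 */
static bool sketch_end_block_io(struct sketch_page_private *pp,
				unsigned int first, unsigned int last)
{
	uint32_t mask;
	bool completed;

	assert(first <= last && last < SKETCH_BLOCKS_PER_PAGE);
	mask = ((1U << (last - first + 1)) - 1) << first;

	pthread_mutex_lock(&pp->io_lock);
	pp->io_pending &= ~mask;
	completed = (pp->io_pending == 0);
	pthread_mutex_unlock(&pp->io_lock);

	return completed;
}
```

In the scenario above, the bio covering blocks 0 to 11 and the bio covering blocks 12 to 15 each call `sketch_end_block_io()` once; only the call that clears the last pending bit returns true.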

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 371 ---
 fs/btrfs/extent_io.h |  74 +-
 fs/btrfs/inode.c |  16 +--
 3 files changed, 338 insertions(+), 123 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e197d47..a349f99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
+static struct kmem_cache *page_private_cache;
 static struct bio_set *btrfs_bioset;
 
 static inline bool extent_state_in_tree(const struct extent_state *state)
@@ -174,10 +175,16 @@ int __init extent_io_init(void)
if (!extent_buffer_cache)
goto free_state_cache;
 
+   page_private_cache = kmem_cache_create("btrfs_page_private",
+   sizeof(struct btrfs_page_private), 0,
+   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+   if (!page_private_cache)
+   goto free_buffer_cache;
+
btrfs_bioset = bioset_create(BIO_POOL_SIZE,
 offsetof(struct btrfs_io_bio, bio));
if (!btrfs_bioset)
-   goto free_buffer_cache;
+   goto free_page_private_cache;
 
if (bioset_integrity_create(btrfs_bioset, BIO_POOL_SIZE))
goto free_bioset;
@@ -188,6 +195,10 @@ free_bioset:
bioset_free(btrfs_bioset);
btrfs_bioset = NULL;
 
+free_page_private_cache:
+   kmem_cache_destroy(page_private_cache);
+   page_private_cache = NULL;
+
 free_buffer_cache:
kmem_cache_destroy(extent_buffer_cache);
extent_buffer_cache = NULL;
@@ -1323,6 +1334,95 @@ int clear_record_extent_bits(struct extent_io_tree 
*tree, u64 start, u64 end,
  changeset);
 }
 
+static int modify_page_blks_state(struct page *page,
+   unsigned long blk_states,
+   u64 start, u64 end, int set)
+{
+   struct inode *inode = page->mapping->host;
+   unsigned long *bitmap;
+   unsigned long first_state;
+   unsigned long state;
+   u64 nr_blks;
+   u64 blk;

[PATCH V20 06/19] Btrfs: subpage-blocksize: Write only dirty extent buffers belonging to a page

2016-07-03 Thread Chandan Rajendra
For the subpage-blocksize scenario, this patch adds the ability to write
a single extent buffer to the disk.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c   |  32 +++---
 fs/btrfs/extent_io.c | 277 +--
 2 files changed, 242 insertions(+), 67 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b09d3e3..2d9e86b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -504,28 +504,30 @@ static int btree_read_extent_buffer_pages(struct 
btrfs_root *root,
 
 static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
 {
-   u64 start = page_offset(page);
-   u64 found_start;
struct extent_buffer *eb;
+   u64 found_start;
+   int ret;
 
eb = (struct extent_buffer *)page->private;
if (page != eb_head(eb)->pages[0])
return 0;
 
-   found_start = btrfs_header_bytenr(eb);
-   /*
-* Please do not consolidate these warnings into a single if.
-* It is useful to know what went wrong.
-*/
-   if (WARN_ON(found_start != start))
-   return -EUCLEAN;
-   if (WARN_ON(!PageUptodate(page)))
-   return -EUCLEAN;
-
-   ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
-   btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
+   do {
+   if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags))
+   continue;
+   if (WARN_ON(!test_bit(EXTENT_BUFFER_UPTODATE, &eb->ebflags)))
+   continue;
+   found_start = btrfs_header_bytenr(eb);
+   if (WARN_ON(found_start != eb->start))
+   return 0;
+   ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
+   btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
+   ret = csum_tree_block(fs_info, eb, 0);
+   if (ret)
+   return ret;
+   } while ((eb = eb->eb_next) != NULL);
 
-   return csum_tree_block(fs_info, eb, 0);
+   return 0;
 }
 
 static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a425f90..2b5fc13 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3724,29 +3724,49 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
TASK_UNINTERRUPTIBLE);
 }
 
-static noinline_for_stack int
-lock_extent_buffer_for_io(struct extent_buffer *eb,
- struct btrfs_fs_info *fs_info,
- struct extent_page_data *epd)
+static void lock_extent_buffer_pages(struct extent_buffer_head *ebh,
+   struct extent_page_data *epd)
 {
+   struct extent_buffer *eb = &ebh->eb;
unsigned long i, num_pages;
-   int flush = 0;
+
+   num_pages = num_extent_pages(eb->start, eb->len);
+   for (i = 0; i < num_pages; i++) {
+   struct page *p = ebh->pages[i];
+   if (!trylock_page(p)) {
+   flush_write_bio(epd);
+   lock_page(p);
+   }
+   }
+
+   return;
+}
+
+static int noinline_for_stack
+lock_extent_buffer_for_io(struct extent_buffer *eb,
+   struct btrfs_fs_info *fs_info,
+   struct extent_page_data *epd)
+{
+   int dirty;
int ret = 0;
 
if (!btrfs_try_tree_write_lock(eb)) {
-   flush = 1;
flush_write_bio(epd);
btrfs_tree_lock(eb);
}
 
if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags)) {
+   dirty = test_bit(EXTENT_BUFFER_DIRTY, &eb->ebflags);
btrfs_tree_unlock(eb);
-   if (!epd->sync_io)
-   return 0;
-   if (!flush) {
-   flush_write_bio(epd);
-   flush = 1;
+   if (!epd->sync_io) {
+   if (!dirty)
+   return 1;
+   else
+   return 2;
}
+
+   flush_write_bio(epd);
+
while (1) {
wait_on_extent_buffer_writeback(eb);
btrfs_tree_lock(eb);
@@ -3769,29 +3789,14 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
__percpu_counter_add(&fs_info->dirty_metadata_bytes,
 -eb->len,
 fs_info->dirty_metadata_batch);
-   ret = 1;
+   ret = 0;
} else {
spin_unlock(&eb_head(eb)->refs_lock);
+   ret = 1;
}
 
btrfs_tree_unlock(eb);
 
-   if (!ret)
-   return ret;
-
-   num_pages = num_extent_pages(eb->start, eb->len);
-   for (i = 0; i < num_pages; i++) {
-   struct page *p = eb_head(eb)->pages[i];
-
-

Re: [PATCH v9 2/5] btrfs-progs: dedupe: Add enable command for dedupe command group

2016-07-03 Thread Qu Wenruo



At 06/30/2016 05:24 PM, Qu Wenruo wrote:

Add enable subcommand for dedupe command group.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-dedupe-inband.asciidoc | 114 ++-
 btrfs-completion   |   6 +-
 cmds-dedupe-ib.c   | 222 +
 ioctl.h|   2 +
 4 files changed, 342 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-dedupe-inband.asciidoc b/Documentation/btrfs-dedupe-inband.asciidoc
index 9ee2bc7..82f970a 100644
--- a/Documentation/btrfs-dedupe-inband.asciidoc
+++ b/Documentation/btrfs-dedupe-inband.asciidoc
@@ -22,7 +22,119 @@ use with caution.

 SUBCOMMAND
 --
-Nothing yet
+*enable* [options] ::
+Enable in-band de-duplication for a filesystem.
++
+`Options`
++
+-f|--force
+Force 'enable' command to be executed.
+Will skip memory limit check and allow 'enable' to be executed even if in-band
+de-duplication is already enabled.
++
+NOTE: If dedupe is re-enabled with the '-f' option, any unspecified parameter
+will be reset to its default value.
+
+-s|--storage-backend 
+Specify de-duplication hash storage backend.
+Only the 'inmemory' backend is supported so far.
+If not specified, default value is 'inmemory'.
++
+Refer to *BACKENDS* section for more information.
+
+-b|--blocksize 
+Specify dedupe block size.
+Supported values are powers of 2 from '16K' to '8M'.
+Default value is '128K'.
++
+Refer to *DEDUPE BLOCK SIZE* section for more information.
+
+-a|--hash-algorithm 
+Specify hash algorithm.
+Only 'sha256' is supported so far.
+
+-l|--limit-hash 
+Specify maximum number of hashes stored in memory.
+Only works for 'inmemory' backend.
+Conflicts with '-m' option.
++
+Only positive values are valid.
+Default value is '32K'.
+
+-m|--limit-memory 
+Specify maximum memory used for hashes.
+Only works for 'inmemory' backend.
+Conflicts with '-l' option.
++
+Only values larger than or equal to '1024' are valid.
+No default value.
++
+NOTE: Memory limit will be rounded down to kernel internal hash size,
+so the memory limit shown in 'btrfs dedupe status' may be different
+from the .
+
+WARNING: Too large a value for '-l' or '-m' can easily trigger OOM.
+Please use with caution according to system memory.
+
+NOTE: In-band de-duplication is not compatible with compression yet.
+Compression has higher priority than in-band de-duplication, meaning that if
+compression and de-duplication are enabled at the same time, only compression
+will work.
+
+BACKENDS
+
+Btrfs in-band de-duplication will support different storage backends, with
+different use cases and features.
+
+In-memory backend::
+This backend provides backward compatibility and more fine-tuning options.
+But the hash pool is non-persistent and may exhaust kernel memory if not set up
+properly.
++
+This backend can be used on old btrfs (without the '-O dedupe' mkfs option).
+When used on old btrfs, this backend needs to be enabled manually after mount.
++
+Designed for fast hash search speed, in-memory backend will keep all dedupe
+hashes in memory. (Although overall performance is still much the same with
+'ondisk' backend if all 'ondisk' hash can be cached in memory)
++
+It keeps only a limited number of hashes in memory to avoid exhausting memory.
+Hashes over the limit will be dropped following Least-Recently-Used (LRU) behavior.
+So this backend has a consistent overhead for a given limit but can\'t ensure
+all duplicated blocks will be de-duplicated.
++
+After umount and mount, the in-memory backend needs to refill its hash pool.
+
+On-disk backend::
+This backend provides a persistent hash pool, with smarter memory management
+for the hash pool.
+But it\'s not backward-compatible, meaning it must be used with the '-O dedupe' mkfs
+option and older kernels can\'t mount it read-write.
++
+Designed for de-duplication rate, hash pool is stored as btrfs B+ tree on disk.
+This behavior may cause extra disk IO for hash search under high memory
+pressure.
++
+After umount and mount, on-disk backend still has its hash on disk, no need to
+refill its dedupe hash pool.
+
+Currently, only 'inmemory' backend is supported in btrfs-progs.
+
+DEDUPE BLOCK SIZE
+
+In-band de-duplication is done at dedupe block size.
+Any data smaller than dedupe block size won\'t go through in-band
+de-duplication.
+
+Dedupe block size heavily affects dedupe rate and fragmentation.
+
+A smaller block size will cause more fragments, but a higher dedupe rate.
+
+A larger block size will cause fewer fragments, but a lower dedupe rate.
+
+In-band de-duplication rate is highly related to the workload pattern.
+So it\'s highly recommended to align dedupe block size to the workload
+block size to make full use of de-duplication.

 EXIT STATUS
 ---
diff --git a/btrfs-completion b/btrfs-completion
index 3ede77b..50f7ea2 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -29,7 +29,7 @@ _btrfs()

local cmd=${words[1]}

-

[PATCH] btrfs-progs: use the correct struct for BTRFS_IOC_LOGICAL_INO

2016-07-03 Thread Hans van Kranenburg
BTRFS_IOC_LOGICAL_INO takes a btrfs_ioctl_logical_ino_args as argument,
not a btrfs_ioctl_ino_path_args. The lines were probably copy/pasted
when the code was written.

Since btrfs_ioctl_logical_ino_args and btrfs_ioctl_ino_path_args have
the same size, the actual IOCTL definition here does not change.

But, it makes the code less confusing for the reader.

Signed-off-by: Hans van Kranenburg 
---
 ioctl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ioctl.h b/ioctl.h
index 5f18bcb..620dd3d 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -685,7 +685,7 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
 #define BTRFS_IOC_INO_PATHS _IOWR(BTRFS_IOCTL_MAGIC, 35, \
struct btrfs_ioctl_ino_path_args)
 #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \
-   struct btrfs_ioctl_ino_path_args)
+   struct btrfs_ioctl_logical_ino_args)
 #define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \
struct btrfs_ioctl_received_subvol_args)
 #define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct btrfs_ioctl_send_args)
-- 
2.8.1



[PATCH] Btrfs: use the correct struct for BTRFS_IOC_LOGICAL_INO

2016-07-03 Thread Hans van Kranenburg
BTRFS_IOC_LOGICAL_INO takes a btrfs_ioctl_logical_ino_args as argument,
not a btrfs_ioctl_ino_path_args. The lines were probably copy/pasted
when the code was written.

Since btrfs_ioctl_logical_ino_args and btrfs_ioctl_ino_path_args have
the same size, the actual IOCTL definition here does not change.

But, it makes the code less confusing for the reader.

Signed-off-by: Hans van Kranenburg 
---
 include/uapi/linux/btrfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 2bdd1e3..ac5eacd 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -798,7 +798,7 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
 #define BTRFS_IOC_INO_PATHS _IOWR(BTRFS_IOCTL_MAGIC, 35, \
struct btrfs_ioctl_ino_path_args)
 #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \
-   struct btrfs_ioctl_ino_path_args)
+   struct btrfs_ioctl_logical_ino_args)
 #define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \
struct btrfs_ioctl_received_subvol_args)
 #define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct btrfs_ioctl_send_args)
-- 
2.8.1



Re: btrfs defrag questions

2016-07-03 Thread Adam Borowski
On Sun, Jul 03, 2016 at 04:15:02PM +0200, Henk Slager wrote:
> >> Provided that Dropbox is running in the system, does it mean that it
> >> cannot be defragmented?
> >
> > That is probably true. Files that are mapped into memory (like running
> > executables) cannot be changed on disk. You could make a copy of that
> > file, remove the original, and rename the new into place. As long as
> > the executable is running it will stay on disk but you can now
> > defragment the file and next time dropbox is started it will use the
> > new one.
> 
> I get:
> ERROR: cannot open ./dropbox: Text file busy
> 
> when I run:
> btrfs fi defrag -v ./dropbox
> 
> This is with kernel 4.6.2 and progs 4.6.1, dropbox running and mount
> option compress=lzo

This is the same thing as with dedupe: the kernel requires you to have the
file opened for writing despite there being no direct reasons for this.
Defragging is not a write operation in POSIX sense: it doesn't alter the
file's contents in any way.

I think it'd be good to relax this requirement to check whether the user
_could_ open the file for writing (ie, cap or w permissions).

-- 
An imaginary friend squared is a real enemy.


Re: btrfs defrag questions

2016-07-03 Thread Henk Slager
On Sun, Jul 3, 2016 at 12:33 PM, Kai Krakow  wrote:
> On Fri, 1 Jul 2016 22:14:00 +0200,
> Dmitry Katsubo wrote:
>
>> Hello everyone,
>>
>> Question #1:
>>
>> While doing defrag I got the following message:
>>
>> # btrfs fi defrag -r /home
>> ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success
>> total 1 failures
>>
>> I feel that something went wrong, but the message is a bit misleading.
>>
>> Provided that Dropbox is running in the system, does it mean that it
>> cannot be defragmented?
>
> That is probably true. Files that are mapped into memory (like running
> executables) cannot be changed on disk. You could make a copy of that
> file, remove the original, and rename the new into place. As long as
> the executable is running it will stay on disk but you can now
> defragment the file and next time dropbox is started it will use the
> new one.

I get:
ERROR: cannot open ./dropbox: Text file busy

when I run:
btrfs fi defrag -v ./dropbox

This is with kernel 4.6.2 and progs 4.6.1, dropbox running and mount
option compress=lzo


Re: btrfs defrag questions

2016-07-03 Thread Kai Krakow
On Fri, 1 Jul 2016 22:14:00 +0200,
Dmitry Katsubo wrote:

> Hello everyone,
> 
> Question #1:
> 
> While doing defrag I got the following message:
> 
> # btrfs fi defrag -r /home
> ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success
> total 1 failures
> 
> I feel that something went wrong, but the message is a bit misleading.
> 
> Provided that Dropbox is running in the system, does it mean that it
> cannot be defragmented?

That is probably true. Files that are mapped into memory (like running
executables) cannot be changed on disk. You could make a copy of that
file, remove the original, and rename the new into place. As long as
the executable is running it will stay on disk but you can now
defragment the file and next time dropbox is started it will use the
new one.

> Question #2:
> 
> Suppose that in above example /home/ftp is mounted as another btrfs
> array (not subvolume). Will 'btrfs fi defrag -r /home' defragment it
> (recursively) as well?

Yes, last time I tried the command crossed file system boundaries. It
will simply report ioctl errors if it operates on incompatible files
and continue its way.

-- 
Regards,
Kai

Replies to list-only preferred.
