Machine lockup due to btrfs-transaction on AWS EC2 Ubuntu 14.04

2014-07-29 Thread Peter Waller
Hi All,

I've reported a bug with Ubuntu here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349711

The machine in question has one BTRFS volume which is 87% full and
lives on a Logical Volume Manager (LVM) block device on top of one
Amazon Elastic Block Store (EBS) device.

We have other machines in a similar configuration which have not
displayed this behaviour.

The one thing which makes this machine different is that it has
directories which contain many thousands of files. We don't make heavy
use of subvolumes or snapshots.

More details follow:

# cat /proc/version_signature
Ubuntu 3.13.0-32.57-generic 3.13.11.4

The machine had a soft-lockup with messages like this appearing on the console:

[246736.752053] INFO: rcu_sched self-detected stall on CPU { 0} (t=2220246 jiffies g=35399662 c=35399661 q=0)
[246736.756059] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=2220247 jiffies, g=35399662, c=35399661, q=0)
[246764.192014] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u30:2:1828]
[246764.212058] BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:492]


After the first lockup and reboot, the following messages were in
dmesg. I ignored them because, after some research, I saw that they
had been changed to warnings and were considered harmless. A btrfs scrub
performed after this finished without error:


[ 77.609490] BTRFS error (device dm-0): block group 10766778368 has wrong amount of free space
[ 77.613678] BTRFS error (device dm-0): failed to load free space cache for block group 10766778368
[ 77.643801] BTRFS error (device dm-0): block group 19356712960 has wrong amount of free space
[ 77.648952] BTRFS error (device dm-0): failed to load free space cache for block group 19356712960
[ 77.926325] BTRFS error (device dm-0): block group 20430454784 has wrong amount of free space
[ 77.931078] BTRFS error (device dm-0): failed to load free space cache for block group 20430454784
[ 78.111437] BTRFS error (device dm-0): block group 21504196608 has wrong amount of free space
[ 78.116165] BTRFS error (device dm-0): failed to load free space cache for block group 21504196608


After the second lockup and reboot, these messages appeared:


[ 45.390221] BTRFS error (device dm-0): free space inode generation (0) did not match free space cache generation (70012)
[ 45.413472] BTRFS error (device dm-0): free space inode generation (0) did not match free space cache generation (70012)
[ 467.423961] BTRFS error (device dm-0): block group 518646661120 has wrong amount of free space
[ 467.429251] BTRFS error (device dm-0): failed to load free space cache for block group 518646661120


I would like to know whether this second set of messages is harmful and
whether remedial action is needed in response to it.
Searching for messages similar to my lockup, I found this report, which
suggested the problem may be fixed in 3.14.

Any advice appreciated,

Thanks,

- Peter
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Machine lockup due to btrfs-transaction on AWS EC2 Ubuntu 14.04

2014-07-29 Thread Peter Waller
Someone on IRC suggested that I clear the free space cache:

 sudo mount -o remount,clear_cache /path/to/dev /path/to/mount
 sudo mount -o remount,space_cache /path/to/dev /path/to/mount


The former command printed `btrfs: disk space caching is enabled` and
the latter repeated it, making me think that maybe the latter was
unnecessary.



[PATCH v2 09/12] Btrfs: modify clean_io_failure and make it suit direct io

2014-07-29 Thread Miao Xie
We could not use clean_io_failure in the direct IO path because it got the
filesystem information from the page structure, but the pages in a direct
IO bio do not carry filesystem information in their structure. So we need
to modify it and pass all the information it needs via parameters.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1-v2:
- None
---
 fs/btrfs/extent_io.c | 31 +++
 fs/btrfs/extent_io.h |  6 +++---
 fs/btrfs/scrub.c |  3 +--
 3 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1389759..8082220 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1992,10 +1992,10 @@ static int free_io_failure(struct inode *inode, struct 
io_failure_record *rec)
  * currently, there can be no more than two copies of every data bit. thus,
  * exactly one rewrite is required.
  */
-int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
-   u64 length, u64 logical, struct page *page,
-   unsigned int pg_offset, int mirror_num)
+int repair_io_failure(struct inode *inode, u64 start, u64 length, u64 logical,
+ struct page *page, unsigned int pg_offset, int mirror_num)
 {
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
struct bio *bio;
struct btrfs_device *dev;
u64 map_length = 0;
@@ -2043,10 +2043,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 
start,
}
 
printk_ratelimited_in_rcu(KERN_INFO
-   "BTRFS: read error corrected: ino %lu off %llu "
-   "(dev %s sector %llu)\n", page->mapping->host->i_ino,
-   start, rcu_str_deref(dev->name), sector);
-
+ "BTRFS: read error corrected: ino %llu off %llu (dev %s sector %llu)\n",
+ btrfs_ino(inode), start,
+ rcu_str_deref(dev->name), sector);
bio_put(bio);
return 0;
 }
@@ -2063,9 +2062,10 @@ int repair_eb_io_failure(struct btrfs_root *root, struct 
extent_buffer *eb,
 
for (i = 0; i < num_pages; i++) {
struct page *p = extent_buffer_page(eb, i);
-   ret = repair_io_failure(root->fs_info, start, PAGE_CACHE_SIZE,
-   start, p, start - page_offset(p),
-   mirror_num);
+
+   ret = repair_io_failure(root->fs_info->btree_inode, start,
+   PAGE_CACHE_SIZE, start, p,
+   start - page_offset(p), mirror_num);
if (ret)
break;
start += PAGE_CACHE_SIZE;
@@ -2078,12 +2078,12 @@ int repair_eb_io_failure(struct btrfs_root *root, 
struct extent_buffer *eb,
  * each time an IO finishes, we do a fast check in the IO failure tree
  * to see if we need to process or clean up an io_failure_record
  */
-static int clean_io_failure(u64 start, struct page *page)
+static int clean_io_failure(struct inode *inode, u64 start,
+   struct page *page, unsigned int pg_offset)
 {
u64 private;
u64 private_failure;
struct io_failure_record *failrec;
-   struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
struct extent_state *state;
int num_copies;
@@ -2123,10 +2123,9 @@ static int clean_io_failure(u64 start, struct page *page)
num_copies = btrfs_num_copies(fs_info, failrec->logical,
  failrec->len);
if (num_copies > 1)  {
-   repair_io_failure(fs_info, start, failrec->len,
+   repair_io_failure(inode, start, failrec->len,
  failrec->logical, page,
- start - page_offset(page),
- failrec->failed_mirror);
+ pg_offset, failrec->failed_mirror);
}
}
 
@@ -2535,7 +2534,7 @@ static void end_bio_extent_readpage(struct bio *bio, int 
err)
if (ret)
uptodate = 0;
else
-   clean_io_failure(start, page);
+   clean_io_failure(inode, start, page, 0);
}
 
if (likely(uptodate))
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4366453..7662eaa 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -341,9 +341,9 @@ struct bio *btrfs_bio_clone(struct bio *bio, gfp_t 
gfp_mask);
 
 struct btrfs_fs_info;
 
-int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
-   u64 length, u64 logical, struct page *page,
-   unsigned int pg_offset, 

[PATCH v2 00/12] Implement the data repair function for direct read

2014-07-29 Thread Miao Xie
This patchset implements data repair for direct reads. It works like the
buffered read path:
1. When we find the data is not right, we try to read the data from the other
   mirror.
2. When the IO on the mirror ends, we insert the endio work into the
   system workqueue, not btrfs's own endio workqueue, because the original
   endio work is still blocked in the btrfs endio workqueue; if we inserted
   the endio work of the IO on the mirror into that workqueue, a deadlock
   would happen.
3. If we get the right data, we write it back to repair the corrupted mirror.
4. If the data on the new mirror is still corrupted, we try the next
   mirror until we read the right data or all the mirrors have been tried.
5. After the above work, we set the uptodate flag according to the result.
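
The retry-and-repair steps in the list above can be pictured with a small
user-space sketch. It is not the btrfs code: `toy_mirror`, `toy_csum` and
`toy_read_with_repair` are made-up names, the checksum is a plain byte sum
rather than crc32c, and the workqueue handling from step 2 is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8	/* toy block size, not a btrfs constant */

/* Hypothetical stand-in for a mirror: one copy of a single block. */
struct toy_mirror {
	uint8_t data[TOY_BLOCK_SIZE];
};

/* Toy checksum (byte sum); assumed here only for illustration. */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/*
 * Read one block, starting at mirror `first`. Returns the index of the
 * mirror that supplied good data, or -1 if every mirror is corrupted.
 */
static int toy_read_with_repair(struct toy_mirror *mirrors, int num_mirrors,
				int first, uint32_t expected_csum,
				uint8_t *out)
{
	assert(num_mirrors > 0 && first >= 0 && first < num_mirrors);

	for (int tried = 0; tried < num_mirrors; tried++) {
		int cur = (first + tried) % num_mirrors;

		/* Bad copy: move on to the next mirror. */
		if (toy_csum(mirrors[cur].data, TOY_BLOCK_SIZE) != expected_csum)
			continue;

		memcpy(out, mirrors[cur].data, TOY_BLOCK_SIZE);
		/* Good copy found elsewhere: repair the mirror that failed first. */
		if (cur != first)
			memcpy(mirrors[first].data, out, TOY_BLOCK_SIZE);
		return cur;	/* caller would mark the read up to date */
	}
	return -1;		/* caller would report the read as failed */
}
```

With two mirrors where mirror 0 is corrupted, a read that starts at mirror 0
returns 1 and leaves mirror 0 holding the good data again.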

The difference is that a direct read may be split into several small IOs.
In order to get the number of the mirror on which the IO error happens, we
have to do the data check and repair in the end IO function of those sub-IO
requests.

Besides that, we also fixed some direct IO bugs.

Changelog v1 - v2:
- Fix the warning which was triggered by __GFP_ZERO in the 2nd patch

You can pull this patchset from the URL

  https://github.com/miaoxie/linux-btrfs.git for-Chris

Thanks
Miao
---
Miao Xie (12):
  Btrfs: fix put dio bio twice when we submit dio bio fail
  Btrfs: load checksum data once when submitting a direct read io
  Btrfs: cleanup similar code of the buffered data data check and dio
read data check
  Btrfs: do file data check by sub-bio's self
  Btrfs: fix missing error handler if submiting re-read bio fails
  Btrfs: Cleanup unused variant and argument of IO failure handlers
  Btrfs: split bio_readpage_error into several functions
  Btrfs: modify repair_io_failure and make it suit direct io
  Btrfs: modify clean_io_failure and make it suit direct io
  Btrfs: Set real mirror number for read operation on RAID0/5/6
  Btrfs: implement repair function when direct read fails
  Btrfs: cleanup the read failure record after write or when the inode
is freeing

 fs/btrfs/btrfs_inode.h |  10 +-
 fs/btrfs/ctree.h   |   3 +-
 fs/btrfs/disk-io.c |  43 +++--
 fs/btrfs/disk-io.h |   1 +
 fs/btrfs/extent_io.c   | 254 ++--
 fs/btrfs/extent_io.h   |  38 -
 fs/btrfs/file-item.c   |  14 +-
 fs/btrfs/inode.c   | 451 -
 fs/btrfs/scrub.c   |   4 +-
 fs/btrfs/volumes.c |   5 +
 fs/btrfs/volumes.h |   5 +-
 11 files changed, 622 insertions(+), 206 deletions(-)

-- 
1.9.3



[PATCH v2 03/12] Btrfs: cleanup similar code of the buffered data data check and dio read data check

2014-07-29 Thread Miao Xie
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1-v2:
- None
---
 fs/btrfs/inode.c | 102 +--
 1 file changed, 47 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fd88126..2e261b1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2830,6 +2830,40 @@ static int btrfs_writepage_end_io_hook(struct page 
*page, u64 start, u64 end,
return 0;
 }
 
+static int __readpage_endio_check(struct inode *inode,
+ struct btrfs_io_bio *io_bio,
+ int icsum, struct page *page,
+ int pgoff, u64 start, size_t len)
+{
+   char *kaddr;
+   u32 csum_expected;
+   u32 csum = ~(u32)0;
+   static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+
+   csum_expected = *(((u32 *)io_bio->csum) + icsum);
+
+   kaddr = kmap_atomic(page);
+   csum = btrfs_csum_data(kaddr + pgoff, csum,  len);
+   btrfs_csum_final(csum, (char *)&csum);
+   if (csum != csum_expected)
+   goto zeroit;
+
+   kunmap_atomic(kaddr);
+   return 0;
+zeroit:
+   if (__ratelimit(&_rs))
+   btrfs_info(BTRFS_I(inode)->root->fs_info,
+  "csum failed ino %llu off %llu csum %u expected csum %u",
+  btrfs_ino(inode), start, csum, csum_expected);
+   memset(kaddr + pgoff, 1, len);
+   flush_dcache_page(page);
+   kunmap_atomic(kaddr);
+   if (csum_expected == 0)
+   return 0;
+   return -EIO;
+}
+
 /*
  * when reads are done, we need to check csums to verify the data is correct
  * if there's a match, we allow the bio to finish.  If not, the code in
@@ -2842,20 +2876,15 @@ static int btrfs_readpage_end_io_hook(struct 
btrfs_io_bio *io_bio,
size_t offset = start - page_offset(page);
struct inode *inode = page->mapping->host;
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
-   char *kaddr;
struct btrfs_root *root = BTRFS_I(inode)->root;
-   u32 csum_expected;
-   u32 csum = ~(u32)0;
-   static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
- DEFAULT_RATELIMIT_BURST);
 
if (PageChecked(page)) {
ClearPageChecked(page);
-   goto good;
+   return 0;
}
 
if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
-   goto good;
+   return 0;
 
if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID &&
test_range_bit(io_tree, start, end, EXTENT_NODATASUM, 1, NULL)) {
@@ -2865,28 +2894,8 @@ static int btrfs_readpage_end_io_hook(struct 
btrfs_io_bio *io_bio,
}
 
phy_offset >>= inode->i_sb->s_blocksize_bits;
-   csum_expected = *(((u32 *)io_bio->csum) + phy_offset);
-
-   kaddr = kmap_atomic(page);
-   csum = btrfs_csum_data(kaddr + offset, csum,  end - start + 1);
-   btrfs_csum_final(csum, (char *)&csum);
-   if (csum != csum_expected)
-   goto zeroit;
-
-   kunmap_atomic(kaddr);
-good:
-   return 0;
-
-zeroit:
-   if (__ratelimit(&_rs))
-   btrfs_info(root->fs_info, "csum failed ino %llu off %llu csum %u expected csum %u",
-   btrfs_ino(page->mapping->host), start, csum, csum_expected);
-   memset(kaddr + offset, 1, end - start + 1);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr);
-   if (csum_expected == 0)
-   return 0;
-   return -EIO;
+   return __readpage_endio_check(inode, io_bio, phy_offset, page, offset,
+ start, (size_t)(end - start + 1));
 }
 
 struct delayed_iput {
@@ -7079,41 +7088,24 @@ static void btrfs_endio_direct_read(struct bio *bio, 
int err)
struct btrfs_dio_private *dip = bio->bi_private;
struct bio_vec *bvec;
struct inode *inode = dip->inode;
-   struct btrfs_root *root = BTRFS_I(inode)->root;
struct bio *dio_bio;
struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
-   u32 *csums = (u32 *)io_bio->csum;
u64 start;
+   int ret;
int i;
 
+   if (err || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
+   goto skip_checksum;
+
start = dip->logical_offset;
bio_for_each_segment_all(bvec, bio, i) {
-   if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
-   struct page *page = bvec->bv_page;
-   char *kaddr;
-   u32 csum = ~(u32)0;
-   unsigned long flags;
-
-   local_irq_save(flags);
-   kaddr = kmap_atomic(page);
-   csum = btrfs_csum_data(kaddr + bvec->bv_offset,
-  csum, bvec->bv_len);
-   

[PATCH v2 08/12] Btrfs: modify repair_io_failure and make it suit direct io

2014-07-29 Thread Miao Xie
The original repair_io_failure code was used only for buffered reads,
because it got some filesystem data from the page structure; that is safe for
pages in the page cache. But when we do a direct read, the pages in the bio
are not in the page cache, so there is no filesystem data in the page
structure. In order to implement direct read data repair, we need to modify
repair_io_failure and pass all the filesystem data it needs via function
parameters.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1-v2:
- None
---
 fs/btrfs/extent_io.c | 8 +---
 fs/btrfs/extent_io.h | 2 +-
 fs/btrfs/scrub.c | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index daa3e9c..1389759 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1994,7 +1994,7 @@ static int free_io_failure(struct inode *inode, struct 
io_failure_record *rec)
  */
 int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
u64 length, u64 logical, struct page *page,
-   int mirror_num)
+   unsigned int pg_offset, int mirror_num)
 {
struct bio *bio;
struct btrfs_device *dev;
@@ -2033,7 +2033,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 
start,
return -EIO;
}
bio->bi_bdev = dev->bdev;
-   bio_add_page(bio, page, length, start - page_offset(page));
+   bio_add_page(bio, page, length, pg_offset);
 
if (btrfsic_submit_bio_wait(WRITE_SYNC, bio)) {
/* try to remap that extent elsewhere? */
@@ -2064,7 +2064,8 @@ int repair_eb_io_failure(struct btrfs_root *root, struct 
extent_buffer *eb,
for (i = 0; i < num_pages; i++) {
struct page *p = extent_buffer_page(eb, i);
ret = repair_io_failure(root->fs_info, start, PAGE_CACHE_SIZE,
-   start, p, mirror_num);
+   start, p, start - page_offset(p),
+   mirror_num);
if (ret)
break;
start += PAGE_CACHE_SIZE;
@@ -2124,6 +2125,7 @@ static int clean_io_failure(u64 start, struct page *page)
if (num_copies > 1)  {
repair_io_failure(fs_info, start, failrec->len,
  failrec->logical, page,
+ start - page_offset(page),
  failrec->failed_mirror);
}
}
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4ce0547..4366453 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -343,7 +343,7 @@ struct btrfs_fs_info;
 
 int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
u64 length, u64 logical, struct page *page,
-   int mirror_num);
+   unsigned int pg_offset, int mirror_num);
 int end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int repair_eb_io_failure(struct btrfs_root *root, struct extent_buffer *eb,
 int mirror_num);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b6d198f..0609245 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -684,6 +684,7 @@ static int scrub_fixup_readpage(u64 inum, u64 offset, u64 
root, void *fixup_ctx)
fs_info = BTRFS_I(inode)->root->fs_info;
ret = repair_io_failure(fs_info, offset, PAGE_SIZE,
fixup->logical, page,
+   offset - page_offset(page),
fixup->mirror_num);
unlock_page(page);
corrected = !ret;
-- 
1.9.3



[PATCH v2 01/12] Btrfs: fix put dio bio twice when we submit dio bio fail

2014-07-29 Thread Miao Xie
The caller of btrfs_submit_direct_hook() puts the original dio bio
when btrfs_submit_direct_hook() returns an error number, so we needn't
put the original bio in btrfs_submit_direct_hook() itself.
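
The ownership rule described above can be illustrated with a toy reference
count in user space. This is only a sketch: `toy_bio`, `toy_bio_put`,
`toy_submit_hook` and `toy_submit` are hypothetical names, not kernel
functions, and the `fail` flag stands in for a mapping error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a bio with a reference count. */
struct toy_bio {
	int refcount;
};

static void toy_bio_put(struct toy_bio *bio)
{
	assert(bio->refcount > 0);	/* a second put would trip this */
	bio->refcount--;
}

/* Callee: reports the failure but leaves the reference alone. */
static int toy_submit_hook(struct toy_bio *bio, int fail)
{
	(void)bio;
	return fail ? -EIO : 0;
}

/* Caller: owns the reference and drops it exactly once on error. */
static int toy_submit(struct toy_bio *bio, int fail)
{
	int ret = toy_submit_hook(bio, fail);

	if (ret)
		toy_bio_put(bio);
	return ret;
}
```

If the callee also dropped the reference on failure, the caller's put would
be the second one and the assertion in `toy_bio_put` would fire, which is the
kind of double put this patch removes.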

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1-v2:
- None
---
 fs/btrfs/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6b65fab..548489e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7294,10 +7294,8 @@ static int btrfs_submit_direct_hook(int rw, struct 
btrfs_dio_private *dip,
map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
  &map_length, NULL, 0);
-   if (ret) {
-   bio_put(orig_bio);
+   if (ret)
return -EIO;
-   }
 
if (map_length >= orig_bio->bi_iter.bi_size) {
bio = orig_bio;
@@ -7314,6 +7312,7 @@ static int btrfs_submit_direct_hook(int rw, struct 
btrfs_dio_private *dip,
bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
if (!bio)
return -ENOMEM;
+
bio->bi_private = dip;
bio->bi_end_io = btrfs_end_dio_bio;
atomic_inc(&dip->pending_bios);
-- 
1.9.3



[PATCH v2 11/12] Btrfs: implement repair function when direct read fails

2014-07-29 Thread Miao Xie
This patch implements the data repair function for failed direct reads.

The details of the implementation are:
- When we find the data is not right, we try to read the data from the other
  mirror.
- After we get the right data, we write it back to the corrupted mirror.
- If the data on the new mirror is still corrupted, we try the next
  mirror until we read the right data or all the mirrors have been tried.
- After the above work, we set the uptodate flag according to the result.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1-v2:
- None
---
 fs/btrfs/btrfs_inode.h |   2 +-
 fs/btrfs/disk-io.c |  43 ++--
 fs/btrfs/disk-io.h |   1 +
 fs/btrfs/extent_io.c   |  12 ++-
 fs/btrfs/extent_io.h   |   5 +-
 fs/btrfs/inode.c   | 276 +
 6 files changed, 300 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 745fca40..20d4975 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -271,7 +271,7 @@ struct btrfs_dio_private {
 * The original bio may be splited to several sub-bios, this is
 * done during endio of sub-bios
 */
-   int (*subio_endio)(struct inode *, struct btrfs_io_bio *);
+   int (*subio_endio)(struct inode *, struct btrfs_io_bio *, int);
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..56b1546 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -691,6 +691,27 @@ static int btree_io_failed_hook(struct page *page, int 
failed_mirror)
return -EIO;/* we fixed nothing */
 }
 
+static inline void do_end_workqueue_fn(struct end_io_wq *end_io_wq)
+{
+   struct bio *bio = end_io_wq->bio;
+
+   bio->bi_private = end_io_wq->private;
+   bio->bi_end_io = end_io_wq->end_io;
+   bio_endio_nodec(bio, end_io_wq->error);
+   kfree(end_io_wq);
+}
+
+static void dio_end_workqueue_fn(struct work_struct *work)
+{
+   struct btrfs_work *bwork;
+   struct end_io_wq *end_io_wq;
+
+   bwork = container_of(work, struct btrfs_work, normal_work);
+   end_io_wq = container_of(bwork, struct end_io_wq, work);
+
+   do_end_workqueue_fn(end_io_wq);
+}
+
 static void end_workqueue_bio(struct bio *bio, int err)
 {
struct end_io_wq *end_io_wq = bio->bi_private;
@@ -698,7 +719,12 @@ static void end_workqueue_bio(struct bio *bio, int err)
 
fs_info = end_io_wq->info;
end_io_wq->error = err;
-   btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL);
+
+   if (likely(end_io_wq->metadata != BTRFS_WQ_ENDIO_DIO_REPAIR))
+   btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL,
+   NULL);
+   else
+   INIT_WORK(&end_io_wq->work.normal_work, dio_end_workqueue_fn);
 
if (bio->bi_rw & REQ_WRITE) {
if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
@@ -714,7 +740,9 @@ static void end_workqueue_bio(struct bio *bio, int err)
btrfs_queue_work(fs_info->endio_write_workers,
 &end_io_wq->work);
} else {
-   if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
+   if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR))
+   queue_work(system_wq, &end_io_wq->work.normal_work);
+   else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
btrfs_queue_work(fs_info->endio_raid56_workers,
 &end_io_wq->work);
else if (end_io_wq->metadata)
@@ -738,6 +766,7 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct 
bio *bio,
int metadata)
 {
struct end_io_wq *end_io_wq;
+
end_io_wq = kmalloc(sizeof(*end_io_wq), GFP_NOFS);
if (!end_io_wq)
return -ENOMEM;
@@ -1730,18 +1759,10 @@ static int setup_bdi(struct btrfs_fs_info *info, struct 
backing_dev_info *bdi)
  */
 static void end_workqueue_fn(struct btrfs_work *work)
 {
-   struct bio *bio;
struct end_io_wq *end_io_wq;
-   int error;
 
end_io_wq = container_of(work, struct end_io_wq, work);
-   bio = end_io_wq->bio;
-
-   error = end_io_wq->error;
-   bio->bi_private = end_io_wq->private;
-   bio->bi_end_io = end_io_wq->end_io;
-   kfree(end_io_wq);
-   bio_endio_nodec(bio, error);
+   do_end_workqueue_fn(end_io_wq);
 }
 
 static int cleaner_kthread(void *arg)
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 23ce3ce..4fde7a0 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -30,6 +30,7 @@ enum {
BTRFS_WQ_ENDIO_METADATA = 1,
BTRFS_WQ_ENDIO_FREE_SPACE = 2,
BTRFS_WQ_ENDIO_RAID56 = 3,
+   BTRFS_WQ_ENDIO_DIO_REPAIR = 4,
 };
 
 static inline u64 btrfs_sb_offset(int mirror)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8082220..31600ef 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ 

[PATCH v2 02/12] Btrfs: load checksum data once when submitting a direct read io

2014-07-29 Thread Miao Xie
The current code loads the checksum data several times when a whole direct
read IO is split because of the RAID stripe limit, which makes us search
the csum tree several times. This wastes time and makes contention on the
csum tree root more serious. This patch fixes the problem by loading the
data only once.
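
One way to picture "loading the data only once" is a single checksum array
for the whole read, with each sub-request locating its slice by block
offset. The sketch below is a user-space illustration under that assumption;
`toy_lookup_csums`, `toy_sub_csums`, `toy_tree_searches` and the 4 KiB block
size are all hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_BITS 12	/* 4 KiB blocks, assumed for illustration */

static int toy_tree_searches;	/* counts simulated csum-tree lookups */

/* Hypothetical lookup: fills `dst` with one fake checksum per block. */
static void toy_lookup_csums(uint64_t start, size_t nblocks, uint32_t *dst)
{
	toy_tree_searches++;
	for (size_t i = 0; i < nblocks; i++)
		dst[i] = (uint32_t)((start >> TOY_BLOCK_BITS) + i);
}

/* A sub-request finds its slice of the shared array from its byte offset. */
static const uint32_t *toy_sub_csums(const uint32_t *all, uint64_t read_start,
				     uint64_t sub_start)
{
	assert(sub_start >= read_start);
	return all + ((sub_start - read_start) >> TOY_BLOCK_BITS);
}
```

However many sub-requests the read is split into, the lookup counter stays
at one; each sub-request only does pointer arithmetic on the shared array.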

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 - v2:
- Remove the __GFP_ZERO flag in btrfs_submit_direct because it would trigger
  a WARNing. It is reported by Filipe David Manana, Thanks.
---
 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/ctree.h   |  3 +--
 fs/btrfs/extent_io.c   | 13 +++--
 fs/btrfs/file-item.c   | 14 ++
 fs/btrfs/inode.c   | 38 +-
 5 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index a0cf3e5..b69bf7e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -263,7 +263,6 @@ struct btrfs_dio_private {
 
/* dio_bio came from fs/direct-io.c */
struct bio *dio_bio;
-   u8 csum[0];
 };
 
 /*
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index be91397..40e9938 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3739,8 +3739,7 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
 int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
  struct bio *bio, u32 *dst);
 int btrfs_lookup_bio_sums_dio(struct btrfs_root *root, struct inode *inode,
- struct btrfs_dio_private *dip, struct bio *bio,
- u64 logical_offset);
+ struct bio *bio, u64 logical_offset);
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 u64 objectid, u64 pos,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 23398ad..0fb63c4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2617,9 +2617,18 @@ btrfs_bio_alloc(struct block_device *bdev, u64 
first_sector, int nr_vecs,
 
 struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask)
 {
-   return bio_clone_bioset(bio, gfp_mask, btrfs_bioset);
-}
+   struct btrfs_io_bio *btrfs_bio;
+   struct bio *new;
 
+   new = bio_clone_bioset(bio, gfp_mask, btrfs_bioset);
+   if (new) {
+   btrfs_bio = btrfs_io_bio(new);
+   btrfs_bio->csum = NULL;
+   btrfs_bio->csum_allocated = NULL;
+   btrfs_bio->end_io = NULL;
+   }
+   return new;
+}
 
 /* this also allocates from the btrfs_bioset */
 struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index f46cfe4..cf1b94f 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -299,19 +299,9 @@ int btrfs_lookup_bio_sums(struct btrfs_root *root, struct 
inode *inode,
 }
 
 int btrfs_lookup_bio_sums_dio(struct btrfs_root *root, struct inode *inode,
- struct btrfs_dio_private *dip, struct bio *bio,
- u64 offset)
+ struct bio *bio, u64 offset)
 {
-   int len = (bio->bi_iter.bi_sector << 9) - dip->disk_bytenr;
-   u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
-   int ret;
-
-   len >>= inode->i_sb->s_blocksize_bits;
-   len *= csum_size;
-
-   ret = __btrfs_lookup_bio_sums(root, inode, bio, offset,
- (u32 *)(dip->csum + len), 1);
-   return ret;
+   return __btrfs_lookup_bio_sums(root, inode, bio, offset, NULL, 1);
 }
 
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 548489e..fd88126 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7081,7 +7081,8 @@ static void btrfs_endio_direct_read(struct bio *bio, int 
err)
struct inode *inode = dip->inode;
struct btrfs_root *root = BTRFS_I(inode)->root;
struct bio *dio_bio;
-   u32 *csums = (u32 *)dip->csum;
+   struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+   u32 *csums = (u32 *)io_bio->csum;
u64 start;
int i;
 
@@ -7123,6 +7124,9 @@ static void btrfs_endio_direct_read(struct bio *bio, int 
err)
if (err)
clear_bit(BIO_UPTODATE, &dio_bio->bi_flags);
dio_end_io(dio_bio, err);
+
+   if (io_bio->end_io)
+   io_bio->end_io(io_bio, err);
bio_put(bio);
 }
 
@@ -7261,13 +7265,20 @@ static inline int __btrfs_submit_dio_bio(struct bio 
*bio, struct inode *inode,
ret = btrfs_csum_one_bio(root, inode, bio, file_offset, 1);
if (ret)
goto err;
-   } else if (!skip_sum) {
-   ret = btrfs_lookup_bio_sums_dio(root, inode, dip, bio,
+   } else {
+   /*
+* We have 

[PATCH v2 05/12] Btrfs: fix missing error handler if submiting re-read bio fails

2014-07-29 Thread Miao Xie
We forgot to free the failure record and the bio when submitting the re-read
bio fails. Fix it.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1-v2:
- None
---
 fs/btrfs/extent_io.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 881fc49..fb00736 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2345,6 +2345,11 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 	ret = tree->ops->submit_bio_hook(inode, read_mode, bio,
 					 failrec->this_mirror,
 					 failrec->bio_flags, 0);
+	if (ret) {
+		free_io_failure(inode, failrec, 0);
+		bio_put(bio);
+	}
+
 	return ret;
 }
 
-- 
1.9.3
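
The rule behind this fix is that the caller still owns the failure record and
the bio when submission fails, so it must release both. A minimal userspace
sketch of that rule follows. All names here (struct record, struct fake_bio,
resubmit_read, the submit_* hooks) are hypothetical stand-ins for illustration,
not the btrfs types or API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for io_failure_record and struct bio. */
struct record { int mirror; };
struct fake_bio { int unused; };

static int live_records;	/* records currently allocated */
static int live_bios;		/* bios currently allocated */

static struct record *record_alloc(void)
{
	struct record *rec = calloc(1, sizeof(*rec));

	if (rec)
		live_records++;
	return rec;
}

static void record_free(struct record *rec)
{
	free(rec);
	live_records--;
}

static struct fake_bio *bio_alloc_one(void)
{
	struct fake_bio *bio = calloc(1, sizeof(*bio));

	if (bio)
		live_bios++;
	return bio;
}

static void bio_put_one(struct fake_bio *bio)
{
	free(bio);
	live_bios--;
}

/* Stand-in for the submit hook: 0 on success, negative on error. */
typedef int (*submit_fn)(struct fake_bio *bio, int mirror);

static int submit_ok(struct fake_bio *bio, int mirror)
{
	(void)bio; (void)mirror;
	return 0;
}

static int submit_fail(struct fake_bio *bio, int mirror)
{
	(void)bio; (void)mirror;
	return -5;	/* -EIO */
}

/* Objects handed over to completion handling after a successful submit. */
struct inflight {
	struct record *rec;
	struct fake_bio *bio;
};

static int resubmit_read(submit_fn submit, struct inflight *out)
{
	struct record *rec = record_alloc();
	struct fake_bio *bio;
	int ret;

	if (!rec)
		return -12;	/* -ENOMEM */
	bio = bio_alloc_one();
	if (!bio) {
		record_free(rec);
		return -12;
	}

	ret = submit(bio, rec->mirror);
	if (ret) {
		/* Submission failed: nobody else will release these. */
		record_free(rec);
		bio_put_one(bio);
		return ret;
	}

	/* Submission succeeded: the completion path owns them now. */
	out->rec = rec;
	out->bio = bio;
	return 0;
}
```

With submit_fail the two counters return to zero, which is the property the
patch restores for the real re-read path.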

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 12/12] Btrfs: cleanup the read failure record after write or when the inode is freeing

2014-07-29 Thread Miao Xie
After the data is written successfully, we should clean up the read failure
record in that range because
- If data COW is set for the file, the range that the failure record points to
  is mapped to a new place, so the record is invalid.
- If data COW is not set for the file and no error happens during writing,
  the corrupted data is corrected, so the failure record can be removed. If
  some errors happen on the mirrors, we needn't worry about it either, because
  the failure record will be recreated if we read the same place again.

Sometimes we may fail to correct the data, so the failure records will be left
in the tree. We need to free them when we free the inode, otherwise memory is
leaked.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 -> v2:
- None
---
 fs/btrfs/extent_io.c | 34 ++
 fs/btrfs/extent_io.h |  1 +
 fs/btrfs/inode.c |  6 ++
 3 files changed, 41 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 31600ef..39783e7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2135,6 +2135,40 @@ out:
 	return 0;
 }
 
+/*
+ * Can be called when
+ * - hold extent lock
+ * - under ordered extent
+ * - the inode is freeing
+ */
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end)
+{
+	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	struct io_failure_record *failrec;
+	struct extent_state *state, *next;
+
+	if (RB_EMPTY_ROOT(&failure_tree->state))
+		return;
+
+	spin_lock(&failure_tree->lock);
+	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
+	while (state) {
+		if (state->start > end)
+			break;
+
+		ASSERT(state->end <= end);
+
+		next = next_state(state);
+
+		failrec = (struct io_failure_record *)state->private;
+		free_extent_state(state);
+		kfree(failrec);
+
+		state = next;
+	}
+	spin_unlock(&failure_tree->lock);
+}
+
 int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
 				struct io_failure_record **failrec_ret)
 {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b23c7c2..5c48eda 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -369,6 +369,7 @@ struct io_failure_record {
int in_validation;
 };
 
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end);
 int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
struct io_failure_record **failrec_ret);
 int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e087189..56bd9c1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2639,6 +2639,10 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
+	btrfs_free_io_failure_record(inode, ordered_extent->file_offset,
+				     ordered_extent->file_offset +
+				     ordered_extent->len - 1);
+
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
 		truncated = true;
 		logical_len = ordered_extent->truncated_len;
@@ -4723,6 +4727,8 @@ void btrfs_evict_inode(struct inode *inode)
 	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
 	btrfs_wait_ordered_range(inode, 0, (u64)-1);
 
+	btrfs_free_io_failure_record(inode, 0, (u64)-1);
+
 	if (root->fs_info->log_root_recovering) {
 		BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 				 &BTRFS_I(inode)->runtime_flags));
-- 
1.9.3
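
The range walk in btrfs_free_io_failure_record() can be illustrated outside
the kernel. The sketch below assumes a sorted singly linked list in place of
the extent state rb-tree. The names struct failrec, failrec_add and
free_failrecs are hypothetical and exist only for this example. It drops every
record that starts inside [start, end], mirroring the two callers in the patch:
one ordered-extent range, and the whole file (0 to (u64)-1) at inode eviction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; the list is kept sorted by start. */
struct failrec {
	uint64_t start;
	uint64_t end;		/* inclusive */
	struct failrec *next;
};

/* Insert a record at its sorted position. Returns 0, or -1 on OOM. */
static int failrec_add(struct failrec **head, uint64_t start, uint64_t end)
{
	struct failrec *rec = malloc(sizeof(*rec));
	struct failrec **link = head;

	if (!rec)
		return -1;
	rec->start = start;
	rec->end = end;
	while (*link && (*link)->start < start)
		link = &(*link)->next;
	rec->next = *link;
	*link = rec;
	return 0;
}

/* Free all records overlapping [start, end]; returns how many were freed. */
static int free_failrecs(struct failrec **head, uint64_t start, uint64_t end)
{
	struct failrec **link = head;
	int freed = 0;

	/* Skip records that end before the range begins. */
	while (*link && (*link)->end < start)
		link = &(*link)->next;

	/* Stop at the first record that starts past the range. */
	while (*link && (*link)->start <= end) {
		struct failrec *victim = *link;

		*link = victim->next;	/* fetch the successor before freeing */
		free(victim);
		freed++;
	}
	return freed;
}
```

As in the patch, the successor is looked up before the current record is
freed, so the walk never touches freed memory.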


[PATCH v2 06/12] Btrfs: Cleanup unused variant and argument of IO failure handlers

2014-07-29 Thread Miao Xie
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 -> v2:
- None
---
 fs/btrfs/extent_io.c | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fb00736..f71b34f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1978,8 +1978,7 @@ struct io_failure_record {
 	int in_validation;
 };
 
-static int free_io_failure(struct inode *inode, struct io_failure_record *rec,
-				int did_repair)
+static int free_io_failure(struct inode *inode, struct io_failure_record *rec)
 {
 	int ret;
 	int err = 0;
@@ -2106,7 +2105,6 @@ static int clean_io_failure(u64 start, struct page *page)
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 	struct extent_state *state;
 	int num_copies;
-	int did_repair = 0;
 	int ret;
 
 	private = 0;
@@ -2127,7 +2125,6 @@ static int clean_io_failure(u64 start, struct page *page)
 		/* there was no real error, just free the record */
 		pr_debug("clean_io_failure: freeing dummy error at %llu\n",
 			 failrec->start);
-		did_repair = 1;
 		goto out;
 	}
 	if (fs_info->sb->s_flags & MS_RDONLY)
@@ -2144,19 +2141,16 @@ static int clean_io_failure(u64 start, struct page *page)
 		num_copies = btrfs_num_copies(fs_info, failrec->logical,
 					      failrec->len);
 		if (num_copies > 1)  {
-			ret = repair_io_failure(fs_info, start, failrec->len,
-						failrec->logical, page,
-						failrec->failed_mirror);
-			did_repair = !ret;
+			repair_io_failure(fs_info, start, failrec->len,
+					  failrec->logical, page,
+					  failrec->failed_mirror);
 		}
-		ret = 0;
 	}
 
 out:
-	if (!ret)
-		ret = free_io_failure(inode, failrec, did_repair);
+	free_io_failure(inode, failrec);
 
-	return ret;
+	return 0;
 }
 
 /*
@@ -2266,7 +2260,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 		 */
 		pr_debug("bio_readpage_error: cannot repair, num_copies=%d, next_mirror %d, failed_mirror %d\n",
 			 num_copies, failrec->this_mirror, failed_mirror);
-		free_io_failure(inode, failrec, 0);
+		free_io_failure(inode, failrec);
 		return -EIO;
 	}
 
@@ -2309,13 +2303,13 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 	if (failrec->this_mirror > num_copies) {
 		pr_debug("bio_readpage_error: (fail) num_copies=%d, next_mirror %d, failed_mirror %d\n",
 			 num_copies, failrec->this_mirror, failed_mirror);
-		free_io_failure(inode, failrec, 0);
+		free_io_failure(inode, failrec);
 		return -EIO;
 	}
 
 	bio = btrfs_io_bio_alloc(GFP_NOFS, 1);
 	if (!bio) {
-		free_io_failure(inode, failrec, 0);
+		free_io_failure(inode, failrec);
 		return -EIO;
 	}
 	bio->bi_end_io = failed_bio->bi_end_io;
@@ -2346,7 +2340,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 					 failrec->this_mirror,
 					 failrec->bio_flags, 0);
 	if (ret) {
-		free_io_failure(inode, failrec, 0);
+		free_io_failure(inode, failrec);
 		bio_put(bio);
 	}
 
-- 
1.9.3


[PATCH v2 04/12] Btrfs: do file data check by sub-bio's self

2014-07-29 Thread Miao Xie
Direct IO splits the original bio into several sub-bios because of the raid
stripe limit, and the filesystem waits for all sub-bios before running the
final end io process.

But it was very hard to implement data repair when a dio read failure happens,
because in the final end io function we didn't know which mirror the data was
read from. So in order to implement data repair, we have to move the file data
check from the final end io function to the sub-bio end io function, in which
we can get the mirror number of the device we accessed. This patch does this
work as the first step of the direct io data repair implementation.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 -> v2:
- None
---
 fs/btrfs/btrfs_inode.h |   9 +
 fs/btrfs/extent_io.c   |   2 +-
 fs/btrfs/inode.c   | 100 -
 fs/btrfs/volumes.h |   5 ++-
 4 files changed, 87 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b69bf7e..745fca40 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -245,8 +245,11 @@ static inline int btrfs_inode_in_log(struct inode *inode, u64 generation)
 	return 0;
 }
 
+#define BTRFS_DIO_ORIG_BIO_SUBMITTED	0x1
+
 struct btrfs_dio_private {
 	struct inode *inode;
+	unsigned long flags;
 	u64 logical_offset;
 	u64 disk_bytenr;
 	u64 bytes;
@@ -263,6 +266,12 @@ struct btrfs_dio_private {
 
 	/* dio_bio came from fs/direct-io.c */
 	struct bio *dio_bio;
+
+	/*
+	 * The original bio may be splited to several sub-bios, this is
+	 * done during endio of sub-bios
+	 */
+	int (*subio_endio)(struct inode *, struct btrfs_io_bio *);
 };
 
 /*
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0fb63c4..881fc49 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2469,7 +2469,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		struct inode *inode = page->mapping->host;
 
 		pr_debug("end_bio_extent_readpage: bi_sector=%llu, err=%d, "
-			 "mirror=%lu\n", (u64)bio->bi_iter.bi_sector, err,
+			 "mirror=%u\n", (u64)bio->bi_iter.bi_sector, err,
 			 io_bio->mirror_num);
 		tree = &BTRFS_I(inode)->io_tree;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2e261b1..3e95a2b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7083,29 +7083,40 @@ unlock_err:
 	return ret;
 }
 
-static void btrfs_endio_direct_read(struct bio *bio, int err)
+static int btrfs_subio_endio_read(struct inode *inode,
+				  struct btrfs_io_bio *io_bio)
 {
-	struct btrfs_dio_private *dip = bio->bi_private;
 	struct bio_vec *bvec;
-	struct inode *inode = dip->inode;
-	struct bio *dio_bio;
-	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	u64 start;
-	int ret;
 	int i;
+	int ret;
+	int err = 0;
 
-	if (err || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
-		goto skip_checksum;
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
+		return 0;
 
-	start = dip->logical_offset;
-	bio_for_each_segment_all(bvec, bio, i) {
+	start = io_bio->logical;
+	bio_for_each_segment_all(bvec, &io_bio->bio, i) {
 		ret = __readpage_endio_check(inode, io_bio, i, bvec->bv_page,
 					     0, start, bvec->bv_len);
 		if (ret)
 			err = -EIO;
 		start += bvec->bv_len;
 	}
-skip_checksum:
+
+	return err;
+}
+
+static void btrfs_endio_direct_read(struct bio *bio, int err)
+{
+	struct btrfs_dio_private *dip = bio->bi_private;
+	struct inode *inode = dip->inode;
+	struct bio *dio_bio;
+	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+
+	if (!err && (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED))
+		err = btrfs_subio_endio_read(inode, io_bio);
+
 	unlock_extent(&BTRFS_I(inode)->io_tree, dip->logical_offset,
 		      dip->logical_offset + dip->bytes - 1);
 	dio_bio = dip->dio_bio;
@@ -7182,6 +7193,7 @@ static int __btrfs_submit_bio_start_direct_io(struct inode *inode, int rw,
 static void btrfs_end_dio_bio(struct bio *bio, int err)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
+	int ret;
 
 	if (err) {
 		btrfs_err(BTRFS_I(dip->inode)->root->fs_info,
@@ -7189,6 +7201,13 @@ static void btrfs_end_dio_bio(struct bio *bio, int err)
 			  btrfs_ino(dip->inode), bio->bi_rw,
 			  (unsigned long long)bio->bi_iter.bi_sector,
 			  bio->bi_iter.bi_size, err);
+	} else if (dip->subio_endio) {
+		ret = dip->subio_endio(dip->inode, btrfs_io_bio(bio));
+		if (ret)
+			err = ret;
+	}
+
+	if (err) {
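
The idea of checking data in each sub-bio's own completion, where the mirror
number is still known, can be sketched in plain C. Everything below is a
simplified model for illustration: struct sub_bio, struct dio_parent,
check_sub_bio and end_sub_bio are hypothetical names, and csum_ok stands in for
the real checksum comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sub-bio: knows which mirror it was read from. */
struct sub_bio {
	int mirror_num;
	int csum_ok;	/* stands in for the real checksum comparison */
};

/* Hypothetical parent request split into several sub-bios. */
struct dio_parent {
	int pending;	/* sub-bios still in flight */
	int error;	/* last error reported by any sub-bio */
	int bad_mirror;	/* mirror of the first failing sub-bio, 0 if none */
	int (*subio_endio)(struct dio_parent *dip, const struct sub_bio *sub);
};

/* Per-sub-bio data check; the mirror number is available here. */
static int check_sub_bio(struct dio_parent *dip, const struct sub_bio *sub)
{
	if (sub->csum_ok)
		return 0;
	if (!dip->bad_mirror)
		dip->bad_mirror = sub->mirror_num;
	return -5;	/* -EIO */
}

/* Completion of one sub-bio. Returns 1 when it was the last one. */
static int end_sub_bio(struct dio_parent *dip, const struct sub_bio *sub,
		       int err)
{
	if (!err && dip->subio_endio)
		err = dip->subio_endio(dip, sub);
	if (err)
		dip->error = err;
	return --dip->pending == 0;
}
```

By the time the last sub-bio completes, the parent already knows which mirror
returned bad data, which is the information the final end io function lacked.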

[PATCH v2 07/12] Btrfs: split bio_readpage_error into several functions

2014-07-29 Thread Miao Xie
The data repair function of direct read will be implemented later, and some code
in bio_readpage_error will be reused, so split bio_readpage_error into
several functions which will be used in direct read repair later.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 -> v2:
- None
---
 fs/btrfs/extent_io.c | 159 ++-
 fs/btrfs/extent_io.h |  28 +
 2 files changed, 123 insertions(+), 64 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f71b34f..daa3e9c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1959,25 +1959,6 @@ static void check_page_uptodate(struct extent_io_tree *tree, struct page *page)
 		SetPageUptodate(page);
 }
 
-/*
- * When IO fails, either with EIO or csum verification fails, we
- * try other mirrors that might have a good copy of the data.  This
- * io_failure_record is used to record state as we go through all the
- * mirrors.  If another mirror has good data, the page is set up to date
- * and things continue.  If a good mirror can't be found, the original
- * bio end_io callback is called to indicate things have failed.
- */
-struct io_failure_record {
-	struct page *page;
-	u64 start;
-	u64 len;
-	u64 logical;
-	unsigned long bio_flags;
-	int this_mirror;
-	int failed_mirror;
-	int in_validation;
-};
-
 static int free_io_failure(struct inode *inode, struct io_failure_record *rec)
 {
 	int ret;
@@ -2153,40 +2134,24 @@ out:
 	return 0;
 }
 
-/*
- * this is a generic handler for readpage errors (default
- * readpage_io_failed_hook). if other copies exist, read those and write back
- * good data to the failed position. does not investigate in remapping the
- * failed extent elsewhere, hoping the device will be smart enough to do this as
- * needed
- */
-
-static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
-			      struct page *page, u64 start, u64 end,
-			      int failed_mirror)
+int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
+				struct io_failure_record **failrec_ret)
 {
-	struct io_failure_record *failrec = NULL;
+	struct io_failure_record *failrec;
 	u64 private;
 	struct extent_map *em;
-	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
-	struct bio *bio;
-	struct btrfs_io_bio *btrfs_failed_bio;
-	struct btrfs_io_bio *btrfs_bio;
-	int num_copies;
 	int ret;
-	int read_mode;
 	u64 logical;
 
-	BUG_ON(failed_bio->bi_rw & REQ_WRITE);
-
 	ret = get_state_private(failure_tree, start, &private);
 	if (ret) {
 		failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
 		if (!failrec)
 			return -ENOMEM;
+
 		failrec->start = start;
 		failrec->len = end - start + 1;
 		failrec->this_mirror = 0;
@@ -2206,11 +2171,11 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 			em = NULL;
 		}
 		read_unlock(&em_tree->lock);
-
 		if (!em) {
 			kfree(failrec);
 			return -EIO;
 		}
+
 		logical = start - em->start;
 		logical = em->block_start + logical;
 		if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
@@ -2219,8 +2184,10 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 			extent_set_compress_type(&failrec->bio_flags,
 						 em->compress_type);
 		}
-		pr_debug("bio_readpage_error: (new) logical=%llu, start=%llu, "
-			 "len=%llu\n", logical, start, failrec->len);
+
+		pr_debug("Get IO Failure Record: (new) logical=%llu, start=%llu, len=%llu\n",
+			 logical, start, failrec->len);
+
 		failrec->logical = logical;
 		free_extent_map(em);
 
@@ -2240,8 +2207,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 		}
 	} else {
 		failrec = (struct io_failure_record *)(unsigned long)private;
-		pr_debug("bio_readpage_error: (found) logical=%llu, "
-			 "start=%llu, len=%llu, validation=%d\n",
+		pr_debug("Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu, validation=%d\n",
 			 failrec->logical, failrec->start, failrec->len,
 			 failrec->in_validation);
 		/*
@@ -2250,6 +2216,17 @@ static int bio_readpage_error(struct bio *failed_bio, 
u64 

Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

2014-07-29 Thread Liu Bo
On Mon, Jul 28, 2014 at 01:11:19PM +0200, Torbjørn wrote:
> On 28. juli 2014 12:00, Liu Bo wrote:
> <snip>
> > This seems to be incomplete (looks like dmesg has reached its buffer size
> > limit), does /var/log/message have the whole stack info?
> >
> > thanks,
> > -liubo
> Hi,
>
> Complete log was over 40MB. I uploaded everything from boot until
> blocked for 120 seconds started to appear.
> If you want all the trailing log as well, let me know.
>
> https://gist.github.com/anonymous/7958d8917967f727f324

Sorry... I still don't get why it's locked up. io_ctl_prepare_pages() has
several callers, and at the code level they all release the pages properly.
The warnings printed in the log belong to other btrfs partitions, not the hung
one, and we still can't tell which one holds the free space cache inode page.

Maybe we'd better resort to a bisect between 3.14 and 3.15 (I know it'd take a
lot of time though).

Here, doing rsync on a full btrfs with compress=lzo never hit that problem,
shrug...

thanks,
-liubo

>
> --
> Torbjørn

[PATCH] Btrfs: fix regression of btrfs device replace

2014-07-29 Thread Liu Bo
Commit 49c6f736f34f901117c20960ebd7d5e60f12fcac
("btrfs: dev replace should replace the sysfs entry") added the missing sysfs
entry in the process of device replace, but didn't take missing devices into
account, so now we have

BUG: unable to handle kernel NULL pointer dereference at 0088
IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
...

To reproduce it,
1. mkfs.btrfs -f disk1 disk2
2. mkfs.ext4 disk1
3. mount disk2 /mnt -odegraded
4. btrfs replace start -B 1 disk3 /mnt
--

This fixes the problem.

Reported-by: Chris Murphy li...@colorremedies.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 7869936..12e5355 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 	if (!fs_info->device_dir_kobj)
 		return -EINVAL;
 
-	if (one_device) {
+	if (one_device && one_device->bdev) {
 		disk = one_device->bdev->bd_part;
 		disk_kobj = &part_to_dev(disk)->kobj;
 
-- 
1.8.1.4
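
The crash comes from dereferencing the block device pointer of a device that
is missing, as in the degraded mount from the reproducer. A tiny sketch of the
guard, assuming hypothetical stand-in types (struct fake_bdev, struct
fake_device) and a hypothetical helper rm_device_link instead of the sysfs
code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct block_device and struct btrfs_device. */
struct fake_bdev { int part_id; };
struct fake_device {
	struct fake_bdev *bdev;	/* NULL when the device is missing */
};

/*
 * Remove the link for one device. Returns 1 and stores the partition id if
 * a link was removed, 0 if there was nothing to remove.
 */
static int rm_device_link(const struct fake_device *dev, int *removed_part)
{
	/* Check both pointers before touching dev->bdev->part_id. */
	if (dev && dev->bdev) {
		*removed_part = dev->bdev->part_id;
		return 1;
	}
	return 0;
}
```

A missing device simply has no link to remove, so returning without touching
bdev is enough.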


[RFC PATCH V5 00/12] Btrfs: Subpagesize-blocksize: Get rid of whole page I/O.

2014-07-29 Thread Chandan Rajendra
This patchset continues with the work posted earlier at
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35547.html.

Changes from V1:
1. Remove usage of bio_vec->bv_{len,offset} in end_bio_extent_readpage()
   and end_bio_extent_writepage().

Changes from V2:
1. Get __extent_writepage() to write only the dirty blocks of a page.
2. Fix "page private not zero on page" warning message which is printed
   when running xfstests.

Changes from V3:
1. Get Hole punching and Extent preallocation to work correctly in
   subpagesize-blocksize scenario.
2. Get btrfs_page_mkwrite() to reserve space in sectorsized units.

Changes from V4:
1. V2's "Btrfs: subpagesize-blocksize: Get rid of whole page reads"
   patch was incorrectly replaced with an older version when working
   on V3 patches. Fix this.
2. Fix btrfs_endio_direct_read() to compute checksums for all possible
   blocks in a page.

Xfstests' generic tests were run on an x86_64 machine with the patches
applied. The Btrfs kernel module was compiled without ACL and quotas support
and hence tests related to those were not run.

On multiple runs of the tests with 4k blocksize, 'umount' process would
sometimes get blocked indefinitely causing 'hung task detector' to print the
function call trace.

For 2k blocksize, the following xfstests' generic tests failed:
068, 075, 091, 112, 127, 224 and 274.

The following is a list of known TODO items which will be implemented in
future revisions of this patchset:
1. The following command would cause a soft-lockup.
   xfs_io -f -c "pwrite 0 6144" -c "sync_range 0 4096" -c "truncate 4095" file.bin
2. Re-base patchset on top of linux-btrfs/next branch to make use of immutable
   biovecs.
3. Remove PAGE_CACHE_SIZE delalloc reservation in
   btrfs_writepage_fixup_worker().
4. Create separate slab caches for 'extent buffer head' and 'extent buffer'.
5. Add 'leak list' tracking for 'extent buffer' instances.
6. Rename EXTENT_BUFFER_TREE_REF and EXTENT_BUFFER_IN_TREE to
   EXTENT_BUFFER_HEAD_TREE_REF and EXTENT_BUFFER_HEAD_IN_TREE respectively.
7. Get Xfstests' generic tests to successfully run on both 4k and 2k
   blocksizes.

Chandan Rajendra (10):
  Btrfs: subpagesize-blocksize: Get rid of whole page reads.
  Btrfs: subpagesize-blocksize: Get rid of whole page writes.
  Btrfs: subpagesize-blocksize: __btrfs_buffered_write: Reserve/release
extents aligned to block size.
  Btrfs: subpagesize-blocksize: Read tree blocks whose size is
PAGE_CACHE_SIZE.
  Btrfs: subpagesize-blocksize: Write only dirty extent buffers
belonging to a page
  Btrfs: subpagesize-blocksize: Compute and look up csums based on
sectorsized blocks.
  Btrfs: subpagesize-blocksize: __extent_writepage: Write only dirty
blocks of a page.
  Btrfs: subpagesize-blocksize: fallocate: Work with sectorsized units.
  Btrfs: subpagesize-blocksize: btrfs_page_mkwrite: Reserve space in
sectorsized units.
  Btrfs: subpagesize-blocksize: Search for all ordered extents that
could span across a page.

Chandra Seetharaman (2):
  Btrfs: subpagesize-blocksize: Define extent_buffer_head.
  Btrfs: subpagesize-blocksize: Allow mounting filesystems where
sectorsize != PAGE_SIZE

 fs/btrfs/backref.c   |2 +-
 fs/btrfs/ctree.c |2 +-
 fs/btrfs/ctree.h |8 +-
 fs/btrfs/disk-io.c   |  117 +++--
 fs/btrfs/disk-io.h   |3 +
 fs/btrfs/extent-tree.c   |6 +-
 fs/btrfs/extent_io.c | 1191 --
 fs/btrfs/extent_io.h |   48 +-
 fs/btrfs/file-item.c |   85 +--
 fs/btrfs/file.c  |   67 ++-
 fs/btrfs/inode.c |  184 ---
 fs/btrfs/volumes.c   |2 +-
 fs/btrfs/volumes.h   |3 +
 include/trace/events/btrfs.h |2 +-
 14 files changed, 1124 insertions(+), 596 deletions(-)

-- 
1.8.3.1


[RFC PATCH V5 06/12] Btrfs: subpagesize-blocksize: Write only dirty extent buffers belonging to a page

2014-07-29 Thread Chandan Rajendra
For the subpagesize-blocksize scenario, this patch adds the ability to write a
single extent buffer to the disk.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/disk-io.c   |  20 ++--
 fs/btrfs/extent_io.c | 279 ++-
 2 files changed, 244 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b2c4e9d..28a45f6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -466,17 +466,23 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root,
 
 static int csum_dirty_buffer(struct btrfs_root *root, struct page *page)
 {
-	u64 start = page_offset(page);
-	u64 found_start;
 	struct extent_buffer *eb;
+	u64 found_start;
 
 	eb = (struct extent_buffer *)page->private;
-	if (page != eb->pages[0])
+	if (page != eb_head(eb)->pages[0])
 		return 0;
-	found_start = btrfs_header_bytenr(eb);
-	if (WARN_ON(found_start != start || !PageUptodate(page)))
-		return 0;
-	csum_tree_block(root, eb, 0);
+	do {
+		if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags))
+			continue;
+		if (WARN_ON(!test_bit(EXTENT_BUFFER_UPTODATE, &eb->ebflags)))
+			continue;
+		found_start = btrfs_header_bytenr(eb);
+		if (WARN_ON(found_start != eb->start))
+			return 0;
+		csum_tree_block(root, eb, 0);
+	} while ((eb = eb->eb_next) != NULL);
+
 	return 0;
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 58511f3..9937851 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3407,32 +3407,53 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 		    TASK_UNINTERRUPTIBLE);
 }
 
-static int lock_extent_buffer_for_io(struct extent_buffer *eb,
-				     struct btrfs_fs_info *fs_info,
-				     struct extent_page_data *epd)
+static void lock_extent_buffer_pages(struct extent_buffer_head *ebh,
+				struct extent_page_data *epd)
 {
+	struct extent_buffer *eb = &ebh->eb;
 	unsigned long i, num_pages;
-	int flush = 0;
+
+	num_pages = num_extent_pages(eb->start, eb->len);
+	for (i = 0; i < num_pages; i++) {
+		struct page *p = extent_buffer_page(eb, i);
+
+		if (!trylock_page(p)) {
+			flush_write_bio(epd);
+			lock_page(p);
+		}
+	}
+
+	return;
+}
+
+static int lock_extent_buffer_for_io(struct extent_buffer *eb,
+				struct btrfs_fs_info *fs_info,
+				struct extent_page_data *epd)
+{
+	int dirty;
 	int ret = 0;
 
 	if (!btrfs_try_tree_write_lock(eb)) {
-		flush = 1;
 		flush_write_bio(epd);
 		btrfs_tree_lock(eb);
 	}
 
-	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
+	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags)) {
+		dirty = test_bit(EXTENT_BUFFER_DIRTY, &eb->ebflags);
 		btrfs_tree_unlock(eb);
-		if (!epd->sync_io)
-			return 0;
-		if (!flush) {
-			flush_write_bio(epd);
-			flush = 1;
+		if (!epd->sync_io) {
+			if (!dirty)
+				return 1;
+			else
+				return 2;
 		}
+
+		flush_write_bio(epd);
+
 		while (1) {
 			wait_on_extent_buffer_writeback(eb);
 			btrfs_tree_lock(eb);
-			if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
+			if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags))
 				break;
 			btrfs_tree_unlock(eb);
 		}
@@ -3443,27 +3464,25 @@ static int lock_extent_buffer_for_io(struct extent_buffer *eb,
 	 * under IO since we can end up having no IO bits set for a short period
 	 * of time.
 	 */
-	spin_lock(&eb->refs_lock);
-	if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)) {
-		set_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);
-		spin_unlock(&eb->refs_lock);
+	spin_lock(&eb_head(eb)->refs_lock);
+	if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->ebflags)) {
+		set_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags);
+		spin_unlock(&eb_head(eb)->refs_lock);
 		btrfs_set_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
 		__percpu_counter_add(&fs_info->dirty_metadata_bytes,
 				     -eb->len,
 				     fs_info->dirty_metadata_batch);
-		ret = 1;
+		ret = 0;
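
The new csum_dirty_buffer() walks every extent buffer that shares the page and
checksums only those under writeback. The loop relies on the fact that in C a
"continue" inside do/while jumps to the loop condition, so the advance to the
next buffer still happens. A small sketch of that loop shape, assuming a
hypothetical struct buf chained through a next pointer (not the patchset's
extent_buffer):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sub-page buffer; several of them share one page. */
struct buf {
	bool writeback;
	bool uptodate;
	bool csummed;
	struct buf *next;
};

/* Checksum every buffer on the chain that is under writeback. */
static int csum_dirty_buffers(struct buf *b)
{
	int done = 0;

	do {
		/* "continue" jumps to the while condition below. */
		if (!b->writeback)
			continue;
		if (!b->uptodate)
			continue;
		b->csummed = true;	/* stands in for csum_tree_block() */
		done++;
	} while ((b = b->next) != NULL);

	return done;
}
```

The function must be called with a non-NULL first buffer, as in the patch,
where the page always carries at least one extent buffer.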

[RFC PATCH V5 12/12] Btrfs: subpagesize-blocksize: Search for all ordered extents that could span across a page.

2014-07-29 Thread Chandan Rajendra
In the subpagesize-blocksize scenario, it is not sufficient to search using the
first byte of the page to make sure that there are no ordered extents
present across the page. Fix this by searching the whole page range.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/extent_io.c | 3 ++-
 fs/btrfs/inode.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f938a5c..84924c8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3066,7 +3066,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 
 	while (1) {
 		lock_extent(tree, start, end);
-		ordered = btrfs_lookup_ordered_extent(inode, start);
+		ordered = btrfs_lookup_ordered_range(inode, start,
+						PAGE_CACHE_SIZE);
 		if (!ordered)
 			break;
 		unlock_extent(tree, start, end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d7a3ca7..b710837 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1784,7 +1784,7 @@ again:
 	if (PagePrivate2(page))
 		goto out;
 
-	ordered = btrfs_lookup_ordered_extent(inode, page_start);
+	ordered = btrfs_lookup_ordered_range(inode, page_start, PAGE_CACHE_SIZE);
 	if (ordered) {
 		unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start,
 				     page_end, &cached_state, GFP_NOFS);
@@ -7610,7 +7610,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
 	if (!inode_evicting)
 		lock_extent_bits(tree, page_start, page_end, 0, &cached_state);
-	ordered = btrfs_lookup_ordered_extent(inode, page_start);
+	ordered = btrfs_lookup_ordered_range(inode, page_start, PAGE_CACHE_SIZE);
 	if (ordered) {
 		/*
 		 * IO on this page will never be started, so we need
@@ -7735,7 +7735,7 @@ again:
 	 * we can't set the delalloc bits if there are pending ordered
 	 * extents.  Drop our locks and wait for them to finish
 	 */
-	ordered = btrfs_lookup_ordered_extent(inode, page_start);
+	ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
 	if (ordered) {
 		unlock_extent_cached(io_tree, page_start, page_end,
 				     &cached_state, GFP_NOFS);
-- 
1.8.3.1
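
Why a lookup keyed on the first byte of the page is not enough can be shown
with two small predicates. The sketch assumes a hypothetical struct ordered
holding a file offset and a length; covers_offset models a point lookup and
overlaps_range models a range lookup. Neither is the btrfs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ordered extent: [file_offset, file_offset + len). */
struct ordered {
	uint64_t file_offset;
	uint64_t len;
};

/* Point lookup: does the extent contain this single byte offset? */
static bool covers_offset(const struct ordered *o, uint64_t off)
{
	return off >= o->file_offset && off < o->file_offset + o->len;
}

/* Range lookup: does the extent overlap [start, start + len)? */
static bool overlaps_range(const struct ordered *o, uint64_t start,
			   uint64_t len)
{
	return o->file_offset < start + len &&
	       start < o->file_offset + o->len;
}
```

With a 4096-byte page at offset 0 and a 2048-byte ordered extent starting at
offset 2048, the point lookup at byte 0 misses the extent while the range
lookup over the page finds it.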


[RFC PATCH V5 03/12] Btrfs: subpagesize-blocksize: __btrfs_buffered_write: Reserve/release extents aligned to block size.

2014-07-29 Thread Chandan Rajendra
Currently, the code reserves/releases extents in multiples of PAGE_CACHE_SIZE
units. Fix this.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/file.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 006af2f..541e227 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1339,18 +1339,21 @@ fail:
 static noinline int
 lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
 				size_t num_pages, loff_t pos,
+				size_t write_bytes,
 				u64 *lockstart, u64 *lockend,
 				struct extent_state **cached_state)
 {
+	struct btrfs_root *root = BTRFS_I(inode)->root;
 	u64 start_pos;
 	u64 last_pos;
 	int i;
 	int ret = 0;
 
-	start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
-	last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
+	start_pos = pos & ~((u64)root->sectorsize - 1);
+	last_pos = start_pos
+		+ ALIGN(pos + write_bytes - start_pos, root->sectorsize) - 1;
 
-	if (start_pos < inode->i_size) {
+	if (start_pos < inode->i_size) {
 		struct btrfs_ordered_extent *ordered;
 		lock_extent_bits(&BTRFS_I(inode)->io_tree,
 				 start_pos, last_pos, 0, cached_state);
@@ -1468,6 +1471,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 
 	while (iov_iter_count(i) > 0) {
 		size_t offset = pos & (PAGE_CACHE_SIZE - 1);
+		size_t sector_offset;
 		size_t write_bytes = min(iov_iter_count(i),
 					 nrptrs * (size_t)PAGE_CACHE_SIZE -
 					 offset);
@@ -1488,7 +1492,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			break;
 		}
 
-		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+		sector_offset = pos & (root->sectorsize - 1);
+		reserve_bytes = ALIGN(write_bytes + sector_offset, root->sectorsize);
+
 		ret = btrfs_check_data_free_space(inode, reserve_bytes);
 		if (ret == -ENOSPC &&
 		    (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
@@ -1503,7 +1509,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				num_pages = (write_bytes + offset +
 					     PAGE_CACHE_SIZE - 1) >>
 					PAGE_CACHE_SHIFT;
-				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+
+				reserve_bytes = ALIGN(write_bytes + sector_offset,
+						root->sectorsize);
 				ret = 0;
 			} else {
 				ret = -ENOSPC;
@@ -1536,8 +1544,8 @@ again:
 			break;
 
 		ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages,
-						      pos, &lockstart, &lockend,
-						      &cached_state);
+						pos, write_bytes, &lockstart, &lockend,
+						&cached_state);
 		if (ret < 0) {
 			if (ret == -EAGAIN)
 				goto again;
@@ -1574,9 +1582,9 @@ again:
 		 * we still have an outstanding extent for the chunk we actually
 		 * managed to copy.
 		 */
-		if (num_pages > dirty_pages) {
-			release_bytes = (num_pages - dirty_pages) <<
-				PAGE_CACHE_SHIFT;
+		if (write_bytes > copied) {
+			release_bytes = (write_bytes - copied)
+				& ~((u64)root->sectorsize - 1);
 			if (copied > 0) {
 				spin_lock(&BTRFS_I(inode)->lock);
 				BTRFS_I(inode)->outstanding_extents++;
@@ -1590,7 +1598,7 @@ again:
 							     release_bytes);
 		}
 
-		release_bytes = dirty_pages << PAGE_CACHE_SHIFT;
+		release_bytes = ALIGN(copied + sector_offset, root->sectorsize);
 
 		if (copied > 0)
 			ret = btrfs_dirty_pages(root, inode, pages,
@@ -1609,7 +1617,7 @@ again:
 		if (only_release_metadata && copied > 0) {
 			u64 lockstart = round_down(pos, root->sectorsize);
 			u64 lockend = lockstart +
-				(dirty_pages << PAGE_CACHE_SHIFT) - 1;
+				ALIGN(copied, root->sectorsize) - 1;
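
The difference between page-sized and sector-sized reservations is plain
arithmetic. The sketch below assumes hypothetical helpers (align_up,
reserve_by_page, reserve_by_sector) that model the old and new computations of
reserve_bytes. Sizes must be powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Old scheme: reserve whole pages covering the write. */
static uint64_t reserve_by_page(uint64_t pos, uint64_t write_bytes,
				uint64_t page_size)
{
	uint64_t offset = pos & (page_size - 1);
	uint64_t num_pages = (write_bytes + offset + page_size - 1) / page_size;

	return num_pages * page_size;
}

/* New scheme: reserve whole sectors covering the write. */
static uint64_t reserve_by_sector(uint64_t pos, uint64_t write_bytes,
				  uint64_t sectorsize)
{
	uint64_t sector_offset = pos & (sectorsize - 1);

	return align_up(write_bytes + sector_offset, sectorsize);
}
```

For a 100-byte write at offset 5000 with 4096-byte pages and 2048-byte
sectors, the page-based scheme reserves 4096 bytes and the sector-based scheme
reserves 2048. When the sector size equals the page size both give the same
result.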
 

[RFC PATCH V5 08/12] Btrfs: subpagesize-blocksize: Compute and look up csums based on sectorsized blocks.

2014-07-29 Thread Chandan Rajendra
Checksums are applicable to sectorsize units. The current code uses
bio->bv_len units to compute and look up checksums. This works only on
machines where sectorsize == PAGE_CACHE_SIZE. This patch makes the checksum
computation and lookup code work with sectorsize units.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
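A minimal sketch of the per-sector idea, not the kernel code: one checksum
is produced for every sectorsize block of a bio segment rather than one per
bvec. The names toy_csum, nr_sectors and csum_segment are hypothetical, and
the Adler-32 style sum merely stands in for the crc32c that btrfs uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum (Adler-32 style); btrfs itself uses crc32c. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + data[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/* Number of sectorsize blocks needed to cover len bytes (rounds up). */
static size_t nr_sectors(size_t len, unsigned int sectorsize_bits)
{
	size_t sectorsize = (size_t)1 << sectorsize_bits;

	return (len + sectorsize - 1) >> sectorsize_bits;
}

/*
 * Store one checksum per sector of the segment in sums[] and return how
 * many were written. A short final block is summed over the bytes present.
 */
static size_t csum_segment(const uint8_t *seg, size_t len,
			   unsigned int sectorsize_bits, uint32_t *sums)
{
	size_t sectorsize = (size_t)1 << sectorsize_bits;
	size_t n = nr_sectors(len, sectorsize_bits);
	size_t i;

	for (i = 0; i < n; i++) {
		size_t off = i << sectorsize_bits;
		size_t chunk = len - off < sectorsize ? len - off : sectorsize;

		sums[i] = toy_csum(seg + off, chunk);
	}
	return n;
}
```

With 64K pages and 4K sectors, a full-page segment yields 16 checksums
where per-bvec accounting would have produced one.
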
 fs/btrfs/file-item.c | 85 
 fs/btrfs/inode.c | 50 ---
 2 files changed, 85 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 9d84658..16deb87 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -172,6 +172,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
u64 item_start_offset = 0;
u64 item_last_offset = 0;
u64 disk_bytenr;
+   u64 page_bytes_left;
u32 diff;
int nblocks;
int bio_index = 0;
@@ -220,6 +221,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_sector << 9;
if (dio)
offset = logical_offset;
+
+   page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
if (!dio)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
@@ -243,7 +246,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
if (BTRFS_I(inode)->root->root_key.objectid ==
BTRFS_DATA_RELOC_TREE_OBJECTID) {
set_extent_bits(io_tree, offset,
-   offset + bvec->bv_len - 1,
+   offset + root->sectorsize - 1,
EXTENT_NODATASUM, GFP_NOFS);
} else {

btrfs_info(BTRFS_I(inode)->root->fs_info,
@@ -281,11 +284,17 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 found:
csum += count * csum_size;
nblocks -= count;
+
while (count--) {
-   disk_bytenr += bvec->bv_len;
-   offset += bvec->bv_len;
-   bio_index++;
-   bvec++;
+   disk_bytenr += root->sectorsize;
+   offset += root->sectorsize;
+   page_bytes_left -= root->sectorsize;
+   if (!page_bytes_left) {
+   bio_index++;
+   bvec++;
+   page_bytes_left = bvec->bv_len;
+   }
+
}
}
btrfs_free_path(path);
@@ -442,6 +451,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
struct bio_vec *bvec = bio->bi_io_vec;
int bio_index = 0;
int index;
+   int nr_sectors;
+   int i;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
@@ -468,41 +479,49 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
if (!contig)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
-   if (offset >= ordered->file_offset + ordered->len ||
-   offset < ordered->file_offset) {
-   unsigned long bytes_left;
-   sums->len = this_sum_bytes;
-   this_sum_bytes = 0;
-   btrfs_add_ordered_sum(inode, ordered, sums);
-   btrfs_put_ordered_extent(ordered);
+   data = kmap_atomic(bvec->bv_page);
 
-   bytes_left = bio->bi_size - total_bytes;
+   nr_sectors = (bvec->bv_len + root->sectorsize - 1)
+>> root->fs_info->sb->s_blocksize_bits;
+
+   for (i = 0; i < nr_sectors; i++) {
+   if (offset >= ordered->file_offset + ordered->len ||
+   offset < ordered->file_offset) {
+   unsigned long bytes_left;
+   sums->len = this_sum_bytes;
+   this_sum_bytes = 0;
+   btrfs_add_ordered_sum(inode, ordered, sums);
+   btrfs_put_ordered_extent(ordered);
+
+   bytes_left = bio->bi_size - total_bytes;
+
+   sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left),
+   GFP_NOFS);
+   BUG_ON(!sums); /* -ENOMEM */
+   sums->len = bytes_left;
+   ordered = btrfs_lookup_ordered_extent(inode, offset);
+   BUG_ON(!ordered); /* Logic error */
+   sums->bytenr = ((u64)bio->bi_sector << 9) +
+

[RFC PATCH V5 05/12] Btrfs: subpagesize-blocksize: Read tree blocks whose size is PAGE_CACHE_SIZE.

2014-07-29 Thread Chandan Rajendra
In the case of subpagesize-blocksize, this patch makes it possible to read
only a single metadata block from the disk instead of all the metadata blocks
that map into a page.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/disk-io.c   |  45 -
 fs/btrfs/disk-io.h   |   3 ++
 fs/btrfs/extent_io.c | 135 +++
 3 files changed, 137 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bda2157..b2c4e9d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -413,7 +413,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root,
int mirror_num = 0;
int failed_mirror = 0;
 
-   clear_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
+   clear_bit(EXTENT_BUFFER_CORRUPT, &eb->ebflags);
io_tree = &BTRFS_I(root->fs_info->btree_inode)->io_tree;
while (1) {
ret = read_extent_buffer_pages(io_tree, eb, start,
@@ -432,7 +432,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root,
 * there is no reason to read the other copies, they won't be
 * any less wrong.
 */
-   if (test_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags))
+   if (test_bit(EXTENT_BUFFER_CORRUPT, &eb->ebflags))
break;
 
num_copies = btrfs_num_copies(root->fs_info,
@@ -564,12 +564,13 @@ static noinline int check_leaf(struct btrfs_root *root,
return 0;
 }
 
-static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
- u64 phy_offset, struct page *page,
- u64 start, u64 end, int mirror)
+int verify_extent_buffer_read(struct btrfs_io_bio *io_bio,
+   struct page *page,
+   u64 start, u64 end, int mirror)
 {
u64 found_start;
int found_level;
+   struct extent_buffer_head *ebh;
struct extent_buffer *eb;
struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
int ret = 0;
@@ -579,18 +580,26 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
goto out;
 
eb = (struct extent_buffer *)page->private;
+   do {
+   if ((eb->start <= start) && (eb->start + eb->len - 1 > start))
+   break;
+   } while ((eb = eb->eb_next) != NULL);
+
+   BUG_ON(!eb);
+
+   ebh = eb_head(eb);
 
/* the pending IO might have been the only thing that kept this buffer
 * in memory.  Make sure we have a ref for all this other checks
 */
extent_buffer_get(eb);
 
-   reads_done = atomic_dec_and_test(&eb->io_pages);
+   reads_done = atomic_dec_and_test(&ebh->io_bvecs);
if (!reads_done)
goto err;
 
eb->read_mirror = mirror;
-   if (test_bit(EXTENT_BUFFER_IOERR, &eb->bflags)) {
+   if (test_bit(EXTENT_BUFFER_IOERR, &eb->ebflags)) {
ret = -EIO;
goto err;
}
@@ -632,7 +641,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 * return -EIO.
 */
if (found_level == 0 && check_leaf(root, eb)) {
-   set_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
+   set_bit(EXTENT_BUFFER_CORRUPT, &eb->ebflags);
ret = -EIO;
}
 
@@ -640,7 +649,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
set_extent_buffer_uptodate(eb);
 err:
if (reads_done &&
-   test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+   test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->ebflags))
btree_readahead_hook(root, eb, eb->start, ret);
 
if (ret) {
@@ -649,7 +658,7 @@ err:
 * again, we have to make sure it has something
 * to decrement
 */
-   atomic_inc(&eb->io_pages);
+   atomic_inc(&eb_head(eb)->io_bvecs);
clear_extent_buffer_uptodate(eb);
}
free_extent_buffer(eb);
@@ -657,20 +666,6 @@ out:
return ret;
 }
 
-static int btree_io_failed_hook(struct page *page, int failed_mirror)
-{
-   struct extent_buffer *eb;
-   struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
-
-   eb = (struct extent_buffer *)page->private;
-   set_bit(EXTENT_BUFFER_IOERR, &eb->bflags);
-   eb->read_mirror = failed_mirror;
-   atomic_dec(&eb->io_pages);
-   if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
-   btree_readahead_hook(root, eb, eb->start, -EIO);
-   return -EIO;/* we fixed nothing */
-}
-
 static void end_workqueue_bio(struct bio *bio, int err)
 {
struct end_io_wq *end_io_wq = bio->bi_private;
@@ -4109,8 +4104,6 @@ static int btrfs_cleanup_transaction(struct btrfs_root *root)
 }
 
 static struct extent_io_ops btree_extent_io_ops = {
-   .readpage_end_io_hook = 

[RFC PATCH V5 11/12] Btrfs: subpagesize-blocksize: btrfs_page_mkwrite: Reserve space in sectorsized units.

2014-07-29 Thread Chandan Rajendra
In the subpagesize-blocksize scenario, if i_size falls in a block which is
not the last block in the page, then the space to be reserved should cover
only the blocks up to and including the one that contains i_size, rather
than the whole page.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
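A minimal sketch of the reservation rule, assuming power-of-two page and
sector sizes: the page that holds the last byte of the file reserves only
up to the end of the block containing i_size, every other page reserves a
full page. The helper names round_up_pow2 and mkwrite_reserve_bytes are
hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes to reserve for a write fault on page number index.
 * Assumes i_size > 0 and that the page starts below i_size.
 */
static uint64_t mkwrite_reserve_bytes(uint64_t i_size, uint64_t index,
				      unsigned int page_shift,
				      uint64_t sectorsize)
{
	uint64_t page_size = (uint64_t)1 << page_shift;
	uint64_t page_start = index << page_shift;

	/* Last page of the file: reserve only the blocks up to i_size. */
	if (index == ((i_size - 1) >> page_shift))
		return round_up_pow2(i_size - page_start, sectorsize);

	return page_size;
}
```

For example, with 64K pages, 4K sectors and i_size = 70000, page 0 reserves
65536 bytes while page 1 reserves 8192 bytes instead of a full 64K.
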
 fs/btrfs/inode.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1872725..d7a3ca7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7698,26 +7698,23 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
loff_t size;
int ret;
int reserved = 0;
+   u64 delalloc_size;
u64 page_start;
u64 page_end;
 
sb_start_pagefault(inode->i_sb);
-   ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
-   if (!ret) {
-   ret = file_update_time(vma->vm_file);
-   reserved = 1;
-   }
+
+   ret = file_update_time(vma->vm_file);
if (ret) {
if (ret == -ENOMEM)
ret = VM_FAULT_OOM;
else /* -ENOSPC, -EIO, etc */
ret = VM_FAULT_SIGBUS;
-   if (reserved)
-   goto out;
-   goto out_noreserve;
+   goto out;
}
 
ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
+
 again:
lock_page(page);
size = i_size_read(inode);
@@ -7748,6 +7745,19 @@ again:
goto again;
}
 
+   if (page->index == ((size - 1) >> PAGE_CACHE_SHIFT))
+   delalloc_size = round_up(size - page_start, root->sectorsize);
+   else
+   delalloc_size = PAGE_CACHE_SIZE;
+
+   ret = btrfs_delalloc_reserve_space(inode, delalloc_size);
+   if (ret) {
+   /* -ENOSPC */
+   ret = VM_FAULT_SIGBUS;
+   goto out_unlock;
+   }
+   reserved = 1;
+
/*
 * XXX - page_mkwrite gets called every time the page is dirtied, even
 * if it was already dirty, so for space accounting reasons we need to
@@ -7760,7 +7770,8 @@ again:
  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
  0, 0, &cached_state, GFP_NOFS);
 
-   ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
+   ret = btrfs_set_extent_delalloc(inode, page_start,
+   page_start + delalloc_size - 1,
&cached_state);
if (ret) {
unlock_extent_cached(io_tree, page_start, page_end,
@@ -7799,8 +7810,8 @@ out_unlock:
}
unlock_page(page);
 out:
-   btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
-out_noreserve:
+   if (reserved)
+   btrfs_delalloc_release_space(inode, delalloc_size);
sb_end_pagefault(inode->i_sb);
return ret;
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH V5 09/12] Btrfs: subpagesize-blocksize: __extent_writepage: Write only dirty blocks of a page.

2014-07-29 Thread Chandan Rajendra
The code now loops across 'ordered extents' instead of 'extent maps' to figure
out the dirty blocks of the page to be submitted for a write operation.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
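A minimal sketch of how the size of one write submission can be derived
from an ordered extent, following the iosize computation in the hunk for
__extent_writepage(). The struct ordered_range_sketch and the helpers
align_up and write_iosize are hypothetical names; cur is assumed to lie
inside the ordered extent and end (inclusive) is assumed to be >= cur.

```c
#include <stdint.h>

/* File range covered by one ordered extent (illustrative subset). */
struct ordered_range_sketch {
	uint64_t file_offset;
	uint64_t len;
};

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Bytes to submit starting at cur: bounded by the end of the ordered
 * extent and by the inclusive end of the range being written, then
 * rounded up to a whole number of blocks.
 */
static uint64_t write_iosize(const struct ordered_range_sketch *o,
			     uint64_t cur, uint64_t end, uint64_t blocksize)
{
	uint64_t extent_end = o->file_offset + o->len;
	uint64_t left_in_extent, left_in_range, iosize;

	/* Guard against file_offset + len wrapping around. */
	if (extent_end < o->file_offset)
		extent_end = UINT64_MAX;

	left_in_extent = extent_end - cur;
	left_in_range = end - cur + 1;
	iosize = left_in_extent < left_in_range ? left_in_extent : left_in_range;

	return align_up(iosize, blocksize);
}
```

An ordered extent covering only two 4K blocks of a 64K page therefore
results in an 8K submission rather than a write of the whole page.
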
 fs/btrfs/extent_io.c | 68 +---
 1 file changed, 27 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9937851..f938a5c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3129,22 +3129,22 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
struct inode *inode = page->mapping->host;
struct extent_page_data *epd = data;
struct extent_io_tree *tree = epd->tree;
+   struct btrfs_ordered_extent *ordered;
u64 start = page_offset(page);
u64 delalloc_start;
u64 page_end = start + PAGE_CACHE_SIZE - 1;
u64 end;
u64 cur = start;
u64 extent_offset;
+   u64 extent_end;
u64 last_byte = i_size_read(inode);
-   u64 block_start;
u64 iosize;
sector_t sector;
struct extent_state *cached_state = NULL;
-   struct extent_map *em;
struct block_device *bdev;
int ret;
int nr = 0;
-   size_t pg_offset = 0;
+   size_t pg_offset;
size_t blocksize;
loff_t i_size = i_size_read(inode);
unsigned long end_index = i_size >> PAGE_CACHE_SHIFT;
@@ -3184,7 +3184,6 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
kunmap_atomic(userpage);
flush_dcache_page(page);
}
-   pg_offset = 0;
 
set_page_extent_mapped(page);
 
@@ -3295,57 +3294,45 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 page_end, NULL, 1);
break;
}
-   em = epd->get_extent(inode, page, pg_offset, cur,
-end - cur + 1, 1);
-   if (IS_ERR_OR_NULL(em)) {
-   SetPageError(page);
-   break;
+
+   ordered = btrfs_lookup_ordered_extent(inode, cur);
+   if (!ordered) {
+   cur += blocksize;
+   continue;
}
 
-   extent_offset = cur - em->start;
-   BUG_ON(extent_map_end(em) <= cur);
+   pg_offset = cur & (PAGE_CACHE_SIZE - 1);
+
+   extent_offset = cur - ordered->file_offset;
+   extent_end = ordered->file_offset + ordered->len;
+   extent_end = (extent_end < ordered->file_offset) ? -1 : extent_end;
+   BUG_ON(extent_end <= cur);
BUG_ON(end < cur);
-   iosize = min(extent_map_end(em) - cur, end - cur + 1);
+   iosize = min(extent_end - cur, end - cur + 1);
iosize = ALIGN(iosize, blocksize);
-   sector = (em->block_start + extent_offset) >> 9;
-   bdev = em->bdev;
-   block_start = em->block_start;
-   compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
-   free_extent_map(em);
-   em = NULL;
-
+   sector = (ordered->start + extent_offset) >> 9;
+   bdev = BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev;
+   compressed = test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags);
+   btrfs_put_ordered_extent(ordered);
+   ordered = NULL;
/*
 * compressed and inline extents are written through other
 * paths in the FS
 */
-   if (compressed || block_start == EXTENT_MAP_HOLE ||
-   block_start == EXTENT_MAP_INLINE) {
-   /*
-* end_io notification does not happen here for
-* compressed extents
-*/
-   if (!compressed && tree->ops &&
-   tree->ops->writepage_end_io_hook)
-   tree->ops->writepage_end_io_hook(page, cur,
-cur + iosize - 1,
-NULL, 1);
-   else if (compressed) {
-   /* we don't want to end_page_writeback on
-* a compressed extent.  this happens
-* elsewhere
-*/
-   nr++;
-   }
-
+   if (compressed) {
+   /* we don't want to end_page_writeback on
+* a compressed extent.  this happens
+* elsewhere
+*/
+   nr++;
cur += iosize;
- 

[RFC PATCH V5 10/12] Btrfs: subpagesize-blocksize: fallocate: Work with sectorsized units.

2014-07-29 Thread Chandan Rajendra
While at it, this commit changes btrfs_truncate_page() to truncate sectorsized
blocks instead of pages. Hence the function has been renamed to
btrfs_truncate_block().

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
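A minimal sketch of the block-granular test that replaces the page-granular
one in btrfs_punch_hole(): the same shift-and-compare, but with the block
size bits instead of PAGE_CACHE_SHIFT. The helper name range_in_one_block
is hypothetical, and len is assumed to be greater than zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the byte range [offset, offset + len) lies entirely inside
 * one block of size (1 << blocksize_bits). Requires len > 0.
 */
static bool range_in_one_block(uint64_t offset, uint64_t len,
			       unsigned int blocksize_bits)
{
	return (offset >> blocksize_bits) ==
	       ((offset + len - 1) >> blocksize_bits);
}
```

A 200-byte hole starting at offset 4000 sits inside a single 64K page
(blocksize_bits = 16) but straddles two 4K blocks (blocksize_bits = 12),
which is why the page-based check is not sufficient once sectorsize is
smaller than the page size.
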
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/file.c  | 35 ++-
 fs/btrfs/inode.c | 48 +---
 3 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 901ada2..4a93d21 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3721,7 +3721,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
-int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
+int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 541e227..abacd5f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2161,6 +2161,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
struct btrfs_path *path;
struct btrfs_block_rsv *rsv;
struct btrfs_trans_handle *trans;
+   unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
u64 lockstart = round_up(offset, BTRFS_I(inode)->root->sectorsize);
u64 lockend = round_down(offset + len,
 BTRFS_I(inode)->root->sectorsize) - 1;
@@ -2170,8 +2171,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
int ret = 0;
int err = 0;
int rsv_count;
-   bool same_page = ((offset >> PAGE_CACHE_SHIFT) ==
- ((offset + len - 1) >> PAGE_CACHE_SHIFT));
+   bool same_block = ((offset >> blocksize_bits) ==
+ ((offset + len - 1) >> blocksize_bits));
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
 
ret = btrfs_wait_ordered_range(inode, offset, len);
@@ -2180,32 +2181,32 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
mutex_lock(&inode->i_mutex);
/*
-* We needn't truncate any page which is beyond the end of the file
+* We needn't truncate any block which is beyond the end of the file
 * because we are sure there is no data there.
 */
/*
-* Only do this if we are in the same page and we aren't doing the
-* entire page.
+* Only do this if we are in the same block and we aren't doing the
+* entire block.
 */
-   if (same_page && len < PAGE_CACHE_SIZE) {
-   if (offset < round_up(inode->i_size, PAGE_CACHE_SIZE))
-   ret = btrfs_truncate_page(inode, offset, len, 0);
+   if (same_block && len < root->sectorsize) {
+   if (offset < round_up(inode->i_size, root->sectorsize))
+   ret = btrfs_truncate_block(inode, offset, len, 0);
mutex_unlock(&inode->i_mutex);
return ret;
}
 
-   /* zero back part of the first page */
-   if (offset < round_up(inode->i_size, PAGE_CACHE_SIZE)) {
-   ret = btrfs_truncate_page(inode, offset, 0, 0);
+   /* zero back part of the first block */
+   if (offset < round_up(inode->i_size, root->sectorsize)) {
+   ret = btrfs_truncate_block(inode, offset, 0, 0);
if (ret) {
mutex_unlock(&inode->i_mutex);
return ret;
}
}
 
-   /* zero the front end of the last page */
-   if (offset + len < round_up(inode->i_size, PAGE_CACHE_SIZE)) {
-   ret = btrfs_truncate_page(inode, offset + len, 0, 1);
+   /* zero the front end of the last block */
+   if (offset + len < round_up(inode->i_size, root->sectorsize)) {
+   ret = btrfs_truncate_block(inode, offset + len, 0, 1);
if (ret) {
mutex_unlock(&inode->i_mutex);
return ret;
@@ -2410,10 +2411,10 @@ static long btrfs_fallocate(struct file *file, int mode,
} else {
/*
 * If we are fallocating from the end of the file onward we
-* need to zero out the end of the page if i_size lands in the
-* middle of a page.
+* need to zero out the end of the block if i_size lands in the
+* middle of a block.
 */
-   ret = btrfs_truncate_page(inode, inode->i_size, 0, 0);
+   ret = btrfs_truncate_block(inode, inode->i_size, 0, 0);
if (ret)
goto out;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c

[RFC PATCH V5 04/12] Btrfs: subpagesize-blocksize: Define extent_buffer_head.

2014-07-29 Thread Chandan Rajendra
From: Chandra Seetharaman sekha...@us.ibm.com

In order to handle multiple extent buffers per page, first we need to create a
way to handle all the extent buffers that are attached to a page.

This patch creates a new data structure 'struct extent_buffer_head', and moves
fields that are common to all extent buffers in a page from 'struct
extent_buffer' to 'struct extent_buffer_head'.

Also, this patch moves EXTENT_BUFFER_TREE_REF, EXTENT_BUFFER_DUMMY and
EXTENT_BUFFER_IN_TREE flags from extent_buffer->ebflags to
extent_buffer_head->bflags.

Signed-off-by: Chandra Seetharaman sekha...@us.ibm.com
Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
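A minimal sketch of the head/buffer split described above, not the layout
used by the patch: state shared by every block in a page lives in a head
structure, while each block keeps its own range and flags and is chained to
its siblings. All names ending in _sketch, the back-pointer to the head and
the helper find_eb are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

struct eb_head_sketch;

/* Per-block state: one of these for every tree block in the page. */
struct eb_sketch {
	uint64_t start;               /* logical start of this block */
	uint32_t len;                 /* block length in bytes */
	unsigned long ebflags;        /* flags private to this block */
	struct eb_sketch *next;       /* next block sharing the page */
	struct eb_head_sketch *head;  /* back-pointer (sketch assumption) */
};

/* State common to all blocks that map into the same page. */
struct eb_head_sketch {
	unsigned long bflags;         /* flags shared by all blocks */
	int refs;                     /* one reference count for the group */
	struct eb_sketch *first;      /* first block in the page */
};

/* Walk the chain and return the block containing logical address start. */
static struct eb_sketch *find_eb(struct eb_head_sketch *h, uint64_t start)
{
	struct eb_sketch *eb;

	for (eb = h->first; eb != NULL; eb = eb->next) {
		if (eb->start <= start && start < eb->start + eb->len)
			return eb;
	}
	return NULL;
}
```

With a 64K page and 4K blocks, up to sixteen such buffers hang off one
head, and reference counting is done once per head rather than per block.
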
 fs/btrfs/backref.c   |   2 +-
 fs/btrfs/ctree.c |   2 +-
 fs/btrfs/ctree.h |   6 +-
 fs/btrfs/disk-io.c   |  46 --
 fs/btrfs/extent-tree.c   |   6 +-
 fs/btrfs/extent_io.c | 372 +--
 fs/btrfs/extent_io.h |  46 --
 fs/btrfs/volumes.c   |   2 +-
 include/trace/events/btrfs.h |   2 +-
 9 files changed, 326 insertions(+), 158 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index a88da72..603ae44 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1272,7 +1272,7 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
eb = path->nodes[0];
/* make sure we can use eb after releasing the path */
if (eb != eb_in) {
-   atomic_inc(&eb->refs);
+   atomic_inc(&eb_head(eb)->refs);
btrfs_tree_read_lock(eb);
btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
}
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index cbd3a7d..0d4ad91 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -169,7 +169,7 @@ struct extent_buffer *btrfs_root_node(struct btrfs_root *root)
 * the inc_not_zero dance and if it doesn't work then
 * synchronize_rcu and try again.
 */
-   if (atomic_inc_not_zero(&eb->refs)) {
+   if (atomic_inc_not_zero(&eb_head(eb)->refs)) {
rcu_read_unlock();
break;
}
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dac6653..901ada2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2138,14 +2138,16 @@ static inline void btrfs_set_token_##name(struct extent_buffer *eb, \
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)\
 static inline u##bits btrfs_##name(struct extent_buffer *eb)   \
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = page_address(eb_head(eb)->pages[0]) + \
+   (eb->start & (PAGE_CACHE_SIZE -1)); \
u##bits res = le##bits##_to_cpu(p->member); \
return res; \
 }  \
 static inline void btrfs_set_##name(struct extent_buffer *eb,  \
u##bits val)\
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = page_address(eb_head(eb)->pages[0]) + \
+   (eb->start & (PAGE_CACHE_SIZE -1)); \
p->member = cpu_to_le##bits(val);   \
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cc1b423..bda2157 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1018,13 +1018,21 @@ static int btree_set_page_dirty(struct page *page)
 {
 #ifdef DEBUG
struct extent_buffer *eb;
+   int i, dirty = 0;
 
BUG_ON(!PagePrivate(page));
eb = (struct extent_buffer *)page->private;
BUG_ON(!eb);
-   BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-   BUG_ON(!atomic_read(&eb->refs));
-   btrfs_assert_tree_locked(eb);
+
+   do {
+   dirty = test_bit(EXTENT_BUFFER_DIRTY, &eb->ebflags);
+   if (dirty)
+   break;
+   } while ((eb = eb->eb_next) != NULL);
+
+   BUG_ON(!dirty);
+   BUG_ON(!atomic_read(&(eb_head(eb)->refs)));
+   btrfs_assert_tree_locked(&ebh->eb);
 #endif
return __set_page_dirty_nobuffers(page);
 }
@@ -1068,7 +1076,7 @@ int reada_tree_block_flagged(struct btrfs_root *root, u64 bytenr, u32 blocksize,
if (!buf)
return 0;
 
-   set_bit(EXTENT_BUFFER_READAHEAD, &buf->bflags);
+   set_bit(EXTENT_BUFFER_READAHEAD, &buf->ebflags);
 
ret = read_extent_buffer_pages(io_tree, buf, 0, WAIT_PAGE_LOCK,
   btree_get_extent, mirror_num);
@@ -1077,7 

[RFC PATCH V5 01/12] Btrfs: subpagesize-blocksize: Get rid of whole page reads.

2014-07-29 Thread Chandan Rajendra
Based on original patch from Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

bio_vec->{bv_offset, bv_len} cannot be relied upon by the end bio functions
to track the file offset range operated on by the bio. Hence this patch adds
two new members to 'struct btrfs_io_bio' to track the file offset range.

This patch also brings back check_page_locked() to reliably unlock pages in
readpage's end bio function.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
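A minimal sketch of tracking the file range in the bio itself, assuming two
members analogous to the start_offset and len fields the patch adds to
'struct btrfs_io_bio'. The struct io_range_sketch and the three helpers are
hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* File range operated on by one bio (illustrative stand-in). */
struct io_range_sketch {
	uint64_t start_offset;  /* first file offset covered by the bio */
	uint64_t len;           /* number of bytes covered */
};

/* Number of whole blocks covered by the range. */
static uint64_t io_range_sectors(const struct io_range_sketch *r,
				 unsigned int blocksize_bits)
{
	return r->len >> blocksize_bits;
}

/* Last file offset covered by the range (inclusive); requires len > 0. */
static uint64_t io_range_end(const struct io_range_sketch *r)
{
	return r->start_offset + r->len - 1;
}

/*
 * A failed read spanning more than one block has to be retried block by
 * block, since the failing block cannot be identified otherwise.
 */
static bool io_range_multi_sector(const struct io_range_sketch *r,
				  unsigned int blocksize_bits)
{
	return io_range_sectors(r, blocksize_bits) > 1;
}
```

Counting blocks from the recorded length, instead of counting bvecs,
matters because a single 64K bvec may carry anywhere from one to sixteen
4K blocks.
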
 fs/btrfs/extent_io.c | 200 ++-
 fs/btrfs/volumes.h   |   3 +
 2 files changed, 90 insertions(+), 113 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fbe501d..fa28545 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1943,15 +1943,29 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
  * helper function to set a given page up to date if all the
  * extents in the tree for that page are up to date
  */
-static void check_page_uptodate(struct extent_io_tree *tree, struct page *page)
+static void check_page_uptodate(struct extent_io_tree *tree, struct page *page,
+   struct extent_state *cached)
 {
u64 start = page_offset(page);
u64 end = start + PAGE_CACHE_SIZE - 1;
-   if (test_range_bit(tree, start, end, EXTENT_UPTODATE, 1, NULL))
+   if (test_range_bit(tree, start, end, EXTENT_UPTODATE, 1, cached))
SetPageUptodate(page);
 }
 
 /*
+ * helper function to unlock a page if all the extents in the tree
+ * for that page are unlocked
+ */
+static void check_page_locked(struct extent_io_tree *tree, struct page *page)
+{
+   u64 start = page_offset(page);
+   u64 end = start + PAGE_CACHE_SIZE - 1;
+
+   if (!test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) {
+   unlock_page(page);
+   }
+}
+
  * When IO fails, either with EIO or csum verification fails, we
  * try other mirrors that might have a good copy of the data.  This
  * io_failure_record is used to record state as we go through all the
@@ -2173,6 +2187,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
struct bio *bio;
struct btrfs_io_bio *btrfs_failed_bio;
struct btrfs_io_bio *btrfs_bio;
+   int nr_sectors;
int num_copies;
int ret;
int read_mode;
@@ -2267,7 +2282,8 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 *  a) deliver good data to the caller
 *  b) correct the bad sectors on disk
 */
-   if (failed_bio->bi_vcnt > 1) {
+   nr_sectors = btrfs_io_bio(failed_bio)->len >> inode->i_sb->s_blocksize_bits;
+   if (nr_sectors > 1) {
/*
 * to fulfill b), we need to know the exact failing sectors, as
 * we don't want to rewrite any more than the failed ones. thus,
@@ -2314,6 +2330,8 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
bio->bi_sector = failrec->logical >> 9;
bio->bi_bdev = BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev;
bio->bi_size = 0;
+   btrfs_io_bio(bio)->start_offset = start;
+   btrfs_io_bio(bio)->len = end - start + 1;
 
btrfs_failed_bio = btrfs_io_bio(failed_bio);
if (btrfs_failed_bio->csum) {
@@ -2414,18 +2432,6 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
bio_put(bio);
 }
 
-static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
- int uptodate)
-{
-   struct extent_state *cached = NULL;
-   u64 end = start + len - 1;
-
-   if (uptodate && tree->track_uptodate)
-   set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
-   unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
-}
-
 /*
  * after a readpage IO is done, we need to:
  * clear the uptodate bits on error
@@ -2440,76 +2446,50 @@ endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
 static void end_bio_extent_readpage(struct bio *bio, int err)
 {
int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-   struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
-   struct bio_vec *bvec = bio->bi_io_vec;
struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+   struct bio_vec *bvec = bio->bi_io_vec;
+   struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
+   struct address_space *mapping;
+   struct extent_state *cached = NULL;
struct extent_io_tree *tree;
-   u64 offset = 0;
+   struct btrfs_root *root;
+   struct inode *inode;
+   struct page *page;
u64 start;
-   u64 end;
+   u64 offset = 0;
u64 len;
-   u64 extent_start = 0;
-   u64 extent_len = 0;
+   int nr_sectors;
int mirror;
int ret;
 
-   if (err)
-   uptodate = 0;
+   mapping = 

[RFC PATCH V5 07/12] Btrfs: subpagesize-blocksize: Allow mounting filesystems where sectorsize != PAGE_SIZE

2014-07-29 Thread Chandan Rajendra
From: Chandra Seetharaman sekha...@us.ibm.com

This patch allows mounting filesystems with blocksize smaller than the
PAGE_SIZE.

Signed-off-by: Chandra Seetharaman sekha...@us.ibm.com
Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/disk-io.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 28a45f6..3bb7072 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2599,12 +2599,6 @@ int open_ctree(struct super_block *sb,
goto fail_sb_buffer;
}
 
-   if (sectorsize != PAGE_SIZE) {
-   printk(KERN_WARNING "BTRFS: Incompatible sector size(%lu) "
-  "found on %s\n", (unsigned long)sectorsize, sb->s_id);
-   goto fail_sb_buffer;
-   }
-
mutex_lock(&fs_info->chunk_mutex);
ret = btrfs_read_sys_array(tree_root);
mutex_unlock(&fs_info->chunk_mutex);
-- 
1.8.3.1



[RFC PATCH V5 02/12] Btrfs: subpagesize-blocksize: Get rid of whole page writes.

2014-07-29 Thread Chandan Rajendra
This commit brings back functions that set/clear EXTENT_WRITEBACK bits. These
are required to reliably clear PG_writeback page flag.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
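A minimal sketch of why per-block writeback state is needed before the
page-level flag can be cleared: the page is done only when no block in it
is still under writeback. A 16-bit bitmap stands in here for the
EXTENT_WRITEBACK bits kept in the extent io tree, and the 64K page / 4K
block sizes and all names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SHIFT 12  /* 4K blocks, 16 of them per 64K page */

/* One bit per block of the page that is still under writeback. */
struct page_wb_sketch {
	uint16_t wb_blocks;
};

/*
 * Bitmask of the blocks touched by [off, off + len) within the page.
 * Requires len > 0 and off + len <= 65536.
 */
static uint16_t block_mask(uint32_t off, uint32_t len)
{
	uint32_t first = off >> SKETCH_BLOCK_SHIFT;
	uint32_t last = (off + len - 1) >> SKETCH_BLOCK_SHIFT;
	uint32_t nbits = last - first + 1;

	return (uint16_t)((((uint32_t)1 << nbits) - 1) << first);
}

/* Mark the blocks in the range as being written back. */
static void start_block_writeback(struct page_wb_sketch *p,
				  uint32_t off, uint32_t len)
{
	p->wb_blocks |= block_mask(off, len);
}

/*
 * Clear the blocks in the range; returns true once no block of the page
 * is under writeback, i.e. when page-level writeback may be ended.
 */
static bool end_block_writeback(struct page_wb_sketch *p,
				uint32_t off, uint32_t len)
{
	p->wb_blocks &= (uint16_t)~block_mask(off, len);
	return p->wb_blocks == 0;
}
```

If two separate bios write different blocks of the same page, the first
completion must not end page writeback; only the completion that clears
the last remaining bit may do so.
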
 fs/btrfs/extent_io.c | 134 +--
 fs/btrfs/extent_io.h |   2 +-
 fs/btrfs/inode.c |  47 +-
 3 files changed, 124 insertions(+), 59 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fa28545..ba04bd2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1293,6 +1293,20 @@ int clear_extent_uptodate(struct extent_io_tree *tree, u64 start, u64 end,
cached_state, mask);
 }
 
+static int set_extent_writeback(struct extent_io_tree *tree, u64 start, u64 end,
+   struct extent_state **cached_state, gfp_t mask)
+{
+   return set_extent_bit(tree, start, end, EXTENT_WRITEBACK, NULL,
+   cached_state, mask);
+}
+
+static int clear_extent_writeback(struct extent_io_tree *tree, u64 start, u64 end,
+   struct extent_state **cached_state, gfp_t mask)
+{
+   return clear_extent_bit(tree, start, end, EXTENT_WRITEBACK, 1, 0,
+   cached_state, mask);
+}
+
 /*
  * either insert or lock state struct between start and end use mask to tell
  * us if waiting is desired.
@@ -1399,6 +1413,7 @@ static int set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
page_cache_release(page);
index++;
}
+   set_extent_writeback(tree, start, end, NULL, GFP_NOFS);
return 0;
 }
 
@@ -1966,6 +1981,16 @@ static void check_page_locked(struct extent_io_tree *tree, struct page *page)
}
 }
 
+static void check_page_writeback(struct extent_io_tree *tree, struct page *page)
+{
+   u64 start = page_offset(page);
+   u64 end = start + PAGE_CACHE_SIZE - 1;
+
+   if (!test_range_bit(tree, start, end, EXTENT_WRITEBACK, 0, NULL))
+   end_page_writeback(page);
+}
+
+/*
  * When IO fails, either with EIO or csum verification fails, we
  * try other mirrors that might have a good copy of the data.  This
  * io_failure_record is used to record state as we go through all the
@@ -2359,27 +2384,69 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 }
 
 /* lots and lots of room for performance fixes in the end_bio funcs */
-
-int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
+void end_extents_write(struct inode *inode, int err, u64 start, u64 end)
 {
+   struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
int uptodate = (err == 0);
-   struct extent_io_tree *tree;
+   pgoff_t index, end_index;
+   u64 page_start, page_end;
+   struct page *page;
int ret;
 
-   tree = &BTRFS_I(page->mapping->host)->io_tree;
+   index = start >> PAGE_CACHE_SHIFT;
+   end_index = end >> PAGE_CACHE_SHIFT;
 
-   if (tree->ops && tree->ops->writepage_end_io_hook) {
-   ret = tree->ops->writepage_end_io_hook(page, start,
-  end, NULL, uptodate);
-   if (ret)
-   uptodate = 0;
+   page_start = start;
+
+   while (index <= end_index) {
+   page = find_get_page(inode->i_mapping, index);
+   BUG_ON(!page);
+
+   page_end = min_t(u64, end, page_offset(page) + PAGE_CACHE_SIZE - 1);
+
+   if (tree->ops && tree->ops->writepage_end_io_hook) {
+   ret = tree->ops->writepage_end_io_hook(page,
+   page_start, page_end,
+   NULL, uptodate);
+   if (ret)
+   uptodate = 0;
+   }
+
+   page_start = page_end + 1;
+
+   ++index;
+
+   if (!uptodate) {
+   ClearPageUptodate(page);
+   SetPageError(page);
+   }
+
+   page_cache_release(page);
}
+}
 
-   if (!uptodate) {
-   ClearPageUptodate(page);
-   SetPageError(page);
+static void clear_extent_and_page_writeback(struct address_space *mapping,
+   struct extent_io_tree *tree,
+   struct btrfs_io_bio *io_bio)
+{
+   struct page *page;
+   pgoff_t index;
+   u64 offset, len;
+
+   offset  = io_bio->start_offset;
+   len = io_bio->len;
+
+   clear_extent_writeback(tree, offset, offset + len - 1, NULL,
+   GFP_ATOMIC);
+
+   index = offset >> PAGE_CACHE_SHIFT;
+   while (offset < io_bio->start_offset + len) {
+   page = find_get_page(mapping, index);
+   check_page_writeback(tree, page);
+   page_cache_release(page);
+   

Re: [PATCH 3/4] btrfs-progs: Fix wrong indent in btrfs-progs.

2014-07-29 Thread David Sterba
On Thu, Jul 10, 2014 at 11:05:12AM +0800, Qu Wenruo wrote:
> When editing cmds-filesystem.c, I found cmd_filesystem_df() uses 7
> spaces as indent instead of 1 tab (or 8 spaces), which makes the indent
> quite embarrassing.
> Such a problem is especially hard to detect when reviewing patches,
> since the leading '+' makes a tab only 7 spaces long, making 7 spaces
> look the same as a tab.
>
> This patch fixes all the 7-space indents.

The whitespace changes cause patch conflicts, so I'm inclined either not to
add them or to postpone them until the patch queue is relatively calm.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] btrfs-progs: replace a confusing raw number with a macro

2014-07-29 Thread David Sterba
On Thu, Jul 17, 2014 at 10:40:38AM +0800, Gui Hecheng wrote:
> The raw number 36 for the uuid string length is somewhat confusing,
> use a macro to replace it.

There's the BTRFS_UUID_UNPARSED_SIZE macro, please use it instead to
avoid duplicate definitions.
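
As a rough illustration of why a named size is clearer than a bare 36, here is a minimal sketch. The SKETCH_* macros and sketch_uuid_unparse() are hypothetical names made up for this example; in btrfs-progs the suggestion above is to reuse the existing BTRFS_UUID_UNPARSED_SIZE rather than define anything new.

```c
#include <stdio.h>

/* Hypothetical names, for illustration only. */
#define SKETCH_UUID_SIZE	16	/* raw UUID bytes */
#define SKETCH_UUID_STR_LEN	36	/* 32 hex digits plus 4 dashes */
#define SKETCH_UUID_STR_SIZE	(SKETCH_UUID_STR_LEN + 1)	/* plus NUL */

/* Format a raw 16-byte UUID as the usual 8-4-4-4-12 hex string. */
static void sketch_uuid_unparse(const unsigned char uuid[SKETCH_UUID_SIZE],
				char out[SKETCH_UUID_STR_SIZE])
{
	snprintf(out, SKETCH_UUID_STR_SIZE,
		 "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		 "%02x%02x%02x%02x%02x%02x",
		 uuid[0], uuid[1], uuid[2], uuid[3],
		 uuid[4], uuid[5], uuid[6], uuid[7],
		 uuid[8], uuid[9], uuid[10], uuid[11],
		 uuid[12], uuid[13], uuid[14], uuid[15]);
}
```

Sizing the buffer from the macro keeps the string length and the terminating NUL in one place instead of scattering 36 and 37 through the callers.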


Re: [PATCH 3/4] btrfs-progs: replace a confusing raw number with a macro

2014-07-29 Thread David Sterba
On Tue, Jul 29, 2014 at 02:16:14PM +0200, David Sterba wrote:
> On Thu, Jul 17, 2014 at 10:40:38AM +0800, Gui Hecheng wrote:
> > The raw number 36 for the uuid string length is somewhat confusing,
> > use a macro to replace it.
>
> There's the BTRFS_UUID_UNPARSED_SIZE macro, please use it instead to
> avoid duplicate definitions.

Or I'll do that myself as it's really a small change.


Re: [PATCH] btrfs: Return right extent when fiemap gives unaligned offset and len.

2014-07-29 Thread David Sterba
On Fri, Jul 25, 2014 at 09:29:05AM +0800, Qu Wenruo wrote:
 
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Return right extent when fiemap gives unaligned
> offset and len.
> From: David Sterba dste...@suse.cz
> To: Qu Wenruo quwen...@cn.fujitsu.com
> Date: 2014-07-24 20:17
> > On Fri, Jul 18, 2014 at 09:55:43AM +0800, Qu Wenruo wrote:
> > > When page aligned start and len are passed to extent_fiemap(), the result
> > > is good, but when start and len are not aligned, e.g. start = 1 and len =
> > > 4095 are passed to extent_fiemap(), it returns no extent.
> > >
> > > The problem is that start and len are both rounded down, which causes the
> > > problem.
> > ALIGN rounds up, not down. So the wrong rounding will use an incorrect start
> > (4096) and find no extents if there's e.g. only one [0,4095].
> Sorry for the wrong description in the patch.
> Should I reword the patch and send a v2 patch?

You already did, but yes please.
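
To make the rounding issue concrete, here is a small userspace sketch. It is not the code from the patch: round_down_u64(), round_up_u64() and cover_range() are hypothetical helpers, and a power-of-two alignment is assumed. Rounding the start up turns start = 1 into 4096 and misses an extent at [0,4095]; rounding the start down and the end up covers it.

```c
#include <stdint.h>

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Round x up to a multiple of align; this is what an ALIGN-style macro does. */
static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return round_down_u64(x + align - 1, align);
}

struct aligned_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Return the smallest aligned range that fully covers [start, start + len):
 * the start is rounded down and the end is rounded up.
 */
static struct aligned_range cover_range(uint64_t start, uint64_t len,
					uint64_t align)
{
	struct aligned_range r;
	uint64_t end = round_up_u64(start + len, align);

	r.start = round_down_u64(start, align);
	r.len = end - r.start;
	return r;
}
```

For start = 1, len = 4095 and a 4096-byte alignment, cover_range() yields start 0 and len 4096, which includes the single extent [0,4095] from the example in the thread.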


Re: [PATCH] btrfs-porgs: fix xfstest btrfs/023 random failure

2014-07-29 Thread David Sterba
On Thu, Jul 17, 2014 at 05:28:24PM +0800, Anand Jain wrote:
> xfstest btrfs/023 which does the following tests
>
> create_group_profile raid0
> check_group_profile RAID0
>
> create_group_profile raid1
> check_group_profile RAID1
>
> create_group_profile raid10
> check_group_profile RAID10
>
> create_group_profile raid5
> check_group_profile RAID5
>
> create_group_profile raid6
> check_group_profile RAID6
>
> fails randomly with the error as below
>
>  ERROR: device scan failed '/dev/sde' - Invalid argument
>
> since failure is at random group profile it indicates to me that
> btrfs kernel did not see the newly created btrfs on the device
>
> To note: I have the following patch on the kernel which
> is not yet integrated, but it's not related to this bug.
>
> btrfs: RFC: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device

I guess the error was caused by this patch, and the fsync just made the
race window smaller. If you still think the fsync is useful, please
update the changelog.


Re: [PATCH 1/2] btrfs-progs: introduce test_issubvolname() for simplicity

2014-07-29 Thread David Sterba
On Fri, Jul 25, 2014 at 03:16:58PM +0900, Satoru Takeuchi wrote:
> From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
>
> There is a lot of duplicated code to check if the given string is a
> correct subvolume name. Introduce test_issubvolname() for this
> purpose for simplicity.

Please move it to utils.c
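
For readers without the patch at hand, a helper of this kind might look roughly like the sketch below. The body of test_issubvolname() is not quoted in this thread, so the checks here (non-empty, no '/', not "." or "..") are an assumption, and sketch_issubvolname() is a hypothetical name.

```c
#include <string.h>

/*
 * Hypothetical stand-in for a subvolume-name check.
 * Returns 1 if name looks usable as a subvolume name, 0 otherwise.
 * Assumed rules: not empty, contains no '/', and is not "." or "..".
 */
static int sketch_issubvolname(const char *name)
{
	return name[0] != '\0' && strchr(name, '/') == NULL &&
	       strcmp(name, ".") != 0 && strcmp(name, "..") != 0;
}
```

Keeping the check in one function means every caller rejects the same names and can share one error message.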


Re: [PATCH] Btrfs-progs: fix some build warnings on 32bit platform

2014-07-29 Thread David Sterba
On Sun, Jul 27, 2014 at 12:49:55AM +0800, Wang Shilong wrote:
> cmds-restore.c:120:4: warning: format %lu expects argument of type
> long unsigned int, but argument 3 has type size_t [-Wformat=]
> fprintf(stderr, "bad compress length %lu\n", in_len);
>
> --- a/cmds-restore.c
> +++ b/cmds-restore.c
> @@ -117,7 +117,7 @@ static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len,
>  	in_len = read_compress_length(inbuf);
>  
>  	if ((tot_in + LZO_LEN + in_len) > tot_len) {
> -		fprintf(stderr, "bad compress length %lu\n", in_len);
> +		fprintf(stderr, "bad compress length %u\n", in_len);

in_len is size_t, this prints a warning on 64bit. Let's make it %lu and
cast to unsigned long (fixed locally, no need to resend), we're not using
the 'z' size_t modifier for printf anywhere.
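
A minimal sketch of the cast approach described above, written against a buffer so it can be checked in isolation. format_bad_len() is a hypothetical helper for this example only; the actual fix is a one-line change to the fprintf() call.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * size_t is unsigned int on typical 32-bit targets and unsigned long on
 * typical 64-bit Linux targets, so neither %u nor %lu matches it everywhere.
 * Casting to unsigned long makes the argument type match %lu on both.
 */
static int format_bad_len(char *buf, size_t buflen, size_t in_len)
{
	return snprintf(buf, buflen, "bad compress length %lu\n",
			(unsigned long)in_len);
}
```

C99 also provides %zu for size_t, but as noted above that modifier is not used elsewhere in the code base, so the cast keeps the style consistent.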


Re: [PATCH] Btrfs-progs: fix some build warnings on 32bit platform

2014-07-29 Thread Wang Shilong
Hi David,

> On Sun, Jul 27, 2014 at 12:49:55AM +0800, Wang Shilong wrote:
> > cmds-restore.c:120:4: warning: format %lu expects argument of type
> > long unsigned int, but argument 3 has type size_t [-Wformat=]
> > fprintf(stderr, "bad compress length %lu\n", in_len);
> >
> > --- a/cmds-restore.c
> > +++ b/cmds-restore.c
> > @@ -117,7 +117,7 @@ static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len,
> >  	in_len = read_compress_length(inbuf);
> >  
> >  	if ((tot_in + LZO_LEN + in_len) > tot_len) {
> > -		fprintf(stderr, "bad compress length %lu\n", in_len);
> > +		fprintf(stderr, "bad compress length %u\n", in_len);
>
> in_len is size_t, this prints a warning on 64bit. Let's make it %lu and
> cast to unsigned long (fixed locally, no need to resend), we're not using
> the 'z' size_t modifier for printf anywhere.


I am sorry that I forgot to test the patch on 64-bit. As I have left my
current company, I cannot set up a 64-bit machine until next week.

So I would appreciate it if you could fix this when merging, or wait
until next week, when I will have a 64-bit machine.


Thanks,
Wang


Re: [PATCH v2 2/4] btrfs-progs: Check fstype in find_mount_root()

2014-07-29 Thread David Sterba
On Wed, Jul 23, 2014 at 01:47:35PM +0800, Qu Wenruo wrote:
> --- a/utils.c
> +++ b/utils.c
> @@ -2390,6 +2390,9 @@ int lookup_ino_rootid(int fd, u64 *rootid)
>  	return 0;
>  }
>  
> +/* return 0 if a btrfs mount point is found
> + * return 1 if a mount point is found but not btrfs
> + * return <0 if something goes wrong */
>  int find_mount_root(const char *path, char **mount_root)
>  {
>  	FILE *mnttab;
> @@ -2397,6 +2400,7 @@ int find_mount_root(const char *path, char **mount_root)
>  	struct mntent *ent;
>  	int len;
>  	int ret;
> +	int not_btrfs;

[CC] utils.o
utils.c: In function ‘find_mount_root’:
utils.c:2342:6: warning: ‘not_btrfs’ may be used uninitialized in this function

I've initialized to 1 to fix it, no need to resend the patch.
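
A simplified sketch of the return convention and of why the flag needs an initial value. This is not the btrfs-progs function: it walks an in-memory table instead of the mount table, and struct sketch_mount and sketch_find_mount_root() are hypothetical names. The prefix match is deliberately naive.

```c
#include <stddef.h>
#include <string.h>

struct sketch_mount {
	const char *dir;	/* mount point */
	const char *fstype;	/* filesystem type, e.g. "btrfs" */
};

/*
 * Find the longest mount point that is a prefix of path.
 * Returns 0 if it is btrfs, 1 if it is some other filesystem,
 * and -1 if no mount point matches.
 */
static int sketch_find_mount_root(const char *path,
				  const struct sketch_mount *tab, size_t n,
				  const char **mount_root)
{
	size_t longest = 0;
	int found = 0;
	int not_btrfs = 1;	/* initialised, so no path can read it unset */
	size_t i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(tab[i].dir);

		if (strncmp(tab[i].dir, path, len) != 0)
			continue;
		if (found && len <= longest)
			continue;

		found = 1;
		longest = len;
		*mount_root = tab[i].dir;
		not_btrfs = strcmp(tab[i].fstype, "btrfs") != 0;
	}

	if (!found)
		return -1;
	return not_btrfs;
}
```

Starting the flag at 1 is the conservative choice: if no assignment happens, the caller sees "not btrfs" rather than an arbitrary value.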


Re: [PATCH] Btrfs-progs: fix some build warnings on 32bit platform

2014-07-29 Thread David Sterba
On Tue, Jul 29, 2014 at 09:56:46PM +0800, Wang Shilong wrote:
> Hi David,
>
> > On Sun, Jul 27, 2014 at 12:49:55AM +0800, Wang Shilong wrote:
> > > cmds-restore.c:120:4: warning: format %lu expects argument of type
> > > long unsigned int, but argument 3 has type size_t [-Wformat=]
> > > fprintf(stderr, "bad compress length %lu\n", in_len);
> > >
> > > --- a/cmds-restore.c
> > > +++ b/cmds-restore.c
> > > @@ -117,7 +117,7 @@ static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len,
> > >  	in_len = read_compress_length(inbuf);
> > >  
> > >  	if ((tot_in + LZO_LEN + in_len) > tot_len) {
> > > -		fprintf(stderr, "bad compress length %lu\n", in_len);
> > > +		fprintf(stderr, "bad compress length %u\n", in_len);
> >
> > in_len is size_t, this prints a warning on 64bit. Let's make it %lu and
> > cast to unsigned long (fixed locally, no need to resend), we're not using
> > the 'z' size_t modifier for printf anywhere.
>
> I am sorry that I forgot to test the patch on 64-bit. As I have left my
> current company, I cannot set up a 64-bit machine until next week.
>
> So I would appreciate it if you could fix this when merging, or wait
> until next week, when I will have a 64-bit machine.

No problem, already fixed.


Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

2014-07-29 Thread Torbjørn

On 07/29/2014 12:18 PM, Liu Bo wrote:

On Mon, Jul 28, 2014 at 01:11:19PM +0200, Torbjørn wrote:

On 28 July 2014 12:00, Liu Bo wrote:
<snip>

This seems to be incomplete(Looks like dmesg has reached its buffer size limit),
does /var/log/message have the whole stack info?

thanks,
-liubo

Hi,

Complete log was over 40MB. I uploaded everything from boot until
blocked for 120 seconds started to appear.
If you want all the trailing log as well, let me know.

https://gist.github.com/anonymous/7958d8917967f727f324

Sorry...still don't get why it's locked up, io_ctl_prepare_pages() has several
callers, and they are properly released from the code level.  And the warnings
printed in the log belong to other btrfs partitions, not the hanged btrfs one,
and we're still not able to know which one holds the free space cache inode 
page.

Maybe we'd better resort to a bisect between 3.14 and 3.15(I know it'd be a lot
of time though).

Here, doing rsync on compress=lzo full btrfs never hit that problem, shrug...

thanks,
-liubo


--
Torbjørn

That's too bad.

My reproducer is not 100% guaranteed to trigger the hang, so doing a 
bisect might lead us to some innocent commit.

I have run the rsync + snapshot job several times here now, and no hang.

--
Torbjørn


Re: Multi Core Support for compression in compression.c

2014-07-29 Thread Nick Krause
On Mon, Jul 28, 2014 at 2:36 PM, Nick Krause xerofo...@gmail.com wrote:
> On Mon, Jul 28, 2014 at 12:19 PM, Austin S Hemmelgarn
> ahferro...@gmail.com wrote:
>> On 2014-07-28 11:57, Nick Krause wrote:
>>> On Mon, Jul 28, 2014 at 11:13 AM, Nick Krause xerofo...@gmail.com
>>> wrote:
>>>> On Mon, Jul 28, 2014 at 6:10 AM, Austin S Hemmelgarn
>>>> ahferro...@gmail.com wrote:
>>>>> On 07/27/2014 11:21 PM, Nick Krause wrote:
>>>>>> On Sun, Jul 27, 2014 at 10:56 PM, Austin S Hemmelgarn
>>>>>> ahferro...@gmail.com wrote:
>>>>>>> On 07/27/2014 04:47 PM, Nick Krause wrote:
>>>>>>>> This may be a bad idea, but compression in btrfs seems
>>>>>>>> to be only using one core to compress. Depending on the
>>>>>>>> CPU used and the number of cores in the CPU we can make
>>>>>>>> this much faster with multiple cores. This seems bad by
>>>>>>>> my reading at least I would recommend for writing
>>>>>>>> compression we write a function to use a certain number
>>>>>>>> of cores based on the load of the system's CPU not using
>>>>>>>> more than 75% of the system's CPU resources as my system
>>>>>>>> when idle has never needed more than one core of my i5
>>>>>>>> 2500k to run when with interrupts for opening eclipse are
>>>>>>>> running. For reading compression on good core seems fine
>>>>>>>> to me as testing other compression software for reads,
>>>>>>>> it's way less CPU intensive. Cheers Nick
>>>>>>> We would probably get a bigger benefit from taking an
>>>>>>> approach like SquashFS has recently added, that is,
>>>>>>> allowing multi-threaded decompression for reads, and
>>>>>>> decompressing directly into the pagecache. Such an approach
>>>>>>> would likely make zlib compression much more scalable on
>>>>>>> large systems.
>>>>>>
>>>>>> Austin, That seems better than my idea as you seem to be more
>>>>>> up to date on btrfs development. If you and the other
>>>>>> developers of btrfs are interested in adding this as a
>>>>>> feature please let me know as I would like to help improve
>>>>>> btrfs as the file system as an idea is great just seems like
>>>>>> it needs a lot of work :). Nick
>>>>> I wouldn't say that I am a BTRFS developer (power user maybe?),
>>>>> but I would definitely say that parallelizing compression on
>>>>> writes would be a good idea too (especially for things like
>>>>> lz4, which IIRC is either in 3.16 or in the queue for 3.17).
>>>>> Both options would be a lot of work, but almost any performance
>>>>> optimization would.  I would almost say that it would provide a
>>>>> bigger performance improvement to get BTRFS to intelligently
>>>>> stripe reads and writes (at the moment, any given worker thread
>>>>> only dispatches one write or read to a single device at a
>>>>> time, and any given write() or read() syscall gets handled by
>>>>> only one worker).
>>>>
>>>> I will look into this idea and see if I can do this for writes.
>>>> Regards Nick
>>>
>>> Austin, Seems since we don't want to release the cache for inodes
>>> in order to improve writes if are going to use the page cache. We
>>> seem to be doing this for writes in end_compressed_bio_write for
>>> standard pages and in end_compressed_bio_write. If we want to cache
>>> write pages why are we removing them? Seems like this needs to be
>>> removed in order to start off. Regards Nick
>>
>> I'm not entirely sure, it's been a while since I went exploring in the
>> page-cache code.  My guess is that there is some reason that you and I
>> aren't seeing that we are trying for write-around semantics, maybe one
>> of the people who originally wrote this code could weigh in?  Part of
>> this might be to do with the fact that normal page-cache semantics
>> don't always work as expected with COW filesystems (cause a write goes
>> to a different block on the device than a read before the write would
>> have gone to).  It might be easier to parallelize reads first, and
>> then work from that (and most workloads would probably benefit more
>> from the parallelized reads).
>
> I will look into this later today and work on it then.
> Regards Nick

It seems the best way to do this is to create a kernel thread per core, as
NFS does, and use these threads depending on the load of the system.
Regards Nick


Re: Multi Core Support for compression in compression.c

2014-07-29 Thread Austin S Hemmelgarn
On 2014-07-29 13:08, Nick Krause wrote:
 On Mon, Jul 28, 2014 at 2:36 PM, Nick Krause xerofo...@gmail.com wrote:
 On Mon, Jul 28, 2014 at 12:19 PM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 2014-07-28 11:57, Nick Krause wrote:
 On Mon, Jul 28, 2014 at 11:13 AM, Nick Krause xerofo...@gmail.com
 wrote:
 On Mon, Jul 28, 2014 at 6:10 AM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 07/27/2014 11:21 PM, Nick Krause wrote:
 On Sun, Jul 27, 2014 at 10:56 PM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 07/27/2014 04:47 PM, Nick Krause wrote:
 This may be a bad idea , but compression in brtfs seems
 to be only using one core to compress. Depending on the
 CPU used and the amount of cores in the CPU we can make
 this much faster with multiple cores. This seems bad by
 my reading at least I would recommend for writing
 compression we write a function to use a certain amount
 of cores based on the load of the system's CPU not using
 more then 75% of the system's CPU resources as my system
 when idle has never needed more then one core of my i5
 2500k to run when with interrupts for opening eclipse are
 running. For reading compression on good core seems fine
 to me as testing other compression software for reads ,
 it's way less CPU intensive. Cheers Nick
 We would probably get a bigger benefit from taking an
 approach like SquashFS has recently added, that is,
 allowing multi-threaded decompression fro reads, and
 decompressing directly into the pagecache. Such an approach
 would likely make zlib compression much more scalable on
 large systems.



 Austin, That seems better then my idea as you seem to be more
 up to date on brtfs devolopment. If you and the other
 developers of brtfs are interested in adding this as a
 feature please let me known as I would like to help improve
 brtfs as the file system as an idea is great just seems like
 it needs a lot of work :). Nick
 I wouldn't say that I am a BTRFS developer (power user maybe?),
 but I would definitely say that parallelizing compression on
 writes would be a good idea too (especially for things like
 lz4, which IIRC is either in 3.16 or in the queue for 3.17).
 Both options would be a lot of work, but almost any performance
 optimization would.  I would almost say that it would provide a
 bigger performance improvement to get BTRFS to intelligently
 stripe reads and writes (at the moment, any given worker thread
 only dispatches one write or read to a single device at a
 time, and any given write() or read() syscall gets handled by
 only one worker).


 I will look into this idea and see if I can do this for writes.
 Regards Nick

 Austin, Seems since we don't want to release the cache for inodes
 in order to improve writes if are going to use the page cache. We
 seem to be doing this for writes in end_compressed_bio_write for
 standard pages and in end_compressed_bio_write. If we want to cache
 write pages why are we removing then ? Seems like this needs to be
 removed in order to start off. Regards Nick

 I'm not entirely sure, it's been a while since I went exploring in the
 page-cache code.  My guess is that there is some reason that you and I
 aren't seeing that we are trying for write-around semantics, maybe one
 of the people who originally wrote this code could weigh in?  Part of
 this might be to do with the fact that normal page-cache semantics
 don't always work as expected with COW filesystems (cause a write goes
 to a different block on the device than a read before the write would
 have gone to).  It might be easier to parallelize reads first, and
 then work from that (and most workloads would probably benefit more
 from the parallelized reads).

 I will look into this later today and work on it then.
 Regards Nick
 
 Seems the best way to do is to create a kernel thread per core like in NFS and
 depending on the load of the system use these threads.
 Regards Nick
 
It might be more work now, but it would probably be better in the long
run to do it using kernel workqueues, as they would provide better
support for suspend/hibernate/resume, and then you wouldn't need to
worry about scheduling or how many CPU cores are in the system.





Re: Multi Core Support for compression in compression.c

2014-07-29 Thread Nick Krause
On Tue, Jul 29, 2014 at 1:14 PM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 On 2014-07-29 13:08, Nick Krause wrote:
 On Mon, Jul 28, 2014 at 2:36 PM, Nick Krause xerofo...@gmail.com wrote:
 On Mon, Jul 28, 2014 at 12:19 PM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 2014-07-28 11:57, Nick Krause wrote:
 On Mon, Jul 28, 2014 at 11:13 AM, Nick Krause xerofo...@gmail.com
 wrote:
 On Mon, Jul 28, 2014 at 6:10 AM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 07/27/2014 11:21 PM, Nick Krause wrote:
 On Sun, Jul 27, 2014 at 10:56 PM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 07/27/2014 04:47 PM, Nick Krause wrote:
 This may be a bad idea , but compression in brtfs seems
 to be only using one core to compress. Depending on the
 CPU used and the amount of cores in the CPU we can make
 this much faster with multiple cores. This seems bad by
 my reading at least I would recommend for writing
 compression we write a function to use a certain amount
 of cores based on the load of the system's CPU not using
 more then 75% of the system's CPU resources as my system
 when idle has never needed more then one core of my i5
 2500k to run when with interrupts for opening eclipse are
 running. For reading compression on good core seems fine
 to me as testing other compression software for reads ,
 it's way less CPU intensive. Cheers Nick
 We would probably get a bigger benefit from taking an
 approach like SquashFS has recently added, that is,
 allowing multi-threaded decompression fro reads, and
 decompressing directly into the pagecache. Such an approach
 would likely make zlib compression much more scalable on
 large systems.



 Austin, That seems better then my idea as you seem to be more
 up to date on brtfs devolopment. If you and the other
 developers of brtfs are interested in adding this as a
 feature please let me known as I would like to help improve
 brtfs as the file system as an idea is great just seems like
 it needs a lot of work :). Nick
 I wouldn't say that I am a BTRFS developer (power user maybe?),
 but I would definitely say that parallelizing compression on
 writes would be a good idea too (especially for things like
 lz4, which IIRC is either in 3.16 or in the queue for 3.17).
 Both options would be a lot of work, but almost any performance
 optimization would.  I would almost say that it would provide a
 bigger performance improvement to get BTRFS to intelligently
 stripe reads and writes (at the moment, any given worker thread
 only dispatches one write or read to a single device at a
 time, and any given write() or read() syscall gets handled by
 only one worker).


 I will look into this idea and see if I can do this for writes.
 Regards Nick

 Austin, Seems since we don't want to release the cache for inodes
 in order to improve writes if are going to use the page cache. We
 seem to be doing this for writes in end_compressed_bio_write for
 standard pages and in end_compressed_bio_write. If we want to cache
 write pages why are we removing then ? Seems like this needs to be
 removed in order to start off. Regards Nick

 I'm not entirely sure, it's been a while since I went exploring in the
 page-cache code.  My guess is that there is some reason that you and I
 aren't seeing that we are trying for write-around semantics, maybe one
 of the people who originally wrote this code could weigh in?  Part of
 this might be to do with the fact that normal page-cache semantics
 don't always work as expected with COW filesystems (cause a write goes
 to a different block on the device than a read before the write would
 have gone to).  It might be easier to parallelize reads first, and
 then work from that (and most workloads would probably benefit more
 from the parallelized reads).

 I will look into this later today and work on it then.
 Regards Nick

 Seems the best way to do is to create a kernel thread per core like in NFS 
 and
 depending on the load of the system use these threads.
 Regards Nick

 It might be more work now, but it would probably be better in the long
 run to do it using kernel workqueues, as they would provide better
 support for suspend/hibernate/resume, and then you wouldn't need to
 worry about scheduling or how many CPU cores are in the system.


That seems better than my ideas. I will need to work on this later, as for
now I have some reading to do on the Linux networking stack.
Regards Nick


RE: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

2014-07-29 Thread Kyle Gates

 Date: Tue, 29 Jul 2014 11:18:17 +0900
 From: takeuchi_sat...@jp.fujitsu.com
 To: kylega...@hotmail.com; linux-btrfs@vger.kernel.org
 Subject: Re: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

 Hi Kyle,

 (2014/07/28 22:24), Kyle Gates wrote:

 small wording error inline below

 
 Date: Fri, 25 Jul 2014 15:17:05 +0900
 From: takeuchi_sat...@jp.fujitsu.com
 To: linux-btrfs@vger.kernel.org
 Subject: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 - There are many format to show snapshot name in error messages,
 '%s', '%s, %s, ('%s'), and ('%s). Since it's messy,
 unify these to '%s' format.
 - Fix a type: s/uncorrect/incorrect/

 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 ---
 cmds-subvolume.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

 diff --git a/cmds-subvolume.c b/cmds-subvolume.c
 index b7bfb3e..ce38503 100644
 --- a/cmds-subvolume.c
 +++ b/cmds-subvolume.c
 @@ -140,14 +140,14 @@ static int cmd_subvol_create(int argc, char **argv)
 dstdir = dirname(dupdir);

 if (!test_issubvolname(newname)) {
 - fprintf(stderr, ERROR: uncorrect subvolume name ('%s')\n,
 + fprintf(stderr, ERROR: incorrect subvolume name '%s'\n,
 newname);
 goto out;
 }

 len = strlen(newname);
 if (len == 0 || len= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, ERROR: subvolume name('%s)\n,
 + fprintf(stderr, ERROR: subvolume name too long '%s'\n,
 newname);
 goto out;
 }
 @@ -314,7 +314,7 @@ again:
 free(cpath);

 if (!test_issubvolname(vname)) {
 - fprintf(stderr, ERROR: incorrect subvolume name ('%s')\n,
 + fprintf(stderr, ERROR: incorrect subvolume name '%s'\n,
 vname);
 ret = 1;
 goto out;
 @@ -322,7 +322,7 @@ again:

 len = strlen(vname);
 if (len == 0 || len= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, ERROR: snapshot name too long ('%s)\n,
 + fprintf(stderr, ERROR: too long snapshot name '%s'\n,

 + fprintf(stderr, ERROR: snapshot name too long '%s'\n,

> Thank you for your comment. Fixed. How about this?

Yes, that looks good. Thanks.

 ===
 From 73f9847c603fbe863f072d029b1a4948a1032d6e Mon Sep 17 00:00:00 2001
 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Date: Fri, 25 Jul 2014 12:46:27 +0900
 Subject: [PATCH] btrfs-progs: unify the format of error messages.

 ---
 cmds-subvolume.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

> diff --git a/cmds-subvolume.c b/cmds-subvolume.c
> index b7bfb3e..5a99c94 100644
> --- a/cmds-subvolume.c
> +++ b/cmds-subvolume.c
> @@ -140,14 +140,14 @@ static int cmd_subvol_create(int argc, char **argv)
> dstdir = dirname(dupdir);
>
> if (!test_issubvolname(newname)) {
> - fprintf(stderr, "ERROR: uncorrect subvolume name ('%s')\n",
> + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
> newname);
> goto out;
> }
>
> len = strlen(newname);
> if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
> - fprintf(stderr, "ERROR: subvolume name too long ('%s)\n",
> + fprintf(stderr, "ERROR: subvolume name too long '%s'\n",
> newname);
> goto out;
> }
> @@ -314,7 +314,7 @@ again:
> free(cpath);
>
> if (!test_issubvolname(vname)) {
> - fprintf(stderr, "ERROR: incorrect subvolume name ('%s')\n",
> + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
> vname);
> ret = 1;
> goto out;
> @@ -322,7 +322,7 @@ again:
>
> len = strlen(vname);
> if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
> - fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
> + fprintf(stderr, "ERROR: snapshot name too long '%s'\n",
> vname);
> ret = 1;
> goto out;
> @@ -722,14 +722,14 @@ static int cmd_snapshot(int argc, char **argv)
> }
>
> if (!test_issubvolname(newname)) {
> - fprintf(stderr, "ERROR: incorrect snapshot name ('%s')\n",
> + fprintf(stderr, "ERROR: incorrect snapshot name '%s'\n",
> newname);
> goto out;
> }
>
> len = strlen(newname);
> if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
> - fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
> + fprintf(stderr, "ERROR: snapshot name too long '%s'\n",
> newname);
> goto out;
> }
> @@ -778,7 +778,7 @@ static int cmd_snapshot(int argc, char **argv)
> res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);
>
> if (res < 0) {
> - fprintf( stderr, "ERROR: cannot snapshot %s - %s\n",
> + fprintf( stderr, "ERROR: cannot snapshot '%s' - %s\n",
> subvol_descr, strerror(errno));
> goto out;
> }
> @@ -991,7 +991,7 @@ static int cmd_subvol_show(int argc, char **argv)
>
> ret = find_mount_root(fullpath, &mnt);
> if (ret < 0) {
> - fprintf(stderr, "ERROR: find_mount_root failed on %s: "
> + fprintf(stderr, "ERROR: find_mount_root failed on '%s': "
> "%s\n", fullpath, strerror(-ret));
> goto out;
> }
 --
 1.9.3



Re: [PATCH 3/4] btrfs-progs: replace a confusing raw number with a macro

2014-07-29 Thread Gui Hecheng
On Tue, 2014-07-29 at 14:19 +0200, David Sterba wrote:
> On Tue, Jul 29, 2014 at 02:16:14PM +0200, David Sterba wrote:
> > On Thu, Jul 17, 2014 at 10:40:38AM +0800, Gui Hecheng wrote:
> > > The raw number 36 for the uuid string length is somewhat confusing,
> > > use a macro to replace it.
> >
> > There's the BTRFS_UUID_UNPARSED_SIZE macro, please use it instead to
> > avoid duplicate definitions.
>
> Or I'll do that myself as it's really a small change.
Oh, yes, it is really kind of you to do so :)




Re: [PATCH 1/2] btrfs-progs: introduce test_issubvolname() for simplicity

2014-07-29 Thread Satoru Takeuchi

Hi David,

(2014/07/29 22:32), David Sterba wrote:
> On Fri, Jul 25, 2014 at 03:16:58PM +0900, Satoru Takeuchi wrote:
>> From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
>>
>> There is a lot of duplicated code to check if the given string is a
>> correct subvolume name. Introduce test_issubvolname() for this
>> purpose for simplicity.
>
> Please move it to utils.c

OK, I'll do that. In addition, how about moving test_isdir() and
test_issubvolume() to utils.c too? These are also utility functions.

Thanks,
Satoru




Work Queue for btrfs compression writes

2014-07-29 Thread Nick Krause
Hey guys,
I am new to reading and writing kernel code. I got interested in
writing code for btrfs as it seems to need more work than other file
systems, and this seems, other than drivers, a good use of time on my part.
I am interested in helping improve the compression of btrfs by using a
set of threads using work queues like XFS or reads and keeping the page
cache after reading compressed blocks, as these seem to be a great way to
improve compression performance, mostly with large partitions of
compressed data. I am not asking you to write the code for me, but as I am
new, a little guidance and help would be greatly appreciated, as this
seems like too much work for just a newbie.
Thanks a lot,
Nick


Re: [PATCH not for integration] btrfs-devlist: dumps btrfs_device and btrfs_fs_devices from kernel

2014-07-29 Thread Anand Jain




On 26/07/2014 15:05, Goffredo Baroncelli wrote:

On 07/25/2014 02:33 PM, Anand Jain wrote:

This would dump the following info:

fs_address dev_address dev_root_addr root_fsid
fsid name uuid (seed_fsid@seed_addr sprout_fsid@sprout_addr)
(fs_num_devices fs_open_devices fs_rw_devices fs_missing_devices 
fs_total_devices) fs_total_rw_bytes fs_num_can_discard
devid gen total_bytes disk_total_bytes bytes_used type io_align 
io_width sector_size fmode
not_fs_Mounted|not_fs_Seeding|not_fs_Rotating

not_Writable|not_MD|not_Missing|not_Discard|not_Replace_tgt|not_Run_pending|not_Nobarriers|not_Stat_valid|not_Bdev

Applies on Chris integration branch now


Hi Anand,

why not export this information via sysfs?



Thanks for the comments.

 If it's for the purpose of recreating the kernel
 LIST_HEAD(fs_uuids) in user space, what would you
 choose? sysfs / ioctl / memory dump / any other better way?

Regds, Anand




BR
G.Baroncelli



Anand Jain (1):
   btrfs: introduce BTRFS_IOC_GET_DEVS

  fs/btrfs/super.c           |  86
  fs/btrfs/volumes.c         | 145
  fs/btrfs/volumes.h         |   3
  include/uapi/linux/btrfs.h |  53
  4 files changed, 286 insertions(+), 1 deletion(-)

Anand Jain (1):
   btrfs-progs: introduce btrfs-devlist

  .gitignore      |   1
  Makefile        |   4
  btrfs-devlist.c | 268
  ioctl.h         |  52
  4 files changed, 323 insertions(+), 2 deletions(-)
  create mode 100644 btrfs-devlist.c







Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

2014-07-29 Thread Liu Bo
On Tue, Jul 29, 2014 at 05:07:31PM +0200, Torbjørn wrote:
 On 07/29/2014 12:18 PM, Liu Bo wrote:
 On Mon, Jul 28, 2014 at 01:11:19PM +0200, Torbjørn wrote:
 On 28. juli 2014 12:00, Liu Bo wrote:
 snip
 This seems to be incomplete (it looks like dmesg has reached its buffer
 size limit). Does /var/log/messages have the whole stack info?
 
 thanks,
 -liubo
 Hi,
 
 Complete log was over 40MB. I uploaded everything from boot until
 blocked for 120 seconds started to appear.
 If you want all the trailing log as well, let me know.
 
 https://gist.github.com/anonymous/7958d8917967f727f324
 Sorry... I still don't get why it's locked up. io_ctl_prepare_pages() has
 several callers, and at the code level the pages are properly released.
 The warnings printed in the log belong to other btrfs partitions, not the
 hung one, and we still can't tell which one holds the free space cache
 inode page.
 
 Maybe we'd better resort to a bisect between 3.14 and 3.15 (I know it'd
 take a lot of time, though).
 
 Here, doing rsync on a full btrfs with compress=lzo never hit that problem, shrug...
 
 thanks,
 -liubo
 
 --
 Torbjørn
 That's too bad.
 
 My reproducer is not 100% guaranteed to trigger the hang, so doing a
 bisect might lead us to some innocent commit.
 I have run the rsync + snapshot job several times here now, and no hang.

Good news! I've reproduced it with my xfstests config, will dig into it closer.

thanks,
-liubo


Re: [PATCH 4/4] btrfs-progs: correct manpage option description for scrub

2014-07-29 Thread Satoru Takeuchi
Hi Gui,

(2014/07/17 11:40), Gui Hecheng wrote:
 The -f option of scrub means to skip checking running scrub,
 not to force checking.
 
 Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
 ---
   Documentation/btrfs-scrub.txt | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/Documentation/btrfs-scrub.txt b/Documentation/btrfs-scrub.txt
 index 7b27d63..1af9b9f 100644
 --- a/Documentation/btrfs-scrub.txt
 +++ b/Documentation/btrfs-scrub.txt
 @@ -47,7 +47,7 @@ manpage).
   -n ioprio_classdata
   Set IO priority classdata (see `ionice`(1) manpage).
   -f
 -force to check whether scrub has started or resumed in userspace.
 +force to skip checking whether scrub has started or resumed in userspace.

I think "scrub has started and resumed" is not a user-friendly
expression. First, it can be replaced with a simpler one,
"scrub is running". Second, there is no explanation of
this checking behavior before the -f option's description.

So, how about the following idea?

 Fix 1. Add "If scrub is already running, it fails."
        to the description before the `Options` section.
 Fix 2. Replace "force to check ..." with
        "force starting new scrub even if scrub is already running."
 Fix 3. Fix cmd_scrub_start_usage too.

Thanks,
Satoru

   this is useful when scrub stat record file is damaged.
   
   *cancel* path|device::
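
The behavior that the proposed wording describes can be sketched as a
small decision function. This is only an illustration of the semantics
discussed in this thread ("if scrub is already running, it fails" unless
-f is given); the name scrub_start_check and the choice of -EINPROGRESS
as the error value are assumptions, not btrfs-progs code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical decision helper (not btrfs-progs code): returns 0 when a
 * new scrub may be started, or -EINPROGRESS when one is already running
 * and force (-f) was not requested.
 */
static int scrub_start_check(bool scrub_running, bool force)
{
	if (scrub_running && !force)
		return -EINPROGRESS;
	return 0;
}
```

With this reading, -f does not add a check; it skips the "already
running" test, which is why it helps when the scrub status record is
damaged and wrongly reports a running scrub.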
 



Re: [PATCH 4/4] btrfs-progs: correct manpage option description for scrub

2014-07-29 Thread Gui Hecheng
On Wed, 2014-07-30 at 14:20 +0900, Satoru Takeuchi wrote:
 Hi Gui,
 
 (2014/07/17 11:40), Gui Hecheng wrote:
  The -f option of scrub means to skip checking running scrub,
  not to force checking.
  
  Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
  ---
Documentation/btrfs-scrub.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/Documentation/btrfs-scrub.txt b/Documentation/btrfs-scrub.txt
  index 7b27d63..1af9b9f 100644
  --- a/Documentation/btrfs-scrub.txt
  +++ b/Documentation/btrfs-scrub.txt
  @@ -47,7 +47,7 @@ manpage).
-n ioprio_classdata
Set IO priority classdata (see `ionice`(1) manpage).
-f
  -force to check whether scrub has started or resumed in userspace.
  +force to skip checking whether scrub has started or resumed in userspace.

Hi Satoru,
First, thanks for your comments. My opinions are as follows:

 I think "scrub has started and resumed" is not a user-friendly
 expression. First, it can be replaced with a simpler one,
 "scrub is running". Second, there is no explanation of
 this checking behavior before the -f option's description.

Yes, "scrub is running" is more precise.

 So, how about the following idea?
 
  Fix 1. Add "If scrub is already running, it fails."
         to the description before the `Options` section.

This is a really valuable idea.

  Fix 2. Replace "force to check ..." with
         "force starting new scrub even if scrub is already running."

This is more precise.

  Fix 3. Fix cmd_scrub_start_usage too.

Of course, thanks for reminding me.

So please let me rework this patch. Could I add your Signed-off-by then?

Thanks,
Gui
 
 Thanks,
 Satoru
 
this is useful when scrub stat record file is damaged.

*cancel* path|device::
  
 

