Re: [PATCH v3 06/10] writeback: introduce super_operations->write_metadata

2018-09-28 Thread Chandan Rajendra
On Monday, January 29, 2018 2:36:15 PM IST Chandan Rajendra wrote:
> On Wednesday, January 3, 2018 9:59:24 PM IST Josef Bacik wrote:
> > On Wed, Jan 03, 2018 at 05:26:03PM +0100, Jan Kara wrote:
> 
> > 
> > Oh ok well if that's the case then I'll fix this up to be a ratio, test
> > everything, and send it along probably early next week.  Thanks,
> > 
> 
> Hi Josef,
> 
> Did you get a chance to work on the next version of this patchset?
> 
> 
> 

Josef, any updates on this and the "Kill Btree inode" patchset?

-- 
chandan



Re: [PATCH v3 06/10] writeback: introduce super_operations->write_metadata

2018-01-29 Thread Chandan Rajendra
On Wednesday, January 3, 2018 9:59:24 PM IST Josef Bacik wrote:
> On Wed, Jan 03, 2018 at 05:26:03PM +0100, Jan Kara wrote:

> 
> Oh ok well if that's the case then I'll fix this up to be a ratio, test
> everything, and send it along probably early next week.  Thanks,
> 

Hi Josef,

Did you get a chance to work on the next version of this patchset?


-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V21 00/19] Allow I/O on blocks whose size is less than page size

2017-06-19 Thread Chandan Rajendra
On Sunday, October 2, 2016 6:54:09 PM IST Chandan Rajendra wrote:
> Btrfs assumes block size to be the same as the machine's page
> size. This would mean that a Btrfs instance created on a 4k page size
> machine (e.g. x86) will not be mountable on machines with larger page
> sizes (e.g. PPC64/AARCH64). This patchset aims to resolve this
> incompatibility.
> 
> This patchset continues with the work posted previously at
> http://marc.info/?l=linux-btrfs&m=146760691422240&w=2
> 
> This patchset is based on top of Josef's
> 1. Metadata throttling in writeback patches
> 2. Kill the btree inode patches

Hi Josef,

Did you get a chance to work on the patchsets listed above?

Please let me know when you have a reasonably working version uploaded to
your Linux git tree. I could use it to rebase my patchset and start testing
the code base.

I have put a lot of time and effort into getting the subpage-blocksize
patchset into its current form. Rebasing and retesting it across various
kernel releases also consumes time. It would be great to have it merged
into the mainline kernel soon. Once that is done, I will have to get other
features of Btrfs (scrub, compression, etc.) to work in the
subpage-blocksize scenario.

> The major change in this version is the usage of kmalloc()-ed memory for
> holding metadata blocks whose size is less than the machine's page size. This
> vastly reduces the complexity of extent buffer management (thanks to Josef's
> "Kill the btree inode" patches).
> 
> When writing back dirty extent buffers, we currently track the corresponding
> extent buffers using the pointer at page->private. With kmalloc()-ed memory
> this isn't possible and hence we track the first extent buffer under writeback
> using bio->bi_private. Also, for kmalloc()-ed extent buffers this patchset
> currently limits the number of dirty extent buffers in a "write" bio to
> 1. This limit will be removed in a future patchset.
> 

-- 
chandan



Re: [PATCH v3 2/2] Btrfs: compression must free at least PAGE_SIZE

2017-05-25 Thread Chandan Rajendra
On Sunday, May 21, 2017 12:10:39 AM IST Timofey Titovets wrote:
> Btrfs already skips storing data compressed when compression doesn't free at
> least one byte. Improve the logic by checking that compression frees at least
> one PAGE_SIZE, because otherwise it is useless to store the data compressed.
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/lzo.c  | 5 ++++-
>  fs/btrfs/zlib.c | 3 ++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index bd0b0938..39678499 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -229,8 +229,11 @@ static int lzo_compress_pages(struct list_head *ws,
>  		in_len = min(bytes_left, PAGE_SIZE);
>  	}
>  
> -	if (tot_out > tot_in)
> +	/* Compression must save at least one PAGE_SIZE */
> +	if (tot_out + PAGE_SIZE > tot_in) {
> +		ret = -E2BIG;
>  		goto out;
> +	}

Apologies for the delayed response.

I am not really sure whether the compression code must save at least one
sectorsize worth of space. But if other developers agree to it, then the above
'if' condition can be replaced with,

u32 sectorsize = btrfs_inode_sectorsize(mapping->host);
...
...

if (tot_out + sectorsize > tot_in) {

> 
>  	/* store the size of all chunks of compressed data */
>  	cpage_out = kmap(pages[0]);
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index 135b1082..11e117b5 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -191,7 +191,8 @@ static int zlib_compress_pages(struct list_head *ws,
>  		goto out;
>  	}
>  
> -	if (workspace->strm.total_out >= workspace->strm.total_in) {
> +	/* Compression must save at least one PAGE_SIZE */
> +	if (workspace->strm.total_out + PAGE_SIZE > workspace->strm.total_in) {
>  		ret = -E2BIG;
>  		goto out;
>  	}
> --
> 2.13.0


-- 
chandan



Re: [PATCH 3/3] Btrfs: don't pass the inode through clean_io_failure

2017-05-08 Thread Chandan Rajendra
On Friday, May 05, 2017 11:57:15 AM Josef Bacik wrote:
> Instead pass around the failure tree and the io tree.
>
The changes look fine,

Reviewed-by: Chandan Rajendra 

-- 
chandan



Re: [PATCH 2/3] btrfs: remove inode argument from repair_io_failure

2017-05-08 Thread Chandan Rajendra
On Friday, May 05, 2017 11:57:14 AM Josef Bacik wrote:
> Once we remove the btree_inode we won't have an inode to pass anymore, just
> pass the fs_info directly and the inum since we use that to print out the
> repair message.
>
The changes look fine,

Reviewed-by: Chandan Rajendra 

-- 
chandan



Re: [PATCH 1/3] Btrfs: replace tree->mapping with tree->private_data

2017-05-08 Thread Chandan Rajendra
On Friday, May 05, 2017 11:57:13 AM Josef Bacik wrote:
> For extent_io trees we have carried the address_mapping of the inode around
> in the io tree in order to pull the inode back out for calling into various
> tree ops hooks.  This works fine when everything that has an extent_io_tree
> has an inode.  But we are going to remove the btree_inode, so we need to
> change this.  Instead just have a generic void * for private data that we can
> initialize with, and have all the tree ops use that instead.  This had a lot
> of cascading changes but should be relatively straightforward.
>

The changes look fine,

Reviewed-by: Chandan Rajendra 

-- 
chandan



[PATCH] btrfs: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface support

2017-04-04 Thread Chandan Jay Sharma
This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface support
for btrfs. Extended file attributes are 32-bit values (FS_XFLAG_SYNC,
FS_XFLAG_IMMUTABLE, etc.) which have a one-to-one mapping to the flag values
that can be stored in inode->i_flags (i.e. S_SYNC, S_IMMUTABLE, etc.).
The flags can be set/unset to enable/disable file attributes.
These attributes are listed/modified by lsattr/chattr.

Signed-off-by: Chandan Jay Sharma 
---
 fs/btrfs/ioctl.c | 148 +++
 1 file changed, 148 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dabfc7a..5d8486b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -132,6 +132,25 @@ static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
 	return iflags;
 }
 
+/* Transfer ioctl flags to btrfs internal flags */
+static unsigned int btrfs_ioctl_to_flags(unsigned int iflags)
+{
+	unsigned int flags = 0;
+
+	if (iflags & FS_SYNC_FL)
+		flags |= BTRFS_INODE_SYNC;
+	if (iflags & FS_IMMUTABLE_FL)
+		flags |= BTRFS_INODE_IMMUTABLE;
+	if (iflags & FS_APPEND_FL)
+		flags |= BTRFS_INODE_APPEND;
+	if (iflags & FS_NODUMP_FL)
+		flags |= BTRFS_INODE_NODUMP;
+	if (iflags & FS_NOATIME_FL)
+		flags |= BTRFS_INODE_NOATIME;
+
+	return flags;
+}
+
 /*
  * Update inode->i_flags based on the btrfs internal flags.
  */
@@ -157,6 +176,75 @@ void btrfs_update_iflags(struct inode *inode)
 }
 
 /*
+ * Propagate flags from i_flags to BTRFS_I(inode)->flags
+ */
+void btrfs_get_inode_flags(struct btrfs_inode *ip)
+{
+	unsigned int vfs_fl;
+	unsigned long old_fl, new_fl;
+
+	do {
+		vfs_fl = ip->vfs_inode.i_flags;
+		old_fl = ip->flags;
+		new_fl = old_fl & ~(BTRFS_INODE_SYNC|BTRFS_INODE_APPEND|
+				BTRFS_INODE_IMMUTABLE|BTRFS_INODE_NOATIME|
+				BTRFS_INODE_DIRSYNC);
+		if (vfs_fl & S_SYNC)
+			new_fl |= BTRFS_INODE_SYNC;
+		if (vfs_fl & S_APPEND)
+			new_fl |= BTRFS_INODE_APPEND;
+		if (vfs_fl & S_IMMUTABLE)
+			new_fl |= BTRFS_INODE_IMMUTABLE;
+		if (vfs_fl & S_NOATIME)
+			new_fl |= BTRFS_INODE_NOATIME;
+		if (vfs_fl & S_DIRSYNC)
+			new_fl |= BTRFS_INODE_DIRSYNC;
+	} while (cmpxchg(&ip->flags, old_fl, new_fl) != old_fl);
+}
+
+/*
+ * Translate btrfs internal flags BTRFS_I(inode)->flags to xflags.
+ */
+static inline unsigned int btrfs_flags_to_xflags(unsigned int flags)
+{
+	unsigned int xflags = 0;
+
+	if (flags & BTRFS_INODE_SYNC)
+		xflags |= FS_XFLAG_SYNC;
+	if (flags & BTRFS_INODE_IMMUTABLE)
+		xflags |= FS_XFLAG_IMMUTABLE;
+	if (flags & BTRFS_INODE_APPEND)
+		xflags |= FS_XFLAG_APPEND;
+	if (flags & BTRFS_INODE_NODUMP)
+		xflags |= FS_XFLAG_NODUMP;
+	if (flags & BTRFS_INODE_NOATIME)
+		xflags |= FS_XFLAG_NOATIME;
+
+	return xflags;
+}
+
+/*
+ * Transfer xflags flags to ioctl flags.
+ */
+static inline unsigned int btrfs_xflags_to_ioctl(unsigned int xflags)
+{
+	unsigned int flags = 0;
+
+	if (xflags & FS_XFLAG_SYNC)
+		flags |= FS_SYNC_FL;
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		flags |= FS_IMMUTABLE_FL;
+	if (xflags & FS_XFLAG_APPEND)
+		flags |= FS_APPEND_FL;
+	if (xflags & FS_XFLAG_NODUMP)
+		flags |= FS_NODUMP_FL;
+	if (xflags & FS_XFLAG_NOATIME)
+		flags |= FS_NOATIME_FL;
+
+	return flags;
+}
+
+/*
  * Inherit flags from the parent inode.
  *
  * Currently only the compression flags and the cow flags are inherited.
@@ -5504,6 +5592,62 @@ static int btrfs_ioctl_set_features(struct file *file, void __user *arg)
 	return ret;
 }
 
+static int btrfs_ioctl_fsgetxattr(struct file *file, void __user *arg)
+{
+	struct fsxattr fa;
+	struct btrfs_inode *ip = BTRFS_I(file_inode(file));
+
+	memset(&fa, 0, sizeof(struct fsxattr));
+	btrfs_get_inode_flags(ip);
+	fa.fsx_xflags = btrfs_flags_to_xflags(ip->flags);
+
+	if (copy_to_user((struct fsxattr __user *)arg,
+			 &fa, sizeof(fa)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int btrfs_ioctl_fssetxattr(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct btrfs_inode *ip = BTRFS_I(inode);
+	struct btrfs_root *root = ip->root;
+	struct fsxattr fa;
+	unsigned int flags;
+	int err;
+
+	/* Make sure caller has proper permission */
+	if (!inode_owner_or_capable(inode))
+  

[PATCH] btrfs: add support for extended file attributes

2017-03-14 Thread Chandan Jay Sharma
This commit implements FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR ioctls for BTRFS.

Signed-off-by: Chandan Jay Sharma 
---
 fs/btrfs/ioctl.c | 146 +++
 1 file changed, 146 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 33f967d..9d30afe 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -132,6 +132,25 @@ static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
 	return iflags;
 }
 
+/* Transfer ioctl flags to btrfs internal flags */
+static unsigned int btrfs_ioctl_to_flags(unsigned int iflags)
+{
+	unsigned int flags = 0;
+
+	if (iflags & FS_SYNC_FL)
+		flags |= BTRFS_INODE_SYNC;
+	if (iflags & FS_IMMUTABLE_FL)
+		flags |= BTRFS_INODE_IMMUTABLE;
+	if (iflags & FS_APPEND_FL)
+		flags |= BTRFS_INODE_APPEND;
+	if (iflags & FS_NODUMP_FL)
+		flags |= BTRFS_INODE_NODUMP;
+	if (iflags & FS_NOATIME_FL)
+		flags |= BTRFS_INODE_NOATIME;
+
+	return flags;
+}
+
 /*
  * Update inode->i_flags based on the btrfs internal flags.
  */
@@ -157,6 +176,75 @@ void btrfs_update_iflags(struct inode *inode)
 }
 
 /*
+ * Propagate flags from i_flags to BTRFS_I(inode)->flags
+ */
+void btrfs_get_inode_flags(struct btrfs_inode *ip)
+{
+	unsigned int vfs_fl;
+	unsigned long old_fl, new_fl;
+
+	do {
+		vfs_fl = ip->vfs_inode.i_flags;
+		old_fl = ip->flags;
+		new_fl = old_fl & ~(BTRFS_INODE_SYNC | BTRFS_INODE_APPEND |
+				BTRFS_INODE_IMMUTABLE | BTRFS_INODE_NOATIME |
+				BTRFS_INODE_DIRSYNC);
+		if (vfs_fl & S_SYNC)
+			new_fl |= BTRFS_INODE_SYNC;
+		if (vfs_fl & S_APPEND)
+			new_fl |= BTRFS_INODE_APPEND;
+		if (vfs_fl & S_IMMUTABLE)
+			new_fl |= BTRFS_INODE_IMMUTABLE;
+		if (vfs_fl & S_NOATIME)
+			new_fl |= BTRFS_INODE_NOATIME;
+		if (vfs_fl & S_DIRSYNC)
+			new_fl |= BTRFS_INODE_DIRSYNC;
+	} while (cmpxchg(&ip->flags, old_fl, new_fl) != old_fl);
+}
+
+/*
+ * Translate btrfs internal flags BTRFS_I(inode)->flags to xflags.
+ */
+static inline unsigned int btrfs_flags_to_xflags(unsigned int flags)
+{
+	unsigned int xflags = 0;
+
+	if (flags & BTRFS_INODE_SYNC)
+		xflags |= FS_XFLAG_SYNC;
+	if (flags & BTRFS_INODE_IMMUTABLE)
+		xflags |= FS_XFLAG_IMMUTABLE;
+	if (flags & BTRFS_INODE_APPEND)
+		xflags |= FS_XFLAG_APPEND;
+	if (flags & BTRFS_INODE_NODUMP)
+		xflags |= FS_XFLAG_NODUMP;
+	if (flags & BTRFS_INODE_NOATIME)
+		xflags |= FS_XFLAG_NOATIME;
+
+	return xflags;
+}
+
+/*
+ * Transfer xflags flags to ioctl flags.
+ */
+static inline unsigned int btrfs_xflags_to_ioctl(unsigned int xflags)
+{
+	unsigned int flags = 0;
+
+	if (xflags & FS_XFLAG_SYNC)
+		flags |= FS_SYNC_FL;
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		flags |= FS_IMMUTABLE_FL;
+	if (xflags & FS_XFLAG_APPEND)
+		flags |= FS_APPEND_FL;
+	if (xflags & FS_XFLAG_NODUMP)
+		flags |= FS_NODUMP_FL;
+	if (xflags & FS_XFLAG_NOATIME)
+		flags |= FS_NOATIME_FL;
+
+	return flags;
+}
+
+/*
  * Inherit flags from the parent inode.
  *
  * Currently only the compression flags and the cow flags are inherited.
@@ -5511,6 +5599,60 @@ static int btrfs_ioctl_set_features(struct file *file, void __user *arg)
 	return ret;
 }
 
+static int btrfs_ioctl_fsgetxattr(struct file *file, void __user *arg)
+{
+	struct fsxattr fa;
+	struct btrfs_inode *ip = BTRFS_I(file_inode(file));
+
+	memset(&fa, 0, sizeof(struct fsxattr));
+	btrfs_get_inode_flags(ip);
+	fa.fsx_xflags = btrfs_flags_to_xflags(ip->flags);
+
+	if (copy_to_user(arg, &fa, sizeof(fa)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int btrfs_ioctl_fssetxattr(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct btrfs_inode *ip = BTRFS_I(inode);
+	struct btrfs_root *root = ip->root;
+	struct fsxattr fa;
+	unsigned int flags;
+	int err;
+
+	/* Make sure caller has proper permission */
+	if (!inode_owner_or_capable(inode))
+		return -EPERM;
+
+	if (btrfs_root_readonly(root))
+		return -EROFS;
+
+	memset(&fa, 0, sizeof(struct fsxattr));
+	if (copy_from_user(&fa, arg, sizeof(fa)))
+		return -EFAULT;
+
+	flags = btrfs_xflags_to_ioctl(fa.fsx_xflags);
+
+	if (btrfs_mask_flags(inode->i_mode, flags) != flags)
+   ret

Re: [PATCH] generic/311: Disable dmesg check

2017-02-22 Thread Chandan Rajendra
On Monday, February 20, 2017 11:03:11 PM Anand Jain wrote:
> 
> Hi Chandan,
> 
> On 07/17/15 12:56, Chandan Rajendra wrote:
> > When running generic/311 on Btrfs' subpagesize-blocksize patchset (on ppc64
> > with 4k sectorsize and 16k node/leaf size) I noticed the following call trace,
> >
> > BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
> > BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
> > BTRFS: Transaction aborted (error -5)
> >
> > WARNING: at /root/repos/linux/fs/btrfs/super.c:260
> > Modules linked in:
> > CPU: 3 PID: 30769 Comm: umount Tainted: GWL 4.0.0-rc5-11671-g8b82e73e #63
> > task: c00079aaddb0 ti: c00079a48000 task.ti: c00079a48000
> > NIP: c0499aa0 LR: c0499a9c CTR: c0779630
> > REGS: c00079a4b480 TRAP: 0700   Tainted: GW   L   (4.0.0-rc5-11671-g8b82e73e)
> > MSR: 800100029032   CR: 28008828  XER: 2000
> > CFAR: c0a23914 SOFTE: 1
> > GPR00: c0499a9c c00079a4b700 c103bdf8 0025
> > GPR04: 0001 0502 c107e918 0cda
> > GPR08: 0007 0007 0001 c10f5044
> > GPR12: 28008822 cfdc0d80 2000 10152e00
> > GPR16: 010002979380 10140724  
> > GPR20:    
> > GPR24: c000151f61a8  c00055e5e800 c0aac270
> > GPR28: 04a4 fffb c00055e5e800 c000679204d0
> > NIP [c0499aa0] .__btrfs_abort_transaction+0x180/0x190
> > LR [c0499a9c] .__btrfs_abort_transaction+0x17c/0x190
> > Call Trace:
> > [c00079a4b700] [c0499a9c] .__btrfs_abort_transaction+0x17c/0x190 (unreliable)
> > [c00079a4b7a0] [c0541678] .__btrfs_run_delayed_items+0xe8/0x220
> > [c00079a4b850] [c04d5b3c] .btrfs_commit_transaction+0x37c/0xca0
> > [c00079a4b960] [c049824c] .btrfs_sync_fs+0x6c/0x1a0
> > [c00079a4ba00] [c0255270] .sync_filesystem+0xd0/0x100
> > [c00079a4ba80] [c0218070] .generic_shutdown_super+0x40/0x170
> > [c00079a4bb10] [c0218598] .kill_anon_super+0x18/0x30
> > [c00079a4bb90] [c0498418] .btrfs_kill_super+0x18/0xc0
> > [c00079a4bc10] [c0218ac8] .deactivate_locked_super+0x98/0xe0
> > [c00079a4bc90] [c023e744] .cleanup_mnt+0x54/0xa0
> > [c00079a4bd10] [c00b7d14] .task_work_run+0x114/0x150
> > [c00079a4bdb0] [c0015f84] .do_notify_resume+0x74/0x80
> > [c00079a4be30] [c0009838] .ret_from_except_lite+0x64/0x68
> > Instruction dump:
> > ebc1fff0 ebe1fff8 4bfffb28 6000 3ce2ffcd 38e7e818 4bbc 3c62ffd2
> > 7fa4eb78 3863b808 48589e1d 6000 <0fe0> 4bfffedc 6000 6000
> > BTRFS: error (device dm-0) in __btrfs_run_delayed_items:1188: errno=-5 IO failure
> >
> >
> > The call trace is seen when executing _run_test() for the 8th time.
> > The above trace is actually a false-positive failure as indicated below,
> >  fsync-tester
> >    fsync(fd)
> >    Write delayed inode item to fs tree
> >      (assume transid to be 160)
> >      (assume tree block to start at logical address 29720576)
> >  md5sum $testfile
> >    This causes a delayed inode to be added
> >  Load flakey table
> >    i.e. drop writes that are initiated from now onwards
> >  Unmount filesystem
> >    btrfs_sync_fs is invoked
> >      Write 29720576 metadata block to disk
> >      free_extent_buffer(29720576)
> >        release_extent_buffer(29720576)
> >    Start writing delayed inode
> >      Traverse the fs tree
> >        (assume the parent tree block of 29720576 is still in memory)
> >        When reading 29720576 from disk, parent's blkptr will have generation
> >        set to 160. But the on-disk tree block will have an older
> >        generation (say, 158). Transid verification fails and hence the
> >        transaction gets aborted
> >
> > The test only cares about the FS instance before the unmount
> > operation (i.e. the synced FS). Hence to get the test to pass, ignore the
> > false-positive trace that could be generated.
> 
>   Looks like this patch didn't make it. Is there any kernel patch
>   that fixed this bug? Or any hints on how to reproduce it?
> 

Hi Anand,


Re: [PATCH] Btrfs: fix wrong argument for btrfs_lookup_ordered_range

2017-01-25 Thread Chandan Rajendra
On Tuesday, January 24, 2017 03:58:51 PM Liu Bo wrote:
> Commit "d0b7da88 Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized
> units" did this, but btrfs_lookup_ordered_range expects a 'length' rather
> than a 'page_end'.
> 
> Signed-off-by: Liu Bo 
> ---
> Is this a candidate for stable?
> 
>  fs/btrfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4e02426..366cf0b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9023,7 +9023,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	 * we can't set the delalloc bits if there are pending ordered
>  	 * extents.  Drop our locks and wait for them to finish
>  	 */
> -	ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
> +	ordered = btrfs_lookup_ordered_range(inode, page_start, PAGE_SIZE);
>  	if (ordered) {
>  		unlock_extent_cached(io_tree, page_start, page_end,
>  				     &cached_state, GFP_NOFS);
> 

Thanks for fixing this,
Reviewed-by: Chandan Rajendra 

As for whether this commit should be merged into the stable trees, I am not
sure, since I don't see any filesystem corruption that the current code could
cause. With the existing code, apart from any ordered extents that map the
page in question, we are most likely to be *unnecessarily* starting I/O on
ordered extents that don't map the file offset range covered by the page.
Chris, Josef or David, please let us know your thoughts on this.

-- 
chandan



[PATCH] btrfs/012: Enable test to be executed on non-4k block size filesystems

2016-12-26 Thread Chandan Rajendra
To get the test to work on non-4k block sized filesystems, this commit
obtains the block size of the Btrfs filesystem from $TEST_DIR.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/012 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/btrfs/012 b/tests/btrfs/012
index b39dec0..cbd3882 100755
--- a/tests/btrfs/012
+++ b/tests/btrfs/012
@@ -63,8 +63,10 @@ _require_command "$E2FSCK_PROG" e2fsck
 
 rm -f $seqres.full
 
+BLOCK_SIZE=`get_block_size $TEST_DIR`
+
 # Create & populate an ext4 filesystem
-$MKFS_EXT4_PROG -F -b 4096 $SCRATCH_DEV > $seqres.full 2>&1 || \
+$MKFS_EXT4_PROG -F -b $BLOCK_SIZE $SCRATCH_DEV > $seqres.full 2>&1 || \
_notrun "Could not create ext4 filesystem"
 # Manual mount so we don't use -t btrfs or selinux context
 mount -t ext4 $SCRATCH_DEV $SCRATCH_MNT
-- 
2.5.5



Re: [PATCH] Btrfs: Fix deadlock between direct IO and fast fsync

2016-12-23 Thread Chandan Rajendra
On Friday, December 23, 2016 04:18:00 PM Liu Bo wrote:
> On Fri, Dec 23, 2016 at 05:27:55PM +0530, Chandan Rajendra wrote:
> > On Friday, December 23, 2016 03:57:40 PM Chandan Rajendra wrote:
> > > On Friday, December 23, 2016 03:00:18 PM Chandan Rajendra wrote:
> > > > The following deadlock is seen when executing generic/113 test,
> > > > 
> > > >  
> > > > --------------------------------------------------------+--------------------------------------------------
> > > >  Direct I/O task                                          Fast fsync task
> > > > --------------------------------------------------------+--------------------------------------------------
> > > >  btrfs_direct_IO
> > > >   __blockdev_direct_IO
> > > >    do_blockdev_direct_IO
> > > >     do_direct_IO
> > > >      btrfs_get_blocks_direct
> > > >       while (blocks need to be written)
> > > >        get_more_blocks (first iteration)
> > > >         btrfs_get_blocks_direct
> > > >          btrfs_create_dio_extent
> > > >           down_read(&BTRFS_I(inode)->dio_sem)
> > > >           Create and add extent map and ordered extent
> > > >           up_read(&BTRFS_I(inode)->dio_sem)
> > > >                                                          btrfs_sync_file
> > > >                                                           btrfs_log_dentry_safe
> > > >                                                            btrfs_log_inode_parent
> > > >                                                             btrfs_log_inode
> > > >                                                              btrfs_log_changed_extents
> > > >                                                               down_write(&BTRFS_I(inode)->dio_sem)
> > > >                                                               Collect new extent maps and ordered extents
> > > >                                                               wait for ordered extent completion
> > > >        get_more_blocks (second iteration)
> > > >         btrfs_get_blocks_direct
> > > >          btrfs_create_dio_extent
> > > >           down_read(&BTRFS_I(inode)->dio_sem)
> > > > -----------------------------------------------------------------------------------------------------------
> > > > 
> > > > In the above description, the Btrfs direct I/O code path has not yet started
> > > > submitting bios for the file range covered by the initial ordered
> > > > extent. Meanwhile, the fast fsync task obtains the write semaphore and
> > > > waits for I/O on the ordered extent to complete. However, the
> > > > Direct I/O task is now blocked on obtaining the read semaphore.
> > > > 
> > > > To resolve the deadlock, this commit modifies the Direct I/O code path
> > > > to obtain the read semaphore before invoking
> > > > __blockdev_direct_IO(). The semaphore is then given up after
> > > > __blockdev_direct_IO() returns. This allows the Direct I/O code to
> > > > complete I/O on all the ordered extents it creates.
> > > >
> > > 
> > > Btw, I was able to reproduce the issue on kdave/for-next branch with
> > > "Merge branch 'for-next-next-4.9-20161125' into for-next-20161125" as the
> > > topmost commit. The issue cannot be reproduced yet on latest code available
> > > from kdave/for-next branch.
> > > 
> > >
> > 
> > Maybe changes in upstream might have masked the issue in the recent
> > kdave/for-next branch. I say that because 'git bisect' resulted in the
> > following commit ...
> 
> I guess that the for-next branch didn't revert this patch[1] as upstream
> did, which is why generic/113 complains. However, even without that patch,
> this fix is still required, since the deadlock can be reproduced by
> running generic/113 with '-ofragment=data'. In fact, Filipe has proposed
> an almost identical fix, though not as a real patch, in this thread [2].
> 
> [1]: Btrfs: adjust len of writes if following a preallocated extent
> https://patchwork.kernel.org/patch/9413129/
> [2]: https://patchwork.kernel.org/patch/9445231/
>

Ah ok. Thanks for pointing it out.

-- 
chandan



Re: [PATCH] Btrfs: Fix deadlock between direct IO and fast fsync

2016-12-23 Thread Chandan Rajendra
On Friday, December 23, 2016 03:57:40 PM Chandan Rajendra wrote:
> On Friday, December 23, 2016 03:00:18 PM Chandan Rajendra wrote:
> > The following deadlock is seen when executing generic/113 test,
> > 
> >  
> > --------------------------------------------------------+--------------------------------------------------
> >  Direct I/O task                                          Fast fsync task
> > --------------------------------------------------------+--------------------------------------------------
> >  btrfs_direct_IO
> >   __blockdev_direct_IO
> >    do_blockdev_direct_IO
> >     do_direct_IO
> >      btrfs_get_blocks_direct
> >       while (blocks need to be written)
> >        get_more_blocks (first iteration)
> >         btrfs_get_blocks_direct
> >          btrfs_create_dio_extent
> >           down_read(&BTRFS_I(inode)->dio_sem)
> >           Create and add extent map and ordered extent
> >           up_read(&BTRFS_I(inode)->dio_sem)
> >                                                          btrfs_sync_file
> >                                                           btrfs_log_dentry_safe
> >                                                            btrfs_log_inode_parent
> >                                                             btrfs_log_inode
> >                                                              btrfs_log_changed_extents
> >                                                               down_write(&BTRFS_I(inode)->dio_sem)
> >                                                               Collect new extent maps and ordered extents
> >                                                               wait for ordered extent completion
> >        get_more_blocks (second iteration)
> >         btrfs_get_blocks_direct
> >          btrfs_create_dio_extent
> >           down_read(&BTRFS_I(inode)->dio_sem)
> > -----------------------------------------------------------------------------------------------------------
> > 
> > In the above description, the Btrfs direct I/O code path has not yet started
> > submitting bios for the file range covered by the initial ordered
> > extent. Meanwhile, the fast fsync task obtains the write semaphore and
> > waits for I/O on the ordered extent to complete. However, the
> > Direct I/O task is now blocked on obtaining the read semaphore.
> > 
> > To resolve the deadlock, this commit modifies the Direct I/O code path
> > to obtain the read semaphore before invoking
> > __blockdev_direct_IO(). The semaphore is then given up after
> > __blockdev_direct_IO() returns. This allows the Direct I/O code to
> > complete I/O on all the ordered extents it creates.
> >
> 
> Btw, I was able to reproduce the issue on kdave/for-next branch with "Merge
> branch 'for-next-next-4.9-20161125' into for-next-20161125" as the topmost
> commit. The issue cannot be reproduced yet on latest code available from
> kdave/for-next branch.
> 
>

Changes in upstream may have masked the issue in the recent
kdave/for-next branch. I say that because 'git bisect' resulted in the
following commit ...

e3597e6090ddf40904dce6d0a5a404e2c490cac6
Author: Chris Mason 
AuthorDate: Tue Nov 1 12:54:45 2016 -0700
Commit: Chris Mason 
CommitDate: Tue Nov 1 12:54:45 2016 -0700

Parent: 570dd45 btrfs: fix races on root_log_ctx lists
Parent: 9d1032c btrfs: fix WARNING in btrfs_select_ref_head()
Merged: btrfs-next-for-linus-4.8 kdave-master linus-v4.7-rc6 local-v4.7-rc4
Containing: direct-io-fsync-deadlock kdave-for-next
Follows:v4.8-rc8 (57)
Precedes:   next-20161219 (30006)

Merge branch 'for-4.9-rc3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.9

5 files changed, 29 insertions(+), 9 deletions(-)
fs/btrfs/extent-tree.c |  3 +++
fs/btrfs/extent_io.c   |  8 
fs/btrfs/inode.c   | 13 +
fs/btrfs/ioctl.c   |  5 +
fs/btrfs/relocation.c  |  9 -

-- 
chandan



Re: [PATCH] Btrfs: Fix deadlock between direct IO and fast fsync

2016-12-23 Thread Chandan Rajendra
On Friday, December 23, 2016 03:00:18 PM Chandan Rajendra wrote:
> The following deadlock is seen when executing generic/113 test,
> 
>  
> --------------------------------------------------------+--------------------------------------------------
>  Direct I/O task                                          Fast fsync task
> --------------------------------------------------------+--------------------------------------------------
>  btrfs_direct_IO
>   __blockdev_direct_IO
>    do_blockdev_direct_IO
>     do_direct_IO
>      btrfs_get_blocks_direct
>       while (blocks need to be written)
>        get_more_blocks (first iteration)
>         btrfs_get_blocks_direct
>          btrfs_create_dio_extent
>           down_read(&BTRFS_I(inode)->dio_sem)
>           Create and add extent map and ordered extent
>           up_read(&BTRFS_I(inode)->dio_sem)
>                                                          btrfs_sync_file
>                                                           btrfs_log_dentry_safe
>                                                            btrfs_log_inode_parent
>                                                             btrfs_log_inode
>                                                              btrfs_log_changed_extents
>                                                               down_write(&BTRFS_I(inode)->dio_sem)
>                                                               Collect new extent maps and ordered extents
>                                                               wait for ordered extent completion
>        get_more_blocks (second iteration)
>         btrfs_get_blocks_direct
>          btrfs_create_dio_extent
>           down_read(&BTRFS_I(inode)->dio_sem)
> -----------------------------------------------------------------------------------------------------------
> 
> In the above description, the Btrfs direct I/O code path has not yet started
> submitting bios for the file range covered by the initial ordered
> extent. Meanwhile, the fast fsync task obtains the write semaphore and
> waits for I/O on the ordered extent to complete. However, the
> Direct I/O task is now blocked on obtaining the read semaphore.
> 
> To resolve the deadlock, this commit modifies the Direct I/O code path
> to obtain the read semaphore before invoking
> __blockdev_direct_IO(). The semaphore is then given up after
> __blockdev_direct_IO() returns. This allows the Direct I/O code to
> complete I/O on all the ordered extents it creates.
>

Btw, I was able to reproduce the issue on kdave/for-next branch with "Merge
branch 'for-next-next-4.9-20161125' into for-next-20161125" as the topmost
commit. The issue cannot be reproduced yet on latest code available from
kdave/for-next branch.

-- 
chandan



[PATCH] Btrfs: Fix deadlock between direct IO and fast fsync

2016-12-23 Thread Chandan Rajendra
The following deadlock is seen when executing generic/113 test,

 ---------------------------------------------------------------------------------+
  Direct I/O task                                    Fast fsync task
 ---------------------------------------------------------------------------------+
  btrfs_direct_IO
   __blockdev_direct_IO
    do_blockdev_direct_IO
     do_direct_IO
      btrfs_get_blocks_direct
       while (blocks needs to written)
        get_more_blocks (first iteration)
         btrfs_get_blocks_direct
          btrfs_create_dio_extent
           down_read(&BTRFS_I(inode)->dio_sem)
           Create and add extent map and ordered extent
           up_read(&BTRFS_I(inode)->dio_sem)
                                                     btrfs_sync_file
                                                      btrfs_log_dentry_safe
                                                       btrfs_log_inode_parent
                                                        btrfs_log_inode
                                                         btrfs_log_changed_extents
                                                          down_write(&BTRFS_I(inode)->dio_sem)
                                                          Collect new extent maps and ordered extents
                                                          wait for ordered extent completion
        get_more_blocks (second iteration)
         btrfs_get_blocks_direct
          btrfs_create_dio_extent
           down_read(&BTRFS_I(inode)->dio_sem)
 ---------------------------------------------------------------------------------

In the above description, the Btrfs direct I/O code path has not yet started
submitting bios for the file range covered by the initial ordered
extent. Meanwhile, the fast fsync task obtains the write semaphore and
waits for I/O on the ordered extent to complete. However, the
Direct I/O task is now blocked on obtaining the read semaphore.

To resolve the deadlock, this commit modifies the Direct I/O code path
to obtain the read semaphore before invoking
__blockdev_direct_IO(). The semaphore is then given up after
__blockdev_direct_IO() returns. This allows the Direct I/O code to
complete I/O on all the ordered extents it creates.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5ca88f0..f796037 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7325,7 +7325,6 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode,
 	struct extent_map *em = NULL;
 	int ret;
 
-	down_read(&BTRFS_I(inode)->dio_sem);
 	if (type != BTRFS_ORDERED_NOCOW) {
 		em = create_pinned_em(inode, start, len, orig_start,
 				      block_start, block_len, orig_block_len,
@@ -7344,7 +7343,6 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode,
 		em = ERR_PTR(ret);
 	}
  out:
-	up_read(&BTRFS_I(inode)->dio_sem);
 
 	return em;
 }
@@ -8800,6 +8798,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		dio_data.unsubmitted_oe_range_start = (u64)offset;
 		dio_data.unsubmitted_oe_range_end = (u64)offset;
 		current->journal_info = &dio_data;
+		down_read(&BTRFS_I(inode)->dio_sem);
 	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 				     &BTRFS_I(inode)->runtime_flags)) {
 		inode_dio_end(inode);
@@ -8812,6 +8811,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 				   iter, btrfs_get_blocks_direct, NULL,
 				   btrfs_submit_direct, flags);
 	if (iov_iter_rw(iter) == WRITE) {
+		up_read(&BTRFS_I(inode)->dio_sem);
 		current->journal_info = NULL;
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
-- 
2.5.5
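
For illustration only, the lock ordering described in the commit message can
be modelled as below. This is a minimal single-threaded sketch, not kernel
code: the model_* names are hypothetical, and the semaphore is simplified to
"a reader blocks whenever a writer holds or is queued for the lock". It
compares taking dio_sem around each extent creation with holding it across
the whole direct I/O call.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a read/write semaphore. */
struct model_rwsem {
	int readers;        /* number of current read holders */
	bool writer_holds;  /* a writer owns the semaphore */
	bool writer_queued; /* a writer is waiting for readers to drain */
};

/* Returns true if the read lock is granted, false if the caller would block. */
static bool model_down_read(struct model_rwsem *sem)
{
	if (sem->writer_holds || sem->writer_queued)
		return false;
	sem->readers++;
	return true;
}

static void model_up_read(struct model_rwsem *sem)
{
	sem->readers--;
}

/* Returns true if the write lock is granted; otherwise the writer queues. */
static bool model_down_write(struct model_rwsem *sem)
{
	if (sem->readers > 0 || sem->writer_holds) {
		sem->writer_queued = true;
		return false;
	}
	sem->writer_queued = false;
	sem->writer_holds = true;
	return true;
}

/*
 * Old scheme: lock and unlock around each extent creation. fsync takes the
 * write lock between two iterations and then waits for an ordered extent
 * whose bios the direct I/O task has not submitted yet.
 * Returns true if both tasks end up waiting on each other.
 */
static bool per_extent_locking_deadlocks(void)
{
	struct model_rwsem sem = {0};
	bool fsync_has_lock, dio_blocked;

	if (!model_down_read(&sem))	/* first iteration */
		return true;
	model_up_read(&sem);

	fsync_has_lock = model_down_write(&sem);	/* fsync slips in */
	dio_blocked = !model_down_read(&sem);		/* second iteration */

	return fsync_has_lock && dio_blocked;
}

/*
 * New scheme: hold the read lock across the whole direct I/O call, so
 * fsync can only take the write lock after all bios were submitted.
 */
static bool whole_io_locking_deadlocks(void)
{
	struct model_rwsem sem = {0};
	bool fsync_first, fsync_later;

	model_down_read(&sem);			/* before the DIO loop */
	fsync_first = model_down_write(&sem);	/* must queue behind reader */
	model_up_read(&sem);			/* after the DIO loop */
	fsync_later = model_down_write(&sem);	/* now succeeds */

	return fsync_first || !fsync_later;
}
```

In this model the first function reports the mutual wait and the second does
not, which mirrors the reasoning in the commit message above.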



Re: [PATCH v3 5/6] btrfs-progs: convert: Switch to new rollback function

2016-12-19 Thread Chandan Rajendra
On Monday, December 19, 2016 02:56:41 PM Qu Wenruo wrote:
> Since we have the whole facilities needed to rollback, switch to the new
> rollback.
> 
> The new rollback function can handle the following things that old
> rollback either can't handle or just refuse to rollback:
> 
> 1) New converted btrfs which allocates new data chunk
>    This is due to the too restrictive may_roll_back() condition, which is
>    never a good friend for new convert behavior.
> 
>    The new rollback behavior fixes it by not checking data chunks, but
>    only checking the file extents of the convert image file.
> 
>    If all file extents, except the ones in reserved ranges, are mapped
>    1:1, then we allow rollback.
> 
> 2) New converted btrfs which enabled NO_HOLES feature
>Thanks to previous patches, we can convert to real NO_HOLES btrfs.
> 
>And since old rollback assumes that file extents and holes covers the
>whole image file, it will fail due to the non-exists holes.
> 
>Fix it by iterating file extents of convert image, and only compare
>the size we checked against file size if NO_HOLES is not enabled.
> 
> And makes rollback function simpler:
> 
> 1) Read-n-write vs extra chunk tree relocation
>Since converted btrfs only has 3 ranges that are not 1:1 mapped, just
>read this data out, and close btrfs, finally write them into position
>will be good enough, for both new convert and old convert.
>(To be more specific, old convert is just a subset of more universal
> new convert behavior)
> 
>No extra work is needed any more, and we can even open the btrfs RO.

Thanks for fixing this. The patchset works fine on ppc64 and x86_64.

Tested-by: Chandan Rajendra 

-- 
chandan



Re: [PATCH v2 2/3] btrfs-progs: convert: Rework rollback to handle new convert image

2016-12-15 Thread Chandan Rajendra
On Thursday, December 15, 2016 05:03:30 PM Qu Wenruo wrote:
> Although commit 9c4b820412746b3 tried to make the rollback condition
> less restrictive, to co-operate with new rollback behavior, it's still too
> restrictive.
> 
> If btrfs allocates a new data chunk, it's highly possible that the new
> chunk will not be 1:1 mapped anymore.
> 
> And this makes the old rollback check fail and refuse to roll back.
> 
> This patch reworks it by checking the rollback condition more accurately.
> 
> 1) Rollback condition
>Unlike old chunk level check, we use file extent level check.
>So we manually check every file extents of convert image file.
> 
>Only when all file extents except ones in btrfs relocated ranges(*)
>are mapped 1:1 we allow rollback.
> 
>This behavior make both old and new behavior happy.
> *:
>[0, 1M)
>[btrfs_sb_offset(1), +64K)
>[btrfs_sb_offset(2), +64K)
> 
> 2) Rollback method
>Old rollback method is quite complex, using extent_io tree to mark
>every checked ranges.
>And do extra chunk tree operation before rollback.
> 
>The new rollback method is quite simple.
>1) open btrfs
>2) read and save relocated data
>3) close btrfs
>4) write relocated into place.
> 
> Such rework fixes the following problem
> 1) rollback failure after new data chunk allocation
> 2) rollback failure after correct NO_HOLES convert

Hi Qu,

With this patch applied, I get the following on an x86_64 machine,

[root@localhost]~/btrfs-progs# btrfs-convert -r /dev/loop0
ctree.c:1112: btrfs_search_slot: Warning: assertion `p->nodes[0] != NULL` failed, value 1
btrfs-convert(btrfs_search_slot+0x117)[0x40c906]
btrfs-convert(btrfs_lookup_dir_item+0x70)[0x41d902]
btrfs-convert(main+0x5e2)[0x43af50]
/lib64/libc.so.6(__libc_start_main+0xf0)[0x7f7fb6168700]
btrfs-convert(_start+0x29)[0x408a69]
extent buffer leak: start 67305472 len 16384
rollback complete

The same error occurs on a ppc64 machine when using 64k sectorsize.

The three 'rollback' patches were applied on top of commit
9ce512ac57cb08edf2f742da085c383834f804dd (i.e. btrfs-progs: check: Fix false
alert on generation mismatch for tree reloc tree) that is available on David's
devel branch.

-- 
chandan



Re: [PATCH 2/2] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges

2016-12-13 Thread Chandan Rajendra
On Friday, December 02, 2016 10:03:07 AM Qu Wenruo wrote:
> [BUG]
> For the following case, btrfs can underflow qgroup reserved space
> at error path:
> (Page size 4K, function name without "btrfs_" prefix)
> 
>          Task A                |            Task B
> ---------------------------------------------------------------
> Buffered_write [0, 2K)         |
> |- check_data_free_space()     |
> |  |- qgroup_reserve_data()    |
> |     Range aligned to page    |
> |     range [0, 4K)        <<< |
> |     4K bytes reserved    <<< |
> |- copy pages to page cache    |
>                                | Buffered_write [2K, 4K)
>                                | |- check_data_free_space()
>                                | |  |- qgroup_reserve_data()
>                                | |     Range aligned to page
>                                | |     range [0, 4K)
>                                | |     Already reserved by A <<<
>                                | |     0 bytes reserved      <<<
>                                | |- delalloc_reserve_metadata()
>                                | |  And it *FAILED* (Maybe EQUOTA)
>                                | |- free_reserved_data_space()
>                                |    |- qgroup_free_data()
>                                |       Range aligned to page range
>                                |       [0, 4K)
>                                |       Freeing 4K
> (Special thanks to Chandan for the detailed report and analysis)
> 
> [CAUSE]
> Above Task B is freeing reserved data range [0, 4K) which is actually
> reserved by Task A.
> 
> And at write back time, page dirty by Task A will go through writeback
> routine, which will free 4K reserved data space at file extent insert
> time, causing the qgroup underflow.
> 
> [FIX]
> For btrfs_qgroup_free_data(), add @reserved parameter to only free
> data ranges reserved by previous btrfs_qgroup_reserve_data().
> So in above case, Task B will try to free 0 byte, so no underflow.
>
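
As a rough illustration of the [FIX] paragraph quoted above, the idea of
freeing only what the current caller reserved can be sketched as follows.
This is a toy model under stated assumptions, not the kernel implementation:
the demo_* names, the per-page bitmap, and the fixed 4K page size are all
hypothetical stand-ins for the extent io tree and struct extent_changeset.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_NR_PAGES  8

/* Toy qgroup: which pages are reserved, and the total reserved bytes. */
struct demo_qgroup {
	bool page_reserved[DEMO_NR_PAGES];
	int64_t reserved;
};

/* Records the pages that one particular reserve call newly reserved. */
struct demo_changeset {
	bool changed[DEMO_NR_PAGES];
};

/*
 * Reserve pages [first, first + nr). Pages that are already reserved are
 * skipped and not recorded. Returns the number of bytes newly reserved.
 */
static int64_t demo_reserve(struct demo_qgroup *q, int first, int nr,
			    struct demo_changeset *cs)
{
	int64_t newly = 0;

	for (int i = first; i < first + nr && i < DEMO_NR_PAGES; i++) {
		if (q->page_reserved[i])
			continue;
		q->page_reserved[i] = true;
		cs->changed[i] = true;
		newly += DEMO_PAGE_SIZE;
	}
	q->reserved += newly;
	return newly;
}

/*
 * Error path: release only the pages recorded in this caller's changeset.
 * Returns the number of bytes given back.
 */
static int64_t demo_free(struct demo_qgroup *q, struct demo_changeset *cs)
{
	int64_t freed = 0;

	for (int i = 0; i < DEMO_NR_PAGES; i++) {
		if (!cs->changed[i])
			continue;
		cs->changed[i] = false;
		q->page_reserved[i] = false;
		freed += DEMO_PAGE_SIZE;
	}
	q->reserved -= freed;
	return freed;
}
```

With this model, a second writer that finds the page already reserved gets
an empty changeset, so its error path gives back 0 bytes and the first
writer's reservation stays intact.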

The changes look good to me. Also, I did not notice any regressions when
executing fstests with the patch applied.

Reviewed-by: Chandan Rajendra 
Tested-by: Chandan Rajendra 

-- 
chandan



Re: [PATCH 1/2] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions

2016-12-13 Thread Chandan Rajendra
On Friday, December 02, 2016 10:03:06 AM Qu Wenruo wrote:
> Introduce a new parameter, struct extent_changeset, for
> btrfs_qgroup_reserve_data() and its callers.
> 
> Such an extent_changeset was already used in btrfs_qgroup_reserve_data() to
> record which ranges it reserved in the current reservation, so it can free
> them in the error path.
> 
> The reason we need to export it to callers is that, in the buffered write
> error path, without knowing exactly which range we reserved in the current
> allocation, we can free space which was not reserved by us.
> 
> This will lead to qgroup reserved space underflow.

The changes look good to me.

Reviewed-by: Chandan Rajendra 

-- 
chandan



Re: [PATCH 2/2] btrfs-convert: Fix migrate_super_block() to work with 64k sectorsize

2016-12-08 Thread Chandan Rajendra
On Friday, December 09, 2016 09:09:29 AM Qu Wenruo wrote:
> 
> At 12/08/2016 09:56 PM, Chandan Rajendra wrote:
> > migrate_super_block() uses sectorsize to refer to the size of the
> > superblock. Hence on 64k sectorsize filesystems, it ends up computing
> > checksum beyond the super block length (i.e.
> > BTRFS_SUPER_INFO_SIZE). This commit fixes the bug by using
> > BTRFS_SUPER_INFO_SIZE instead of sectorsize of the underlying
> > filesystem.
> >
> > Signed-off-by: Chandan Rajendra 
> 
> Reviewed-by: Qu Wenruo 
> 
> BTW would you please enhance the convert tests?
> Current convert tests only uses 4K as block size.
> So adding 64K blocksize would definitely improve the tests.
>

Thanks for the hint. I just executed btrfs/012 with 64k blocksize hardcoded
and found that 'btrfs rollback' failed. I will fix rollback first and then
work on getting btrfs/012 to support 64k blocksize.

-- 
chandan



Re: [PATCH 1/2] btrfs-progs: btrfs-convert: Prevent accounting blocks beyond end of device

2016-12-08 Thread Chandan Rajendra
On Friday, December 09, 2016 09:03:57 AM Qu Wenruo wrote:
> Hi Chandan,
> 
> Thanks for the patch.
> 
> At 12/08/2016 09:56 PM, Chandan Rajendra wrote:
> > When looping across data block bitmap, __ext2_add_one_block() may add
> > blocks which do not exist on the underlying disk. This commit prevents
> > this from happening by checking the block index against the maximum
> > block count that was present in the ext4 filesystem instance that is
> > being converted.
> 
> The patch looks good to me.
> 
> Reviewed-by: Qu Wenruo 
> 
> Just curious about if such image can pass e2fsck.
> And would you please upload a minimal image as btrfs-progs test case?
> 

Hi Qu,

Such an ext4 filesystem can be consistently created on ppc64 with 64k as the
blocksize of the filesystem. Also, the filesystem thus created passes e2fsck.

-- 
chandan



Re: [PATCH 1/2] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions

2016-12-08 Thread Chandan Rajendra
On Friday, December 02, 2016 10:03:06 AM Qu Wenruo wrote:
> Introduce a new parameter, struct extent_changeset, for
> btrfs_qgroup_reserve_data() and its callers.
> 
> Such an extent_changeset was already used in btrfs_qgroup_reserve_data() to
> record which ranges it reserved in the current reservation, so it can free
> them in the error path.
> 
> The reason we need to export it to callers is that, in the buffered write
> error path, without knowing exactly which range we reserved in the current
> allocation, we can free space which was not reserved by us.
> 
> This will lead to qgroup reserved space underflow.
>

Hi Qu,

Which git tree are these patches based on? This patch fails to apply
cleanly on the kdave/for-next branch that is available as of today.

-- 
chandan



[PATCH 2/2] btrfs-convert: Fix migrate_super_block() to work with 64k sectorsize

2016-12-08 Thread Chandan Rajendra
migrate_super_block() uses sectorsize to refer to the size of the
superblock. Hence on 64k sectorsize filesystems, it ends up computing
checksum beyond the super block length (i.e.
BTRFS_SUPER_INFO_SIZE). This commit fixes the bug by using
BTRFS_SUPER_INFO_SIZE instead of sectorsize of the underlying
filesystem.

Signed-off-by: Chandan Rajendra 
---
 convert/main.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index 1148a36..fd6f77b 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1360,7 +1360,7 @@ err:
 /*
  * Migrate super block to its default position and zero 0 ~ 16k
  */
-static int migrate_super_block(int fd, u64 old_bytenr, u32 sectorsize)
+static int migrate_super_block(int fd, u64 old_bytenr)
 {
 	int ret;
 	struct extent_buffer *buf;
@@ -1368,13 +1368,13 @@ static int migrate_super_block(int fd, u64 old_bytenr, u32 sectorsize)
 	u32 len;
 	u32 bytenr;
 
-	buf = malloc(sizeof(*buf) + sectorsize);
+	buf = malloc(sizeof(*buf) + BTRFS_SUPER_INFO_SIZE);
 	if (!buf)
 		return -ENOMEM;
 
-	buf->len = sectorsize;
-	ret = pread(fd, buf->data, sectorsize, old_bytenr);
-	if (ret != sectorsize)
+	buf->len = BTRFS_SUPER_INFO_SIZE;
+	ret = pread(fd, buf->data, BTRFS_SUPER_INFO_SIZE, old_bytenr);
+	if (ret != BTRFS_SUPER_INFO_SIZE)
 		goto fail;
 
 	super = (struct btrfs_super_block *)buf->data;
@@ -1382,19 +1382,20 @@ static int migrate_super_block(int fd, u64 old_bytenr, u32 sectorsize)
 	btrfs_set_super_bytenr(super, BTRFS_SUPER_INFO_OFFSET);
 
 	csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0);
-	ret = pwrite(fd, buf->data, sectorsize, BTRFS_SUPER_INFO_OFFSET);
-	if (ret != sectorsize)
+	ret = pwrite(fd, buf->data, BTRFS_SUPER_INFO_SIZE,
+		BTRFS_SUPER_INFO_OFFSET);
+	if (ret != BTRFS_SUPER_INFO_SIZE)
 		goto fail;
 
 	ret = fsync(fd);
 	if (ret)
 		goto fail;
 
-	memset(buf->data, 0, sectorsize);
+	memset(buf->data, 0, BTRFS_SUPER_INFO_SIZE);
 	for (bytenr = 0; bytenr < BTRFS_SUPER_INFO_OFFSET; ) {
 		len = BTRFS_SUPER_INFO_OFFSET - bytenr;
-		if (len > sectorsize)
-			len = sectorsize;
+		if (len > BTRFS_SUPER_INFO_SIZE)
+			len = BTRFS_SUPER_INFO_SIZE;
 		ret = pwrite(fd, buf->data, len, bytenr);
 		if (ret != len) {
 			fprintf(stderr, "unable to zero fill device\n");
@@ -2519,7 +2520,7 @@ static int do_convert(const char *devname, int datacsum, int packing,
 	 * If this step succeed, we get a mountable btrfs. Otherwise
 	 * the source fs is left unchanged.
 	 */
-	ret = migrate_super_block(fd, mkfs_cfg.super_bytenr, blocksize);
+	ret = migrate_super_block(fd, mkfs_cfg.super_bytenr);
 	if (ret) {
 		error("unable to migrate super block: %d", ret);
 		goto fail;
-- 
2.5.5
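
As a hedged illustration of why the buffer length should not depend on the
sectorsize, the zero-fill loop from the patch can be sketched against an
in-memory buffer. The demo_* names are hypothetical, memcpy() stands in for
pwrite(), and the two constants are assumed to mirror the btrfs values of
64K for the primary super block offset and 4K for the super block size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror BTRFS_SUPER_INFO_OFFSET and BTRFS_SUPER_INFO_SIZE. */
#define DEMO_SUPER_INFO_OFFSET (64 * 1024)
#define DEMO_SUPER_INFO_SIZE   4096

/*
 * Zero the range [0, DEMO_SUPER_INFO_OFFSET) of an in-memory "device" in
 * chunks that never exceed the super block size, whatever the sectorsize
 * of the filesystem is. Returns the number of chunk writes, or -1 if the
 * device is too small.
 */
static int demo_zero_before_super(uint8_t *dev, size_t dev_len)
{
	static const uint8_t zeroes[DEMO_SUPER_INFO_SIZE];
	size_t bytenr = 0;
	int writes = 0;

	if (dev_len < DEMO_SUPER_INFO_OFFSET)
		return -1;

	while (bytenr < DEMO_SUPER_INFO_OFFSET) {
		size_t len = DEMO_SUPER_INFO_OFFSET - bytenr;

		if (len > DEMO_SUPER_INFO_SIZE)
			len = DEMO_SUPER_INFO_SIZE;
		/* Stands in for pwrite(fd, buf->data, len, bytenr). */
		memcpy(dev + bytenr, zeroes, len);
		bytenr += len;
		writes++;
	}
	return writes;
}
```

Because the source buffer is only DEMO_SUPER_INFO_SIZE bytes long, capping
each chunk at that size keeps every copy within the buffer; capping at a 64K
sectorsize would read past its end.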



[PATCH 1/2] btrfs-progs: btrfs-convert: Prevent accounting blocks beyond end of device

2016-12-08 Thread Chandan Rajendra
When looping across data block bitmap, __ext2_add_one_block() may add
blocks which do not exist on the underlying disk. This commit prevents
this from happening by checking the block index against the maximum
block count that was present in the ext4 filesystem instance that is
being converted.

Signed-off-by: Chandan Rajendra 
---
 convert/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert/main.c b/convert/main.c
index 4b4cea4..1148a36 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1525,6 +1525,9 @@ static int __ext2_add_one_block(ext2_filsys fs, char *bitmap,
 	offset /= EXT2FS_CLUSTER_RATIO(fs);
 	offset += group_nr * EXT2_CLUSTERS_PER_GROUP(fs->super);
 	for (i = 0; i < EXT2_CLUSTERS_PER_GROUP(fs->super); i++) {
+		if ((i + offset) >= ext2fs_blocks_count(fs->super))
+			break;
+
 		if (ext2fs_test_bit(i, bitmap)) {
 			u64 start;
 
-- 
2.5.5
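
The bound check added above can be illustrated with a small standalone
sketch. It does not use libext2fs; the helper names and parameters are
hypothetical, and the bitmap is assumed to use the little-endian bit order
of ext2 block bitmaps. The point is that the last block group's bitmap may
contain bits for blocks past the end of the filesystem, and those must not
be accounted.

```c
#include <stdint.h>

/* Little-endian bit test, in the style of ext2 block bitmaps. */
static int demo_test_bit(const uint8_t *bitmap, uint64_t nr)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/*
 * Count the in-use blocks of one block group, ignoring bitmap bits that
 * map past the end of the filesystem.
 *
 * bitmap       - per-group bitmap with per_group bits
 * group_nr     - index of the block group
 * per_group    - number of blocks described by each group bitmap
 * blocks_count - total number of blocks in the filesystem
 */
static uint64_t demo_count_used(const uint8_t *bitmap, uint64_t group_nr,
				uint64_t per_group, uint64_t blocks_count)
{
	uint64_t offset = group_nr * per_group;
	uint64_t used = 0;
	uint64_t i;

	for (i = 0; i < per_group; i++) {
		if (i + offset >= blocks_count)
			break;	/* the remaining bits are padding */
		if (demo_test_bit(bitmap, i))
			used++;
	}
	return used;
}
```

For a 20-block filesystem with 16 blocks per group, only the first four bits
of the second group's bitmap describe real blocks, even if the padding bits
are set.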



[PATCH V2] btrfs-progs: Use helper function to access btrfs_super_block->sys_chunk_array_size

2016-12-01 Thread Chandan Rajendra
btrfs_super_block->sys_chunk_array_size is stored as le32 data on
disk. However insert_temp_chunk_item() writes sys_chunk_array_size in
host cpu order. This commit fixes this by using super block access
helper functions to read and write
btrfs_super_block->sys_chunk_array_size field.

Signed-off-by: Chandan Rajendra 
---
 utils.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index d0189ad..74dde1e 100644
--- a/utils.c
+++ b/utils.c
@@ -562,14 +562,18 @@ static int insert_temp_chunk_item(int fd, struct extent_buffer *buf,
 	 */
 	if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
 		char *cur;
+		u32 array_size;
 
-		cur = (char *)sb->sys_chunk_array + sb->sys_chunk_array_size;
+		cur = (char *)sb->sys_chunk_array
+			+ btrfs_super_sys_array_size(sb);
 		memcpy(cur, &disk_key, sizeof(disk_key));
 		cur += sizeof(disk_key);
 		read_extent_buffer(buf, cur, (unsigned long int)chunk,
 				   btrfs_chunk_item_size(1));
-		sb->sys_chunk_array_size += btrfs_chunk_item_size(1) +
+		array_size = btrfs_super_sys_array_size(sb);
+		array_size += btrfs_chunk_item_size(1) +
 			sizeof(disk_key);
+		btrfs_set_super_sys_array_size(sb, array_size);
 
 		ret = write_temp_super(fd, sb, cfg->super_bytenr);
 	}
-- 
2.5.5
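
To make the byte-order issue concrete, here is a minimal sketch of
little-endian accessors and the read-modify-write pattern the patch
switches to. The demo_* names and the one-field structure are hypothetical;
the real btrfs-progs helpers are the btrfs_super_sys_array_size() and
btrfs_set_super_sys_array_size() calls used in the diff above.

```c
#include <stdint.h>

/* Hypothetical on-disk structure: the field is kept as raw LE bytes. */
struct demo_super {
	uint8_t sys_chunk_array_size[4];
};

/* Decode a little-endian 32-bit value independent of host byte order. */
static uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit value as little-endian bytes. */
static void demo_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t demo_super_sys_array_size(const struct demo_super *sb)
{
	return demo_get_le32(sb->sys_chunk_array_size);
}

static void demo_set_super_sys_array_size(struct demo_super *sb, uint32_t v)
{
	demo_put_le32(sb->sys_chunk_array_size, v);
}

/* Grow the array size by item_len using only the accessors. */
static void demo_grow_sys_array(struct demo_super *sb, uint32_t item_len)
{
	uint32_t array_size = demo_super_sys_array_size(sb);

	array_size += item_len;
	demo_set_super_sys_array_size(sb, array_size);
}
```

Going through the accessors yields the same on-disk bytes on big-endian
hosts such as ppc64 and on little-endian hosts such as x86_64, which a
direct `+=` on the raw field does not.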



Re: Qgroup accounting issue on kdave/for-next branch

2016-11-29 Thread Chandan Rajendra
On Tuesday, November 29, 2016 04:41:41 PM Qu Wenruo wrote:
> 
> At 11/29/2016 04:21 PM, Chandan Rajendra wrote:
> > On Tuesday, November 29, 2016 03:55:53 PM Qu Wenruo wrote:
> >> At 11/29/2016 02:36 PM, Chandan Rajendra wrote:
> >>> When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I
> >>> noticed the following call trace.
> >>>
> >>> [   77.335887] [ cut here ]
> >>> [   77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220
> >>> [   77.336303] Modules linked in:
> >>> [   77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22
> >>> [   77.336526] task: c0062b8d4880 task.stack: c0062b9a4000
> >>> [   77.336638] NIP: c05cf018 LR: c05ceff8 CTR: c05390c0
> >>> [   77.336771] REGS: c0062b9a7450 TRAP: 0700   Not tainted  (4.9.0-rc5-00062-g6b74e43)
> >>> [   77.336908] MSR: 8282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
> >>> [   77.337330]   CR: 88000842  XER: 
> >>> [   77.337392] CFAR: c05c9b5c
> >>> [   77.337443] SOFTE: 1
> >>> [   77.337477]
> >>>GPR00:
> >>> [   77.337517] c05cefcc
> >>> [   77.337575] c0062b9a76d0
> >>> [   77.337626] c1103a00
> >>> [   77.337714] c0063e058d40
> >>> [   77.337765]
> >>>GPR04:
> >>> [   77.337817] c0062b9a7740
> >>> [   77.337868] 0008
> >>> [   77.337927] 0005f000
> >>> [   77.337978] 00063eed
> >>> [   77.338037]
> >>>GPR08:
> >>> [   77.338080] c0062f9629c8
> >>> [   77.338138] 0001
> >>> [   77.338191] f000
> >>> [   77.338248] c0063e058e80
> >>> [   77.338300]
> >>>GPR12:
> >>> [   77.338384] 
> >>> [   77.338435] cfe0
> >>> [   77.338498] 2000
> >>> [   77.338548] 
> >>> [   77.338605]
> >>>GPR16:
> >>> [   77.338645] 0008
> >>> [   77.338703] 4d5906fc
> >>> [   77.338754] 4d5a6c08
> >>> [   77.338811] 4d54b3d0
> >>> [   77.338866]
> >>>GPR20:
> >>> [   77.338921] 01000bcf8440
> >>> [   77.338972] 
> >>> [   77.339030] 
> >>> [   77.339080] c0062523b078
> >>> [   77.339138]
> >>>GPR24:
> >>> [   77.339178] c0062523b080
> >>> [   77.339240] c0da2b58
> >>> [   77.339290] 
> >>> [   77.339347] c0062e539600
> >>> [   77.339398]
> >>>GPR28:
> >>> [   77.339485] 0006
> >>> [   77.339536] c0062523b000
> >>> [   77.339594] c0062e539600
> >>> [   77.339644] c0062e539688
> >>>
> >>> [   77.339740] NIP [c05cf018] .btrfs_qgroup_free_refroot+0x188/0x220
> >>> [   77.339852] LR [c05ceff8] .btrfs_qgroup_free_refroot+0x168/0x220
> >>> [   77.339959] Call Trace:
> >>> [   77.339998] [c0062b9a76d0] [c05cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 (unreliable)
> >>> [   77.340193] [c0062b9a7790] [c054210c] .commit_fs_roots+0x19c/0x240
> >>> [   77.340355] [c0062b9a78a0] [c05451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0
> >>> [   77.340554] [c0062b9a7970] [c0503bd4] .btrfs_sync_fs+0x74/0x1c0
> >>> [   77.340725] [c0062b9a7a10] [c02d6ce0] .sync_filesystem+0xd0/0x100
> >>> [   77.340891] [c0062b9a7a90] [c0

[PATCH] btrfs-progs: Use helper functions to access btrfs_super_block->sys_chunk_array_size

2016-11-29 Thread Chandan Rajendra
btrfs_super_block->sys_chunk_array_size is stored as le32 data on
disk. However insert_temp_chunk_item() writes sys_chunk_array_size in
host cpu order. This commit fixes this by using super block access
helper functions to read and write
btrfs_super_block->sys_chunk_array_size field.

Signed-off-by: Chandan Rajendra 
---
 utils.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/utils.c b/utils.c
index d0189ad..7b17b20 100644
--- a/utils.c
+++ b/utils.c
@@ -562,14 +562,17 @@ static int insert_temp_chunk_item(int fd, struct extent_buffer *buf,
 	 */
 	if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
 		char *cur;
+		u32 array_size;
 
 		cur = (char *)sb->sys_chunk_array + sb->sys_chunk_array_size;
 		memcpy(cur, &disk_key, sizeof(disk_key));
 		cur += sizeof(disk_key);
 		read_extent_buffer(buf, cur, (unsigned long int)chunk,
 				   btrfs_chunk_item_size(1));
-		sb->sys_chunk_array_size += btrfs_chunk_item_size(1) +
+		array_size = btrfs_super_sys_array_size(sb);
+		array_size += btrfs_chunk_item_size(1) +
 			sizeof(disk_key);
+		btrfs_set_super_sys_array_size(sb, array_size);
 
 		ret = write_temp_super(fd, sb, cfg->super_bytenr);
 	}
-- 
2.5.5



Re: Qgroup accounting issue on kdave/for-next branch

2016-11-29 Thread Chandan Rajendra
On Tuesday, November 29, 2016 03:55:53 PM Qu Wenruo wrote:
> At 11/29/2016 02:36 PM, Chandan Rajendra wrote:
> > When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I
> > noticed the following call trace.
> >
> > [   77.335887] [ cut here ]
> > [   77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220
> > [   77.336303] Modules linked in:
> > [   77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22
> > [   77.336526] task: c0062b8d4880 task.stack: c0062b9a4000
> > [   77.336638] NIP: c05cf018 LR: c05ceff8 CTR: c05390c0
> > [   77.336771] REGS: c0062b9a7450 TRAP: 0700   Not tainted  (4.9.0-rc5-00062-g6b74e43)
> > [   77.336908] MSR: 8282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
> > [   77.337330]   CR: 88000842  XER: 
> > [   77.337392] CFAR: c05c9b5c
> > [   77.337443] SOFTE: 1
> > [   77.337477]
> >GPR00:
> > [   77.337517] c05cefcc
> > [   77.337575] c0062b9a76d0
> > [   77.337626] c1103a00
> > [   77.337714] c0063e058d40
> > [   77.337765]
> >GPR04:
> > [   77.337817] c0062b9a7740
> > [   77.337868] 0008
> > [   77.337927] 0005f000
> > [   77.337978] 00063eed
> > [   77.338037]
> >GPR08:
> > [   77.338080] c0062f9629c8
> > [   77.338138] 0001
> > [   77.338191] f000
> > [   77.338248] c0063e058e80
> > [   77.338300]
> >GPR12:
> > [   77.338384] 
> > [   77.338435] cfe0
> > [   77.338498] 2000
> > [   77.338548] 
> > [   77.338605]
> >GPR16:
> > [   77.338645] 0008
> > [   77.338703] 4d5906fc
> > [   77.338754] 4d5a6c08
> > [   77.338811] 4d54b3d0
> > [   77.338866]
> >GPR20:
> > [   77.338921] 01000bcf8440
> > [   77.338972] 
> > [   77.339030] 
> > [   77.339080] c0062523b078
> > [   77.339138]
> >GPR24:
> > [   77.339178] c0062523b080
> > [   77.339240] c0da2b58
> > [   77.339290] 
> > [   77.339347] c0062e539600
> > [   77.339398]
> >GPR28:
> > [   77.339485] 0006
> > [   77.339536] c0062523b000
> > [   77.339594] c0062e539600
> > [   77.339644] c0062e539688
> >
> > [   77.339740] NIP [c05cf018] .btrfs_qgroup_free_refroot+0x188/0x220
> > [   77.339852] LR [c05ceff8] .btrfs_qgroup_free_refroot+0x168/0x220
> > [   77.339959] Call Trace:
> > [   77.339998] [c0062b9a76d0] [c05cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 (unreliable)
> > [   77.340193] [c0062b9a7790] [c054210c] .commit_fs_roots+0x19c/0x240
> > [   77.340355] [c0062b9a78a0] [c05451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0
> > [   77.340554] [c0062b9a7970] [c0503bd4] .btrfs_sync_fs+0x74/0x1c0
> > [   77.340725] [c0062b9a7a10] [c02d6ce0] .sync_filesystem+0xd0/0x100
> > [   77.340891] [c0062b9a7a90] [c0294f88] .generic_shutdown_super+0x38/0x1a0
> > [   77.341052] [c0062b9a7b10] [c0295508] .kill_anon_super+0x18/0x30
> > [   77.342243] [c0062b9a7b90] [c0503db8] .btrfs_kill_super+0x18/0xd0
> > [   77.342411] [c0062b9a7c10] [c02958c8] .deactivate_locked_super+0x98/0xe0
> > [   77.342573] [c0062b9a7c90] [c02be114] .cleanup_mnt+0x54/0xa0
> > [   77.342723] [c0062b9a7d10] [c00d7e74] .task_work_run+0x124/0x180
> > [   77.342862] [c0062b9a7db0] [c001c354] .do_notify_resume+0xa4/0xb0
> > [   77.343030] [c0062b9a7e30] [c000c344] .ret_from_except_lite+0x70/0x74
> > [   77.343187] Instruction dump:
> > [   77.343276] 3b40 e87d0

Qgroup accounting issue on kdave/for-next branch

2016-11-28 Thread Chandan Rajendra
o page cache   |   |
 || Write 4k bytes to the file|
 || at offset range [4096, 8191]  |
 || - __btrfs_buffered_write  |
 || -- btrfs_check_data_free_space|
 || --- Since EXTENT_QGROUP_RESERVED  |
 || is already set we don't   |
 || reserve qgroup space  |
 || -- Assume the call to |
 || btrfs_delalloc_reserve_metadata() |
 || fails |
 || -- btrfs_free_reserved_data_space |
 || --- Clear EXTENT_QGROUP_RESERVED  |
 || file range [4096, 8191]   |
 |+---|

On x86_64, it has been almost impossible to get the call to
btrfs_delalloc_reserve_metadata() in __btrfs_buffered_write() to fail. Hence I
have the following change in __btrfs_buffered_write() ...

	if (pos != 2048)
		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
	else
		ret = 1;
	if (ret) {
		if (!only_release_metadata)
			btrfs_free_reserved_data_space(inode, pos,
						       write_bytes);
		else
			btrfs_end_write_no_snapshoting(root);
		break;
	}

With the above change and by using max_inline=0 mount option, the following
command line will reproduce the call trace on an x86_64 machine,
$ xfs_io -f -c 'pwrite -b 2048 0 8192' -c sync /mnt/file-0.bin

-- 
chandan



[PATCH V21 18/19] Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset when moving to a new bio_vec

2016-10-02 Thread Chandan Rajendra
In __btrfs_lookup_bio_sums() we set the file offset value at the
beginning of every iteration of the while loop. This is incorrect since
the blocks mapped by the current bvec->bv_page might not yet have been
completely processed.

This commit fixes the issue by setting the file offset value when we
move to the next bvec of the bio.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file-item.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index d0d571c..8fc09c1 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -222,11 +222,11 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
 	if (dio)
 		offset = logical_offset;
+	else
+		offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
 	page_bytes_left = bvec->bv_len;
 	while (bio_index < bio->bi_vcnt) {
-		if (!dio)
-			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 		count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
 					       (u32 *)csum, nblocks);
 		if (count)
@@ -301,6 +301,9 @@ found:
 					goto done;
 				}
 				bvec++;
+				if (!dio)
+					offset = page_offset(bvec->bv_page)
+						+ bvec->bv_offset;
 				page_bytes_left = bvec->bv_len;
 			}
 
 
-- 
2.5.5
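
The effect of the change can be sketched outside the kernel as follows. This
is a simplified model with hypothetical names: demo_bvec.file_offset stands
in for page_offset(bvec->bv_page) + bvec->bv_offset, and every bvec length
is assumed to be a non-zero multiple of the block size. The file offset is
reloaded only when the walk moves on to the next bvec, so a bvec that covers
several blocks yields consecutive offsets instead of repeating its first.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct bio_vec. */
struct demo_bvec {
	uint64_t file_offset;	/* file offset of the first byte */
	uint32_t len;		/* length, a multiple of the block size */
};

/*
 * Store the file offset of every block covered by the bvecs in out[].
 * Returns the number of offsets written (at most out_len).
 */
static size_t demo_block_offsets(const struct demo_bvec *bvecs,
				 size_t nr_bvecs, uint32_t blocksize,
				 uint64_t *out, size_t out_len)
{
	size_t n = 0;
	size_t idx = 0;
	uint64_t offset;
	uint32_t bytes_left;

	if (nr_bvecs == 0)
		return 0;

	/* Set the offset once, before the loop ... */
	offset = bvecs[0].file_offset;
	bytes_left = bvecs[0].len;

	while (idx < nr_bvecs && n < out_len) {
		out[n++] = offset;
		offset += blocksize;
		bytes_left -= blocksize;

		if (bytes_left == 0) {
			idx++;
			if (idx >= nr_bvecs)
				break;
			/* ... and again only when moving to a new bvec. */
			offset = bvecs[idx].file_offset;
			bytes_left = bvecs[idx].len;
		}
	}
	return n;
}
```

With a 2K block size, a 4K bvec at file offset 0 followed by a 2K bvec at
file offset 8192 gives the offsets 0, 2048 and 8192. Resetting the offset at
the top of every iteration would instead report offset 0 twice.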



[PATCH V21 19/19] Btrfs: subpage-blocksize: Disable compression

2016-10-02 Thread Chandan Rajendra
The subpage-blocksize patchset does not yet support compression. Hence,
the kernel might crash when executing compression code in a
subpage-blocksize scenario. This commit prevents the compression feature
from being enabled during 'mount' and also when the user invokes the
'chattr +c' command.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c |  8 +++-
 fs/btrfs/super.c | 19 +++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0fdc0a0..862d97b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -322,6 +322,11 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
} else if (flags & FS_COMPR_FL) {
const char *comp;
 
+   if (root->sectorsize < PAGE_SIZE) {
+   ret = -EINVAL;
+   goto out_drop;
+   }
+
ip->flags |= BTRFS_INODE_COMPRESS;
ip->flags &= ~BTRFS_INODE_NOCOMPRESS;
 
@@ -1342,7 +1347,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
return -EINVAL;
 
if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS) {
-   if (range->compress_type > BTRFS_COMPRESS_TYPES)
+   if ((range->compress_type > BTRFS_COMPRESS_TYPES)
+   || (root->sectorsize < PAGE_SIZE))
return -EINVAL;
if (range->compress_type)
compress_type = range->compress_type;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 73a1d8d..3a2e9d7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -392,6 +392,17 @@ static const match_table_t tokens = {
{Opt_err, NULL},
 };
 
+static int can_enable_compression(struct btrfs_fs_info *fs_info)
+{
+   if (btrfs_super_sectorsize(fs_info->super_copy) < PAGE_SIZE) {
+   btrfs_err(fs_info,
+   "Compression is not supported for subpage-blocksize");
+   return 0;
+   }
+
+   return 1;
+}
+
 /*
  * Regular mount options parser.  Everything that is needed only when
  * reading in a new superblock is parsed here.
@@ -502,6 +513,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
if (token == Opt_compress ||
token == Opt_compress_force ||
strcmp(args[0].from, "zlib") == 0) {
+   if (!can_enable_compression(info)) {
+   ret = -EINVAL;
+   goto out;
+   }
compress_type = "zlib";
info->compress_type = BTRFS_COMPRESS_ZLIB;
btrfs_set_opt(info->mount_opt, COMPRESS);
@@ -509,6 +524,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
btrfs_clear_opt(info->mount_opt, NODATASUM);
no_compress = 0;
} else if (strcmp(args[0].from, "lzo") == 0) {
+   if (!can_enable_compression(info)) {
+   ret = -EINVAL;
+   goto out;
+   }
compress_type = "lzo";
info->compress_type = BTRFS_COMPRESS_LZO;
btrfs_set_opt(info->mount_opt, COMPRESS);
-- 
2.5.5
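
The check added in all three places boils down to one comparison. A minimal
user-space sketch, assuming hypothetical helper names (compression_allowed()
and request_compression() are not part of the patch) and passing the page
size in as a parameter instead of using PAGE_SIZE:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Compression is only allowed when a block is at least as large as a page. */
static int compression_allowed(uint32_t sectorsize, uint32_t page_size)
{
	return sectorsize >= page_size;
}

/*
 * Hypothetical caller: mirrors how the mount option parser and the ioctl
 * paths bail out with -EINVAL instead of enabling compression.
 */
static int request_compression(uint32_t sectorsize, uint32_t page_size)
{
	if (!compression_allowed(sectorsize, page_size))
		return -EINVAL;
	return 0;
}
```

On a 64k page machine a filesystem with 4k blocks is therefore refused,
while the same filesystem on a 4k page machine is accepted.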



[PATCH V21 17/19] Btrfs: subpage-blocksize: Make file extent relocate code subpage blocksize aware

2016-10-02 Thread Chandan Rajendra
The file extent relocation code currently assumes the blocksize to be
the same as PAGE_SIZE. This commit adds code to support the subpage
blocksize scenario.
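
As a rough illustration of the arithmetic involved, the sketch below computes
the byte range covered by the first page of a cluster and the amount of space
to reserve for it, following the page_start/page_end/reserved_space
computation in the diff. It is plain user-space C; 'struct range',
first_page_range() and the power-of-two page_size parameter are assumptions
made for the example.

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Range of the cluster that falls inside its first page. 'offset' plays the
 * role of BTRFS_I(inode)->index_cnt; page_size must be a power of two.
 */
static struct range first_page_range(uint64_t cluster_start,
				     uint64_t cluster_end,
				     uint64_t offset, uint64_t page_size)
{
	struct range r;
	uint64_t page_last;

	r.start = cluster_start - offset;
	/* Last byte of the page that contains r.start. */
	page_last = (r.start & ~(page_size - 1)) + page_size - 1;
	r.end = cluster_end - offset;
	if (page_last < r.end)
		r.end = page_last;
	return r;
}

/* Bytes to reserve for the range, instead of a fixed PAGE_SIZE. */
static uint64_t reserved_space(struct range r)
{
	return r.end - r.start + 1;
}
```

For a cluster that covers a single 4k block inside a 64k page, this reserves
4096 bytes rather than 65536.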

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/relocation.c | 90 ---
 1 file changed, 71 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f724fb5..75e51a3 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3114,14 +3114,19 @@ static int relocate_file_extent_cluster(struct inode *inode,
 {
u64 page_start;
u64 page_end;
+   u64 block_start;
u64 offset = BTRFS_I(inode)->index_cnt;
+   u64 blocksize = BTRFS_I(inode)->root->sectorsize;
+   u64 reserved_space;
unsigned long index;
unsigned long last_index;
struct page *page;
struct file_ra_state *ra;
gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
+   int nr_blocks;
int nr = 0;
int ret = 0;
+   int i;
 
if (!cluster->nr)
return 0;
@@ -3141,13 +3146,19 @@ static int relocate_file_extent_cluster(struct inode *inode,
if (ret)
goto out;
 
+   page_start = cluster->start - offset;
+   page_end = min_t(u64, round_down(page_start, PAGE_SIZE) + PAGE_SIZE - 1,
+   cluster->end - offset);
+
index = (cluster->start - offset) >> PAGE_SHIFT;
last_index = (cluster->end - offset) >> PAGE_SHIFT;
while (index <= last_index) {
-   ret = btrfs_delalloc_reserve_metadata(inode, PAGE_SIZE);
+   reserved_space = page_end - page_start + 1;
+
+   ret = btrfs_delalloc_reserve_metadata(inode, reserved_space);
if (ret)
goto out;
-
+again:
page = find_lock_page(inode->i_mapping, index);
if (!page) {
page_cache_sync_readahead(inode->i_mapping,
@@ -3157,7 +3168,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
   mask);
if (!page) {
btrfs_delalloc_release_metadata(inode,
-   PAGE_SIZE);
+   reserved_space);
ret = -ENOMEM;
goto out;
}
@@ -3169,6 +3180,38 @@ static int relocate_file_extent_cluster(struct inode *inode,
   last_index + 1 - index);
}
 
+   if (PageDirty(page)) {
+   u64 pg_offset = page_offset(page);
+
+   unlock_page(page);
+   put_page(page);
+   ret = btrfs_fdatawrite_range(inode, pg_offset,
+   page_start - 1);
+   if (ret) {
+   btrfs_delalloc_release_metadata(inode,
+   reserved_space);
+   goto out;
+   }
+
+   ret = filemap_fdatawait_range(inode->i_mapping,
+   pg_offset, page_start - 1);
+   if (ret) {
+   btrfs_delalloc_release_metadata(inode,
+   reserved_space);
+   goto out;
+   }
+
+   goto again;
+   }
+
+   if (BTRFS_I(inode)->root->sectorsize < PAGE_SIZE) {
+   ClearPageUptodate(page);
+   if (page->private)
+   clear_page_blks_state(page,
+   1 << BLK_STATE_UPTODATE,
+   page_start, page_end);
+   }
+
if (!PageUptodate(page)) {
btrfs_readpage(NULL, page);
lock_page(page);
@@ -3176,35 +3219,40 @@ static int relocate_file_extent_cluster(struct inode *inode,
unlock_page(page);
put_page(page);
btrfs_delalloc_release_metadata(inode,
-   PAGE_SIZE);
+   reserved_space);
ret = -EIO;
goto out;
}
}
 
-   page_start = page_offset(page);
-   page_end = page_start + PAGE_SIZE - 1;
-
 

[PATCH V21 16/19] Btrfs: subpage-blocksize: btrfs_clone: Flush dirty blocks of a page that do not map the clone range

2016-10-02 Thread Chandan Rajendra
After cloning the required extents, we truncate all the pages that map
the file range being cloned. In the subpage-blocksize scenario, we could
have dirty blocks before and/or after the clone range in the
leading/trailing pages. Truncating these pages would lead to data
loss. Hence this commit forces such dirty blocks to be flushed to disk
before performing the clone operation.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cf13029..0fdc0a0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3914,6 +3914,7 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
int ret;
u64 len = olen;
u64 bs = root->fs_info->sb->s_blocksize;
+   u64 dest_end;
int same_inode = src == inode;
 
/*
@@ -3974,6 +3975,21 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
goto out_unlock;
}
 
+   if ((round_down(destoff, PAGE_SIZE) < inode->i_size) &&
+   !IS_ALIGNED(destoff, PAGE_SIZE)) {
+   ret = filemap_write_and_wait_range(inode->i_mapping,
+   round_down(destoff, PAGE_SIZE),
+   destoff - 1);
+   }
+
+   dest_end = destoff + len - 1;
+   if ((dest_end < inode->i_size) &&
+   !IS_ALIGNED(dest_end + 1, PAGE_SIZE)) {
+   ret = filemap_write_and_wait_range(inode->i_mapping,
+   dest_end + 1,
+   round_up(dest_end, PAGE_SIZE));
+   }
+
if (destoff > inode->i_size) {
ret = btrfs_cont_expand(inode, inode->i_size, destoff);
if (ret)
-- 
2.5.5
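
To see which byte ranges end up being flushed, here is a small user-space
sketch of the two range computations above. 'struct flush_range',
leading_flush() and trailing_flush() are made-up names, page_size is assumed
to be a power of two, and the end of the trailing range is computed exactly
as the patch writes it (round_up(dest_end, PAGE_SIZE)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flush_range {
	bool needed;
	uint64_t start;
	uint64_t end;
};

static uint64_t rdown(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t rup(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/* Dirty blocks in front of the clone range, inside the leading page. */
static struct flush_range leading_flush(uint64_t destoff, uint64_t isize,
					uint64_t page_size)
{
	struct flush_range r = { false, 0, 0 };

	if (rdown(destoff, page_size) < isize && (destoff % page_size) != 0) {
		r.needed = true;
		r.start = rdown(destoff, page_size);
		r.end = destoff - 1;
	}
	return r;
}

/* Dirty blocks behind the clone range, inside the trailing page. */
static struct flush_range trailing_flush(uint64_t destoff, uint64_t len,
					 uint64_t isize, uint64_t page_size)
{
	struct flush_range r = { false, 0, 0 };
	uint64_t dest_end = destoff + len - 1;

	if (dest_end < isize && ((dest_end + 1) % page_size) != 0) {
		r.needed = true;
		r.start = dest_end + 1;
		r.end = rup(dest_end, page_size);	/* as in the patch */
	}
	return r;
}
```

Cloning 4k at offset 8k into a 64k file on a 64k page machine flushes
[0, 8191] in front of the range and [12288, 65536] behind it. A
page-aligned destination offset needs no leading flush.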



[PATCH V21 13/19] Btrfs: subpage-blocksize: btrfs_punch_hole: Fix uptodate blocks check

2016-10-02 Thread Chandan Rajendra
In the subpage-blocksize case, the file blocks to be punched may map only
part of a page. For file blocks inside such pages, we need to check for
the presence of the BLK_STATE_UPTODATE flag.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c | 89 -
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 54602e6..6490e56 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2350,6 +2350,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
struct btrfs_path *path;
struct btrfs_block_rsv *rsv;
struct btrfs_trans_handle *trans;
+   struct address_space *mapping = inode->i_mapping;
+   pgoff_t start_index, end_index;
u64 lockstart;
u64 lockend;
u64 tail_start;
@@ -2362,6 +2364,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
int err = 0;
unsigned int rsv_count;
bool same_block;
+   bool same_page;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
bool truncated_block = false;
@@ -2458,11 +2461,45 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
goto out_only_mutex;
}
 
+   start_index = lockstart >> PAGE_SHIFT;
+   end_index = lockend >> PAGE_SHIFT;
+
+   same_page = lockstart >> PAGE_SHIFT
+   == lockend >> PAGE_SHIFT;
+
while (1) {
struct btrfs_ordered_extent *ordered;
+   struct page *start_page = NULL;
+   struct page *end_page = NULL;
+   u64 nr_pages;
+   int start_page_blks_uptodate;
+   int end_page_blks_uptodate;
 
truncate_pagecache_range(inode, lockstart, lockend);
 
+   if (lockstart & (PAGE_SIZE - 1)) {
+   start_page = find_or_create_page(mapping, start_index,
+   GFP_NOFS);
+   if (!start_page) {
+   inode_unlock(inode);
+   return -ENOMEM;
+   }
+   }
+
+   if (!same_page && ((lockend + 1) & (PAGE_SIZE - 1))) {
+   end_page = find_or_create_page(mapping, end_index,
+   GFP_NOFS);
+   if (!end_page) {
+   if (start_page) {
+   unlock_page(start_page);
+   put_page(start_page);
+   }
+   inode_unlock(inode);
+   return -ENOMEM;
+   }
+   }
+
+
lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 &cached_state);
ordered = btrfs_lookup_first_ordered_extent(inode, lockend);
@@ -2472,18 +2509,68 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 * and nobody raced in and read a page in this range, if we did
 * we need to try again.
 */
+   nr_pages = round_up(lockend, PAGE_SIZE)
+   - round_down(lockstart, PAGE_SIZE);
+   nr_pages >>= PAGE_SHIFT;
+
+   start_page_blks_uptodate = 0;
+   end_page_blks_uptodate = 0;
+   if (root->sectorsize < PAGE_SIZE) {
+   u64 page_end;
+
+   page_end = round_down(lockstart, PAGE_SIZE)
+   + PAGE_SIZE - 1;
+   page_end = min(page_end, lockend);
+   if (start_page
+   && PagePrivate(start_page)
+   && test_page_blks_state(start_page, 1 << BLK_STATE_UPTODATE,
+   lockstart, page_end, 0))
+   start_page_blks_uptodate = 1;
+   if (end_page
+   && PagePrivate(end_page)
+   && test_page_blks_state(end_page, 1 << BLK_STATE_UPTODATE,
+   page_offset(end_page), lockend, 0))
+   end_page_blks_uptodate = 1;
+   } else {
+   if (start_page && PagePrivate(start_page)
+   && PageUptodate(start_page))
+   start_page_blks_uptodate = 1;
+   if (end_page && PagePrivate(end_page)
+   && PageUptodate(end_page))
+

[PATCH V21 12/19] Btrfs: subpage-blocksize: Explicitly track I/O status of blocks of an ordered extent.

2016-10-02 Thread Chandan Rajendra
In the subpage-blocksize scenario, a page can have more than one block. So
in addition to the PagePrivate2 flag, we have to track the I/O status of
each block of a page to reliably mark the ordered extent as complete.
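
The idea can be sketched in user space as a per-extent bitmap plus a count
of outstanding blocks, where each block is accounted for at most once. This
is only an illustration: 'struct ordered_sketch' and mark_blocks_done() are
invented names, the bitmap is limited to 64 blocks, and the plain bit
operations here are not atomic, unlike the test_and_set_bit() used in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an ordered extent with at most 64 blocks. */
struct ordered_sketch {
	uint64_t blocks_done;	/* bit n set => I/O for block n has completed */
	unsigned pending;	/* number of blocks whose I/O is still pending */
};

/*
 * Mark blocks [blk, blk + nr) as complete. A block that was already marked
 * is skipped, so it cannot be subtracted from 'pending' twice. Returns true
 * when this call completed the last pending block. Callers must keep
 * blk + nr <= 64.
 */
static bool mark_blocks_done(struct ordered_sketch *o, unsigned blk,
			     unsigned nr)
{
	bool finished = false;

	while (nr--) {
		uint64_t bit = UINT64_C(1) << blk++;

		if (o->blocks_done & bit)
			continue;	/* already accounted for */

		o->blocks_done |= bit;
		if (--o->pending == 0)
			finished = true;
	}
	return finished;
}
```

Overlapping completions (for example blocks 0-1 followed by blocks 1-2) only
count block 1 once, so the extent is reported complete exactly when its last
block finishes.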

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c|  19 +--
 fs/btrfs/extent_io.h|   5 +-
 fs/btrfs/inode.c| 363 ++--
 fs/btrfs/ordered-data.c |  19 +++
 fs/btrfs/ordered-data.h |   4 +
 5 files changed, 294 insertions(+), 116 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0832797..df6172c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4483,11 +4483,10 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  * to drop the page.
  */
 static int try_release_extent_state(struct extent_map_tree *map,
-   struct extent_io_tree *tree,
-   struct page *page, gfp_t mask)
+   struct extent_io_tree *tree,
+   struct page *page, u64 start, u64 end,
+   gfp_t mask)
 {
-   u64 start = page_offset(page);
-   u64 end = start + PAGE_SIZE - 1;
int ret = 1;
 
if (test_range_bit(tree, start, end,
@@ -4521,12 +4520,12 @@ static int try_release_extent_state(struct extent_map_tree *map,
  * map records are removed
  */
 int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask)
+   struct extent_io_tree *tree, struct page *page,
+   u64 start, u64 end, gfp_t mask)
 {
struct extent_map *em;
-   u64 start = page_offset(page);
-   u64 end = start + PAGE_SIZE - 1;
+   u64 orig_start = start;
+   u64 orig_end = end;
 
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
@@ -4560,7 +4559,9 @@ int try_release_extent_mapping(struct extent_map_tree *map,
free_extent_map(em);
}
}
-   return try_release_extent_state(map, tree, page, mask);
+   return try_release_extent_state(map, tree, page,
+   orig_start, orig_end,
+   mask);
 }
 
 /*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index ad5b000..491f9b4 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -276,8 +276,9 @@ typedef struct extent_map *(get_extent_t)(struct inode *inode,
 
 void extent_io_tree_init(struct extent_io_tree *tree, void *private_data);
 int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask);
+   struct extent_io_tree *tree, struct page *page,
+   u64 start, u64 end,
+   gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 struct extent_state **cached);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cf55622..03b9425 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3077,56 +3077,119 @@ static void finish_ordered_fn(struct btrfs_work *work)
btrfs_finish_ordered_io(ordered_extent);
 }
 
-static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
-   struct extent_state *state, int uptodate)
+static void mark_blks_io_complete(struct btrfs_ordered_extent *ordered,
+   u64 blk, u64 nr_blks, int uptodate)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = ordered->inode;
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_ordered_extent *ordered_extent = NULL;
struct btrfs_workqueue *wq;
btrfs_work_func_t func;
-   u64 ordered_start, ordered_end;
int done;
 
-   trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
+   while (nr_blks--) {
+   if (test_and_set_bit(blk, ordered->blocks_done)) {
+   blk++;
+   continue;
+   }
 
-   ClearPagePrivate2(page);
-loop:
-   ordered_extent = btrfs_lookup_ordered_range(inode, start,
-   end - start + 1);
-   if (!ordered_extent)
-   goto out;
+   done = btrfs_dec_test_ordered_pending(inode, &ordered,
+   ordered->file_offset
+   + (blk << inode->i_blkbits),
+   root->sectorsize,
+   uptodate);

[PATCH V21 06/19] Btrfs: subpage-blocksize: Fix whole page write

2016-10-02 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles writing data to files.

Also, when setting EXTENT_DELALLOC, we no longer set the EXTENT_UPTODATE
bit on the extent_io_tree since uptodate status is being tracked either by
the bitmap pointed to by page->private or by the PG_uptodate flag.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c  | 114 +++---
 fs/btrfs/file.c   |  16 +++
 fs/btrfs/inode.c  |  69 --
 fs/btrfs/relocation.c |   3 ++
 4 files changed, 137 insertions(+), 65 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b3885cc..6cac61f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2573,36 +2573,41 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
  */
 static void end_bio_extent_writepage(struct bio *bio)
 {
+   struct btrfs_page_private *pg_private;
struct bio_vec *bvec;
+   unsigned long flags;
u64 start;
u64 end;
+   int clear_writeback;
int i;
 
bio_for_each_segment_all(bvec, bio, i) {
struct page *page = bvec->bv_page;
+   struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
 
-   /* We always issue full-page reads, but if some block
-* in a page fails to read, blk_update_request() will
-* advance bv_offset and adjust bv_len to compensate.
-* Print a warning for nonzero offsets, and an error
-* if they don't add up to a full page.  */
-   if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
-   if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
-   btrfs_err(BTRFS_I(page->mapping->host)->root->fs_info,
-  "partial page write in btrfs with offset %u and length %u",
-   bvec->bv_offset, bvec->bv_len);
-   else
-   btrfs_info(BTRFS_I(page->mapping->host)->root->fs_info,
-  "incomplete page write in btrfs with offset %u and "
-  "length %u",
-   bvec->bv_offset, bvec->bv_len);
-   }
+   pg_private = NULL;
+   flags = 0;
+   clear_writeback = 1;
+
+   start = page_offset(page) + bvec->bv_offset;
+   end = start + bvec->bv_len - 1;
 
-   start = page_offset(page);
-   end = start + bvec->bv_offset + bvec->bv_len - 1;
+   if (root->sectorsize < PAGE_SIZE) {
+   pg_private = (struct btrfs_page_private *)page->private;
+   spin_lock_irqsave(&pg_private->io_lock, flags);
+   }
 
end_extent_writepage(page, bio->bi_error, start, end);
-   end_page_writeback(page);
+
+   if (root->sectorsize < PAGE_SIZE) {
+   clear_page_blks_state(page, 1 << BLK_STATE_IO, start,
+   end);
+   clear_writeback = page_io_complete(page);
+   spin_unlock_irqrestore(&pg_private->io_lock, flags);
+   }
+
+   if (clear_writeback)
+   end_page_writeback(page);
}
 
bio_put(bio);
@@ -3465,7 +3470,6 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
u64 block_start;
u64 iosize;
sector_t sector;
-   struct extent_state *cached_state = NULL;
struct extent_map *em;
struct block_device *bdev;
size_t pg_offset = 0;
@@ -3517,20 +3521,29 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
 page_end, NULL, 1);
break;
}
-   em = epd->get_extent(inode, page, pg_offset, cur,
-end - cur + 1, 1);
+
+   if (blocksize < PAGE_SIZE
+   && !test_page_blks_state(page, BLK_STATE_DIRTY, cur,
+   cur + blocksize - 1, 1)) {
+   cur += blocksize;
+   continue;
+   }
+
+   pg_offset = cur & (PAGE_SIZE - 1);
+
+   em = epd->get_extent(inode, page, pg_offset, cur, blocksize, 1);
if (IS_ERR_OR_NULL(em)) {
SetPageError(page);
ret = PTR_ERR_OR_ZERO(em);
break;
}
 
-   extent_offset = cur - em

[PATCH V21 08/19] Btrfs: subpage-blocksize: Execute sanity tests on all possible block sizes

2016-10-02 Thread Chandan Rajendra
This commit executes sanity tests for all valid sectorsize/nodesize
combinations.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/tests/btrfs-tests.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index dca90d6..0f2afb6 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -261,13 +261,19 @@ int btrfs_run_sanity_tests(void)
int ret, i;
u32 sectorsize, nodesize;
u32 test_sectorsize[] = {
-   PAGE_SIZE,
+   4096,
+   8192,
+   16384,
+   32768,
+   65536,
};
ret = btrfs_init_test_fs();
if (ret)
return ret;
for (i = 0; i < ARRAY_SIZE(test_sectorsize); i++) {
sectorsize = test_sectorsize[i];
+   if (sectorsize > PAGE_SIZE)
+   break;
for (nodesize = sectorsize;
 nodesize <= BTRFS_MAX_METADATA_BLOCKSIZE;
 nodesize <<= 1) {
-- 
2.5.5
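
As a quick way to see how many combinations the loop visits on a given
machine, the sketch below counts them in user space. The function name
count_test_combinations() is made up, the page size is passed in as a
parameter, and the value 65536 for BTRFS_MAX_METADATA_BLOCKSIZE is an
assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of BTRFS_MAX_METADATA_BLOCKSIZE. */
#define MAX_METADATA_BLOCKSIZE 65536u

/* Count the sectorsize/nodesize pairs the sanity test loop would run. */
static unsigned count_test_combinations(uint32_t page_size)
{
	static const uint32_t test_sectorsize[] = {
		4096, 8192, 16384, 32768, 65536,
	};
	unsigned combos = 0;
	size_t i;

	for (i = 0; i < sizeof(test_sectorsize) / sizeof(test_sectorsize[0]); i++) {
		uint32_t sectorsize = test_sectorsize[i];
		uint32_t nodesize;

		/* Block sizes larger than the page size are not tested. */
		if (sectorsize > page_size)
			break;

		for (nodesize = sectorsize;
		     nodesize <= MAX_METADATA_BLOCKSIZE;
		     nodesize <<= 1)
			combos++;
	}
	return combos;
}
```

Under these assumptions a 4k page machine runs 5 combinations (one
sectorsize, five nodesizes) and a 64k page machine runs 15.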



[PATCH V21 05/19] Btrfs: subpage-blocksize: Fix whole page read.

2016-10-02 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles reading data from files.

To track the status of individual blocks of a page, this patch makes use
of a bitmap pointed to by the newly introduced per-page 'struct
btrfs_page_private'.

The per-page btrfs_page_private->io_lock plays the same role as
BH_Uptodate_Lock (see end_buffer_async_read()) i.e. without the io_lock
we may end up in the following situation,

NOTE: Assume 64k page size and 4k block size. Also assume that the first
12 blocks of the page form one contiguous run while the remaining 4 blocks
form another. When reading the page we end up submitting two "logical
address space" bios. So the end_bio_extent_readpage function is invoked
twice, once for each bio.

|-------------------------+-------------------------+-------------|
| Task A                  | Task B                  | Task C      |
|-------------------------+-------------------------+-------------|
| end_bio_extent_readpage |                         |             |
| process block 0         |                         |             |
| - clear BLK_STATE_IO    |                         |             |
| - page_read_complete    |                         |             |
| process block 1         |                         |             |
|                         |                         |             |
|                         |                         |             |
|                         | end_bio_extent_readpage |             |
|                         | process block 0         |             |
|                         | - clear BLK_STATE_IO    |             |
|                         | - page_read_complete    |             |
|                         | process block 1         |             |
|                         |                         |             |
| process block 11        | process block 3         |             |
| - clear BLK_STATE_IO    | - clear BLK_STATE_IO    |             |
| - page_read_complete    | - page_read_complete    |             |
|   - returns true        |   - returns true        |             |
|   - unlock_page()       |                         |             |
|                         |                         | lock_page() |
|                         |   - unlock_page()       |             |
|-------------------------+-------------------------+-------------|

We end up incorrectly unlocking the page twice and "Task C" ends up
working on an unlocked page. So private->io_lock makes sure that only
one of the tasks gets "true" as the return value when page_io_complete()
is invoked. As an optimization the patch gets the io_lock only when the
last block of the bio_vec is being processed.
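
The locking rule can be sketched in user space as follows. This is a
simplified model, not the patch's implementation: 'struct page_blocks',
blocks_init() and finish_blocks() are invented names, a C11 atomic_flag spin
loop stands in for the kernel spinlock, and the bitmap is limited to 32
blocks. The point is that clearing the I/O bits and testing for "no blocks
left under I/O" happen under the same lock, so exactly one caller can
observe completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page state: one "under I/O" bit per block. */
struct page_blocks {
	atomic_flag io_lock;	/* stands in for btrfs_page_private->io_lock */
	uint32_t io_bits;	/* bit n set => block n is still under I/O */
};

/* Build a mask with 'nr' low bits set (nr <= 32). */
static uint32_t low_bits(unsigned nr)
{
	return nr >= 32 ? UINT32_MAX : (UINT32_C(1) << nr) - 1;
}

static void blocks_init(struct page_blocks *pb, unsigned nr_blocks)
{
	atomic_flag_clear(&pb->io_lock);
	pb->io_bits = low_bits(nr_blocks);
}

/*
 * Clear the I/O bits for blocks [first, first + nr) and report whether the
 * page has no blocks left under I/O. Both steps run under io_lock, so two
 * racing callers cannot both see an empty bitmap.
 */
static bool finish_blocks(struct page_blocks *pb, unsigned first, unsigned nr)
{
	uint32_t mask = low_bits(nr) << first;
	bool complete;

	while (atomic_flag_test_and_set_explicit(&pb->io_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	pb->io_bits &= ~mask;
	complete = (pb->io_bits == 0);

	atomic_flag_clear_explicit(&pb->io_lock, memory_order_release);
	return complete;
}
```

In the 64k page / 4k block example, the caller finishing blocks 0-11 gets
false and the caller finishing blocks 12-15 gets true (or the other way
round if the order is reversed), so the page is unlocked only once.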

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 299 +--
 fs/btrfs/extent_io.h |  76 -
 fs/btrfs/inode.c |  13 +--
 3 files changed, 320 insertions(+), 68 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 522c943..b3885cc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -23,6 +23,7 @@
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
+static struct kmem_cache *page_private_cache;
 static struct bio_set *btrfs_bioset;
 
 static inline bool extent_state_in_tree(const struct extent_state *state)
@@ -163,10 +164,16 @@ int __init extent_io_init(void)
if (!extent_buffer_cache)
goto free_state_cache;
 
+   page_private_cache = kmem_cache_create("btrfs_page_private",
+   sizeof(struct btrfs_page_private), 0,
+   SLAB_MEM_SPREAD, NULL);
+   if (!page_private_cache)
+   goto free_buffer_cache;
+
btrfs_bioset = bioset_create(BIO_POOL_SIZE,
 offsetof(struct btrfs_io_bio, bio));
if (!btrfs_bioset)
-   goto free_buffer_cache;
+   goto free_page_private_cache;
 
if (bioset_integrity_create(btrfs_bioset, BIO_POOL_SIZE))
goto free_bioset;
@@ -177,6 +184,10 @@ free_bioset:
bioset_free(btrfs_bioset);
btrfs_bioset = NULL;
 
+free_page_private_cache:
+   kmem_cache_destroy(page_private_cache);
+   page_private_cache = NULL;
+
 free_buffer_cache:
kmem_cache_destroy(extent_buffer_cache);
extent_buffer_cache = NULL;
@@ -1311,6 +1322,96 @@ int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
  changeset);
 }
 
+static int modify_page_blks_state(struct page *page,
+   unsigned long blk_states,
+   u64 start, u64 end, int set)
+{
+   struct inode *inode = page->mapping->host;
+   unsigned long *bitmap;
+   unsigned long first_state;
+   unsigned long state;
+   u64 nr_blks;
+

[PATCH V21 14/19] Btrfs: subpage-blocksize: Fix file defragmentation code

2016-10-02 Thread Chandan Rajendra
This commit gets file defragmentation code to work in subpage-blocksize
scenario. It does this by keeping track of page offsets that mark block
boundaries and passing them as arguments to the functions that implement
the defragmentation logic.
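
To illustrate the bookkeeping, the sketch below reproduces in user space the
block and page arithmetic that cluster_pages_for_defrag() performs in the
diff: converting a (page index, page offset) pair into a starting block,
clamping the block count to the file size, and deriving how many pages the
blocks span. 'struct defrag_span' and compute_defrag_span() are made-up
names, and page_shift/blkbits are passed in as parameters.

```c
#include <assert.h>
#include <stdint.h>

struct defrag_span {
	uint64_t start_blk;	/* first block to defragment */
	uint64_t blk_cnt;	/* number of blocks, clamped to the file size */
	uint64_t page_cnt;	/* number of pages those blocks touch */
};

static struct defrag_span compute_defrag_span(uint64_t isize,
					      uint64_t start_index,
					      uint64_t pg_offset,
					      uint64_t num_blks,
					      unsigned page_shift,
					      unsigned blkbits)
{
	struct defrag_span s = { 0, 0, 0 };
	uint64_t page_size = UINT64_C(1) << page_shift;
	uint64_t end_blk;
	uint64_t avail;

	if (isize == 0)
		return s;

	s.start_blk = ((start_index << page_shift) + pg_offset) >> blkbits;
	end_blk = (isize - 1) >> blkbits;
	if (s.start_blk > end_blk)
		return s;	/* nothing to do: counts stay zero */

	avail = end_blk - s.start_blk + 1;
	s.blk_cnt = num_blks < avail ? num_blks : avail;

	/* DIV_ROUND_UP(pg_offset + blk_cnt * blocksize, PAGE_SIZE) */
	s.page_cnt = (pg_offset + (s.blk_cnt << blkbits) + page_size - 1)
		     / page_size;
	return s;
}
```

With 64k pages and 4k blocks, four blocks starting at page offset 61440
begin at block 15 and straddle two pages, which a page-granular computation
would not capture.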

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 198 ++-
 1 file changed, 136 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a222bad..4077fc1 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -902,12 +902,13 @@ out_unlock:
 static int check_defrag_in_cache(struct inode *inode, u64 offset, u32 thresh)
 {
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
struct extent_map *em = NULL;
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
u64 end;
 
read_lock(&em_tree->lock);
-   em = lookup_extent_mapping(em_tree, offset, PAGE_SIZE);
+   em = lookup_extent_mapping(em_tree, offset, root->sectorsize);
read_unlock(&em_tree->lock);
 
if (em) {
@@ -997,7 +998,7 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
struct extent_map *em;
-   u64 len = PAGE_SIZE;
+   u64 len = BTRFS_I(inode)->root->sectorsize;
 
/*
 * hopefully we have this extent in the tree already, try without
@@ -1116,37 +1117,47 @@ out:
  * before calling this.
  */
 static int cluster_pages_for_defrag(struct inode *inode,
-   struct page **pages,
-   unsigned long start_index,
-   unsigned long num_pages)
+   struct page **pages,
+   unsigned long start_index,
+   size_t pg_offset,
+   unsigned long num_blks)
 {
-   unsigned long file_end;
u64 isize = i_size_read(inode);
+   u64 start_blk;
+   u64 end_blk;
u64 page_start;
u64 page_end;
u64 page_cnt;
+   u64 blk_cnt;
int ret;
int i;
int i_done;
struct btrfs_ordered_extent *ordered;
struct extent_state *cached_state = NULL;
struct extent_io_tree *tree;
+   struct btrfs_root *root;
gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
-   file_end = (isize - 1) >> PAGE_SHIFT;
-   if (!isize || start_index > file_end)
+   root = BTRFS_I(inode)->root;
+   start_blk = (start_index << PAGE_SHIFT) + pg_offset;
+   start_blk >>= inode->i_blkbits;
+   end_blk = (isize - 1) >> inode->i_blkbits;
+   if (!isize || start_blk > end_blk)
return 0;
 
-   page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
+   blk_cnt = min_t(u64, (u64)num_blks, (u64)end_blk - start_blk + 1);
 
ret = btrfs_delalloc_reserve_space(inode,
-   start_index << PAGE_SHIFT,
-   page_cnt << PAGE_SHIFT);
+   start_blk << inode->i_blkbits,
+   blk_cnt << inode->i_blkbits);
if (ret)
return ret;
i_done = 0;
tree = &BTRFS_I(inode)->io_tree;
 
+   page_cnt = DIV_ROUND_UP(pg_offset + (blk_cnt << inode->i_blkbits),
+   PAGE_SIZE);
+
/* step one, lock all the pages */
for (i = 0; i < page_cnt; i++) {
struct page *page;
@@ -1157,12 +1168,22 @@ again:
break;
 
page_start = page_offset(page);
-   page_end = page_start + PAGE_SIZE - 1;
+
+   if (i == 0)
+   page_start += pg_offset;
+
+   if (i == page_cnt - 1) {
+   page_end = (start_index << PAGE_SHIFT) + pg_offset;
+   page_end += (blk_cnt << inode->i_blkbits) - 1;
+   } else {
+   page_end = page_offset(page) + PAGE_SIZE - 1;
+   }
+
while (1) {
lock_extent_bits(tree, page_start, page_end,
 &cached_state);
-   ordered = btrfs_lookup_ordered_extent(inode,
- page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start,
+   page_end - page_start + 1);
unlock_extent_cached(tree, page_start, page_end,
 

[PATCH V21 04/19] Btrfs: Remove extent_io_tree's track_uptodate member

2016-10-02 Thread Chandan Rajendra
We now track block uptodate status using a page's PG_uptodate
flag. Hence this commit removes the now unused
extent_io_tree->track_uptodate member.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c   | 1 -
 fs/btrfs/extent_io.h | 1 -
 fs/btrfs/inode.c | 2 --
 3 files changed, 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 03ac601..9ff48a7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2085,7 +2085,6 @@ int btrfs_init_eb_info(struct btrfs_fs_info *fs_info)
 
eb_info->fs_info = fs_info;
extent_io_tree_init(&eb_info->io_tree, eb_info);
-   eb_info->io_tree.track_uptodate = 0;
eb_info->io_tree.ops = &btree_extent_io_ops;
extent_io_tree_init(&eb_info->io_failure_tree, eb_info);
INIT_RADIX_TREE(&eb_info->buffer_radix, GFP_ATOMIC);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 922f4c1..9aa22f9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -102,7 +102,6 @@ struct extent_io_tree {
struct rb_root state;
void *private_data;
u64 dirty_bytes;
-   int track_uptodate;
spinlock_t lock;
const struct extent_io_ops *ops;
 };
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 652d01d..ac4a7c0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9306,8 +9306,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
extent_map_tree_init(&ei->extent_tree);
extent_io_tree_init(&ei->io_tree, inode);
extent_io_tree_init(&ei->io_failure_tree, inode);
-   ei->io_tree.track_uptodate = 1;
-   ei->io_failure_tree.track_uptodate = 1;
atomic_set(&ei->sync_writers, 0);
mutex_init(&ei->log_mutex);
mutex_init(&ei->delalloc_mutex);
-- 
2.5.5



[PATCH V21 11/19] Btrfs: subpage-blocksize: Deal with partial ordered extent allocations.

2016-10-02 Thread Chandan Rajendra
In the subpage-blocksize scenario, extent allocations for only some of the
dirty blocks of a page can succeed, while allocation for the rest of the
blocks can fail. This patch allows I/O against such pages to be
submitted.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 27 ++-
 fs/btrfs/inode.c | 11 ++-
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9af8237..0832797 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1843,17 +1843,23 @@ void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end,
if (page_ops & PAGE_SET_PRIVATE2)
SetPagePrivate2(pages[i]);
 
+   if (page_ops & PAGE_SET_ERROR)
+   SetPageError(pages[i]);
+
if (pages[i] == locked_page) {
put_page(pages[i]);
continue;
}
-   if (page_ops & PAGE_CLEAR_DIRTY)
+
+   if ((page_ops & PAGE_CLEAR_DIRTY)
+   && !PagePrivate2(pages[i]))
clear_page_dirty_for_io(pages[i]);
-   if (page_ops & PAGE_SET_WRITEBACK)
+   if ((page_ops & PAGE_SET_WRITEBACK)
+   && !PagePrivate2(pages[i]))
set_page_writeback(pages[i]);
-   if (page_ops & PAGE_SET_ERROR)
-   SetPageError(pages[i]);
-   if (page_ops & PAGE_END_WRITEBACK)
+
+   if ((page_ops & PAGE_END_WRITEBACK)
+   && !PagePrivate2(pages[i]))
end_page_writeback(pages[i]);
 
if (page_ops & PAGE_UNLOCK) {
@@ -2554,7 +2560,7 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
uptodate = 0;
}
 
-   if (!uptodate) {
+   if (!uptodate || PageError(page)) {
ClearPageUptodate(page);
SetPageError(page);
ret = ret < 0 ? ret : -EIO;
@@ -3401,7 +3407,6 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
   nr_written);
/* File system has been set read-only */
if (ret) {
-   SetPageError(page);
/* fill_delalloc should be return < 0 for error
 * but just in case, we use > 0 here meaning the
 * IO is started, so we don't want to return > 0
@@ -3618,7 +3623,6 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
struct inode *inode = page->mapping->host;
struct extent_page_data *epd = data;
u64 start = page_offset(page);
-   u64 page_end = start + PAGE_SIZE - 1;
int ret;
int nr = 0;
size_t pg_offset = 0;
@@ -3661,7 +3665,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
ret = writepage_delalloc(inode, page, wbc, epd, start, &nr_written);
if (ret == 1)
goto done_unlocked;
-   if (ret)
+   if (ret && !PagePrivate2(page))
goto done;
 
ret = __extent_writepage_io(inode, page, wbc, epd,
@@ -3675,10 +3679,7 @@ done:
set_page_writeback(page);
end_page_writeback(page);
}
-   if (PageError(page)) {
-   ret = ret < 0 ? ret : -EIO;
-   end_extent_writepage(page, ret, start, page_end);
-   }
+
unlock_page(page);
return ret;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 42f844b..cf55622 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -951,6 +951,7 @@ static noinline int cow_file_range(struct inode *inode,
struct btrfs_key ins;
struct extent_map *em;
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+   struct btrfs_ordered_extent *ordered;
unsigned long page_ops, extent_ops;
int ret = 0;
 
@@ -1049,7 +1050,7 @@ static noinline int cow_file_range(struct inode *inode,
ret = btrfs_reloc_clone_csums(inode, start,
  cur_alloc_size);
if (ret)
-   goto out_drop_extent_cache;
+   goto out_remove_ordered_extent;
}
 
btrfs_dec_block_group_reservations(root->fs_info, ins.objectid);
@@ -1078,6 +1079,14 @@ static noinline int cow_file_range(struct inode *inode,
 out:
return ret;
 

[PATCH V21 09/19] Btrfs: subpage-blocksize: Compute free space tree BITMAP_RANGE based on sectorsize

2016-10-02 Thread Chandan Rajendra
The default bitmap length computation in free space tree sanity tests
assumes PAGE_SIZE as the sectorsize. This commit fixes this by using a
variable sectorsize to calculate BITMAP_RANGE.
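
For readers following along, here is a minimal userspace sketch (not part of
the patch) of what the parameterised macro computes. SKETCH_BITMAP_BITS and
sketch_bitmap_range() are hypothetical names, and the value 2048 is an
assumption about BTRFS_FREE_SPACE_BITMAP_BITS that should be checked against
the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of BTRFS_FREE_SPACE_BITMAP_BITS; verify against the tree. */
#define SKETCH_BITMAP_BITS 2048ULL

/* Same shape as the new BITMAP_RANGE(sectorsize) macro in the diff. */
#define SKETCH_BITMAP_RANGE(sectorsize) (SKETCH_BITMAP_BITS * (sectorsize))

/* Bytes of block group space described by one free space bitmap item. */
static uint64_t sketch_bitmap_range(uint32_t sectorsize)
{
	/* sectorsize is expected to be a non-zero power of two */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return SKETCH_BITMAP_RANGE((uint64_t)sectorsize);
}
```

Under these assumptions a 4K sectorsize gives an 8 MiB range and a 64K
sectorsize gives 128 MiB, so a constant derived from PAGE_SIZE cannot be
right once sectorsize and PAGE_SIZE differ.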

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/tests/free-space-tree-tests.c | 79 ++
 1 file changed, 43 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/tests/free-space-tree-tests.c b/fs/btrfs/tests/free-space-tree-tests.c
index 3bf5df1..11d9fb0 100644
--- a/fs/btrfs/tests/free-space-tree-tests.c
+++ b/fs/btrfs/tests/free-space-tree-tests.c
@@ -31,7 +31,7 @@ struct free_space_extent {
  * The test cases align their operations to this in order to hit some of the
  * edge cases in the bitmap code.
  */
-#define BITMAP_RANGE (BTRFS_FREE_SPACE_BITMAP_BITS * PAGE_SIZE)
+#define BITMAP_RANGE(sectorsize) (BTRFS_FREE_SPACE_BITMAP_BITS * (sectorsize))
 
 static int __check_free_space_extents(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info,
@@ -203,14 +203,15 @@ static int test_remove_beginning(struct btrfs_trans_handle *trans,
 struct btrfs_block_group_cache *cache,
 struct btrfs_path *path)
 {
+   u64 bitmap_range = BITMAP_RANGE(fs_info->tree_root->sectorsize);
struct free_space_extent extents[] = {
-   {cache->key.objectid + BITMAP_RANGE,
-   cache->key.offset - BITMAP_RANGE},
+   {cache->key.objectid + bitmap_range,
+   cache->key.offset - bitmap_range},
};
int ret;
 
ret = __remove_from_free_space_tree(trans, fs_info, cache, path,
-   cache->key.objectid, BITMAP_RANGE);
+   cache->key.objectid, bitmap_range);
if (ret) {
test_msg("Could not remove free space\n");
return ret;
@@ -226,15 +227,16 @@ static int test_remove_end(struct btrfs_trans_handle *trans,
   struct btrfs_block_group_cache *cache,
   struct btrfs_path *path)
 {
+   u64 bitmap_range = BITMAP_RANGE(fs_info->tree_root->sectorsize);
struct free_space_extent extents[] = {
-   {cache->key.objectid, cache->key.offset - BITMAP_RANGE},
+   {cache->key.objectid, cache->key.offset - bitmap_range},
};
int ret;
 
ret = __remove_from_free_space_tree(trans, fs_info, cache, path,
cache->key.objectid +
-   cache->key.offset - BITMAP_RANGE,
-   BITMAP_RANGE);
+   cache->key.offset - bitmap_range,
+   bitmap_range);
if (ret) {
test_msg("Could not remove free space\n");
return ret;
@@ -249,16 +251,17 @@ static int test_remove_middle(struct btrfs_trans_handle *trans,
  struct btrfs_block_group_cache *cache,
  struct btrfs_path *path)
 {
+   u64 bitmap_range = BITMAP_RANGE(fs_info->tree_root->sectorsize);
struct free_space_extent extents[] = {
-   {cache->key.objectid, BITMAP_RANGE},
-   {cache->key.objectid + 2 * BITMAP_RANGE,
-   cache->key.offset - 2 * BITMAP_RANGE},
+   {cache->key.objectid, bitmap_range},
+   {cache->key.objectid + 2 * bitmap_range,
+   cache->key.offset - 2 * bitmap_range},
};
int ret;
 
ret = __remove_from_free_space_tree(trans, fs_info, cache, path,
-   cache->key.objectid + BITMAP_RANGE,
-   BITMAP_RANGE);
+   cache->key.objectid + bitmap_range,
+   bitmap_range);
if (ret) {
test_msg("Could not remove free space\n");
return ret;
@@ -273,8 +276,9 @@ static int test_merge_left(struct btrfs_trans_handle *trans,
   struct btrfs_block_group_cache *cache,
   struct btrfs_path *path)
 {
+   u64 bitmap_range = BITMAP_RANGE(fs_info->tree_root->sectorsize);
struct free_space_extent extents[] = {
-   {cache->key.objectid, 2 * BITMAP_RANGE},
+   {cache->key.objectid, 2 * bitmap_range},
};
int ret;
 
@@ -287,15 +291,15 @@ static int test_merge_left(struct btrfs_trans_handle *trans,
}
 
ret = __add_to_free_space_tree(trans, fs_info, cache, path,
-  cache->key.objectid, BITMAP_RANGE);
+   

[PATCH V21 10/19] Btrfs: subpage-blocksize: Allow mounting filesystems where sectorsize < PAGE_SIZE

2016-10-02 Thread Chandan Rajendra
This commit allows mounting filesystem instances with sectorsize smaller
than the PAGE_SIZE.

Since the code assumes that the super block is either equal to or larger
than sectorsize, this commit brings back the nodesize argument for
btrfs_find_create_tree_block() function. This change allows us to be
able to mount and use filesystems with 2048 bytes as the sectorsize.
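
As a rough illustration (not part of the patch), the relaxed superblock checks
from the disk-io.c hunk below can be sketched in userspace as follows. The
sketch_* names are hypothetical, and 65536 is an assumed value for
BTRFS_MAX_METADATA_BLOCKSIZE.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_SECTORSIZE		2048u	/* new lower bound in the diff */
#define SKETCH_MAX_METADATA_BLOCKSIZE	65536u	/* assumed kernel value */

static bool sketch_is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Mirrors the sectorsize test once the PAGE_SIZE-only restriction is gone. */
static bool sketch_sectorsize_valid(uint64_t sectorsize)
{
	return sketch_is_power_of_2(sectorsize) &&
	       sectorsize >= SKETCH_MIN_SECTORSIZE &&
	       sectorsize <= SKETCH_MAX_METADATA_BLOCKSIZE;
}

/* Mirrors the unchanged nodesize test: never smaller than sectorsize. */
static bool sketch_nodesize_valid(uint64_t nodesize, uint64_t sectorsize)
{
	return sketch_is_power_of_2(nodesize) &&
	       nodesize >= sectorsize &&
	       nodesize <= SKETCH_MAX_METADATA_BLOCKSIZE;
}
```

With these bounds a 2048-byte sectorsize passes, 1024 is rejected as too
small, and a nodesize below the sectorsize is still refused.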

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c | 21 -
 fs/btrfs/disk-io.h |  2 +-
 fs/btrfs/extent-tree.c |  4 ++--
 fs/btrfs/extent_io.c   |  3 +--
 fs/btrfs/extent_io.h   |  2 +-
 fs/btrfs/tree-log.c|  2 +-
 fs/btrfs/volumes.c | 10 +++---
 7 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5663481..2684438 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -936,7 +936,7 @@ void readahead_tree_block(struct btrfs_root *root, u64 bytenr)
 {
struct extent_buffer *buf = NULL;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return;
read_extent_buffer_pages(buf, WAIT_NONE, 0);
@@ -949,7 +949,7 @@ int reada_tree_block_flagged(struct btrfs_root *root, u64 bytenr,
struct extent_buffer *buf = NULL;
int ret;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return 0;
 
@@ -979,12 +979,12 @@ struct extent_buffer *btrfs_find_tree_block(struct btrfs_fs_info *fs_info,
 }
 
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
-u64 bytenr)
+u64 bytenr, u32 blocksize)
 {
if (btrfs_is_testing(root->fs_info))
return alloc_test_extent_buffer(root->fs_info->eb_info, bytenr,
-   root->nodesize);
-   return alloc_extent_buffer(root->fs_info, bytenr);
+   blocksize);
+   return alloc_extent_buffer(root->fs_info, bytenr, blocksize);
 }
 
 
@@ -1006,7 +1006,7 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
struct extent_buffer *buf = NULL;
int ret;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return buf;
 
@@ -3891,17 +3891,12 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
 * Check sectorsize and nodesize first, other check will need it.
 * Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
 */
-   if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
+   if (!is_power_of_2(sectorsize) || sectorsize < 2048 ||
sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
printk(KERN_ERR "BTRFS: invalid sectorsize %llu\n", sectorsize);
ret = -EINVAL;
}
-   /* Only PAGE SIZE is supported yet */
-   if (sectorsize != PAGE_SIZE) {
-   printk(KERN_ERR "BTRFS: sectorsize %llu not supported yet, only support %lu\n",
-   sectorsize, PAGE_SIZE);
-   ret = -EINVAL;
-   }
+
if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
printk(KERN_ERR "BTRFS: invalid nodesize %llu\n", nodesize);
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 591f078..5f6263e 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -50,7 +50,7 @@ void readahead_tree_block(struct btrfs_root *root, u64 bytenr);
 int reada_tree_block_flagged(struct btrfs_root *root, u64 bytenr,
 int mirror_num, struct extent_buffer **eb);
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
-  u64 bytenr);
+  u64 bytenr, u32 blocksize);
 void clean_tree_block(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, struct extent_buffer *buf);
 int open_ctree(struct super_block *sb,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 03da2f6..25fbfa2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8238,7 +8238,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 {
struct extent_buffer *buf;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return buf;
 
@@ -8885,7 +8885,7 @@ static noinline int do_walk_down(str

[PATCH V21 07/19] Btrfs: subpage-blocksize: Use kmalloc()-ed memory to hold metadata blocks

2016-10-02 Thread Chandan Rajendra
For subpage-blocksizes this commit uses kmalloc()-ed memory to buffer
metadata blocks in memory.

When reading/writing metadata blocks, we now track the first extent
buffer using bio->bi_private. With kmalloc()-ed memory we cannot use
page->private. Hence when writing dirty extent buffers in
subpage-blocksize scenario, this commit forces each bio to contain a
single extent buffer. For the non subpage-blocksize scenario we continue
to track the corresponding extent buffer using page->private and hence a
single write bio will continue to have more than one dirty extent
buffer.
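
To illustrate the accessor change in the ctree.h hunk below (header fields are
read at page_address(eb->pages[0]) + eb->pg_offset rather than at the start of
the page), here is a small userspace sketch. It is not the kernel code:
struct sketch_eb and the sketch_* helpers are hypothetical, and the assumption
is that a sub-page block may begin at a non-zero offset inside the memory that
backs it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct extent_buffer used here. */
struct sketch_eb {
	uint8_t *page_addr;	/* plays the role of page_address(eb->pages[0]) */
	uint32_t pg_offset;	/* byte offset of the block inside that memory */
	uint32_t len;		/* block length in bytes */
};

/* Store a 64-bit little-endian field at member_off within the block. */
static void sketch_set_le64(struct sketch_eb *eb, uint32_t member_off,
			    uint64_t val)
{
	uint8_t *p = eb->page_addr + eb->pg_offset + member_off;
	int i;

	assert(member_off + 8 <= eb->len);
	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(val >> (8 * i));
}

/* Load a 64-bit little-endian field from member_off within the block. */
static uint64_t sketch_get_le64(const struct sketch_eb *eb,
				uint32_t member_off)
{
	const uint8_t *p = eb->page_addr + eb->pg_offset + member_off;
	uint64_t val = 0;
	int i;

	assert(member_off + 8 <= eb->len);
	for (i = 7; i >= 0; i--)
		val = (val << 8) | p[i];
	return val;
}
```

A 2K block placed at offset 2048 of a 4K buffer then reads and writes its own
header without touching the first 2K, which is the effect of adding pg_offset
to the pointer arithmetic.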

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.h |   6 +-
 fs/btrfs/disk-io.c   |  27 +++---
 fs/btrfs/extent_io.c | 204 +--
 fs/btrfs/extent_io.h |   8 +-
 fs/btrfs/tests/extent-io-tests.c |   4 +-
 5 files changed, 158 insertions(+), 91 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b9ee7cf..745284c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1491,14 +1491,16 @@ static inline void btrfs_set_token_##name(struct extent_buffer *eb, \
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)\
 static inline u##bits btrfs_##name(struct extent_buffer *eb)   \
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = (type *)((u8 *)page_address(eb->pages[0]) \
+   + eb->pg_offset);   \
u##bits res = le##bits##_to_cpu(p->member); \
return res; \
 }  \
 static inline void btrfs_set_##name(struct extent_buffer *eb,  \
u##bits val)\
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = (type *)((u8 *)page_address(eb->pages[0]) \
+   + eb->pg_offset);   \
p->member = cpu_to_le##bits(val);   \
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9ff48a7..5663481 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -448,13 +448,10 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root,
  * we only fill in the checksum field in the first page of a multi-page block
  */
 
-static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
+static int csum_dirty_buffer(struct btrfs_fs_info *fs_info,
+   struct extent_buffer *eb)
 {
-   struct extent_buffer *eb;
 
-   eb = (struct extent_buffer *)page->private;
-   if (page != eb->pages[0])
-   return 0;
ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
 
@@ -557,11 +554,10 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
int ret = 0;
int reads_done;
 
-   if (!page->private)
+   eb = (io_bio->bio).bi_private;
+   if (!eb)
goto out;
 
-   eb = (struct extent_buffer *)page->private;
-
/* the pending IO might have been the only thing that kept this buffer
 * in memory.  Make sure we have a ref for all this other checks
 */
@@ -646,11 +642,11 @@ out:
return ret;
 }
 
-static int btree_io_failed_hook(struct page *page, int failed_mirror)
+static int btree_io_failed_hook(struct page *page, void *private,
+   int failed_mirror)
 {
-   struct extent_buffer *eb;
+   struct extent_buffer *eb = private;
 
-   eb = (struct extent_buffer *)page->private;
set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
eb->read_mirror = failed_mirror;
atomic_dec(&eb->io_pages);
@@ -829,11 +825,18 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 
 static int btree_csum_one_bio(struct btrfs_fs_info *fs_info, struct bio *bio)
 {
+   struct extent_buffer *eb = bio->bi_private;
struct bio_vec *bvec;
int i, ret = 0;
 
bio_for_each_segment_all(bvec, bio, i) {
-   ret = csum_dirty_buffer(fs_info, bvec->bv_page);
+   if (eb->len >= PAGE_SIZE)
+   eb = (struct extent_buffer *)(bvec->bv_page->private);
+
+   if (bvec->bv_page != eb->pages[0])
+   continue;
+
+   ret = csum_dirty_buffer(fs_info, eb);
if (ret)
break;
}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6cac61f..8ace

[PATCH V21 15/19] Btrfs: subpage-blocksize: Enable dedupe ioctl

2016-10-02 Thread Chandan Rajendra
The function implementing the dedupe ioctl
i.e. btrfs_ioctl_file_extent_same(), returns with an error in
subpage-blocksize scenario. This was done due to the fact that Btrfs did
not have code to deal with block size < page size. This commit removes
this restriction since we now support "block size < page size".

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4077fc1..cf13029 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3321,21 +3321,11 @@ ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
 {
struct inode *src = file_inode(src_file);
struct inode *dst = file_inode(dst_file);
-   u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
ssize_t res;
 
if (olen > BTRFS_MAX_DEDUPE_LEN)
olen = BTRFS_MAX_DEDUPE_LEN;
 
-   if (WARN_ON_ONCE(bs < PAGE_SIZE)) {
-   /*
-* Btrfs does not support blocksize < page_size. As a
-* result, btrfs_cmp_data() won't correctly handle
-* this situation without an update.
-*/
-   return -EINVAL;
-   }
-
res = btrfs_extent_same(src, loff, olen, dst, dst_loff);
if (res)
return res;
-- 
2.5.5



[PATCH V21 01/19] Btrfs: subpage-blocksize: extent_clear_unlock_delalloc: Prevent page from being unlocked more than once

2016-10-02 Thread Chandan Rajendra
extent_clear_unlock_delalloc() can unlock a page more than once as shown
below (assume 4k as the block size and 64k as the page size).

cow_file_range
  create 4k ordered extent corresponding to page offsets 0 - 4095
  extent_clear_unlock_delalloc corresponding to page offsets 0 - 4095
unlock page
  create 4k ordered extent corresponding to page offsets 4096 - 8191
  extent_clear_unlock_delalloc corresponding to page offsets 4096 - 8191
unlock page

To prevent such a scenario this commit passes "delalloc end" to
extent_clear_unlock_delalloc() to help decide whether the page can be unlocked
or not.

NOTE: Since extent_clear_unlock_delalloc() is used by compression code
as well, the commit passes ordered extent "end" as the value for the
argument corresponding to "delalloc end" for invocations made from
compression code path. This will be fixed by a future commit that gets
compression to work in subpage-blocksize scenario.
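
The unlock decision added to extent_io.c below can be sketched in userspace
like this (illustration only, not the kernel code). sketch_should_unlock() is
a hypothetical helper and SKETCH_PAGE_SIZE is fixed at 64K to match the
example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 65536ULL	/* 64K pages, as in the example above */

/*
 * Unlock the page only if this call's range covers the page to its end, or
 * if this call handles the last part of the delalloc range.
 */
static bool sketch_should_unlock(uint64_t page_start, uint64_t end,
				 uint64_t delalloc_end)
{
	uint64_t page_end = page_start + SKETCH_PAGE_SIZE - 1;

	assert(page_start % SKETCH_PAGE_SIZE == 0);
	return page_end <= end || end == delalloc_end;
}
```

For the two 4k ordered extents above with a delalloc range of 0 - 8191, the
first call (end = 4095) leaves the page locked and only the second call
(end = 8191) unlocks it. When a caller passes end as delalloc_end, as the
compression path does for now, the page is unlocked on every call, which is
the old behaviour.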

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 16 +++
 fs/btrfs/extent_io.h |  5 ++--
 fs/btrfs/inode.c | 78 +---
 3 files changed, 57 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f669240..dc60c604 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1708,9 +1708,8 @@ out_failed:
 }
 
 void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end,
-struct page *locked_page,
-unsigned clear_bits,
-unsigned long page_ops)
+   u64 delalloc_end, struct page *locked_page,
+   unsigned clear_bits, unsigned long page_ops)
 {
struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
int ret;
@@ -1718,6 +1717,7 @@ void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end,
unsigned long index = start >> PAGE_SHIFT;
unsigned long end_index = end >> PAGE_SHIFT;
unsigned long nr_pages = end_index - index + 1;
+   u64 page_end;
int i;
 
clear_extent_bit(tree, start, end, clear_bits, 1, 0, NULL, GFP_NOFS);
@@ -1748,8 +1748,14 @@ void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end,
SetPageError(pages[i]);
if (page_ops & PAGE_END_WRITEBACK)
end_page_writeback(pages[i]);
-   if (page_ops & PAGE_UNLOCK)
-   unlock_page(pages[i]);
+
+   if (page_ops & PAGE_UNLOCK) {
+   page_end = page_offset(pages[i]) +
+   PAGE_SIZE - 1;
+   if ((page_end <= end)
+   || (end == delalloc_end))
+   unlock_page(pages[i]);
+   }
put_page(pages[i]);
}
nr_pages -= ret;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 06b6f14..0948bca 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -430,9 +430,8 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long offset,
 void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end,
-struct page *locked_page,
-unsigned bits_to_clear,
-unsigned long page_ops);
+   u64 delalloc_end, struct page *locked_page,
+   unsigned bits_to_clear, unsigned long page_ops);
 struct bio *
 btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
gfp_t gfp_flags);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3440b52..3e4feac 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -560,12 +560,13 @@ cont:
 * we don't need to create any more async work items.
 * Unlock and free up our temp pages.
 */
-   extent_clear_unlock_delalloc(inode, start, end, NULL,
-clear_flags, PAGE_UNLOCK |
-PAGE_CLEAR_DIRTY |
-PAGE_SET_WRITEBACK |
-page_error_op |
-PAGE_END_WRITEBACK);
+   extent_clear_unlock_delalloc(inode, start, end, end,
+   NULL, cle

[PATCH V21 03/19] Btrfs: subpage-blocksize: Use PG_Uptodate flag to track block uptodate status

2016-10-02 Thread Chandan Rajendra
This commit causes a block's uptodate status to be tracked using
struct page's PG_Uptodate flag instead of extent_io_tree's
EXTENT_UPTODATE flag.

This is in preparation for subpage-blocksize patchset which will use a
per-page bitmap for tracking individual block's uptodate status in the
case of blocksize < PAGE_SIZE. We will continue to use PG_Uptodate flag
to track uptodate status for blocksize == PAGE_SIZE scenario.
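
To make the "per-page bitmap" idea concrete, here is a hedged userspace
sketch. It is not taken from the patchset: struct sketch_page_state and both
helpers are hypothetical, and the bitmap is limited to 32 blocks per page
(64K pages with 2K blocks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page state: bit N set means block N is uptodate. */
struct sketch_page_state {
	uint32_t uptodate_bitmap;
	unsigned int blocks_per_page;	/* PAGE_SIZE / blocksize, 1..32 */
};

static void sketch_set_block_uptodate(struct sketch_page_state *ps,
				      unsigned int blk)
{
	assert(blk < ps->blocks_per_page);
	ps->uptodate_bitmap |= (uint32_t)1 << blk;
}

/* The page as a whole is uptodate only once every block in it is. */
static bool sketch_page_uptodate(const struct sketch_page_state *ps)
{
	uint32_t full = ps->blocks_per_page == 32 ?
		UINT32_MAX : (((uint32_t)1 << ps->blocks_per_page) - 1);

	return ps->uptodate_bitmap == full;
}
```

With blocks_per_page == 1 the bitmap degenerates to a single bit, which
corresponds to the blocksize == PAGE_SIZE case where PG_Uptodate alone is
enough.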

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 61 +++-
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c |  6 ++
 3 files changed, 11 insertions(+), 58 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index dd7faa1..522c943 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1950,12 +1950,9 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
  * helper function to set a given page up to date if all the
  * extents in the tree for that page are up to date
  */
-static void check_page_uptodate(struct extent_io_tree *tree, struct page *page)
+static void check_page_uptodate(struct page *page)
 {
-   u64 start = page_offset(page);
-   u64 end = start + PAGE_SIZE - 1;
-   if (test_range_bit(tree, start, end, EXTENT_UPTODATE, 1, NULL))
-   SetPageUptodate(page);
+   SetPageUptodate(page);
 }
 
 int free_io_failure(struct extent_io_tree *failure_tree,
@@ -2492,18 +2489,6 @@ static void end_bio_extent_writepage(struct bio *bio)
bio_put(bio);
 }
 
-static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
- int uptodate)
-{
-   struct extent_state *cached = NULL;
-   u64 end = start + len - 1;
-
-   if (uptodate && tree->track_uptodate)
-   set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
-   unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
-}
-
 /*
  * after a readpage IO is done, we need to:
  * clear the uptodate bits on error
@@ -2525,8 +2510,6 @@ static void end_bio_extent_readpage(struct bio *bio)
u64 start;
u64 end;
u64 len;
-   u64 extent_start = 0;
-   u64 extent_len = 0;
int mirror;
int ret;
int i;
@@ -2612,7 +2595,7 @@ readpage_ok:
off = i_size & (PAGE_SIZE-1);
if (page->index == end_index && off)
zero_user_segment(page, off, PAGE_SIZE);
-   SetPageUptodate(page);
+   check_page_uptodate(page);
} else {
ClearPageUptodate(page);
SetPageError(page);
@@ -2620,32 +2603,10 @@ readpage_ok:
unlock_page(page);
offset += len;
 
-   if (unlikely(!uptodate)) {
-   if (extent_len) {
-   endio_readpage_release_extent(tree,
- extent_start,
- extent_len, 1);
-   extent_start = 0;
-   extent_len = 0;
-   }
-   endio_readpage_release_extent(tree, start,
- end - start + 1, 0);
-   } else if (!extent_len) {
-   extent_start = start;
-   extent_len = end + 1 - start;
-   } else if (extent_start + extent_len == start) {
-   extent_len += end + 1 - start;
-   } else {
-   endio_readpage_release_extent(tree, extent_start,
- extent_len, uptodate);
-   extent_start = start;
-   extent_len = end + 1 - start;
-   }
+   unlock_extent_cached(tree, start, end, NULL, GFP_ATOMIC);
+
}
 
-   if (extent_len)
-   endio_readpage_release_extent(tree, extent_start, extent_len,
- uptodate);
if (io_bio->end_io)
io_bio->end_io(io_bio, bio->bi_error);
bio_put(bio);
@@ -2933,18 +2894,15 @@ static int __do_readpage(struct extent_io_tree *tree,
 
if (cur >= last_byte) {
char *userpage;
-   struct extent_state *cached = NULL;
 
iosize = PAGE_SIZE - pg_offset;
userpage = kmap_atomic(page);
memset(userpage + pg_offset, 0, iosize);
flush_dcache_page(page);
kunmap_atomic(userpage);
-   set_extent_uptodate(tree, cur, cur + iosize - 1,
-   &

[PATCH V21 02/19] Btrfs: subpage-blocksize: Make sure delalloc range intersects with the locked page's range

2016-10-02 Thread Chandan Rajendra
find_delalloc_range indirectly depends on EXTENT_UPTODATE to make sure that
the delalloc range returned intersects with the file range mapped by the
page. Since we now track "uptodate" state in a per-page
bitmap (i.e. in btrfs_page_private->bstate), this commit makes an explicit
check to make sure that the delalloc range starts from within the file range
mapped by the page.
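
The explicit check can be pictured with a small userspace sketch (illustration
only). sketch_state_extends_delalloc() is a hypothetical name and
SKETCH_PAGE_SIZE is assumed to be 4K; the condition mirrors the one added to
find_delalloc_range() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for the example */

/*
 * An extent state may contribute to the delalloc range only if it carries
 * the delalloc bit and does not start beyond the end of the locked page.
 */
static bool sketch_state_extends_delalloc(uint64_t page_start,
					  uint64_t state_start,
					  bool state_is_delalloc)
{
	uint64_t page_end = page_start + SKETCH_PAGE_SIZE - 1;

	assert(page_start % SKETCH_PAGE_SIZE == 0);
	return state_is_delalloc && state_start <= page_end;
}
```

A delalloc state starting at byte 4095 of a page at offset 0 is accepted,
while one starting at byte 4096 ends the search.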

Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index dc60c604..dd7faa1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1477,6 +1477,7 @@ out:
  * 1 is returned if we find something, 0 if nothing was in the tree
  */
 static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
+   struct page *locked_page,
u64 *start, u64 *end, u64 max_bytes,
struct extent_state **cached_state)
 {
@@ -1485,6 +1486,9 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
u64 cur_start = *start;
u64 found = 0;
u64 total_bytes = 0;
+   u64 page_end;
+
+   page_end = page_offset(locked_page) + PAGE_SIZE - 1;
 
spin_lock(&tree->lock);
 
@@ -1505,7 +1509,8 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
  (state->state & EXTENT_BOUNDARY))) {
goto out;
}
-   if (!(state->state & EXTENT_DELALLOC)) {
+   if (!(state->state & EXTENT_DELALLOC)
+   || (page_end < state->start)) {
if (!found)
*end = state->end;
goto out;
@@ -1643,8 +1648,9 @@ again:
/* step one, find a bunch of delalloc bytes starting at start */
delalloc_start = *start;
delalloc_end = 0;
-   found = find_delalloc_range(tree, &delalloc_start, &delalloc_end,
-   max_bytes, &cached_state);
+   found = find_delalloc_range(tree, locked_page,
+   &delalloc_start, &delalloc_end,
+   max_bytes, &cached_state);
if (!found || delalloc_end <= *start) {
*start = delalloc_start;
*end = delalloc_end;
-- 
2.5.5



[PATCH V21 00/19] Allow I/O on blocks whose size is less than page size

2016-10-02 Thread Chandan Rajendra
   a stripe. For example, with 4k block size, 64K stripe size
   and 64K page size, assume
   - All the blocks mapped by the page are contiguous on the logical
 address space.
   - The first block of the page is mapped to the second block of the
 stripe.
   In such a scenario, we would add all the blocks of the page to
   bio. This would mean that we would overflow the stripe by one 4K
   block. Hence this patchset removes the optimization and invokes
   submit_extent_page() for every dirty 4K block.
3. The following patches are newly added:
   - Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset
 when moving to a new bio_vec 
   - Btrfs: subpage-blocksize: Make file extent relocate code subpage
 blocksize aware 
   - Btrfs: btrfs_clone: Flush dirty blocks of a page that do not map
 the clone range
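
The arithmetic behind the stripe example in item 2 above can be checked with a
short userspace sketch (illustration only; sketch_stripe_overflow() is a
hypothetical helper, not a function from the patchset).

```c
#include <stdint.h>

/*
 * Number of bytes by which an I/O of io_len bytes, starting at
 * offset_in_stripe, runs past the end of a stripe of stripe_len bytes.
 */
static uint64_t sketch_stripe_overflow(uint64_t offset_in_stripe,
				       uint64_t io_len, uint64_t stripe_len)
{
	uint64_t io_end = offset_in_stripe + io_len;

	return io_end > stripe_len ? io_end - stripe_len : 0;
}
```

With 4K blocks, a 64K page and a 64K stripe, a page whose first block lands on
the second block of the stripe starts at offset 4096; adding the whole 64K
page to one bio gives an overflow of 4096 bytes, i.e. one block, whereas
submitting each 4K block separately never overflows.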

Changes from V14:
1. Fix usage of cleancache_get_page() in __do_readpage().
   In filesystems which support subpage-blocksize scenario, a page can
   map one or more blocks. Hence cleancache_get_page() should be
   invoked only when the page maps a non-hole extent and block size
   being used is equal to the page size. Thanks to David Sterba for
   pointing this out.
2. Replace page_read_complete() and page_write_complete() functions
   with page_io_complete().
3. Provide more documentation (as part of both commit message and code
   comments) about the usage of the per-page
   btrfs_page_private->io_lock.

Changes from V13:
1. Enable dedup ioctl to work in subpagesize-blocksize scenario.

Changes from V12:
1. The logic in the function btrfs_punch_hole() has been fixed to
   check for the presence of BLK_STATE_UPTODATE flags for blocks in
   pages which partially map the file range being punched.
   
Changes from V11:
1. Addressed the review comments provided by Liu Bo for version V11.
2. Fixed file defragmentation code to work in subpagesize-blocksize
   scenario.
3. Many "hard to reproduce" bugs were fixed.

Chandan Rajendra (19):
  Btrfs: subpage-blocksize: extent_clear_unlock_delalloc: Prevent page
from being unlocked more than once
  Btrfs: subpage-blocksize: Make sure delalloc range intersects with the
locked page's range
  Btrfs: subpage-blocksize: Use PG_Uptodate flag to track block uptodate
status
  Btrfs: Remove extent_io_tree's track_uptodate member
  Btrfs: subpage-blocksize: Fix whole page read.
  Btrfs: subpage-blocksize: Fix whole page write
  Btrfs: subpage-blocksize: Use kmalloc()-ed memory to hold metadata
blocks
  Btrfs: subpage-blocksize: Execute sanity tests on all possible block
sizes
  Btrfs: subpage-blocksize: Compute free space tree BITMAP_RANGE based
on sectorsize
  Btrfs: subpage-blocksize: Allow mounting filesystems where sectorsize
< PAGE_SIZE
  Btrfs: subpage-blocksize: Deal with partial ordered extent
allocations.
  Btrfs: subpage-blocksize: Explicitly track I/O status of blocks of an
ordered extent.
  Btrfs: subpage-blocksize: btrfs_punch_hole: Fix uptodate blocks check
  Btrfs: subpage-blocksize: Fix file defragmentation code
  Btrfs: subpage-blocksize: Enable dedupe ioctl
  Btrfs: subpage-blocksize: btrfs_clone: Flush dirty blocks of a page
that do not map the clone range
  Btrfs: subpage-blocksize: Make file extent relocate code subpage
blocksize aware
  Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset when
moving to a new bio_vec
  Btrfs: subpage-blocksize: Disable compression

 fs/btrfs/ctree.h   |   6 +-
 fs/btrfs/disk-io.c |  49 +--
 fs/btrfs/disk-io.h |   2 +-
 fs/btrfs/extent-tree.c |   4 +-
 fs/btrfs/extent_io.c   | 739 +
 fs/btrfs/extent_io.h   |  99 -
 fs/btrfs/file-item.c   |   7 +-
 fs/btrfs/file.c| 105 -
 fs/btrfs/inode.c   | 472 +++--
 fs/btrfs/ioctl.c   | 232 +++
 fs/btrfs/ordered-data.c|  19 +
 fs/btrfs/ordered-data.h|   4 +
 fs/btrfs/relocation.c  |  87 +++-
 fs/btrfs/super.c   |  19 +
 fs/btrfs/tests/btrfs-tests.c   |   8 +-
 fs/btrfs/tests/extent-io-tests.c   |   4 +-
 fs/btrfs/tests/free-space-tree-tests.c |  79 ++--
 fs/btrfs/tree-log.c|   2 +-
 fs/btrfs/volumes.c |  10 +-
 19 files changed, 1373 insertions(+), 574 deletions(-)

-- 
2.5.5



Re: [PATCH v2 0/6] Btrfs: free space tree and sanity test fixes

2016-09-28 Thread Chandan Rajendra
On Thursday, September 22, 2016 05:22:31 PM Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> This is v2 of my earlier series "Btrfs: fix free space tree
> bitmaps+tests on big-endian systems" [1]. Patches 1, 4, and 5 are the
> same as patches 1, 2, and 3 from the original series. I've added patch 2
> to fix another bug I noticed (an xfstest went out earlier). Patch 3 is
> the result of the earlier discussion here [2]. Finally, patch 6 was
> necessary to get the sanity tests to run on my MIPS emulator.
> 
> This series applies to v4.8-rc7. The sanity tests pass on both x86-64
> and MIPS, and there are no xfstests regressions. Chandan and Anatoly,
> could you test these out as well?

Hello Omar,

I have executed xfstests on a big endian ppc64 guest with 'MOUNT_OPTIONS="-o
space_cache=v2"' config option. I have also executed generic/127 on a
filesystem created using "fragment-free-space-tree.py" that you had provided
some time ago. I did not notice any regressions during the test runs.

Tested-by: Chandan Rajendra 

> 
> I'm working on the btrfs-progs follow up, but these patches are safe
> without that -- the new FREE_SPACE_TREE_VALID bit will stop all versions
> of btrfs-progs from mounting read-write.
> 
> Thanks!
> 
> 1: http://marc.info/?l=linux-btrfs&m=146853909905570&w=2
> 2: http://marc.info/?l=linux-btrfs&m=147448992301110&w=2
> 
> Cc: Chandan Rajendra 
> Cc: Anatoly Pugachev 
> 
> Omar Sandoval (6):
>   Btrfs: fix free space tree bitmaps on big-endian systems
>   Btrfs: fix mount -o clear_cache,space_cache=v2
>   Btrfs: catch invalid free space trees
>   Btrfs: fix extent buffer bitmap tests on big-endian systems
>   Btrfs: expand free space tree sanity tests to catch endianness bug
>   Btrfs: use less memory for delalloc sanity tests
> 
>  fs/btrfs/ctree.h   |   3 +-
>  fs/btrfs/disk-io.c |  33 ---
>  fs/btrfs/extent_io.c   |  64 +
>  fs/btrfs/extent_io.h   |  22 +
>  fs/btrfs/free-space-tree.c |  19 ++--
>  fs/btrfs/tests/extent-io-tests.c   |  95 +++
>  fs/btrfs/tests/free-space-tree-tests.c | 164 +++--
>  include/uapi/linux/btrfs.h |  10 +-
>  8 files changed, 261 insertions(+), 149 deletions(-)
> 
> 

-- 
chandan



[PATCH] Btrfs: Free fs_info->eb_info only when it holds a valid pointer

2016-09-14 Thread Chandan Rajendra
The following command line sequence causes a NULL pointer dereference,

mount /dev/loop0 /mnt/dir1
mount /dev/loop0 /mnt/dir2

[  159.964194] BUG: unable to handle kernel NULL pointer dereference at 0070
[  159.965147] IP: [] list_lru_destroy+0x8/0x20
[  159.965147] PGD 0
[  159.965147] Oops:  [#1] SMP DEBUG_PAGEALLOC
[  159.965147] Modules linked in:
[  159.965147] CPU: 2 PID: 3043 Comm: mount Not tainted 4.7.0-ge96efee1-dirty #5
[  159.965147] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  159.965147] task: 8818b511a400 task.stack: 8818a5108000
[  159.965147] RIP: 0010:[]  [] list_lru_destroy+0x8/0x20
[  159.965147] RSP: 0018:8818a510bbd8  EFLAGS: 00010246
[  159.965147] RAX:  RBX: 0070 RCX: c100
[  159.965147] RDX: 82041b78 RSI: 8818b511a400 RDI: 0070
[  159.965147] RBP: 8818a510bbe0 R08: 8818a5108000 R09: 8818a50a6000
[  159.965147] R10:  R11: 00253e9b4bd4 R12: 82098f80
[  159.965147] R13: 8818b266e000 R14: 8818a476 R15: 8818b5449b50
[  159.965147] FS:  7f29f0ab0840() GS:88193348() knlGS:
[  159.965147] CS:  0010 DS:  ES:  CR0: 80050033
[  159.965147] CR2: 0070 CR3: 0018a5239000 CR4: 06e0
[  159.965147] Stack:
[  159.965147]   8818a510bcb8 813464e5 000fa510bc00
[  159.965147]  00200100243e 8818a4004d18 00080046 8818a510bc20
[  159.965147]  8148243e 8818a510bc50 8818b5449b30 0008
[  159.965147] Call Trace:
[  159.965147]  [] btrfs_mount+0xad5/0xee0
[  159.965147]  [] ? find_next_zero_bit+0x1e/0x20
[  159.965147]  [] mount_fs+0x34/0x160
[  159.965147]  [] ? __alloc_percpu+0x10/0x20
[  159.965147]  [] vfs_kern_mount+0x62/0x100
[  159.965147]  [] btrfs_mount+0x186/0xee0
[  159.965147]  [] ? find_next_zero_bit+0x1e/0x20
[  159.965147]  [] mount_fs+0x34/0x160
[  159.965147]  [] ? __alloc_percpu+0x10/0x20
[  159.965147]  [] vfs_kern_mount+0x62/0x100
[  159.965147]  [] do_mount+0x1b6/0xc40
[  159.965147]  [] ? memdup_user+0x3d/0x70
[  159.965147]  [] SyS_mount+0x7e/0xd0
[  159.965147]  [] entry_SYSCALL_64_fastpath+0x13/0x8f
[  159.965147] Code: 89 08 48 89 e5 48 8b 02 48 89 70 08 48 89 06 48 89 56 08 48 89 32 5d 48 83 6f 10 01 c3 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb <48> 8b 3f 48 85 ff 74 0c e8 9b 99 02 00 48 c7 03 00 00 00 00 5b
[  159.965147] RIP  [] list_lru_destroy+0x8/0x20
[  159.965147]  RSP 
[  159.965147] CR2: 0070
[  159.999634] ---[ end trace 04bad43e08a10198 ]---

When servicing the second mount command, btrfs_mount() invokes
free_fs_info() because super_block->s_root is already set. At this point
btrfs_fs_info->eb_info has not yet been initialized to a valid memory
address, so the statement
list_lru_destroy(&fs_info->eb_info->lru_list); dereferences a NULL
pointer.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ee6956c..33ce069 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2882,8 +2882,11 @@ static inline int btrfs_need_cleaner_sleep(struct btrfs_root *root)
 
 static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 {
-	list_lru_destroy(&fs_info->eb_info->lru_list);
-	kfree(fs_info->eb_info);
+	if (fs_info->eb_info) {
+		list_lru_destroy(&fs_info->eb_info->lru_list);
+		kfree(fs_info->eb_info);
+	}
+
 	kfree(fs_info->balance_ctl);
 	kfree(fs_info->delayed_root);
 	kfree(fs_info->extent_root);
-- 
2.5.5
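
For readers following the fix in the patch above, here is a minimal userspace
sketch of the same "tear down only what was set up" pattern. It is an
illustration under assumptions, not the kernel code: the *_sketch structures
and functions are hypothetical stand-ins, and free()/calloc() replace
kfree()/kzalloc(). The faulting address 0070 in the oops would be consistent
with lru_list sitting at a small offset inside the eb_info structure, but
that is an inference and not something the log states.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct eb_info_sketch {
	int *lru_list;			/* stands in for the list_lru */
};

struct fs_info_sketch {
	struct eb_info_sketch *eb_info;	/* NULL until the mount path sets it up */
	void *balance_ctl;
};

static int lru_destroy_calls;		/* counts teardown calls, for illustration */

static void lru_destroy_sketch(int **lru)
{
	free(*lru);
	*lru = NULL;
	lru_destroy_calls++;
}

/* Tear down only what was actually set up; free(NULL) is a no-op. */
static void free_fs_info_sketch(struct fs_info_sketch *fs_info)
{
	if (fs_info->eb_info) {
		lru_destroy_sketch(&fs_info->eb_info->lru_list);
		free(fs_info->eb_info);
	}
	free(fs_info->balance_ctl);
	free(fs_info);
}

/*
 * Builds an fs_info, optionally with eb_info, and frees it again.
 * Returns the number of lru teardown calls made, or -1 on allocation failure.
 */
static int demo(int with_eb_info)
{
	struct fs_info_sketch *fs_info = calloc(1, sizeof(*fs_info));

	if (!fs_info)
		return -1;
	if (with_eb_info) {
		fs_info->eb_info = calloc(1, sizeof(*fs_info->eb_info));
		if (!fs_info->eb_info) {
			free(fs_info);
			return -1;
		}
		fs_info->eb_info->lru_list = calloc(4, sizeof(int));
	}
	lru_destroy_calls = 0;
	free_fs_info_sketch(fs_info);
	return lru_destroy_calls;
}
```

Calling demo(0) models the second mount, where eb_info was never allocated and
the teardown is skipped; demo(1) models the normal path.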



Re: [PATCH 6/7] Btrfs: kill the btree_inode

2016-09-07 Thread Chandan Rajendra
On Friday, September 02, 2016 03:40:05 PM Josef Bacik wrote:

Please find my comment inlined below,

> In order to more efficiently support sub-page blocksizes we need to stop
> allocating pages from pagecache for our metadata.  Instead switch to using the
> account_metadata* counters for making sure we are keeping the system aware of
> how much dirty metadata we have, and use the ->free_cached_objects super
> operation in order to handle freeing up extent buffers.  This greatly simplifies
> how we deal with extent buffers as now we no longer have to tie the page cache
> reclaimation stuff to the extent buffer stuff.  This will also allow us to
> simply kmalloc() our data for sub-page blocksizes.
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/btrfs_inode.h |   3 +-
>  fs/btrfs/ctree.c   |  10 +-
>  fs/btrfs/ctree.h   |  13 +-
>  fs/btrfs/disk-io.c | 389 --
>  fs/btrfs/extent_io.c   | 913 ++---
>  fs/btrfs/extent_io.h   |  49 +-
>  fs/btrfs/inode.c   |   6 +-
>  fs/btrfs/root-tree.c   |   2 +-
>  fs/btrfs/super.c   |  29 +-
>  fs/btrfs/tests/btrfs-tests.c   |  37 +-
>  fs/btrfs/tests/extent-io-tests.c   |   4 +-
>  fs/btrfs/tests/free-space-tree-tests.c |   4 +-
>  fs/btrfs/tests/qgroup-tests.c  |   4 +-
>  fs/btrfs/transaction.c |  11 +-
>  14 files changed, 726 insertions(+), 748 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 1a8fa46..ad7b185 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -229,10 +229,9 @@ static inline u64 btrfs_ino(struct inode *inode)
>   u64 ino = BTRFS_I(inode)->location.objectid;
> 
>   /*
> -  * !ino: btree_inode
>* type == BTRFS_ROOT_ITEM_KEY: subvol dir
>*/
> - if (!ino || BTRFS_I(inode)->location.type == BTRFS_ROOT_ITEM_KEY)
> + if (BTRFS_I(inode)->location.type == BTRFS_ROOT_ITEM_KEY)
>   ino = inode->i_ino;
>   return ino;
>  }
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index d1c56c9..b267053 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1373,8 +1373,8 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> 
>   if (tm->op == MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
>   BUG_ON(tm->slot != 0);
> - eb_rewin = alloc_dummy_extent_buffer(fs_info, eb->start,
> - eb->len);
> + eb_rewin = alloc_dummy_extent_buffer(fs_info->eb_info,
> +  eb->start, eb->len);
>   if (!eb_rewin) {
>   btrfs_tree_read_unlock_blocking(eb);
>   free_extent_buffer(eb);
> @@ -1455,8 +1455,8 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
>   } else if (old_root) {
>   btrfs_tree_read_unlock(eb_root);
>   free_extent_buffer(eb_root);
> - eb = alloc_dummy_extent_buffer(root->fs_info, logical,
> - root->nodesize);
> + eb = alloc_dummy_extent_buffer(root->fs_info->eb_info, logical,
> +root->nodesize);
>   } else {
>   btrfs_set_lock_blocking_rw(eb_root, BTRFS_READ_LOCK);
>   eb = btrfs_clone_extent_buffer(eb_root);
> @@ -1772,7 +1772,7 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>   int err;
> 
>   if (low > high) {
> - btrfs_err(eb->fs_info,
> + btrfs_err(eb->eb_info->fs_info,
>"%s: low (%d) > high (%d) eb %llu owner %llu level %d",
> __func__, low, high, eb->start,
> btrfs_header_owner(eb), btrfs_header_level(eb));
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 282a031..ee6956c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "extent_io.h"
>  #include "extent_map.h"
>  #include "async-thread.h"
> @@ -675,6 +676,7 @@ struct btrfs_device;
>  struct btrfs_fs_devices;
>  struct btrfs_balance_control;
>  struct btrfs_delayed_root;
> +struct btrfs_eb_info;
> 
>  #define BTRFS_FS_BARRIER 1
>  #define BTRFS_FS_CLOSING_START   2
> @@ -797,7 +799,7 @@ struct btrfs_fs_info {
>   struct btrfs_super_block *super_for_commit;
>   struct block_device *__bdev;
>   struct super_block *sb;
> - struct inode *btree_inode;
> + struct btrfs_eb_info *eb_info;
>   struct backing_dev_info bdi;
>   struct mutex tree_log_mutex;
>   struct mutex transaction_kthread_mutex;
> @@ -1042,10 +1044,6 @@ struct btrfs_fs_info {
>   /* readahead works cnt */
>   atomic_t reada_works_cnt;
> 
> 

Re: [PATCH V20 04/19] Btrfs: subpage-blocksize: Define extent_buffer_head

2016-07-27 Thread Chandan Rajendra
On Tuesday, July 26, 2016 01:42:08 PM Josef Bacik wrote:
> On 07/04/2016 12:34 AM, Chandan Rajendra wrote:
> > In order to handle multiple extent buffers per page, first we need to create a
> > way to handle all the extent buffers that are attached to a page.
> >
> > This patch creates a new data structure 'struct extent_buffer_head', and moves
> > fields that are common to all extent buffers from 'struct extent_buffer' to
> > 'struct extent_buffer_head'
> >
> > Also, this patch moves EXTENT_BUFFER_TREE_REF, EXTENT_BUFFER_DUMMY and
> > EXTENT_BUFFER_IN_TREE flags from extent_buffer->ebflags  to
> > extent_buffer_head->bflags.
> >
> > Reviewed-by: Liu Bo 
> > Signed-off-by: Chandan Rajendra 
> 
> I'm sorry Chandan I'm still having problems with this one.  XFS kmalloc()'s its
> sub pagesize ranges for it's metadata buffers, how about we do that instead of
> doing the extent_buffer_head.  Look at xfs_buf_allocate_memory() for what I'm
> thinking.  Thanks,
> 

Ok. I will look into the xfs metadata buffer allocation code. Thanks for the
guidance.

-- 
chandan



Re: [PATCH V20 01/19] Btrfs: subpage-blocksize: Fix whole page read.

2016-07-27 Thread Chandan Rajendra
On Tuesday, July 26, 2016 12:11:49 PM Josef Bacik wrote:
> On 07/04/2016 12:34 AM, Chandan Rajendra wrote:
> > For the subpage-blocksize scenario, a page can contain multiple
> > blocks. In such cases, this patch handles reading data from files.
> >
> > To track the status of individual blocks of a page, this patch makes use
> > of a bitmap pointed to by the newly introduced per-page 'struct
> > btrfs_page_private'.
> >
> > The per-page btrfs_page_private->io_lock plays the same role as
> > BH_Uptodate_Lock (see end_buffer_async_read()) i.e. without the io_lock
> > we may end up in the following situation,
> >
> > NOTE: Assume 64k page size and 4k block size. Also assume that the first
> > 12 blocks of the page are contiguous while the next 4 blocks are
> > contiguous. When reading the page we end up submitting two "logical
> > address space" bios. So end_bio_extent_readpage function is invoked
> > twice, once for each bio.
> >
> > |-+-+-|
> > | Task A  | Task B  | Task C  |
> > |-+-+-|
> > | end_bio_extent_readpage | | |
> > | process block 0 | | |
> > | - clear BLK_STATE_IO| | |
> > | - page_read_complete| | |
> > | process block 1 | | |
> > | | | |
> > | | | |
> > | | end_bio_extent_readpage | |
> > | | process block 0 | |
> > | | - clear BLK_STATE_IO| |
> > | | - page_read_complete| |
> > | | process block 1 | |
> > | | | |
> > | process block 11| process block 3 | |
> > | - clear BLK_STATE_IO| - clear BLK_STATE_IO| |
> > | - page_read_complete| - page_read_complete| |
> > |   - returns true|   - returns true| |
> > |   - unlock_page()   | | |
> > | | | lock_page() |
> > | |   - unlock_page()   | |
> > |-+-+-|
> >
> > We end up incorrectly unlocking the page twice and "Task C" ends up
> > working on an unlocked page. So private->io_lock makes sure that only
> > one of the tasks gets "true" as the return value when page_io_complete()
> > is invoked. As an optimization the patch gets the io_lock only when the
> > last block of the bio_vec is being processed.
> >
> > Signed-off-by: Chandan Rajendra 
> > ---
> >  fs/btrfs/extent_io.c | 371 ---
> >  fs/btrfs/extent_io.h |  74 +-
> >  fs/btrfs/inode.c |  16 +--
> >  3 files changed, 338 insertions(+), 123 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index e197d47..a349f99 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -24,6 +24,7 @@
> >
> >  static struct kmem_cache *extent_state_cache;
> >  static struct kmem_cache *extent_buffer_cache;
> > +static struct kmem_cache *page_private_cache;
> >  static struct bio_set *btrfs_bioset;
> >
> >  static inline bool extent_state_in_tree(const struct extent_state *state)
> > @@ -174,10 +175,16 @@ int __init extent_io_init(void)
> > if (!extent_buffer_cache)
> > goto free_state_cache;
> >
> > +   page_private_cache = kmem_cache_create("btrfs_page_private",
> > +   sizeof(struct btrfs_page_private), 0,
> > +   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
> > +   if (!page_private_cache)
> > +   goto free_buffer_cache;
> > +
> > btrfs_bioset = bioset_create(BIO_POOL_SIZE,
> >  offsetof(struct btrfs_io_bio, bio));
> > if (!btrfs_bioset)
> > -   goto free_buffer_cache;
> > +   goto free_page_priv
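
The double-unlock scenario in the quoted commit message can be replayed
without threads. The sketch below is a simplified userspace model under
stated assumptions, not the patch's code: struct page_state, clear_blocks()
and clear_blocks_and_test() are hypothetical names, and a 16-bit mask stands
in for the per-block state of a 64k page with 4k blocks. It replays one bad
interleaving sequentially (both tasks clear their bits, then both check) and
compares it with a variant where clearing and checking form one indivisible
step, which is the role the commit message assigns to io_lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-page state; not the kernel structure. */
struct page_state {
	uint16_t io_pending;	/* one bit per 4k block of a 64k page */
};

/* Unlocked variant, step 1: clear the bits owned by one bio. */
static void clear_blocks(struct page_state *ps, uint16_t mask)
{
	ps->io_pending &= (uint16_t)~mask;
}

/* Unlocked variant, step 2: report whether any block is still pending. */
static bool page_read_complete(const struct page_state *ps)
{
	return ps->io_pending == 0;
}

/*
 * Locked variant: clear and test are one indivisible step (in a real
 * implementation this would run with the lock held), so only the caller
 * that clears the last pending bit sees true.
 */
static bool clear_blocks_and_test(struct page_state *ps, uint16_t mask)
{
	bool had_pending = (ps->io_pending & mask) != 0;

	clear_blocks(ps, mask);
	return had_pending && page_read_complete(ps);
}

/* Replays the bad interleaving; returns how many tasks would unlock the page. */
static int unlocks_without_lock(void)
{
	struct page_state ps = { .io_pending = 0xffff };
	int unlocks = 0;

	clear_blocks(&ps, 0x0fff);		/* Task A: blocks 0-11 */
	clear_blocks(&ps, 0xf000);		/* Task B: blocks 12-15 */
	unlocks += page_read_complete(&ps);	/* Task A checks late */
	unlocks += page_read_complete(&ps);	/* Task B checks late */
	return unlocks;
}

/* Same two completions, but each clear-and-test is indivisible. */
static int unlocks_with_lock(void)
{
	struct page_state ps = { .io_pending = 0xffff };
	int unlocks = 0;

	unlocks += clear_blocks_and_test(&ps, 0x0fff);	/* Task A */
	unlocks += clear_blocks_and_test(&ps, 0xf000);	/* Task B */
	return unlocks;
}
```

In this model the unlocked replay lets both tasks conclude that the page is
complete, while the combined step lets exactly one of them do so.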

Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems

2016-07-19 Thread Chandan Rajendra
On Monday, July 18, 2016 03:31:04 PM Omar Sandoval wrote:
> On Mon, Jul 18, 2016 at 02:43:26PM -0400, Chris Mason wrote:
> > 
> > 
> > On 07/17/2016 08:19 AM, Chandan Rajendra wrote:
> > > On Friday, July 15, 2016 12:15:15 PM Omar Sandoval wrote:
> > > > On Fri, Jul 15, 2016 at 12:34:10PM +0530, Chandan Rajendra wrote:
> > > > > On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> > > > > > On 07/14/2016 07:31 PM, Omar Sandoval wrote:
> > > > > > > From: Omar Sandoval 
> > > > > > > 
> > > > > > > So it turns out that the free space tree bitmap handling has always been
> > > > > > > broken on big-endian systems. Totally my bad.
> > > > > > > 
> > > > > > > Patch 1 fixes this. Technically, it's a disk format change for
> > > > > > > big-endian systems, but it never could have worked before, so I won't go
> > > > > > > through the trouble of any incompat bits. If you've somehow been using
> > > > > > > space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> > > > > > > to want to mount with nospace_cache to clear it and wait for this to go
> > > > > > > in.
> > > > > > > 
> > > > > > > Patch 2 fixes a similar error in the sanity tests (it's the same as the
> > > > > > > v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> > > > > > > oversight that patch 1 fixes.
> > > > > > > 
> > > > > > > Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> > > > > > > pass on x86_64 and MIPS.
> > > > > > 
> > > > > > Thanks for fixing this up Omar.  Any big endian friends want to try this
> > > > > > out in extended testing and make sure we've nailed it down?
> > > > > > 
> > > > > 
> > > > > Hi Omar & Chris,
> > > > > 
> > > > > I will run fstests with this patchset applied on ppc64 BE and inform you about
> > > > > the results.
> > > > > 
> > > > 
> > > > Thanks, Chandan! I set up my xfstests for space_cache=v2 by doing:
> > > > 
> > > > mkfs.btrfs "$TEST_DEV"
> > > > mount -o space_cache=v2 "$TEST_DEV" "$TEST_DIR"
> > > > umount "$TEST_DEV"
> > > > 
> > > > and adding
> > > > 
> > > > export MOUNT_OPTIONS="-o space_cache=v2"
> > > > 
> > > > to local.config. btrfsck also needs the patch here [1].
> > > > 
> > > > 
> > > 
> > > Hi,
> > > 
> > > I did execute the fstests tests suite on ppc64 BE as per above configuration
> > > and there were no new regressions. Also, I did execute fsx (via generic/127)
> > > thrice on the same filesystem instance,
> > > 1. With the unpatched kernel and later
> > > 2. With the patched kernel and again
> > > 3. With the unpatched kernel
> > > ... there were no new regressions when executing the above steps.
> > 
> > Thanks Chandan!  But I'm a little confused.  If the patch is helping, we
> > should be storing bitmaps wrong on disk unpatched.  There should be problems
> > going back and forth.
> > 
> > -chris
> 
> Yeah, this should definitely not work. It's possible that things are
> just silently failing and getting corrupted if the module isn't built
> with CONFIG_BTRFS_ASSERT, but btrfsck v4.6.1 + my patch should catch
> that.
> 
> Chandan, is fsx creating enough fragmentation to trigger the switch to
> bitmaps? You can check with `btrfs inspect dump-tree`; there should be
> FREE_SPACE_BITMAP items. If there are only FREE_SPACE_EXTENT items, then
> it's not testing the right code path.
> 
> I have a script here [1] that I've been using to test the free space
> tree. When I ran it with `--check` on MIPS, it failed on the old kernel
> and passed with this series. If you stick a return after the call to
> `unlink_every_other_file()`, you'll get a nice, fragmented filesystem to
> feed to xfstests, as well.

You are right. There were only FREE_SPACE_EXTENT items in the filesystem that
was operated on by fsx. I ran fragment_free_space_tree.py to create a
filesystem with FREE_SPACE_BITMAP items. When such a filesystem is created
with the unpatched kernel, then mounted on a patched kernel and exercised with
fsx, assertions in free-space-tree.c fail. For example:

BTRFS error (device loop0): incorrect extent count for 289406976; counted 8186, expected 8192
BTRFS: assertion failed: 0, file: /root/repos/linux/fs/btrfs/free-space-tree.c, line: 1485
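
The thread does not spell out why the on-disk bitmaps differ between
kernels. One common way such a mismatch arises, offered here only as an
assumption, is that a bitmap addressed in CPU words lands its bits in
different bytes on big-endian than a bitmap addressed byte by byte. The
sketch below simulates the big-endian word layout in portable C so it behaves
the same on any host; byte_bitmap_set(), word_bitmap_set_be() and
first_set_byte() are hypothetical helpers, not btrfs functions.

```c
#include <stdint.h>
#include <string.h>

/* Byte-addressed bitmap: bit n always lands in byte n / 8, on any CPU. */
static void byte_bitmap_set(uint8_t *map, unsigned int n)
{
	map[n / 8] |= (uint8_t)(1u << (n % 8));
}

/*
 * Word-addressed bitmap: set bit n in a 64-bit word, then store the word
 * the way a big-endian CPU lays it out in memory (most significant byte
 * first). The layout is simulated so the result is host independent.
 */
static void word_bitmap_set_be(uint8_t *map, unsigned int n)
{
	uint8_t *word = map + (n / 64) * 8;
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)		/* load as big-endian */
		v = (v << 8) | word[i];
	v |= (uint64_t)1 << (n % 64);
	for (i = 7; i >= 0; i--) {	/* store as big-endian */
		word[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Sets bit 0 with the chosen scheme and returns the index of the byte it hit. */
static int first_set_byte(int use_word_layout)
{
	uint8_t map[8];
	int i;

	memset(map, 0, sizeof(map));
	if (use_word_layout)
		word_bitmap_set_be(map, 0);
	else
		byte_bitmap_set(map, 0);
	for (i = 0; i < 8; i++)
		if (map[i])
			return i;
	return -1;
}
```

Under this model, bit 0 ends up in byte 0 with byte addressing and in byte 7
with big-endian word addressing, so a bitmap written one way and read the
other way describes different free space.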

-- 
chandan



Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems

2016-07-17 Thread Chandan Rajendra
On Friday, July 15, 2016 12:15:15 PM Omar Sandoval wrote:
> On Fri, Jul 15, 2016 at 12:34:10PM +0530, Chandan Rajendra wrote:
> > On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> > > On 07/14/2016 07:31 PM, Omar Sandoval wrote:
> > > > From: Omar Sandoval 
> > > >
> > > > So it turns out that the free space tree bitmap handling has always been
> > > > broken on big-endian systems. Totally my bad.
> > > >
> > > > Patch 1 fixes this. Technically, it's a disk format change for
> > > > big-endian systems, but it never could have worked before, so I won't go
> > > > through the trouble of any incompat bits. If you've somehow been using
> > > > space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> > > > to want to mount with nospace_cache to clear it and wait for this to go
> > > > in.
> > > >
> > > > Patch 2 fixes a similar error in the sanity tests (it's the same as the
> > > > v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> > > > oversight that patch 1 fixes.
> > > >
> > > > Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> > > > pass on x86_64 and MIPS.
> > > 
> > > Thanks for fixing this up Omar.  Any big endian friends want to try this 
> > > out in extended testing and make sure we've nailed it down?
> > >
> > 
> > Hi Omar & Chris,
> > 
> > I will run fstests with this patchset applied on ppc64 BE and inform you about
> > the results.
> > 
> 
> Thanks, Chandan! I set up my xfstests for space_cache=v2 by doing:
> 
> mkfs.btrfs "$TEST_DEV"
> mount -o space_cache=v2 "$TEST_DEV" "$TEST_DIR"
> umount "$TEST_DEV"
> 
> and adding
> 
> export MOUNT_OPTIONS="-o space_cache=v2"
> 
> to local.config. btrfsck also needs the patch here [1].
> 
> 

Hi,

I ran the fstests test suite on ppc64 BE with the above configuration and
there were no new regressions. I also ran fsx (via generic/127) three times on
the same filesystem instance:
1. With the unpatched kernel,
2. then with the patched kernel,
3. and again with the unpatched kernel.
None of these runs showed new regressions.

Tested-by: Chandan Rajendra 

-- 
chandan



Re: [PATCH] btrfs: allocate exact page array size in extent_buffer

2016-07-15 Thread Chandan Rajendra
On Friday, July 15, 2016 11:44:06 AM David Sterba wrote:
> On Fri, Jul 15, 2016 at 11:47:07AM +0530, Chandan Rajendra wrote:
> > On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
> > > The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE,
> > > but this wastes 15 unused pointers on arches with large page size. Eg.
> > > on ppc64 this gives 15 * 8 = 120 bytes.
> > >
> > 
> > The non PAGE_SIZE aligned extent buffer usage in page straddling tests in
> > test_eb_bitmaps() need atleast one more page. So how about the following ...
> > 
> > #define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)
> 
> Could the extra page pointer be normally used? Ie. not just for the sake
> of the tests. I'd rather not waste the bytes. As a compromise, we can do +1
> only if the tests are compiled in.
> 

I don't see any other scenario where the extra page pointer gets used. Also, I
just ran fstests with your patch applied and self-tests disabled in the
kernel configuration. The tests ran fine.

-- 
chandan



Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems

2016-07-15 Thread Chandan Rajendra
On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> On 07/14/2016 07:31 PM, Omar Sandoval wrote:
> > From: Omar Sandoval 
> >
> > So it turns out that the free space tree bitmap handling has always been
> > broken on big-endian systems. Totally my bad.
> >
> > Patch 1 fixes this. Technically, it's a disk format change for
> > big-endian systems, but it never could have worked before, so I won't go
> > through the trouble of any incompat bits. If you've somehow been using
> > space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> > to want to mount with nospace_cache to clear it and wait for this to go
> > in.
> >
> > Patch 2 fixes a similar error in the sanity tests (it's the same as the
> > v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> > oversight that patch 1 fixes.
> >
> > Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> > pass on x86_64 and MIPS.
> 
> Thanks for fixing this up Omar.  Any big endian friends want to try this 
> out in extended testing and make sure we've nailed it down?
>

Hi Omar & Chris,

I will run fstests with this patchset applied on ppc64 BE and inform you about
the results.


-- 
chandan



Re: [PATCH] btrfs: allocate exact page array size in extent_buffer

2016-07-14 Thread Chandan Rajendra
On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
> The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE,
> but this wastes 15 unused pointers on arches with large page size. Eg.
> on ppc64 this gives 15 * 8 = 120 bytes.
>

The page-straddling tests in test_eb_bitmaps() use an extent buffer that is not
PAGE_SIZE aligned, so they need at least one more page. So how about the
following ...

#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)
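
To make the arithmetic behind the "+ 1" concrete, here is a small sketch. It
is only an illustration: pages_spanned() and wasted_pointer_bytes() are
hypothetical helpers, and the 65536 constant is the
BTRFS_MAX_METADATA_BLOCKSIZE value from the quoted patch.

```c
#include <stddef.h>

#define MAX_METADATA_BLOCKSIZE 65536UL	/* value from the quoted patch */

/* Number of pages touched by len bytes starting at byte offset start. */
static unsigned long pages_spanned(unsigned long start, unsigned long len,
				   unsigned long page_size)
{
	unsigned long first = start / page_size;
	unsigned long last = (start + len - 1) / page_size;

	return last - first + 1;
}

/* Bytes of unused page pointers when the array is sized for 4k pages. */
static size_t wasted_pointer_bytes(unsigned long page_size, size_t ptr_size)
{
	unsigned long fixed = MAX_METADATA_BLOCKSIZE / 4096;	/* 16 slots */
	unsigned long needed = MAX_METADATA_BLOCKSIZE / page_size;

	return (fixed - needed) * ptr_size;
}
```

With 64k pages, an aligned 64k buffer touches one page, but the same buffer
starting 4k into a page touches two, which is the case the page-straddling
tests exercise. The second helper reproduces the 15 * 8 = 120 bytes quoted
for ppc64.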


> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/ctree.h | 6 --
>  fs/btrfs/extent_io.c | 2 ++
>  fs/btrfs/extent_io.h | 8 +++-
>  3 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4274a7bfdaed..f914f6187753 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -66,12 +66,6 @@ struct btrfs_ordered_sum;
>  #define BTRFS_COMPAT_EXTENT_TREE_V0
> 
>  /*
> - * the max metadata block size.  This limit is somewhat artificial,
> - * but the memmove costs go through the roof for larger blocks.
> - */
> -#define BTRFS_MAX_METADATA_BLOCKSIZE 65536
> -
> -/*
>   * we can actually store much bigger names, but lets not confuse the rest
>   * of linux
>   */
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 75533adef998..6f468a1842e6 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4660,6 +4660,8 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
>   /*
>* Sanity checks, currently the maximum is 64k covered by 16x 4k pages
>*/
> + BUILD_BUG_ON(INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE
> + != BTRFS_MAX_METADATA_BLOCKSIZE);
>   BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
>   > MAX_INLINE_EXTENT_BUFFER_SIZE);
>   BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index c0c1c4fef6ce..edfa1a0ab82b 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -4,6 +4,12 @@
>  #include 
>  #include "ulist.h"
> 
> +/*
> + * The maximum metadata block size.  This limit is somewhat artificial,
> + * but the memmove costs go through the roof for larger blocks.
> + */
> +#define BTRFS_MAX_METADATA_BLOCKSIZE (65536U)
> +
>  /* bits for the extent state */
>  #define EXTENT_DIRTY (1U << 0)
>  #define EXTENT_WRITEBACK (1U << 1)
> @@ -118,7 +124,7 @@ struct extent_state {
>  #endif
>  };
> 
> -#define INLINE_EXTENT_BUFFER_PAGES 16
> +#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE)
>  #define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
>  struct extent_buffer {
>   u64 start;
> 

-- 
chandan



Re: A lot warnings in dmesg while running thunderbird

2016-07-10 Thread Chandan Rajendra
On Friday, July 08, 2016 12:02:35 PM Chris Mason wrote:
> 
> On 07/08/2016 11:02 AM, Gabriel C wrote:
> > On 08.07.2016 14:41, Chris Mason wrote:
> >
> >>
> >>
> >> On 07/08/2016 05:57 AM, Gabriel C wrote:
> >>> 2016-07-07 21:21 GMT+02:00 Chris Mason :
> >>>>
> >>>>
> >>>> On 07/07/2016 06:24 AM, Gabriel C wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> while running thunderbird on linux 4.6.3 and 4.7.0-rc6 ( didn't tested
> >>>>> other versions )
> >>>>> I trigger the following :
> >>>>
> >>>>
> >>>> I definitely thought we had this fixed in v4.7-rc.  Can you easily
> >>>> fsck this filesystem?  Something strange is going on.
> >>>
> >>> Yes , btrfs check and btrfs check  --check-data-csum are fine , no
> >>> errors found.
> >>>
> >>> If you want me to test any patches let me know.
> >>>
> >>
> >> Can you please try a v4.5 stable kernel?  I'm curious if this really
> >> is the same regression that I tried to fix in v4.7
> >>
> >
> > I'm on linux 4.5.7 now and everything is fine. I'm writing this email
> > from thunderbird.. which was not
> > possible in 4.6.3 or 4.7.-rc.
> >
> > Let me know you want me to test other kernels or whatever else may help
> > fixing this problem.
> >
> 
> Can you please run the attached test program:
> 
> gcc -o short-write short-write.c -lpthread
> ./short-write some-new-file-on-btrfs
> 
> I want to see if you're triggering the same problem we've tried to fix, 
> or something else.
>

Hi Chris,

I am able to reproduce the issue with the 'short-write' program. But before
the call trace associated with btrfs_destroy_inode(), I see the following call
trace ...

------------[ cut here ]------------
WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100
Modules linked in:
CPU: 2 PID: 2311 Comm: short-write Not tainted 4.7.0-rc6-ga99cde4 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  8818ceb8ba30 8145c2a1 
  8818ceb8ba70 81056a7c 10cf81346936
 8818bdba4800 1000 8818bdd5ee00 8818bf1bbd84
Call Trace:
 [] dump_stack+0x4d/0x6c
 [] __warn+0xcc/0xf0
 [] warn_slowpath_null+0x18/0x20
 [] btrfs_free_reserved_data_space_noquota+0xe8/0x100
 [] btrfs_clear_bit_hook+0x2f9/0x370
 [] clear_state_bit+0x55/0x1b0
 [] __clear_extent_bit+0x220/0x3b0
 [] ? __btrfs_qgroup_release_data+0x82/0x110
 [] clear_extent_bit+0x25/0x30
 [] btrfs_invalidatepage+0x273/0x2c0
 [] truncate_inode_page+0x83/0x90
 [] truncate_inode_pages_range+0x17a/0x6c0
 [] truncate_pagecache+0x42/0x60
 [] truncate_setsize+0x2d/0x40
 [] btrfs_setattr+0x1ef/0x320
 [] notify_change+0x1dc/0x380
 [] do_truncate+0x61/0xa0
 [] do_sys_ftruncate.constprop.17+0xf9/0x160
 [] SyS_ftruncate+0x9/0x10
 [] entry_SYSCALL_64_fastpath+0x13/0x8f
---[ end trace 5682b0d8e8a631ed ]---


I will continue to debug and find out the root cause.

-- 
chandan



[PATCH V20 14/19] Btrfs: subpage-blocksize: Enable dedupe ioctl

2016-07-03 Thread Chandan Rajendra
The function implementing the dedupe ioctl, i.e.
btrfs_ioctl_file_extent_same(), returns an error in the
subpage-blocksize scenario because Btrfs did not have code to deal with
block size < page size. This commit removes the restriction since we now
support "block size < page size".

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fb92566..5d9062e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3325,21 +3325,11 @@ ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
 {
 	struct inode *src = file_inode(src_file);
 	struct inode *dst = file_inode(dst_file);
-	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 	ssize_t res;
 
 	if (olen > BTRFS_MAX_DEDUPE_LEN)
 		olen = BTRFS_MAX_DEDUPE_LEN;
 
-	if (WARN_ON_ONCE(bs < PAGE_SIZE)) {
-		/*
-		 * Btrfs does not support blocksize < page_size. As a
-		 * result, btrfs_cmp_data() won't correctly handle
-		 * this situation without an update.
-		 */
-		return -EINVAL;
-	}
-
 	res = btrfs_extent_same(src, loff, olen, dst, dst_loff);
 	if (res)
 		return res;
-- 
2.5.5



[PATCH V20 19/19] Btrfs: subpage-blocksize: Rate limit scrub error message

2016-07-03 Thread Chandan Rajendra
btrfs/073 invokes the scrub ioctl in a tight loop. In the subpage-blocksize
scenario this results in a lot of "scrub: size assumption sectorsize !=
PAGE_SIZE" messages being printed on the console. To reduce the number
of such messages this commit uses btrfs_err_rl() instead of
btrfs_err().

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/scrub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 86270c6..68c8a09 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3856,7 +3856,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
 	if (fs_info->chunk_root->sectorsize != PAGE_SIZE) {
 		/* not supported for data w/o checksums */
-		btrfs_err(fs_info,
+		btrfs_err_rl(fs_info,
 			   "scrub: size assumption sectorsize != PAGE_SIZE "
 			   "(%d != %lu) fails",
 			   fs_info->chunk_root->sectorsize, PAGE_SIZE);
-- 
2.5.5
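
As background on what rate limiting buys in a tight loop, here is a minimal
userspace sketch of a window-and-burst limiter. It is an assumption-laden
illustration, not how btrfs_err_rl() is implemented: struct rl_sketch,
rl_allow() and the interval/burst values are hypothetical, and time is passed
in by the caller so the behaviour is deterministic.

```c
#include <stdbool.h>

/* Hypothetical limiter state; the kernel's own ratelimit state differs. */
struct rl_sketch {
	long interval;	/* window length, in the caller's time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
};

/* Returns true if a message may be emitted at time now. */
static bool rl_allow(struct rl_sketch *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* A new window starts: reset the budget. */
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* Counts how many of n back-to-back messages at the same instant get through. */
static int allowed_in_tight_loop(int n)
{
	struct rl_sketch rl = { .interval = 5, .burst = 10, .begin = 0, .printed = 0 };
	int allowed = 0;
	int i;

	for (i = 0; i < n; i++)
		allowed += rl_allow(&rl, 0);
	return allowed;
}
```

However many times the loop tries to print within one window, only the burst
budget gets through; the rest are dropped until the window rolls over.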



[PATCH V20 09/19] Btrfs: subpage-blocksize: Explicitly track I/O status of blocks of an ordered extent.

2016-07-03 Thread Chandan Rajendra
In the subpage-blocksize scenario a page can have more than one block. So in
addition to the PagePrivate2 flag, we have to track the I/O status of
each block of a page to reliably mark the ordered extent as complete.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c|  19 +--
 fs/btrfs/extent_io.h|   5 +-
 fs/btrfs/inode.c| 365 ++--
 fs/btrfs/ordered-data.c |  19 +++
 fs/btrfs/ordered-data.h |   4 +
 5 files changed, 296 insertions(+), 116 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 303b49e..694d2dc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4661,11 +4661,10 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  * to drop the page.
  */
 static int try_release_extent_state(struct extent_map_tree *map,
-   struct extent_io_tree *tree,
-   struct page *page, gfp_t mask)
+   struct extent_io_tree *tree,
+   struct page *page, u64 start, u64 end,
+   gfp_t mask)
 {
-   u64 start = page_offset(page);
-   u64 end = start + PAGE_SIZE - 1;
int ret = 1;
 
if (test_range_bit(tree, start, end,
@@ -4699,12 +4698,12 @@ static int try_release_extent_state(struct extent_map_tree *map,
  * map records are removed
  */
 int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask)
+   struct extent_io_tree *tree, struct page *page,
+   u64 start, u64 end, gfp_t mask)
 {
struct extent_map *em;
-   u64 start = page_offset(page);
-   u64 end = start + PAGE_SIZE - 1;
+   u64 orig_start = start;
+   u64 orig_end = end;
 
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
@@ -4738,7 +4737,9 @@ int try_release_extent_mapping(struct extent_map_tree *map,
free_extent_map(em);
}
}
-   return try_release_extent_state(map, tree, page, mask);
+   return try_release_extent_state(map, tree, page,
+   orig_start, orig_end,
+   mask);
 }
 
 /*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 91e7a75..2ea8451 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -279,8 +279,9 @@ typedef struct extent_map *(get_extent_t)(struct inode *inode,
 void extent_io_tree_init(struct extent_io_tree *tree,
 struct address_space *mapping);
 int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask);
+   struct extent_io_tree *tree, struct page *page,
+   u64 start, u64 end,
+   gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 struct extent_state **cached);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e9f9bb1..4ae5c25 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3072,56 +3072,119 @@ static void finish_ordered_fn(struct btrfs_work *work)
btrfs_finish_ordered_io(ordered_extent);
 }
 
-static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
-   struct extent_state *state, int uptodate)
+static void mark_blks_io_complete(struct btrfs_ordered_extent *ordered,
+   u64 blk, u64 nr_blks, int uptodate)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = ordered->inode;
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_ordered_extent *ordered_extent = NULL;
struct btrfs_workqueue *wq;
btrfs_work_func_t func;
-   u64 ordered_start, ordered_end;
int done;
 
-   trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
+   while (nr_blks--) {
+   if (test_and_set_bit(blk, ordered->blocks_done)) {
+   blk++;
+   continue;
+   }
 
-   ClearPagePrivate2(page);
-loop:
-   ordered_extent = btrfs_lookup_ordered_range(inode, start,
-   end - start + 1);
-   if (!ordered_extent)
-   goto out;
+   done = btrfs_dec_test_ordered_pending(inode, &ordered,
+   ordered->file_offset
+   + (blk << inode->i_blkbits),
+   root->sectorsize,
+

[PATCH V20 10/19] Btrfs: subpage-blocksize: btrfs_punch_hole: Fix uptodate blocks check

2016-07-03 Thread Chandan Rajendra
In the subpage-blocksize scenario, the file blocks to be punched may map
only part of a page. For file blocks inside such pages, we need to check
for the presence of the BLK_STATE_UPTODATE flag.
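
As a rough illustration of when a punched range covers only part of a
page, here is a userspace sketch. The helper names and the 64K page size
are assumptions made for the example; only the arithmetic mirrors the
checks on lockstart and lockend in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(65536) /* assumed 64K page */

/* True if the hole starts in the middle of a page. */
static bool hole_starts_mid_page(uint64_t lockstart)
{
	return (lockstart & (SKETCH_PAGE_SIZE - 1)) != 0;
}

/* True if the hole ends before its last page does (lockend is inclusive). */
static bool hole_ends_mid_page(uint64_t lockend)
{
	return ((lockend + 1) & (SKETCH_PAGE_SIZE - 1)) != 0;
}

/* True if the first and the last byte of the hole share one page. */
static bool hole_in_single_page(uint64_t lockstart, uint64_t lockend)
{
	assert(lockstart <= lockend);
	return lockstart / SKETCH_PAGE_SIZE == lockend / SKETCH_PAGE_SIZE;
}
```

For a hole covering bytes 4096 to 8191 of a 64K page, all three helpers
return true, which is the case where per-block uptodate state has to be
consulted instead of the page-level flag.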

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c | 89 -
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 38f5e8e..89ded7b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2335,6 +2335,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
struct btrfs_path *path;
struct btrfs_block_rsv *rsv;
struct btrfs_trans_handle *trans;
+   struct address_space *mapping = inode->i_mapping;
+   pgoff_t start_index, end_index;
u64 lockstart;
u64 lockend;
u64 tail_start;
@@ -2347,6 +2349,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
int err = 0;
unsigned int rsv_count;
bool same_block;
+   bool same_page;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
bool truncated_block = false;
@@ -2443,11 +2446,45 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
goto out_only_mutex;
}
 
+   start_index = lockstart >> PAGE_SHIFT;
+   end_index = lockend >> PAGE_SHIFT;
+
+   same_page = lockstart >> PAGE_SHIFT
+   == lockend >> PAGE_SHIFT;
+
while (1) {
struct btrfs_ordered_extent *ordered;
+   struct page *start_page = NULL;
+   struct page *end_page = NULL;
+   u64 nr_pages;
+   int start_page_blks_uptodate;
+   int end_page_blks_uptodate;
 
truncate_pagecache_range(inode, lockstart, lockend);
 
+   if (lockstart & (PAGE_SIZE - 1)) {
+   start_page = find_or_create_page(mapping, start_index,
+   GFP_NOFS);
+   if (!start_page) {
+   inode_unlock(inode);
+   return -ENOMEM;
+   }
+   }
+
+   if (!same_page && ((lockend + 1) & (PAGE_SIZE - 1))) {
+   end_page = find_or_create_page(mapping, end_index,
+   GFP_NOFS);
+   if (!end_page) {
+   if (start_page) {
+   unlock_page(start_page);
+   put_page(start_page);
+   }
+   inode_unlock(inode);
+   return -ENOMEM;
+   }
+   }
+
+
lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 &cached_state);
ordered = btrfs_lookup_first_ordered_extent(inode, lockend);
@@ -2457,18 +2494,68 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 * and nobody raced in and read a page in this range, if we did
 * we need to try again.
 */
+   nr_pages = round_up(lockend, PAGE_SIZE)
+   - round_down(lockstart, PAGE_SIZE);
+   nr_pages >>= PAGE_SHIFT;
+
+   start_page_blks_uptodate = 0;
+   end_page_blks_uptodate = 0;
+   if (root->sectorsize < PAGE_SIZE) {
+   u64 page_end;
+
+   page_end = round_down(lockstart, PAGE_SIZE)
+   + PAGE_SIZE - 1;
+   page_end = min(page_end, lockend);
+   if (start_page
+   && PagePrivate(start_page)
+   && test_page_blks_state(start_page, 1 << BLK_STATE_UPTODATE,
+   lockstart, page_end, 0))
+   start_page_blks_uptodate = 1;
+   if (end_page
+   && PagePrivate(end_page)
+   && test_page_blks_state(end_page, 1 << BLK_STATE_UPTODATE,
+   page_offset(end_page), lockend, 0))
+   end_page_blks_uptodate = 1;
+   } else {
+   if (start_page && PagePrivate(start_page)
+   && PageUptodate(start_page))
+   start_page_blks_uptodate = 1;
+   if (end_page && PagePrivate(end_page)
+   && PageUptodate(end_page))
+

[PATCH V20 12/19] Revert "btrfs: fix lockups from btrfs_clear_path_blocking"

2016-07-03 Thread Chandan Rajendra
The patch "Btrfs: subpage-blocksize: Prevent writes to an extent buffer
when PG_writeback flag is set" requires btrfs_try_tree_write_lock() to
be a true try lock w.r.t. both spinning and blocking locks. During the
2015 Vault Conference Btrfs meetup, Chris Mason suggested that he would
write a suitable locking function to be used when writing dirty pages
that map metadata blocks. Until such a locking function is available,
this patch temporarily reverts commit
f82c458a2c3ffb94b431fc6ad791a79df1b3713e.
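
To make the "true try lock" requirement concrete, the following is a
minimal userspace sketch using C11 atomics. It is not the Btrfs locking
code: struct sketch_lock and its helpers are hypothetical, and the
blocking_holders counter only loosely stands in for the blocking
readers/writers counts of an extent buffer. The point is that the
acquire path makes a single attempt and never spins or sleeps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock: one spinning flag plus a count of blocking holders. */
struct sketch_lock {
	atomic_flag spinning;
	atomic_int blocking_holders;
};

static void sketch_lock_init(struct sketch_lock *l)
{
	atomic_flag_clear(&l->spinning);
	atomic_init(&l->blocking_holders, 0);
}

/*
 * Try to take the lock for writing. Makes exactly one attempt and
 * returns false instead of waiting, both when the spinning flag is
 * already taken and when a blocking holder exists.
 */
static bool sketch_try_write_lock(struct sketch_lock *l)
{
	assert(l != NULL);
	if (atomic_load(&l->blocking_holders) > 0)
		return false;
	if (atomic_flag_test_and_set(&l->spinning))
		return false;
	/* Recheck: a blocking holder may have appeared in the meantime. */
	if (atomic_load(&l->blocking_holders) > 0) {
		atomic_flag_clear(&l->spinning);
		return false;
	}
	return true;
}

static void sketch_write_unlock(struct sketch_lock *l)
{
	atomic_flag_clear(&l->spinning);
}
```

A caller that fails to get the lock is expected to back off and retry
later, which is what a writeback path needs when it must not sleep while
holding other locks.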
---
 fs/btrfs/ctree.c   | 14 --
 fs/btrfs/locking.c | 24 +++-
 fs/btrfs/locking.h |  2 --
 3 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 0a56d1b..394ad8e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -81,6 +81,13 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
int i;
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+   /* lockdep really cares that we take all of these spinlocks
+* in the right order.  If any of the locks in the path are not
+* currently blocking, it is going to complain.  So, make really
+* really sure by forcing the path to blocking before we clear
+* the path blocking.
+*/
if (held) {
btrfs_set_lock_blocking_rw(held, held_rw);
if (held_rw == BTRFS_WRITE_LOCK)
@@ -89,6 +96,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
held_rw = BTRFS_READ_LOCK_BLOCKING;
}
btrfs_set_path_blocking(p);
+#endif
 
for (i = BTRFS_MAX_LEVEL - 1; i >= 0; i--) {
if (p->nodes[i] && p->locks[i]) {
@@ -100,8 +108,10 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
}
}
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
if (held)
btrfs_clear_lock_blocking_rw(held, held_rw);
+#endif
 }
 
 /* this also releases the path */
@@ -2922,7 +2932,7 @@ cow_done:
}
p->locks[level] = BTRFS_WRITE_LOCK;
} else {
-   err = btrfs_tree_read_lock_atomic(b);
+   err = btrfs_try_tree_read_lock(b);
if (!err) {
btrfs_set_path_blocking(p);
btrfs_tree_read_lock(b);
@@ -3054,7 +3064,7 @@ again:
}
 
level = btrfs_header_level(b);
-   err = btrfs_tree_read_lock_atomic(b);
+   err = btrfs_try_tree_read_lock(b);
if (!err) {
btrfs_set_path_blocking(p);
btrfs_tree_read_lock(b);
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index d13128c..8b50e60 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -132,26 +132,6 @@ again:
 }
 
 /*
- * take a spinning read lock.
- * returns 1 if we get the read lock and 0 if we don't
- * this won't wait for blocking writers
- */
-int btrfs_tree_read_lock_atomic(struct extent_buffer *eb)
-{
-   if (atomic_read(&eb->blocking_writers))
-   return 0;
-
-   read_lock(&eb->lock);
-   if (atomic_read(&eb->blocking_writers)) {
-   read_unlock(&eb->lock);
-   return 0;
-   }
-   atomic_inc(&eb->read_locks);
-   atomic_inc(&eb->spinning_readers);
-   return 1;
-}
-
-/*
  * returns 1 if we get the read lock and 0 if we don't
  * this won't wait for blocking writers
  */
@@ -182,7 +162,9 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb)
atomic_read(&eb->blocking_readers))
return 0;
 
-   write_lock(&eb->lock);
+   if (!write_trylock(&eb->lock))
+   return 0;
+
if (atomic_read(&eb->blocking_writers) ||
atomic_read(&eb->blocking_readers)) {
write_unlock(&eb->lock);
diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index c44a9d5..b81e0e9 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -35,8 +35,6 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw);
 void btrfs_assert_tree_locked(struct extent_buffer *eb);
 int btrfs_try_tree_read_lock(struct extent_buffer *eb);
 int btrfs_try_tree_write_lock(struct extent_buffer *eb);
-int btrfs_tree_read_lock_atomic(struct extent_buffer *eb);
-
 
 static inline void btrfs_tree_unlock_rw(struct extent_buffer *eb, int rw)
 {
-- 
2.5.5



[PATCH V20 13/19] Btrfs: subpage-blocksize: Fix file defragmentation code

2016-07-03 Thread Chandan Rajendra
This commit gets the file defragmentation code to work in the
subpage-blocksize scenario. It does this by keeping track of page offsets
that mark block boundaries and passing them as arguments to the functions
that implement the defragmentation logic.
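
The block arithmetic behind this can be sketched in userspace as below.
The helper names and the 64K page size are assumptions for the example;
the formulas follow the start_blk and page_cnt computations in
cluster_pages_for_defrag() as changed by this patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16 /* assumed 64K page */
#define SKETCH_PAGE_SIZE (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/*
 * Index of the first block to defragment, given the index of the page it
 * lives in and its byte offset inside that page.
 */
static uint64_t first_block(uint64_t start_index, uint64_t pg_offset,
			    unsigned int blkbits)
{
	assert(pg_offset < SKETCH_PAGE_SIZE);
	return ((start_index << SKETCH_PAGE_SHIFT) + pg_offset) >> blkbits;
}

/*
 * Number of pages touched by blk_cnt blocks that begin pg_offset bytes
 * into the first page (a rounded-up division by the page size).
 */
static uint64_t pages_spanned(uint64_t pg_offset, uint64_t blk_cnt,
			      unsigned int blkbits)
{
	uint64_t bytes = pg_offset + (blk_cnt << blkbits);

	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With 4K blocks, sixteen blocks starting at the beginning of a page fit in
one page, while the same sixteen blocks starting 8192 bytes into a page
spill over into a second page.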

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 198 ++-
 1 file changed, 136 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 001c111..fb92566 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -904,12 +904,13 @@ out_unlock:
 static int check_defrag_in_cache(struct inode *inode, u64 offset, u32 thresh)
 {
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
struct extent_map *em = NULL;
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
u64 end;
 
read_lock(&em_tree->lock);
-   em = lookup_extent_mapping(em_tree, offset, PAGE_SIZE);
+   em = lookup_extent_mapping(em_tree, offset, root->sectorsize);
read_unlock(&em_tree->lock);
 
if (em) {
@@ -999,7 +1000,7 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
struct extent_map *em;
-   u64 len = PAGE_SIZE;
+   u64 len = BTRFS_I(inode)->root->sectorsize;
 
/*
 * hopefully we have this extent in the tree already, try without
@@ -1118,37 +1119,47 @@ out:
  * before calling this.
  */
 static int cluster_pages_for_defrag(struct inode *inode,
-   struct page **pages,
-   unsigned long start_index,
-   unsigned long num_pages)
+   struct page **pages,
+   unsigned long start_index,
+   size_t pg_offset,
+   unsigned long num_blks)
 {
-   unsigned long file_end;
u64 isize = i_size_read(inode);
+   u64 start_blk;
+   u64 end_blk;
u64 page_start;
u64 page_end;
u64 page_cnt;
+   u64 blk_cnt;
int ret;
int i;
int i_done;
struct btrfs_ordered_extent *ordered;
struct extent_state *cached_state = NULL;
struct extent_io_tree *tree;
+   struct btrfs_root *root;
gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
-   file_end = (isize - 1) >> PAGE_SHIFT;
-   if (!isize || start_index > file_end)
+   root = BTRFS_I(inode)->root;
+   start_blk = (start_index << PAGE_SHIFT) + pg_offset;
+   start_blk >>= inode->i_blkbits;
+   end_blk = (isize - 1) >> inode->i_blkbits;
+   if (!isize || start_blk > end_blk)
return 0;
 
-   page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
+   blk_cnt = min_t(u64, (u64)num_blks, (u64)end_blk - start_blk + 1);
 
ret = btrfs_delalloc_reserve_space(inode,
-   start_index << PAGE_SHIFT,
-   page_cnt << PAGE_SHIFT);
+   start_blk << inode->i_blkbits,
+   blk_cnt << inode->i_blkbits);
if (ret)
return ret;
i_done = 0;
tree = &BTRFS_I(inode)->io_tree;
 
+   page_cnt = DIV_ROUND_UP(pg_offset + (blk_cnt << inode->i_blkbits),
+   PAGE_SIZE);
+
/* step one, lock all the pages */
for (i = 0; i < page_cnt; i++) {
struct page *page;
@@ -1159,12 +1170,22 @@ again:
break;
 
page_start = page_offset(page);
-   page_end = page_start + PAGE_SIZE - 1;
+
+   if (i == 0)
+   page_start += pg_offset;
+
+   if (i == page_cnt - 1) {
+   page_end = (start_index << PAGE_SHIFT) + pg_offset;
+   page_end += (blk_cnt << inode->i_blkbits) - 1;
+   } else {
+   page_end = page_offset(page) + PAGE_SIZE - 1;
+   }
+
while (1) {
lock_extent_bits(tree, page_start, page_end,
 &cached_state);
-   ordered = btrfs_lookup_ordered_extent(inode,
- page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start,
+   page_end - page_start + 1);
unlock_extent_cached(tree, page_start, page_end,
 

[PATCH V20 17/19] Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset when moving to a new bio_vec

2016-07-03 Thread Chandan Rajendra
In __btrfs_lookup_bio_sums() we set the file offset value at the
beginning of every iteration of the while loop. This is incorrect since
the blocks mapped by the current bvec->bv_page might not yet have been
completely processed.

This commit fixes the issue by setting the file offset value when we
move to the next bvec of the bio.
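
A simplified userspace model of the corrected walk is shown below. The
struct and function are hypothetical and ignore checksums entirely; they
only show that the file offset is derived from the bvec once, on entry to
that bvec, and then advances one block at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a bio_vec: where its data sits in the file. */
struct sketch_bvec {
	uint64_t page_offset; /* file offset of the page */
	uint32_t bv_offset;   /* offset of the data within the page */
	uint32_t bv_len;      /* length of the data in bytes */
};

/*
 * Record the file offset of every block covered by the bvecs. Returns
 * the number of offsets written to out[].
 */
static size_t walk_block_offsets(const struct sketch_bvec *bvecs,
				 size_t nr_bvecs, uint32_t blocksize,
				 uint64_t *out, size_t out_len)
{
	size_t n = 0;

	assert(blocksize > 0);
	for (size_t i = 0; i < nr_bvecs; i++) {
		/* Offset is taken from the bvec only when we move to it. */
		uint64_t offset = bvecs[i].page_offset + bvecs[i].bv_offset;
		uint32_t left = bvecs[i].bv_len;

		while (left >= blocksize && n < out_len) {
			out[n++] = offset;
			offset += blocksize; /* next block of the same bvec */
			left -= blocksize;
		}
	}
	return n;
}
```

Resetting the offset on every block, as the old code effectively did,
would report the first block's offset again for the second block of the
same bvec.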

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file-item.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 62a81ee..fb6a7e8 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -222,11 +222,11 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
+   else
+   offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
-   if (!dio)
-   offset = page_offset(bvec->bv_page) + bvec->bv_offset;
count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
   (u32 *)csum, nblocks);
if (count)
@@ -301,6 +301,9 @@ found:
goto done;
}
bvec++;
+   if (!dio)
+   offset = page_offset(bvec->bv_page)
+   + bvec->bv_offset;
page_bytes_left = bvec->bv_len;
}
 
-- 
2.5.5



[PATCH V20 11/19] Btrfs: subpage-blocksize: Prevent writes to an extent buffer when PG_writeback flag is set

2016-07-03 Thread Chandan Rajendra
In the non-subpage-blocksize scenario, the BTRFS_HEADER_FLAG_WRITTEN flag
prevents Btrfs code from writing into an extent buffer whose pages are
under writeback. This facility isn't sufficient for achieving the same
in the subpage-blocksize scenario, since more than one extent buffer can
be mapped to a page.

Hence this patch adds a new flag (i.e. EXTENT_BUFFER_HEAD_WRITEBACK) and
corresponding code to track the writeback status of the page and to
prevent writes to any of the extent buffers mapped to the page while
writeback is going on.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.c   |  18 ++
 fs/btrfs/extent-tree.c |  10 
 fs/btrfs/extent_io.c   | 150 -
 fs/btrfs/extent_io.h   |   1 +
 4 files changed, 152 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4e35a21..0a56d1b 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1541,6 +1541,7 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
struct extent_buffer *parent, int parent_slot,
struct extent_buffer **cow_ret)
 {
+   struct extent_buffer_head *ebh = eb_head(buf);
u64 search_start;
int ret;
 
@@ -1555,6 +1556,14 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 
if (!should_cow_block(trans, root, buf)) {
trans->dirty = true;
+   if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK, &ebh->bflags)) {
+   if (parent)
+   btrfs_set_lock_blocking(parent);
+   btrfs_set_lock_blocking(buf);
+   wait_on_bit_io(&ebh->bflags,
+   EXTENT_BUFFER_HEAD_WRITEBACK,
+   TASK_UNINTERRUPTIBLE);
+   }
*cow_ret = buf;
return 0;
}
@@ -2686,6 +2695,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_key *key, struct btrfs_path *p, int
  ins_len, int cow)
 {
+   struct extent_buffer_head *ebh;
struct extent_buffer *b;
int slot;
int ret;
@@ -2790,6 +2800,14 @@ again:
 */
if (!should_cow_block(trans, root, b)) {
trans->dirty = true;
+   ebh = eb_head(b);
+   if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+   &ebh->bflags)) {
+   btrfs_set_path_blocking(p);
+   wait_on_bit_io(&ebh->bflags,
+   EXTENT_BUFFER_HEAD_WRITEBACK,
+   TASK_UNINTERRUPTIBLE);
+   }
goto cow_done;
}
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 590d0e7..4ead0ff 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8224,15 +8224,25 @@ static struct extent_buffer *
 btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
  u64 bytenr, int level)
 {
+   struct extent_buffer_head *ebh;
struct extent_buffer *buf;
 
buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return buf;
 
+   ebh = eb_head(buf);
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
btrfs_tree_lock(buf);
+
+   if (test_bit(EXTENT_BUFFER_HEAD_WRITEBACK,
+   &ebh->bflags)) {
+   btrfs_set_lock_blocking(buf);
+   wait_on_bit_io(&ebh->bflags, EXTENT_BUFFER_HEAD_WRITEBACK,
+   TASK_UNINTERRUPTIBLE);
+   }
+
clean_tree_block(trans, root->fs_info, buf);
clear_bit(EXTENT_BUFFER_STALE, &buf->ebflags);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 694d2dc..0bdb27d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3725,6 +3725,52 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
TASK_UNINTERRUPTIBLE);
 }
 
+static void lock_extent_buffers(struct extent_buffer_head *ebh,
+   struct extent_page_data *epd)
+{
+   struct extent_buffer *locked_eb = NULL;
+   struct extent_buffer *eb;
+again:
+   eb = &ebh->eb;
+   do {
+   if (eb == locked_eb)
+   continue;
+
+   if (!btrfs_try_tree_write_lock(eb))
+   goto backoff;
+
+   } while ((eb = eb->eb_next) != NULL);
+
+   return;
+
+backoff:
+   if (locked

[PATCH V20 18/19] Btrfs: subpage-blocksize: Disable compression

2016-07-03 Thread Chandan Rajendra
The subpage-blocksize patchset does not yet support compression. Hence,
the kernel might crash when executing compression code in the
subpage-blocksize scenario. This commit prevents compression from being
enabled during 'mount' and also when the user invokes the
'chattr +c ' command.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c |  8 +++-
 fs/btrfs/super.c | 20 
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ef3c32..d7159db 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -322,6 +322,11 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
} else if (flags & FS_COMPR_FL) {
const char *comp;
 
+   if (root->sectorsize < PAGE_SIZE) {
+   ret = -EINVAL;
+   goto out_drop;
+   }
+
ip->flags |= BTRFS_INODE_COMPRESS;
ip->flags &= ~BTRFS_INODE_NOCOMPRESS;
 
@@ -1344,7 +1349,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
return -EINVAL;
 
if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS) {
-   if (range->compress_type > BTRFS_COMPRESS_TYPES)
+   if ((range->compress_type > BTRFS_COMPRESS_TYPES)
+   || (root->sectorsize < PAGE_SIZE))
return -EINVAL;
if (range->compress_type)
compress_type = range->compress_type;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index cba92e6..ddd4658 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -368,6 +368,17 @@ static const match_table_t tokens = {
{Opt_err, NULL},
 };
 
+static int can_enable_compression(struct btrfs_fs_info *fs_info)
+{
+   if (btrfs_super_sectorsize(fs_info->super_copy) < PAGE_SIZE) {
+   btrfs_err(fs_info,
+   "Compression is not supported for subpage-blocksize");
+   return 0;
+   }
+
+   return 1;
+}
+
 /*
  * Regular mount options parser.  Everything that is needed only when
  * reading in a new superblock is parsed here.
@@ -477,6 +488,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
if (token == Opt_compress ||
token == Opt_compress_force ||
strcmp(args[0].from, "zlib") == 0) {
+   if (!can_enable_compression(info)) {
+   ret = -EINVAL;
+   goto out;
+   }
compress_type = "zlib";
info->compress_type = BTRFS_COMPRESS_ZLIB;
btrfs_set_opt(info->mount_opt, COMPRESS);
@@ -484,6 +499,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
btrfs_clear_opt(info->mount_opt, NODATASUM);
no_compress = 0;
} else if (strcmp(args[0].from, "lzo") == 0) {
+   if (!can_enable_compression(info)) {
+   ret = -EINVAL;
+   goto out;
+   }
compress_type = "lzo";
info->compress_type = BTRFS_COMPRESS_LZO;
btrfs_set_opt(info->mount_opt, COMPRESS);
@@ -806,6 +825,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
break;
}
}
+
 check:
/*
 * Extra check for current option against current flag
-- 
2.5.5



[PATCH V20 15/19] Btrfs: subpage-blocksize: btrfs_clone: Flush dirty blocks of a page that do not map the clone range

2016-07-03 Thread Chandan Rajendra
After cloning the required extents, we truncate all the pages that map
the file range being cloned. In subpage-blocksize scenario, we could
have dirty blocks before and/or after the clone range in the
leading/trailing pages. Truncating these pages would lead to data
loss. Hence this commit forces such dirty blocks to be flushed to disk
before performing the clone operation.
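
The byte ranges that need flushing can be pictured with the userspace
sketch below. The names and the 64K page size are assumptions, and the
trailing range here ends at the last byte of the page, which is an
illustrative choice and not necessarily the exact bound passed to
filemap_write_and_wait_range() in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(65536) /* assumed 64K page */

/* Inclusive byte range; valid is false when nothing needs flushing. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
	bool valid;
};

/* Bytes of the first page that precede the clone destination. */
static struct sketch_range leading_flush_range(uint64_t destoff,
					       uint64_t isize)
{
	struct sketch_range r = { 0, 0, false };
	uint64_t page_start = destoff & ~(SKETCH_PAGE_SIZE - 1);

	if (destoff != page_start && page_start < isize) {
		r.start = page_start;
		r.end = destoff - 1;
		r.valid = true;
	}
	return r;
}

/* Bytes of the last page that follow the clone destination. */
static struct sketch_range trailing_flush_range(uint64_t destoff,
						uint64_t len, uint64_t isize)
{
	struct sketch_range r = { 0, 0, false };

	assert(len > 0);
	uint64_t dest_end = destoff + len - 1;
	uint64_t next = dest_end + 1;

	if ((next & (SKETCH_PAGE_SIZE - 1)) != 0 && dest_end < isize) {
		r.start = next;
		r.end = next | (SKETCH_PAGE_SIZE - 1); /* last byte of page */
		r.valid = true;
	}
	return r;
}
```

When both ends of the destination are page aligned, neither range is
valid and no extra flush is needed, which matches the behaviour when
blocksize equals PAGE_SIZE.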

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5d9062e..0ef3c32 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3921,6 +3921,7 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
int ret;
u64 len = olen;
u64 bs = root->fs_info->sb->s_blocksize;
+   u64 dest_end;
int same_inode = src == inode;
 
/*
@@ -3981,6 +3982,21 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
goto out_unlock;
}
 
+   if ((round_down(destoff, PAGE_SIZE) < inode->i_size) &&
+   !IS_ALIGNED(destoff, PAGE_SIZE)) {
+   ret = filemap_write_and_wait_range(inode->i_mapping,
+   round_down(destoff, PAGE_SIZE),
+   destoff - 1);
+   }
+
+   dest_end = destoff + len - 1;
+   if ((dest_end < inode->i_size) &&
+   !IS_ALIGNED(dest_end + 1, PAGE_SIZE)) {
+   ret = filemap_write_and_wait_range(inode->i_mapping,
+   dest_end + 1,
+   round_up(dest_end, PAGE_SIZE));
+   }
+
if (destoff > inode->i_size) {
ret = btrfs_cont_expand(inode, inode->i_size, destoff);
if (ret)
-- 
2.5.5



[PATCH V20 16/19] Btrfs: subpage-blocksize: Make file extent relocate code subpage blocksize aware

2016-07-03 Thread Chandan Rajendra
The file extent relocation code currently assumes blocksize to be the
same as PAGE_SIZE. This commit adds code to support the subpage-blocksize
scenario.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/relocation.c | 89 ---
 1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 05b88f8..fc0ac5d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3106,14 +3106,19 @@ static int relocate_file_extent_cluster(struct inode *inode,
 {
u64 page_start;
u64 page_end;
+   u64 block_start;
u64 offset = BTRFS_I(inode)->index_cnt;
+   u64 blocksize = BTRFS_I(inode)->root->sectorsize;
+   u64 reserved_space;
unsigned long index;
unsigned long last_index;
struct page *page;
struct file_ra_state *ra;
gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
+   int nr_blocks;
int nr = 0;
int ret = 0;
+   int i;
 
if (!cluster->nr)
return 0;
@@ -3133,13 +3138,19 @@ static int relocate_file_extent_cluster(struct inode *inode,
if (ret)
goto out;
 
+   page_start = cluster->start - offset;
+   page_end = min_t(u64, round_down(page_start, PAGE_SIZE) + PAGE_SIZE - 1,
+   cluster->end - offset);
+
index = (cluster->start - offset) >> PAGE_SHIFT;
last_index = (cluster->end - offset) >> PAGE_SHIFT;
while (index <= last_index) {
-   ret = btrfs_delalloc_reserve_metadata(inode, PAGE_SIZE);
+   reserved_space = page_end - page_start + 1;
+
+   ret = btrfs_delalloc_reserve_metadata(inode, reserved_space);
if (ret)
goto out;
-
+again:
page = find_lock_page(inode->i_mapping, index);
if (!page) {
page_cache_sync_readahead(inode->i_mapping,
@@ -3149,7 +3160,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
   mask);
if (!page) {
btrfs_delalloc_release_metadata(inode,
-   PAGE_SIZE);
+   reserved_space);
ret = -ENOMEM;
goto out;
}
@@ -3161,6 +3172,37 @@ static int relocate_file_extent_cluster(struct inode *inode,
   last_index + 1 - index);
}
 
+   if (PageDirty(page)) {
+   u64 pg_offset = page_offset(page);
+
+   unlock_page(page);
+   put_page(page);
+   ret = btrfs_fdatawrite_range(inode, pg_offset,
+   page_start - 1);
+   if (ret) {
+   btrfs_delalloc_release_metadata(inode,
+   reserved_space);
+   goto out;
+   }
+
+   ret = filemap_fdatawait_range(inode->i_mapping,
+   pg_offset, page_start - 1);
+   if (ret) {
+   btrfs_delalloc_release_metadata(inode,
+   reserved_space);
+   goto out;
+   }
+
+   goto again;
+   }
+
+   if (BTRFS_I(inode)->root->sectorsize < PAGE_SIZE) {
+   ClearPageUptodate(page);
+   if (page->private)
+   clear_page_blks_state(page, 1 << BLK_STATE_UPTODATE,
+   page_start, page_end);
+   }
+
if (!PageUptodate(page)) {
btrfs_readpage(NULL, page);
lock_page(page);
@@ -3168,41 +3210,50 @@ static int relocate_file_extent_cluster(struct inode *inode,
unlock_page(page);
put_page(page);
btrfs_delalloc_release_metadata(inode,
-   PAGE_SIZE);
+   reserved_space);
ret = -EIO;
goto out;
}
}
 
-   page_start = page_offset(page);
-   page_end = page_start + PAGE_SIZE - 1;
-
lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end);
 

[PATCH V20 03/19] Btrfs: subpage-blocksize: Make sure delalloc range intersects with the locked page's range

2016-07-03 Thread Chandan Rajendra
find_delalloc_range indirectly depends on EXTENT_UPTODATE to make sure that
the delalloc range returned intersects with the file range mapped by the
page. Since we now track "uptodate" state in a per-page
bitmap (i.e. in btrfs_page_private->bstate), this commit makes an explicit
check to make sure that the delalloc range starts from within the file range
mapped by the page.
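
The explicit check amounts to comparing the start of an extent state with
the last byte mapped by the locked page. A userspace sketch, with a
hypothetical helper name and an assumed 4K page size, could look like
this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096) /* assumed page size */

/*
 * True if an extent state starting at state_start lies entirely beyond
 * the page that begins at file offset page_off. Such a state must not be
 * used as the start of the delalloc range for that page.
 */
static bool state_beyond_locked_page(uint64_t page_off, uint64_t state_start)
{
	assert((page_off & (SKETCH_PAGE_SIZE - 1)) == 0);
	uint64_t page_end = page_off + SKETCH_PAGE_SIZE - 1;

	return page_end < state_start;
}
```

In the patch this condition is combined with the EXTENT_DELALLOC test, so
the search stops as soon as it reaches a state that begins past the
locked page.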

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0adbff5..f7d035b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1581,6 +1581,7 @@ out:
  * 1 is returned if we find something, 0 if nothing was in the tree
  */
 static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
+   struct page *locked_page,
u64 *start, u64 *end, u64 max_bytes,
struct extent_state **cached_state)
 {
@@ -1589,6 +1590,9 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
u64 cur_start = *start;
u64 found = 0;
u64 total_bytes = 0;
+   u64 page_end;
+
+   page_end = page_offset(locked_page) + PAGE_SIZE - 1;
 
spin_lock(&tree->lock);
 
@@ -1609,7 +1613,8 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
  (state->state & EXTENT_BOUNDARY))) {
goto out;
}
-   if (!(state->state & EXTENT_DELALLOC)) {
+   if (!(state->state & EXTENT_DELALLOC)
+   || (page_end < state->start)) {
if (!found)
*end = state->end;
goto out;
@@ -1747,8 +1752,9 @@ again:
/* step one, find a bunch of delalloc bytes starting at start */
delalloc_start = *start;
delalloc_end = 0;
-   found = find_delalloc_range(tree, &delalloc_start, &delalloc_end,
-   max_bytes, &cached_state);
+   found = find_delalloc_range(tree, locked_page,
+   &delalloc_start, &delalloc_end,
+   max_bytes, &cached_state);
if (!found || delalloc_end <= *start) {
*start = delalloc_start;
*end = delalloc_end;
-- 
2.5.5



[PATCH V20 07/19] Btrfs: subpage-blocksize: Allow mounting filesystems where sectorsize < PAGE_SIZE

2016-07-03 Thread Chandan Rajendra
This commit allows mounting filesystem instances with a sectorsize smaller
than PAGE_SIZE.

Since the code assumes that the super block is either equal to or larger
than sectorsize, this commit brings back the nodesize argument for the
btrfs_find_create_tree_block() function. This change allows us to mount
and use filesystems with 2048 bytes as the sectorsize.
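
The relaxed size checks can be summarised by the userspace sketch below.
The helper names are hypothetical, and the 65536-byte upper bound is an
assumption standing in for BTRFS_MAX_METADATA_BLOCKSIZE; the 2048-byte
lower bound comes from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_SECTORSIZE UINT64_C(2048)
#define SKETCH_MAX_BLOCKSIZE UINT64_C(65536) /* assumed upper bound */

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* sectorsize: a power of two between the minimum and the maximum. */
static bool sectorsize_valid(uint64_t sectorsize)
{
	return is_pow2(sectorsize) &&
	       sectorsize >= SKETCH_MIN_SECTORSIZE &&
	       sectorsize <= SKETCH_MAX_BLOCKSIZE;
}

/* nodesize: a power of two, at least sectorsize, at most the maximum. */
static bool nodesize_valid(uint64_t nodesize, uint64_t sectorsize)
{
	assert(sectorsize_valid(sectorsize));
	return is_pow2(nodesize) &&
	       nodesize >= sectorsize &&
	       nodesize <= SKETCH_MAX_BLOCKSIZE;
}
```

Note that nothing here compares sectorsize with the page size any more,
which is the check this patch removes.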

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c | 21 -
 fs/btrfs/disk-io.h |  2 +-
 fs/btrfs/extent-tree.c |  4 ++--
 fs/btrfs/extent_io.c   |  3 +--
 fs/btrfs/extent_io.h   |  4 ++--
 fs/btrfs/tree-log.c|  2 +-
 fs/btrfs/volumes.c | 10 +++---
 7 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2d9e86b..0727c1c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1099,7 +1099,7 @@ void readahead_tree_block(struct btrfs_root *root, u64 bytenr)
struct extent_buffer *buf = NULL;
struct inode *btree_inode = root->fs_info->btree_inode;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return;
read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree,
@@ -1115,7 +1115,7 @@ int reada_tree_block_flagged(struct btrfs_root *root, u64 bytenr,
struct extent_io_tree *io_tree = &BTRFS_I(btree_inode)->io_tree;
int ret;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return 0;
 
@@ -1146,12 +1146,12 @@ struct extent_buffer *btrfs_find_tree_block(struct btrfs_fs_info *fs_info,
 }
 
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
-u64 bytenr)
+u64 bytenr, u32 blocksize)
 {
if (btrfs_test_is_dummy_root(root))
return alloc_test_extent_buffer(root->fs_info, bytenr,
-   root->nodesize);
-   return alloc_extent_buffer(root->fs_info, bytenr);
+   blocksize);
+   return alloc_extent_buffer(root->fs_info, bytenr, blocksize);
 }
 
 
@@ -1175,7 +1175,7 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
struct extent_buffer *buf = NULL;
int ret;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
if (IS_ERR(buf))
return buf;
 
@@ -4093,17 +4093,12 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
 * Check sectorsize and nodesize first, other check will need it.
 * Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
 */
-   if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
+   if (!is_power_of_2(sectorsize) || sectorsize < 2048 ||
sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
printk(KERN_ERR "BTRFS: invalid sectorsize %llu\n", sectorsize);
ret = -EINVAL;
}
-   /* Only PAGE SIZE is supported yet */
-   if (sectorsize != PAGE_SIZE) {
-   printk(KERN_ERR "BTRFS: sectorsize %llu not supported yet, only 
support %lu\n",
-   sectorsize, PAGE_SIZE);
-   ret = -EINVAL;
-   }
+
if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
printk(KERN_ERR "BTRFS: invalid nodesize %llu\n", nodesize);
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index a81ff8d..aa3fb08 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -50,7 +50,7 @@ void readahead_tree_block(struct btrfs_root *root, u64 
bytenr);
 int reada_tree_block_flagged(struct btrfs_root *root, u64 bytenr,
 int mirror_num, struct extent_buffer **eb);
 struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root,
-  u64 bytenr);
+  u64 bytenr, u32 blocksize);
 void clean_tree_block(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, struct extent_buffer *buf);
 int open_ctree(struct super_block *sb,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51e514c..590d0e7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8226,7 +8226,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
 {
struct extent_buffer *buf;
 
-   buf = btrfs_find_create_tree_block(root, bytenr);
+   buf = btrfs_find_create_tree_block(root, bytenr, root->nodesize);
   

[PATCH V20 08/19] Btrfs: subpage-blocksize: Deal with partial ordered extent allocations.

2016-07-03 Thread Chandan Rajendra
In the subpage-blocksize scenario, extent allocation can succeed for
only some of the dirty blocks of a page while failing for the rest of
the blocks. This patch allows I/O against such pages to be submitted.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 27 ++-
 fs/btrfs/inode.c | 18 +++---
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0ec3b1e..303b49e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1864,17 +1864,23 @@ void extent_clear_unlock_delalloc(struct inode *inode, 
u64 start, u64 end,
if (page_ops & PAGE_SET_PRIVATE2)
SetPagePrivate2(pages[i]);
 
+   if (page_ops & PAGE_SET_ERROR)
+   SetPageError(pages[i]);
+
if (pages[i] == locked_page) {
put_page(pages[i]);
continue;
}
-   if (page_ops & PAGE_CLEAR_DIRTY)
+
+   if ((page_ops & PAGE_CLEAR_DIRTY)
+   && !PagePrivate2(pages[i]))
clear_page_dirty_for_io(pages[i]);
-   if (page_ops & PAGE_SET_WRITEBACK)
+   if ((page_ops & PAGE_SET_WRITEBACK)
+   && !PagePrivate2(pages[i]))
set_page_writeback(pages[i]);
-   if (page_ops & PAGE_SET_ERROR)
-   SetPageError(pages[i]);
-   if (page_ops & PAGE_END_WRITEBACK)
+
+   if ((page_ops & PAGE_END_WRITEBACK)
+   && !PagePrivate2(pages[i]))
end_page_writeback(pages[i]);
 
if (page_ops & PAGE_UNLOCK) {
@@ -2572,7 +2578,7 @@ void end_extent_writepage(struct page *page, int err, u64 
start, u64 end)
uptodate = 0;
}
 
-   if (!uptodate) {
+   if (!uptodate || PageError(page)) {
ClearPageUptodate(page);
SetPageError(page);
ret = ret < 0 ? ret : -EIO;
@@ -3427,7 +3433,6 @@ static noinline_for_stack int writepage_delalloc(struct 
inode *inode,
   nr_written);
/* File system has been set read-only */
if (ret) {
-   SetPageError(page);
/* fill_delalloc should be return < 0 for error
 * but just in case, we use > 0 here meaning the
 * IO is started, so we don't want to return > 0
@@ -3648,7 +3653,6 @@ static int __extent_writepage(struct page *page, struct 
writeback_control *wbc,
struct inode *inode = page->mapping->host;
struct extent_page_data *epd = data;
u64 start = page_offset(page);
-   u64 page_end = start + PAGE_SIZE - 1;
int ret;
int nr = 0;
size_t pg_offset = 0;
@@ -3693,7 +3697,7 @@ static int __extent_writepage(struct page *page, struct 
writeback_control *wbc,
ret = writepage_delalloc(inode, page, wbc, epd, start, &nr_written);
if (ret == 1)
goto done_unlocked;
-   if (ret)
+   if (ret && !PagePrivate2(page))
goto done;
 
ret = __extent_writepage_io(inode, page, wbc, epd,
@@ -3707,10 +3711,7 @@ done:
set_page_writeback(page);
end_page_writeback(page);
}
-   if (PageError(page)) {
-   ret = ret < 0 ? ret : -EIO;
-   end_extent_writepage(page, ret, start, page_end);
-   }
+
unlock_page(page);
return ret;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e8a0005..e9f9bb1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -950,6 +950,7 @@ static noinline int cow_file_range(struct inode *inode,
struct btrfs_key ins;
struct extent_map *em;
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+   struct btrfs_ordered_extent *ordered;
unsigned long page_ops, extent_ops;
int ret = 0;
 
@@ -1048,7 +1049,7 @@ static noinline int cow_file_range(struct inode *inode,
ret = btrfs_reloc_clone_csums(inode, start,
  cur_alloc_size);
if (ret)
-   goto out_drop_extent_cache;
+   goto out_remove_ordered_extent;
}
 
btrfs_dec_block_group_reservations(root->fs_info, ins.objectid);
@@ -1077,11 +1078,22 @@ static noinline int cow_file_range(struct inode *inode,
 out:
return ret;
 

[PATCH V20 02/19] Btrfs: subpage-blocksize: Fix whole page write

2016-07-03 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles writing data to files.

Also, when setting EXTENT_DELALLOC, we no longer set the EXTENT_UPTODATE
bit on the extent_io_tree, since uptodate status is tracked by the
bitmap pointed to by page->private.
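
The per-block tracking described above can be sketched in user space as
follows. This is only an illustration, assuming a 64K page with 4K
blocks; the struct and helper names (sketch_page_blk_state,
sketch_set_blks_uptodate, ...) are hypothetical and are not the ones
used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry for illustration: 64K page, 4K blocks (16 per page). */
#define SKETCH_PAGE_SIZE  65536u
#define SKETCH_BLOCK_SIZE 4096u

/* Hypothetical stand-in for the bitmap reached through page->private. */
struct sketch_page_blk_state {
    uint32_t uptodate; /* bit i set => block i of the page is uptodate */
};

/* Mask of the blocks touched by the page-relative byte range [start, end]. */
static uint32_t sketch_blk_mask(uint32_t start, uint32_t end)
{
    uint32_t first, last, nbits;

    assert(start <= end && end < SKETCH_PAGE_SIZE);
    first = start / SKETCH_BLOCK_SIZE;
    last = end / SKETCH_BLOCK_SIZE;
    nbits = last - first + 1; /* at most 16, so the shift below is safe */

    return ((1u << nbits) - 1) << first;
}

static void sketch_set_blks_uptodate(struct sketch_page_blk_state *ps,
                                     uint32_t start, uint32_t end)
{
    ps->uptodate |= sketch_blk_mask(start, end);
}

/* True only if every block overlapping [start, end] is uptodate. */
static bool sketch_blks_uptodate(const struct sketch_page_blk_state *ps,
                                 uint32_t start, uint32_t end)
{
    uint32_t mask = sketch_blk_mask(start, end);

    return (ps->uptodate & mask) == mask;
}

static bool sketch_page_uptodate(const struct sketch_page_blk_state *ps)
{
    return sketch_blks_uptodate(ps, 0, SKETCH_PAGE_SIZE - 1);
}
```

With one bit per block, a write that dirties only part of a page can
mark just those blocks uptodate, and the page as a whole counts as
uptodate only once every bit is set.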

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c  | 150 --
 fs/btrfs/file.c   |  17 ++
 fs/btrfs/inode.c  |  75 +
 fs/btrfs/relocation.c |   3 +
 4 files changed, 155 insertions(+), 90 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a349f99..0adbff5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1494,24 +1494,6 @@ void extent_range_redirty_for_io(struct inode *inode, 
u64 start, u64 end)
}
 }
 
-/*
- * helper function to set both pages and extents in the tree writeback
- */
-static void set_range_writeback(struct extent_io_tree *tree, u64 start, u64 
end)
-{
-   unsigned long index = start >> PAGE_SHIFT;
-   unsigned long end_index = end >> PAGE_SHIFT;
-   struct page *page;
-
-   while (index <= end_index) {
-   page = find_get_page(tree->mapping, index);
-   BUG_ON(!page); /* Pages should be in the extent_io_tree */
-   set_page_writeback(page);
-   put_page(page);
-   index++;
-   }
-}
-
 /* find the first state struct with 'bits' set after 'start', and
  * return it.  tree->lock must be held.  NULL will returned if
  * nothing was found after 'start'
@@ -2585,36 +2567,41 @@ void end_extent_writepage(struct page *page, int err, 
u64 start, u64 end)
  */
 static void end_bio_extent_writepage(struct bio *bio)
 {
+   struct btrfs_page_private *pg_private;
struct bio_vec *bvec;
+   unsigned long flags;
u64 start;
u64 end;
+   int clear_writeback;
int i;
 
bio_for_each_segment_all(bvec, bio, i) {
struct page *page = bvec->bv_page;
+   struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
 
-   /* We always issue full-page reads, but if some block
-* in a page fails to read, blk_update_request() will
-* advance bv_offset and adjust bv_len to compensate.
-* Print a warning for nonzero offsets, and an error
-* if they don't add up to a full page.  */
-   if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
-   if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
-   
btrfs_err(BTRFS_I(page->mapping->host)->root->fs_info,
-  "partial page write in btrfs with offset %u 
and length %u",
-   bvec->bv_offset, bvec->bv_len);
-   else
-   
btrfs_info(BTRFS_I(page->mapping->host)->root->fs_info,
-  "incomplete page write in btrfs with offset 
%u and "
-  "length %u",
-   bvec->bv_offset, bvec->bv_len);
-   }
+   pg_private = NULL;
+   flags = 0;
+   clear_writeback = 1;
 
-   start = page_offset(page);
-   end = start + bvec->bv_offset + bvec->bv_len - 1;
+   start = page_offset(page) + bvec->bv_offset;
+   end = start + bvec->bv_len - 1;
+
+   if (root->sectorsize < PAGE_SIZE) {
+   pg_private = (struct btrfs_page_private *)page->private;
+   spin_lock_irqsave(&pg_private->io_lock, flags);
+   }
 
end_extent_writepage(page, bio->bi_error, start, end);
-   end_page_writeback(page);
+
+   if (root->sectorsize < PAGE_SIZE) {
+   clear_page_blks_state(page, 1 << BLK_STATE_IO, start,
+   end);
+   clear_writeback = page_io_complete(page);
+   spin_unlock_irqrestore(&pg_private->io_lock, flags);
+   }
+
+   if (clear_writeback)
+   end_page_writeback(page);
}
 
bio_put(bio);
@@ -3486,7 +3473,6 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
u64 block_start;
u64 iosize;
sector_t sector;
-   struct extent_state *cached_state = NULL;
struct extent_map *em;
struct block_device *bdev;
size_t pg_offset = 0;
@@ -3538,20 +3524,29 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
  

[PATCH V20 00/19] Allow I/O on blocks whose size is less than page size

2016-07-03 Thread Chandan Rajendra
at we would overflow the stripe by one 4K
   block. Hence this patchset removes the optimization and invokes
   submit_extent_page() for every dirty 4K block.
3. The following patches are newly added:
   - Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset
 when moving to a new bio_vec 
   - Btrfs: subpage-blocksize: Make file extent relocate code subpage
 blocksize aware 
   - Btrfs: btrfs_clone: Flush dirty blocks of a page that do not map
 the clone range

Changes from V14:
1. Fix usage of cleancache_get_page() in __do_readpage().
   In filesystems which support subpage-blocksize scenario, a page can
   map one or more blocks. Hence cleancache_get_page() should be
   invoked only when the page maps a non-hole extent and block size
   being used is equal to the page size. Thanks to David Sterba for
   pointing this out.
2. Replace page_read_complete() and page_write_complete() functions
   with page_io_complete().
3. Provide more documentation (as part of both commit message and code
   comments) about the usage of the per-page
   btrfs_page_private->io_lock.

Changes from V13:
1. Enable dedup ioctl to work in subpagesize-blocksize scenario.

Changes from V12:
1. The logic in the function btrfs_punch_hole() has been fixed to
   check for the presence of BLK_STATE_UPTODATE flags for blocks in
   pages which partially map the file range being punched.
   
Changes from V11:
1. Addressed the review comments provided by Liu Bo for version V11.
2. Fixed file defragmentation code to work in subpagesize-blocksize
   scenario.
3. Many "hard to reproduce" bugs were fixed.


Chandan Rajendra (19):
  Btrfs: subpage-blocksize: Fix whole page read.
  Btrfs: subpage-blocksize: Fix whole page write
  Btrfs: subpage-blocksize: Make sure delalloc range intersects with the
locked page's range
  Btrfs: subpage-blocksize: Define extent_buffer_head
  Btrfs: subpage-blocksize: Read tree blocks whose size is < PAGE_SIZE
  Btrfs: subpage-blocksize: Write only dirty extent buffers belonging to
a page
  Btrfs: subpage-blocksize: Allow mounting filesystems where sectorsize
< PAGE_SIZE
  Btrfs: subpage-blocksize: Deal with partial ordered extent
allocations.
  Btrfs: subpage-blocksize: Explicitly track I/O status of blocks of an
ordered extent.
  Btrfs: subpage-blocksize: btrfs_punch_hole: Fix uptodate blocks check
  Btrfs: subpage-blocksize: Prevent writes to an extent buffer when
PG_writeback flag is set
  Revert "btrfs: fix lockups from btrfs_clear_path_blocking"
  Btrfs: subpage-blocksize: Fix file defragmentation code
  Btrfs: subpage-blocksize: Enable dedupe ioctl
  Btrfs: subpage-blocksize: btrfs_clone: Flush dirty blocks of a page
that do not map the clone range
  Btrfs: subpage-blocksize: Make file extent relocate code subpage
blocksize aware
  Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset when
moving to a new bio_vec
  Btrfs: subpage-blocksize: Disable compression
  Btrfs: subpage-blocksize: Rate limit scrub error message

 fs/btrfs/ctree.c   |   36 +-
 fs/btrfs/ctree.h   |6 +-
 fs/btrfs/disk-io.c |  167 ++--
 fs/btrfs/disk-io.h |5 +-
 fs/btrfs/extent-tree.c |   20 +-
 fs/btrfs/extent_io.c   | 1687 +++-
 fs/btrfs/extent_io.h   |  147 ++-
 fs/btrfs/file-item.c   |7 +-
 fs/btrfs/file.c|  106 +-
 fs/btrfs/inode.c   |  404 ++--
 fs/btrfs/ioctl.c   |  232 +++--
 fs/btrfs/locking.c |   24 +-
 fs/btrfs/locking.h |2 -
 fs/btrfs/ordered-data.c|   19 +
 fs/btrfs/ordered-data.h|4 +
 fs/btrfs/relocation.c  |   86 +-
 fs/btrfs/root-tree.c   |2 +-
 fs/btrfs/scrub.c   |2 +-
 fs/btrfs/super.c   |   29 +-
 fs/btrfs/tests/btrfs-tests.c   |   12 +-
 fs/btrfs/tests/extent-io-tests.c   |5 +-
 fs/btrfs/tests/free-space-tree-tests.c |   79 +-
 fs/btrfs/tree-log.c|2 +-
 fs/btrfs/volumes.c |   12 +-
 include/trace/events/btrfs.h   |2 +-
 25 files changed, 2227 insertions(+), 870 deletions(-)

-- 
2.5.5



[PATCH V20 01/19] Btrfs: subpage-blocksize: Fix whole page read.

2016-07-03 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles reading data from files.

To track the status of individual blocks of a page, this patch makes use
of a bitmap pointed to by the newly introduced per-page 'struct
btrfs_page_private'.

The per-page btrfs_page_private->io_lock plays the same role as
BH_Uptodate_Lock (see end_buffer_async_read()) i.e. without the io_lock
we may end up in the following situation,

NOTE: Assume 64k page size and 4k block size. Also assume that the first
12 blocks of the page are contiguous while the next 4 blocks are
contiguous. When reading the page we end up submitting two "logical
address space" bios. So end_bio_extent_readpage function is invoked
twice, once for each bio.

|-------------------------+-------------------------+-------------|
| Task A                  | Task B                  | Task C      |
|-------------------------+-------------------------+-------------|
| end_bio_extent_readpage |                         |             |
| process block 0         |                         |             |
| - clear BLK_STATE_IO    |                         |             |
| - page_read_complete    |                         |             |
| process block 1         |                         |             |
|                         |                         |             |
|                         |                         |             |
|                         | end_bio_extent_readpage |             |
|                         | process block 0         |             |
|                         | - clear BLK_STATE_IO    |             |
|                         | - page_read_complete    |             |
|                         | process block 1         |             |
|                         |                         |             |
| process block 11        | process block 3         |             |
| - clear BLK_STATE_IO    | - clear BLK_STATE_IO    |             |
| - page_read_complete    | - page_read_complete    |             |
|   - returns true        |   - returns true        |             |
|   - unlock_page()       |                         |             |
|                         |                         | lock_page() |
|                         |   - unlock_page()       |             |
|-------------------------+-------------------------+-------------|

We end up incorrectly unlocking the page twice and "Task C" ends up
working on an unlocked page. So private->io_lock makes sure that only
one of the tasks gets "true" as the return value when page_io_complete()
is invoked. As an optimization the patch gets the io_lock only when the
last block of the bio_vec is being processed.
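
The role of the io_lock can be sketched in user space as below. This is
a rough analogue under stated assumptions, not the patch's code: a
pthread mutex stands in for the spinlock taken with
spin_lock_irqsave(), and the names sketch_page_private and
sketch_end_blocks_io() are hypothetical. The point is that clearing the
per-block I/O bits and testing whether any I/O remains happen under a
single lock hold.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the per-page private data. */
struct sketch_page_private {
    pthread_mutex_t io_lock; /* plays the role of btrfs_page_private->io_lock */
    uint32_t io_bits;        /* bit i set => block i still has I/O in flight */
};

/*
 * Called by a bio completion for the blocks in 'done_mask'.
 * The clear and the "is the page done?" test are one critical section,
 * so two completions can never both observe io_bits == 0.
 */
static bool sketch_end_blocks_io(struct sketch_page_private *pp,
                                 uint32_t done_mask)
{
    bool page_done;

    pthread_mutex_lock(&pp->io_lock);
    /* Each block completes exactly once, and a completion is never empty. */
    assert(done_mask != 0 && (pp->io_bits & done_mask) == done_mask);
    pp->io_bits &= ~done_mask;
    page_done = (pp->io_bits == 0);
    pthread_mutex_unlock(&pp->io_lock);

    /* true => this caller, and only this caller, may unlock the page */
    return page_done;
}
```

For the 64k page example above, the completion covering the first 12
blocks (mask 0x0FFF) and the one covering the last 4 blocks (mask
0xF000) can finish in either order; whichever clears its bits second is
the only one that gets true back.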

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 371 ---
 fs/btrfs/extent_io.h |  74 +-
 fs/btrfs/inode.c |  16 +--
 3 files changed, 338 insertions(+), 123 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e197d47..a349f99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
+static struct kmem_cache *page_private_cache;
 static struct bio_set *btrfs_bioset;
 
 static inline bool extent_state_in_tree(const struct extent_state *state)
@@ -174,10 +175,16 @@ int __init extent_io_init(void)
if (!extent_buffer_cache)
goto free_state_cache;
 
+   page_private_cache = kmem_cache_create("btrfs_page_private",
+   sizeof(struct btrfs_page_private), 0,
+   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+   if (!page_private_cache)
+   goto free_buffer_cache;
+
btrfs_bioset = bioset_create(BIO_POOL_SIZE,
 offsetof(struct btrfs_io_bio, bio));
if (!btrfs_bioset)
-   goto free_buffer_cache;
+   goto free_page_private_cache;
 
if (bioset_integrity_create(btrfs_bioset, BIO_POOL_SIZE))
goto free_bioset;
@@ -188,6 +195,10 @@ free_bioset:
bioset_free(btrfs_bioset);
btrfs_bioset = NULL;
 
+free_page_private_cache:
+   kmem_cache_destroy(page_private_cache);
+   page_private_cache = NULL;
+
 free_buffer_cache:
kmem_cache_destroy(extent_buffer_cache);
extent_buffer_cache = NULL;
@@ -1323,6 +1334,95 @@ int clear_record_extent_bits(struct extent_io_tree 
*tree, u64 start, u64 end,
  changeset);
 }
 
+static int modify_page_blks_state(struct page *page,
+   unsigned long blk_states,
+   u64 start, u64 end, int set)
+{
+   struct inode *inode = page->mapping->host;
+   unsigned long *bitmap;
+   unsigned long first_state;
+   unsigned long state;

[PATCH V20 06/19] Btrfs: subpage-blocksize: Write only dirty extent buffers belonging to a page

2016-07-03 Thread Chandan Rajendra
For the subpage-blocksize scenario, this patch adds the ability to write
a single extent buffer to the disk.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c   |  32 +++---
 fs/btrfs/extent_io.c | 277 +--
 2 files changed, 242 insertions(+), 67 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b09d3e3..2d9e86b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -504,28 +504,30 @@ static int btree_read_extent_buffer_pages(struct 
btrfs_root *root,
 
 static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
 {
-   u64 start = page_offset(page);
-   u64 found_start;
struct extent_buffer *eb;
+   u64 found_start;
+   int ret;
 
eb = (struct extent_buffer *)page->private;
if (page != eb_head(eb)->pages[0])
return 0;
 
-   found_start = btrfs_header_bytenr(eb);
-   /*
-* Please do not consolidate these warnings into a single if.
-* It is useful to know what went wrong.
-*/
-   if (WARN_ON(found_start != start))
-   return -EUCLEAN;
-   if (WARN_ON(!PageUptodate(page)))
-   return -EUCLEAN;
-
-   ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
-   btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
+   do {
+   if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags))
+   continue;
+   if (WARN_ON(!test_bit(EXTENT_BUFFER_UPTODATE, &eb->ebflags)))
+   continue;
+   found_start = btrfs_header_bytenr(eb);
+   if (WARN_ON(found_start != eb->start))
+   return 0;
+   ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
+   btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
+   ret = csum_tree_block(fs_info, eb, 0);
+   if (ret)
+   return ret;
+   } while ((eb = eb->eb_next) != NULL);
 
-   return csum_tree_block(fs_info, eb, 0);
+   return 0;
 }
 
 static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a425f90..2b5fc13 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3724,29 +3724,49 @@ void wait_on_extent_buffer_writeback(struct 
extent_buffer *eb)
TASK_UNINTERRUPTIBLE);
 }
 
-static noinline_for_stack int
-lock_extent_buffer_for_io(struct extent_buffer *eb,
- struct btrfs_fs_info *fs_info,
- struct extent_page_data *epd)
+static void lock_extent_buffer_pages(struct extent_buffer_head *ebh,
+   struct extent_page_data *epd)
 {
+   struct extent_buffer *eb = &ebh->eb;
unsigned long i, num_pages;
-   int flush = 0;
+
+   num_pages = num_extent_pages(eb->start, eb->len);
+   for (i = 0; i < num_pages; i++) {
+   struct page *p = ebh->pages[i];
+   if (!trylock_page(p)) {
+   flush_write_bio(epd);
+   lock_page(p);
+   }
+   }
+
+   return;
+}
+
+static int noinline_for_stack
+lock_extent_buffer_for_io(struct extent_buffer *eb,
+   struct btrfs_fs_info *fs_info,
+   struct extent_page_data *epd)
+{
+   int dirty;
int ret = 0;
 
if (!btrfs_try_tree_write_lock(eb)) {
-   flush = 1;
flush_write_bio(epd);
btrfs_tree_lock(eb);
}
 
if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->ebflags)) {
+   dirty = test_bit(EXTENT_BUFFER_DIRTY, &eb->ebflags);
btrfs_tree_unlock(eb);
-   if (!epd->sync_io)
-   return 0;
-   if (!flush) {
-   flush_write_bio(epd);
-   flush = 1;
+   if (!epd->sync_io) {
+   if (!dirty)
+   return 1;
+   else
+   return 2;
}
+
+   flush_write_bio(epd);
+
while (1) {
wait_on_extent_buffer_writeback(eb);
btrfs_tree_lock(eb);
@@ -3769,29 +3789,14 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
__percpu_counter_add(&fs_info->dirty_metadata_bytes,
 -eb->len,
 fs_info->dirty_metadata_batch);
-   ret = 1;
+   ret = 0;
} else {
spin_unlock(&eb_head(eb)->refs_lock);
+   ret = 1;
}
 
btrfs_tree_unlock(eb);
 
-   if (!ret)
-   return ret;
-
-   num_pages = num_extent_pages(eb->start, eb->len)

[PATCH V20 04/19] Btrfs: subpage-blocksize: Define extent_buffer_head

2016-07-03 Thread Chandan Rajendra
In order to handle multiple extent buffers per page, we first need a way
to handle all the extent buffers that are attached to a page.

This patch creates a new data structure, 'struct extent_buffer_head', and
moves fields that are common to all extent buffers from 'struct
extent_buffer' to 'struct extent_buffer_head'.

Also, this patch moves the EXTENT_BUFFER_TREE_REF, EXTENT_BUFFER_DUMMY
and EXTENT_BUFFER_IN_TREE flags from extent_buffer->ebflags to
extent_buffer_head->bflags.
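
A simplified user-space sketch of the split is shown below. The field
selection and the back-pointer 'ebh' are assumptions made for
illustration only; the real layout is the one in the extent_io.h hunk
of this patch. The in-page offset computation mirrors the
BTRFS_SETGET_HEADER_FUNCS change in the ctree.h hunk, i.e. page base
plus (eb->start & (PAGE_SIZE - 1)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 65536u /* assumed 64K page */

struct sketch_eb_head;

/* Per-block state: one of these per metadata block. */
struct sketch_eb {
    uint64_t start;             /* logical address of the block */
    uint32_t len;
    unsigned long ebflags;      /* per-buffer flags */
    struct sketch_eb *eb_next;  /* next buffer attached to the same head */
    struct sketch_eb_head *ebh; /* assumption: back-pointer to the head */
};

/* State shared by all extent buffers attached to the same page(s). */
struct sketch_eb_head {
    unsigned long bflags;       /* flags common to all buffers */
    int refs;
    unsigned char *pages[1];    /* one page is enough for this sketch */
    struct sketch_eb eb;        /* first buffer is embedded in the head */
};

static struct sketch_eb_head *sketch_eb_head(struct sketch_eb *eb)
{
    assert(eb->ebh != NULL);
    return eb->ebh;
}

/* Address of the block's data: page base plus the block's offset in the page. */
static unsigned char *sketch_eb_data(struct sketch_eb *eb)
{
    return sketch_eb_head(eb)->pages[0] +
           (eb->start & (SKETCH_PAGE_SIZE - 1));
}
```

With several blocks per page, a block no longer necessarily starts at
offset 0 of pages[0], which is why the header accessors have to add the
in-page offset.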

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.c   |   4 +-
 fs/btrfs/ctree.h   |   6 +-
 fs/btrfs/disk-io.c |  72 ++--
 fs/btrfs/extent-tree.c |   6 +-
 fs/btrfs/extent_io.c   | 602 ++---
 fs/btrfs/extent_io.h   |  63 ++--
 fs/btrfs/root-tree.c   |   2 +-
 fs/btrfs/super.c   |   9 +-
 fs/btrfs/tests/btrfs-tests.c   |  12 +-
 fs/btrfs/tests/extent-io-tests.c   |   5 +-
 fs/btrfs/tests/free-space-tree-tests.c |  79 +++--
 fs/btrfs/volumes.c |   2 +-
 include/trace/events/btrfs.h   |   2 +-
 13 files changed, 557 insertions(+), 307 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e8a3ac6..4e35a21 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -160,7 +160,7 @@ struct extent_buffer *btrfs_root_node(struct btrfs_root 
*root)
 * the inc_not_zero dance and if it doesn't work then
 * synchronize_rcu and try again.
 */
-   if (atomic_inc_not_zero(&eb->refs)) {
+   if (atomic_inc_not_zero(&eb_head(eb)->refs)) {
rcu_read_unlock();
break;
}
@@ -1772,7 +1772,7 @@ static noinline int generic_bin_search(struct 
extent_buffer *eb,
int err;
 
if (low > high) {
-   btrfs_err(eb->fs_info,
+   btrfs_err(eb_head(eb)->fs_info,
 "%s: low (%d) < high (%d) eb %llu owner %llu level %d",
  __func__, low, high, eb->start,
  btrfs_header_owner(eb), btrfs_header_level(eb));
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index cc65e9b..893bedb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1475,14 +1475,16 @@ static inline void btrfs_set_token_##name(struct 
extent_buffer *eb, \
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)\
 static inline u##bits btrfs_##name(struct extent_buffer *eb)   \
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = page_address(eb_head(eb)->pages[0]) + \
+   (eb->start & (PAGE_SIZE -1));   \
u##bits res = le##bits##_to_cpu(p->member); \
return res; \
 }  \
 static inline void btrfs_set_##name(struct extent_buffer *eb,  \
u##bits val)\
 {  \
-   type *p = page_address(eb->pages[0]);   \
+   type *p = page_address(eb_head(eb)->pages[0]) + \
+   (eb->start & (PAGE_SIZE -1));   \
p->member = cpu_to_le##bits(val);   \
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 685c81a..299f353 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -375,10 +375,9 @@ static int verify_parent_transid(struct extent_io_tree 
*io_tree,
ret = 0;
goto out;
}
-   btrfs_err_rl(eb->fs_info,
+   btrfs_err_rl(eb_head(eb)->fs_info,
"parent transid verify failed on %llu wanted %llu found %llu",
-   eb->start,
-   parent_transid, btrfs_header_generation(eb));
+   eb->start, parent_transid, btrfs_header_generation(eb));
ret = 1;
 
/*
@@ -452,7 +451,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root 
*root,
int mirror_num = 0;
int failed_mirror = 0;
 
-   clear_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
+   clear_bit(EXTENT_BUFFER_CORRUPT, &eb->ebflags);
io_tree = &BTRFS_I(root->fs_info->btree_inode)->io_tree;
while (1) {
ret = read_extent_buffer_pages(io_tree, eb, start,
@@ -471,7 +470,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root 
*root,
 * there is

[PATCH V20 05/19] Btrfs: subpage-blocksize: Read tree blocks whose size is < PAGE_SIZE

2016-07-03 Thread Chandan Rajendra
In the case of subpage-blocksize, this patch makes it possible to read
only a single metadata block from the disk instead of all the metadata
blocks that map into a page.
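
Reading a single block first requires finding which of the extent
buffers attached to a page covers the I/O range. A minimal user-space
sketch of that lookup is given below, assuming the buffers of a page are
chained through eb_next as in the diff; the struct and function names
are hypothetical, and the containment test on [start, start + len) is my
own formulation rather than the exact condition used in
verify_extent_buffer_read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down extent buffer: only what the lookup needs. */
struct sketch_eb {
    uint64_t start;            /* logical address of the block */
    uint32_t len;              /* block (node) size in bytes */
    struct sketch_eb *eb_next; /* next buffer attached to the same page */
};

/*
 * Walk the chain of buffers attached to a page and return the one whose
 * range [start, start + len) contains 'offset', or NULL if none does.
 */
static struct sketch_eb *sketch_find_eb(struct sketch_eb *first,
                                        uint64_t offset)
{
    struct sketch_eb *eb;

    for (eb = first; eb != NULL; eb = eb->eb_next) {
        assert(eb->len > 0);
        if (eb->start <= offset && offset - eb->start < eb->len)
            return eb;
    }

    return NULL;
}
```

Only the buffer returned by the lookup then needs to be verified, rather
than every metadata block that maps into the page.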

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c   |  52 -
 fs/btrfs/disk-io.h   |   3 ++
 fs/btrfs/extent_io.c | 128 +++
 3 files changed, 142 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 299f353..b09d3e3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -612,29 +612,36 @@ static noinline int check_leaf(struct btrfs_root *root,
return 0;
 }
 
-static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
- u64 phy_offset, struct page *page,
- u64 start, u64 end, int mirror)
+int verify_extent_buffer_read(struct btrfs_io_bio *io_bio,
+   struct page *page,
+   u64 start, u64 end, int mirror)
 {
-   u64 found_start;
-   int found_level;
+   struct address_space *mapping = 
(io_bio->bio).bi_io_vec->bv_page->mapping;
+   struct extent_buffer_head *ebh;
struct extent_buffer *eb;
-   struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
+   struct btrfs_root *root = BTRFS_I(mapping->host)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
-   int ret = 0;
+   u64 found_start;
+   int found_level;
int reads_done;
-
-   if (!page->private)
-   goto out;
+   int ret = 0;
 
eb = (struct extent_buffer *)page->private;
+   do {
+   if ((eb->start <= start) && (eb->start + eb->len - 1 > start))
+   break;
+   } while ((eb = eb->eb_next) != NULL);
+
+   ASSERT(eb);
+
+   ebh = eb_head(eb);
 
/* the pending IO might have been the only thing that kept this buffer
 * in memory.  Make sure we have a ref for all this other checks
 */
extent_buffer_get(eb);
 
-   reads_done = atomic_dec_and_test(&eb_head(eb)->io_bvecs);
+   reads_done = atomic_dec_and_test(&ebh->io_bvecs);
if (!reads_done)
goto err;
 
@@ -690,30 +697,13 @@ err:
btree_readahead_hook(fs_info, eb, eb->start, ret);
 
if (ret) {
-   /*
-* our io error hook is going to dec the io pages
-* again, we have to make sure it has something
-* to decrement
-*/
atomic_inc(&eb_head(eb)->io_bvecs);
clear_extent_buffer_uptodate(eb);
}
-   free_extent_buffer(eb);
-out:
-   return ret;
-}
 
-static int btree_io_failed_hook(struct page *page, int failed_mirror)
-{
-   struct extent_buffer *eb;
+   free_extent_buffer(eb);
 
-   eb = (struct extent_buffer *)page->private;
-   set_bit(EXTENT_BUFFER_READ_ERR, &eb->ebflags);
-   eb->read_mirror = failed_mirror;
-   atomic_dec(&eb_head(eb)->io_bvecs);
-   if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->ebflags))
-   btree_readahead_hook(eb_head(eb)->fs_info, eb, eb->start, -EIO);
-   return -EIO;/* we fixed nothing */
+   return ret;
 }
 
 static void end_workqueue_bio(struct bio *bio)
@@ -4534,8 +4524,6 @@ static int btrfs_cleanup_transaction(struct btrfs_root 
*root)
 }
 
 static const struct extent_io_ops btree_extent_io_ops = {
-   .readpage_end_io_hook = btree_readpage_end_io_hook,
-   .readpage_io_failed_hook = btree_io_failed_hook,
.submit_bio_hook = btree_submit_bio_hook,
/* note we're sharing with inode.c for the merge bio hook */
.merge_bio_hook = btrfs_merge_bio_hook,
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index acba821..a81ff8d 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -113,6 +113,9 @@ static inline void btrfs_put_fs_root(struct btrfs_root 
*root)
kfree(root);
 }
 
+int verify_extent_buffer_read(struct btrfs_io_bio *io_bio,
+   struct page *page,
+   u64 start, u64 end, int mirror);
 void btrfs_mark_buffer_dirty(struct extent_buffer *buf);
 int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
  int atomic);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 080baf7..a425f90 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -14,6 +14,7 @@
 #include "extent_io.h"
 #include "extent_map.h"
 #include "ctree.h"
+#include "disk-io.h"
 #include "btrfs_inode.h"
 #include "volumes.h"
 #include "check-integrity.h"
@@ -2207,7 +2208,7 @@ int repair_eb_io_failure(struct btrfs_root *root, struct 
extent_buffer *eb,
 

Re: btrfs/113: Assertion failure

2016-07-01 Thread Chandan Rajendra
On Friday, July 01, 2016 04:25:52 PM Josef Bacik wrote:
> On 07/01/2016 12:11 PM, Chandan Rajendra wrote:
> > Sorry, Forgot to add the mailing list to CC. Doing it now ...
> >
> >> While running btrfs/113, I see the following call trace,
> >>
> >> [  182.272009] BTRFS: assertion failed: !current->journal_info || flush != 
> >> BTRFS_RESERVE_FLUSH_ALL, file: 
> >> /home/chandan/repos/linux/fs/btrfs/extent-tree.c, line: 5131
> >> [  182.274010] ----[ cut here ]
> >> [  182.274685] kernel BUG at 
> >> /home/chandan/repos/linux/fs/btrfs/ctree.h:3347!
> >> [  182.274982] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
> >> [  182.274982] Modules linked in:
> >> [  182.274982] CPU: 2 PID: 2911 Comm: xfs_io Not tainted 
> >> 4.6.0-g5027553-dirty #29
> >> [  182.274982] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> >> Bochs 01/01/2011
> >> [  182.274982] task: 8818a4c3a400 ti: 8818aec4c000 task.ti: 
> >> 8818aec4c000
> >> [  182.274982] RIP: 0010:[]  [] 
> >> assfail+0x1a/0x1c
> >> [  182.274982] RSP: 0018:8818aec4f5d8  EFLAGS: 00010282
> >> [  182.274982] RAX: 0097 RBX: 0003 RCX: 
> >> 8203dc18
> >> [  182.274982] RDX: 0001 RSI: 0286 RDI: 
> >> 822c8ccc
> >> [  182.274982] RBP: 8818aec4f5d8 R08: fffe R09: 
> >> 
> >> [  182.274982] R10: 0005 R11: 028d R12: 
> >> 8818b44ff000
> >> [  182.274982] R13: 0003 R14: 8818a54a01f0 R15: 
> >> 8818a4f36150
> >> [  182.274982] FS:  7f54fbc6b700() GS:88193350() 
> >> knlGS:
> >> [  182.274982] CS:  0010 DS:  ES:  CR0: 80050033
> >> [  182.274982] CR2: 0061eba0 CR3: 0018aec54000 CR4: 
> >> 06e0
> >> [  182.274982] Stack:
> >> [  182.274982]  8818aec4fa70 8134a891 0002 
> >> 0003
> >> [  182.274982]  8818b2502600 8818b44ff000  
> >> 
> >> [  182.274982]     
> >> 
> >> [  182.274982] Call Trace:
> >> [  182.274982]  [] __reserve_metadata_bytes+0xb1/0x1fe0
> >> [  182.274982]  [] ? lookup_address+0x23/0x30
> >> [  182.274982]  [] ? _lookup_address_cpa.isra.9+0x2d/0x30
> >> [  182.274982]  [] ? 
> >> __change_page_attr_set_clr+0xeb/0xc80
> >> [  182.274982]  [] ? lookup_address+0x23/0x30
> >> [  182.274982]  [] ? get_alloc_profile+0x8a/0x1a0
> >> [  182.274982]  [] ? btrfs_get_alloc_profile+0x2b/0x30
> >> [  182.274982]  [] ? can_overcommit+0x9e/0x100
> >> [  182.274982]  [] ? 
> >> __reserve_metadata_bytes+0xc88/0x1fe0
> >> [  182.274982]  [] ? __alloc_pages_nodemask+0x10d/0xc80
> >> [  182.274982]  [] ? _lookup_address_cpa.isra.9+0x2d/0x30
> >> [  182.274982]  [] ? 
> >> __change_page_attr_set_clr+0xeb/0xc80
> >> [  182.274982]  [] ? lookup_address+0x23/0x30
> >> [  182.274982]  [] ? __slab_free+0x96/0x2b0
> >> [  182.274982]  [] ? __probe_kernel_read+0x39/0x90
> >> [  182.274982]  [] ? insert_state+0xc9/0x150
> >> [  182.274982]  [] ? 
> >> add_delayed_ref_tail_merge+0x2e/0x350
> >> [  182.274982]  [] reserve_metadata_bytes+0x1f/0xe0
> >> [  182.274982]  [] btrfs_block_rsv_add+0x26/0x50
> >> [  182.274982]  [] ? free_extent_state+0x15/0x20
> >> [  182.274982]  [] 
> >> btrfs_delalloc_reserve_metadata+0x13e/0x490
> >> [  182.274982]  [] btrfs_delalloc_reserve_space+0x2a/0x50
> >> [  182.274982]  [] btrfs_truncate_block+0x8a/0x430
> >> [  182.274982]  [] ? 
> >> generic_bin_search.constprop.35+0x86/0x1e0
> >> [  182.274982]  [] truncate_inline_extent+0x157/0x260
> >> [  182.274982]  [] ? btrfs_search_slot+0x86c/0x990
> >> [  182.274982]  [] ? free_extent_map+0x4c/0xa0
> >> [  182.274982]  [] btrfs_truncate_inode_items+0xba7/0xdc0
> >> [  182.274982]  [] btrfs_truncate+0x168/0x280
> >> [  182.274982]  [] btrfs_setattr+0x214/0x320
> >> [  182.274982]  [] notify_change+0x1dc/0x380
> >> [  182.274982]  [] do_truncate+0x61/0xa0
> >> [  182.274982]  [] 
> >> do_sys_ftruncate.constprop.17+0xf9/0x160
> >> [  182.274982]  [] SyS_ftruncate+0x9/0x10
> >> [  182.274982]  [] entry_SYSCALL_64_fastpath+0x13/0x8f

Re: btrfs/113: Assertion failure

2016-07-01 Thread Chandan Rajendra
Sorry, I forgot to add the mailing list to CC. Doing it now ...

> While running btrfs/113, I see the following call trace,
> 
> [  182.272009] BTRFS: assertion failed: !current->journal_info || flush != 
> BTRFS_RESERVE_FLUSH_ALL, file: 
> /home/chandan/repos/linux/fs/btrfs/extent-tree.c, line: 5131
> [  182.274010] [ cut here ]
> [  182.274685] kernel BUG at /home/chandan/repos/linux/fs/btrfs/ctree.h:3347!
> [  182.274982] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
> [  182.274982] Modules linked in:
> [  182.274982] CPU: 2 PID: 2911 Comm: xfs_io Not tainted 4.6.0-g5027553-dirty 
> #29
> [  182.274982] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Bochs 01/01/2011
> [  182.274982] task: 8818a4c3a400 ti: 8818aec4c000 task.ti: 
> 8818aec4c000
> [  182.274982] RIP: 0010:[]  [] 
> assfail+0x1a/0x1c
> [  182.274982] RSP: 0018:8818aec4f5d8  EFLAGS: 00010282
> [  182.274982] RAX: 0097 RBX: 0003 RCX: 
> 8203dc18
> [  182.274982] RDX: 0001 RSI: 0286 RDI: 
> 822c8ccc
> [  182.274982] RBP: 8818aec4f5d8 R08: fffe R09: 
> 
> [  182.274982] R10: 0005 R11: 028d R12: 
> 8818b44ff000
> [  182.274982] R13: 0003 R14: 8818a54a01f0 R15: 
> 8818a4f36150
> [  182.274982] FS:  7f54fbc6b700() GS:88193350() 
> knlGS:
> [  182.274982] CS:  0010 DS:  ES:  CR0: 80050033
> [  182.274982] CR2: 0061eba0 CR3: 0018aec54000 CR4: 
> 06e0
> [  182.274982] Stack:
> [  182.274982]  8818aec4fa70 8134a891 0002 
> 0003
> [  182.274982]  8818b2502600 8818b44ff000  
> 
> [  182.274982]     
> 
> [  182.274982] Call Trace:
> [  182.274982]  [] __reserve_metadata_bytes+0xb1/0x1fe0
> [  182.274982]  [] ? lookup_address+0x23/0x30
> [  182.274982]  [] ? _lookup_address_cpa.isra.9+0x2d/0x30
> [  182.274982]  [] ? __change_page_attr_set_clr+0xeb/0xc80
> [  182.274982]  [] ? lookup_address+0x23/0x30
> [  182.274982]  [] ? get_alloc_profile+0x8a/0x1a0
> [  182.274982]  [] ? btrfs_get_alloc_profile+0x2b/0x30
> [  182.274982]  [] ? can_overcommit+0x9e/0x100
> [  182.274982]  [] ? __reserve_metadata_bytes+0xc88/0x1fe0
> [  182.274982]  [] ? __alloc_pages_nodemask+0x10d/0xc80
> [  182.274982]  [] ? _lookup_address_cpa.isra.9+0x2d/0x30
> [  182.274982]  [] ? __change_page_attr_set_clr+0xeb/0xc80
> [  182.274982]  [] ? lookup_address+0x23/0x30
> [  182.274982]  [] ? __slab_free+0x96/0x2b0
> [  182.274982]  [] ? __probe_kernel_read+0x39/0x90
> [  182.274982]  [] ? insert_state+0xc9/0x150
> [  182.274982]  [] ? add_delayed_ref_tail_merge+0x2e/0x350
> [  182.274982]  [] reserve_metadata_bytes+0x1f/0xe0
> [  182.274982]  [] btrfs_block_rsv_add+0x26/0x50
> [  182.274982]  [] ? free_extent_state+0x15/0x20
> [  182.274982]  [] 
> btrfs_delalloc_reserve_metadata+0x13e/0x490
> [  182.274982]  [] btrfs_delalloc_reserve_space+0x2a/0x50
> [  182.274982]  [] btrfs_truncate_block+0x8a/0x430
> [  182.274982]  [] ? 
> generic_bin_search.constprop.35+0x86/0x1e0
> [  182.274982]  [] truncate_inline_extent+0x157/0x260
> [  182.274982]  [] ? btrfs_search_slot+0x86c/0x990
> [  182.274982]  [] ? free_extent_map+0x4c/0xa0
> [  182.274982]  [] btrfs_truncate_inode_items+0xba7/0xdc0
> [  182.274982]  [] btrfs_truncate+0x168/0x280
> [  182.274982]  [] btrfs_setattr+0x214/0x320
> [  182.274982]  [] notify_change+0x1dc/0x380
> [  182.274982]  [] do_truncate+0x61/0xa0
> [  182.274982]  [] do_sys_ftruncate.constprop.17+0xf9/0x160
> [  182.274982]  [] SyS_ftruncate+0x9/0x10
> [  182.274982]  [] entry_SYSCALL_64_fastpath+0x13/0x8f
> [  182.274982] Code: 48 c7 c7 48 14 df 81 48 89 e5 e8 ac ac d3 ff 0f 0b 55 89 
> d1 31 c0 48 89 f2 48 89 fe 48 c7 c7 48 14 df 81 48 89 e5 e8 90 ac d3 ff <0f> 
> 0b 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 
> [  182.274982] RIP  [] assfail+0x1a/0x1c
> [  182.274982]  RSP 
> [  182.327207] ---[ end trace 44721e14eef0a6b2 ]---
> 
> 
> Basically btrfs_truncate() starts a transaction, btrfs_truncate_inode_items()
> encounters an inline extent and invokes
> btrfs_truncate_block(). btrfs_truncate_block() tries to reserve delalloc (both
> data & metadata) space. While doing so it passes BTRFS_RESERVE_FLUSH_ALL as an
> argument. Since we already have a transaction running, we fail the following
> ASSERT() statement in __reserve_metadata_bytes(),
> 
>ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_ALL);
> 
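The condition behind the ASSERT() can be modelled outside the kernel. What follows is only a userspace sketch: `reserve_allowed()`, the `toy_flush_mode` enum and the use of a plain pointer in place of `current->journal_info` are assumptions made for illustration, not the btrfs implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BTRFS_RESERVE_* flush modes. */
enum toy_flush_mode {
	TOY_NO_FLUSH,
	TOY_FLUSH_LIMIT,
	TOY_FLUSH_ALL,
};

/*
 * Mirrors the asserted condition: a task that already holds a transaction
 * handle (journal_info != NULL) must not request a FLUSH_ALL reservation.
 */
bool reserve_allowed(const void *journal_info, enum toy_flush_mode flush)
{
	return journal_info == NULL || flush != TOY_FLUSH_ALL;
}
```

In the trace above the truncate path corresponds to the third case: a handle is held and FLUSH_ALL is requested, so the condition evaluates to false.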

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix double free of fs root

2016-06-30 Thread Chandan Rajendra
On Tuesday, June 28, 2016 01:44:38 PM Liu Bo wrote:
> I got this warning while mounting a btrfs image,
> 
> [ 3020.509606] [ cut here ]
> [ 3020.510107] WARNING: CPU: 3 PID: 5581 at lib/idr.c:1051 
> ida_remove+0xca/0x190
> [ 3020.510853] ida_remove called for id=42 which is not allocated.
> [ 3020.511466] Modules linked in:
> [ 3020.511802] CPU: 3 PID: 5581 Comm: mount Not tainted 4.7.0-rc5+ #274
> [ 3020.512438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.8.2-20150714_191134- 04/01/2014
> [ 3020.513385]  0286 21295d86 88006c66b8f0 
> 8182ba5a
> [ 3020.514153]   0009 88006c66b930 
> 810e0ed7
> [ 3020.514928]  041b 8289a8c0 88007f437880 
> 
> [ 3020.515717] Call Trace:
> [ 3020.515965]  [] dump_stack+0xc9/0x13f
> [ 3020.516487]  [] __warn+0x147/0x160
> [ 3020.517005]  [] warn_slowpath_fmt+0x5f/0x80
> [ 3020.517572]  [] ida_remove+0xca/0x190
> [ 3020.518075]  [] free_anon_bdev+0x2c/0x60
> [ 3020.518609]  [] free_fs_root+0x13f/0x160
> [ 3020.519138]  [] btrfs_get_fs_root+0x379/0x3d0
> [ 3020.519710]  [] ? __mutex_unlock_slowpath+0x155/0x2c0
> [ 3020.520366]  [] open_ctree+0x2e91/0x3200
> [ 3020.520965]  [] btrfs_mount+0x1322/0x15b0
> [ 3020.521536]  [] ? kmemleak_alloc_percpu+0x44/0x170
> [ 3020.522167]  [] ? lockdep_init_map+0x61/0x210
> [ 3020.522780]  [] mount_fs+0x49/0x2c0
> [ 3020.523305]  [] vfs_kern_mount+0xac/0x1b0
> [ 3020.523872]  [] btrfs_mount+0x421/0x15b0
> [ 3020.524402]  [] ? kmemleak_alloc_percpu+0x44/0x170
> [ 3020.525045]  [] ? lockdep_init_map+0x61/0x210
> [ 3020.525657]  [] ? lockdep_init_map+0x61/0x210
> [ 3020.526289]  [] mount_fs+0x49/0x2c0
> [ 3020.526803]  [] vfs_kern_mount+0xac/0x1b0
> [ 3020.527365]  [] do_mount+0x41a/0x1770
> [ 3020.527899]  [] ? strndup_user+0x6d/0xc0
> [ 3020.528447]  [] ? memdup_user+0x78/0xb0
> [ 3020.528987]  [] SyS_mount+0x150/0x160
> [ 3020.529493]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
> 
> It turns out that we free fs root twice: btrfs_init_fs_root() calls
> free_anon_bdev(root->anon_dev) and later btrfs_get_fs_root() calls
> free_fs_root(), which does another free_anon_bdev(), and it ends up with the
> above warning.
> 
> Instead of resetting root->anon_dev to 0 after free_anon_bdev(), we can let
> btrfs_init_fs_root() return directly since its callers have already done
> the free job by calling free_fs_root().
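The double free described above can be illustrated with a toy model. Everything in this sketch is hypothetical (the `toy_*` names, the bitmap used as an ID allocator); it only shows why leaving the cleanup to a single owner avoids the "not allocated" warning, and it is not the actual btrfs code.

```c
#include <stdbool.h>

#define TOY_MAX_IDS 64

/* Toy stand-in for the anonymous device ID allocator. */
bool toy_id_in_use[TOY_MAX_IDS];

/* Release an ID; returns false if it was not allocated (the warning case). */
bool toy_free_id(int id)
{
	if (id < 0 || id >= TOY_MAX_IDS || !toy_id_in_use[id])
		return false;
	toy_id_in_use[id] = false;
	return true;
}

struct toy_root {
	int anon_dev;
};

/* Cleanup owned by the caller, playing the role of free_fs_root(). */
bool toy_free_root(struct toy_root *root)
{
	return toy_free_id(root->anon_dev);
}

/* Buggy init: on failure it frees the ID itself, and the caller frees it again. */
int toy_init_root_buggy(struct toy_root *root, int id, bool fail)
{
	toy_id_in_use[id] = true;
	root->anon_dev = id;
	if (fail) {
		toy_free_id(root->anon_dev);
		return -1;
	}
	return 0;
}

/* Fixed init: on failure it returns directly and leaves cleanup to the caller. */
int toy_init_root_fixed(struct toy_root *root, int id, bool fail)
{
	toy_id_in_use[id] = true;
	root->anon_dev = id;
	return fail ? -1 : 0;
}
```

With the buggy variant, the caller's toy_free_root() reports that the ID was already released; with the fixed variant, the caller's cleanup is the only release.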

Looks good to me. Thanks for fixing this.

Reviewed-by: Chandan Rajendra 

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 00/13] Btrfs dedupe framework

2016-06-24 Thread Chandan Rajendra
On Saturday, June 25, 2016 09:22:44 AM Qu Wenruo wrote:
> 
> On 06/24/2016 05:29 PM, Chandan Rajendra wrote:
> > On Friday, June 24, 2016 10:50:41 AM Qu Wenruo wrote:
> >> Hi Chandan, David,
> >>
> >> While trying to rebase the dedupe patchset on top of Chandan's sub page
> >> size patchset (using David's for-next-test-20160620), the rebase itself
> >> was quite simple, but I'm afraid that I found some bugs in the
> >> sub page size patchset, *without* the dedupe patchset applied.
> >>
> >> These bugs seem to be unrelated to each other
> >> 1) state leak at btrfs rmmod time
> >
> > The leak was due to not freeing 'cached_state' in
> > read_extent_buffer_pages(). I have fixed this and the fix will be part of
> > the patchset when I post the next version to the mailing list.
> >
> > I have always compiled the btrfs code as part of the vmlinux image and hence
> > have never run rmmod on the btrfs module during my local testing. The space
> > leak messages might have appeared when I shut down my guest. Hence I had
> > never noticed them before. Thanks once again for informing me about it.
> >
> >> 2) bytes_may_use leak at qgroup EDQUOTA error time
> >
> > I have a slightly older version of btrfs-progs which does not yet have the
> > "btrfs dedupe" command. I will get the new version and check if the space
> > leak can be reproduced on my machine.
> >
> > However, I don't see the space leak warning messages when the reproducer
> > script is executed after commenting out "btrfs dedupe enable $mnt".
> 
> Strange.
> That dedupe command is not useful at all, as I'm using the branch 
> without the dedupe patchset.
> Even with btrfs-progs dedupe patchset, dedupe enable only output ENOTTY 
> error message.
> 
> I'll double check if it's related to the dedupe.
> 
> BTW, are you testing with 4K page size?

Yes, I executed the script with 4k page size. I had based my patchset on top
of 4.7-rc2 kernel. If you are interested, you can get the kernel sources at
'https://github.com/chandanr/linux subpagesize-blocksize'.

I will soon rebase my patchset on David's master branch. I will let you know
if I hit the space leak issue on the rebased kernel.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 00/13] Btrfs dedupe framework

2016-06-24 Thread Chandan Rajendra
On Friday, June 24, 2016 10:50:41 AM Qu Wenruo wrote:
> Hi Chandan, David,
> 
> While trying to rebase the dedupe patchset on top of Chandan's sub page 
> size patchset (using David's for-next-test-20160620), the rebase itself 
> was quite simple, but I'm afraid that I found some bugs in the 
> sub page size patchset, *without* the dedupe patchset applied.
> 
> These bugs seem to be unrelated to each other
> 1) state leak at btrfs rmmod time

The leak was due to not freeing 'cached_state' in
read_extent_buffer_pages(). I have fixed this and the fix will be part of the
patchset when I post the next version to the mailing list.

I have always compiled the btrfs code as part of the vmlinux image and hence
have never run rmmod on the btrfs module during my local testing. The space leak
messages might have appeared when I shut down my guest. Hence I had never
noticed them before. Thanks once again for informing me about it.

> 2) bytes_may_use leak at qgroup EDQUOTA error time

I have a slightly older version of btrfs-progs which does not yet have the
"btrfs dedupe" command. I will get the new version and check if the space leak
can be reproduced on my machine.

However, I don't see the space leak warning messages when the reproducer
script is executed after commenting out "btrfs dedupe enable $mnt".

> 3) selftest is run several times at module load time
> 15 times, to be more exact
> And since I didn't find any obvious number that would explain running it 15
> times, I assume at least it's not designed to do it 15 times.
> 
> The reproducer for 1) and 2) is quite simple, extracted from btrfs/022 
> test case:
> --
> dev=/dev/sdb5
> mnt=/mnt/test
> 
> umount $dev &> /dev/null
> 
> mkfs.btrfs $dev -f
> mount $dev $mnt -o nospace_cache
> btrfs dedupe enable $mnt
> btrfs sub create $mnt/sub
> btrfs quota enable $mnt
> 
> 
> # Just use small limit, making ftrace less noise.
> btrfs qgroup limit 512K 0/257 $mnt
> dd if=/dev/urandom of=$mnt/sub/test bs=1M count=1
> umount $mnt
> rmmod btrfs
> --
> 
> At unmount time, kernel warning will happen due to may_use bytes leak.
> I could dig it further, as it looks like a bug in space reservation 
> failure case.
> --
> BTRFS: space_info 1 has 8044544 free, is not full
> BTRFS: space_info total=8388608, used=344064, pinned=0, reserved=0, 
> may_use=409600, readonly=0
> --
> 
> And at rmmod time, btrfs will detect extent_state leak, whose length is 
> always 4095 (page size - 1).
> 
> Hope this will help, and I'm willing to help to fix the problem.
> 
> Thanks,
> Qu
> 
> At 06/23/2016 08:17 PM, David Sterba wrote:
> > On Tue, Jun 21, 2016 at 10:25:19PM +0530, Chandan Rajendra wrote:
> >>>> I'm completely OK to do the rebase, but since I don't have 64K page size
> >>>> machine to test the rebase, we can only test if 4K system is unaffected.
> >>>>
> >>>> Although not much help, at least it would be better than making it 
> >>>> compile.
> >>>>
> >>>> Also such rebase may help us to expose bad design/unexpected corner case
> >>>> in dedupe.
> >>>> So if it's OK, please let me try to do the rebase.
> >>>
> >>> Well, if you base dedupe on subpage, then it could be hard to find the
> >>> patchset that introduces bugs, or combination of both. We should be able
> >>> to test the features independently, and thus I'm proposing to first find
> >>> some common patchset that makes that possible.
> >>
> >> I am not sure if I understood the above statement correctly. Do you mean to
> >> commit the 'common/simple' patches from both the subpage-blocksize & dedupe
> >> patchset first and then bring in the complicated ones later?
> >
> > That would be great yes, but ...
> >
> >> If yes, then we have a problem doing that w.r.t subpage-blocksize
> >> patchset. The first few patches bring in the core changes necessary for the
> >> other remaining patches.
> >
> > ... not easily possible. I looked again for common functions that change
> > the signature and found only cow_file_range and run_delalloc_nocow. The
> > plan:
> >
> > - separate patch that adds new parameters required by both patches to
> >   the functions
> > - update all call sites, add 0/NULL as defaults for the new unused
> >   parameters
> > - rebase both patches on top of this patch
> >
> > How does this help: if a patch starts to use the new parameter, it
> > changes only the value at all call sites

Re: [PATCH v11 00/13] Btrfs dedupe framework

2016-06-23 Thread Chandan Rajendra
On Friday, June 24, 2016 10:50:41 AM Qu Wenruo wrote:
> Hi Chandan, David,
> 
> While trying to rebase the dedupe patchset on top of Chandan's sub page 
> size patchset (using David's for-next-test-20160620), the rebase itself 
> was quite simple, but I'm afraid that I found some bugs in the 
> sub page size patchset, *without* the dedupe patchset applied.
> 
> These bugs seem to be unrelated to each other
> 1) state leak at btrfs rmmod time
> 2) bytes_may_use leak at qgroup EDQUOTA error time
> 3) selftest is run several times at module load time
> 15 times, to be more exact
> And since I didn't find any obvious number that would explain running it 15
> times, I assume at least it's not designed to do it 15 times.
>

Ah, in btrfs_run_sanity_tests(), just after,

	for (i = 0; i < ARRAY_SIZE(test_sectorsize); i++) {
		sectorsize = test_sectorsize[i];

I missed adding "if (sectorsize > PAGE_SIZE) break;". I will fix this
up in the next post of the patchset. Thanks for pointing this out.
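To make the effect of the missing guard concrete, here is a small userspace sketch. The contents of `test_sectorsize[]` and the helper `count_tested_sectorsizes()` are assumptions for illustration only; the real table in the patchset may hold different values.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed candidate sector sizes, in ascending order. */
const uint32_t test_sectorsize[] = { 4096, 8192, 16384, 32768, 65536 };

/*
 * Returns how many sector sizes would be exercised on a machine with the
 * given page size once the "sectorsize > PAGE_SIZE" guard is in place.
 * The early break relies on the table being sorted in ascending order.
 */
size_t count_tested_sectorsizes(uint32_t page_size)
{
	size_t tested = 0;

	for (size_t i = 0; i < ARRAY_SIZE(test_sectorsize); i++) {
		uint32_t sectorsize = test_sectorsize[i];

		if (sectorsize > page_size)
			break;
		tested++;
	}
	return tested;
}
```

Under these assumed values, a 4K page size machine would exercise only the first entry, whereas without the guard every entry in the table is run.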

> The reproducer for 1) and 2) is quite simple, extracted from btrfs/022 
> test case:
> --
> dev=/dev/sdb5
> mnt=/mnt/test
> 
> umount $dev &> /dev/null
> 
> mkfs.btrfs $dev -f
> mount $dev $mnt -o nospace_cache
> btrfs dedupe enable $mnt
> btrfs sub create $mnt/sub
> btrfs quota enable $mnt
> 
> 
> # Just use small limit, making ftrace less noise.
> btrfs qgroup limit 512K 0/257 $mnt
> dd if=/dev/urandom of=$mnt/sub/test bs=1M count=1
> umount $mnt
> rmmod btrfs
> --
> 
> At unmount time, kernel warning will happen due to may_use bytes leak.
> I could dig it further, as it looks like a bug in space reservation 
> failure case.
> --
> BTRFS: space_info 1 has 8044544 free, is not full
> BTRFS: space_info total=8388608, used=344064, pinned=0, reserved=0, 
> may_use=409600, readonly=0
> --
> 
> And at rmmod time, btrfs will detect extent_state leak, whose length is 
> always 4095 (page size - 1).
>

Qu, I will investigate and fix this issue. And thanks a lot for the
reproducer test.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 00/13] Btrfs dedupe framework

2016-06-23 Thread Chandan Rajendra
On Thursday, June 23, 2016 02:17:38 PM David Sterba wrote:
> On Tue, Jun 21, 2016 at 10:25:19PM +0530, Chandan Rajendra wrote:
> > > > I'm completely OK to do the rebase, but since I don't have 64K page 
> > > > size 
> > > > machine to test the rebase, we can only test if 4K system is unaffected.
> > > > 
> > > > Although not much help, at least it would be better than making it 
> > > > compile.
> > > > 
> > > > Also such rebase may help us to expose bad design/unexpected corner 
> > > > case 
> > > > in dedupe.
> > > > So if it's OK, please let me try to do the rebase.
> > > 
> > > Well, if you base dedupe on subpage, then it could be hard to find the
> > > patchset that introduces bugs, or combination of both. We should be able
> > > to test the features independently, and thus I'm proposing to first find
> > > some common patchset that makes that possible.
> > 
> > I am not sure if I understood the above statement correctly. Do you mean to
> > commit the 'common/simple' patches from both the subpage-blocksize & dedupe
> > patchset first and then bring in the complicated ones later?
> 
> That would be great yes, but ...
> 
> > If yes, then we have a problem doing that w.r.t subpage-blocksize
> > patchset. The first few patches bring in the core changes necessary for the
> > other remaining patches.
> 
> ... not easily possible. I looked again for common functions that change
> the signature and found only cow_file_range and run_delalloc_nocow. The
> plan:
> 
> - separate patch that adds new parameters required by both patches to
>   the functions
> - update all call sites, add 0/NULL as defaults for the new unused
>   parameters
> - rebase both patches on top of this patch
> 
> How does this help: if a patch starts to use the new parameter, it
> changes only the value at all call sites. This is much easier to verify
> and merge manually compared to adding a new parameter to the middle of
> the list, namely when the functions take 6+.
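The plan above could look roughly like the sketch below. The names `toy_cow_range()`, `toy_call_site()`, `extra_len` and `struct toy_ctx` are invented for illustration and do not correspond to the real signatures of cow_file_range() or run_delalloc_nocow().

```c
#include <stddef.h>

struct toy_ctx;	/* hypothetical per-feature context, never dereferenced here */

/*
 * Hypothetical function after the preparatory patch: extra_len and ctx are
 * the new parameters. Neither feature uses them yet, so they are ignored.
 */
int toy_cow_range(unsigned long start, unsigned long end,
		  unsigned long extra_len, struct toy_ctx *ctx)
{
	(void)extra_len;
	(void)ctx;
	return start <= end ? 0 : -1;
}

/* Every existing call site passes the neutral defaults 0 and NULL. */
int toy_call_site(void)
{
	return toy_cow_range(0, 4095, 0, NULL);
}
```

A later feature patch then only changes the 0 or NULL at the call sites it cares about, which is a smaller diff to review than a signature change.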

David, I can implement it. In my next post of the subpage-blocksize patchset, I
will bring in this change.


-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: Force stripesize to the value of sectorsize

2016-06-23 Thread Chandan Rajendra
Btrfs code currently assumes stripesize to be the same as
sectorsize. However, Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.

This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. It then unconditionally sets btrfs_root->stripesize
to sectorsize.

Signed-off-by: Chandan Rajendra 
---
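Note for reviewers (not part of the commit): the intended behaviour can be summarised by the userspace sketch below. `choose_stripesize()` and `stripesize_is_power_of_2()` are hypothetical helpers written for this note, and -1 stands in for -EINVAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace equivalent of a power-of-2 test. */
bool stripesize_is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Validate the on-disk stripesize, then ignore its value and use
 * sectorsize instead. Returns 0 on success, -1 if the on-disk value
 * is not a power of 2.
 */
int choose_stripesize(uint32_t disk_stripesize, uint32_t sectorsize,
		      uint32_t *stripesize)
{
	if (!stripesize_is_power_of_2(disk_stripesize))
		return -1;
	*stripesize = sectorsize;
	return 0;
}
```

So a filesystem created with stripesize 4096 on a 64K sectorsize setup would pass validation and run with stripesize equal to sectorsize.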
 fs/btrfs/disk-io.c | 6 ++----
 fs/btrfs/volumes.c | 4 ++--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 54cca7a..60ce119 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2806,7 +2806,7 @@ int open_ctree(struct super_block *sb,
 
nodesize = btrfs_super_nodesize(disk_super);
sectorsize = btrfs_super_sectorsize(disk_super);
-   stripesize = btrfs_super_stripesize(disk_super);
+   stripesize = sectorsize;
fs_info->dirty_metadata_batch = nodesize * (1 + ilog2(nr_cpu_ids));
fs_info->delalloc_batch = sectorsize * 512 * (1 + ilog2(nr_cpu_ids));
 
@@ -4133,9 +4133,7 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info,
   btrfs_super_bytes_used(sb));
ret = -EINVAL;
}
-   if (!is_power_of_2(btrfs_super_stripesize(sb)) ||
-   ((btrfs_super_stripesize(sb) != sectorsize) &&
-   (btrfs_super_stripesize(sb) != 4096))) {
+   if (!is_power_of_2(btrfs_super_stripesize(sb))) {
btrfs_err(fs_info, "invalid stripesize %u",
   btrfs_super_stripesize(sb));
ret = -EINVAL;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c3a2900..64eec2c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4694,12 +4694,12 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
 
if (type & BTRFS_BLOCK_GROUP_RAID5) {
raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
-btrfs_super_stripesize(info->super_copy));
+   extent_root->stripesize);
data_stripes = num_stripes - 1;
}
if (type & BTRFS_BLOCK_GROUP_RAID6) {
raid_stripe_len = find_raid56_stripe_len(ndevs - 2,
-btrfs_super_stripesize(info->super_copy));
+   extent_root->stripesize);
data_stripes = num_stripes - 2;
}
 
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 00/13] Btrfs dedupe framework

2016-06-21 Thread Chandan Rajendra
On Tuesday, June 21, 2016 11:34:57 AM David Sterba wrote:
> On Tue, Jun 21, 2016 at 05:26:23PM +0800, Qu Wenruo wrote:
> > > Yeah, but I'm now concerned about the way both will be integrated in the
> > > development or preview branches, not really the functionality itself.
> > >
> > > Now the conflicts are not trivial, so this takes extra time on my side
> > > and I can't be sure about the result in the end if I put only minor
> > > efforts to resolve the conflicts ("make it compile"). And I don't want
> > > to do that too often.
> > >
> > > As stated in past discussions, the features of this impact should spend
> > > one development cycle in for-next, even if it's not ready for merge or
> > > there are reviews going on.
> > >
> > > The subpage patchset is now in a relatively good shape to start actual
> > > testing, which already revealed some problems.
> > >
> > >
> > I'm completely OK to do the rebase, but since I don't have 64K page size 
> > machine to test the rebase, we can only test if 4K system is unaffected.
> > 
> > Although not much help, at least it would be better than making it compile.
> > 
> > Also such rebase may help us to expose bad design/unexpected corner case 
> > in dedupe.
> > So if it's OK, please let me try to do the rebase.
> 
> Well, if you base dedupe on subpage, then it could be hard to find the
> patchset that introduces bugs, or combination of both. We should be able
> to test the features independently, and thus I'm proposing to first find
> some common patchset that makes that possible.
>

Hi David,

I am not sure if I understood the above statement correctly. Do you mean to
commit the 'common/simple' patches from both the subpage-blocksize & dedupe
patchset first and then bring in the complicated ones later?

If yes, then we have a problem doing that w.r.t subpage-blocksize
patchset. The first few patches bring in the core changes necessary for the
other remaining patches.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V19 05/19] Btrfs: subpage-blocksize: Read tree blocks whose size is < PAGE_SIZE

2016-06-20 Thread Chandan Rajendra
On Monday, June 20, 2016 01:54:05 PM David Sterba wrote:
> On Tue, Jun 14, 2016 at 12:41:02PM +0530, Chandan Rajendra wrote:
> > In the case of subpage-blocksize, this patch makes it possible to read
> > only a single metadata block from the disk instead of all the metadata
> > blocks that map into a page.
> 
> This patch has a conflict with a next pending patch
> 
> "Btrfs: fix eb memory leak due to readpage failure"
> https://patchwork.kernel.org/patch/9153927/
>

I will fix this and also the merge conflict in the patch "[PATCH V19 11/19]
Btrfs: subpage-blocksize: Prevent writes to an extent buffer when PG_writeback
flag is set" and resend the patchset.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: let super_stripesize match with sectorsize

2016-06-16 Thread Chandan Rajendra
On Thursday, June 16, 2016 11:08:05 PM Liu Bo wrote:
> On Fri, Jun 17, 2016 at 10:48:05AM +0530, Chandan Rajendra wrote:
> > On Thursday, June 16, 2016 10:01:41 AM Liu Bo wrote:
> > > On Thu, Jun 16, 2016 at 01:53:59PM +0530, Chandan Rajendra wrote:
> > > > On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> > > > > On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > > > > > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > > > > > Hello Liu Bo,
> > > > > > > 
> > > > > > > We have to fix the following check in check_super() as well,
> > > > > > > 
> > > > > > >     if (btrfs_super_stripesize(sb) != 4096) {
> > > > > > >         error("invalid stripesize %u", btrfs_super_stripesize(sb));
> > > > > > >         goto error_out;
> > > > > > >     }
> > > > > > > 
> > > > > > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > > > > > btrfs_super_sectorsize(sb).
> > > > > > > 
> > > > > > > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > > > > > > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > > > > > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > > > > > > devices with existing Btrfs filesystem instances,
> > > > > > > 
> > > > > > > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > > > > > > check_super() to,
> > > > > > > 
> > > > > > >     if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> > > > > > >         error("invalid stripesize %u", btrfs_super_stripesize(sb));
> > > > > > >         goto error_out;
> > > > > > >     }
> > > > > > > 
> > > > > > > main()
> > > > > > >  for each device file passed as an argument,
> > > > > > >    test_dev_for_mkfs()
> > > > > > >      check_mounted
> > > > > > >        check_mounted_where
> > > > > > >          btrfs_scan_one_device
> > > > > > >            btrfs_read_dev_super
> > > > > > >              check_super() call will fail for existing filesystems which
> > > > > > >              have stripesize set to 4k. All existing filesystem instances
> > > > > > >              will fall into this category.
> > > > > > > 
> > > > > > > This error value is pushed up the call stack and this causes the device
> > > > > > > to not get added to the fs_devices_mnt list in check_mounted_where().
> > > > > > > Hence we would fail to correctly check the mount status of the
> > > > > > > multi-device btrfs filesystems.
> > > > > > 
> > > > > > We can end up in the following scenario,
> > > > > > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > > > > >   filesystem. The filesystem was created by an older version of

[PATCH V2] Btrfs-progs: Initialize stripesize to the value of sectorsize

2016-06-16 Thread Chandan Rajendra
stripesize should ideally be set to the value of sectorsize. However,
previous versions of btrfs-progs/mkfs.btrfs set stripesize to a
value of 4096. On machines with a PAGE_SIZE other than 4096, this could
lead to the following scenario,

- /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
  filesystem. The filesystem was created by an older version of mkfs.btrfs
  which set stripesize to 4k.
- losetup -a
   /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
   /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
   /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
- /etc/mtab lists only /dev/loop0
- losetup /dev/loop4 /root/disk-imgs/file-1.img
  The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
  though /dev/loop1 has already been mounted and has
  /root/disk-imgs/file-1.img as its backing file.

The above behaviour occurs because check_super() function returns an
error code (due to stripesize not being set to 4096) and hence
check_mounted_where() function treats /dev/loop1 as a disk containing a
filesystem other than Btrfs.

Hence as a workaround this commit allows 4096 as a valid stripesize.

Signed-off-by: Chandan Rajendra 
---
Changelog:
v1->v2: Use Tabs to indent rather than spaces. Thanks to Satoru Takeuchi for
pointing this out.
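Note (not part of the commit): the relaxed validation amounts to the predicate sketched below. `stripesize_is_valid()` and `LEGACY_STRIPESIZE` are names made up for this note; the patch itself open-codes the check in check_super().

```c
#include <stdbool.h>
#include <stdint.h>

/* Value written by older versions of mkfs.btrfs, per the description above. */
#define LEGACY_STRIPESIZE 4096u

/*
 * Accept a superblock whose stripesize is either the legacy 4096 or equal
 * to its sectorsize; anything else is treated as invalid.
 */
bool stripesize_is_valid(uint32_t stripesize, uint32_t sectorsize)
{
	return stripesize == LEGACY_STRIPESIZE || stripesize == sectorsize;
}
```

This keeps existing filesystems (stripesize 4096) recognisable as Btrfs on machines with larger page sizes, so the mount check is not skipped for them.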

 disk-io.c | 3 ++-
 mkfs.c    | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/disk-io.c b/disk-io.c
index 77eb0a6..fbce506 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1476,7 +1476,8 @@ static int check_super(struct btrfs_super_block *sb)
error("invalid bytes_used %llu", btrfs_super_bytes_used(sb));
goto error_out;
}
-   if (btrfs_super_stripesize(sb) != 4096) {
+   if ((btrfs_super_stripesize(sb) != 4096)
+   && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
error("invalid stripesize %u", btrfs_super_stripesize(sb));
goto error_out;
}
diff --git a/mkfs.c b/mkfs.c
index a3a3c14..697bdc2 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1482,6 +1482,7 @@ int main(int argc, char **argv)
}
 
sectorsize = max(sectorsize, (u32)sysconf(_SC_PAGESIZE));
+   stripesize = sectorsize;
saved_optind = optind;
dev_cnt = argc - optind;
if (dev_cnt == 0)
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: let super_stripesize match with sectorsize

2016-06-16 Thread Chandan Rajendra
On Thursday, June 16, 2016 10:01:41 AM Liu Bo wrote:
> On Thu, Jun 16, 2016 at 01:53:59PM +0530, Chandan Rajendra wrote:
> > On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> > > On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > > > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > > > Hello Liu Bo,
> > > > > 
> > > > > We have to fix the following check in check_super() as well,
> > > > > 
> > > > >     if (btrfs_super_stripesize(sb) != 4096) {
> > > > >         error("invalid stripesize %u", btrfs_super_stripesize(sb));
> > > > >         goto error_out;
> > > > >     }
> > > > > 
> > > > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > > > btrfs_super_sectorsize(sb).
> > > > > 
> > > > > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > > > > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > > > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > > > > devices with existing Btrfs filesystem instances,
> > > > > 
> > > > > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > > > > check_super() to,
> > > > > 
> > > > >     if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> > > > >         error("invalid stripesize %u", btrfs_super_stripesize(sb));
> > > > >         goto error_out;
> > > > >     }
> > > > > 
> > > > > main()
> > > > >  for each device file passed as an argument,
> > > > >    test_dev_for_mkfs()
> > > > >      check_mounted
> > > > >        check_mounted_where
> > > > >          btrfs_scan_one_device
> > > > >            btrfs_read_dev_super
> > > > >              check_super() call will fail for existing filesystems which
> > > > >              have stripesize set to 4k. All existing filesystem instances
> > > > >              will fall into this category.
> > > > > 
> > > > > This error value is pushed up the call stack and this causes the device
> > > > > to not get added to the fs_devices_mnt list in check_mounted_where().
> > > > > Hence we would fail to correctly check the mount status of the
> > > > > multi-device btrfs filesystems.
> > > > 
> > > > We can end up in the following scenario,
> > > > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > > >   filesystem. The filesystem was created by an older version of
> > > >   mkfs.btrfs which set stripesize to 4k.
> > > > - losetup -a
> > > >    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> > > >    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> > > >    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> > > > - /etc/mtab lists only /dev/loop0
> > > > - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> > > >   The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
> > > >   though /dev/loop1 has already been mounted and has
> > > >   /root/disk-imgs/file-1.img as its backing file.
> > > > 
> > > > So IMHO the only solution is to have the stripesize check in
> > > > check_super()
> > > > to allow both '4k' and 'sectorsize' as valid values i.e.
> > > > 
> > > >  

[PATCH] Btrfs-progs: Initialize stripesize to the value of sectorsize

2016-06-16 Thread Chandan Rajendra
stripesize should ideally be set to the value of sectorsize. However,
previous versions of btrfs-progs/mkfs.btrfs set stripesize to a fixed
value of 4096. On machines with a PAGE_SIZE other than 4096, this could
lead to the following scenario:

- /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
  filesystem. The filesystem was created by an older version of mkfs.btrfs
  which set stripesize to 4k.
- losetup -a
   /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
   /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
   /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
- /etc/mtab lists only /dev/loop0
- losetup /dev/loop4 /root/disk-imgs/file-1.img
  The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
  though /dev/loop1 has already been mounted and has
  /root/disk-imgs/file-1.img as its backing file.

The above behaviour occurs because the check_super() function returns an
error code (due to the 4096-byte stripesize not matching sectorsize) and
hence the check_mounted_where() function treats /dev/loop1 as a disk
containing a filesystem other than Btrfs.

Hence as a workaround this commit allows 4096 as a valid stripesize.

Signed-off-by: Chandan Rajendra 
---
 disk-io.c | 4 +++-
 mkfs.c    | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/disk-io.c b/disk-io.c
index 77eb0a6..1ac7631 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1476,7 +1476,9 @@ static int check_super(struct btrfs_super_block *sb)
 		error("invalid bytes_used %llu", btrfs_super_bytes_used(sb));
 		goto error_out;
 	}
-	if (btrfs_super_stripesize(sb) != 4096) {
+
+	if ((btrfs_super_stripesize(sb) != 4096)
+		&& (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
 		error("invalid stripesize %u", btrfs_super_stripesize(sb));
 		goto error_out;
 	}
diff --git a/mkfs.c b/mkfs.c
index a3a3c14..697bdc2 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1482,6 +1482,7 @@ int main(int argc, char **argv)
 	}
 
 	sectorsize = max(sectorsize, (u32)sysconf(_SC_PAGESIZE));
+	stripesize = sectorsize;
 	saved_optind = optind;
 	dev_cnt = argc - optind;
 	if (dev_cnt == 0)
-- 
2.5.5
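
For readers following the mkfs.c hunk, here is a minimal userspace sketch
of how the two values end up related after this change. It assumes a
requested sectorsize of 4096 before the page-size adjustment; struct
fs_sizes, derive_sizes() and derive_sizes_for_host() are hypothetical
names used only for illustration and do not exist in btrfs-progs.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical container for the two values discussed in the patch. */
struct fs_sizes {
	uint32_t sectorsize;
	uint32_t stripesize;
};

/*
 * Mirror the two assignments from the mkfs.c hunk: sectorsize is raised
 * to at least the page size, and stripesize then follows sectorsize
 * instead of staying at a hard-coded 4096.
 */
static struct fs_sizes derive_sizes(uint32_t requested_sectorsize,
				    uint32_t page_size)
{
	struct fs_sizes s;

	s.sectorsize = requested_sectorsize > page_size ?
		       requested_sectorsize : page_size;
	s.stripesize = s.sectorsize;
	return s;
}

/* Convenience wrapper that queries the page size of the running host. */
static struct fs_sizes derive_sizes_for_host(uint32_t requested_sectorsize)
{
	return derive_sizes(requested_sectorsize,
			    (uint32_t)sysconf(_SC_PAGESIZE));
}
```

On a machine with 64K pages, derive_sizes(4096, 65536) yields 65536 for
both fields, which is the case where the old hard-coded 4096 stripesize
and the new value diverge.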



[PATCH] Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize

2016-06-16 Thread Chandan Rajendra
Older btrfs-progs/mkfs.btrfs sets 4096 as the stripesize. Hence
restricting stripesize to be equal to sectorsize would cause super block
validation to return an error on architectures where PAGE_SIZE is not
equal to 4096.

Hence as a workaround, this commit allows stripesize to be set to 4096
bytes.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1142127..7f92b4f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4139,7 +4139,8 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 	}
 	if (!is_power_of_2(btrfs_super_stripesize(sb)) ||
-	    btrfs_super_stripesize(sb) != sectorsize) {
+	    ((btrfs_super_stripesize(sb) != sectorsize) &&
+	    (btrfs_super_stripesize(sb) != 4096))) {
 		btrfs_err(fs_info, "invalid stripesize %u",
 		       btrfs_super_stripesize(sb));
 		ret = -EINVAL;
-- 
2.5.5



Re: [PATCH] Btrfs: let super_stripesize match with sectorsize

2016-06-16 Thread Chandan Rajendra
On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > Hello Liu Bo,
> > > 
> > > We have to fix the following check in check_super() as well,
> > > 
> > >if (btrfs_super_stripesize(sb) != 4096) {
> > >
> > > error("invalid stripesize %u",
> > > btrfs_super_stripesize(sb));
> > > goto error_out;
> > > 
> > > }
> > > 
> > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > btrfs_super_sectorsize(sb).
> > > 
> > > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > > devices with existing Btrfs filesystem instances,
> > > 
> > > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > > check_super() to,
> > > 
> > > if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> > > 
> > > error("invalid stripesize %u",
> > > btrfs_super_stripesize(sb));
> > > goto error_out;
> > > 
> > > }
> > > 
> > > main()
> > > 
> > >  for each device file passed as an argument,
> > >  
> > >test_dev_for_mkfs()
> > >
> > >  check_mounted
> > >  
> > >check_mounted_where
> > >
> > >  btrfs_scan_one_device
> > >  
> > >btrfs_read_dev_super
> > >
> > >  check_super() call will fail for existing filesystems which
> > > 
> > > have stripesize set to 4k. All existing filesystem instances will fall
> > > into
> > > this category.
> > > 
> > > This error value is pushed up the call stack and this causes the device
> > > to
> > > not get added to the fs_devices_mnt list in check_mounted_where(). Hence
> > > we
> > > would fail to correctly check the mount status of the multi-device btrfs
> > > filesystems.
> > 
> > We can end up in the following scenario,
> > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > 
> >   filesystem. The filesystem was created by an older version of mkfs.btrfs
> >   which set stripesize to 4k.
> > 
> > - losetup -a
> > 
> >/dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> >/dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> >/dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> > 
> > - /etc/mtab lists only /dev/loop0
> > - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> > 
> >The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
> >though /dev/loop1 has already been mounted and has
> >/root/disk-imgs/file-1.img as its backing file.
> > 
> > So IMHO the only solution is to have the stripesize check in check_super()
> > to allow both '4k' and 'sectorsize' as valid values i.e.
> > 
> > if ((btrfs_super_stripesize(sb) != 4096)
> > 
> > && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
> > 
> > error("invalid stripesize %u",
> > btrfs_super_stripesize(sb));
> > goto error_out;
> > 
> > }
> 
> That's a good one.
> 
> But if we go back to the original point, in the kernel side,
> 1. in open_ctree(), root->stripesize = btrfs_super_stripesize();
> 
> 2. in find_free_extent(),
>   ...
>   search_start = ALIGN(offset, root->stripesize);
>   ...
> 3. in btrfs_alloc_tree_block(),
>   ...
>   ret = btrfs_reserve_extent(..., &ins, ...);
>   ...
>   buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
> 
> 4. in btrfs_init_new_buffer(),
>   ...
>   buf = btrfs_find_create_tree_block(root, bytenr);
>   ...
> 
> Because 'num_bytes' we pass to find_free_extent() is aligned to
> sectorsize, the free space we can find is aligned to sectorsize,
> which means 'offset' in '1. find_free_extent()' is aligned to sectorsize
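
The alignment argument quoted above can be checked with a small userspace
sketch. align_up() is a stand-in written for this example; it assumes the
quoted ALIGN(offset, root->stripesize) call rounds up to a power-of-two
boundary. align_is_noop() and the sample offsets used with it are made up
for illustration.

```c
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Returns 1 when aligning offset to stripesize leaves it unchanged. */
static int align_is_noop(uint64_t offset, uint64_t stripesize)
{
	return align_up(offset, stripesize) == offset;
}
```

With a 64K sectorsize, an offset that is already a multiple of 64K is left
unchanged whether stripesize is 4096 or 65536, because 4096 divides 65536.
The alignment only moves an offset when the offset is not already a
multiple of stripesize.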

Re: [PATCH] Btrfs: let super_stripesize match with sectorsize

2016-06-15 Thread Chandan Rajendra
On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> Hello Liu Bo,
> 
> We have to fix the following check in check_super() as well,
> 
>if (btrfs_super_stripesize(sb) != 4096) {
> error("invalid stripesize %u", btrfs_super_stripesize(sb));
> goto error_out;
> }
> 
> i.e. btrfs_super_stripesize(sb) must be equal to
> btrfs_super_sectorsize(sb).
> 
> However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> hardcoded to 4096, setting stripesize to the value of sectorsize in
> mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> devices with existing Btrfs filesystem instances,
> 
> NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> check_super() to,
> 
> if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> error("invalid stripesize %u", btrfs_super_stripesize(sb));
> goto error_out;
> }
> 
> 
> main()
>  for each device file passed as an argument,
>test_dev_for_mkfs()
>  check_mounted
>check_mounted_where
>  btrfs_scan_one_device
>btrfs_read_dev_super
>  check_super() call will fail for existing filesystems which
> have stripesize set to 4k. All existing filesystem instances will fall into
> this category.
> 
> This error value is pushed up the call stack and this causes the device to
> not get added to the fs_devices_mnt list in check_mounted_where(). Hence we
> would fail to correctly check the mount status of the multi-device btrfs
> filesystems.
> 


We can end up in the following scenario,
- /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
  filesystem. The filesystem was created by an older version of mkfs.btrfs
  which set stripesize to 4k. 
- losetup -a
   /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)  
   /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)  
   /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)  
- /etc/mtab lists only /dev/loop0
- "losetup /dev/loop4 /root/disk-imgs/file-1.img"
   The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
   though /dev/loop1 has already been mounted and has
   /root/disk-imgs/file-1.img as its backing file.

So IMHO the only solution is to have the stripesize check in check_super() to
allow both '4k' and 'sectorsize' as valid values i.e.

	if ((btrfs_super_stripesize(sb) != 4096)
		&& (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
		error("invalid stripesize %u", btrfs_super_stripesize(sb));
		goto error_out;
	}
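
The same condition can be written as a standalone predicate, which makes
it easy to try out the page-size combinations discussed in this thread.
This is only a sketch: stripesize_is_acceptable() and LEGACY_STRIPESIZE
are hypothetical names and are not part of btrfs-progs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stripesize written by older mkfs.btrfs versions, per this thread. */
#define LEGACY_STRIPESIZE 4096u

/*
 * Accept either the legacy 4096-byte stripesize or a stripesize equal
 * to sectorsize; everything else is treated as invalid.
 */
static bool stripesize_is_acceptable(uint32_t stripesize, uint32_t sectorsize)
{
	return stripesize == LEGACY_STRIPESIZE || stripesize == sectorsize;
}
```

For a filesystem with a 64K sectorsize, both 4096 (older mkfs.btrfs) and
65536 (mkfs.btrfs with stripesize set to sectorsize) pass, while a value
such as 8192 is still rejected.
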
-- 
chandan



Re: [PATCH] Btrfs: let super_stripesize match with sectorsize

2016-06-14 Thread Chandan Rajendra
On Tuesday, June 14, 2016 02:33:43 PM Liu Bo wrote:
> Right now stripesize is set to 4096 while sectorsize is set to
> max(4096, pagesize).  However, kernel requires super_stripesize
> to match with sectorsize.
> 
> Reported-by: Eryu Guan 
> Signed-off-by: Liu Bo 
> ---
>  mkfs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mkfs.c b/mkfs.c
> index a3a3c14..8d00766 100644
> --- a/mkfs.c
> +++ b/mkfs.c
> @@ -1482,6 +1482,8 @@ int main(int argc, char **argv)
>   }
> 
>   sectorsize = max(sectorsize, (u32)sysconf(_SC_PAGESIZE));
> + stripesize = sectorsize;
> +
>   saved_optind = optind;
>   dev_cnt = argc - optind;
>   if (dev_cnt == 0)

Hello Liu Bo,

We have to fix the following check in check_super() as well,

	if (btrfs_super_stripesize(sb) != 4096) {
		error("invalid stripesize %u", btrfs_super_stripesize(sb));
		goto error_out;
	}

i.e. btrfs_super_stripesize(sb) must be equal to
btrfs_super_sectorsize(sb).

However in btrfs-progs (mkfs.c to be precise) since we had stripesize
hardcoded to 4096, setting stripesize to the value of sectorsize in
mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
devices with existing Btrfs filesystem instances,

NOTE: Assume we have changed the stripesize validation in btrfs-progs'
check_super() to,

	if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
		error("invalid stripesize %u", btrfs_super_stripesize(sb));
		goto error_out;
	}


main()
  for each device file passed as an argument,
    test_dev_for_mkfs()
      check_mounted
        check_mounted_where
          btrfs_scan_one_device
            btrfs_read_dev_super
              check_super() call will fail for existing filesystems which
              have stripesize set to 4k. All existing filesystem instances
              will fall into this category.

This error value is pushed up the call stack and this causes the device to not
get added to the fs_devices_mnt list in check_mounted_where(). Hence we would
fail to correctly check the mount status of the multi-device btrfs
filesystems.
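
A deliberately simplified toy model of this failure mode is sketched
below. Nothing here is btrfs-progs code: struct toy_dev, toy_devs,
strict_check(), relaxed_check() and seen_as_mounted() are hypothetical,
the 64K sectorsize is an assumed value for a large-page machine, and the
real check_mounted_where() logic is more involved. The point is only that
a device whose super block is rejected never takes part in the mount
check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a scanned device; not a btrfs-progs type. */
struct toy_dev {
	const char *backing_file;
	uint32_t stripesize;
	uint32_t sectorsize;
	bool mounted;		/* device belongs to a mounted filesystem */
};

/* Made-up sample: three devices of one mounted, older filesystem. */
static const struct toy_dev toy_devs[] = {
	{ "/root/disk-imgs/file-0.img", 4096, 65536, true },
	{ "/root/disk-imgs/file-1.img", 4096, 65536, true },
	{ "/root/disk-imgs/file-2.img", 4096, 65536, true },
};

/* Strict validation: stripesize must equal sectorsize. */
static bool strict_check(const struct toy_dev *d)
{
	return d->stripesize == d->sectorsize;
}

/* Relaxed validation: the legacy 4096 value is accepted as well. */
static bool relaxed_check(const struct toy_dev *d)
{
	return d->stripesize == 4096 || d->stripesize == d->sectorsize;
}

/*
 * Report whether backing_file is seen as part of a mounted filesystem.
 * Devices rejected by 'check' are skipped, modelling a device that never
 * reaches the list consulted by the mount check.
 */
static bool seen_as_mounted(const struct toy_dev *devs, size_t n,
			    const char *backing_file,
			    bool (*check)(const struct toy_dev *))
{
	for (size_t i = 0; i < n; i++) {
		if (!check(&devs[i]))
			continue;	/* rejected super block: ignored */
		if (devs[i].mounted &&
		    strcmp(devs[i].backing_file, backing_file) == 0)
			return true;
	}
	return false;
}
```

In this model, passing strict_check makes every sample device invisible,
so a mounted backing file is reported as not mounted; with relaxed_check
the same file is detected.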

I will try to figure out a solution to this problem.

-- 
chandan



[PATCH V19 19/19] Btrfs: subpage-blocksize: Disable compression

2016-06-14 Thread Chandan Rajendra
The subpage-blocksize patchset does not yet support compression. Hence,
the kernel might crash when executing compression code in a
subpage-blocksize scenario. This commit prevents compression from being
enabled at mount time and when the user invokes the 'chattr +c' command.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ioctl.c |  8 +++++++-
 fs/btrfs/super.c | 20 ++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 77c2aa8..c4fd80e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -322,6 +322,11 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	} else if (flags & FS_COMPR_FL) {
 		const char *comp;
 
+		if (root->sectorsize < PAGE_SIZE) {
+			ret = -EINVAL;
+			goto out_drop;
+		}
+
 		ip->flags |= BTRFS_INODE_COMPRESS;
 		ip->flags &= ~BTRFS_INODE_NOCOMPRESS;
 
@@ -1342,7 +1347,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		return -EINVAL;
 
 	if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS) {
-		if (range->compress_type > BTRFS_COMPRESS_TYPES)
+		if ((range->compress_type > BTRFS_COMPRESS_TYPES)
+			|| (root->sectorsize < PAGE_SIZE))
 			return -EINVAL;
 		if (range->compress_type)
 			compress_type = range->compress_type;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ae30f52..70c0ee3 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -368,6 +368,17 @@ static const match_table_t tokens = {
 	{Opt_err, NULL},
 };
 
+static int can_enable_compression(struct btrfs_fs_info *fs_info)
+{
+	if (btrfs_super_sectorsize(fs_info->super_copy) < PAGE_SIZE) {
+		btrfs_err(fs_info,
+			"Compression is not supported for subpage-blocksize");
+		return 0;
+	}
+
+	return 1;
+}
+
 /*
  * Regular mount options parser.  Everything that is needed only when
  * reading in a new superblock is parsed here.
@@ -477,6 +488,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
 			if (token == Opt_compress ||
 			    token == Opt_compress_force ||
 			    strcmp(args[0].from, "zlib") == 0) {
+				if (!can_enable_compression(info)) {
+					ret = -EINVAL;
+					goto out;
+				}
 				compress_type = "zlib";
 				info->compress_type = BTRFS_COMPRESS_ZLIB;
 				btrfs_set_opt(info->mount_opt, COMPRESS);
@@ -484,6 +499,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
 				btrfs_clear_opt(info->mount_opt, NODATASUM);
 				no_compress = 0;
 			} else if (strcmp(args[0].from, "lzo") == 0) {
+				if (!can_enable_compression(info)) {
+					ret = -EINVAL;
+					goto out;
+				}
 				compress_type = "lzo";
 				info->compress_type = BTRFS_COMPRESS_LZO;
 				btrfs_set_opt(info->mount_opt, COMPRESS);
@@ -806,6 +825,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
 			break;
 		}
 	}
+
 check:
 	/*
 	 * Extra check for current option against current flag
-- 
2.1.0
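
The gating rule added by this patch can be illustrated outside the kernel
with a short sketch. try_enable_compression() is a hypothetical userspace
helper, not a kernel function, and the page size is passed in as a
parameter here instead of using the kernel's PAGE_SIZE so that different
machine configurations can be tried.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Refuse compression when the filesystem block size (sectorsize) is
 * smaller than the page size, i.e. in the subpage-blocksize case.
 * Returns 0 when compression may be enabled, -EINVAL otherwise.
 */
static int try_enable_compression(uint32_t sectorsize, uint32_t page_size)
{
	if (sectorsize < page_size)
		return -EINVAL;	/* subpage-blocksize: not supported yet */
	return 0;
}
```

A 4K-sectorsize filesystem on a 64K-page machine is refused, while the
same filesystem on a 4K-page machine is allowed to enable compression.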


